Analyses included 366 MZ pairs, 386 DZ pairs, and 37,832 unrelated pairs, the latter formed from non-cotwin pairs within the twin sets matched on age and DNA collection year. β-diversity measures were compared between groups with the Wilcoxon-Mann-Whitney test, and P values were calculated as previously described. In short, the pair labels were permuted 10,000 times and the W test statistic was recorded for each permutation. The P value was then calculated as the number of permuted W statistics greater than the observed W, plus 1, divided by the number of permutations plus 1. Biplot analyses were performed as implemented in QIIME. In experiments where cohabitation was required, only cotwins aged 18 and under and those over 18 who reported cohabitating were included; this removed 328 subjects from the total twin sample who were living apart from their cotwin. This population of 588 twin pairs is referred to as the “cohabitation sample.” Cohen’s d effect sizes for β-diversity measurements were calculated using the R package ‘effectsize’.
Microbial traits included taxonomic groups, OTUs, α-diversity measurements, and principal coordinates from β-diversity measurements, with perfectly correlated traits collapsed. Microbial traits were then processed separately within each population: twin pairs, European (EUR) unrelated, and Admixed American (ADM) unrelated. Traits were transformed to z-scores and then categorized as either continuous or categorical. The Shapiro-Wilk test was performed using the R package ‘stats’. Categorical traits were then binned based on the z-score transformation of all non-zero values into five categories: zero counts; z ≤ −1; −1 < z ≤ 0; 0 < z ≤ 1; and z > 1. Some traits failed to categorize due to lack of variation, which determined the final trait counts for the twin, EUR unrelated, and ADM unrelated populations.
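The binning of categorical traits described above can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes the z-scores are computed over the non-zero values only, using the sample standard deviation, and the integer bin labels 0 to 4 and the function name are hypothetical. A trait whose non-zero values do not vary returns None, mirroring the traits that failed to categorize.

```python
from statistics import mean, stdev


def bin_categorical_trait(values):
    """Bin a trait into 5 categories using z-scores of its non-zero values.

    Hypothetical labels: 0 = zero count; 1 = z <= -1; 2 = -1 < z <= 0;
    3 = 0 < z <= 1; 4 = z > 1. Returns None when the non-zero values lack
    variation (the trait cannot be categorized).
    """
    nonzero = [v for v in values if v != 0]
    if len(nonzero) < 2 or stdev(nonzero) == 0:
        return None
    mu, sd = mean(nonzero), stdev(nonzero)
    bins = []
    for v in values:
        if v == 0:
            bins.append(0)  # zero counts form their own category
            continue
        z = (v - mu) / sd
        if z <= -1:
            bins.append(1)
        elif z <= 0:
            bins.append(2)
        elif z <= 1:
            bins.append(3)
        else:
            bins.append(4)
    return bins


# Hypothetical OTU counts for six subjects.
print(bin_categorical_trait([0, 1, 5, 6, 7, 11]))
```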
Only the continuous traits were used in the EUR and ADM populations, so data are provided only for those traits.
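Returning to the β-diversity comparisons, the permutation P value described above can be sketched as below. This is a hedged illustration, not the study's implementation: it assumes the per-pair distances of two groups are given as plain lists, uses the rank-sum form of W with average ranks for ties, and the helper names and example distances are hypothetical. Because the P value counts permuted W values greater than the observed one, the group hypothesized to have larger distances (here DZ) is passed first.

```python
import random


def average_ranks(values):
    """Return 1-based ranks of `values`, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_w(x, y):
    """Wilcoxon rank-sum W: sum of the ranks of sample x within the pooled data."""
    ranks = average_ranks(list(x) + list(y))
    return sum(ranks[: len(x)])


def permutation_p(x, y, n_perm=10_000, seed=0):
    """One-sided permutation P value: (#{W_perm > W_obs} + 1) / (n_perm + 1)."""
    rng = random.Random(seed)
    observed = rank_sum_w(x, y)
    pooled = list(x) + list(y)
    n_x = len(x)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # permute the group labels
        if rank_sum_w(pooled[:n_x], pooled[n_x:]) > observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


# Hypothetical Bray-Curtis distances within DZ and MZ pairs.
dz = [0.62, 0.58, 0.71, 0.66, 0.69, 0.60]
mz = [0.41, 0.45, 0.39, 0.50, 0.44, 0.47]
p = permutation_p(dz, mz)
```

With these toy values every DZ distance exceeds every MZ distance, so the observed W is already the largest possible and no permutation can exceed it; the P value then sits at its floor of 1/(n_perm + 1).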
Descriptions of all traits can be found in Additional file 1: Tables S11–14.
We analyzed 752 twin pairs from the Colorado Twin Registry to estimate host genetic and environmental contributions to salivary microbiome composition. The sample included 366 monozygotic (MZ) pairs, 263 same-sex dizygotic (DZ) pairs, and 123 opposite-sex DZ pairs, ranging from 11 to 24 years of age. Taxonomic analyses, based on sequencing of variable region IV of 16S rRNA amplicons prepared from the saliva of each twin, were carried out using QIIME on high-quality Illumina MiSeq paired-end reads as previously reported. The most abundant phyla among the 2664 operational taxonomic units (OTUs) found were Firmicutes, Proteobacteria, Bacteroidetes, Actinobacteria, and Fusobacteria, which is consistent with the “core” salivary microbiome that we and others have previously reported. All analyses included only OTUs that were present in at least 2 subjects and observed at least 10 times in total after rarefying to 2500 reads. This filtering yielded 895 OTUs, which were considered in all subsequent experiments. Comparing mean β-diversity among MZ, DZ, and unrelated individuals allows assessment of differences in microbial populations between groups. With either the Bray-Curtis or the Weighted UniFrac measure of β-diversity, MZ twin pairs were significantly more similar to each other than DZ twin pairs were, and for all 3 β-diversity measurements, both MZ and DZ twin pairs were significantly more similar to each other than unrelated individuals were. This analysis was also carried out with abundant OTUs and with all OTUs, with very similar results. Repeated rarefactions at 2500 reads produced consistent results, so for subsequent analyses a single rarefaction to 2500 reads is shown.
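The rarefaction and prevalence/abundance filter described above can be sketched as follows. This is a minimal sketch under stated assumptions: each sample is a mapping from OTU to read count, reads are subsampled without replacement, samples with fewer than 2500 reads are dropped, and the function names and toy counts are hypothetical rather than taken from the study or from QIIME.

```python
import random
from collections import Counter


def rarefy(counts, depth, rng):
    """Subsample one sample's OTU counts to `depth` reads without replacement.

    Returns None if the sample has fewer than `depth` reads (assumed dropped).
    """
    if sum(counts.values()) < depth:
        return None
    # Expand counts into one entry per read, then draw `depth` reads at random.
    reads = [otu for otu, n in counts.items() for _ in range(n)]
    return Counter(rng.sample(reads, depth))


def filter_otus(table, min_subjects=2, min_total=10):
    """Keep OTUs present in >= min_subjects samples and >= min_total reads overall."""
    prevalence = Counter()
    abundance = Counter()
    for sample in table.values():
        for otu, n in sample.items():
            if n > 0:
                prevalence[otu] += 1
                abundance[otu] += n
    return {
        otu for otu in abundance
        if prevalence[otu] >= min_subjects and abundance[otu] >= min_total
    }


rng = random.Random(42)
raw = {
    "s1": {"A": 2000, "B": 900, "C": 4},
    "s2": {"A": 1500, "B": 1200},
    "s3": {"A": 100},  # too shallow: dropped at depth 2500
}
rarefied = {}
for name, counts in raw.items():
    sub = rarefy(counts, 2500, rng)
    if sub is not None:
        rarefied[name] = sub
kept = filter_otus(rarefied)
```

Applying the filter after rarefaction, as the text describes, means the 10-read threshold refers to rarefied counts rather than raw sequencing depth.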
We detected no significant effect of sex on any β-diversity measure when comparing same-sex and opposite-sex DZ twin pairs, perhaps because the sample size did not provide enough power to distinguish sex effects from inter-individual variation. Opposite-sex pairs were therefore included in subsequent DZ analyses. The Colorado Twin Registry includes highly detailed phenotypic information that is invaluable for identifying and controlling for environmental confounders that may play an important role.
It is well known that MZ twins tend to cohabitate longer than DZ twins, and our previous work has shown that shared environment influences the oral microbiome.
It was therefore possible that the tendency of MZ cotwins to live together longer was driving the observed heritability. To examine this potential confounder, we used questionnaire data to restrict the analysis in Fig. 1a to cohabitating pairs only. Ideally we would also have analyzed twin pairs living apart, but our sample size did not permit it. As seen in Fig. 1b, MZ twins remained significantly more similar to each other than DZ twin pairs for the Bray-Curtis and Weighted UniFrac measurements, and this was also observed in the abundant and unfiltered/unrarefied OTU tables described above. We conclude that cohabitation does not play a significant role in the observed microbiome heritability. To quantify the differences between groups, Cohen’s d effect sizes were calculated for all β-diversity measurements in both the full sample and the cohabitation sample. Comparisons between unrelated and twin pairs yielded medium to large effect sizes. All other comparisons yielded small or negligible effect sizes, the largest being between MZ and DZ pairs for Bray-Curtis. To quantify the effect of cohabitation on β-diversity measurements, effect sizes comparing all twin pairs with cohabitating pairs only were calculated for all measurements; all were negligible, consistent with the conclusion that cohabitation was not driving the observed heritability. The stability of the oral microbiome over time in adults is reported to be remarkably high relative to that of other body sites. To confirm and extend this observation, we assessed the stability of the oral microbiome in longitudinal samples collected 2–7 years apart from 111 individuals in our cohort. The mean β-diversity between each individual's longitudinal samples was compared with the mean for unrelated individuals of different ages.
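The Cohen's d comparisons above can be illustrated with the standard two-sample formula, d = (mean₁ − mean₂) / s_pooled, where s_pooled weights each group's variance by its degrees of freedom. This is a sketch, not the ‘effectsize’ package itself; it assumes a pooled standard deviation, and the size labels use Cohen's conventional 0.2 / 0.5 / 0.8 benchmarks, which may differ from the rules used in the study. Function names are hypothetical.

```python
import math
from statistics import mean, variance


def cohens_d(x, y):
    """Cohen's d for two independent samples using the pooled standard deviation."""
    n_x, n_y = len(x), len(y)
    # Pooled variance: each sample variance weighted by its degrees of freedom.
    pooled_var = ((n_x - 1) * variance(x) + (n_y - 1) * variance(y)) / (n_x + n_y - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var)


def magnitude(d):
    """Label |d| with Cohen's conventional benchmarks (assumed thresholds)."""
    size = abs(d)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


# Hypothetical distances: twin pairs versus unrelated pairs.
d = cohens_d([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
label = magnitude(d)
```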
For all three β-diversity measurements examined, subjects were significantly more similar to themselves over time than unrelated individuals were to each other. Intraclass correlation coefficients (ICCs) are useful for estimating heritability from observations within groups of related individuals; the higher the ICC values for MZ pairs relative to DZ pairs, the greater the heritability. As shown in Fig. 2, ICC values for essentially all abundant taxa are significantly greater in MZ than in DZ pairs.
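The MZ versus DZ ICC comparison can be made concrete with a one-way random-effects ICC for pairs, ICC = (MSB − MSW) / (MSB + MSW), where MSB and MSW are the between-pair and within-pair mean squares for k = 2 members per pair. The text does not state which ICC form was used, so this choice (common for exchangeable twin pairs) is an assumption; the function name and toy values are hypothetical.

```python
from statistics import mean


def icc_pairs(pairs):
    """One-way random-effects ICC(1) for exchangeable pairs (k = 2 per pair).

    Undefined (division by zero) if every value is identical.
    """
    n = len(pairs)
    grand = mean(v for pair in pairs for v in pair)
    pair_means = [(a + b) / 2 for a, b in pairs]
    # Between-pair mean square: k * sum of squared pair-mean deviations / (n - 1).
    msb = 2 * sum((m - grand) ** 2 for m in pair_means) / (n - 1)
    # Within-pair mean square: squared deviations from pair means / (n * (k - 1)).
    msw = sum((a - m) ** 2 + (b - m) ** 2 for (a, b), m in zip(pairs, pair_means)) / n
    return (msb - msw) / (msb + msw)


# Hypothetical relative abundances of one taxon in four MZ and four DZ pairs.
mz = [(0.30, 0.32), (0.10, 0.12), (0.55, 0.50), (0.20, 0.25)]
dz = [(0.30, 0.15), (0.10, 0.28), (0.55, 0.35), (0.20, 0.40)]
icc_mz, icc_dz = icc_pairs(mz), icc_pairs(dz)
```

In this toy example the MZ pairs are far more concordant, so icc_mz exceeds icc_dz, which is the pattern the text interprets as evidence of heritability.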
No significant difference was observed between same-sex and opposite-sex DZ pairs across the taxa analyzed, which were those categorized as continuous. Significance was established with Wilcoxon signed-rank tests, strongly supporting the heritability of taxon abundance in this twin set. We also tested 4 different α-diversity measures and the first 3 principal coordinates for each of three β-diversity measurements, and found that most traits were consistent with the conclusion that MZ cotwins are more similar than DZ cotwins. A complete list of the 41 phenotypes tested and their ICC values can be found in Additional file 1: Tables S4 and S11.
Twin modeling approaches are used to estimate the proportion of variance attributable to additive genetics (A), common environment (C) or dominance (D), and unique environment (E). An ACE or ADE model was constructed for each of 946 traits, including α-diversity, principal coordinates of β-diversity, taxonomic groups, and individual OTUs. A complete list of the A, C/D, and E values for each of these phenotypes can be found in Additional file 1: Table S5. Traits that were not categorized as continuous were treated as categorical. A power analysis showed that our sample is well powered to model continuous traits but underpowered for categorical traits; therefore, while still of interest, the categorical traits should be viewed with lower confidence. C and D cannot be modeled simultaneously in the twin models, because MZ and DZ pairs alone do not provide enough information to separate them, but the genetic contribution can be compared between phenotypes modeled with ACE or ADE models. Of the 946 traits, 55% were modeled as ACE and 44% as ADE. When heritability estimates were averaged within each phenotype category described above, a trend emerged in which the PCos of β-diversity measurements had the highest mean heritability, in both the full sample and the cohabitation sample.
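How the A, C/D, and E components relate to twin correlations can be illustrated with the classical Falconer-style method-of-moments equations. This is not the model-fitting procedure used in the study; it assumes a standardized trait and equal environments, with r_MZ = A + C and r_DZ = A/2 + C under ACE, and r_MZ = A + D and r_DZ = A/2 + D/4 under ADE. The two correlations cannot identify A, C, and D together, which is why C and D are not modeled at once. Switching to ADE when r_MZ > 2·r_DZ is a common heuristic and an assumption here; the function name is hypothetical.

```python
def twin_components(r_mz, r_dz):
    """Moment estimates of A, C or D, and E from MZ and DZ twin correlations."""
    if r_mz > 2 * r_dz:
        # ADE: r_mz = A + D, r_dz = A/2 + D/4
        a = 4 * r_dz - r_mz
        d = 2 * r_mz - 4 * r_dz
        return {"model": "ADE", "A": a, "D": d, "E": 1 - r_mz}
    # ACE: r_mz = A + C, r_dz = A/2 + C
    a = 2 * (r_mz - r_dz)
    c = 2 * r_dz - r_mz
    return {"model": "ACE", "A": a, "C": c, "E": 1 - r_mz}


# Hypothetical ICCs for two traits.
ace = twin_components(0.6, 0.4)
ade = twin_components(0.6, 0.2)
```

In both branches E = 1 − r_MZ, since MZ cotwins share all genetic and common-environment effects; only the split of the shared variance changes between models.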
The most heritable traits were OTU4483015, which corresponds to an unnamed species of Granulicatella, and PCo 2 for Bray-Curtis. To better understand which taxa were driving this PCo, we performed a QIIME biplot analysis, which identified the genus Streptococcus as the most abundant taxon on the first 3 principal coordinates from Bray-Curtis. Repeating the ACE models after excluding twin pairs who reported that they had moved out after age 18 did not greatly alter the heritability estimates or other components of the model. The unique environment accounted for most of the variation in the traits tested in both the full and cohabitation samples. Little change in the common environment component was observed between the full and cohabitation sample analyses. We compared phenotypes deemed heritable in our study with phenotypes reported as heritable in 5 studies of the gut microbiome and 1 of dental plaque. We found that 14 of the 44 traits were reported with heritability estimates of at least 1% in at least one of these studies, though none showed high statistical significance. This is consistent with the possibility that genes driving heritability in the salivary microbiome may also have more general influences in other human niches.
It is assumed that host genes interacting with the oral microbiome are responsible for the observed heritability.
The best way to identify them is to test for association between genetic variation and traits. The power to detect such associations is a function of the number of individuals, the number of tests, and the number and types of SNPs available. For a fixed sample size, the greatest power to uncover association is obtained by analyzing a limited number of phenotypes chosen on prior information, rather than repeatedly testing multiple hypotheses on the same data. To limit the hypotheses tested, we focused on the traits found most heritable in the twin study, since these are expected to produce the best results in a genome-wide association study. DNA had previously been prepared from the saliva and blood of 1480 individuals unrelated to the twins and to each other. Human DNA from this sample was genotyped on Affymetrix chips, yielding 696,388 validated human SNP genotypes per individual. Subjects ranged from 11 to 33 years of age, and 29% were female. Ancestry was assigned by weighting a subset of the genotyped SNPs against the 1000 Genomes dataset and assigning individuals to ancestry groups using principal coordinate analysis plots. The genotyped SNPs were then quality filtered and submitted to the Michigan Imputation Server for phasing and imputation. After quality filtering, this produced 6,862,363 European and 8,172,048 Admixed American imputed variants, respectively, which were used in all subsequent analyses. To validate imputation, imputed SNPs from two randomly selected chromosomal regions in 68 individuals were resequenced by Sanger sequencing. Of the 68 imputed calls, 65 validated completely and 3 appeared to be incorrectly imputed. We conclude that imputation provides substantially greater SNP-map resolution at little cost to accuracy.
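To put the 65/68 concordance in perspective, a Wilson score interval gives a rough sense of the uncertainty in a proportion estimated from 68 calls. This is an illustrative calculation that the study does not report; it assumes the 68 calls are independent, which may not hold for SNPs drawn from only two chromosomal regions, and it uses z = 1.96 for an approximate 95% interval.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (approx. 95% for z = 1.96)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half


concordance = 65 / 68
low, high = wilson_interval(65, 68)
```

The interval is asymmetric around the observed 95.6% because the Wilson method keeps the bounds inside [0, 1], which matters for proportions this close to 1.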
The salivary microbiome of the 1480 individuals was characterized by 16S rRNA sequencing, identifying 2679 OTUs; as in the twin study, the most prevalent phyla were Firmicutes, Proteobacteria, Bacteroidetes, Actinobacteria, and Fusobacteria. Filtering by prevalence and abundance as described above produced a total of 931 OTUs used in our studies. The SNP-based heritability of microbiome phenotypes in the unrelated population was assessed using Genome-wide Complex Trait Analysis (GCTA), which estimates the proportion of phenotypic variance that can be explained by the composite genetic variance captured by SNPs. To avoid false positives, the genetic relationship matrix was limited to subjects with estimated IBD < 0.025. The first 10 ancestry principal components, computed from LD-pruned SNPs, were included as covariates to control for population stratification. Given the relatively small sample size, single-trait heritability estimates were not evaluated; instead, gross trends were examined across all continuous traits.
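GCTA builds a genetic relationship matrix (GRM) from standardized genotypes, in which the off-diagonal entry for individuals j and k averages (x_ij − 2p_i)(x_ik − 2p_i) / (2p_i(1 − p_i)) over SNPs i, where x is the allele dosage (0, 1, or 2) and p_i is the allele frequency. The sketch below computes these entries and then greedily keeps individuals whose relatedness to everyone already kept is at or below 0.025. It assumes the 0.025 threshold is applied to GRM entries, estimates allele frequencies from the sample itself, and uses a simple first-come greedy rule; GCTA's own pruning procedure and diagonal formula are not reproduced, and all function names are hypothetical.

```python
def allele_freqs(genotypes):
    """Allele frequency per SNP from dosages (individuals x SNPs, values 0/1/2)."""
    n = len(genotypes)
    m = len(genotypes[0])
    return [sum(ind[i] for ind in genotypes) / (2 * n) for i in range(m)]


def grm_offdiag(genotypes, freqs, j, k):
    """Off-diagonal GRM entry between individuals j and k."""
    total, used = 0.0, 0
    for i, p in enumerate(freqs):
        if p in (0.0, 1.0):
            continue  # monomorphic SNPs carry no relatedness information
        x_j, x_k = genotypes[j][i], genotypes[k][i]
        total += (x_j - 2 * p) * (x_k - 2 * p) / (2 * p * (1 - p))
        used += 1
    return total / used if used else 0.0


def drop_related(genotypes, cutoff=0.025):
    """Greedily keep individuals whose GRM entry with every kept one is <= cutoff."""
    freqs = allele_freqs(genotypes)
    keep = []
    for j in range(len(genotypes)):
        if all(grm_offdiag(genotypes, freqs, j, k) <= cutoff for k in keep):
            keep.append(j)
    return keep


# Hypothetical dosages; individual 1 duplicates individual 0.
G = [[0, 1, 2, 1], [0, 1, 2, 1], [2, 1, 0, 1], [1, 0, 1, 2]]
keep = drop_related(G)
```

With only four SNPs, chance sharing can push even unrelated toy individuals over the threshold; real GRMs average over hundreds of thousands of SNPs, which is what makes a cutoff as small as 0.025 meaningful.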