Challenges of Comparing Marine Microbiome Community Composition Data Provided by Different Commercial Laboratories and Classification Databases

Mioduchowska, Monika; Iglikowska, Anna; Jastrzębski, Jan P.; Kaczorowska, Anna-Karina; Kotlarska, Ewa; Trzebny, Artur; Weydmann-Zwolicka, Agata

doi:10.3390/w14233855

¹ Department of Evolutionary Genetics and Biosystematics, Faculty of Biology, University of Gdansk, 80-308 Gdansk, Poland

² Department of Marine Plankton Research, Institute of Oceanography, University of Gdansk, 81-378 Gdynia, Poland

³ Bioinformatics Core Facility, University of Warmia and Mazury in Olsztyn, Kortowo, 10-719 Olsztyn, Poland

⁴ Department of Plant Physiology, Genetics and Biotechnology, Faculty of Biology and Biotechnology, University of Warmia and Mazury in Olsztyn, Kortowo, 10-719 Olsztyn, Poland

⁵ Collection of Plasmids and Microorganisms (KPD), Faculty of Biology, University of Gdansk, 80-308 Gdansk, Poland

⁶ Molecular Biology Laboratory, Genetics and Marine Biotechnology Department, Institute of Oceanology of the Polish Academy of Sciences, 81-712 Sopot, Poland

⁷ Molecular Biology Techniques Laboratory, Faculty of Biology, Adam Mickiewicz University, 61-614 Poznan, Poland

* Author to whom correspondence should be addressed.

Water 2022, 14(23), 3855; https://doi.org/10.3390/w14233855

Submission received: 13 October 2022 / Revised: 16 November 2022 / Accepted: 23 November 2022 / Published: 26 November 2022

(This article belongs to the Special Issue Conserving Biodiversity in Aquatic Ecosystems: Challenges and Solutions)

Abstract

In the high-throughput sequencing (HTS) era, metabarcoding based on the bacterial V3–V4 hypervariable region of the 16S rRNA gene requires sophisticated bioinformatics pipelines and validated methods that allow researchers to compare their data with confidence. Many commercial laboratories conduct extensive HTS analyses; however, there is no available information on whether the results generated by these vendors are consistent. In our study, we compared the sequencing data obtained for the same marine microbiome community sample generated by three commercial laboratories. Additionally, as a sequencing control to determine differences between commercial laboratories and two 16S rRNA databases, we performed a “mock community” analysis of a defined number of microbial species. We also assessed the impact of the choice of two commonly used 16S rRNA databases, i.e., Greengenes and SILVA, on downstream data analysis, including taxonomic classification assignment. We demonstrated that the final results depend on the choice of the laboratory conducting the HTS and the reference database of ribosomal sequences. Our findings showed that the number of produced ASVs (amplicon sequence variants) ranged from 137 to 564. Different putative bacterial endosymbionts could be identified, and these differences correspond to the applied 16S rRNA database. The results presented might be of particular interest to researchers who plan to perform microbiome community analysis using the 16S rRNA marker gene, including the identification of putative bacterial endosymbionts, and serve as a guide for choosing the optimum pipeline to obtain the most accurate and reproducible data.

Keywords: amplicon sequencing; ASV assignment; HTS; marine microorganisms; metagenetics; “mock community”

1. Introduction

The marine microbiome refers to all marine microorganisms, including bacterioplankton, microorganisms associated with organic and inorganic particles, and those living on and inside marine animals, plants, and macroalgae. It plays a crucial role in the ecology of the sea biosphere, serving as a vital catalyst of all biogeochemical cycles, including photosynthesis, biological pump, and nutrient cycling [1]. To better understand marine ecosystem dynamics, it is essential to gain insight into the relationship between microbial communities and their environment. For many years, however, the study of marine microbial diversity was hampered because only a very small fraction of these microorganisms can be cultivated in laboratory settings. This drawback led to the development of alternative, culture-independent approaches that allow for a more accurate analysis of microbial assemblages [2,3,4] and exploration of the hidden microbial genetic content of natural ecosystems, especially challenging environments, such as deep-sea hydrothermal vents [5].

The first studies that used the bacterial 5S rRNA gene as a marker of microbial community diversity, carried out in the mid-1980s by Stahl et al. [6,7], were based on time-consuming and laborious Sanger sequencing. Nevertheless, the advances in high-throughput sequencing (HTS) platforms in the early 2000s allowed millions of sequences per run to be obtained at a significantly lower cost and much faster than using the Sanger sequencing technology and revolutionized the field of microbial ecology. Nowadays, the high-throughput sequencing of the microbiome community involves analysis of the bacterial hypervariable region of the 16S rRNA gene fragment directly from environmental samples, without the necessity of culturing microorganisms in a laboratory. Finally, the high-throughput technologies provide an unprecedented resolution scale for identifying organisms and the taxonomic diversity of communities [8] and may act as the basis for analyzing ecosystem function. Targeted amplicon-based analyses of 16S rRNA gene fragments have been widely used to investigate microbiome communities in animal intestines [9], as well as from soil [10], freshwater [11], and marine environments [12]. In general, these studies on marine biodiversity represent some of the first applications of a metagenetic-based approach to amplicon sequencing of microorganisms [13].

The Baltic Sea, one of the world’s largest brackish water bodies, is among the most intensely monitored aquatic environments, with hydrographic data collected routinely for over 100 years [14]. However, the first metagenetic analysis of its microbial communities was published only in 2007 [15]. Earlier reports on the Baltic Sea microbiome focused mainly on quantifying species abundance (see, e.g., [16]). The brackish Baltic Sea microbial community consists of marine- and freshwater-like lineages, which are genetically distinct from their relatives [17,18]. As a result, mapping Baltic Sea reads to freshwater and marine metagenomes is hindered (see [19] for HTS studies on the Gulf of Gdansk using 454 pyrosequencing). Nevertheless, Alneberg et al. [20] presented the Baltic Sea Reference Metagenome (BARM), with 6.8 million genes annotated for taxonomy.

Interestingly, the steady-state microbial loop model describes the role of microorganisms in the cycling of marine carbon and nutrients [21]. The components of the microbial loop, that is, bacterioplankton, microzooplankton, phytoplankton, and picophytoplankton, depend on the rate of photosynthesis and carbon oxidation [22]. However, recent studies have indicated that the network topological properties are associated with the hydrolysis–methanogenesis balance and substrate characteristics. It was also shown that lower-abundance genera (as low as 0.1%) could perform communication roles, as well as central hub roles, and maintain the functionality and stability of the microbial community [23]. Moreover, marine species’ adaptation to environmental changes may depend on host-associated microorganisms, mostly endosymbiotic bacteria, which impact the biology of their host, including their concerted adaptive response [24]. Benefits derived from these microorganisms include, for example, mutual effects on defense capabilities, nutrition, reproduction, and development [25]. Despite the significance of these interactions and the growing interest in research on host-associated microbes, little is known about the interactions between microbes and their marine host species. Nevertheless, studies on mollusks [26], sponges [27], and corals [28] have revealed the existence of symbiotic microbes that play a decisive role in the life cycles of their hosts.

Previous studies indicated that using different laboratory protocols and taxonomic sequence classifications can produce inconsistent results and significantly impact conclusions concerning bacterial community composition (e.g., [29,30]). Furthermore, Plummer et al. [31] indicated that 16S rRNA gene sequence datasets should be analyzed using suitable bioinformatics tools, as the choice of tools can also affect the efficiency and reproducibility of the processes that produce the taxonomic overview. Kennedy et al. [32] showed how targeting different 16S rRNA variable regions (V4–V5 or V6–V8) in metagenetic studies impacted the perceived community composition and bacterial diversity. In turn, Balvociute and Huson [33] indicated that the selection of reference taxonomy is essential when comparing microbial classifications.

Linking the HTS sequences to taxonomic information is a critical step in metagenetic studies of microbial communities. Only a few publicly curated 16S rRNA databases that contain annotated sequences of the 16S rRNA gene or its parts allow us to perform taxonomic classification of the microbiome. Generally, the obtained 16S rRNA sequences are classified into taxonomic units based on one of the four 16S rRNA reference databases, which are currently extensively used in the analysis of amplicon sequencing taxonomies, i.e., RDP [34], SILVA [35], Greengenes [36] or NCBI [37]. After this, the metagenetic samples are compared in detail. Mapping them into a common taxonomy showed how similar these taxonomies were and whether ASVs (amplicon sequence variants) obtained from one taxonomy could be transferred to another classification. Each of the commonly used databases, i.e., RDP, SILVA, Greengenes, and NCBI, is based on sources that allow the compilation of taxonomies in different ways. Thus, they differ in size and resolution. Therefore, the comparison of ASV results based on various taxonomic classifications is unclear and complicated. Unfortunately, there is no information available on which commercial laboratories generate more accurate targeted amplicon data.

The present study aimed to compare HTS results (microbial content sequences) produced independently in three commercial laboratories, commonly chosen by researchers worldwide, which we provided with DNA extracted from the same seawater sample taken from the Gulf of Gdansk of the Baltic Sea. Next, using the datasets of microbial sequences obtained from each company, we compared the taxonomic classifications obtained using the Greengenes and SILVA 16S rRNA gene reference databases. Greengenes is the most widely used database that provides bacterial and archaeal taxonomic classification based on phylogenetic trees [38]. In turn, the SILVA database provides taxonomic classification for bacteria, archaea, and eukarya domains based on common phylogenetic patterns. The problem with differences in HTS as a result of applying different sets of primers and PCR conditions (the annealing temperature, number of cycles and polymerase type) has been intensively explored using mock [39,40,41,42] and, to a lesser extent, natural communities [43,44]. The novelty of our approach relies upon comparing metagenetic data obtained from the same sample prepared according to the same laboratory protocol and indicating how laboratory-to-laboratory differences in high-throughput sequencing and different taxonomy libraries could affect post-analysis and interpretation of the microbial composition. We also used “mock community” samples that contained extracted DNA of defined sets of microorganisms to evaluate HTS reproducibility between the commercial laboratories and reference 16S rRNA databases.

2. Materials and Methods

2.1. Sample Collection

A 5 L water sample was collected in the southern Baltic Sea (Gulf of Gdansk) in February 2019. The sampling point was located in Gdynia-Orlowo at 54°28′47″ N and 18°34′00″ E, and the depth at the station was 2.5–3 m. A surface seawater sample was acquired using a plastic oceanographic bucket (the bucket was rinsed three times with seawater collected at the examined station before sampling) in one haul dragged horizontally along the pier over a distance of approximately 10 m. Afterwards, the sample was transported to the laboratory of the Institute of Oceanography, University of Gdansk, and immediately processed. The water sample was first filtered through Whatman Grade 4 filter paper, followed by filtration through a 0.45 μm Whatman GF/C glass microfiber filter (Cytiva, Danaher Corporation (“Danaher”), Washington, DC, USA) to capture a wide range of microbial cell sizes. After filtration, the membranes were frozen (−80 °C) for a month until DNA extraction. “Mock communities” included defined microbial strains, with a known 16S rRNA sequence, isolated from a commercial mixture of pure cultures of 8 bacterial and 2 yeast strains (ZymoBIOMICS™ Microbial Community DNA Standard, Zymo Research Corporation, Irvine, CA, USA) and 24 pure bacterial cultures (the cocktail of a “mock community” obtained from the following two collections: the IOMB Strain Collection (Molecular Biology Laboratory, Institute of Oceanology of the Polish Academy of Sciences), Poland, curated by Dr. Ewa Kotlarska and the Collection of Plasmids and Microorganisms (KPD), Faculty of Biology, University of Gdansk, Poland, curated by Dr. Anna-Karina Kaczorowska). More details can be found in Table 1.

2.2. DNA Isolation

Isolation of the total DNA from the microorganisms collected by seawater filtration was performed using a commercial Genomic Mini AX Bacteria + kit (A&A Biotechnology, Gdansk, Poland), with some modifications to the manufacturer’s instructions (manual provided at https://www.aabiot.com/en/download?code=b90fbeba89bfca87d77964c7d33b7f6db6dcaf35, accessed on 7 March 2019). Briefly, the filter was placed in 300 µL of lysis buffer supplemented with 40 µL of proteinase K (Genomic Mini kit) and 50 µL of lysozyme (5 mg/mL, Fisher BioReagents, Loughborough, Leicestershire, UK). Then, the sample was vortexed for 30 s and incubated with shaking overnight at 37 °C, followed by incubation for 2 h at 50 °C. Next, steps were performed according to the manufacturer’s instructions. Bacterial strains from the IOMB and KPD culture collections (“mock community” positive control), before DNA isolation, were cultivated with agitation in LB Broth (Luria/Miller) (Carl Roth GmbH + Co. KG, Karlsruhe, Germany) at 30 °C. Total DNA isolation was performed using the above-mentioned kit according to the Gram-negative and Gram-positive bacteria protocol. Laboratory procedures were conducted with sterile equipment to avoid sample cross-contamination. Isolated DNA was quantified using a NanoDrop ND-1000 UV–vis spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA) and stored at −80 °C. We also prepared blank control samples (negative control) for PCR amplification, which contained an ultra-pure water sample, and extracted DNA from the filtration of 2 L of distilled water. The negative control samples in our study were used to detect possible contaminant DNA. However, for these samples, no PCR products were obtained.

2.3. 16S rRNA Amplicon Library Generation

Three commercial laboratories independently performed high-throughput sequencing of the V3–V4 hypervariable region of the bacterial 16S rRNA gene, using the Illumina MiSeq platform with paired-end (PE) technology. We decided not to disclose the company names, and thus used the following abbreviations: “EF”, “GM”, and “MG”. All information concerning sequencing technology, amplified regions of bacterial 16S rRNA, specificity of the primers used, type of reads, applied 16S metagenomic library protocol (including PCR conditions for amplification of the V3–V4 region), automatic de-multiplexing of processed sequences and primary sequence analysis is presented in Table 2.

The raw HTS reads obtained from each vendor were deposited in NCBI BioProject under the study accession number PRJNA828587 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA828587, accessed on 12 October 2022).

2.4. Taxonomic Classification of the Bacterial 16S rRNA Gene

The quality control of the raw reads was performed using the FastQC [51] and MultiQC [52] tools. The sequence data were processed on the QIIME 2 [53] platform: the sequences were denoised using DADA2 [54] and prepared for further processing using the following three approaches: without trimming, with equal trimming, and with optimal trimming. In the first approach, the reads were used as received, without trimming; in the second, both adapter trimming and sequence trimming to 250 bp were performed; in the third, adapter trimming and quality (phred33) trimming were performed. The second dataset was prepared by removing adapters with the Trimmomatic tool and cropping all sequences to an equal length of 250 bp in all samples. Illumina adapter trimming was performed using the Trimmomatic tool without quality trimming (ILLUMINACLIP:TruSeq3-PE.fa:2:30:10). Quality trimming was carried out in QIIME 2 (1. no trimming; 2. equal length: --p-trim-left-f 0 --p-trim-left-r 0 --p-trunc-len-f 250 --p-trunc-len-r 250; 3. optimal trimming adjusted to the length and quality of the sample). As implemented in the q2-dada2 plugin, this quality control process additionally filtered any phiX sequences (commonly present in marker gene Illumina sequence data) that were identified in the sequencing data and filtered chimeric sequences (according to the documentation found at https://docs.qiime2.org/2020.8/, accessed on 20 June 2020). Trimming to 250 bp was performed using the following command: qiime dada2 denoise-paired --p-trim-left-f 0 --p-trim-left-r 0 --p-trunc-len-f 250 --p-trunc-len-r 250.
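As a minimal sketch of the equal-length approach, the trimming parameters quoted above can be assembled into a full `qiime dada2 denoise-paired` call. The input and output options (`--i-demultiplexed-seqs`, `--o-table`, `--o-representative-sequences`, `--o-denoising-stats`) and all file names are assumptions added for illustration; the text reports only the four trimming and truncation parameters. The helper builds the argument list and does not run QIIME 2.

```python
def dada2_equal_length_command(demux_qza, out_prefix, trunc_len=250):
    """Build (but do not run) an equal-length DADA2 call.

    The trim/truncation values follow the command quoted in the text;
    file names and the input/output options are illustrative assumptions.
    """
    return [
        "qiime", "dada2", "denoise-paired",
        "--i-demultiplexed-seqs", demux_qza,
        "--p-trim-left-f", "0",
        "--p-trim-left-r", "0",
        "--p-trunc-len-f", str(trunc_len),
        "--p-trunc-len-r", str(trunc_len),
        "--o-table", f"{out_prefix}-table.qza",
        "--o-representative-sequences", f"{out_prefix}-rep-seqs.qza",
        "--o-denoising-stats", f"{out_prefix}-stats.qza",
    ]


# Hypothetical artifact name and vendor prefix, for illustration only.
cmd = dada2_equal_length_command("demux-paired-end.qza", "EF")
print(" ".join(cmd))
```

Keeping the parameters in one place makes it easier to apply identical settings to the datasets from all three vendors, for example by passing the list to `subprocess.run` once per vendor.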
Taxonomic classification of all datasets was performed (with the feature-classifier plugin in QIIME 2) using four pre-trained classifiers built from the two most popular databases (to show how important the standardization of methodology is and how divergent the results obtained from them can be): SILVA v.132-99 [55] (full-reference database for marine microbiome community and “mock community” samples and only additional 16S 515-806 regions for microbiome community of seawater sample) and Greengenes v.13-8-99 [56] (full-reference database for both samples and only additional 16S 515-806 regions for marine microbiome community). Taxa were classified using qiime feature-classifier classify-sklearn with default parameters (https://docs.qiime2.org/2020.8/plugins/available/feature-classifier/classify-sklearn/, accessed on 20 June 2020). Tabular and numerical data were analyzed and visualized using R 4.0.3 [57]. The saturation of ASVs was also evaluated (in QIIME 2) based on rarefaction curves for the Chao1 diversity index.
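For readers unfamiliar with the Chao1 index used for the rarefaction curves, the following is a small sketch of the estimator. It adds to the observed richness a correction based on the number of features seen exactly once (F1) and exactly twice (F2). The bias-corrected form is shown as the default; whether the analysis here used that form is an assumption, and the counts are invented for illustration.

```python
def chao1(counts, bias_corrected=True):
    """Chao1 richness estimate from per-feature counts.

    S_obs = observed features, F1 = singletons, F2 = doubletons.
    Bias-corrected: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))
    Classic:        S_obs + F1**2 / (2 * F2)   (needs F2 > 0)
    """
    counts = [c for c in counts if c > 0]
    s_obs = len(counts)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    if bias_corrected or f2 == 0:
        return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
    return s_obs + f1 ** 2 / (2 * f2)


# Invented toy counts: 7 observed features, 3 singletons, 2 doubletons.
toy_counts = [10, 5, 2, 2, 1, 1, 1, 0]
print(chao1(toy_counts))                        # 8.0
print(chao1(toy_counts, bias_corrected=False))  # 9.25
```

A rarefaction curve repeats this calculation on random subsamples of increasing depth; a curve that flattens suggests that additional reads would add few new features.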

3. Results

3.1. Overall Statistics, Trimming, and De-Multiplexing with DADA2

The highest number of sequences was produced by the MG company, which was two times higher than that obtained by EF and over 50% higher than in the case of HTS carried out at GM. The number of nucleotides was also the highest for the sequential data obtained by MG, which in total, with the number of sequences, gave several times more data compared to the outputs from EF and GM companies (Figure 1, Table 3). In addition, the lengths and quality of the sequences indicated that untrimmed, raw data directly obtained from the sequencing were provided only by MG. In turn, HTS sequences were pre-trimmed in the case of EF and GM. Therefore, the results described here as “not-trimmed” (first approach; only the sequences, without trimming) did not reflect the outcomes for truly untrimmed data, but those that the vendor provided. Thus, the sequencing conducted by MG gave the most favorable results (most untrimmed data) (Figure 1). Demultiplexing of untrimmed sequences confirmed the above findings. HTS sequences obtained from EF and GM laboratories, as pre-trimmed sequences, allowed us to detect the highest number of sequences, nearly 60,000 and 20,000, respectively, compared to the number of sequences obtained from MG, which was above 6000 (Table 3). These results were not surprising because HTS sequences from the EF and GM laboratories were pre-trimmed. Without trimming, sequence ends contain adapter sequences, which meant that the left and right sequences did not pair properly and the final result included paired reads without adapters. Removing adapter sequences and low-quality segments, while maintaining maximum sequence lengths (optimal trimming), increased the number of sequences by almost 10-fold. It was comparable to the number of sequences detected by EF. 
It should be noted, however, that the total number of base pairs sequenced at GM was two times greater than that obtained by EF; thus, the obtained results concerning the highest number of raw reads were more accurate in the case of GM. Optimal trimming (removal of adapters and trimming of low-quality sequences) of HTS sequences from GM resulted in sequence trimming that meant that they did not pair properly (not enough sequences remained and the overlap was too low). No sequences were identified that could be taxonomically classified, due to the poor quality of input records and their low numbers.

To carry out taxonomic classification and compare datasets from three companies, we decided to shorten all the sequences to the length of the shortest sequences (250 bp), even at the risk of potential data reduction, for all three HTS sets. In addition, all sequences were evened out, regardless of the adapter sequences, assuming that part of the sequences obtained from MG (those whose lengths after cutting off the adapters were less than 250 bp) was rejected at the stage of pairing and alignment. As a result, 17,668 sequences of HTS from the GM laboratory (comparable number to data without trimming), 66,140 sequences of HTS from the EF laboratory, and 27,059 sequences of HTS from the MG laboratory, respectively, were generated.

3.2. 16S rRNA Community Analysis

The data on the microbial content of the samples from the Gulf of Gdansk (Baltic Sea) generated by the three commercial laboratories varied in the number of sequences, from 88,135 to 164,764 sequences, after denoising and chimera checking. Taxonomic classification was carried out at seven levels (domain, phylum, class, order, family, genus, and species) using two databases, i.e., Greengenes and SILVA, with the following two reference data sets: only variable fragments (515–806) between V3 and V4 16S rRNA and full-reference data sets. Finally, a total of four reference data sets were used. Our findings show that the number of ASVs (defined as a unique taxonomic feature) ranged from 137 to 564 and these differences corresponded to the databases used, i.e., Greengenes vs. SILVA (Table 4; Tables S1–S4). The alpha rarefaction curve flattened to the right, indicating that a sufficient number of sequences was used to identify ASVs (Figure 2; Table S5).
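To make the seven classification levels concrete, the sketch below finds the deepest rank that actually received a name in a taxonomy string. It assumes Greengenes-style labels with rank prefixes such as `k__` and `p__` separated by semicolons; SILVA classifiers use different prefixes, so the parsing would need adjusting. The example strings are illustrative and not taken from the supplementary tables.

```python
RANKS = ["domain", "phylum", "class", "order", "family", "genus", "species"]


def deepest_rank(taxon):
    """Return (rank, name) for the deepest named level, or None.

    Assumes Greengenes-style fields such as 'p__Proteobacteria';
    an empty field such as 'g__' means the level was not assigned.
    """
    deepest = None
    fields = [field.strip() for field in taxon.split(";")]
    for rank, field in zip(RANKS, fields):
        if "__" not in field:
            break
        name = field.split("__", 1)[1]
        if not name:
            break
        deepest = (rank, name)
    return deepest


# Illustrative label resolved only to order level.
label = "k__Bacteria; p__Proteobacteria; c__Alphaproteobacteria; o__Rickettsiales; f__; g__; s__"
print(deepest_rank(label))
```

Applying such a helper to every ASV gives the number of features resolved at each level, which is the kind of per-rank count that differs between reference databases.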

All ASV tables are available in the Supplementary Materials (Tables S1–S4).

At the domain level, only HTS data obtained from the MG laboratory allowed us to identify sequences that belonged to bacteria and archaea domains. Only sequences that belonged to bacteria were identified in the other two datasets obtained for the same microbiome sample and processed by EF and GM laboratories (Table 4). The three microbiome profiles contained 36 phyla; however, 22 were identified using the Greengenes database and 29 by the SILVA database, respectively (Table 5). Proteobacteria were the most abundant phylum in all the taxonomic classifications (56% (EF), 47% (GM), and 39% (MG), respectively). Actinobacteria and Bacteroidetes were the second and third most dominant phyla in both ASV classifications (21%/16% (EF), 14%/9% (GM), and 9%/14% (MG), respectively). Moreover, the microbiome contained specific/unique phyla found in low frequency (<5%), i.e., Caldithrix, Chlorobi, GN02, GN04, TM6, TM7, and WS2, which were denoted using Greengenes, while Calditrichaeota, Dadabacteria, Deinococcus-Thermus, Dependentiae, Epsilonbacteraeota, Kiritimatiellaeota, Latescibacteria, Modulibacteria, Nitrospinae, Nitrospirae, Omnitrophicaeota, Poribacteria, Tenericutes and Zixibacteria were detected using the SILVA database. The difference in percentages of these unique phyla in all three microbiome profiles reflects the effect of the database used as the reference. In addition, HTS results from three laboratories also showed differences in ASV classification. The highest number of phyla in total, 22, was obtained from the EF and MG commercial laboratories. Only 13 phyla were identified based on the results from the GM commercial laboratory (Table 5).
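The phylum counts above can be cross-checked with simple inclusion-exclusion. The sketch below uses only the numbers and names given in the text: 36 phyla in total, 22 with Greengenes, 29 with SILVA, and the two lists of database-specific phyla. The count of phyla reported under the same name by both databases is derived here and is not stated in the text.

```python
# Database-specific phyla as listed in the text.
greengenes_only = {"Caldithrix", "Chlorobi", "GN02", "GN04", "TM6", "TM7", "WS2"}
silva_only = {
    "Calditrichaeota", "Dadabacteria", "Deinococcus-Thermus", "Dependentiae",
    "Epsilonbacteraeota", "Kiritimatiellaeota", "Latescibacteria",
    "Modulibacteria", "Nitrospinae", "Nitrospirae", "Omnitrophicaeota",
    "Poribacteria", "Tenericutes", "Zixibacteria",
}

n_greengenes, n_silva, n_total = 22, 29, 36

# Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|
n_shared = n_greengenes + n_silva - n_total

# The listed database-specific phyla should account for the remainder.
assert n_greengenes - n_shared == len(greengenes_only)
assert n_silva - n_shared == len(silva_only)

print(n_shared)  # 15
```

The figures are internally consistent: 15 phylum names are common to both classifications, with 7 found only with Greengenes and 14 only with SILVA.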

Analysis of the correlation between the three samples after classifying ASVs at the species level showed that a higher correlation was observed within the reference size (fragments vs. full-reference) and between samples. This result was similar for both databases, but was more pronounced in Greengenes (Figure 3).

To compare the quality of taxonomic assignment based on individual methods, we characterized the number of shared (common) ASVs (determined based on the sequence variants with identical sequences) (Figure 4). The number of common ASVs shared between the microbiome datasets obtained from the three laboratories was higher for all taxonomic ranks than the number of identified unique ASVs, i.e., only those that received a name after taxonomic assignment (Figure 4). Common ASVs were limited for the data obtained from GM compared to MG and EF, which were observed at each taxonomic level (Table 5). In turn, MG’s data gave more ASVs than EF’s data at lower levels, and this number decreased at higher levels, but it was comparable (Figure 4 and Table 5). The number of ASVs at lower taxonomic levels suggested that we would obtain more ASVs at the ASV classification level with a comparable number of sequences after the de-multiplexing step. Figure 4 and Table 5 show that the SILVA database allows us to obtain more sequences classified as ASVs, and regardless of the use of “full-reference” or “515–806”, the results were consistent and more homogeneous, as compared to the Greengenes database.
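Because shared ASVs are defined here by identical sequences, the comparison reduces to set intersections over the representative sequences from each laboratory. The following is a minimal sketch under that assumption; the short sequences are invented placeholders, and real input would be the denoised representative sequences for each vendor.

```python
from itertools import combinations


def shared_asvs(asvs_by_lab):
    """Return ASVs shared by all labs and by each pair of labs.

    asvs_by_lab maps a lab label to an iterable of ASV sequences;
    two ASVs count as the same only if their sequences are identical.
    """
    sets = {lab: set(seqs) for lab, seqs in asvs_by_lab.items()}
    shared_all = set.intersection(*sets.values())
    pairwise = {
        (a, b): sets[a] & sets[b]
        for a, b in combinations(sorted(sets), 2)
    }
    return shared_all, pairwise


# Invented toy sequences, for illustration only.
toy = {
    "EF": ["ACGT", "GGCC", "TTAA"],
    "GM": ["ACGT", "GGCC"],
    "MG": ["ACGT", "TTAA", "CCGG"],
}
shared_all, pairwise = shared_asvs(toy)
print(sorted(shared_all))
print({pair: len(seqs) for pair, seqs in pairwise.items()})
```

Exact matching only works when all datasets are truncated to the same length, which is one reason for the equal-length trimming to 250 bp described in Section 3.1.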

3.3. Putative Bacterial Endosymbionts

We identified three ASVs that were classified as putative bacterial endosymbionts. Rickettsiales ASV was present in all ASV taxonomies and HTS datasets received from all three laboratories. In turn, sequence mapping to a single putative bacterial endosymbiont was performed only in the case of the Greengenes database for “Candidatus Xiphinematobacter” (only from the EF laboratory) and “Candidatus Portiera” (from three laboratories) (Table 6).

3.4. Mock Community

To assess the bias in HTS sequencing in the three commercial laboratories, a “mock community” positive control, consisting of a commercial community cocktail that contained the DNA of 8 microorganisms (with two additional yeast strains) and DNA isolated from 24 pure bacterial cultures, was applied. The alpha rarefaction graph showed that a sufficient number of sequences was used to identify ASVs (Figure 2). The expected versus observed composition of the “mock community” control, with the relative abundance of the applied microorganisms, is listed in Table 7. The following number of sequences were obtained from the three commercial laboratories: 122,914 (EF), 160,494 (MG), and 184,891 (GM). Using two 16S rRNA reference databases, i.e., Greengenes and SILVA, only 13 out of 32 microorganisms had a uniform annotation, and the remaining differed in the level of taxonomic classification. Nevertheless, the sequences obtained from the MG and GM laboratories were accurate for the mock microbial community DNA and all bacterial species were identified. In turn, the sequences obtained from the EF laboratory appeared to be the least accurate because three bacterial strains were not identified. None of the yeast DNA present in the commercial “mock community” was found in any laboratory. Overall, the expected sequences from the bacterial species cultured in the laboratory were generally consistent with those identified from the Greengenes and SILVA databases. The observed differences in clustered ASV sequences were related to the assigned taxonomic level. However, all the expected bacterial species were identified in the datasets obtained only from the two laboratories (i.e., GM and MG). In turn, the datasets from all laboratories identified all the bacterial species derived from the ZymoBIOMICS™ Microbial Community DNA Standard and no errors in the taxonomic classification were found. 
Nevertheless, in both cases, the different frequency of individual bacterial species in the microbial composition indicates that rare bacteria in the microbial composition could be identified less frequently.
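The expected-versus-observed comparison for a mock community can be sketched as a set difference at a chosen taxonomic level. The genus lists below follow the manufacturer's published composition of the commercial standard as we understand it and should be treated as an assumption; the observed set is hypothetical and chosen only to mirror the pattern described above (all bacterial members found, no yeast).

```python
# Assumed genus-level composition of the commercial standard.
EXPECTED_BACTERIA = {
    "Bacillus", "Enterococcus", "Escherichia", "Lactobacillus",
    "Listeria", "Pseudomonas", "Salmonella", "Staphylococcus",
}
EXPECTED_YEASTS = {"Cryptococcus", "Saccharomyces"}


def compare_mock(expected, observed):
    """Split taxa into detected, missing, and unexpected groups."""
    expected, observed = set(expected), set(observed)
    return {
        "detected": sorted(expected & observed),
        "missing": sorted(expected - observed),
        "unexpected": sorted(observed - expected),
    }


# Hypothetical observation: every bacterial genus, no yeast.
observed_genera = set(EXPECTED_BACTERIA)
report = compare_mock(EXPECTED_BACTERIA | EXPECTED_YEASTS, observed_genera)
print(report["missing"])
```

Yeasts would be expected in the "missing" group for any 16S rRNA amplicon dataset, since 16S primers target prokaryotic ribosomal genes; missing bacterial members, in contrast, point to a problem in sequencing or classification.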

It should be noted that the prices for the HTS analysis (per sample) performed in the three commercial laboratories presented here were as follows: EUR 162 (GM), EUR 68 (EF), and EUR 65 (MG).

4. Discussion

For accurate amplicon-based metagenetic analysis, the choice of molecular methodology and the taxonomic library is essential to convert raw sequencing data into biologically meaningful information, including the most detailed taxonomic results. Here, we assessed to what extent the differences in HTS datasets obtained from the three commercial laboratories and the application of two different curated public databases affect the final findings on the observed sample alpha-diversity and whether it is reproduced at the ASV level. We applied the Greengenes database, which contains taxonomic information for the domains of bacteria and archaea [36], and SILVA taxonomy, which is dedicated to the domains of bacteria, archaea and eukarya [35]. Finally, by comparing taxonomies using both sequencing libraries from the databases, we found that the number of identified ASVs was the lowest for the Greengenes database. It may be connected with the fact that the SILVA taxonomy is the largest out of the 16S-based taxonomies, which is additionally shared with the NCBI, and ultimately generated more ASV samples. The Greengenes database has not been updated since May 2013 [38]. Sequencing of the “mock communities” supported the results obtained for the marine sample, confirming that the taxonomic identification pipeline and selected commercial laboratory accuracy are crucial to obtain reliable sequencing results. The consequences of taxonomic misclassification in HTS analysis can be serious, especially in the cases of incorrect diagnoses in clinical microbiology [63]. These aspects are also critical when screening environmental samples for rare or low-abundant taxa [64].

The most critical step in the investigation of microbial diversity based on the 16S rRNA gene amplicon analysis is the choice of primers [50,65]. Primer bias is a well-known phenomenon in molecular studies, which has been already addressed [66]. The choice of primer set directly affects the relative abundances [67]. It was also previously indicated that using inappropriate sets of primers provided an output in the form of questionable biological conclusions (e.g., [68]), i.e., applying suboptimal sets of primers can lead to the under-representation of ASVs [69]. In addition, the selection against single species, as well as whole groups, was also shown (e.g., [70]). In our study, the sequencing companies used different primer sets, which precludes the direct comparison of the sequencing results. Consequently, primer pairs need to be carefully selected to avoid accumulative bias. Bacteria and archaea represent major drivers of the biogeochemical cycles [12], and it is worth underlining that only the HTS data obtained from the MG laboratory identified sequences that belonged to both of these domains. This observation indicated that the primers used by the MG commercial laboratory could recover sequences from archaea. On the contrary, the primers used by EF [47,48] and GM [49] commercial laboratories were specific to bacteria. In addition, the quantity and quality of raw data received from the MG laboratory allowed quality control and sensitivity analysis.

In all three commercial laboratories, investigations were performed by sequencing the V3–V4 hypervariable region of the 16S rRNA gene (the choice of the hypervariable region can affect the microbial community composition, e.g., [70]). For this purpose, the Illumina MiSeq platform was used. The process leading to HTS amplicon library preparation is challenging and prone to biases and errors [39]. The parameters of PCR-based DNA amplification are critical as they can affect the quality of amplicons and, subsequently, further analyses. Many attempts have been made to standardize HTS methods [71]. These efforts resulted in optimized protocols for amplicon-based microbiome analyses [39]. However, some problems still exist [65].

Until recently, the genetic diversity of the marine microbiome was poorly explored; however, it has been shown that microorganisms are crucial for biogeochemical cycling and, thus, for the functioning of marine ecosystems. It was also observed that microbial communities change with depth in the water column [72], i.e., Actinobacteria and Firmicutes emerge, while Alphaproteobacteria and Bacteroidetes become less abundant [12]. However, as in the global ocean’s microbiome, Alphaproteobacteria were the dominant bacteria in the Baltic Sea, as revealed by previous studies [73,74]. Moreover, the 16S rRNA gene fragment analysis revealed that seawater bacterial communities consisted of a small number of dominant taxa and a high number of so-called ‘rare biosphere’ taxa [75]. The latter contribute disproportionately to community dynamics [76]. Which bacterial taxa appeared as rare or dominant in our study was strongly affected by the applied taxonomic library and the laboratory in which HTS was performed. Overall, the most dominant phyla were Proteobacteria and Bacteroidetes, which is consistent with the results of previous studies that focused on taxa analysis of Baltic Sea bacterioplankton communities [77,78]. Nevertheless, due to the lack of optimized “gold standard” laboratory protocols and post-processing bioinformatic analyses of microbial communities, our data cannot be reliably compared with previously published reports on microbial communities in the Baltic Sea. In the Supplementary Material (Table S6, [79,80,81,82]), we have provided examples of the use of different methodologies, that is, different approaches to microbiome studies based on sequencing (high-throughput sequencing and “shotgun” sequencing), regions of the bacterial 16S rRNA gene and primers, as well as reference sequence databases. 
Thus, comparing microbiome data is challenging and the data presented in this study may be of particular interest to researchers who plan to perform microbiome community analysis using the 16S rRNA gene, including the identification of putative bacterial endosymbionts. The present study can also serve as a guide for selecting a pipeline to obtain the most accurate and reproducible data. Moreover, taxonomy is still subject to progressive changes; thus, in Table 5 we have provided both the phylum names used in the databases and those according to the new taxonomy proposed by Oren and Garrity [58]. Overall, being aware of the problems that arise when comparing samples of microbiome communities is crucial to avoid critical errors in interpreting data sets.
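
The effect of changing nomenclature on such comparisons can be sketched as follows. This is a minimal illustration, assuming that phylum-level counts are available as a simple name-to-count mapping; the mapping below lists only a few of the phylum names validly published by Oren and Garrity [58], and the function names and counts are hypothetical and not taken from Table 5.

```python
from collections import Counter

# Older phylum names (as used in the reference databases) mapped to the
# names validly published by Oren and Garrity (2021). Deliberately partial.
NEW_PHYLUM_NAMES = {
    "Proteobacteria": "Pseudomonadota",
    "Bacteroidetes": "Bacteroidota",
    "Actinobacteria": "Actinomycetota",
    "Firmicutes": "Bacillota",
    "Verrucomicrobia": "Verrucomicrobiota",
}


def harmonize_phylum(name):
    """Return the updated phylum name, or the input unchanged if unmapped."""
    name = name.strip()
    return NEW_PHYLUM_NAMES.get(name, name)


def harmonize_counts(counts):
    """Sum counts that refer to the same phylum under old and new names."""
    merged = Counter()
    for phylum, n in counts.items():
        merged[harmonize_phylum(phylum)] += n
    return dict(merged)


# Hypothetical ASV counts per phylum (illustrative only, not study data)
example = {"Proteobacteria": 120, "Pseudomonadota": 5, "Bacteroidetes": 40, "Cyanobacteria": 7}
print(harmonize_counts(example))
# {'Pseudomonadota': 125, 'Bacteroidota': 40, 'Cyanobacteria': 7}
```

Harmonizing names before comparing datasets prevents the same phylum from being counted twice when one source uses the older name and another uses the newer one.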

Interestingly, we also identified putative bacterial endosymbionts, whose presence among the ASVs depended on the laboratory and/or the reference library used. One of them belonged to the order Rickettsiales, intracellular bacteria that inhabit many eukaryotic hosts, including protists and metazoans, as pathogens or endosymbionts. Some of them are capable of manipulating host cell reproduction [83,84]. These bacteria tend to decrease their genome size and content, while increasing their pathogenicity [85,86,87,88]. In turn, the identified “Candidatus Xiphinematobacter” is a Gram-stain-negative, non-motile and non-sporulating bacterium that typically inhabits plant pathogenic nematodes (the Xiphinema americanum group). It occurs in the germinal zones of both ovaries and seems entirely dependent on its female nematode host. This endosymbiont presumably induces mother-to-daughter parthenogenesis in its host [89,90]. Furthermore, this putative bacterial endosymbiont was recorded in the gut of wild freshwater fish (tench: Tinca tinca) [91]. To date, there is no evidence of its survival in the external environment [92]. Therefore, we presume that the sequences that correspond to this bacterium in the sample probably originated from host cells that were present in the filtered water.

Another ASV in our dataset is a putative bacterial endosymbiont matched to the genus “Candidatus Portiera”, an endosymbiont of the whitefly [93]. “Ca. Portiera” was reported to be a common marine γ-proteobacterium [94], potentially contributing to the degradation of hydrocarbons [95]. In the Gulf of Maine, “Ca. Portiera” is highly abundant in seawater samples; it was also noted in the guts of copepods, but in significantly lower numbers [96].

5. Conclusions

In summary, we evaluated the differences and similarities in the HTS results obtained depending on the selection of the reference 16S rRNA database and the choice of commercial laboratory in which high-throughput sequencing was performed. Each of the three laboratories involved used different procedures at the stage of amplicon library preparation. As a result, we obtained different microbiome profiles for the same seawater sample, and in the control experiment for mock community samples. Our results show why optimizing library processing protocols and using standardized procedures are essential in analyzing metagenetic datasets. Otherwise, the comparison of HTS data obtained by researchers from different laboratories will be burdened with significant bias. In our study, sequencing conducted by the laboratory designated as MG appeared to be the most beneficial in terms of the number of sequences and total frequency of ASVs per sample. It seems that this commercial laboratory is a better choice for researchers. Moreover, only HTS data from this laboratory allowed us to identify sequences from the domains of both bacteria and archaea. We found that different taxonomic libraries could affect the post-analysis and interpretation of the microbial composition. In general, the number of unique and common ASVs was limited for the data obtained from the GM laboratory compared to the others (EF and MG). This was congruent at each taxonomic level. When comparing the Greengenes and SILVA databases, we could not conclude which of the two 16S rRNA reference databases was superior for amplicon-based metagenetic analysis. On the one hand, the SILVA database allowed us to obtain more sequences classified as ASVs. It was also found that the taxonomic classification varied in unique ASVs at all seven taxonomic levels (domain, phylum, class, order, family, genus, and species). 
On the other hand, more ASVs that were classified as putative bacterial endosymbionts were identified using the Greengenes database, including Rickettsiales, “Candidatus Xiphinematobacter”, and “Candidatus Portiera”; the SILVA database allowed for the detection of Rickettsiales only. Nevertheless, the most abundant results concerning the number of classified ASVs at the species level were obtained using the SILVA full-reference library, and this database allowed us to conduct a deeper analysis of the microbiome profile. Thus, our study suggests that the choice of methodology for analyzing microbial communities is crucial to answering the question “Who is there?”, which is fundamental to microbial ecology.
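
The notion of unique and common ASVs across the three laboratories can be made concrete with a short sketch. It assumes that each laboratory's result has been reduced to a set of taxon labels at one taxonomic level; the function name and the placeholder labels are hypothetical, and the code only illustrates how the regions of a three-set Venn diagram could be tabulated, not how the figures in this study were produced.

```python
def venn_regions(sets):
    """Map each combination of set labels to the items found in exactly those sets."""
    regions = {}
    all_items = set().union(*sets.values())
    for item in all_items:
        # Labels of every set that contains this item, in a stable order
        key = tuple(sorted(label for label, members in sets.items() if item in members))
        regions.setdefault(key, set()).add(item)
    return regions


# Placeholder taxon labels per laboratory (illustrative only, not study data)
labs = {
    "EF": {"taxon_A", "taxon_B", "taxon_C"},
    "GM": {"taxon_A", "taxon_C"},
    "MG": {"taxon_A", "taxon_B", "taxon_D"},
}

for key, taxa in sorted(venn_regions(labs).items()):
    print("+".join(key), sorted(taxa))
```

Running the same tabulation separately for each taxonomic level and each reference database would show whether a laboratory contributes few unique and shared taxa consistently or only at particular ranks.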

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/w14233855/s1, Tables S1–S4. List of the amplicon sequence variants (ASVs) obtained from the Greengenes and SILVA libraries: S1-gg-13-8-99-nb-classifierASVs_table full-reference, S2-gg-13-8-99-515-806-nb-classifierASVs_table, S3-silva-132-99-nb-classifierASVs_table full reference; S4-silva-132-99-515-806-nb-classifierASVs_table; Table S5. Calculated asymptotes and coverages for Chao1 diversity index based on the ASV numbers generated using the SILVA full-reference database; Table S6. Comparison of the HTS methodology applied for the analysis of the composition of the Baltic Sea microbiome [17,18,47,48,49,50,79,80,81,82].

Author Contributions

Conceptualization, M.M. and A.W.-Z.; methodology, M.M.; formal analysis, M.M., J.P.J. and A.T.; investigation, M.M.; resources, A.I., A.-K.K. and E.K.; writing—original draft preparation, M.M., A.I., J.P.J., A.-K.K., E.K., A.T. and A.W.-Z.; writing—review and editing, M.M., A.I., J.P.J., A.-K.K., E.K., A.T. and A.W.-Z.; visualization, M.M. and A.W.-Z.; supervision, A.W.-Z.; funding acquisition, A.W.-Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Science Centre, Poland, grant number 2017/27/B/NZ8/01056.

Data Availability Statement

The generated raw HTS reads obtained from each vendor were deposited under the study accession number PRJNA828587 (SUB11354992) in NCBI BioSample (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA828587, accessed on 12 October 2022).

Acknowledgments

The authors would like to thank Ryszard Kuczyński from the Department of Marine Plankton Research (University of Gdansk) for his help in collecting water samples from the Baltic Sea. We are immensely grateful to Tadeusz Kaczorowski from the Laboratory of Extremophiles Biology, Department of Microbiology (University of Gdansk), for his detailed comments that helped to improve the manuscript. This research was supported in part by PLGrid Infrastructure.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Arrigo, K.R. Marine microorganisms and global nutrient cycles. Nature 2005, 437, 349–355.
2. Handelsman, J. Metagenomics: Application of genomics to uncultured microorganisms. Microbiol. Mol. Biol. Rev. 2004, 68, 669–685.
3. DeLong, E.F. The microbial ocean from genomes to biomes. Nature 2009, 459, 200–206.
4. Rinke, C.; Schwientek, P.; Sczyrba, A.; Ivanova, N.N.; Anderson, I.J.; Cheng, J.-F.; Darling, A.; Malfatti, S.; Swan, B.K.; Gies, E.A.; et al. Insights into the phylogeny and coding potential of microbial dark matter. Nature 2013, 499, 431–437.
5. Aevarsson, A.; Kaczorowska, A.K.; Adalsteinsson, B.T.; Ahlqvist, J.; Al-Karadaghi, S.; Altenbuchner, J.; Arsin, H.; Átlasson, Ú.Á.; Brandt, D.; Cichowicz-Cieślak, M.; et al. Going to extremes—A metagenomic journey into the dark matter of life. FEMS Microbiol. Lett. 2021, 368, fnab067.
6. Stahl, D.A.; Lane, D.J.; Olsen, G.J.; Pace, N.R. Analysis of hydrothermal vent associated symbionts by ribosomal RNA sequences. Science 1984, 224, 409–411.
7. Stahl, D.A.; Lane, D.J.; Olsen, G.J.; Pace, N.R. Characterization of a Yellowstone hot spring microbial community by 5S rRNA sequences. Appl. Environ. Microbiol. 1985, 49, 1379–1384.
8. Bourlat, S.J.; Borja, A.; Gilbert, J.; Taylor, M.I.; Davies, N.; Weisberg, S.B.; Griffith, J.F.; Lettieri, T.; Field, D.; Benzie, J.; et al. Genomics in marine monitoring: New opportunities for assessing marine health status. Mar. Pollut. Bull. 2013, 74, 19–31.
9. Mioduchowska, M.; Zając, K.; Bartoszek, K.; Madanecki, P.; Kur, J.; Zając, T. 16S rRNA-based metagenomic analysis of the gut microbial community associated with the DUI species Unio crassus (Bivalvia: Unionidae). J. Zoolog. Syst. Evol. Res. 2020, 58, 615–623.
10. Wei, Z.; Gu, Y.; Friman, V.-P.; Kowalchuk, G.A.; Xu, Y.; Shen, Q.; Jousset, A. Initial soil microbiome composition and functioning predetermine future plant health. Sci. Adv. 2019, 5, eaaw0759.
11. Lee, C.S.; Kim, M.; Lee, C.; Yu, Z.; Lee, J. The microbiota of recreational freshwaters and the implications for environmental and public health. Front. Microbiol. 2016, 7, 1826.
12. Stal, J.L.; Cretoiu, M.S. The Marine Microbiome: An Untapped Source of Biodiversity and Biotechnological Potential; Springer: Cham, Switzerland, 2016; pp. 1–229.
13. Sogin, M.L.; Morrison, H.G.; Huber, J.A.; Welch, M.D.; Huse, S.M.; Neal, P.R.; Arrieta, J.M.; Herndl, G.J. Microbial diversity in the deep sea and the underexplored “rare biosphere”. Proc. Natl. Acad. Sci. USA 2006, 103, 12115–12120.
14. Fonselius, S.; Valderrama, J. One hundred years of hydrographic measurements in the Baltic Sea. J. Sea Res. 2003, 49, 229–241.
15. Hardeman, F.; Sjoling, S. Metagenomic approach for the isolation of a novel low-temperature-active lipase from uncultured bacteria of marine sediment. FEMS Microbiol. Ecol. 2007, 59, 524–534.
16. Rheinheimer, G.; Gocke, K.; Hoppe, H.G. Vertical distribution of microbiological and hydrographic-chemical parameters in different areas of the Baltic Sea. Mar. Ecol. Prog. Ser. 1989, 52, 55–70.
17. Herlemann, D.P.; Labrenz, M.; Jürgens, K.; Bertilsson, S.; Waniek, J.J.; Andersson, A.F. Transitions in bacterial communities along the 2000 km salinity gradient of the Baltic Sea. ISME J. 2011, 5, 1571–1579.
18. Hu, Y.O.; Karlson, B.; Charvet, S.; Andersson, A.F. Diversity of Pico-to mesoplankton along the 2000 km salinity gradient of the Baltic Sea. Front. Microbiol. 2016, 7, 679.
19. Hugerth, L.W.; Larsson, J.; Alneberg, J.; Lindh, M.V.; Legrand, C.; Pinhassi, J.; Andersson, A.F. Metagenome-assembled genomes uncover a global brackish microbiome. Genome Biol. 2015, 16, 279.
20. Alneberg, J.; Sundh, J.; Bennke, C.; Beier, S.; Lundin, D.; Hugerth, L.W.; Pinhassi, J.; Kisand, V.; Riemann, L.; Jürgens, K.; et al. BARM and BalticMicrobeDB, a reference metagenome and interface to meta-omic data for the Baltic Sea. Sci. Data 2018, 5, 180146.
21. Azam, F.; Fenchel, T.; Field, J.G.; Gray, J.S.; Meyer-Reil, L.A.; Thingstad, F. The ecological role of water-column microbes in the sea. Mar. Ecol. Prog. Ser. 1983, 10, 257–263.
22. Taylor, A.H.; Joint, I. A steady-state analysis of the ‘microbial loop’ in stratified systems. Mar. Ecol. Prog. Ser. 1990, 59, 1–17.
23. Guo, B.; Zhang, L.; Sun, H.; Gao, M.; Yu, N.; Zhang, O.; Mou, A.; Liu, Y. Microbial co-occurrence network topological properties link with reactor parameters and reveal the importance of low-abundance genera. npj Biofilms Microbiomes 2022, 8, 3.
24. O’Brien, P.A.; Webster, N.S.; Miller, D.J.; Bourne, D.G. Host-microbe coevolution: Applying evidence from model systems to complex marine invertebrate holobionts. mBio 2019, 10, e02241-18.
25. Pais, R.; Lohs, C.; Wu, Y.; Wang, J.; Aksoy, S. The obligate mutualist Wigglesworthia glossinidia influences reproduction, digestion, and immunity processes of its host, the tsetse fly. Appl. Environ. Microbiol. 2008, 74, 5965–5974.
26. Ruehland, C.; Blazejak, A.; Lott, C.; Loy, A.; Erséus, C.; Dubilier, N. Multiple bacterial symbionts in two species of co-occurring gutless oligochaete worms from Mediterranean Sea grass sediments. Environ. Microbiol. 2008, 10, 3404–3416.
27. Webster, N.S.; Thomas, T. The Sponge Hologenome. mBio 2016, 7, e00135-16.
28. Krediet, C.J.; Ritchie, K.B.; Paul, V.J.; Teplitski, M. Coral-associated micro-organisms and their roles in promoting coral health and thwarting diseases. Proc. R. Soc. B 2013, 280, 20122328.
29. Sergeant, M.J.; Constantinidou, C.; Cogan, T.; Penn, C.W.; Pallen, M.J. High-throughput sequencing of 16S rRNA gene amplicons: Effects of extraction procedure, primer length and annealing temperature. PLoS ONE 2012, 7, e38094.
30. Cruaud, P.; Vigneron, A.; Lucchetti-Miganeh, C.; Ciron, P.E.; Godfroy, A.; Cambon-Bonavita, M.A. Influence of DNA extraction method, 16S rRNA targeted hypervariable regions, and sample origin on microbial diversity detected by 454 pyrosequencing in marine chemosynthetic ecosystems. Appl. Environ. Microbiol. 2014, 80, 4626–4639.
31. Plummer, E.; Twin, J.; Bulach, D.M.; Garland, S.M.; Tabrizi, S.N. A comparison of three bioinformatics pipelines for the analysis of preterm gut microbiota using 16S rRNA gene sequencing data. J. Proteom. Bioinform. 2015, 8, 283–291.
32. Kennedy, J.; Flemer, B.; Jackson, S.A.; Morrissey, J.P.; O’Gara, F.; Dobson, A.D. Evidence of a putative deep sea specific microbiome in marine sponges. PLoS ONE 2014, 9, e91092.
33. Balvociute, M.; Huson, D.H. SILVA, RDP, Greengenes, NCBI and OTT—How do these taxonomies compare? BMC Genom. 2017, 18, 114.
34. Wang, Q.; Garrity, G.M.; Tiedje, J.M.; Cole, J.R. Naïve Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl. Environ. Microbiol. 2007, 73, 5261–5267.
35. Yilmaz, P.; Parfrey, L.W.; Yarza, P.; Gerken, J.; Pruesse, E.; Quast, C.; Schweer, T.; Peplies, J.; Ludwig, W.; Glöckner, F.O. The SILVA and “All-species Living Tree Project (LTP)” taxonomic frameworks. Nucleic Acids Res. 2014, 42, 643–648.
36. McDonald, D.; Price, M.N.; Goodrich, J.; Nawrocki, E.P.; DeSantis, T.Z.; Probst, A.; Andersen, G.L.; Knight, R.; Hugenholtz, P. An improved Greengenes taxonomy with explicit ranks for ecological and evolutionary analyses of bacteria and archaea. ISME J. 2012, 6, 610–618.
37. Federhen, S. The NCBI taxonomy database. Nucleic Acids Res. 2012, 40, 136–143.
38. Caporaso, J.G.; Kuczynski, J.; Stombaugh, J.; Bittinger, K.; Bushman, F.D.; Costello, E.K.; Fierer, N.; Peña, A.G.; Goodrich, J.K.; Gordon, J.I.; et al. QIIME allows analysis of high-throughput community sequencing data. Nat. Methods 2010, 7, 335–336.
39. Gohl, D.M.; Vangay, P.; Garbe, J.; MacLean, A.; Hauge, A.; Becker, A.; Gould, T.J.; Clayton, J.B.; Johnson, T.J.; Hunter, R.; et al. Systematic improvement of amplicon marker gene methods for increased accuracy in microbiome studies. Nat. Biotechnol. 2016, 34, 942–949.
40. McGovern, E.; Waters, S.M.; Blackshields, G.; McCabe, M.S. Evaluating established methods for rumen 16S rRNA amplicon sequencing with mock microbial populations. Front. Microbiol. 2018, 9, 1365.
41. Yeh, Y.C.; Needham, D.M.; Sieradzki, E.T.; Fuhrman, J.A. Taxon disappearance from microbiome analysis reinforces the value of mock communities as a standard in every sequencing run. mSystems 2018, 3, e00023-18.
42. Gołębiewski, M.; Tretyn, A. Generating amplicon reads for microbial community assessment with next-generation sequencing. J. Appl. Microbiol. 2020, 128, 330–354.
43. Ibarbalz, F.M.; Pérez, M.V.; Figuerola, E.L.; Erijman, L. The bias associated with amplicon sequencing does not affect the quantitative assessment of bacterial community dynamics. PLoS ONE 2014, 9, e99722.
44. Piwosz, K.; Shabarova, T.; Pernthaler, J.; Posch, T.; Šimek, K.; Porcal, P.; Salcher, M.M. Bacterial and eukaryotic small-subunit amplicon data do not provide a quantitative picture of microbial communities, but they are reliable in the context of ecological interpretations. mSphere 2020, 5, e00052-20.
45. Moskot, M.; Kotlarska, E.; Jakóbkiewicz-Banecka, J.; Gabig-Cimińska, M.; Fari, K.; Węgrzyn, G.; Wróbel, B. Metal and antibiotic resistance of bacteria isolated from the Baltic Sea. Int. Microbiol. 2012, 15, 131–139.
46. Toruńska-Sitarz, A.; Kotlarska, E.; Mazur-Marzec, H. Biodegradation of nodularin and other nonribosomal peptides by the Baltic bacteria. Int. Biodeterior. Biodegrad. 2018, 134, 48–57.
47. Turner, S.; Pryer, K.M.; Miao, V.P.W.; Palmer, J.D. Investigating deep phylogenetic relationships among cyanobacteria and plastids by small subunit rRNA sequence analysis. J. Eukaryot. Microbiol. 1999, 46, 327–338.
48. Kisand, V.; Cuadros, R.; Wikner, J. Phylogeny of culturable estuarine bacteria catabolizing riverine organic matter in the N Baltic. Appl. Environ. Microbiol. 2002, 68, 379–388.
49. Eiler, A.; Heinrich, F.; Bertilsson, S. Coherent dynamics and association networks among lake bacterioplankton taxa. ISME J. 2012, 6, 330–342.
50. Klindworth, A.; Pruesse, E.; Schweer, T.; Peplies, J.; Quast, C.; Horn, M.; Glöckner, F.O. Evaluation of general 16S ribosomal RNA gene PCR primers for classical and next-generation sequencing-based diversity studies. Nucleic Acids Res. 2013, 41, e1.
51. Andrews, S. FastQC A Quality Control tool for High Throughput Sequence Data. Available online: http://www.bioinformatics.babraham.ac.uk/projects/fast-qc/ (accessed on 25 November 2014).
52. Ewels, P.; Magnusson, M.; Lundin, S.; Käller, M. MultiQC: Summarize analysis results for multiple tools and samples in a single report. Bioinformatics 2016, 32, 3047–3048.
53. Bolyen, E.; Rideout, J.R.; Dillon, M.R.; Bokulich, N.A.; Abnet, C.; Al-Ghalith, G.A.; Alexander, H.; Alm, E.J.; Arumugam, M.; Asnicar, F.; et al. QIIME 2: Reproducible, interactive, scalable, and extensible microbiome data science. PeerJ 2018, 6, e27295v2.
54. Callahan, B.J.; McMurdie, P.J.; Rosen, M.J.; Han, A.W.; Johnson, A.J.A.; Holmes, S.P. DADA2: High-resolution sample inference from Illumina amplicon data. Nat. Methods 2016, 13, 581–583.
55. Quast, C.; Pruesse, E.; Yilmaz, P.; Gerken, J.; Schweer, T.; Yarza, P.; Peplies, J.; Glöckner, F.O. The SILVA ribosomal RNA gene database project: Improved data processing and web-based tools. Nucleic Acids Res. 2012, 41, D590–D596.
56. DeSantis, T.Z.; Hugenholtz, P.; Larsen, N.; Rojas, M.; Brodie, E.L.; Keller, K.; Huber, T.; Dalevi, D.; Hu, P.; Andersen, G.L. Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl. Environ. Microbiol. 2006, 72, 5069–5072.
57. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria; Available online: https://www.R-project.org/ (accessed on 12 October 2022).
58. Oren, A.; Garrity, G.M. Valid publication of the names of forty-two phyla of prokaryotes. Int. J. Syst. Evol. Microbiol. 2021, 71, 10.
59. Skerman, V.B.D.; McGowan, V.; Sneath, P.H.A. Approved lists of bacterial names. Int. J. Syst. Bacteriol. 1980, 30, 225–420.
60. Parte, A.C.; Sardà Carbasse, J.; Meier-Kolthoff, J.P.; Reimer, L.C.; Göker, M. List of Prokaryotic names with Standing in Nomenclature (LPSN) moves to the DSMZ. Int. J. Syst. Evol. Microbiol. 2020, 70, 5607–5612.
61. The List of Prokaryotic names with Standing in Nomenclature (LPSN). Available online: https://lpsn.dsmz.de/ (accessed on 12 October 2022).
62. Index Fungorum. Available online: http://www.indexfungorum.org/ (accessed on 12 October 2022).
63. McIntyre, A.B.; Ounit, R.; Afshinnekoo, E.; Prill, R.I.; Hénaff, E.; Alexander, N.; Minot, S.S.; Danko, D.; Foox, J.; Ahsanuddin, S.; et al. Comprehensive benchmarking and ensemble approaches for metagenomic classifiers. Genome Biol. 2017, 18, 182.
64. Beaudry, M.S.; Wang, J.; Kieran, T.J.; Thomas, J.; Bayona-Vásquez, N.J.; Gao, B.; Devault, A.; Brunelle, B.; Lu, K.; Wang, J.-S.; et al. Improved microbial community characterization of 16S rRNA via metagenome hybridization capture enrichment. Front. Microbiol. 2021, 12, 644662.
65. Schloss, P.D.; Gevers, D.; Westcott, S.L. Reducing the effects of PCR amplification and sequencing artifacts on 16S rRNA-based studies. PLoS ONE 2011, 6, e27310.
66. Bahram, M.; Anslan, S.; Hildebrand, F.; Bork, P.; Tedersoo, L. Newly designed 16S rRNA metabarcoding primers amplify diverse and novel archaeal taxa from the environment. Env. Microbiol. Rep. 2019, 11, 487–494.
67. Hamady, M.; Knight, R. Microbial community profiling for human microbiome projects: Tools, techniques, and challenges. Genome Res. 2009, 19, 1141–1152.
68. Baker, G.C.; Smith, J.J.; Cowan, D.A. Review and re-analysis of domain-specific 16S primers. J. Microbiol. Methods 2003, 55, 541–555.
69. Tringe, S.G.; Hugenholtz, P. A renaissance for the pioneering 16S rRNA gene. Curr. Opin. Microbiol. 2008, 11, 442–446.
70. Youssef, N.; Sheik, C.S.; Krumholz, L.R.; Najar, F.Z.; Roe, B.A.; Elshahed, M.S. Comparison of species richness estimates obtained using nearly complete fragments and simulated pyrosequencing-generated fragments in 16S rRNA gene-based environmental surveys. Appl. Environ. Microbiol. 2009, 75, 5227–5236.
71. Gilbert, J.A.; Jansson, J.K.; Knight, R. The Earth Microbiome project: Successes and aspirations. BMC Biol. 2014, 12, 69.
72. Wu, J.; Gao, W.; Johnson, R.H.; Zhang, W.; Meldrum, D.R. Integrated metagenomic and metatranscriptomic analyses of microbial communities in the meso- and bathypelagic realm of North Pacific Ocean. Mar. Drugs 2013, 11, 3777–3801.
73. Morris, R.M.; Rappe, M.S.; Connon, S.A.; Vergin, K.L.; Siebold, W.A.; Carlson, C.A.; Giovannoni, S.J. SAR11 clade dominates ocean surface bacterioplankton communities. Nature 2002, 420, 806–810.
74. Dupont, C.L.; Larsson, J.; Yooseph, S.; Ininbergs, K.; Goll, J.; Asplund-Samuelsson, J.; McCrowm, J.P.; Celepli, N.; Zeigler Allen, L.; Ekman, M.; et al. Functional tradeoffs underpin salinity-driven divergence in microbial community composition. PLoS ONE 2014, 9, e89549.
75. Gilbert, J.A.; Field, D.; Swift, P.; Newbold, L.; Oliver, A.; Smyth, T.; Somerfield, P.J.; Huse, S.; Joint, I. The seasonal structure of microbial communities in the Western English Channel. Environ. Microbiol. 2009, 11, 3132–3139.
76. Shade, A.; Jones, S.E.; Caporaso, J.G.; Handelsman, J.; Knight, R.; Fierer, N.; Gilbert, J.A. Conditionally rare taxa disproportionately contribute to temporal changes in microbial diversity. mBio 2014, 5, e01371-14.
77. Andersson, A.F.; Riemann, L.; Bertilsson, S. Pyrosequencing reveals contrasting seasonal dynamics of taxa within Baltic Sea bacterioplankton communities. ISME J. 2010, 4, 171–181.
78. Lindh, M.V.; Figueroa, D.; Sjöstedt, J.; Baltar, F.; Lundin, D.; Andersson, A.; Legrand, C.; Pinhassi, J. Transplant experiments uncover Baltic Sea basin-specific responses in bacterioplankton community composition and metabolic activities. Front. Microbiol. 2015, 6, 223.
79. Kublanov, I.V.; Perevalova, A.A.; Slobodkina, G.B.; Lebedinsky, A.V.; Bidzhieva, S.K.; Kolganova, T.V.; Kaliberda, E.N.; Rumsh, L.D.; Haertlé, T.; Bonch-Osmolovskaya, E.A. Biodiversity of thermophilic prokaryotes with hydrolytic activities in hot springs of Uzon Caldera, Kamchatka (Russia). Appl. Environ. Microbiol. 2009, 75, 286–291.
80. Walters, W.A.; Caporaso, J.G.; Lauber, C.L.; Berg-Lyons, D.; Fierer, N.; Knight, R. PrimerProspector: De novo design and taxonomic analysis of barcoded polymerase chain reaction primers. Bioinformatics 2011, 27, 1159–1161.
81. Iasakova, T.R.; Kanapatskiy, T.A.; Toshchakov, S.V.; Korzhenkov, A.A.; Ulyanova, M.O.; Pimenov, N.V. The Baltic Sea methane pockmark microbiome: The new insights into the patterns of relative abundance and ANME niche separation. Mar. Environ. 2022, 173, 105533.
82. Dinasquet, J.; Kragh, T.; Schroter, M.L.; Søndergaard, M.; Riemann, L. Functional and compositional succession of bacterioplankton in response to a gradient in bioavailable dissolved organic carbon. EMI 2013, 15, 2616–2628.
83. Merhej, V.; Raoult, D. Rickettsial evolution in the light of comparative genomics. Biol. Rev. 2011, 86, 379–405.
84. Werren, J.H.; Baldo, L.; Clark, M.E. Wolbachia: Master manipulators of invertebrate biology. Nat. Rev. Microbiol. 2008, 6, 741–751.
85. Merhej, V.; Royer-Carenzi, M.; Pontarotti, P.; Raoult, D. Massive comparative genomic analysis reveals convergent evolution of specialized bacteria. Biol. Direct 2009, 4, 13.
86. Fournier, P.-E.; El Karkouri, K.; Leroy, Q.; Robert, C.; Giumelli, B.; Renesto, P.; Socolovschi, C.; Parola, P.; Audic, S.; Raoult, D. Analysis of the Rickettsia africae genome reveals that virulence acquisition in Rickettsia species may be explained by genome reduction. BMC Genomics 2009, 10, 166.
87. Ogata, H.; Audic, S.; Renesto-Audiffren, P.; Fournier, P.-E.; Barbe, V.; Samson, D.; Roux, V.; Cossart, P.; Weissenbach, J.; Claverie, J.-M.; et al. Mechanisms of evolution in Rickettsia conorii and R. prowazekii. Science 2001, 293, 2093–2098.
88. Renvoisé, A.; Merhej, V.; Georgiades, K.; Raoult, D. Intracellular Rickettsiales: Insights into manipulators of eukaryotic cells. Trends Mol. Med. 2011, 17, 573–583.
89. Vandekerckhove, T.T.; Coomans, A.; Cornelis, K.; Baert, P.; Gillis, M. Use of the Verrucomicrobia-specific probe EUB338-III and fluorescent in situ hybridization for detection of “Candidatus Xiphinematobacter” cells in nematode hosts. Appl. Environ. Microbiol. 2002, 68, 3121–3125.
90. Vandekerckhove, T.T.; Navarro, J.B.; Coomans, A.; Hedlund, B.P. Genus II. Candidatus Xiphinematobacter. In Bergey’s Manual of Systematic Bacteriology; Krieg, N.R., Staley, J.T., Hedlund, B.P., Paster, B.J., Ward, N., Ludwig, W., Whitman, W.B., Eds.; Springer: New York, NY, USA, 2011; pp. 838–841.
91. Dulski, T.; Kozłowski, K.; Ciesielski, S. Habitat and seasonality shape the structure of tench (Tinca tinca L.) gut microbiome. Sci. Rep. 2020, 10, 1–11.
92. Vandekerckhove, T.T.; Willems, A.; Gillis, M.; Coomans, A. Occurrence of novel verrucomicrobial species, endosymbiotic in Xiphinema americanum-group species (Nematoda, Longidoridae) and associated with parthenogenesis. Int. J. Syst. Evol. Microbiol. 2000, 50, 2197–2205.
93. Bing, X.L.; Yang, J.; Zchori-Fein, E.; Wang, X.W.; Liu, S.S. Characterization of a newly discovered symbiont of the whitefly Bemisia tabaci (Hemiptera: Aleyrodidae). Appl. Environ. Microbiol. 2013, 79, 569–575.
94. Giovannoni, S.J.; Rappe, M. Chapter 3: Evolution, diversity, and molecular ecology of marine prokaryotes. In Microbial Ecology of the Oceans; Kirchman, D.L., Ed.; Wiley-Liss, Inc.: Hoboken, NJ, USA, 2000; pp. 47–84.
95. Lamendella, R.; Strutt, S.; Borglin, S.; Chakraborty, R.; Tas, N.; Mason, O.U.; Hultman, J.; Prestat, E.; Hazen, T.C.; Jansson, J.K. Assessment of the deepwater horizon oil spill impact on Gulf Coast microbial communities. Front. Microbiol. 2014, 5, 130.
96. Moisander, P.H.; Sexton, A.D.; Daley, M.C. Stable associations masked by temporal variability in the marine copepod microbiome. PLoS ONE 2015, 10, e0138967.

Figure 1. Unique and duplicate sequences of the amplified bacterial 16S rRNA gene fragment obtained by three commercial laboratories (EF, GM and MG). The two paired-end sequencing primers are denoted p1 and p2.

Figure 2. Rarefaction curves of the Chao1 diversity index for ASVs obtained from the three laboratories, showing the microbiome community of the analyzed marine sample (EF, GM and MG) and the mock community control (EF_MC, GM_MC and MG_MC).
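As a reminder of what a Chao1 rarefaction curve plots, the following is a minimal sketch of the bias-corrected Chao1 estimator evaluated on a random subsample of reads. It is not the authors' pipeline; the function names and the example counts are hypothetical.

```python
import random

def chao1(asv_counts):
    """Bias-corrected Chao1 richness: S_obs + F1*(F1-1) / (2*(F2+1))."""
    observed = [c for c in asv_counts if c > 0]
    s_obs = len(observed)                          # number of ASVs seen
    f1 = sum(1 for c in observed if c == 1)        # singletons
    f2 = sum(1 for c in observed if c == 2)        # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))

def rarefy(asv_counts, depth, seed=0):
    """Subsample `depth` reads without replacement; return new per-ASV counts."""
    rng = random.Random(seed)
    pool = [i for i, c in enumerate(asv_counts) for _ in range(c)]
    sub_counts = [0] * len(asv_counts)
    for i in rng.sample(pool, depth):
        sub_counts[i] += 1
    return sub_counts

# Hypothetical per-ASV read counts (not taken from the study).
counts = [10, 5, 2, 2, 1, 1, 1, 0]
for depth in (5, 10, 15, 22):
    print(depth, chao1(rarefy(counts, depth)))
```

Plotting the estimate against the subsampling depth gives one curve per sample; a curve that flattens suggests that sequencing depth was sufficient to capture most of the richness.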

Figure 3. Pearson correlation between ASV classifications at the species level obtained from the Greengenes (left) and SILVA (right) libraries. The samples named “EF”, “GM” and “MG” correspond to 16S fragment databases (515–806); names with “*” refer to full-reference databases.
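For readers who want to reproduce this kind of comparison, here is a minimal sketch of the Pearson correlation coefficient between two count vectors. The example values are the SILVA full-reference phylum counts for EF and GM from Table 5 and are used only for illustration; the figure itself was computed at the species level, and the function name is hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Phylum order: Acidobacteria, Actinobacteria, Bacteroidetes, Chloroflexi,
# Cyanobacteria, Planctomycetes, Proteobacteria, Verrucomicrobia.
ef = [1231, 14144, 11146, 396, 420, 111, 37220, 223]
gm = [748, 2511, 1742, 675, 649, 1651, 8322, 913]
print(round(pearson(ef, gm), 3))
```

Because raw counts are dominated by a few abundant taxa, a high coefficient can hide disagreement among rare taxa; comparing relative abundances or log-transformed counts is a common alternative.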

Figure 4. Comparison of two taxonomies (Greengenes and SILVA) and output of the three commercial laboratories (EF: red, GM: blue and MG: yellow) visualized by Venn diagrams that show ASVs at seven taxonomic levels (L1 domain, L2 phylum, L3 class, L4 order, L5 family, L6 genus and L7 species level).

Table 1. Microbial composition of the “mock communities”: bacterial species cultured in the laboratory and ZymoBIOMICS™ Microbial Community DNA Standard.

Bacterial Species Cultured in the Laboratory
Expected Taxon	Accession Number (Literature Data if Available)	Source of Isolation
Acinetobacter johnsonii (**)	KPD 1303	Rainwater
Aeromonas sp. (*)	IOMB 800	Seawater, the Baltic Sea
Aliivibrio fischeri (previously Vibrio fischeri) (**)	KPD 141	Seawater
Alteromonas haloplanktis (*)	IOMB 474 [45]	Seawater, the Baltic Sea
Chryseobacterium indoltheticum (**)	KPD 1306	Rainwater
Enterobacter asburiae (**)	KPD 1375	Pond water
Erwinia billingiae (**)	KPD 1325	Rainwater
Klebsiella pneumoniae (*)	IOMB 754	Seawater, the Baltic Sea
Marinomonas sp. (*)	IOMB 406 [45]	Seawater, the Baltic Sea
Micrococcus luteus (**)	KPD 778	Palm surface
Novosphingobium resinovorum (**)	KPD 1310	Rainwater
Ochrobactrum sp. (*)	CCNP0038 [46]	Sediment, the Baltic Sea
Pantoea vagans (**)	KPD 1311	Rainwater
Paracoccus sp. (*)	IOMB 231 [45]	Seawater, the Baltic Sea
Photobacterium sp. (*)	IOMB 384 [45]	Seawater, the Baltic Sea
Pseudomonas chlororaphis (**)	KPD 1374	Pond water
Psychrobacter immobilis (**)	KPD 1363	Fish from Lake Żarnowieckie, Poland
Raoultella terrigena (**)	KPD 1302	Rainwater
Rathayibacter caricis (*)	IOMB 359 [45]	Seawater, the Baltic Sea
Rheinheimera aquamaris (*)	CCNP0045 [46]	Sediment, the Baltic Sea
Serratia liquefaciens (*)	IOMB 517 [45]	Seawater, the Baltic Sea
Shewanella baltica (*)	IOMB 300 [45]	Seawater, the Baltic Sea
Vibrio harveyi (**)	KPD 143	Seawater
Yersinia massiliensis (**)	KPD 1318	Rainwater
ZymoBIOMICS™ Microbial Community DNA Standard
Expected taxon	Accession number	Source of extracted DNA
Bacillus subtilis	B-354	Bacteria
Cryptococcus neoformans	Y-2534	Yeast
Enterococcus faecalis	B-537	Bacteria
Escherichia coli	B-1109	Bacteria
Lactobacillus fermentum	B-1840	Bacteria
Listeria monocytogenes	B-33116	Bacteria
Pseudomonas aeruginosa	B-3509	Bacteria
Saccharomyces cerevisiae	Y-567	Yeast
Salmonella enterica	B-4212	Bacteria
Staphylococcus aureus	B-41012	Bacteria

Notes: * Bacterial strains deposited in the IOMB Strain Collection (Molecular Biology Laboratory, Institute of Oceanology of the Polish Academy of Sciences), Poland, curated by Dr. Ewa Kotlarska. ** Bacterial strains from the Collection of Plasmids and Microorganisms (KPD), Faculty of Biology, University of Gdansk, Poland, curated by Dr. Anna-Karina Kaczorowska.

Table 2. Comparison of the HTS methodology used by commercial laboratories.

Commercial Laboratory	EF	GM	MG
Region of bacterial 16S rRNA	V3–V4	V3–V4	V3–V4
Primers	fwd and rev [47,48] *	341F and 785R [49] **	V3-F and V4-R [50] ***
Type of read	Paired-end	Paired-end	Paired-end
Library protocol	In-house sequencing library preparation protocol	16S metagenomic library preparation guide	16S metagenomic sequencing library preparation, Part # 15044223 Rev. B
Automatic de-multiplexing of raw reads and primary sequence analysis	MiSeq	MiSeq	MiSeq

Notes: * fwd: 5′-TACGGGAGGCAGCAG-3′ and rev: 5′-CCAGGGTATCTAATCC-3′. ** 341F: 5′-CCTACGGGNGGCWGCAG-3′ and 785R: 5′-GACTACHVGGGTATCTAATCC-3′. *** V3-F: 5′-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGNGGCWGCAG-3′ and V4-R: 5′-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACTACHVGGGTATCTAATCC-3′. Degenerate nucleotide codes follow the International Union of Pure and Applied Chemistry (IUPAC) nomenclature: N = A/C/G/T, W = A/T, H = A/C/T and V = A/C/G.
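The primer sequences in the notes above can be compared directly. The sketch below expands the IUPAC codes into regular expressions and checks how the three primer pairs relate to one another. It is a plain string comparison under the stated IUPAC definitions, not an analysis performed by the authors; the helper names are hypothetical.

```python
import re

# IUPAC codes that occur in the primers listed in the notes to Table 2.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "ACGT", "W": "AT", "H": "ACT", "V": "ACG"}

def degeneracy(primer):
    """Number of distinct oligonucleotides encoded by a degenerate primer."""
    total = 1
    for base in primer:
        total *= len(IUPAC[base])
    return total

def to_regex(primer):
    """Translate a degenerate primer into a regular expression."""
    return "".join(b if len(IUPAC[b]) == 1 else "[" + IUPAC[b] + "]"
                   for b in primer)

EF_FWD, EF_REV = "TACGGGAGGCAGCAG", "CCAGGGTATCTAATCC"
P341F, P785R = "CCTACGGGNGGCWGCAG", "GACTACHVGGGTATCTAATCC"
V3_F = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGNGGCWGCAG"
V4_R = "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACTACHVGGGTATCTAATCC"

# MG primers end with the GM locus-specific primers (extra 5' prefix only).
print(V3_F.endswith(P341F), V4_R.endswith(P785R))
# EF primers are shorter, non-degenerate instances of the same sites.
print(bool(re.fullmatch(to_regex(P341F[2:]), EF_FWD)),
      bool(re.fullmatch(to_regex(P785R[5:]), EF_REV)))
print(degeneracy(P341F), degeneracy(P785R))
```

On this comparison, all three laboratories appear to prime at the same sites of the V3–V4 region; they differ in the 5′ prefix added to the MG primers and in the shorter, non-degenerate sequences used by EF.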

Table 3. Total sequence frequency per sample obtained from three commercial laboratories: EF, GM and MG. Abbreviations: NA—not applicable.

Samples	Sequence Length	Total Sequences	Not Trimmed	Optimal Trimmed	Equal Trimmed
EF	284–285	88,135	59,933	64,691	66,140
GM	251	103,726	19,513	NA	17,668
MG	301	164,764	6167	56,617	27,059
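To make the counts in Table 3 easier to compare across laboratories, they can be expressed as a percentage of each laboratory's total. The sketch below assumes that the three trimming columns report the number of sequences retained under each trimming strategy; the dictionary layout and function name are hypothetical.

```python
# Counts copied from Table 3; None marks the "NA" entry for GM.
TABLE3 = {
    "EF": {"total": 88135, "not_trimmed": 59933,
           "optimal_trimmed": 64691, "equal_trimmed": 66140},
    "GM": {"total": 103726, "not_trimmed": 19513,
           "optimal_trimmed": None, "equal_trimmed": 17668},
    "MG": {"total": 164764, "not_trimmed": 6167,
           "optimal_trimmed": 56617, "equal_trimmed": 27059},
}

def retained_percent(kept, total):
    """Percentage of sequences retained, or None when not applicable."""
    if kept is None:
        return None
    return round(100 * kept / total, 1)

for lab, row in TABLE3.items():
    shares = {name: retained_percent(value, row["total"])
              for name, value in row.items() if name != "total"}
    print(lab, shares)
```

Expressed this way, the effect of the trimming strategy is visible independently of the very different sequencing depths delivered by the three laboratories.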

Table 4. Comparison of HTS results from three commercial laboratories.

Commercial Laboratory	EF	GM	MG
Number of sequences	88,135	103,726	164,764
Length of sequences	forward: 285; reverse: 284	forward and reverse: 251	forward and reverse: 301
%GC	forward: 53.8; reverse: 54.7	forward: 56.1; reverse: 53.6	forward: 54.9; reverse: 54.0
Number of ASVs *	272/351/535/564	137/197/251/265	237/344/509/554
Classification rate summary (at least 97% sequence similarity)—domain level	Bacteria: 66,140; archaea: 0	Bacteria: 17,668; archaea: 0	Bacteria: 27,032; archaea: 27

Note: * Greengenes 515–806/Greengenes full-reference/SILVA 515–806/SILVA full-reference.

Table 5. Classification rate summary (at least 97% sequence similarity) of the 16S rRNA metagenetic library at phylum level based on the following databases: Greengenes 515–806/Greengenes full-reference/SILVA 515–806/SILVA full-reference. Abbreviations: N—not available.

Phylum Names From Databases/Updated According to the work of Oren and Garrity [58]	Commercial Laboratories
	EF	GM	MG
Acidobacteria/ Acidobacteriota corrig. phyl. nov.	112/1231/1227/1231	114/737/748/748	152/925/972/931
Actinobacteria/N	14069/14144/14134/14144	2511/2505/2511/2511	2532/2583/2566/2583
Armatimonadetes/ Armatimonadota corrig. phyl. nov.	8/8/8/8	0/0/0/0	12/12/12/12
Bacteroidetes/ Bacteroidota corrig. phyl. nov.	10764/10764/11164/11146	1624/1624/1744/1742	3833/3827/4160/4126
Caldithrix/ Calditrichota corrig. phyl. nov.	59/93/0/0	0/29/0/0	0/53/0/0
Calditrichaeota/N	0/0/75/93	0/0/29/29	0/0/53/62
Chlorobi/ Chlorobiota corrig. phyl. nov.	220/382/0/0	22/116/0/0	24/291/0/0
Chloroflexi/ Chloroflexota corrig. phyl. nov.	396/396/396/396	477/675/595/675	1415/1862/1693/1865
Cyanobacteria/N	398/420/420/420	610/649/649/649	1303/1341/1372/1370
Dadabacteria/N	0/0/0/0	0/0/0/0	0/0/0/5
Deinococcus-Thermus/ Deinococcota corrig. phyl. nov.	0/0/0/0	0/0/0/0	0/0/0/4
Dependentiae/N	0/0/0/82	0/0/0/0	0/0/0/19
Epsilonbacteraeota/N	0/0/328/328	0/0/75/75	0/0/170/170
Fibrobacteres/ Fibrobacterota corrig. phyl. nov.	22/22/22/22	0/0/0/0	15/15/15/15
Firmicutes/N	113/41/41/41	43/30/72/30	172/101/147/101
Fusobacteria/ Fusobacteriota corrig. phyl. nov.	20/20/20/20	12/12/12/12	0/0/0/0
GN02/N	9/9/0/0	0/0/0/0	0/0/0/0
GN04/N	5/5/0/0	0/0/0/0	10/10/0/0
Gemmatimonadetes/ Gemmatimonadota corrig. phyl. nov.	113/302/288/302	16/94/94/94	22/181/181/181
Kiritimatiellaeota/ Kiritimatiellota corrig. phyl. nov.	0/0/15/15	0/0/26/26	0/0/49/51
Latescibacteria/N	0/0/28/41	0/0/4/4	0/0/68/72
Modulibacteria/N	0/0/0/0	0/0/0/0	0/0/6/6
Nitrospinae/ Nitrospinota corrig. phyl. nov.	0/0/0/0	0/0/46/46	0/0/64/64
Nitrospirae/ Nitrospirota corrig. phyl. nov.	172/213/207/213	97/112/112/112	60/92/92/92
Omnitrophicaeota/N	0/0/7/7	0/0/0/0	0/0/0/2
Patescibacteria/N	0/0/9/9	0/0/20/20	0/0/103/108
Planctomycetes/ Planctomycetota corrig. phyl. nov.	61/111/123/111	1606/1651/1627/1651	2604/2997/2854/3000
Poribacteria/N	0/0/0/0	0/0/0/0	0/0/2/2
Proteobacteria/N	37162/37548/37000/37220	8365/8443/8318/8322	10546/10773/10527/10554
Spirochaetes/ Spirochaetota corrig. phyl. nov.	14/14/14/14	0/0/0/0	0/0/0/0
TM6/N	82/82/0/0	0/0/0/0	6/19/0/0
TM7/N	0/0/0/0	0/0/0/0	9/65/0/0
Tenericutes/N	0/0/35/35	0/0/0/0	0/0/38/38
Verrucomicrobia/ Verrucomicrobiota corrig. phyl. nov.	223/242/223/223	821/939/915/913	1281/1565/1514/1514
WS3/N	14/41/0/0	0/4/0/7	17/74/0/14
Zixibacteria/N	0/0/380/5	0/0/71/0	0/0/354/10
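Several rows in Table 5 are non-zero under only one database because Greengenes and SILVA label the same lineage differently. The sketch below parses the four-value cells and merges such rows under one name before comparison. The synonym map is an assumption added for illustration (TM6 as Dependentiae, WS3 as Latescibacteria, Caldithrix as Calditrichaeota) and is not a mapping given by the authors; the function names are hypothetical.

```python
def parse_cell(cell):
    """Split 'GG 515-806/GG full/SILVA 515-806/SILVA full' into four ints."""
    return [int(value) for value in cell.split("/")]

# Assumed equivalences between Greengenes and SILVA phylum labels.
SYNONYMS = {"TM6": "Dependentiae",
            "WS3": "Latescibacteria",
            "Caldithrix": "Calditrichaeota"}

def merge_synonyms(rows, synonyms):
    """Sum the counts of rows that map to the same harmonized phylum name."""
    merged = {}
    for name, cell in rows.items():
        target = synonyms.get(name, name)
        counts = parse_cell(cell)
        if target in merged:
            merged[target] = [a + b for a, b in zip(merged[target], counts)]
        else:
            merged[target] = counts
    return merged

# EF column of Table 5 for the affected rows.
ef_rows = {
    "TM6": "82/82/0/0", "Dependentiae": "0/0/0/82",
    "WS3": "14/41/0/0", "Latescibacteria": "0/0/28/41",
    "Caldithrix": "59/93/0/0", "Calditrichaeota": "0/0/75/93",
}
for phylum, counts in merge_synonyms(ef_rows, SYNONYMS).items():
    print(phylum, counts)
```

After harmonization, the remaining differences between the four columns reflect classification behaviour rather than naming conventions, which makes the database comparison easier to read.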

Table 6. Putative bacterial endosymbiont. Abbreviations: N—not available; P—present (number of sequences identified).

Putative Bacterial Endosymbiont	EF *	GM *	MG *
“Candidatus Xiphinematobacter”	P(28)/N/N/N	N/N/N/N	N/N/N/N
Rickettsiales	P(127)/P(74)/P(16)/P(177)	N/P(14)/N/N	P(13)/P(28)/P(13)/P(13)
“Candidatus Portiera”	N/P(509)/N/N	N/P(107)/N/N	N/P(95)/N/N

Note: * Greengenes 515–806/Greengenes full-reference/SILVA 515–806/SILVA full-reference.

Table 7. Comparison of HTS results for the “mock communities” obtained from three different commercial laboratories.

Bacterial Species Cultured in the Laboratory
Expected Taxon	The Concentration of DNA (ng/µL)	Identified Taxon Using Greengenes/SILVA Databases	The Theoretical Microbial Composition in “Mock Communities” (%)	EF (%)	GM (%)	MG (%)
Acinetobacter johnsonii, Bouvet and Grimont 1986	9.7	Acinetobacter johnsonii	2.57	7.90	7.31	5.29
Aeromonas sp., Stanier 1943 (Approved Lists 1980)	9.00	Gammaproteobacteria sp.1/Aeromonas sp.	2.39	2.03	1.59	1.47
Aliivibrio fischeri (Beijerinck 1889), Urbanczyk et al., 2007 (previously Vibrio fischeri (Beijerinck 1889), Lehmann and Neumann 1896 (Approved Lists 1980))	7.2	Vibrio sp.	1.91	0.00	0.02	0.03
Alteromonas haloplanktis (ZoBell and Upham 1944), Reichelt and Baumann 1973	8.50	Alteromonadaceae sp./Alteromonas sp.	2.25	0.02	0.01	0.01
Chryseobacterium indoltheticum (Campbell and Williams 1951), Vandamme et al., 1994	38.4	Chryseobacterium sp./Chryseobacterium indoltheticum	10.19	33.05	25.62	28.90
Enterobacter asburiae, Brenner et al., 1988	17.1	Enterobacteriaceae sp.1/Enterobacter asburiae	4.54	8.36	8.28	10.16
Erwinia billingiae, Mergaert et al., 1999	11.4	Erwinia sp./Erwinia billingiae	3.02	0.05	0.14	0.33
Klebsiella pneumoniae (Schroeter 1886), Trevisan 1887 (Approved Lists 1980)	23.10	Klebsiella sp./Klebsiella pneumoniae	6.13	0.07	0.30	0.41
Marinomonas sp., Landschoot and De Ley 1984	32.30	Marinomonas sp.	8.57	2.61	5.61	8.10
Micrococcus luteus (Schroeter 1872), Cohn 1872 (Approved Lists 1980)	9.6	Micrococcus luteus	2.55	0.28	2.04	1.13
Novosphingobium resinovorum (Delaporte and Daste 1956), Lim et al., 2007	13.3	Novosphingobium resinovorum	3.53	1.00	2.47	0.47
Ochrobactrum sp., Holmes et al., 1988	4.00	Ochrobactrum sp.	1.06	0.09	0.23	0.05
Pantoea vagans, Brady et al., 2009	12.9	Gammaproteobacteria sp.2/Pantoea vagans	3.42	4.24	3.32	3.61
Paracoccus sp., Davis 1969 (Approved Lists 1980)	6.10	Paracoccus sp.	1.62	0.12	0.47	0.13
Photobacterium sp., Beijerinck 1889 (Approved Lists 1980)	11.40	Photobacterium sp.	3.02	1.03	1.84	1.62
Pseudomonas chlororaphis (Guignard and Sauvageau 1894), Bergey et al., 1930 (Approved Lists 1980)	10.6	Pseudomonas sp./Pseudomonas chlororaphis	2.81	0.07	0.61	0.83
Psychrobacter immobilis, Juni and Heym 1986	14.5	Psychrobacter sp./Psychrobacter immobilis	3.85	13.84	12.52	10.20
Raoultella terrigena (Izard et al., 1981), Drancourt et al., 2001	19.7	Enterobacteriaceae sp.2/Raoultella terrigena	5.23	4.64	3.39	4.34
Rathayibacter caricis, Dorofeeva et al., 2002	7.50	Rathayibacter caricis	1.99	0.00	0.08	0.01
Rheinheimera aquamaris, Yoon et al., 2007	44.70	Rheinheimera sp.	11.86	2.07	2.22	2.76
Serratia liquefaciens (Grimes and Hennerty 1931), Bascomb et al., 1971 (Approved Lists 1980)	13.10	Serratia sp./Serratia liquefaciens	3.47	0.00	0.05	0.06
Shewanella baltica, Ziemke et al., 1998	16.70	Shewanella sp./Shewanella baltica	4.43	3.54	3.73	3.37
Vibrio harveyi (Johnson and Shunk 1936), Baumann et al., 1981	9	Vibrio sp.2/Vibrio harveyi	2.39	2.12	2.00	1.30
Yersinia massiliensis, Merhej et al., 2008	17.2	Yersinia sp./Yersinia massiliensis	4.56	5.54	5.67	6.01
ZymoBIOMICS™ Microbial Community DNA Standard
Expected Taxon *	The Theoretical Concentration of DNA (ng/µL)	Identified Taxon Using the SILVA/Greengenes Databases	The Theoretical Microbial Composition in “Mock Communities” (%)	EF (%)	GM (%)	MG (%)
Bacillus subtilis (Ehrenberg 1835), Cohn 1872 (Approved Lists 1980)	1.2	Bacillus sp./Bacillus subtilis	0.46	0.93	1.48	1.32
Cryptococcus neoformans (San Felice), Vuill. 1901	0.2	n/a	n/a	0.00	0.00	0.00
Enterococcus faecalis (Andrewes and Horder 1906), Schleifer and Kilpper-Bälz 1984	1.2	Enterococcus sp./Enterococcus faecalis	0.26	0.68	1.32	0.62
Escherichia coli (Migula 1895), Castellani and Chalmers 1919 (Approved Lists 1980)	1.2	Escherichia coli	0.27	0.12	0.35	0.36
Lactobacillus fermentum, Beijerinck 1901 (Approved Lists 1980)	1.2	Lactobacillus sp./Lactobacillus fermentum	0.49	0.65	1.27	0.84
Listeria monocytogenes (Murray et al., 1926), Pirie 1940 (Approved Lists 1980)	1.2	Listeria sp./Listeria monocytogenes	0.38	0.51	1.48	1.00
Pseudomonas aeruginosa (Schroeter 1872), Migula 1900 (Approved Lists 1980)	1.2	Pseudomonas sp./Pseudomonas aeruginosa	0.11	0.07	0.11	0.15
Saccharomyces cerevisiae (Desm.), Meyen 1838	0.2	n/a	n/a	0.00	0.00	0.00
Salmonella enterica (ex Kauffmann and Edwards 1952), Le Minor and Popoff 1987	1.2	Salmonella enterica	0.28	3.75	3.33	4.30
Staphylococcus aureus, Rosenbach 1884 (Approved Lists 1980)	1.2	Staphylococcus aureus	0.41	0.63	1.17	0.86

Notes: * Full prokaryotic names with authorities and years are based on the Approved Lists of Bacterial Names [59], as published in the List of Prokaryotic names with Standing in Nomenclature (LPSN) (https://lpsn.dsmz.de/, accessed on 12 October 2022) [60,61]. The full names of the fungi with authorities were obtained from the Index Fungorum webpage (http://www.indexfungorum.org/, accessed on 12 October 2022) [62].
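The theoretical percentages for the cultured strains in Table 7 appear to equal each strain's DNA concentration divided by the summed concentration of all mock-community components (367.0 ng/µL for the 24 cultured strains plus 10.0 ng/µL for the ZymoBIOMICS standard). This is an inference from the tabulated values, not a calculation stated by the authors. The sketch below reproduces it and adds a log2 ratio of observed to expected abundance as one possible bias measure; the function names are hypothetical.

```python
import math

# DNA concentrations (ng/µL) of the 24 cultured strains, in Table 7 order.
CULTURED = [9.7, 9.0, 7.2, 8.5, 38.4, 17.1, 11.4, 23.1, 32.3, 9.6, 13.3, 4.0,
            12.9, 6.1, 11.4, 10.6, 14.5, 19.7, 7.5, 44.7, 13.1, 16.7, 9.0, 17.2]
# ZymoBIOMICS standard: eight bacteria at 1.2 and two yeasts at 0.2 ng/µL.
ZYMO = [1.2] * 8 + [0.2] * 2
TOTAL = sum(CULTURED) + sum(ZYMO)   # assumed denominator, about 377 ng/µL

def theoretical_percent(concentration, total=TOTAL):
    """Expected share of a strain if reads were proportional to input DNA."""
    return 100 * concentration / total

def log2_ratio(observed, expected):
    """log2(observed/expected); None when either value is zero."""
    if observed <= 0 or expected <= 0:
        return None
    return math.log2(observed / expected)

# (strain, concentration, EF observed %) taken from Table 7.
examples = [("Acinetobacter johnsonii", 9.7, 7.90),
            ("Chryseobacterium indoltheticum", 38.4, 33.05),
            ("Rheinheimera aquamaris", 44.7, 2.07),
            ("Serratia liquefaciens", 13.1, 0.00)]
for strain, conc, observed in examples:
    expected = theoretical_percent(conc)
    print(strain, round(expected, 2), log2_ratio(observed, expected))
```

A positive log2 ratio marks a strain that is over-represented relative to its DNA input and a negative one an under-represented strain. Differences in 16S rRNA gene copy number and genome size between strains are not accounted for in this simple proportion.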

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

