FISABIO Bioinformatics Service - Comparative Taxonomy Pipeline (V4)

Home Introduction Samples summary Taxonomy tables Barplots Krona Alpha diversity
Rarefaction Beta diversity Clustering CCA NMDS Analysis of Variance Paired tests
Methods References          

Covernote: This demo shows the functioning and outputs of Comparative Taxonomy pipeline (Downstream Analysis) from FISABIO Sequencing and Bioinformatics service. Data used belong to already published work and are used only for demonstrative purpose.

Project basename: DemoTaxonomyUpstream

Project advisor email: seqserbioinfo_fisabio@gva.es

Project advisor phone: +34 961 925929


Introduction

The aim of this document is to drive you through the reading of the analysis. The basic idea of the manuscript is to provide a user-friendly approach to 16S rDNA gene microbial analysis.

By clicking on hyperlinks it is possible to navigate among the table of contents, figures, tables and bibliographic references. All statistics have been obtained using The RStatistics software Rstatistics, making use of several Open Source libraries such as gdata, vegan, etc.

All figures can be found in the folder figures and subfolders. All tables are stored in the folder tables_and_biomes. Each file name self-explains the data. Figures are all in PDF (Portable Document Format) format and can be visualized by standard PDF viewer Acrobat or Evince. PDF allows to resize figures without loosing/altering the resolution. This format is easy to be exported or converted in other formats. Moreover you will find also low resolution images in PNG (Portable Network Graphic) format.

Data filtering

  • Main data reported in contingency tables or KRONA diversity representation have been obtained considering the whole dataset including singletons or and under-represented taxa.
  • All statistical representations (with exception of barplots) have been obtained reducing the dataset removing those taxa which were represented by less than 3 sequence.
  • All statistics have been carried filtering out unidentified ud-Archaea and ud-Bacteria which collect reads with Phylum scores <= 0.8. Reads belonging to these category are generally not related to 16S rDNA gene and assigned to the higher taxonomic rank with very low score values.

Tables are formatted by tabs or commas. All tables can be easily imported in almost all spreadsheet softwares.

Some of the files are stored in gzip format, a common Unix compression algorithm and can be unzipped by the command:

 # only unix systems # zipping by: gzip file

# unzipping: gunzip file.gz

Another kind of file which include in just one element folders and subfolders is labelled by the extension tar.gz which means that all the elements are stored in just one file and could be decompressed using the command:

 # only unix systems tar xvzf file.tar.gz

In Microsoft Windows systems, standard decompressing tools should be able to access these kinds of files.

We hope you find this report of help and for any suggestions or bug notification, please contact us by e-mail to seqserbioinfo_fisabio@gva.es.

By clicking on hyperlinks it is possible to navigate among the table of contents, figures, tables and bibliographic references.

For disambiguation of terminology we suggest the read of Marchesi et al. (2015).

For those analyses based on PCR reaction, we also suggest the reading of Gonzalez et al. (2012).


Main data summary

Hereafter some main project data and metadata used for the analysis.

Project base name: DemoV4

Project advisor: FISABIO Sequencing and Bioinformatics Service

Project advisor email: seqserbioinfo_fisabio@gva.es

Numeric factor(s) used:

Categorical factor(s) used: CMD.Type, CMD.Fraction, CMD.SampleID

Following table summarize the number of reads considered for each sample and its taxon distribution.

Samples summary table
Sample NReads MeanLength SDLength
AE-1 1108 490.00 95.10
AE-2 2782 481.18 112.18
AE-3 1343 480.93 112.75
AE-4 2952 474.37 117.16
AQ-1 1349 470.96 127.20
AQ-2 697 476.42 120.84
AQ-3 1172 479.72 107.16
AQ-4 1993 449.00 143.65
AW-1 3135 492.78 96.58
AW-2 2148 495.07 83.96
AW-3 1948 490.04 92.34
AW-4 3241 474.81 118.54
BR-1 2084 490.57 99.12
BR-2 3321 482.58 112.46
BR-3 57 342.93 196.08
BR-4 1837 410.74 173.01
BT-1 830 489.67 100.21
BT-2 3727 493.28 94.24
BT-3 60 353.77 196.29
BT-4 1636 454.02 140.68

Taxonomy reads distribution tables

This section provides taxonomy tables calculated at various taxonomic ranks: as Phylum, Class, Order, Family, Genus and Species.

NB: Species taxonomic rank is provided only if greengen database is chosen.

Only reads longer than 50nt have been considered.

Contingency and proportion tables are stored in tables and biomes main folder.

Phylum contingency table
Taxon AE.1 AE.2 AE.3 AE.4 AQ.1 AQ.2 AQ.3 AQ.4 AW.1 AW.2 AW.3 AW.4 BR.1 BR.2 BR.3 BR.4 BT.1 ...
Archaea;ud-Archaea 3 18 1 5 4 0 1 12 6 7 1 12 3 7 0 17 1 1
Bacteria;Actinobacteria 6 138 26 34 8 34 3 22 12 91 19 26 1 1 0 27 0 ...
Bacteria;Bacteroidetes 55 229 16 54 110 37 10 17 262 81 21 45 841 182 0 132 367 ...
Bacteria;Cyanobacteria/Chloroplast 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
Bacteria;Deinococcus-Thermus 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 ...
Bacteria;Firmicutes 997 2058 1202 2710 1139 469 1104 1750 2751 1769 1834 2978 1122 2913 37 1372 434 ...
Bacteria;Fusobacteria 0 18 0 0 0 2 0 0 1 6 0 0 0 0 0 0 0 ...
Bacteria;Lentisphaerae 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 ...
Bacteria;Proteobacteria 19 161 4 5 10 109 1 4 11 115 1 10 63 20 0 2 7 ...
Bacteria;Synergistetes 0 0 0 0 0 0 0 0 0 0 0 0 0 13 0 0 0 ...
Bacteria;ud-Bacteria 26 153 94 144 78 45 53 188 91 73 72 170 54 184 20 287 21 ...
Bacteria;Verrucomicrobia 0 7 0 0 0td> 1 0 0 0 6 0 0 0 0 0 0 0 ...

 


NB: Family and Genus contingency tables are similarly provided but omitted in this demo.


Phylum proportion table
Taxon AE.1 AE.2 AE.3 AE.4 AQ.1 AQ.2 AQ.3 AQ.4 AW.1 AW.2 AW.3 AW.4 BR.1 BR.2 ...
Archaea;ud-Archaea 0.271 0.647 0.074 0.169 0.297 0.000 0.085 0.602 0.191 0.326 0.051 0.370 0.144 0.211 ...
Bacteria;Actinobacteria 0.542 4.960 1.936 1.152 0.593 4.878 0.256 1.104 0.383 4.236 0.975 0.802 0.048 0.030 ...
Bacteria;Bacteroidetes 4.964 8.231 1.191 1.829 8.154 5.308 0.853 0.853 8.357 3.771 1.078 1.388 40.355 5.480 ...
Bacteria;Cyanobacteria/Chloroplast 0.181 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 ...
Bacteria;Deinococcus-Thermus 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.032 0.000 0.000 0.000 0.000 0.000 ...
Bacteria;Firmicutes 89.982 73.976 89.501 91.802 84.433 67.288 94.198 87.807 87.751 82.356 94.148 91.885 53.839 87.715 ...
Bacteria;Fusobacteria 0.000 0.647 0.000 0.000 0.000 0.287 0.000 0.000 0.032 0.279 0.000 0.000 0.000 0.000 ...
Bacteria;Lentisphaerae 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.030 ...
Bacteria;Proteobacteria 1.715 5.787 0.298 0.169 0.741 15.638 0.085 0.201 0.351 5.354 0.051 0.309 3.023 0.602 ...
Bacteria;Synergistetes 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.391 ...
Bacteria;ud-Bacteria 2.347 5.500 6.999 4.878 5.782 6.456 4.522 9.433 2.903 3.399 3.696 5.245 2.591 5.540 ...
Bacteria;Verrucomicrobia 0.000 0.252 0.000 0.000 0.000 0.143 0.000 0.000 0.000 0.279 0.000 0.000 0.000 0.000 ...

NB: Family and Genus proportion tables are similarly provided but omitted in this demo.


Barplots

In this section accumulative barplots of reads distributions are shown. Columns are ordered by metadata and samples. In such a way it is possible to visualize different profiles according to the factor used for ordering columns.

Only taxa represented more than 1% are reported.

Factor Barplot
CMD.Type

CMD.Fraction
CMD.SampleID


KronaViewer representation

The following link open a web page where all diversity is summarised using KRONA tool. Taxonomic annotation tables are parsed to this tool and an interactive viewer will show taxa distributions by samples. The tool allows browsing and zooming within each taxonomic rank. Images could be exported by clicking on Snapshot button for further uses. If you use these images, please cite Ondov, et al. (2011).


Alpha diversity

This section describes diversity indexes characterizing every sample at Phylum, Family and Genus taxonomy ranks levels. All tables can be found in the Diversity folder. Columns report for each sample:

  • Diversity estimators
  • Shannon Index
  • Simpson Index
  • invSimpson
  • fisherAlpha
  • Richness estimators
  • Number of observations
  • Chao1
  • Chao1standard error
  • ACE
  • ACE standard error
Alpha diversity at Phylum level
  Shannon Simpson invSimpson fisherAlpha OBS CHAO1 CHAO1.SE ACE ACE.SE
AE-1 0.46 0.19 1.23 1.00 7 7 0.00 7.00 1.31
AE-2 0.98 0.44 1.78 1.01 8 8 0.00 8.00 0.94
AE-3 0.44 0.19 1.24 0.81 6 6 0.46 7.12 0.98
AE-4 0.37 0.15 1.18 0.72 6 6 0.00 6.00 1.15
AQ-1 0.60 0.28 1.38 0.81 6 6 0.00 6.00 1.22
AQ-2 1.06 0.51 2.06 1.08 7 7 0.23 8.00 0.97
AQ-3 0.26 0.11 1.12 0.83 6 7 2.22 9.18 1.72
AQ-4 0.47 0.22 1.28 0.76 6 6 0.00 6.00 0.91
AW-1 0.48 0.22 1.29 0.99 8 9 2.25 12.05 1.34
AW-2 0.74 0.31 1.46 1.05 8 8 0.00 8.00 1.37
AW-3 0.28 0.11 1.13 0.77 6 7 2.22 NA NA
AW-4 0.37 0.15 1.18 0.71 6 6 0.00 6.00 0.91
BR-1 0.91 0.55 2.20 0.76 6 6 0.46 7.11 0.99
BR-2 0.50 0.22 1.29 0.98 8 9 2.25 12.07 1.45
BR-3 0.65 0.46 1.84 0.40 2 2 0.00 NA NA
BR-4 0.81 0.41 1.70 0.77 6 6 0.00 6.00 0.91
BT-1 0.84 0.53 2.13 0.71 5 5 0.45 6.10 1.12
BT-2 0.48 0.22 1.29 0.83 7 7 0.23 8.11 1.40
BT-3 0.71 0.41 1.70 0.66 3 3 0.00 3.00 0.82
BT-4 0.66 0.31 1.46 0.79 6 6 0.00 6.00 1.15

NB: Family and Genus Alpha diversity tables are similarly provided but omitted in this demo.


Rarefaction analysis

Rarefaction analysis permits estimating the overall diversity covered by the obtained squences. Next picture shows the Rarefaction curves calculated for each sample.

Rarefaction calculated at Family level

General rarefaction graphs, The figures represent the number of sequences (x axis) versus the number of Families (y axis).

Global rarefaction curves

Phylum Family Genus

Rarefaction curves grouped by metadata.

group Phylum Family Genus
CMD.Type-A
CMD.Type-B

etc…


Beta Diversity

Jaccard and Sorensen indexes.

Phylum

Jaccard and Sorensen distances boxplot for each group of samples. Below the figures distance matrices and paired statistics tables are also provided.

  IndexesBoxplots
Phylum-CMD.Type
Phylum-CMD.Fraction
Phylum-CMD.SampleID

 

Following tables shows paired statistics tests. Only associations with p-values <=0.05 are reported.

  metadata level1 level2 index mean1 mean2 wilcoxon.pr
1 CMD.Type A B Sorensen 0.8059905 0.5813109 0.0000084
14 CMD.SampleID 1 4 Sorensen 0.6588802 0.8000000 0.0002521
16 CMD.SampleID 2 4 Sorensen 0.6472727 0.8000000 0.0032936
17 CMD.SampleID 3 4 Sorensen 0.5120000 0.8000000 0.0096629
18 CMD.Type A B Jaccard 0.7318783 0.4838542 0.0000084
31 CMD.SampleID 1 4 Jaccard 0.5683810 0.8000000 0.0002521
33 CMD.SampleID 2 4 Jaccard 0.5638095 0.8000000 0.0032936
34 CMD.SampleID 3 4 Jaccard 0.4400000 0.8000000 0.0096629

NB: Family and Genus Beta diversity data are similarly provided but omitted in this demo.


Clustering

In this section various cluster analyses are reported. A general clustering involving all samples initially reported. Clustering for each sample belonging to a group of metadata indicated by the user through the metadata file by CMD_Type, CMD_Fraction, CMD_SampleID categorical factors is carried out and figures are stored in the figures folder autodefined by the proper figure names.

NB: When running through metadata factors, if the number of samples is less than 3 clustering can not be calculated and will be skipped.

All clusterings have been obtained using distance matrices for families and genera. 1000 bootstraps have been applied to perform cluster statistics.

The dendrograms reports AU and BP scores (p-vlaues). AU p-value (printed in red color in default) is the abbreviation of "approximately unbiased" p-value, which is calculated by multiscale bootstrap resampling. BP value (printed in green color in default) is "bootstrap probability" value, which is less accurate than AU value as p-value. One can consider that clusters (edges) with high AU values (e.g. 95%) are strongly supported by data .

Red frames reports approximatively unbiased p-values greather than 95.

Phylum Families Genera

Clustering by factors

Following pictures show again clustering based on factors samples subgroups.

Goup Phylum Family Genus
CMD.TypeA
CMD.TypeB

etc…


NB: Family and Genus Clustering plots are similarly provided but omitted in this demo.


CCA

Canonical Correspondence Analysis

This section provides Canonical Correspondence Analysis (CCA) explaining taxonomical data trough environmental factors. The functions are based on Legendre algorithm (Legendre, et al., 1999): in CCA Chi-square transformed data matrix is submitted to weighted linear regression on constraining variables, and the fitted values are submitted to correspondence analysis performed via singular value decomposition (svd).

Global CCA

Phylum Family Genus

CCA by factors

Following pictures show again CCA considering also factors-based ordination.

Factor

Phylum

Family Genus
CMD.Type
CMD.Fraction

etc...


NMDS

Non Metric Multidimensional Scaling is applied as ordination method to reduce dimensions of distance matrices of samples or groups of samples. Distance matrices have been obtained using Bray-Curtis dissimilarity.

Factor Phylum Family Genus
CMD.Type
CMD.Fraction
CMD.SampleID

Analysis of variance

The analysis of variance is performed using the adonis function (Permutational Multivariate Analysis of Variance Using Distance Matrices) from the R vegdist package which performs partitioning sums of squares using semi-metric and metric distance matrices. We have applied ‘Bray-Curtis' distance. This analysis is analogous to MANOVA (multivariate analysis of variance). M.J. Anderson refers to the method as ‘permutational MANOVA' (formerly ‘nonparametric MANOVA') (Anderson, 2001). Further, as its inputs are linear predictors, and a response matrix of an arbitrary number of columns (2 to millions), it is a robust alternative to both parametric MANOVA and to ordination methods for describing how variation is attributed to different experimental treatments or uncontrolled covariates. It is also analogous to redundancy analysis (Legendre, et al., 1999).

Phylum

ADONIS analysis carried out at Phylum level
  Df SumsOfSqs MeanSqs F.Model R2 Pr..F.
CMD.Type 1 0.0514 0.0514 2.7380 0.1320 0.0889
Residuals 18 0.3378 0.0188 NA 0.8680 NA
Total 19 0.3892 NA NA 1.0000 NA
CMD.Fraction 4 0.0531 0.0133 0.5929 0.1365 0.8272
Residuals1 15 0.3361 0.0224 NA 0.8635 NA
Total1 19 0.3892 NA NA 1.0000 NA
CMD.SampleID 3 0.1780 0.0593 4.4941 0.4573 0.0020
Residuals2 16 0.2112 0.0132 NA 0.5427 NA
Total2 19 0.3892 NA NA 1.0000 NA

NB: Family and Genus ADONIS tables are similarly provided but omitted in this demo.

Paired tests

Following  tables show wilcoxon test between groups for each factors. Only associations with p-values <= 0.05 are reported.

Phylum

  metadata level1 level2 taxa mean1 mean2 wilcoxon.pr
1 CMD.Type A B Bacteria;Actinobacteria 1.9281547 0.4856277 0.0149354
99 CMD.SampleID 1 3 Bacteria;Bacteroidetes 21.8628023 0.6593011 0.0119252
102 CMD.SampleID 1 3 Bacteria;Firmicutes 76.3832742 98.5750400 0.0119252
104 CMD.SampleID 1 3 Bacteria;Proteobacteria 1.3773228 0.0926583 0.0119252
105 CMD.SampleID 1 4 Bacteria;Actinobacteria 0.3263628 1.3743634 0.0121858
106 CMD.SampleID 1 4 Bacteria;Bacteroidetes 21.8628023 4.0612483 0.0367139
111 CMD.SampleID 1 4 Bacteria;Proteobacteria 1.3773228 0.2391984 0.0121858
113 CMD.SampleID 2 3 Bacteria;Bacteroidetes 6.5713870 0.6593011 0.0119252
114 CMD.SampleID 2 3 Bacteria;Firmicutes 83.9865679 98.5750400 0.0119252
117 CMD.SampleID 2 3 Bacteria;Proteobacteria 5.9110708 0.0926583 0.0119252
125 CMD.SampleID 2 4 Bacteria;Proteobacteria 5.9110708 0.2391984 0.0121858
129 CMD.SampleID 3 4 Bacteria;Bacteroidetes 0.6593011 4.0612483 0.0361451

NB: Family and Genus Paired tests tables are similarly provided but omitted in this demo.

Other methods

Library preparation

16S rDNA gene amplicons were amplified following the 16S rDNA gene Metagenomic Sequencing Library Preparation Illumina protocol. The gene-specific sequences used in this protocol target the 16S rDNA gene V3-V4 region. Illumina adapter overhang nucleotide sequences are added to the gene-specific sequences. The primers are selected from the Klindworth et al. publication (Klindworth, et al., 2013). The full length primer sequences, using standard IUPAC nucleotide nomenclature, to follow the protocol targeting this region are:

16S rDNA gene Amplicon PCR Forward Primer = 5'
TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGNGGCWGCAG

16S rDNA gene Amplicon PCR Reverse Primer = 5'
GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACTACHVGGGTATCTAATCC

We used microbial genomic DNA (5 ng/ul in 10 mM Tris pH 8.5) to initiate the protocol. After 16S rDNA gene amplification, the mutiplexing step was performed using Nextera XT Index Kit (FC-131-1096). We run 1 ul of the PCR product on a Bioanalyzer DNA 1000 chip to verify the size, the expected size on a Bioanalyzer trace is 550bp. After size verification the libraries were sequenced using a 2x300pb paired-end run (MiSeq Reagent kit v3 (MS-102-3001)) on a MiSeq Sequencer according to manufacturer's instructions (Illumina).


Quality assessment

Quality assessment was performed by the use of prinseq-lite program (Schmieder, et al., 2011) applying following parameters:

  • min_length: 50
  • trim_qual_right: 30
  • trim_qual_type: mean
  • trim_qual_window: 10

R1 and R2 from Illumina sequencing where joined using fastq-join from ea-tools suite (Aronesty, 2011)


Bioinformatic analysis

Data have been obtained using an ad-hoc pipeline written in R Rstatistics language (R Core Team, 2012), making use of several Open Source libraries such as gdata, vegan, etc. Data have been grouped and stratified according to the metadata file provided by the user.

All sequences have been maintained in one table. For each read, sequence length, taxonomic annotation, cluster of membership and user defined metadata, have been compiled.


Taxonomic annotation

Taxonomic affiliations have been assigned using the RDP_classifier from the Ribosomal Database Project (Cole, et al., 2009). All data have been tabulated. Reads with RDP score value below 0.8 have been assigned to the upper taxonomic rank leaving last rank as unidentified (e.g. Firmicutes, Bacillales, Bacillaceae, Unidentified Bacillaceae).


R packages

The pipeline core runs by R Rstatistics language (R Core Team, 2012). All calculation and statistics have been carried out within this environment. Here a list of the used packages and their references.

  • lattice: graphics for R (Sarkar, 2008).
  • knitr, knitcitations, markdown: report and reference environment (Xie, 2014; Boettiger, 2014; Allaire, et al., 2014).
  • Bioconductor packages for genomics data: Biostrings.

References

Allaire J, et al. markdown: Markdown rendering for R. R package version 0.7.4. 2014. URL: http://CRAN.R-project.org/package=markdown.

M. J. Anderson. "A new method for non-parametric multivariate analysis of variance". In: Austral Ecology 26.1 (2001), pp. 32–46. ISSN: 14429985. DOI: 10.1046/j.1442-9993.2001.01070.x. URL: http://doi.wiley.com/10.1046/j.1442-9993.2001.01070.x.

E. Aronesty. ea-utils: Command-line tools for processing biological sequencing data. ea-utils: FASTQ processing utilities. 2011. URL: http://code.google.com/p/ea-utils.

C. Boettiger. knitcitations: Citations for knitr markdown files. R package version 1.0.5. 2014. URL: http://CRAN.R-project.org/package=knitcitations.

Cole JR, Wang Q, Cardenas E, Fish J, Chai B, Farris RJ, Kulam-Syed-Mohideen AS, McGarrell DM, Marsh T, Garrity GM, Tiedje JM. The Ribosomal Database Project: improved alignments and new tools for rRNA analysis. Nucleic Acids Res. 2009 Jan;37(Database issue):D141-5. doi: 10.1093/nar/gkn879. Epub 2008 Nov 12. [PubMed Central:PMC2686447] [DOI:10.1093/nar/gkn879] [PubMed:19004872] , pp. D141–145.

Klindworth A, Pruesse E, Schweer T, Peplies J, Quast C, Horn M, Glöckner FO. Evaluation of general 16S ribosomal RNA gene PCR primers for classical and next-generation sequencing-based diversity studies. Nucleic Acids Res. 2013 Jan 7;41(1):e1. doi: 10.1093/nar/gks808. Epub 2012 Aug 28 [PubMed Central:PMC3592464] [DOI:10.1093/nar/gks808] [PubMed:22933715] , p. e1.

Legendre, P, et al. (1999) "DISTANCE-BASED REDUNDANCY ANALYSIS : TESTING MULTISPECIES RESPONSES IN MULTIFACTORIAL ECOLOGICAL EXPERIMENTS". In: Ecological Monographs 69.1, pp. 1–24. ISSN: 00129615. DOI: 10.2307/2657192. URL: http://www.jstor.org/stable/2657192?origin=crossref.

Ondov BD, Bergman NH, Phillippy AM. Interactive metagenomic visualization in a Web browser. BMC Bioinformatics. 2011 Sep 30;12:385. doi: 10.1186/1471-2105-12-385. [PubMed Central:PMC4520061] [DOI:10.1186/s40168-015-0094-5] [PubMed:26229597] 12:385.

Marchesi JR, Ravel J. The vocabulary of microbiome research: a proposal.Microbiome. 2015 Jul 30;3:31. [PubMed Central:PMC4520061] [DOI:10.1186/s40168-015-0094-5] [PubMed:26229597] , p. 31.

R Core Team. R: A Language and Environment for Statistical Computing. ISBN 3-900051-07-0. R Foundation for Statistical Computing. Vienna, Austria, 2012. URL: http://www.R-project.org/.

Sarkar D. Lattice: Multivariate Data Visualization with R. ISBN 978-0-387-75968-5. New York: Springer, 2008. URL: http://lmdvr.r-forge.r-project.org.

Schmieder R, Edwards R. Quality control and preprocessing of metagenomic datasets. Bioinformatics. 2011 Mar 15;27(6):863-4. . [PubMed Central:PMC3051327] [DOI:10.1093/bioinformatics/btr026] [PubMed:21278185] , pp. 863–864.

Xie Y. Dynamic Documents with R and knitr. ISBN 978-1482203530. Chapman and Hall/CRC, 2014. URL: http://yihui.name/knitr/.