Recovery of complete genomes from metagenomes

This project contains scripts and tutorials on how to assemble individual microbial genomes from metagenomes, as described in:

Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes

Mads Albertsen, Philip Hugenholtz, Adam Skarshewski, Gene W. Tyson, Kåre L. Nielsen and Per H. Nielsen

Nature Biotechnology 2013, doi: 10.1038/nbt.2579

What is differential coverage binning?

Differential coverage binning uses the abundance of the bacteria in the samples as the primary signal for extracting genomes from metagenomes. Many other binning methods instead rely on sequence composition (e.g. tetranucleotide patterns), which is hampered by local sequence deviations within genomes.

The great advantage of using abundance is that estimates from multiple samples can be combined, which greatly increases the binning resolution. Given that sequencing prices continue to drop, it is already cheaper to generate data than to analyse it.
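
As a rough illustration of why a second sample helps, here is a minimal Python sketch. It is not taken from the project's scripts (the guide itself works in R), and the `Contig` class, the `select_bin` helper and all coverage values are hypothetical. It assumes each contig already has a mean read coverage in two samples, A and B, and selects contigs that fall inside a rectangle in that two-dimensional coverage space.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    name: str
    cov_a: float  # mean read coverage in sample A (made-up value)
    cov_b: float  # mean read coverage in sample B (made-up value)


def select_bin(contigs, a_range, b_range):
    """Return contigs whose coverage lies inside a rectangle in (A, B) space."""
    a_lo, a_hi = a_range
    b_lo, b_hi = b_range
    return [
        c for c in contigs
        if a_lo <= c.cov_a <= a_hi and b_lo <= c.cov_b <= b_hi
    ]


# Hypothetical contigs: c1 and c2 share a coverage profile, c3 and c4 do not.
contigs = [
    Contig("c1", 100.0, 10.0),
    Contig("c2", 95.0, 12.0),
    Contig("c3", 98.0, 80.0),
    Contig("c4", 5.0, 11.0),
]

# With sample A alone, c1, c2 and c3 all sit at roughly 100x coverage.
single = [c.name for c in contigs if 90 <= c.cov_a <= 110]

# Adding sample B separates c3, which is abundant in both samples.
binned = [c.name for c in select_bin(contigs, (90, 110), (5, 20))]

print(single)
print(binned)
```

In this toy data the single-sample filter cannot tell c3 apart from c1 and c2, while the two-sample rectangle keeps only c1 and c2. Real datasets need a less rigid selection than a hand-picked rectangle, so treat this only as a picture of the principle.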

Step-by-step guide

The guide covers all aspects, from which samples can be used to binning, finishing and validation of the extracted genomes. RStudio (a powerful IDE for R) is used as the main tool for data handling because it allows integration of all relevant data, which is key when dealing with large and complex datasets. The guide to binning in R is also available in R Markdown format here, which allows direct recreation of all figures in the guide using the original data from the paper.

The overview section gives a short description of the workflow, along with a detailed workflow figure that summarises the steps involved in assembling individual genomes from metagenomes.

In addition to the online guide, a PDF version of the original published guide is available here.

Data availability

If you want to reanalyze the data used in this study, the raw fastq reads can be obtained from NCBI SRA: SRX206471 (HP+) and SRX247688 (HP-). The assembled contigs can be obtained from NCBI GenBank under accession number APMI01000000.

In addition, all processed data ready for binning in R can be obtained here.