This report documents the binning of a comammox Nitrospira genome from an enrichment reactor (ENR6). See Daims et al., 2015: Complete Nitrification by Nitrospira Bacteria for further details.
The metagenome data are analysed using the mmgenome R package.
library("mmgenome")
The R Markdown file Load_data.Rmd describes how the data are loaded and imported with the mmimport function. However, the preprocessed data can also be downloaded directly from figshare (Daims_ENR6), so here we load the preprocessed data from figshare instead.
load("Daims_ENR6.RData")
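load() restores every object stored in an .RData file into the current environment and invisibly returns their names, so wrapping the call in print() is a quick way to check what a download such as Daims_ENR6.RData provides. The sketch below round-trips two toy objects through a temporary file; the object contents are hypothetical stand-ins, not the real figshare data.

```r
# Hypothetical stand-ins for objects a preprocessed .RData file might hold
d <- list(scaffolds = data.frame(scaffold = c("s1", "s2"),
                                 length = c(1500, 3200),
                                 stringsAsFactors = FALSE))
assembly <- c(s1 = "ACGTACGT", s2 = "GGCCAATT")

tmp <- tempfile(fileext = ".RData")
save(d, assembly, file = tmp)   # write both objects to one file
rm(d, assembly)                 # remove them from the workspace

env <- new.env()
loaded <- load(tmp, envir = env)  # load() returns the names of the restored objects
print(loaded)
str(env$d$scaffolds)
```

Loading into a separate environment, as here, avoids silently overwriting objects of the same name in the workspace; a plain load("file.RData") restores them into the global environment instead.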
The object d contains information on the scaffolds and on the essential genes found within them. For each scaffold, the dataset contains the following information: scaffold, length and gc hold the scaffold name, its length and its GC content; the columns ENR4A, ENR4E, ENR4F and ENR6 contain the coverage in four different samples; PC1, PC2 and PC3 contain the coordinates on the first three principal components of a PCA of tetranucleotide frequencies; essential contains taxonomic information for each scaffold, based on the classification of its essential genes.
colnames(d$scaffolds)
## [1] "scaffold" "length" "gc" "ENR4A" "ENR4E"
## [6] "ENR4F" "ENR6" "PC1" "PC2" "PC3"
## [11] "essential"
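To make the PC1 to PC3 columns concrete, here is a minimal base-R sketch of the general idea: count overlapping tetranucleotides per scaffold, convert the counts to relative frequencies, and run a PCA on the resulting scaffolds-by-256 matrix. The random sequences are hypothetical, and this is not necessarily the procedure in Load_data.Rmd (for example, many tools merge reverse-complement 4-mers into 136 canonical ones, which this sketch does not do).

```r
# All 256 possible tetranucleotides, in a fixed order
bases <- c("A", "C", "G", "T")
kmers <- apply(expand.grid(bases, bases, bases, bases, stringsAsFactors = FALSE),
               1, paste, collapse = "")

# Relative frequency of each overlapping 4-mer in one sequence
tetra_freq <- function(s) {
  n <- nchar(s) - 3
  words <- substring(s, 1:n, 4:(n + 3))
  counts <- table(factor(words, levels = kmers))  # 4-mers containing N etc. become NA and are dropped
  as.numeric(counts) / n
}

# Hypothetical input: 20 random 2 kb "scaffolds"
set.seed(1)
seqs <- replicate(20, paste(sample(bases, 2000, replace = TRUE), collapse = ""))

tf  <- t(vapply(seqs, tetra_freq, numeric(256), USE.NAMES = FALSE))  # 20 x 256 matrix
pca <- prcomp(tf)        # centred, unscaled PCA
pcs <- pca$x[, 1:3]      # coordinates on PC1, PC2 and PC3
head(pcs)
```

Scaffolds from the same genome tend to share a tetranucleotide signature, so they cluster together in this space, which complements the coverage-based separation used below.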
The basic statistics of the full dataset can be summarised using the mmstats function.
mmstats(d, ncov = 4)
## General Stats
## n.scaffolds 112.00
## GC.mean 63.00
## N50 176895.00
## Length.total 6806891.00
## Length.max 1306334.00
## Length.mean 60775.80
## Coverage.ENR4A 291.73
## Coverage.ENR4E 255.06
## Coverage.ENR4F 469.64
## Coverage.ENR6 283.26
## Ess.total 214.00
## Ess.unique 106.00
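Two of these statistics benefit from a definition. N50 is the length L such that scaffolds of length at least L together contain at least half of the total assembly length. Ess.total and Ess.unique are read here, as an assumption, as the number of essential-gene hits and the number of distinct essential genes. The base-R sketch below computes both on hypothetical inputs; mmstats may implement them differently.

```r
# N50: the length L such that scaffolds of length >= L hold at least half the total length
n50 <- function(lengths) {
  l <- sort(lengths, decreasing = TRUE)
  l[which(cumsum(l) >= sum(l) / 2)[1]]
}

lengths <- c(100, 200, 300, 400)   # hypothetical scaffold lengths (bp)
n50(lengths)                       # 400 + 300 = 700 >= 500, so N50 = 300

# Hypothetical essential-gene hits, one entry per hit (gene IDs are made up)
ess_hits <- c("gene_A", "gene_B", "gene_B", "gene_C")
ess_total  <- length(ess_hits)          # every hit counts
ess_unique <- length(unique(ess_hits))  # each distinct gene counts once
c(total = ess_total, unique = ess_unique)
```

Under this reading, the full dataset's 214 hits against 106 distinct genes means most essential genes occur about twice, as expected for a sample containing two genomes.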
Together, the coverage in samples ENR4A and ENR4E gives the cleanest separation of the two genomes, so these two samples are used for binning. A subspace is defined that clearly separates the Nitrospira scaffolds from those of the Betaproteobacteria.
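p <- mmplot(data = d, x = "ENR4A", y = "ENR4E", log.x = TRUE, log.y = TRUE, color = "essential")
# p  # uncomment to display the plot
# The polygon vertices below were picked interactively with mmplot_locator(p)
# and are hard-coded here so that the selection is reproducible.
# sel <- mmplot_locator(p)
sel <- data.frame(ENR4A = c(322, 394, 919, 805, 394),
                  ENR4E = c(296, 542, 525, 256, 200))
mmplot_selection(p, sel)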
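Deciding whether a scaffold lies inside the selected subspace is a point-in-polygon test. The sketch below implements the standard even-odd (ray casting) rule in base R using the polygon from this report and three hypothetical scaffolds. It is an illustration of the idea, not mmgenome's own code; in particular, whether mmextract tests in linear or log space is not stated here, so both variants are shown (edges drawn on the log-scaled plot are straight in log space).

```r
# Polygon from the report (vertices in raw coverage units)
sel <- data.frame(ENR4A = c(322, 394, 919, 805, 394),
                  ENR4E = c(296, 542, 525, 256, 200))

# Even-odd rule: cast a ray to the right of each point and count how many
# polygon edges it crosses; an odd count means the point is inside.
point_in_polygon <- function(x, y, px, py) {
  inside <- rep(FALSE, length(x))
  j <- length(px)                      # previous vertex, closing the polygon
  for (i in seq_along(px)) {
    straddles <- (py[i] > y) != (py[j] > y)
    x_cross <- (px[j] - px[i]) * (y - py[i]) / (py[j] - py[i]) + px[i]
    inside <- xor(inside, straddles & x < x_cross)
    j <- i
  }
  inside
}

# Hypothetical scaffold coverages
cov <- data.frame(ENR4A = c(600, 100, 1500), ENR4E = c(400, 100, 400))
in_lin <- point_in_polygon(cov$ENR4A, cov$ENR4E, sel$ENR4A, sel$ENR4E)
# Same test in log10 space, matching the log-scaled plot axes
in_log <- point_in_polygon(log10(cov$ENR4A), log10(cov$ENR4E),
                           log10(sel$ENR4A), log10(sel$ENR4E))
in_lin   # TRUE FALSE FALSE
```

For scaffolds far from the polygon boundary, as here, both variants agree; they can differ only for points lying close to an edge.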
The scaffolds included in the defined subspace are extracted using the mmextract function.
dA <- mmextract(d, sel)
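Because d holds both a scaffold table and essential-gene information, an extraction has to subset both consistently. The sketch below shows one way to do that on a hypothetical miniature object; the list layout, the hmm.id column and the rectangular coverage window (standing in for the polygon test) are all assumptions for illustration, not the structure or logic of mmextract itself.

```r
# Hypothetical miniature of a binning object: a scaffold table plus an
# essential-gene table keyed by scaffold name
toy <- list(
  scaffolds = data.frame(scaffold = c("s1", "s2", "s3"),
                         ENR4A = c(600, 100, 450),
                         ENR4E = c(400, 100, 300),
                         stringsAsFactors = FALSE),
  essential = data.frame(scaffold = c("s1", "s1", "s2", "s3"),
                         hmm.id = c("gene_A", "gene_B", "gene_C", "gene_D"),
                         stringsAsFactors = FALSE)
)

# Stand-in for the polygon test: a simple coverage window (an assumption)
keep <- with(toy$scaffolds, ENR4A > 300 & ENR4E > 200)

# Keep the selected scaffolds and only the essential genes located on them
subset_bin <- function(obj, keep) {
  kept <- obj$scaffolds$scaffold[keep]
  list(scaffolds = obj$scaffolds[keep, , drop = FALSE],
       essential = obj$essential[obj$essential$scaffold %in% kept, , drop = FALSE])
}

bin <- subset_bin(toy, keep)
bin$scaffolds$scaffold   # "s1" "s3"
nrow(bin$essential)      # 3
```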
The mmstats function works on any extracted object, so it can be applied directly to the subset.
mmstats(dA, ncov = 4)
## General Stats
## n.scaffolds 11.00
## GC.mean 59.20
## N50 558735.00
## Length.total 3291638.00
## Length.max 1306334.00
## Length.mean 299239.80
## Coverage.ENR4A 468.69
## Coverage.ENR4E 347.57
## Coverage.ENR4F 645.88
## Coverage.ENR6 341.64
## Ess.total 108.00
## Ess.unique 105.00
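A quick comparison of the two mmstats tables helps judge the bin. The values below are copied from the outputs above: the 11 extracted scaffolds hold roughly 48% of all assembled bases, and only 3 of the 108 essential-gene hits go beyond the first copy of a gene. Treating that small surplus as a sign of low redundancy assumes, as in the previous sketch, that these essential genes are normally single-copy.

```r
# Values copied from the two mmstats outputs above
total_length <- 6806891   # Length.total, full dataset
bin_length   <- 3291638   # Length.total, extracted bin dA
ess_total    <- 108       # Ess.total, bin
ess_unique   <- 105       # Ess.unique, bin

bin_length / total_length   # share of assembled bases placed in the bin, about 0.48
ess_total - ess_unique      # essential-gene hits beyond the first copy of each gene
```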
Once we are satisfied with the genome bin, its scaffolds can be exported to a separate FASTA file using mmexport. The genome was subsequently reassembled using additional Oxford Nanopore data.
mmexport(data=dA, assembly=assembly, file = "Nitrospira_ENR6.fa")
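mmexport presumably takes the sequences of the scaffolds in dA from the assembly object and writes them out as FASTA. The base-R sketch below shows what such an export amounts to, assuming the sequences are held as a named character vector (a simplification; the real assembly object may be stored differently). The helper write_fasta is hypothetical, and the 80-character line width is a common but arbitrary choice.

```r
# Write a named character vector of (non-empty) sequences as FASTA,
# wrapping each sequence at `width` characters per line
write_fasta <- function(seqs, file, width = 80) {
  con <- file(file, "w")
  on.exit(close(con))
  for (nm in names(seqs)) {
    s <- seqs[[nm]]
    starts <- seq(1, nchar(s), by = width)
    ends <- pmin(starts + width - 1, nchar(s))
    writeLines(c(paste0(">", nm), substring(s, starts, ends)), con)
  }
}

# Hypothetical assembly and bin membership
assembly <- c(s1 = strrep("A", 100), s2 = "ACGT", s3 = "GGGG")
in_bin <- c("s1", "s2")

out <- tempfile(fileext = ".fa")
write_fasta(assembly[in_bin], out)
fasta_lines <- readLines(out)
fasta_lines
```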
Figure ED XX in the supplementary information of Daims et al., 2015: Complete Nitrification by Nitrospira Bacteria is reproduced here for complete reproducibility.
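library(ggplot2)  # scale_*(), xlab(), ylab() and theme() used below

# Reproduce ggplot2's default discrete palette: n evenly spaced HCL hues
gg_color_hue <- function(n) {
  hues <- seq(15, 375, length.out = n + 1)
  hcl(h = hues, l = 65, c = 100)[1:n]
}
cols <- gg_color_hue(4)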
mmplot(data = d,
x = "ENR4A",
y = "ENR6",
log.x = TRUE,
log.y = TRUE,
color = "essential") +
scale_x_log10(limits = c(1,1000), breaks = c(1, 10, 100, 1000)) +
scale_y_log10(limits = c(1,1000), breaks = c(1, 10, 100, 1000)) +
scale_size_area(breaks = c(10000, 50000, 100000, 500000, 1000000), max_size = 20, labels = c(10, 50, 100, 500, 1000), name = "Scaffold Length (Kbp)") +
scale_color_manual(name = "Taxonomy", values = cols[c(3,4)]) +
ylab("Coverage (Sample ENR6)") +
xlab("Coverage (Sample ENR4A)") +
theme(panel.background = element_blank(),
panel.grid.minor = element_blank(),
panel.grid.major = element_blank(),
axis.line = element_line(color = "black"),
axis.ticks = element_line(color = "black"),
axis.text = element_text(color = "black"),
legend.key = element_blank())
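The gg_color_hue helper spaces n + 1 hues evenly from 15 to 375 degrees on the HCL colour wheel and drops the last one, because 375 degrees is the same hue as 15. For n = 4 this gives hues 15, 105, 195 and 285, and the figure code picks the third and fourth (a cyan and a purple) for the two taxonomy groups. The snippet below only makes those numbers visible; why those two colours were chosen is not stated in the report.

```r
# Same helper as in the figure code above
gg_color_hue <- function(n) {
  hues <- seq(15, 375, length.out = n + 1)
  hcl(h = hues, l = 65, c = 100)[1:n]
}

hues <- seq(15, 375, length.out = 4 + 1)[1:4]
hues            # 15 105 195 285: hue angles of the four palette colours
hues[c(3, 4)]   # 195 285: the hues behind cols[c(3, 4)]
cols <- gg_color_hue(4)
cols[c(3, 4)]   # the two colours passed to scale_color_manual()
```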