This report documents the binning of a Nitrospira Comammox genome from an enrichment reactor (ENR4). See Daims et al., 2015: Complete Nitrification by Nitrospira Bacteria for further details.
In case you haven’t installed the mmgenome package, see the Load data example.
library("mmgenome")
The Rmarkdown file Load_data.Rmd describes how the data were loaded and can be imported using the mmimport function. However, the preprocessed data can also be downloaded directly from figshare (Daims_ENR4), so here we load the preprocessed data instead.
load("Daims_ENR4.RData")
The object d contains information on the scaffolds and on the essential genes within them. For each scaffold, the dataset contains the following information: scaffold, length and gc give the scaffold name, length and GC content; the columns ENR4A, ENR4E, ENR4F and ENR6 contain the coverage in four different samples; PC1, PC2 and PC3 contain the coordinates of the first three principal components from a PCA of tetranucleotide frequencies; essential contains the taxonomic classification of each scaffold based on its essential genes; rRNA16S contains taxonomic information for scaffolds that have an associated 16S rRNA gene.
colnames(d$scaffolds)
## [1] "scaffold" "length" "gc" "ENR4A" "ENR4E"
## [6] "ENR4F" "ENR6" "PC1" "PC2" "PC3"
## [11] "essential" "rRNA16S"
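To make the PC1, PC2 and PC3 columns more concrete, the following is a minimal base R sketch of a PCA on tetranucleotide frequencies. It is not the code used to build d: the toy sequences, the helper names random_seq and tetra_freq, and the choice not to merge reverse-complement tetranucleotides are all assumptions made for illustration.

```r
# Toy sequences standing in for scaffolds (hypothetical data, not from ENR4).
set.seed(1)
random_seq <- function(n, prob) {
  paste(sample(c("A", "C", "G", "T"), n, replace = TRUE, prob = prob), collapse = "")
}
seqs <- c(
  s1 = random_seq(5000, c(0.20, 0.30, 0.30, 0.20)),  # GC-rich
  s2 = random_seq(5000, c(0.20, 0.30, 0.30, 0.20)),  # GC-rich
  s3 = random_seq(5000, c(0.30, 0.20, 0.20, 0.30)),  # AT-rich
  s4 = random_seq(5000, c(0.30, 0.20, 0.20, 0.30))   # AT-rich
)

# All 256 possible tetranucleotides, in a fixed order.
bases <- c("A", "C", "G", "T")
kmers <- apply(expand.grid(bases, bases, bases, bases), 1, paste, collapse = "")

# Relative frequency of every overlapping 4-mer in one sequence.
tetra_freq <- function(s) {
  starts <- seq_len(nchar(s) - 3)
  words <- substring(s, starts, starts + 3)
  counts <- table(factor(words, levels = kmers))
  as.numeric(counts) / length(words)
}

# One row per sequence, one column per tetranucleotide.
freq <- t(sapply(seqs, tetra_freq))
colnames(freq) <- kmers

# PCA on the frequency matrix; keep the first three components.
pca <- prcomp(freq, center = TRUE, scale. = FALSE)
pcs <- pca$x[, 1:3]
colnames(pcs) <- c("PC1", "PC2", "PC3")
pcs
```

In this toy example, sequences with similar composition should end up close together on PC1, which is the property that makes these coordinates useful as plotting axes for binning.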
The basic statistics of the full dataset can be summarised using the mmstats function.
mmstats(d, ncov = 4)
## General Stats
## n.scaffolds 161.00
## GC.mean 65.60
## N50 198700.00
## Length.total 12846008.00
## Length.max 1306334.00
## Length.mean 79788.90
## Coverage.ENR4A 157.66
## Coverage.ENR4E 139.64
## Coverage.ENR4F 255.62
## Coverage.ENR6 150.12
## Ess.total 419.00
## Ess.unique 107.00
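Some of these numbers are easy to verify by hand. For instance, Length.mean matches Length.total divided by n.scaffolds (12846008 / 161 ≈ 79788.9). The sketch below shows how the length statistics could be computed in base R for a vector of scaffold lengths. The n50 helper and the toy lengths are hypothetical and are not taken from the mmgenome source.

```r
# N50: the length of the scaffold at which the cumulative length,
# going from the longest scaffold downwards, first reaches half of the total.
n50 <- function(lengths) {
  sorted <- sort(lengths, decreasing = TRUE)
  sorted[which(cumsum(sorted) >= sum(sorted) / 2)[1]]
}

# Toy scaffold lengths (hypothetical values).
toy_lengths <- c(500, 400, 300, 200, 100)

toy_stats <- c(
  n.scaffolds  = length(toy_lengths),
  N50          = n50(toy_lengths),
  Length.total = sum(toy_lengths),
  Length.max   = max(toy_lengths),
  Length.mean  = mean(toy_lengths)
)
toy_stats
```

Here the total length is 1500, so the cumulative sum (500, 900, ...) passes the halfway point of 750 at the second-longest scaffold, giving an N50 of 400.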
The combination of the coverage in samples ENR4A and ENR4E provides the cleanest separation of the genomes and is therefore used for binning. A subspace is defined that clearly separates the Nitrospira from the three other species.
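# Plot coverage in ENR4A against ENR4E on log-scaled axes, coloured by taxonomy
p <- mmplot(data = d, x = "ENR4A", y = "ENR4E", log.x = TRUE, log.y = TRUE, color = "essential")
# Uncomment to display the plot and pick the corner points interactively
#p
#sel <- mmplot_locator(p)
# Corner points of the selected subspace, hard-coded for reproducibility
sel <- data.frame(ENR4A = c(322, 394, 919, 805, 394),
                  ENR4E = c(296, 542, 525, 256, 200))
mmplot_selection(p, sel)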
The scaffolds included in the defined subspace are extracted using the mmextract function.
dA <- mmextract(d, sel)
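Conceptually, the extraction keeps the scaffolds whose coverage values fall inside the polygon defined by sel. The base R sketch below illustrates that idea with a simple ray-casting test. It is an illustration only and may differ from how mmextract is implemented: the point_in_polygon helper and the toy scaffolds are hypothetical, and the test is done on the raw coverage values rather than on log-transformed ones.

```r
# Corner points of the subspace, copied from the selection above.
sel <- data.frame(ENR4A = c(322, 394, 919, 805, 394),
                  ENR4E = c(296, 542, 525, 256, 200))

# Ray casting: count how many polygon edges a horizontal ray starting at
# (px, py) and pointing towards larger x crosses. An odd count means inside.
point_in_polygon <- function(px, py, vx, vy) {
  n <- length(vx)
  inside <- FALSE
  j <- n
  for (i in seq_len(n)) {
    crosses <- (vy[i] > py) != (vy[j] > py)
    if (crosses) {
      x_at_py <- vx[i] + (py - vy[i]) / (vy[j] - vy[i]) * (vx[j] - vx[i])
      if (px < x_at_py) inside <- !inside
    }
    j <- i
  }
  inside
}

# Toy scaffolds with hypothetical coverage values.
toy <- data.frame(scaffold = c("a", "b", "c"),
                  ENR4A = c(600, 10, 1000),
                  ENR4E = c(400, 10, 400))

inside <- mapply(function(x, y) point_in_polygon(x, y, sel$ENR4A, sel$ENR4E),
                 toy$ENR4A, toy$ENR4E)

# Keep only the scaffolds that fall inside the polygon.
toy[inside, ]
```

Only scaffold "a" lies inside the selection, so it is the only row kept. The other two are far below or to the right of the polygon.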
The mmstats function applies to any extracted object. Hence, it can be used directly on the subset.
mmstats(dA, ncov = 4)
## General Stats
## n.scaffolds 10.00
## GC.mean 59.20
## N50 893253.00
## Length.total 3289576.00
## Length.max 1306334.00
## Length.mean 328957.60
## Coverage.ENR4A 469.08
## Coverage.ENR4E 347.90
## Coverage.ENR4F 646.77
## Coverage.ENR6 341.68
## Ess.total 108.00
## Ess.unique 105.00
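A few simple ratios help when comparing the bin with the full dataset. The sketch below only does arithmetic on the numbers reported by mmstats above. Reading duplicated essential genes as a rough hint of contamination, and the share of unique essential genes as a rough hint of completeness, is a common interpretation that is assumed here rather than stated in this report.

```r
# Values copied from the two mmstats outputs above.
full <- c(n.scaffolds = 161, Length.total = 12846008, Ess.total = 419, Ess.unique = 107)
bin  <- c(n.scaffolds = 10,  Length.total = 3289576,  Ess.total = 108, Ess.unique = 105)

# Essential genes found more than once in the bin.
duplicated_copies <- bin[["Ess.total"]] - bin[["Ess.unique"]]

# Share of the dataset's unique essential genes that the bin contains.
unique_fraction <- bin[["Ess.unique"]] / full[["Ess.unique"]]

# Share of the total assembly length that ends up in the bin.
length_fraction <- bin[["Length.total"]] / full[["Length.total"]]

round(c(duplicated_copies = duplicated_copies,
        unique_fraction   = unique_fraction,
        length_fraction   = length_fraction), 3)
```

The bin holds 105 of the 107 unique essential genes seen in the whole dataset, with three extra copies, in 10 of the 161 scaffolds.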
Now that we are happy with the genome bin, the scaffolds can be exported to a separate FASTA file using mmexport. The genome was then further reassembled using additional Oxford Nanopore data.
mmexport(data = dA, assembly = assembly, file = "Nitrospira_ENR4.fa")
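For readers unfamiliar with the output format, the following base R sketch shows what exporting a subset of sequences to a FASTA file amounts to. It does not reproduce mmexport: the write_fasta helper, the toy sequences and the scaffold names are hypothetical, and the file is written to a temporary location.

```r
# Toy assembly: a named character vector of sequences (hypothetical data).
seqs <- c(scaffold_1 = "ACGTACGTAC",
          scaffold_2 = "GGGCCCTTTA",
          scaffold_3 = "TTTTAAAACC")

# Names of the scaffolds that belong to the bin.
keep <- c("scaffold_1", "scaffold_3")

# Write each sequence as a ">name" header line followed by the sequence.
write_fasta <- function(seqs, file) {
  records <- as.vector(rbind(paste0(">", names(seqs)), unname(seqs)))
  writeLines(records, file)
}

out <- tempfile(fileext = ".fa")
write_fasta(seqs[keep], out)
readLines(out)
```

Each exported scaffold becomes one header line and one sequence line. Real FASTA writers often wrap long sequences over several lines, which this sketch does not do.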
Figure ED XX in the supplementary of Daims et al., 2015: Complete Nitrification by Nitrospira Bacteria is shown here for complete reproducibility.
mmplot(data = d,
       x = "ENR4A",
       y = "ENR4E",
       log.x = TRUE,
       log.y = TRUE,
       color = "essential") +
  scale_x_log10(limits = c(1, 1000), breaks = c(1, 10, 100, 1000)) +
  scale_y_log10(limits = c(1, 1000), breaks = c(1, 10, 100, 1000)) +
  scale_size_area(breaks = c(10000, 50000, 100000, 500000, 1000000),
                  max_size = 20,
                  labels = c(10, 50, 100, 500, 1000),
                  name = "Scaffold Length (Kbp)") +
  scale_color_discrete(name = "Taxonomy") +
  xlab("Coverage (Sample ENR4A)") +
  ylab("Coverage (Sample ENR4E)") +
  theme(panel.background = element_blank(),
        panel.grid.minor = element_blank(),
        panel.grid.major = element_blank(),
        axis.line = element_line(color = "black"),
        axis.ticks = element_line(color = "black"),
        axis.text = element_text(color = "black"),
        legend.key = element_blank())