This report documents the binning of a Nitrospira genome from the GWW sample. See Daims et al., 2015, "Complete Nitrification by Nitrospira Bacteria", for further details.
The metagenome data is analysed using the mmgenome package.
library("mmgenome")
The Rmarkdown file Load_data.Rmd describes the loading of the data, which can be imported using the mmimport function. However, the preprocessed data can also be downloaded directly from figshare: Daims_GWW. Hence, here we load the preprocessed data from figshare instead.
load("Daims_GWW.RData")
As seen from the overview plots, the sample contains numerous Nitrospira species. The one binned here is located between several other species in the HPD versus HPF1 coverage plot.
p <- mmplot(data = d,
            x = "HPD",
            y = "HPF1",
            log.x = TRUE,
            log.y = TRUE,
            color = "essential",
            minlength = 1000,
            factor.shape = "solid") +
  scale_x_log10(limits = c(1, 10)) +
  scale_y_log10(limits = c(10, 100))
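# Interactive step used to define the selection (uncomment to repeat it):
# p
# sel <- mmplot_locator(p)
# Polygon vertices recorded from the interactive selection
sel <- data.frame(HPD  = c(3.66, 4.13, 5.11, 6.32, 5.99, 4.51, 3.68),
                  HPF1 = c(25.8, 32.4, 35.6, 30.1, 23.4, 20.3, 21.4))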
mmplot_selection(p, sel)
The scaffolds included in the defined subspace are extracted using the mmextract function.
dA <- mmextract(d, sel)
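To illustrate what extracting scaffolds from a polygon selection involves, the following is a minimal base R sketch of a ray-casting point-in-polygon test. It is not the mmextract implementation: the function `point_in_polygon` and the three `toy` scaffolds are hypothetical, and the test is done in linear coordinates, which is an assumption since the selection above was drawn on log-scaled axes.

```r
# Ray-casting point-in-polygon test (base R only, illustrative).
# px, py: point coordinates; vx, vy: polygon vertices in order.
point_in_polygon <- function(px, py, vx, vy) {
  n <- length(vx)
  inside <- FALSE
  j <- n
  for (i in seq_len(n)) {
    # Does the edge from vertex j to vertex i straddle the horizontal
    # line through the point?
    if ((vy[i] > py) != (vy[j] > py)) {
      # x coordinate at which the edge crosses that line
      x_cross <- vx[j] + (py - vy[j]) * (vx[i] - vx[j]) / (vy[i] - vy[j])
      # Count only crossings to the right of the point
      if (px < x_cross) inside <- !inside
    }
    j <- i
  }
  inside
}

# Polygon vertices from the selection in this report.
sel <- data.frame(HPD  = c(3.66, 4.13, 5.11, 6.32, 5.99, 4.51, 3.68),
                  HPF1 = c(25.8, 32.4, 35.6, 30.1, 23.4, 20.3, 21.4))

# Hypothetical scaffolds, not taken from the real dataset.
toy <- data.frame(scaffold = c("s1", "s2", "s3"),
                  HPD      = c(5.0, 2.0, 9.0),
                  HPF1     = c(27.0, 27.0, 50.0))

# Flag each scaffold by whether its coverage pair falls inside the polygon.
toy$selected <- mapply(point_in_polygon, toy$HPD, toy$HPF1,
                       MoreArgs = list(vx = sel$HPD, vy = sel$HPF1))
print(toy)
```

In this toy example only `s1` falls inside the polygon; the other two scaffolds lie to the left of it and above it, respectively.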
The mmstats function applies to any extracted object. Hence, it can be used directly on the subset.
mmstats(dA, ncov = 2)
##               General Stats
## n.scaffolds          982.00
## GC.mean               53.90
## N50                29503.00
## Length.total    13326250.00
## Length.max        176346.00
## Length.mean        13570.50
## Coverage.HPD           5.06
## Coverage.HPF1         27.01
## Ess.total            352.00
## Ess.unique           106.00
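As a rough guide to reading these numbers, the sketch below computes N50 with its usual definition (the scaffold length at which the cumulative length, taken from the longest scaffold downwards, reaches half of the total) and checks that the reported mean length follows from the reported total and scaffold count. The helper `n50` and the vector `scaffold_len` are hypothetical, and whether mmstats computes N50 in exactly this way is an assumption.

```r
# Hypothetical scaffold lengths in bp (not from the dataset).
scaffold_len <- c(100, 200, 300, 400, 500)

# N50 under the usual definition: sort lengths in decreasing order and
# return the first length at which the running total reaches half of
# the summed length.
n50 <- function(len) {
  sorted <- sort(len, decreasing = TRUE)
  sorted[which(cumsum(sorted) >= sum(sorted) / 2)[1]]
}

toy_n50 <- n50(scaffold_len)
print(toy_n50)

# Consistency check on the reported values:
# Length.mean should equal Length.total / n.scaffolds.
mean_len <- round(13326250 / 982, 1)
print(mean_len)
```

For the toy vector the total is 1500 bp, so the running total first reaches 750 bp at the second-longest scaffold (400 bp). The computed mean of 13570.5 bp agrees with the Length.mean reported above.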
mmplot_pairs(data = dA,
             variables = c("HPD", "HPF1", "gc", "PC1", "PC2", "PC3"),
             minlength = 5000,
             color = "essential")
We make another selection to remove other Nitrospira scaffolds that were included.
p <- mmplot(data = dA,
            x = "PC1",
            y = "PC2",
            color = "essential",
            minlength = 1000,
            highlight = "10801")
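# Interactive step used to define the selection (uncomment to repeat it):
# p
# sel <- mmplot_locator(p)
# Polygon vertices recorded from the interactive selection
sel <- data.frame(PC1 = c(-0.0512, -0.0367, 0.00971, 0.0898, 0.113, 0.037, -0.0431),
                  PC2 = c(-0.0888, -0.169, -0.193, -0.171, -0.0713, -0.0472, -0.058))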
mmplot_selection(p, sel)
The scaffolds included in this second subspace are extracted using the mmextract function, excluding scaffold 10801, which was highlighted in the plot.
dB <- mmextract(dA, sel, exclude = "10801")
The mmstats function is then applied to the refined subset.
mmstats(dB, ncov = 2)
##               General Stats
## n.scaffolds          357.00
## GC.mean               56.40
## N50                41715.00
## Length.total     8101188.00
## Length.max        176346.00
## Length.mean        22692.40
## Coverage.HPD           4.82
## Coverage.HPF1         27.55
## Ess.total            199.00
## Ess.unique           104.00
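To see what the second selection changed, the two sets of statistics can be placed side by side. The sketch below only re-enters the values printed in this report; the data frame `bins` and its column names are hypothetical, and the derived ratios are a simple illustration rather than output of the mmgenome package.

```r
# Values copied from the two mmstats outputs in this report.
bins <- data.frame(
  bin          = c("dA", "dB"),
  n_scaffolds  = c(982, 357),
  length_total = c(13326250, 8101188),
  ess_total    = c(352, 199),
  ess_unique   = c(106, 104)
)

# Mean scaffold length, recomputed from total length and scaffold count.
bins$length_mean <- round(bins$length_total / bins$n_scaffolds, 1)

# Average number of copies per distinct essential gene.
bins$ess_per_unique <- round(bins$ess_total / bins$ess_unique, 2)

# Fraction of the first bin's total length kept after the second selection.
frac_kept <- round(bins$length_total[2] / bins$length_total[1], 3)

print(bins)
print(frac_kept)
```

The second selection keeps about 61% of the total length and 104 of the 106 unique essential genes, while the ratio of total to unique essential genes drops from about 3.3 to about 1.9. A ratio above 1 means that some essential genes are still present in more than one copy.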
Finally, the binned scaffolds are exported.
mmexport(data = dB, assembly = assembly, file = "GWW_Nitrospira8.fa")