This report documents the binning of a Nitrospira genome from the GWW sample. See Daims et al., 2015: Complete Nitrification by Nitrospira Bacteria for further details.
The metagenome data is analysed using the mmgenome package.
library("mmgenome")
The Rmarkdown file Load_data.Rmd describes how the data are loaded and can be imported using the mmimport function. However, the preprocessed data can also be downloaded directly from figshare: Daims_GWW. Hence, here we load the preprocessed data from figshare instead.
load("Daims_GWW.RData")
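The later steps use objects named d, pe and assembly, which presumably come from this file. As a minimal sketch of how to check what an .RData file provides without cluttering the workspace, the example below loads a file into a separate environment and lists the object names. The file and its contents (d_toy) are made-up stand-ins, not the figshare data.

```r
# Toy stand-in for a downloaded .RData file (hypothetical contents)
tmp <- tempfile(fileext = ".RData")
d_toy <- data.frame(scaffold = c("s1", "s2"), length = c(1500, 900))
save(d_toy, file = tmp)

# Load into a fresh environment; load() returns the names of the objects
env <- new.env()
loaded <- load(tmp, envir = env)
loaded

# Inspect one of the loaded objects without touching the global workspace
str(env$d_toy)
```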
This Nitrospira seems to have assembled very well. Hence, additional effort is made to obtain a clean genome bin.
p <- mmplot(data = d,
            x = "HPD",
            y = "HPF1",
            log.x = FALSE,
            log.y = FALSE,
            color = "essential",
            minlength = 1000,
            factor.shape = "solid") +
  xlim(10, 50) +
  ylim(1, 7.5)
# Uncomment to draw the plot and pick the polygon interactively:
#p
#sel <- mmplot_locator(p)
# Polygon vertices are hard-coded so the selection is reproducible
sel <- data.frame(HPD  = c(26.8, 27.5, 30.3, 38.9, 39.6, 35.8, 26.3),
                  HPF1 = c(3.51, 2.85, 2.81, 3.72, 4.76, 5.9, 4.74))
mmplot_selection(p, sel)
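To make explicit what the polygon selects, the following is a minimal base-R sketch of a ray-casting point-in-polygon test, using the polygon vertices from above and three made-up scaffolds. The helper in_polygon and the toy data are hypothetical illustrations; the way mmgenome performs the test internally may differ.

```r
# Hypothetical helper: ray-casting test for whether points (x, y) lie inside
# the polygon with vertices (px, py). Not part of mmgenome.
in_polygon <- function(x, y, px, py) {
  n <- length(px)
  inside <- rep(FALSE, length(x))
  j <- n
  for (i in seq_len(n)) {
    # Does the edge between vertices j and i straddle the horizontal line
    # through each point?
    crosses <- (py[i] > y) != (py[j] > y)
    # x-coordinate at which the edge meets that horizontal line
    x_at_y <- (px[j] - px[i]) * (y - py[i]) / (py[j] - py[i]) + px[i]
    # Each crossing to the right of the point flips its inside/outside state
    hit <- crosses & (x < x_at_y)
    inside[hit] <- !inside[hit]
    j <- i
  }
  inside
}

# Polygon vertices copied from the selection above
sel_poly <- data.frame(HPD  = c(26.8, 27.5, 30.3, 38.9, 39.6, 35.8, 26.3),
                       HPF1 = c(3.51, 2.85, 2.81, 3.72, 4.76, 5.9, 4.74))

# Made-up scaffolds with coverage in the two samples
toy <- data.frame(scaffold = c("a", "b", "c"),
                  HPD  = c(33, 45, 26),
                  HPF1 = c(4, 6, 4))
toy$selected <- in_polygon(toy$HPD, toy$HPF1, sel_poly$HPD, sel_poly$HPF1)
toy
```

Only scaffold a falls inside the polygon; b lies above and to the right of it, and c lies just left of its western edge.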
The scaffolds included in the defined subspace are extracted using the mmextract function. One scaffold (47021) is excluded based on inspection of network plots.
dA <- mmextract(d, sel, exclude = "47021")
The mmstats function applies to any extracted object. Hence, it can be used directly on the subset.
mmstats(dA, ncov = 2)
## General Stats
## n.scaffolds 49.00
## GC.mean 55.80
## N50 155638.00
## Length.total 3443025.00
## Length.max 320146.00
## Length.mean 70265.80
## Coverage.HPD 33.75
## Coverage.HPF1 4.15
## Ess.total 91.00
## Ess.unique 90.00
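As a quick consistency check on the table above, Length.mean equals Length.total divided by n.scaffolds (3443025 / 49 ≈ 70265.8). The sketch below shows how N50 and mean length are conventionally computed from a vector of scaffold lengths. The helper n50 and the toy lengths are hypothetical, and the standard N50 definition is assumed rather than taken from the mmgenome source.

```r
# Hypothetical helper: N50 of a set of scaffold lengths, i.e. the length of
# the scaffold at which the running total (longest first) reaches half of
# the total assembly length.
n50 <- function(lengths) {
  sorted <- sort(lengths, decreasing = TRUE)
  sorted[which(cumsum(sorted) >= sum(sorted) / 2)[1]]
}

toy_lengths <- c(2, 3, 4, 5, 6)   # made-up values, total = 20

n50(toy_lengths)    # running totals 6, 11, ...: half (10) is reached at length 5
mean(toy_lengths)   # total length divided by number of scaffolds
```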
Using paired-end (PE) reads, we extract scaffolds directly connected to the subset …
dB <- mmextract_network(subset = dA,
                        original = d,
                        network = pe,
                        nconnections = 1,
                        type = "direct")
… and then plot the new subset.
mmplot_network(data = dB,
               network = pe,
               nconnections = 1,
               color = "essential",
               scale.links = 0.5)
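The idea behind a direct network extraction can be sketched in base R: starting from a set of scaffolds, add every scaffold that shares a sufficiently supported paired-end link with one of them. The column names (scaffold1, scaffold2, connections), the helper direct_neighbours, the toy network, and the use of >= for the connection threshold are all assumptions made for this illustration, not a description of mmextract_network internals.

```r
# Toy paired-end network: one row per pair of linked scaffolds.
# Column names are assumptions for this sketch.
toy_pe <- data.frame(
  scaffold1   = c("s1", "s1", "s2", "s4"),
  scaffold2   = c("s2", "s3", "s5", "s5"),
  connections = c(12, 1, 7, 3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the scaffolds in subset_ids plus their direct neighbours
direct_neighbours <- function(subset_ids, network, nconnections = 1) {
  # Keep only links supported by enough read pairs
  strong <- network[network$connections >= nconnections, ]
  # Links with at least one end inside the subset
  touches <- strong$scaffold1 %in% subset_ids | strong$scaffold2 %in% subset_ids
  unique(c(subset_ids, strong$scaffold1[touches], strong$scaffold2[touches]))
}

direct_neighbours("s1", toy_pe, nconnections = 1)   # s1 plus s2 and s3
direct_neighbours("s1", toy_pe, nconnections = 5)   # the weak s1-s3 link is dropped
```

Note that s5 is never added when starting from s1, because it is only reachable through s2; a direct extraction does not follow links beyond the first neighbour.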
We do a final selection to remove other Nitrospira scaffolds that were included.
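p <- mmplot(data = dB,
            x = "HPD",
            y = "HPF1",
            log.x = TRUE,
            log.y = TRUE,
            color = "essential",
            minlength = 1000)
# Uncomment to draw the plot and pick the polygon interactively:
#p
#sel <- mmplot_locator(p)
# Polygon vertices are hard-coded so the selection is reproducible
sel <- data.frame(HPD  = c(19.3, 21, 90.2, 161, 169, 21.6),
                  HPF1 = c(2.42, 4.38, 21.2, 23.2, 6.76, 1.56))
mmplot_selection(p, sel)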
The scaffolds included in the defined subspace are extracted using the mmextract function.
dC <- mmextract(dB, sel)
The mmstats function applies to any extracted object. Hence, it can be used directly on the subset.
mmstats(dC, ncov = 2)
## General Stats
## n.scaffolds 66.00
## GC.mean 55.80
## N50 155638.00
## Length.total 3487713.00
## Length.max 320146.00
## Length.mean 52844.10
## Coverage.HPD 33.77
## Coverage.HPF1 4.15
## Ess.total 106.00
## Ess.unique 99.00
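In this final bin Ess.total exceeds Ess.unique by 7 (106 vs. 99), compared with 1 in the initial subset (91 vs. 90). Assuming Ess.total counts every essential-gene hit and Ess.unique counts distinct essential genes, the difference reflects genes found more than once. A minimal sketch of that bookkeeping, using made-up gene identifiers rather than the real annotation:

```r
# Made-up essential-gene hits, one entry per hit on a binned scaffold
toy_hits <- c("geneA", "geneB", "geneA", "geneC", "geneB", "geneA")

ess_total  <- length(toy_hits)           # every hit counts
ess_unique <- length(unique(toy_hits))   # each gene counts once

# Genes observed more than once are candidates for closer inspection
hit_counts <- table(toy_hits)
duplicated_genes <- names(hit_counts[hit_counts > 1])

ess_total
ess_unique
duplicated_genes
```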
Finally, the binned scaffolds are exported.
mmexport(data = dC, assembly = assembly, file = "GWW_Nitrospira3.fa")
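Conceptually, the export step subsets the assembly to the binned scaffold IDs and writes the result as FASTA. The base-R sketch below illustrates this on made-up sequences. The helper write_fasta_subset and the representation of the assembly as a named character vector are assumptions for illustration only; they are not how mmexport is implemented.

```r
# Made-up assembly: a named character vector, names are scaffold IDs
toy_assembly <- c(s1 = "ACGTACGT", s2 = "GGGCCC", s3 = "TTTTAAAA")
keep <- c("s1", "s3")   # scaffold IDs in the final bin

# Hypothetical helper: write the selected scaffolds to a FASTA file
write_fasta_subset <- function(seqs, ids, file) {
  picked <- seqs[names(seqs) %in% ids]
  # Interleave header lines (">id") with the sequence lines
  lines <- as.vector(rbind(paste0(">", names(picked)), unname(picked)))
  writeLines(lines, file)
  invisible(length(picked))
}

out <- tempfile(fileext = ".fa")
write_fasta_subset(toy_assembly, keep, out)
fasta_lines <- readLines(out)
fasta_lines
```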