This report documents the binning of a Nitrospira genome from the GWW sample. See Daims et al., 2015: Complete Nitrification by Nitrospira Bacteria for further details.
The metagenome data is analysed using the mmgenome package.
library("mmgenome")
The Rmarkdown file Load_data.Rmd describes the loading of the data and can be imported using the mmimport function. However, the preprocessed data can also be downloaded directly from figshare: Daims_GWW. Hence, here we load the preprocessed data from figshare instead.
load("Daims_GWW.RData")
As seen from the overview plots, there are numerous Nitrospira species. This Nitrospira seems to be relatively easy to separate from the other genomes.
p <- mmplot(data = d,
            x = "HPD",
            y = "HPF1",
            log.x = TRUE,
            log.y = TRUE,
            color = "essential",
            minlength = 1000,
            factor.shape = "solid") +
  xlim(1, 100) +
  ylim(25, 150)
#p
#sel <- mmplot_locator(p)
sel <- data.frame(HPD  = c(32.9, 34, 41.2, 49.1, 58.1, 56.8, 45.7, 39.5),
                  HPF1 = c(62.1, 56.5, 58.4, 65.8, 85.5, 98.8, 90.5, 75.4))
mmplot_selection(p, sel)
The scaffolds included in the defined subspace are extracted using the mmextract function.
dA <- mmextract(d, sel)
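To make the idea of extracting scaffolds from a polygon selection concrete, here is a minimal base R sketch of a point-in-polygon test (ray casting). It is not the mmextract implementation, whose internals are not shown in this report. The helper `point_in_polygon` and the `toy` scaffolds are hypothetical and only serve as illustration. The test is done on the raw coverage values, whereas the plot uses log axes, so points close to the polygon edges could be classified differently.

```r
# Ray-casting test: is the point (px, py) inside the polygon (vx, vy)?
# Hypothetical helper, not part of mmgenome.
point_in_polygon <- function(px, py, vx, vy) {
  n <- length(vx)
  inside <- FALSE
  j <- n  # index of the previous vertex
  for (i in seq_len(n)) {
    # Does the edge between vertex j and vertex i straddle the height py?
    if ((vy[i] > py) != (vy[j] > py)) {
      # x coordinate where the edge crosses the horizontal line y = py
      x_cross <- vx[i] + (py - vy[i]) * (vx[j] - vx[i]) / (vy[j] - vy[i])
      # Count only crossings to the right of the point
      if (px < x_cross) inside <- !inside
    }
    j <- i
  }
  inside
}

# Polygon corners taken from the selection above
sel <- data.frame(HPD  = c(32.9, 34, 41.2, 49.1, 58.1, 56.8, 45.7, 39.5),
                  HPF1 = c(62.1, 56.5, 58.4, 65.8, 85.5, 98.8, 90.5, 75.4))

# Made-up scaffolds and coverages, for illustration only
toy <- data.frame(scaffold = c("s1", "s2", "s3"),
                  HPD  = c(45, 10, 80),
                  HPF1 = c(75, 30, 140),
                  stringsAsFactors = FALSE)

keep <- mapply(point_in_polygon, toy$HPD, toy$HPF1,
               MoreArgs = list(vx = sel$HPD, vy = sel$HPF1))
toy_subset <- toy[keep, ]  # only s1 lies inside the polygon
```

An odd number of edge crossings to the right of a point means the point is inside the polygon, which is why `inside` is flipped at every crossing.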
The mmstats function applies to any extracted object. Hence, it can be used directly on the subset.
mmstats(dA, ncov = 2)
## General Stats
## n.scaffolds 537.00
## GC.mean 55.80
## N50 6542.00
## Length.total 2445522.00
## Length.max 35221.00
## Length.mean 4554.00
## Coverage.HPD 44.91
## Coverage.HPF1 74.16
## Ess.total 74.00
## Ess.unique 73.00
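As a reading aid for the statistics above, the following base R sketch shows one common definition of N50 and checks the reported mean length against the reported total length and scaffold count. The function `n50` and the vector `toy_lengths` are hypothetical; mmstats may compute its values differently.

```r
# N50: the length of the scaffold at which the cumulative length,
# counted from the longest scaffold down, reaches half of the total.
# Hypothetical helper, not part of mmgenome.
n50 <- function(lengths) {
  sorted <- sort(as.numeric(lengths), decreasing = TRUE)
  sorted[which(cumsum(sorted) >= sum(sorted) / 2)[1]]
}

toy_lengths <- c(100, 200, 300, 400, 500)  # made-up scaffold lengths
n50(toy_lengths)  # 400: 500 + 400 = 900 is the first sum >= 750

# Consistency check on the reported values: Length.total / n.scaffolds
mean_len <- 2445522 / 537
round(mean_len)  # 4554, matching Length.mean
```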
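Scaffolds directly connected to the subset in the paired-end network are then added using the mmextract_network function.
dB <- mmextract_network(subset = dA,
                        original = d,
                        network = pe,
                        nconnections = 1,
                        type = "direct")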
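The mmextract_network call above can be pictured with a small base R sketch of a "direct" extraction: keep the subset and add every scaffold that shares an edge with it. This is an illustration under assumptions, not the mmgenome implementation. The column names of `toy_pe`, the helper `extract_direct`, and the reading of `nconnections` as a minimum number of connections per edge are all hypothetical.

```r
# Made-up edge list: one row per pair of scaffolds linked by paired-end reads
toy_pe <- data.frame(scaffold1   = c("s1", "s1", "s2", "s5"),
                     scaffold2   = c("s3", "s4", "s6", "s7"),
                     connections = c(5, 1, 3, 2),
                     stringsAsFactors = FALSE)

# Return the subset plus all scaffolds directly linked to it.
# Assumption: edges with fewer than `nconnections` connections are ignored.
extract_direct <- function(subset_ids, edges, nconnections = 1) {
  edges <- edges[edges$connections >= nconnections, ]
  touching <- edges$scaffold1 %in% subset_ids | edges$scaffold2 %in% subset_ids
  sort(unique(c(subset_ids,
                edges$scaffold1[touching],
                edges$scaffold2[touching])))
}

extract_direct(c("s1", "s2"), toy_pe)
# "s1" "s2" "s3" "s4" "s6"
extract_direct(c("s1", "s2"), toy_pe, nconnections = 2)
# "s1" "s2" "s3" "s6"
```

Because connected scaffolds are added regardless of their coverage or composition, the extended set can contain scaffolds from other genomes, which is why a further selection follows.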
We do another selection to remove other Nitrospira scaffolds that were included.
p <- mmplot(data = dB,
            x = "PC1",
            y = "PC2",
            color = "essential",
            minlength = 1000)
#p
#sel <- mmplot_locator(p)
sel <- data.frame(PC1 = c(-0.0513, -0.0556, 0.00595, 0.0954, 0.125, 0.107, 0.000222),
                  PC2 = c(-0.0819, -0.152, -0.202, -0.203, -0.112, -0.0362, -0.0329))
mmplot_selection(p, sel)
The scaffolds included in the defined subspace are extracted using the mmextract function.
dC <- mmextract(dB, sel)
The mmstats function applies to any extracted object. Hence, it can be used directly on the subset.
mmstats(dC, ncov = 2)
## General Stats
## n.scaffolds 1585.00
## GC.mean 55.70
## N50 5807.00
## Length.total 6225312.00
## Length.max 176346.00
## Length.mean 3927.60
## Coverage.HPD 27.13
## Coverage.HPF1 60.34
## Ess.total 202.00
## Ess.unique 99.00
We do another selection, this time on coverage, to remove other Nitrospira scaffolds that were included. The scaffolds from the initial selection (dA) are highlighted.
p <- mmplot(data = dC,
            x = "HPD",
            y = "HPF1",
            log.x = TRUE,
            log.y = TRUE,
            color = "essential",
            minlength = 1000,
            highlight = dA) +
  scale_x_log10(limits = c(10, 500)) +
  scale_y_log10(limits = c(10, 500))
#p
#sel <- mmplot_locator(p)
sel <- data.frame(HPD  = c(25.4, 25, 46.3, 153, 198, 98.3, 46, 33.2),
                  HPF1 = c(44, 53.1, 110, 303, 289, 97.6, 49.5, 41.8))
mmplot_selection(p, sel)
The scaffolds included in the defined subspace are extracted using the mmextract function.
dD <- mmextract(dC, sel)
The mmstats function applies to any extracted object. Hence, it can be used directly on the subset.
mmstats(dD, ncov = 2)
## General Stats
## n.scaffolds 637.00
## GC.mean 55.80
## N50 6223.00
## Length.total 2706790.00
## Length.max 35221.00
## Length.mean 4249.30
## Coverage.HPD 44.11
## Coverage.HPF1 72.84
## Ess.total 83.00
## Ess.unique 82.00
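The three sets of statistics reported above can be compared side by side. The sketch below re-enters the printed values by hand (they are copied from the output, not recomputed) and derives the average number of copies per essential gene as Ess.total / Ess.unique. Reading a ratio close to 1 as a sign of a single genome is a rule of thumb, not a result from this report.

```r
# Values copied from the mmstats output for dA, dC and dD
stats <- data.frame(bin        = c("dA", "dC", "dD"),
                    length_mbp = c(2445522, 6225312, 2706790) / 1e6,
                    ess_total  = c(74, 202, 83),
                    ess_unique = c(73, 99, 82),
                    stringsAsFactors = FALSE)

# Average copy number of the essential genes found in each set
stats$copies_per_gene <- round(stats$ess_total / stats$ess_unique, 2)
stats$copies_per_gene  # 1.01 2.04 1.01
```

The ratio of about 2 for dC fits the observation that scaffolds from other Nitrospira were pulled in by the network extraction, while dA and dD stay close to 1.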
Finally, the binned scaffolds are exported.
mmexport(data = dD, assembly = assembly, file = "GWW_Nitrospira4.fa")