Introduction

This report documents the binning of a Nitrospira genome from the GWW sample. See Daims et al., 2015: Complete Nitrification by Nitrospira Bacteria for further details.

Load the mmgenome package

The metagenome data is analysed using the mmgenome package.

library("mmgenome")

Import data

The Rmarkdown file Load_data.Rmd describes how the data are loaded and can be imported using the mmimport function. However, the preprocessed data can also be downloaded directly from figshare (Daims_GWW), so here we load the preprocessed data instead.

load("Daims_GWW.RData")

Extract Nitrospira 8

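As the overview plots show, the sample contains numerous Nitrospira species. Nitrospira 8 is located between several other species in the HPD versus HPF1 coverage plot, so the initial selection also captures scaffolds from its neighbours.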

p <- mmplot(data = d,
            x = "HPD", 
            y = "HPF1", 
            log.x = TRUE, 
            log.y = TRUE, 
            color = "essential", 
            minlength = 1000,
            factor.shape = "solid") +
  scale_x_log10(limits=c(1, 10)) +
  scale_y_log10(limits=c(10, 100))

#p
#sel <- mmplot_locator(p)

sel <- data.frame(HPD  =  c(3.66, 4.13, 5.11, 6.32, 5.99, 4.51, 3.68),
                  HPF1  =  c(25.8, 32.4, 35.6, 30.1, 23.4, 20.3, 21.4))

mmplot_selection(p, sel)

The scaffolds included in the defined subspace are extracted using the mmextract function.

dA <- mmextract(d, sel)
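Conceptually, extracting a subspace amounts to keeping the scaffolds whose coordinates fall inside the polygon defined by sel. The base R sketch below illustrates that idea with a ray-casting test; it is not mmgenome's implementation. The helper point_in_polygon and the toy data frame are hypothetical, and the test is done on the raw coverage values, whereas the polygon was drawn on log-scaled axes, so points very close to an edge could be classified differently.

```r
# Polygon drawn in the HPD / HPF1 plot (same coordinates as `sel` above).
sel <- data.frame(HPD  = c(3.66, 4.13, 5.11, 6.32, 5.99, 4.51, 3.68),
                  HPF1 = c(25.8, 32.4, 35.6, 30.1, 23.4, 20.3, 21.4))

# Hypothetical helper: ray-casting point-in-polygon test.
# Counts how many polygon edges a horizontal ray from (px, py) crosses;
# an odd count means the point is inside.
point_in_polygon <- function(px, py, poly_x, poly_y) {
  n <- length(poly_x)
  inside <- FALSE
  j <- n
  for (i in seq_len(n)) {
    # Does the edge between vertex i and vertex j straddle the height py?
    if ((poly_y[i] > py) != (poly_y[j] > py)) {
      # x coordinate where the edge reaches height py
      x_at_py <- poly_x[i] +
        (py - poly_y[i]) * (poly_x[j] - poly_x[i]) / (poly_y[j] - poly_y[i])
      if (px < x_at_py) inside <- !inside
    }
    j <- i
  }
  inside
}

# Toy scaffolds (made-up values, for illustration only).
toy <- data.frame(scaffold = c("s1", "s2", "s3"),
                  HPD      = c(4.8, 2.0, 5.5),
                  HPF1     = c(27, 50, 30))

keep <- mapply(point_in_polygon, toy$HPD, toy$HPF1,
               MoreArgs = list(poly_x = sel$HPD, poly_y = sel$HPF1))
toy_subset <- toy[keep, ]
toy_subset$scaffold
```

In this toy example s1 and s3 lie inside the polygon and s2 lies outside it, so only s1 and s3 are kept.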

The mmstats function applies to any extracted object. Hence, it can be used directly on the subset.

mmstats(dA, ncov = 2)
##               General Stats
## n.scaffolds          982.00
## GC.mean               53.90
## N50                29503.00
## Length.total    13326250.00
## Length.max        176346.00
## Length.mean        13570.50
## Coverage.HPD           5.06
## Coverage.HPF1         27.01
## Ess.total            352.00
## Ess.unique           106.00

mmplot_pairs(data = dA, 
             variables = c("HPD", "HPF1","gc", "PC1", "PC2", "PC3"), 
             minlength = 5000, 
             color = "essential")
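The N50 value in the mmstats output summarises how contiguous the assembly is. Under the usual definition it is the scaffold length at which the scaffolds of that length or longer contain at least half of the total assembled bases. The sketch below implements that definition in base R on made-up lengths; the helper n50 is hypothetical, and it is an assumption that mmstats follows exactly the same convention.

```r
# Hypothetical helper: N50 under the usual definition.
n50 <- function(lengths) {
  sorted <- sort(lengths, decreasing = TRUE)
  # First scaffold at which the running total reaches half of all bases
  sorted[which(cumsum(sorted) >= sum(sorted) / 2)[1]]
}

# Toy scaffold lengths (made-up values, for illustration only).
toy_lengths <- c(100, 200, 300, 400, 500)
n50(toy_lengths)
```

For the toy lengths the total is 1500 bases, and the two longest scaffolds (500 and 400) together reach 900, which passes the halfway point of 750, so the result is 400.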

Clean-up other species (Subset B)

We make another selection to remove scaffolds from other Nitrospira species that were included in subset A.

p <- mmplot(data = dA,
            x = "PC1", 
            y = "PC2", 
            color = "essential", 
            minlength = 1000,
            highlight = "10801")
#p
#sel <- mmplot_locator(p)

sel <- data.frame(PC1  =  c(-0.0512, -0.0367, 0.00971, 0.0898, 0.113, 0.037, -0.0431),
                  PC2  =  c(-0.0888, -0.169, -0.193, -0.171, -0.0713, -0.0472, -0.058))

mmplot_selection(p, sel)

The scaffolds inside the new selection are extracted using the mmextract function. Scaffold 10801, highlighted in the plot above, is excluded explicitly through the exclude argument.

dB <- mmextract(dA, sel, exclude = "10801")

As for subset A, the mmstats function is used to summarise subset B.

mmstats(dB, ncov = 2)
##               General Stats
## n.scaffolds          357.00
## GC.mean               56.40
## N50                41715.00
## Length.total     8101188.00
## Length.max        176346.00
## Length.mean        22692.40
## Coverage.HPD           4.82
## Coverage.HPF1         27.55
## Ess.total            199.00
## Ess.unique           104.00
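To see what the clean-up changed, the two mmstats summaries can be compared directly. The sketch below re-enters the reported values by hand and derives the number of scaffolds and bases removed, together with the ratio of total to unique essential genes. That ratio is an illustrative indicator chosen here, not something mmstats reports, and the column names are hypothetical.

```r
# Values copied from the mmstats output for subset A and subset B.
stats <- data.frame(
  subset       = c("A", "B"),
  n_scaffolds  = c(982, 357),
  length_total = c(13326250, 8101188),
  ess_total    = c(352, 199),
  ess_unique   = c(106, 104)
)

# Average number of copies found per unique essential gene.
stats$copies_per_gene <- round(stats$ess_total / stats$ess_unique, 2)

# What the second selection removed.
removed_scaffolds <- stats$n_scaffolds[1] - stats$n_scaffolds[2]
removed_bp        <- stats$length_total[1] - stats$length_total[2]

stats[, c("subset", "ess_unique", "copies_per_gene")]
```

The second selection removes 625 scaffolds (5,225,062 bp) while the number of unique essential genes only drops from 106 to 104, and the ratio falls from 3.32 to 1.91.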

Export the scaffolds

Finally, the binned scaffolds in subset B are exported to GWW_Nitrospira8.fa.

mmexport(data = dB, assembly = assembly, file = "GWW_Nitrospira8.fa")
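A quick sanity check after the export is to count the records in the written file and compare the count with n.scaffolds from mmstats (357 for subset B). The base R sketch below counts FASTA header lines; count_fasta_records is a hypothetical helper, and it assumes that the exported file is standard FASTA with one ">" header per scaffold. It is demonstrated on a small temporary file rather than on the real export.

```r
# Hypothetical helper: count FASTA records by counting header lines.
count_fasta_records <- function(path) {
  sum(startsWith(readLines(path), ">"))
}

# Toy FASTA file (made-up sequences, for illustration only).
toy_fasta <- tempfile(fileext = ".fa")
writeLines(c(">scaffold_1", "ACGT",
             ">scaffold_2", "GGCC", "TTAA"), toy_fasta)

n_records <- count_fasta_records(toy_fasta)
n_records
```

Applied to the real export, the call would be count_fasta_records("GWW_Nitrospira8.fa").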