Introduction

This report documents the binning of a Nitrospira genome from the GWW sample. See Daims et al., 2015: Complete Nitrification by Nitrospira Bacteria for further details.

Load the mmgenome package

The metagenome data is analysed using the mmgenome package.

library("mmgenome")

Import data

The Rmarkdown file Load_data.Rmd describes how the data was loaded and can be imported using the mmimport function. However, the preprocessed data can also be downloaded directly from figshare (Daims_GWW), so here we load the preprocessed data instead.

load("Daims_GWW.RData")
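Because load restores objects under the names they were saved with, it can help to check what a file contains before relying on those names. The following is a minimal base-R sketch using a temporary file and a made-up object (d_toy); it does not reflect the actual contents of Daims_GWW.RData.

```r
# Toy example: save one object, then load it into a separate environment
# so the restored names can be inspected without touching the workspace.
tmp <- tempfile(fileext = ".RData")
d_toy <- data.frame(scaffold = c("1", "2"), length = c(1500, 900))  # made-up data
save(d_toy, file = tmp)

env <- new.env()
loaded <- load(tmp, envir = env)  # load() returns the restored object names
print(loaded)
print(ls(env))
```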

Extract Nitrospira 3

This Nitrospira seems to assemble very well, so additional effort is made to obtain a clean genome bin.

p <- mmplot(data = d,
            x = "HPD", 
            y = "HPF1", 
            log.x = FALSE,
            log.y = FALSE,
            color = "essential", 
            minlength = 1000,
            factor.shape = "solid") +
  xlim(10, 50) +
  ylim(1, 7.5)

# Uncomment to display the plot and pick the selection points interactively:
#p
#sel <- mmplot_locator(p)

sel <- data.frame(HPD  =  c(26.8, 27.5, 30.3, 38.9, 39.6, 35.8, 26.3),
                  HPF1  =  c(3.51, 2.85, 2.81, 3.72, 4.76, 5.9, 4.74))

mmplot_selection(p, sel)

The scaffolds included in the defined subspace are extracted using the mmextract function. One scaffold (47021) is excluded based on inspection of network plots.

dA <- mmextract(d, sel, exclude = "47021")
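To make the idea of extracting scaffolds inside a defined subspace concrete, the sketch below applies a standard ray-casting point-in-polygon test to the selection coordinates above. It is an illustration in base R only, not the mmextract implementation; the helper in_polygon and the three toy scaffolds are hypothetical.

```r
# Hypothetical helper (not part of mmgenome): ray-casting point-in-polygon test.
# px, py: point coordinates; vx, vy: polygon vertices in order.
in_polygon <- function(px, py, vx, vy) {
  n <- length(vx)
  inside <- rep(FALSE, length(px))
  j <- n  # index of the previous vertex
  for (i in seq_len(n)) {
    # Does the edge j -> i straddle the horizontal line through each point?
    crosses <- (vy[i] > py) != (vy[j] > py)
    # x coordinate where the edge meets that horizontal line
    x_cross <- (vx[j] - vx[i]) * (py - vy[i]) / (vy[j] - vy[i]) + vx[i]
    hit <- crosses & (px < x_cross)
    inside[hit] <- !inside[hit]  # an odd number of crossings means inside
    j <- i
  }
  inside
}

# Selection polygon from the report
sel <- data.frame(HPD  = c(26.8, 27.5, 30.3, 38.9, 39.6, 35.8, 26.3),
                  HPF1 = c(3.51, 2.85, 2.81, 3.72, 4.76, 5.9, 4.74))

# Made-up scaffolds with coverage in the two samples
toy <- data.frame(scaffold = c("s1", "s2", "s3"),
                  HPD  = c(33.75, 45, 12),
                  HPF1 = c(4.15, 4.15, 1.5))

inside <- in_polygon(toy$HPD, toy$HPF1, sel$HPD, sel$HPF1)
print(toy$scaffold[inside])
```

Only the first toy scaffold lies within the polygon; the other two fall outside the plotted selection and would not be extracted under this test.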

The mmstats function applies to any extracted object. Hence, it can be used directly on the subset.

mmstats(dA, ncov = 2)
##               General Stats
## n.scaffolds           49.00
## GC.mean               55.80
## N50               155638.00
## Length.total     3443025.00
## Length.max        320146.00
## Length.mean        70265.80
## Coverage.HPD          33.75
## Coverage.HPF1          4.15
## Ess.total             91.00
## Ess.unique            90.00
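As a rough guide to reading these statistics, the sketch below computes N50 and mean length for a toy vector of scaffold lengths in base R. The helper n50 is hypothetical and assumes the common definition of N50 (the length of the scaffold at which the cumulative length, taken from the longest scaffold downwards, first reaches half of the total); mmstats may differ in detail.

```r
# Hypothetical helper: N50 under the common definition (an assumption,
# not necessarily how mmstats computes it).
n50 <- function(lengths) {
  sorted <- sort(lengths, decreasing = TRUE)
  sorted[which(cumsum(sorted) >= sum(sorted) / 2)[1]]
}

toy_lengths <- c(100, 80, 60, 40, 20)  # made-up lengths, total 300
toy_n50  <- n50(toy_lengths)
toy_mean <- mean(toy_lengths)
print(c(N50 = toy_n50, mean = toy_mean))

# Consistency check on the table above: mean length = total length / scaffolds
reported_mean <- round(3443025 / 49, 1)
print(reported_mean)
```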

Subset B

Using paired-end (PE) read connections, we add the scaffolds that are directly connected to subset A.

dB <- mmextract_network(subset = dA, 
                        original = d, 
                        network = pe, 
                        nconnections = 1, 
                        type = "direct")
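The idea behind a direct network extraction can be sketched in base R: start from a set of scaffolds and add every scaffold that shares a sufficiently supported connection with one of them. The edge table, its column names, the helper extend_direct, and the reading of nconnections as a minimum threshold are all assumptions made for illustration; they do not describe the format of pe or the implementation of mmextract_network.

```r
# Made-up edge list: each row is a link between two scaffolds,
# with the number of read pairs supporting it.
edges <- data.frame(
  scaffold1   = c("a", "a", "b", "d"),
  scaffold2   = c("b", "c", "e", "f"),
  connections = c(5, 1, 3, 4),
  stringsAsFactors = FALSE
)

# Hypothetical helper: add scaffolds one step away from the seed set.
extend_direct <- function(seed, edges, nconnections = 1) {
  keep <- edges[edges$connections >= nconnections, ]
  touching <- keep$scaffold1 %in% seed | keep$scaffold2 %in% seed
  sort(unique(c(seed, keep$scaffold1[touching], keep$scaffold2[touching])))
}

loose  <- extend_direct("a", edges, nconnections = 1)
strict <- extend_direct("a", edges, nconnections = 2)
print(loose)
print(strict)
```

Scaffold "e" is never added in this sketch because it is linked only through "b", which is two steps from the seed; a direct extraction stops after one step.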

… and then plot the new subset.

mmplot_network(data = dB, 
               network = pe, 
               nconnections = 1, 
               color = "essential", 
               scale.links = 0.5)

Subset C

We make a final selection to remove scaffolds from other Nitrospira that were included by the network extraction.

p <- mmplot(data = dB,
            x = "HPD", 
            y = "HPF1", 
            log.x = TRUE,
            log.y = TRUE,
            color = "essential", 
            minlength = 1000)
# Uncomment to display the plot and pick the selection points interactively:
#p
#sel <- mmplot_locator(p)

sel <- data.frame(HPD  =  c(19.3, 21, 90.2, 161, 169, 21.6),
                  HPF1  =  c(2.42, 4.38, 21.2, 23.2, 6.76, 1.56))

mmplot_selection(p, sel)

The scaffolds included in the defined subspace are extracted using the mmextract function.

dC <- mmextract(dB, sel)

The mmstats function applies to any extracted object. Hence, it can be used directly on the subset.

mmstats(dC, ncov = 2)
##               General Stats
## n.scaffolds           66.00
## GC.mean               55.80
## N50               155638.00
## Length.total     3487713.00
## Length.max        320146.00
## Length.mean        52844.10
## Coverage.HPD          33.77
## Coverage.HPF1          4.15
## Ess.total            106.00
## Ess.unique            99.00
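One way to read Ess.total and Ess.unique, assuming the former counts every essential-gene hit and the latter counts distinct essential genes, is that their difference gives the number of extra copies in the bin. The sketch below shows the arithmetic in base R; the gene identifiers are made up.

```r
# Made-up essential-gene hits; "geneB" occurs twice.
ess_hits <- c("geneA", "geneB", "geneB", "geneC")

ess_total  <- length(ess_hits)          # every hit
ess_unique <- length(unique(ess_hits))  # distinct genes
extra_toy  <- ess_total - ess_unique
print(c(total = ess_total, unique = ess_unique, extra = extra_toy))

# The same arithmetic on the values reported in this document
extra_A <- 91 - 90    # first extraction
extra_C <- 106 - 99   # final subset
print(c(A = extra_A, C = extra_C))
```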

Export the scaffolds

Finally, the binned scaffolds are exported to a FASTA file.

mmexport(data = dC, assembly = assembly, file = "GWW_Nitrospira3.fa")