Introduction

This report documents the binning of a Nitrospira genome from the GWW sample. See Daims et al., 2015: Complete Nitrification by Nitrospira Bacteria for further details.

Load the mmgenome package

The metagenome data is analysed using the mmgenome package.

library("mmgenome")

Import data

The Rmarkdown file Load_data.Rmd describes how the data are loaded, and it can be imported using the mmimport function. However, the preprocessed data can also be downloaded directly from figshare (Daims_GWW), so here we load the preprocessed data instead.

load("Daims_GWW.RData")
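Later steps in this report refer to the objects d, pe and assembly, which are presumably provided by the loaded file. The sketch below shows one way to confirm that an .RData file supplies the expected objects before continuing. It uses a temporary stand-in file so that it runs on its own; the names `required`, `rdata_file` and `missing_objects` are illustrative and not part of mmgenome.

```r
# Objects that later steps in this report refer to.
required <- c("d", "pe", "assembly")

# Stand-in file for illustration only; point rdata_file at "Daims_GWW.RData"
# to check the real data instead.
rdata_file <- tempfile(fileext = ".RData")
d  <- data.frame(scaffold = character(0))
pe <- data.frame(scaffold1 = character(0), scaffold2 = character(0))
save(d, pe, file = rdata_file)

# load() returns the names of the restored objects. Loading into a separate
# environment keeps them from overwriting anything in the workspace.
env <- new.env()
loaded <- load(rdata_file, envir = env)

# Names that were expected but not found in the file.
missing_objects <- setdiff(required, loaded)
missing_objects
```

With the stand-in file, `assembly` is reported as missing because it was deliberately left out of the saved objects.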

Extract Nitrospira 4

As seen from the overview plots, there are numerous Nitrospira species. This Nitrospira seems to be relatively easy to separate from the other genomes.

p <- mmplot(data = d,
            x = "HPD", 
            y = "HPF1", 
            log.x = TRUE, 
            log.y = TRUE, 
            color = "essential", 
            minlength = 1000,
            factor.shape = "solid") +
  xlim(1, 100) +
  ylim(25, 150)

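# Interactive step used to pick the selection polygon by clicking on the plot.
# It is left commented out; the resulting coordinates are hard-coded below.
#p
#sel <- mmplot_locator(p)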

sel <- data.frame(HPD  =  c(32.9, 34, 41.2, 49.1, 58.1, 56.8, 45.7, 39.5),
                  HPF1  =  c(62.1, 56.5, 58.4, 65.8, 85.5, 98.8, 90.5, 75.4))

mmplot_selection(p, sel)

The scaffolds included in the defined subspace are extracted using the mmextract function.

dA <- mmextract(d, sel)
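To make the idea of extracting scaffolds inside a defined subspace concrete, here is a minimal base R sketch of a point-in-polygon test (ray casting) applied to the selection coordinates above. It is an illustration only: `point_in_polygon` and the toy scaffold coverages are hypothetical, and how mmextract performs the selection internally (for example, whether it works on log-transformed axes) is not shown in this report.

```r
# Hypothetical helper (not part of mmgenome): ray-casting point-in-polygon test.
# px, py: coordinates of the points; vx, vy: polygon vertices in order.
point_in_polygon <- function(px, py, vx, vy) {
  n <- length(vx)
  inside <- rep(FALSE, length(px))
  j <- n
  for (i in seq_len(n)) {
    # Does the edge j -> i straddle the horizontal line through each point?
    crosses <- (vy[i] > py) != (vy[j] > py)
    # x coordinate where the edge meets that horizontal line
    x_cross <- (vx[j] - vx[i]) * (py - vy[i]) / (vy[j] - vy[i]) + vx[i]
    # Each crossing to the right of the point flips the inside/outside state
    inside <- xor(inside, crosses & (px < x_cross))
    j <- i
  }
  inside
}

# Selection polygon copied from the report
sel <- data.frame(HPD  = c(32.9, 34, 41.2, 49.1, 58.1, 56.8, 45.7, 39.5),
                  HPF1 = c(62.1, 56.5, 58.4, 65.8, 85.5, 98.8, 90.5, 75.4))

# Toy scaffold coverages, invented for illustration
toy <- data.frame(scaffold = c("a", "b", "c"),
                  HPD  = c(45, 10, 60),
                  HPF1 = c(75, 30, 60))

toy$selected <- point_in_polygon(toy$HPD, toy$HPF1, sel$HPD, sel$HPF1)
toy[toy$selected, ]
```

Only the toy scaffold `a` falls inside the polygon; `b` and `c` lie outside it.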

The mmstats function applies to any extracted object. Hence, it can be used directly on the subset.

mmstats(dA, ncov = 2)
##               General Stats
## n.scaffolds          537.00
## GC.mean               55.80
## N50                 6542.00
## Length.total     2445522.00
## Length.max         35221.00
## Length.mean         4554.00
## Coverage.HPD          44.91
## Coverage.HPF1         74.16
## Ess.total             74.00
## Ess.unique            73.00
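As a reminder of what the N50 and Length.mean rows summarise, the sketch below computes both from a vector of scaffold lengths. It uses the common definition of N50 (the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the total length); whether mmstats follows exactly this convention is an assumption. The function `n50` and the toy lengths are illustrative.

```r
# Hypothetical helper (not part of mmgenome): N50 under the common definition.
n50 <- function(lengths) {
  s <- sort(lengths, decreasing = TRUE)
  # First position at which the running total reaches half of the total length
  s[which(cumsum(s) >= sum(s) / 2)[1]]
}

# Toy scaffold lengths, invented for illustration
toy_lengths <- c(2, 3, 4, 5, 6)
toy_n50 <- n50(toy_lengths)
toy_n50

# Consistency check on the values reported above:
# total length divided by the number of scaffolds gives the mean length.
mean_length <- 2445522 / 537
round(mean_length)
```

For the reported subset, the total length divided by the scaffold count rounds to 4554, which agrees with the Length.mean row.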

Using PE reads (Subset B)

dB <- mmextract_network(subset = dA, 
                        original = d, 
                        network = pe, 
                        nconnections = 1, 
                        type = "direct")
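The idea behind a direct network extraction can be sketched as follows: starting from the scaffolds already in the subset, add every scaffold that shares a sufficiently supported paired-end connection with one of them. The sketch assumes an edge table with columns `scaffold1`, `scaffold2` and `connections`, and it treats `nconnections` as an inclusive minimum; both are assumptions for illustration and are not taken from the mmgenome source. The function `extend_direct` is hypothetical.

```r
# Toy paired-end edge list, invented for illustration
edges <- data.frame(
  scaffold1   = c("s1", "s2", "s3", "s5"),
  scaffold2   = c("s2", "s4", "s6", "s6"),
  connections = c(12, 1, 3, 8),
  stringsAsFactors = FALSE
)

# Hypothetical helper: add the direct neighbours of a seed set.
extend_direct <- function(seed, edges, nconnections = 1) {
  # Keep only edges with enough supporting read pairs (assumed inclusive)
  keep <- edges[edges$connections >= nconnections, ]
  # Edges with at least one end in the seed set
  touches <- keep$scaffold1 %in% seed | keep$scaffold2 %in% seed
  sort(unique(c(seed, keep$scaffold1[touches], keep$scaffold2[touches])))
}

seed <- c("s1", "s3")
extended_loose  <- extend_direct(seed, edges, nconnections = 1)
extended_strict <- extend_direct(seed, edges, nconnections = 5)
extended_loose
extended_strict
```

With a threshold of 1, the seed gains `s2` and `s6`; with a threshold of 5, only `s2` is added. Because only direct neighbours are followed, `s4` and `s5` are never reached. A low threshold therefore recovers more scaffolds but can also pull in scaffolds from other genomes, which is why a clean-up step follows.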

Clean-up other species (Subset C)

We make another selection to remove other Nitrospira scaffolds that were included.

p <- mmplot(data = dB,
            x = "PC1", 
            y = "PC2", 
            color = "essential", 
            minlength = 1000)
#p
#sel <- mmplot_locator(p)

sel <- data.frame(PC1  =  c(-0.0513, -0.0556, 0.00595, 0.0954, 0.125, 0.107, 0.000222),
                  PC2  =  c(-0.0819, -0.152, -0.202, -0.203, -0.112, -0.0362, -0.0329))

mmplot_selection(p, sel)

The scaffolds included in the defined subspace are extracted using the mmextract function.

dC <- mmextract(dB, sel)

The mmstats function applies to any extracted object. Hence, it can be used directly on the subset.

mmstats(dC, ncov = 2)
##               General Stats
## n.scaffolds         1585.00
## GC.mean               55.70
## N50                 5807.00
## Length.total     6225312.00
## Length.max        176346.00
## Length.mean         3927.60
## Coverage.HPD          27.13
## Coverage.HPF1         60.34
## Ess.total            202.00
## Ess.unique            99.00

Clean-up other Nitrospira (Subset D)

We make another selection to remove other Nitrospira scaffolds that were included.

p <- mmplot(data = dC,
            x = "HPD", 
            y = "HPF1", 
            log.x = TRUE,
            log.y = TRUE,
            color = "essential", 
            minlength = 1000,
            highlight = dA) +
  scale_x_log10(limits=c(10,500)) +
  scale_y_log10(limits=c(10,500))

#p
#sel <- mmplot_locator(p)

sel <- data.frame(HPD  =  c(25.4, 25, 46.3, 153, 198, 98.3, 46, 33.2),
                  HPF1  =  c(44, 53.1, 110, 303, 289, 97.6, 49.5, 41.8))

mmplot_selection(p, sel)

The scaffolds included in the defined subspace are extracted using the mmextract function.

dD <- mmextract(dC, sel)

The mmstats function applies to any extracted object. Hence, it can be used directly on the subset.

mmstats(dD, ncov = 2)
##               General Stats
## n.scaffolds          637.00
## GC.mean               55.80
## N50                 6223.00
## Length.total     2706790.00
## Length.max         35221.00
## Length.mean         4249.30
## Coverage.HPD          44.11
## Coverage.HPF1         72.84
## Ess.total             83.00
## Ess.unique            82.00
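The Ess.total and Ess.unique rows from the three mmstats outputs above can be compared directly. The sketch below tabulates the reported values and computes the ratio of total to unique essential genes, a rough and informal indicator of how many essential genes occur more than once in a subset. The data frame `ess` and the interpretation of the ratio are illustrative and are not an mmgenome feature.

```r
# Values copied from the mmstats outputs for subsets A, C and D
ess <- data.frame(
  subset = c("A", "C", "D"),
  total  = c(74, 202, 83),
  unique = c(73, 99, 82)
)

# Ratio close to 1: few essential genes are present in more than one copy
ess$ratio <- round(ess$total / ess$unique, 2)
ess
```

Subsets A and D have ratios close to 1, whereas subset C has a ratio of about 2, which is consistent with scaffolds from more than one genome still being present before the final clean-up.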

Export the scaffolds

Finally, the binned scaffolds are exported.

mmexport(data = dD, assembly = assembly, file = "GWW_Nitrospira4.fa")
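Conceptually, the export step writes the sequences of the selected scaffolds to a FASTA file. The base R sketch below illustrates that idea with a named character vector standing in for the assembly. The function `write_fasta` and the toy sequences are hypothetical; this is not how mmexport is implemented.

```r
# Hypothetical helper (not part of mmgenome): write selected sequences as FASTA.
# seqs: named character vector of sequences; keep: names of scaffolds to export.
write_fasta <- function(seqs, keep, file) {
  sel <- seqs[names(seqs) %in% keep]
  if (length(sel) == 0) stop("no matching scaffolds to export")
  # Interleave header lines and sequence lines
  lines <- as.vector(rbind(paste0(">", names(sel)), unname(sel)))
  writeLines(lines, file)
  invisible(length(sel))
}

# Toy assembly and bin, invented for illustration
toy_assembly <- c(s1 = "ACGT", s2 = "TTTT", s3 = "GGCC")
toy_bin <- c("s1", "s3")

out_file <- tempfile(fileext = ".fa")
n_written <- write_fasta(toy_assembly, toy_bin, out_file)
exported <- readLines(out_file)
exported
```

The exported file contains only the two binned toy scaffolds, each as a header line followed by its sequence.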