Introduction

This report documents the initial extraction of the Candidatus Propionivibrio aalborgensis genome described in Albertsen et al., 2016: "Candidatus Propionivibrio aalborgensis": a novel glycogen accumulating organism abundant in full-scale enhanced biological phosphorus removal plants.

Load the mmgenome package

In case you haven’t installed the mmgenome package, see the Load data example.

library("mmgenome")
library("ggplot2")  # provides theme() and element_line(), used to style the final plot

Import data

The R Markdown file Load_data.Rmd describes how the data are loaded and can be imported using the mmimport function. However, the preprocessed data can also be downloaded directly from figshare (Holmes), so here we import the preprocessed data from figshare instead.

# Assumes Holmes.RData has been downloaded from figshare into the working directory
load("Holmes.RData")

Data overview

The object d contains information on the scaffolds and on the essential genes found within them. For each scaffold the dataset contains the following information: scaffold, length and gc hold the scaffold name, its length and its GC content; the columns H09.06, H11.05, H11.25, H12.13 and H12.19 contain the coverage of the scaffold in 5 different samples; PC1, PC2 and PC3 contain the coordinates of the first three principal components from a principal component analysis (PCA) of tetranucleotide frequencies; essential contains taxonomic information for each scaffold based on classification of its essential genes; rRNA16S contains taxonomic information for scaffolds that carry a 16S rRNA gene; and the pps_xxx columns contain taxonomic information obtained using PhyloPythiaS+.
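The PC1 to PC3 columns come from a PCA of tetranucleotide frequencies. The base R sketch below illustrates the idea on made-up sequences: count all 256 four-base words in each sequence, convert the counts to frequencies and run prcomp(). It is an illustration under stated assumptions, not the exact preprocessing in Load_data.Rmd, which may for example merge reverse complements or scale the frequencies differently; tetra_freq() and the toy sequences are hypothetical.

```r
# All 256 possible tetranucleotides ("AAAA", "CAAA", ...)
bases <- c("A", "C", "G", "T")
tetras <- apply(expand.grid(bases, bases, bases, bases), 1, paste, collapse = "")

# Hypothetical helper: relative frequency of each tetranucleotide in one sequence.
# Words containing characters other than A/C/G/T (e.g. N) are not counted.
tetra_freq <- function(seq) {
  seq <- toupper(seq)
  n <- nchar(seq) - 3
  if (n < 1) stop("sequence must be at least 4 bp long")
  kmers <- substring(seq, 1:n, 4:(n + 3))          # overlapping 4-mers
  counts <- table(factor(kmers, levels = tetras))  # fixed order of all 256 words
  as.numeric(counts) / sum(counts)
}

# Made-up "scaffolds": 10 low-GC and 10 high-GC random sequences
set.seed(1)
gc_levels <- rep(c(0.35, 0.65), each = 10)
toy_seqs <- vapply(gc_levels, function(gc) {
  probs <- c((1 - gc) / 2, gc / 2, gc / 2, (1 - gc) / 2)  # A, C, G, T
  paste(sample(bases, 5000, replace = TRUE, prob = probs), collapse = "")
}, character(1))

# One row per sequence, one column per tetranucleotide
freq <- t(vapply(toy_seqs, tetra_freq, numeric(256), USE.NAMES = FALSE))

# PCA on the frequency matrix; the first three scores play the role of PC1-PC3
pca <- prcomp(freq)
head(pca$x[, 1:3])
```

In this toy setting the GC difference between the two groups should dominate the first component, which is the same kind of compositional signal that helps separate genomes in the real data.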

colnames(d$scaffolds)
##  [1] "scaffold"   "length"     "gc"         "H09.06"     "H11.05"    
##  [6] "H11.25"     "H12.13"     "H12.19"     "PC1"        "PC2"       
## [11] "PC3"        "essential"  "rRNA16S"    "pps_phylum" "pps_class" 
## [16] "pps_order"  "pps_family" "pps_genus"

The basic statistics of the full dataset can be summarised using the mmstats function.

mmstats(d, ncov = 5)
##                 General Stats
## n.scaffolds          30725.00
## GC.mean                 48.20
## N50                   2951.00
## Length.total      50450465.00
## Length.max          230584.00
## Length.mean           1642.00
## Coverage.H09.06          0.31
## Coverage.H11.05          1.92
## Coverage.H11.25        253.90
## Coverage.H12.13         48.17
## Coverage.H12.19         28.32
## Ess.total              794.00
## Ess.unique             108.00
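Most of these statistics are simple summaries, but N50 is worth spelling out: it is the scaffold length L such that scaffolds of length L or longer together contain at least half of the total assembly length. A minimal base R sketch follows; n50() is a hypothetical helper, and mmstats may handle ties or rounding slightly differently.

```r
# Hypothetical helper: N50 of a vector of scaffold lengths
n50 <- function(lengths) {
  sorted <- sort(lengths, decreasing = TRUE)
  cumulative <- cumsum(as.numeric(sorted))   # as.numeric avoids integer overflow
  total <- sum(as.numeric(lengths))
  sorted[which(cumulative >= total / 2)[1]]  # first scaffold reaching half the total
}

# Made-up lengths (not from the Holmes dataset): total 1500, half 750,
# and 500 + 400 = 900 is the first running sum to reach it, so N50 = 400
n50(c(100, 200, 300, 400, 500))
```

With the real data, n50(d$scaffolds$length) can be compared with the N50 reported by mmstats above.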

Propionivibrio

The combination of the coverage in samples H11.25 and H11.05 provides the cleanest separation of the two genomes and is therefore used for binning.

p <- mmplot(data = d, x = "H11.25", y = "H11.05", log.x = TRUE, log.y = TRUE, color = "essential", minlength = 3000)

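# To define the selection interactively, print the plot and click out a polygon:
# p
# sel <- mmplot_locator(p)

# Polygon coordinates from the original selection, hard-coded for reproducibility
sel <- data.frame(H11.25 = c(847, 2350, 7530, 8550, 2450, 974),
                  H11.05 = c(12.1, 46.2, 94.9, 64.7, 11.2, 6.7))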

mmplot_selection(p, sel) +
  theme(axis.line.x = element_line(), 
        axis.line.y = element_line())
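The selection is a polygon in the H11.25 vs H11.05 coverage plane, and a scaffold belongs to the bin if it lies inside that polygon. The base R sketch below shows one way to make that decision with the classic ray-casting test, applied to made-up scaffolds. Two assumptions are labelled here and in the code: the test is done on log10 coordinates because both axes are log-scaled, and minlength = 3000 is mimicked with a simple length filter. in_polygon() is a hypothetical helper; mmgenome's own selection functions may use a different point-in-polygon implementation.

```r
# Hypothetical helper. Ray casting: a point is inside the polygon if a horizontal
# ray starting at the point crosses the polygon boundary an odd number of times.
in_polygon <- function(px, py, vx, vy) {
  n <- length(vx)
  inside <- rep(FALSE, length(px))
  j <- n  # previous vertex; the polygon closes from the last vertex back to the first
  for (i in seq_len(n)) {
    # Edge (j, i) straddles the point's y value and lies to the right of the point
    crosses <- ((vy[i] > py) != (vy[j] > py)) &
      (px < (vx[j] - vx[i]) * (py - vy[i]) / (vy[j] - vy[i]) + vx[i])
    inside <- xor(inside, crosses)  # each crossing flips inside/outside
    j <- i
  }
  inside
}

# Polygon vertices copied from the selection above
sel <- data.frame(H11.25 = c(847, 2350, 7530, 8550, 2450, 974),
                  H11.05 = c(12.1, 46.2, 94.9, 64.7, 11.2, 6.7))

# Made-up scaffolds (not from the Holmes dataset)
toy <- data.frame(length = c(5000, 4000, 1000),
                  H11.25 = c(3000, 100, 3000),
                  H11.05 = c(30, 1, 30))

keep <- toy$length >= 3000  # assumption: mimics minlength = 3000
# Assumption: test in log10 space because the plot uses log-scaled axes
in_sel <- keep & in_polygon(log10(toy$H11.25), log10(toy$H11.05),
                            log10(sel$H11.25), log10(sel$H11.05))
in_sel
```

Only the first toy scaffold is both long enough and inside the polygon; the third lies inside the polygon but is removed by the length filter, so the result is TRUE, FALSE, FALSE.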