Introduction

This report documents Figure 2 in Albertsen et al., 2016: “Candidatus Propionivibrio aalborgensis”: a novel glycogen accumulating organism abundant in full-scale enhanced biological phosphorus removal plants.

Load the mmgenome package

In case you haven’t installed the mmgenome package, see the Load data example.

library("mmgenome")

Import data

The Rmarkdown file Load_data.Rmd describes how the data were loaded, and it can be imported using the mmimport function. However, the preprocessed data can also be downloaded directly from figshare: Holmes. Here we load the preprocessed data from figshare instead.

load("Holmes.RData")

Data overview

The object d contains information on scaffolds and on the essential genes within the scaffolds. For each scaffold, the dataset contains the following information: the columns H09.06, H11.05, H11.25, H12.13 and H12.19 contain the coverage from five different samples; PC1, PC2 and PC3 contain the coordinates of the first three principal components from a PCA of tetranucleotide frequencies; essential contains the taxonomic classification of each scaffold based on its essential genes; rRNA16S contains taxonomic information for scaffolds that have an associated 16S rRNA gene; the pps_ columns (pps_phylum to pps_genus) contain taxonomic information obtained using PhyloPythiaS+.

colnames(d$scaffolds)
##  [1] "scaffold"   "length"     "gc"         "H09.06"     "H11.05"    
##  [6] "H11.25"     "H12.13"     "H12.19"     "PC1"        "PC2"       
## [11] "PC3"        "essential"  "rRNA16S"    "pps_phylum" "pps_class" 
## [16] "pps_order"  "pps_family" "pps_genus"
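
The PC1 to PC3 columns come from a PCA of tetranucleotide frequencies. As a rough illustration of that idea, the sketch below counts overlapping tetranucleotides in randomly generated toy sequences and runs prcomp on the resulting frequency matrix. The sequences, the tetra_freq helper and all variable names are hypothetical, and only base R is used; this is not necessarily the procedure used in Load_data.Rmd or mmgenome, which may differ (for example in how frequencies are normalised).

```r
# Toy sketch (assumption: not the mmgenome implementation).
set.seed(1)
bases <- c("A", "C", "G", "T")

# All 256 possible tetranucleotides
kmers <- apply(expand.grid(bases, bases, bases, bases), 1, paste0, collapse = "")

# Hypothetical helper: relative frequency of each overlapping 4-mer in one sequence
tetra_freq <- function(seq) {
  starts <- seq_len(nchar(seq) - 3)
  observed <- substring(seq, starts, starts + 3)
  counts <- table(factor(observed, levels = kmers))
  as.numeric(counts) / length(observed)
}

# Hypothetical "scaffolds": 20 random sequences of 2000 bp each
toy_seqs <- replicate(20, paste0(sample(bases, 2000, replace = TRUE), collapse = ""))

# One row per sequence, one column per tetranucleotide
freqs <- t(sapply(toy_seqs, tetra_freq, USE.NAMES = FALSE))
colnames(freqs) <- kmers

# PCA; the first three columns of pca$x play the role of PC1, PC2 and PC3
pca <- prcomp(freqs)
head(pca$x[, 1:3])
```

Random sequences carry no real compositional signal, so the resulting components are not meaningful here; with real scaffolds, sequences from the same genome would be expected to lie close together in this space.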

The basic statistics of the full dataset can be summarised using the mmstats function.

mmstats(d, ncov = 5)
##                 General Stats
## n.scaffolds          30725.00
## GC.mean                 48.20
## N50                   2951.00
## Length.total      50450465.00
## Length.max          230584.00
## Length.mean           1642.00
## Coverage.H09.06          0.31
## Coverage.H11.05          1.92
## Coverage.H11.25        253.90
## Coverage.H12.13         48.17
## Coverage.H12.19         28.32
## Ess.total              794.00
## Ess.unique             108.00
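
By the usual definition, N50 is the scaffold length at which scaffolds of that length or longer contain at least half of the total assembly length. A minimal base R sketch of that calculation follows, using a hypothetical n50 helper and toy lengths; mmstats may compute the value differently internally.

```r
# Hypothetical helper (not part of mmgenome): N50 of a vector of scaffold lengths.
n50 <- function(lengths) {
  sorted <- sort(as.numeric(lengths), decreasing = TRUE)
  cumulative <- cumsum(sorted)
  # First (longest-first) scaffold at which the running total reaches half the assembly
  sorted[which(cumulative >= sum(sorted) / 2)[1]]
}

toy_lengths <- c(100, 200, 300, 400, 500)
n50(toy_lengths)
## [1] 400
```

With the data loaded, n50(d$scaffolds$length) can be compared with the N50 row above, assuming mmstats uses the same definition.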

Figure 2: Metagenome overview

# Coverage in two samples on log axes, colored by essential-gene classification
mmplot(data = d, x = "H11.25", y = "H12.19", log.x = TRUE, log.y = TRUE, color = "essential", minlength = 5000) +
  xlab("Coverage (2013-11-25)") +
  ylab("Coverage (2013-12-19)") +
  scale_y_log10(limits = c(1, 500)) +
  scale_x_log10(breaks = c(10, 100, 1000, 10000), limits = c(10, 10000)) +
  scale_size_area(breaks = c(10000, 50000, 200000), max_size = 10, name = "Scaffold length (bp)") +
  annotate("text", x = 5000, y = 80, label = "Propionivibrio", size = 3, color = "darkred") +
  annotate("text", x = 200, y = 160, label = "Accumulibacter", size = 3, color = "darkred") +
  scale_color_discrete(name = "Taxonomic classification") +
  theme(axis.text = element_text(size = 8),
        axis.title = element_text(size = 10),
        legend.title = element_text(size = 10),
        legend.text = element_text(size = 8),
        axis.line.x = element_line(),
        axis.line.y = element_line()
        )