This report documents Figure 2 in Albertsen et al., 2016: “Candidatus Propionivibrio aalborgensis”: a novel glycogen accumulating organism abundant in full-scale enhanced biological phosphorus removal plants.
In case you haven’t installed the mmgenome package, see the Load data example.
library("mmgenome")
The Rmarkdown file Load_data.Rmd describes the loading of the data and can be imported using the mmimport function. However, the preprocessed data can also be downloaded directly from figshare: Holmes. Hence, here we import the preprocessed data from figshare instead.
load("Holmes.RData")
The object d contains information on scaffolds and on the essential genes within the scaffolds. For each scaffold the dataset contains the following information: the columns H09.06, H11.05, H11.25, H12.13 and H12.19 contain the coverage in 5 different samples; PC1, PC2 and PC3 contain the coordinates of the first three principal components from a PCA on tetranucleotide frequencies; essential contains the taxonomic classification of each scaffold based on its essential genes; rRNA16S contains taxonomic information for scaffolds that have an associated 16S rRNA gene; and the pps_xxx columns (pps_phylum to pps_genus) contain taxonomic information obtained using PhyloPythiaS+.
colnames(d$scaffolds)
## [1] "scaffold" "length" "gc" "H09.06" "H11.05"
## [6] "H11.25" "H12.13" "H12.19" "PC1" "PC2"
## [11] "PC3" "essential" "rRNA16S" "pps_phylum" "pps_class"
## [16] "pps_order" "pps_family" "pps_genus"
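The PC1 to PC3 columns come from a PCA on tetranucleotide frequencies. As a rough illustration of the idea only, the sketch below counts overlapping 4-mers in a few random toy sequences and runs base R's prcomp on the resulting frequency table. The sequences, the helper tetra_freq and the lack of any reverse-complement merging or other normalisation are assumptions made for this example; the report does not state how the original coordinates were computed.

```r
# Minimal sketch on made-up sequences; not the pipeline used to build d.
set.seed(1)
bases <- c("A", "C", "G", "T")

# Six random toy "scaffolds" of 500 bp each (hypothetical data).
seqs <- replicate(6, paste(sample(bases, 500, replace = TRUE), collapse = ""))

# All 256 possible tetranucleotides.
kmers <- do.call(paste0, expand.grid(rep(list(bases), 4), stringsAsFactors = FALSE))

# Hypothetical helper: relative frequency of each overlapping 4-mer in one sequence.
tetra_freq <- function(s) {
  starts <- seq_len(nchar(s) - 3)
  words <- substring(s, starts, starts + 3)
  counts <- table(factor(words, levels = kmers))
  as.numeric(counts) / sum(counts)
}

# One row per scaffold, one column per tetranucleotide.
freqs <- t(vapply(seqs, tetra_freq, numeric(length(kmers)), USE.NAMES = FALSE))
rownames(freqs) <- paste0("scaffold_", seq_along(seqs))
colnames(freqs) <- kmers

# PCA on the centred frequency table; keep the first three components.
pca <- prcomp(freqs)
pcs <- pca$x[, 1:3]
print(pcs)
```

Scaffolds from the same genome tend to share a tetranucleotide signature, so they usually end up close together in these coordinates. That makes them a useful complement to coverage when separating genomes.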
The basic statistics of the full dataset can be summarised using the mmstats function.
mmstats(d, ncov = 5)
## General Stats
## n.scaffolds 30725.00
## GC.mean 48.20
## N50 2951.00
## Length.total 50450465.00
## Length.max 230584.00
## Length.mean 1642.00
## Coverage.H09.06 0.31
## Coverage.H11.05 1.92
## Coverage.H11.25 253.90
## Coverage.H12.13 48.17
## Coverage.H12.19 28.32
## Ess.total 794.00
## Ess.unique 108.00
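To make the length statistics in this table concrete, the following base R sketch computes the scaffold count, N50, and total, maximum and mean length for a small made-up vector of scaffold lengths. It is not the mmstats implementation; the toy data frame and the helper n50 are assumptions introduced for illustration.

```r
# Toy scaffold table; the column names mirror d$scaffolds, the values are made up.
toy <- data.frame(
  scaffold = 1:5,
  length   = c(10, 8, 6, 4, 2)
)

# Hypothetical helper: N50 is the length L such that scaffolds of
# length >= L together contain at least half of the total assembly length.
n50 <- function(lengths) {
  sorted <- sort(lengths, decreasing = TRUE)
  sorted[which(cumsum(sorted) >= sum(sorted) / 2)[1]]
}

summary_stats <- c(
  n.scaffolds  = nrow(toy),
  N50          = n50(toy$length),
  Length.total = sum(toy$length),
  Length.max   = max(toy$length),
  Length.mean  = mean(toy$length)
)
print(summary_stats)
```

The same relationship holds in the output above: Length.total divided by n.scaffolds (50450465 / 30725) gives the reported Length.mean of about 1642 bp.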
mmplot(data = d, x = "H11.25", y = "H12.19", log.x = TRUE, log.y = TRUE, color = "essential", minlength = 5000) +
  xlab("Coverage (2013-11-25)") +
  ylab("Coverage (2013-12-19)") +
  scale_y_log10(limits = c(1, 500)) +
  scale_x_log10(breaks = c(10, 100, 1000, 10000), limits = c(10, 10000)) +
  scale_size_area(breaks = c(10000, 50000, 200000), max_size = 10, name = "Scaffold length (bp)") +
  annotate("text", x = 5000, y = 80, label = "Propionivibrio", size = 3, color = "darkred") +
  annotate("text", x = 200, y = 160, label = "Accumulibacter", size = 3, color = "darkred") +
  scale_color_discrete(name = "Taxonomic classification") +
  theme(axis.text = element_text(size = 8),
        axis.title = element_text(size = 10),
        legend.title = element_text(size = 10),
        legend.text = element_text(size = 8),
        axis.line.x = element_line(),
        axis.line.y = element_line()
  )