This report documents the GWW dataset, from which multiple Comammox Nitrospira genome bins were extracted. See Daims et al., 2015: Complete Nitrification by Nitrospira Bacteria for further details.
library("mmgenome")
The Rmarkdown file Load_data.Rmd describes the loading of the data and can be imported using the mmimport function. However, the preprocessed data can also be downloaded directly from figshare: Daims_VetMed. Hence, here we import the preprocessed data from figshare instead.
load("Daims_GWW.RData")
The object d contains information on the scaffolds and on the essential genes within them. For each scaffold the dataset contains the following information: the columns HPD and HPF1 contain the coverage information from two different samples; PC1, PC2 and PC3 contain the coordinates of the first three principal components from a PCA on tetranucleotide frequencies; essential contains taxonomic information for each scaffold based on classification of its essential genes; rRNA16S contains taxonomic information on scaffolds that have an associated 16S rRNA gene.
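To make the PC1 to PC3 columns more concrete, here is a minimal sketch of how principal components can be derived from tetranucleotide frequencies. It uses base R only and randomly generated toy sequences; the helper tetra_freq and all object names are hypothetical, and the exact normalisation used when the dataset was prepared is not described here, so treat this as an illustration rather than the actual preprocessing.

```r
# Toy sequences standing in for scaffolds (randomly generated, hypothetical data).
set.seed(1)
bases <- c("A", "C", "G", "T")
seqs <- replicate(20, paste(sample(bases, 500, replace = TRUE), collapse = ""))

# All 256 possible tetranucleotides.
kmers <- apply(expand.grid(bases, bases, bases, bases), 1, paste, collapse = "")

# Count overlapping 4-mers in one sequence and return relative frequencies.
tetra_freq <- function(s) {
  starts <- seq_len(nchar(s) - 3)
  words <- substring(s, starts, starts + 3)
  counts <- table(factor(words, levels = kmers))
  as.numeric(counts) / length(words)
}

# One row per sequence, one column per tetranucleotide.
freq <- t(sapply(seqs, tetra_freq, USE.NAMES = FALSE))
colnames(freq) <- kmers

# PCA on the frequency matrix; keep the first three components.
pca <- prcomp(freq)
pcs <- pca$x[, 1:3]
head(pcs)
```

Each row of pcs then plays the role of the PC1, PC2 and PC3 values stored per scaffold.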
colnames(d$scaffolds)
## [1] "scaffold" "length" "gc" "HPD" "HPF1"
## [6] "PC1" "PC2" "PC3" "essential" "rRNA16S"
mmstats(d, ncov = 2)
## General Stats
## n.scaffolds 183703.00
## GC.mean 57.40
## N50 4254.00
## Length.total 568838836.00
## Length.max 614729.00
## Length.mean 3096.50
## Coverage.HPD 13.31
## Coverage.HPF1 15.63
## Ess.total 11625.00
## Ess.unique 109.00
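As a quick sanity check on these numbers, the sketch below shows the usual definition of N50 (the length of the scaffold at which the cumulative length, counted from the longest scaffold down, first reaches half of the total) and recomputes the mean length from the reported totals. The function n50 and the toy lengths are hypothetical; I am assuming mmstats follows the standard definition, which is not confirmed here.

```r
# N50 under the standard definition (assumed, not taken from mmstats itself).
n50 <- function(lengths) {
  sorted <- sort(as.numeric(lengths), decreasing = TRUE)
  # First scaffold at which the running total reaches half the assembly length.
  sorted[which(cumsum(sorted) >= sum(sorted) / 2)[1]]
}

# Toy example: total length 1500, half is 750, reached at the second-longest scaffold.
toy_lengths <- c(100, 200, 300, 400, 500)
n50(toy_lengths)  # 400

# Mean scaffold length from the totals reported above.
round(568838836 / 183703, 1)  # 3096.5
```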
Scaffolds with essential genes classified to Nitrospira are extracted for a cleaner visualisation later.
nit <- subset(d$scaffolds, essential == "Nitrospirae")
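One detail worth knowing about subset is that rows where the condition evaluates to NA are dropped. The toy example below illustrates this; the data frame is made up, and it assumes that scaffolds without essential genes carry NA in the essential column, which I have not verified for this dataset.

```r
# Hypothetical scaffold table; NA marks a scaffold with no classified essential gene.
toy <- data.frame(
  scaffold = c("s1", "s2", "s3", "s4"),
  length = c(5000, 1200, 8000, 3000),
  essential = c("Nitrospirae", NA, "Proteobacteria", "Nitrospirae"),
  stringsAsFactors = FALSE
)

# Same pattern as above: keep only scaffolds classified as Nitrospirae.
toy_nit <- subset(toy, essential == "Nitrospirae")

nrow(toy_nit)        # 2
sum(toy_nit$length)  # 8000
```

Running the same two summary lines on nit gives the number and the total length of the extracted scaffolds.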
The assembly is decent even though it is from a full-scale sample with a high degree of micro-diversity.
Here I’ve highlighted the scaffolds that contain essential genes from Nitrospira. There is a large number of abundant Nitrospira! Some assemble very nicely, while others are affected by micro-diversity.