Introduction

This report documents the VetMed dataset, from which a comammox Nitrospira genome bin was extracted. See Daims et al., 2015: Complete Nitrification by Nitrospira Bacteria for further details.

Load the mmgenome package

library("mmgenome")

Import data

The Rmarkdown file Load_data.Rmd describes how the data are loaded, and it can be imported using the mmimport function. However, the preprocessed data can also be downloaded directly from figshare: Daims_VetMed. Here we load the preprocessed data from figshare instead.

load("Daims_VetMed.RData")
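Because load() restores objects under the names they were saved with, it can help to check what an .RData file contains before using it. The following is a minimal sketch in base R using a temporary toy file; the object d_toy and its contents are made up for illustration and are not part of Daims_VetMed.RData.

```r
# Toy stand-in for an .RData file (hypothetical object, not the real dataset).
d_toy <- list(scaffolds = data.frame(scaffold = c("s1", "s2"),
                                     length = c(1200L, 3400L)))
rdata_path <- tempfile(fileext = ".RData")
save(d_toy, file = rdata_path)

# Load into a separate environment so nothing in the workspace is overwritten.
env <- new.env()
loaded_names <- load(rdata_path, envir = env)

# load() returns the names of the restored objects.
print(loaded_names)
str(env$d_toy, max.level = 1)
```

The same pattern applied to the downloaded file would show which objects it provides before they are used in the rest of the report.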

Data overview

The object d contains information on the scaffolds and on the essential genes within them. For each scaffold, the dataset contains the following information: the columns VM23 and VMPS contain the coverage in two different samples; PC1, PC2 and PC3 contain the coordinates on the first three principal components from a PCA of tetranucleotide frequencies; essential contains the taxonomic classification of each scaffold based on its essential genes; rRNA16S contains taxonomic information for scaffolds that have an associated 16S rRNA gene; esomy and esomx are the ESOM coordinates of each scaffold.

colnames(d$scaffolds)
##  [1] "scaffold"  "length"    "gc"        "VM23"      "VMPS"     
##  [6] "PC1"       "PC2"       "PC3"       "essential" "rRNA16S"  
## [11] "esomy"     "esomx"

The basic statistics of the full dataset can be summarised using the mmstats function.

mmstats(d, ncov = 2)
##               General Stats
## n.scaffolds        68129.00
## GC.mean               60.60
## N50                 3010.00
## Length.total   175731459.00
## Length.max        439284.00
## Length.mean         2579.40
## Coverage.VM23         12.43
## Coverage.VMPS          7.24
## Ess.total           3583.00
## Ess.unique           109.00
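To make the N50 and Length.mean rows concrete, here is a small base R sketch on made-up scaffold lengths. It uses the common definition of N50 (the length of the scaffold at which the cumulative length, counted from the longest scaffold down, first reaches half of the total). Whether mmstats uses exactly this convention is an assumption, and the helper n50 is hypothetical rather than part of mmgenome.

```r
# Hypothetical helper: N50 under the common "cumulative length >= 50%" definition.
n50 <- function(lengths) {
  sorted <- sort(lengths, decreasing = TRUE)
  # Index of the first scaffold at which the running total reaches half the assembly.
  idx <- which(cumsum(sorted) >= sum(sorted) / 2)[1]
  sorted[idx]
}

# Made-up scaffold lengths, for illustration only.
toy_lengths <- c(100, 200, 300, 400, 500)
toy_n50 <- n50(toy_lengths)
print(toy_n50)

# The reported mean length is the total length divided by the number of scaffolds.
mean_length <- 175731459 / 68129
print(round(mean_length, 1))
```

For the toy vector, the two longest scaffolds (500 + 400) are the first to cover at least half of the 1500 bp total, so the N50 is 400.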

Extract Nitrospira scaffolds

Scaffolds whose essential genes are classified as Nitrospirae are extracted for a cleaner visualisation later.

nit <- subset(d$scaffolds, essential == "Nitrospirae")
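One detail of subset() is worth knowing here: rows where the condition evaluates to NA are dropped, so scaffolds without an essential-gene classification are excluded automatically. The sketch below shows this on a made-up data frame; the values are illustrative, and it is an assumption that unclassified scaffolds are stored as NA in d$scaffolds.

```r
# Made-up scaffold table, for illustration only.
toy <- data.frame(
  scaffold  = c("s1", "s2", "s3", "s4"),
  length    = c(5000, 1200, 8000, 300),
  essential = c("Nitrospirae", NA, "Nitrospirae", "Proteobacteria"),
  stringsAsFactors = FALSE
)

# subset() keeps only rows where the condition is TRUE; NA rows are dropped.
nit_toy <- subset(toy, essential == "Nitrospirae")

# Quick sanity checks: how many scaffolds were kept, and their combined length.
print(nrow(nit_toy))
print(sum(nit_toy$length))
```

Checking the number of rows and the total length of the extracted scaffolds is a quick way to confirm that the filter matched the intended label.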

Overview plot

The assembly is decent even though it is from a full-scale sample with a high degree of micro-diversity.