R/amp_subset_samples.R
amp_subset_samples.Rd
Subsets the data in ampvis2 objects based on metadata and returns the subsetted object.
amp_subset_samples(
  data,
  ...,
  minreads = 0,
  rarefy = NULL,
  normalise = FALSE,
  removeAbsents = TRUE
)
Argument | Description |
---|---|
data | (required) Data list as loaded with |
... | Logical expression indicating elements or rows to keep in the metadata. Missing values are treated as |
minreads | Minimum number of reads per sample. Samples below this value will be removed initially. (default: 0) |
rarefy | Rarefy species richness to this value by using |
normalise | (logical) Normalise the OTU read counts to 100 (i.e. percent) per sample BEFORE the subset. (default: FALSE) |
removeAbsents | (logical) Whether to remove OTUs that have 0 read abundance in all samples after the subset. (default: TRUE) |
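The effect of minreads and removeAbsents can be illustrated on a toy abundance matrix in base R. This is only a sketch of the idea, not the package's implementation; the matrix, the threshold, and the variable names are hypothetical.

```r
# Hypothetical abundance matrix: OTUs in rows, samples in columns
abund <- matrix(
  c(60, 0, 40,
    40, 0, 60,
     0, 5,  0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("OTU1", "OTU2", "OTU3"), c("S1", "S2", "S3"))
)

# minreads-style filter: drop samples whose total read count is below the threshold
minreads <- 10
keep_samples <- colSums(abund) >= minreads
filtered <- abund[, keep_samples, drop = FALSE]

# removeAbsents-style filter: drop OTUs with 0 reads in all remaining samples
keep_otus <- rowSums(filtered) > 0
filtered <- filtered[keep_otus, , drop = FALSE]

print(filtered)
```

Here S2 has only 5 reads in total and is removed first; OTU3 was present only in S2, so it ends up with 0 reads everywhere and is removed as well.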
A modified ampvis2 object
The subset is performed on the metadata by subset(), and the abundance and taxonomy tables are then adjusted accordingly.
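The two steps described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the function's actual code: the metadata, the abundance matrix, and all variable names are made up. It also shows that base R's subset() drops rows where the condition evaluates to NA.

```r
# Hypothetical metadata with one missing Plant value
metadata <- data.frame(
  SampleID = c("S1", "S2", "S3", "S4"),
  Plant = c("Aalborg West", "Aalborg East", NA, "Other"),
  stringsAsFactors = FALSE
)

# Hypothetical abundance matrix with one column per sample
abund <- matrix(
  1:8,
  nrow = 2,
  dimnames = list(c("OTU1", "OTU2"), metadata$SampleID)
)

# Step 1: subset the metadata; the condition is NA for S3, so S3 is dropped
meta_sub <- subset(metadata, Plant == "Aalborg West" | Plant == "Aalborg East")

# Step 2: keep only the abundance columns that match the remaining samples
abund_sub <- abund[, meta_sub$SampleID, drop = FALSE]

print(meta_sub)
print(abund_sub)
```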
By default, some plotting functions (for example amp_heatmap, amp_timeseries, and more) automatically normalise the raw read counts in the abundance matrix (transform them to percentages). This means that the relative abundances shown will be calculated based on the taxa remaining after the subset, not including the removed taxa, if any. To circumvent this, set normalise = TRUE when subsetting with the amp_subset_taxa and amp_subset_samples functions, and then set normalise = FALSE in the plotting function. This will transform the OTU counts to relative abundances BEFORE the subset, and setting normalise = FALSE will skip the transformation in the plotting function; see the example below.
data("MiDAS")
subsettedData <- amp_subset_samples(MiDAS,
  Plant %in% c("Aalborg West", "Aalborg East"),
  normalise = TRUE
)
amp_heatmap(subsettedData,
  group_by = "Plant",
  tax_aggregate = "Phylum",
  tax_add = "Genus",
  normalise = FALSE
)
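Why the order of normalising and subsetting matters is easiest to see when a taxon is removed, as with amp_subset_taxa. The following base R sketch uses a single hypothetical sample with made-up counts; it is an illustration of the arithmetic only, not of the package internals.

```r
# Hypothetical read counts for one sample
counts <- c(OTU1 = 50, OTU2 = 30, OTU3 = 20)
kept_otus <- c("OTU1", "OTU2")  # OTU3 is removed by the subset

# Normalise BEFORE the subset: percentages refer to all reads in the sample
pct_all <- counts / sum(counts) * 100
pct_before_subset <- pct_all[kept_otus]

# Normalise AFTER the subset: percentages refer only to the remaining reads
kept <- counts[kept_otus]
pct_after_subset <- kept / sum(kept) * 100

print(pct_before_subset)  # OTU1 = 50, OTU2 = 30
print(pct_after_subset)   # OTU1 = 62.5, OTU2 = 37.5
```

Normalising first keeps the percentages relative to the full sample, which is what normalise = TRUE in the subset function followed by normalise = FALSE in the plotting function is meant to preserve.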
McMurdie, P.J. & Holmes, S. (2014). Waste not, want not: Why
rarefying microbiome data is inadmissible. PLoS Comput Biol
10(4): e1003531. DOI:10.1371/journal.pcbi.1003531
Kasper Skytte Andersen ksa@bio.aau.dk
Mads Albertsen MadsAlbertsen85@gmail.com
# Load example data
data("MiDAS")

# Show a short summary about the data by simply typing the name of the object in the console
MiDAS
#> ampvis2 object with 5 elements.
#> Summary of OTU table:
#>      Samples         OTUs  Total#Reads    Min#Reads    Max#Reads Median#Reads
#>          658        14969     20890850        10480        46264        31800
#>    Avg#Reads
#>     31749.01
#>
#> Assigned taxonomy:
#>       Kingdom        Phylum         Class         Order        Family
#>   14969(100%) 14477(96.71%) 12737(85.09%) 11470(76.63%)  9841(65.74%)
#>         Genus       Species
#>   7380(49.3%)     28(0.19%)
#>
#> Metadata variables: 5
#>  SampleID, Plant, Date, Year, Period

# Keep only samples containing Aalborg West or East in the Plant column
MiDASsubset <- amp_subset_samples(MiDAS, Plant %in% c("Aalborg West", "Aalborg East"))

# Summary
MiDASsubset
#> ampvis2 object with 5 elements.
#> Summary of OTU table:
#>      Samples         OTUs  Total#Reads    Min#Reads    Max#Reads Median#Reads
#>           68         9457      2072678        17772        44326      30962.5
#>    Avg#Reads
#>     30480.56
#>
#> Assigned taxonomy:
#>      Kingdom       Phylum        Class        Order       Family        Genus
#>   9457(100%) 9240(97.71%) 8305(87.82%) 7665(81.05%) 6767(71.56%) 5244(55.45%)
#>      Species
#>    23(0.24%)
#>
#> Metadata variables: 5
#>  SampleID, Plant, Date, Year, Period

# Keep only samples containing Aalborg West or East in the Plant column
# and remove the sample "16SAMP-749". Remove any sample(s) with less than 20000 total reads
MiDASsubset2 <- amp_subset_samples(MiDAS,
  Plant %in% c("Aalborg West", "Aalborg East") & !SampleID %in% c("16SAMP-749"),
  minreads = 20000
)

# Summary
MiDASsubset2
#> ampvis2 object with 5 elements.
#> Summary of OTU table:
#>      Samples         OTUs  Total#Reads    Min#Reads    Max#Reads Median#Reads
#>           64         9368      1987574        21472        44326      31413.5
#>    Avg#Reads
#>     31055.84
#>
#> Assigned taxonomy:
#>      Kingdom       Phylum        Class        Order       Family        Genus
#>   9368(100%) 9154(97.72%) 8222(87.77%) 7589(81.01%) 6699(71.51%) 5194(55.44%)
#>      Species
#>    23(0.25%)
#>
#> Metadata variables: 5
#>  SampleID, Plant, Date, Year, Period