This function reads an OTU-table and the corresponding sample metadata, and returns a list for use in all ampvis2 functions. Data must therefore be loaded with amp_load before any other ampvis2 function can be used.

amp_load(
  otutable,
  metadata = NULL,
  fasta = NULL,
  tree = NULL,
  pruneSingletons = FALSE,
  ...
)

Arguments

otutable

(required) OTU-table with the read counts of all OTUs, where the last 7 columns are the taxonomy (Kingdom -> Species). Can be a data frame, matrix, or path to a delimited text file or Excel file, which will be read using fread or read_excel, respectively.

metadata

(recommended) Sample metadata with any information about the samples. If none is provided, dummy metadata will be created. Can be a data frame, matrix, or path to a delimited text file or Excel file, which will be read using fread or read_excel, respectively.

fasta

(optional) Path to a FASTA file with reference sequences for all OTUs in the OTU-table. (default: NULL)

tree

(optional) Phylogenetic tree of class "phylo" as loaded with read.tree. (default: NULL)

pruneSingletons

(logical) Remove OTUs only observed once in all samples. (default: FALSE)

...

(optional) If otutable and/or metadata is a path to a delimited text file, then additional arguments are passed on to fread.

Value

A list of class "ampvis2" with 3 to 5 elements.

Details

The amp_load function validates and corrects the provided data frames in different ways to make them suitable for the rest of the ampvis2 functions. For this to work properly, the provided data frames must meet the requirements described in the following sections.

The OTU-table

The OTU-table contains information about the OTUs, their assigned taxonomy and their read counts in each sample. The provided OTU-table must be a data frame with the following requirements:

  • The rows are OTU IDs and the columns are samples.

  • The last 7 columns are the corresponding taxonomy assigned to the OTUs, named "Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species".

  • The OTU IDs are expected to be in either the rownames of the data frame or in a column called "OTU". Otherwise the function will stop with a message.

  • The column names of the data frame are the sample IDs, exactly matching those in the metadata, (and the last 7 columns named Kingdom -> Species, of course). Thus, the first row should contain the read counts of one of the OTUs in each sample, NOT the sample IDs and taxonomy.

  • Generally avoid special characters and spaces in row- and column names.

A minimal example is available with data("example_otutable").
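
The requirements above can be checked by hand before calling amp_load. The following is a minimal sketch in base R, assuming a hypothetical two-sample table; the object otutable and the helper check_otutable() are illustrative names only and are not part of ampvis2, and the checks are not necessarily the ones amp_load performs internally.

```r
# Hypothetical toy OTU-table: 2 sample columns followed by the 7 taxonomy columns.
tax_levels <- c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")

otutable <- data.frame(
  S1 = c(10, 0, 3),
  S2 = c(5, 7, 0),
  Kingdom = "k__Bacteria",
  Phylum = c("p__Chloroflexi", "p__Actinobacteria", "p__Firmicutes"),
  Class = "c__",
  Order = "o__",
  Family = "f__",
  Genus = "g__",
  Species = "s__",
  row.names = c("OTU_1", "OTU_2", "OTU_3"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Illustrative helper (NOT an ampvis2 function): returns TRUE if the layout
# matches the requirements listed above.
check_otutable <- function(x) {
  n <- ncol(x)
  if (n < 8) return(FALSE) # at least one sample column plus 7 taxonomy columns
  # The last 7 columns must be Kingdom -> Species, in that order.
  last7_ok <- identical(colnames(x)[(n - 6):n], tax_levels)
  # OTU IDs must be in a column called "OTU" or in non-default rownames.
  ids_ok <- "OTU" %in% colnames(x) ||
    !identical(rownames(x), as.character(seq_len(nrow(x))))
  # All sample columns must hold read counts, i.e. be numeric.
  sample_cols <- setdiff(colnames(x)[seq_len(n - 7)], "OTU")
  counts_ok <- all(vapply(x[sample_cols], is.numeric, logical(1)))
  last7_ok && ids_ok && counts_ok
}

check_otutable(otutable)
```

Setting check.names = FALSE keeps the sample IDs exactly as written, which matters when they have to match the metadata.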

The metadata

The metadata contains additional information about the samples, for example where each sample was taken, date, pH, treatment etc., which is used to compare and group the samples during analysis. The metadata can contain any number of columns (variables), but there are a few requirements:

  • The sample IDs must be in the first column. These sample IDs must match exactly to those in the OTU-table.

  • Column classes matter: categorical variables should be loaded as character (as.character()) or factor (as.factor()), and continuous variables as numeric (as.numeric()). See below.

  • Generally avoid special characters and spaces in row- and column names.

If, for example, a column is named "Year" and the entries are simply entered as numbers (2011, 2012, 2013 etc.), then R will automatically treat them as numerical values (as.numeric()) and therefore treat the column as a continuous variable, even though it is a categorical variable and should be loaded with as.factor() or as.character() instead. This has consequences for the analysis, as R treats the two kinds of variables differently. Therefore either use the colClasses = argument when loading a csv file or col_types = when loading an Excel file, or manually adjust the column classes afterwards with, for example, metadata$Year <- as.character(metadata$Year).
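
The effect of column classes can be illustrated with a small sketch in base R. The inline csv text and the object names are hypothetical, and read.csv is used here only to keep the example self-contained; the same colClasses argument can be passed when loading with other readers that support it.

```r
# Hypothetical metadata, written inline so the example is self-contained.
csv_text <- "SampleID,Plant,Year
S1,Aalborg E,2011
S2,Aalborg W,2012
S3,Aalborg E,2013"

# Default import: Year is guessed to be a number, i.e. a continuous variable.
metadata <- read.csv(text = csv_text, stringsAsFactors = FALSE)
class(metadata$Year)

# Option 1: declare the column classes while loading.
metadata_typed <- read.csv(
  text = csv_text,
  colClasses = c(SampleID = "character", Plant = "character", Year = "character")
)
class(metadata_typed$Year)

# Option 2: adjust the column class afterwards.
metadata$Year <- as.character(metadata$Year)
class(metadata$Year)
```

Both options give the same categorical Year column; declaring the classes at load time simply avoids a second step.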

The amp_load function will automatically use the sample IDs in the first column as rownames, but it is important to also have an actual column with sample IDs, so that it is possible to, for example, group by that column during analysis. Any samples that do not match between the otutable and the metadata will be removed.
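
To see in advance which samples would be dropped, the sample IDs of the two tables can be compared by hand. This is a minimal sketch in base R with hypothetical sample IDs; it only illustrates the matching idea and is not the code amp_load uses internally.

```r
# Hypothetical sample IDs: "S3" exists only in the OTU-table,
# "S4" only in the metadata.
otu_samples <- c("S1", "S2", "S3")
metadata <- data.frame(
  SampleID = c("S1", "S2", "S4"),
  Plant = c("Aalborg E", "Aalborg W", "Aalborg W"),
  stringsAsFactors = FALSE
)

# The first metadata column holds the sample IDs.
meta_samples <- as.character(metadata[[1]])

shared <- intersect(otu_samples, meta_samples)           # kept
only_in_otutable <- setdiff(otu_samples, meta_samples)   # would be removed
only_in_metadata <- setdiff(meta_samples, otu_samples)   # would be removed

# Use the IDs as rownames while keeping the SampleID column itself,
# then keep only the samples present in both tables.
rownames(metadata) <- meta_samples
metadata_matched <- metadata[shared, , drop = FALSE]
metadata_matched
```

If only_in_otutable or only_in_metadata is unexpectedly non-empty, the usual cause is a typo or a reformatted ID in one of the two files.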

A minimal example is available with data("example_metadata").

Reference sequences

A FASTA file with the raw sequences can be loaded as well, which will then be available in the refseq element of the ampvis2 object. These sequences are not used by any ampvis2 function other than the two subset functions amp_subset_samples and amp_subset_taxa, so that they can be exported with amp_export_fasta. The FASTA file is loaded with the read.FASTA function from the ape package.

Examples

# Be sure to use the correct function to load your .csv files, see ?read.table()
if (FALSE) {
library(ampvis2)
library(readxl) # provides read_excel()

# Read the OTU-table as a data frame. It is important to set check.names = FALSE.
myotutable <- read.delim("data/otutable.txt", check.names = FALSE)

# Read the metadata, often an Excel sheet. If .csv, make sure the first column will be kept and NOT
# loaded as rownames! The top row should be loaded as column names.
mymetadata <- read_excel("data/metadata.xlsx", col_names = TRUE)

# Combine the data with amp_load() to make it compatible with ampvis2 functions.
# The fasta argument is optional and only needed to load reference sequences.
d <- amp_load(
  otutable = myotutable,
  metadata = mymetadata,
  fasta = "path/to/fastafile.fa" # optional
)

# Show a short summary about the data by simply typing the name of the object in the console
d
}

# Minimal example metadata:
data("example_metadata")
example_metadata
#> # A tibble: 8 x 4
#>   SampleID    Plant     Date                 Year
#>   <chr>       <chr>     <dttm>              <dbl>
#> 1 16SAMP_3893 Aalborg E 2014-02-06 00:00:00  2014
#> 2 16SAMP_3913 Aalborg E 2014-07-03 00:00:00  2014
#> 3 16SAMP_3941 Aalborg E 2014-08-19 00:00:00  2014
#> 4 16SAMP_3946 Aalborg E 2014-11-13 00:00:00  2014
#> 5 16SAMP_3953 Aalborg W 2014-02-04 00:00:00  2014
#> 6 16SAMP_4591 Aalborg W 2014-05-05 00:00:00  2014
#> 7 16SAMP_4597 Aalborg W 2014-08-18 00:00:00  2014
#> 8 16SAMP_4603 Aalborg W 2014-11-12 00:00:00  2014

# Minimal example otutable:
data("example_otutable")
example_otutable
#>        16SAMP_3893 16SAMP_3913 16SAMP_3941 16SAMP_3946 16SAMP_3953 16SAMP_4591
#> OTU_1           23          15         273          51         127         190
#> OTU_2          675         565         331         411         430         780
#> OTU_3          780         733         405         199        1346        1114
#> OTU_4          272         233        1434         256         736        1338
#> OTU_5          560         339         509         598         223         145
#> OTU_6          906         766         133         390         232        1458
#> OTU_7          297         218         418         130        1354         198
#> OTU_8           28           8         155          72         156         101
#> OTU_9            0           0           9           0          19          25
#> OTU_10         373         256          19         415          43         102
#>        16SAMP_4597 16SAMP_4603     Kingdom            Phylum
#> OTU_1          220          83 k__Bacteria    p__Chloroflexi
#> OTU_2          699         820 k__Bacteria p__Actinobacteria
#> OTU_3         1630         112 k__Bacteria p__Actinobacteria
#> OTU_4         1224         564 k__Bacteria p__Proteobacteria
#> OTU_5          212        1619 k__Bacteria    p__Chloroflexi
#> OTU_6          560         287 k__Bacteria     p__Firmicutes
#> OTU_7          283         116 k__Bacteria p__Actinobacteria
#> OTU_8          151          25 k__Bacteria    p__Nitrospirae
#> OTU_9           58           0 k__Bacteria  p__Bacteroidetes
#> OTU_10          73         138 k__Bacteria  p__Bacteroidetes
#>                        Class                 Order                Family
#> OTU_1              c__SJA-15           o__C10_SB1A           f__C10_SB1A
#> OTU_2      c__Actinobacteria      o__Micrococcales f__Intrasporangiaceae
#> OTU_3      c__Acidimicrobiia   o__Acidimicrobiales    f__Microthricaceae
#> OTU_4  c__Betaproteobacteria      o__Rhodocyclales     f__Rhodocyclaceae
#> OTU_5        c__Anaerolineae     o__Anaerolineales    f__Anaerolineaceae
#> OTU_6             c__Bacilli    o__Lactobacillales  f__Carnobacteriaceae
#> OTU_7      c__Acidimicrobiia   o__Acidimicrobiales    f__Microthricaceae
#> OTU_8          c__Nitrospira      o__Nitrospirales     f__Nitrospiraceae
#> OTU_9    c__Sphingobacteriia o__Sphingobacteriales     f__Saprospiraceae
#> OTU_10   c__Sphingobacteriia o__Sphingobacteriales     f__Saprospiraceae
#>                              Genus         Species
#> OTU_1     g__Candidatus Amarilinum             s__
#> OTU_2              g__Tetrasphaera             s__
#> OTU_3     g__Candidatus Microthrix             s__
#> OTU_4             g__Dechloromonas             s__
#> OTU_5  g__Candidatus Villogracilis             s__
#> OTU_6              g__Trichococcus             s__
#> OTU_7     g__Candidatus Microthrix             s__
#> OTU_8                g__Nitrospira s__sublineage I
#> OTU_9                 g__QEDR3BF09             s__
#> OTU_10                     g__MK04             s__