This function reads an OTU-table and corresponding sample metadata, and returns a list for use in all ampvis2 functions. It is therefore required to load data with amp_load before any other ampvis functions can be used.

amp_load(
  otutable,
  metadata = NULL,
  fasta = NULL,
  tree = NULL,
  pruneSingletons = FALSE,
  ...
)

## Arguments

• otutable: (required) OTU-table with the read counts of all OTUs, where the last 7 columns are the taxonomy (Kingdom -> Species). Can be a data frame, matrix, or path to a delimited text file or excel file, which will be read using either fread or read_excel, respectively.

• metadata: (recommended) Sample metadata with any information about the samples. If none is provided, dummy metadata will be created. Can be a data frame, matrix, or path to a delimited text file or excel file, which will be read using either fread or read_excel, respectively.

• fasta: (optional) Path to a FASTA file with reference sequences for all OTUs in the OTU-table. (default: NULL)

• tree: (optional) Phylogenetic tree of class "phylo" as loaded with read.tree. (default: NULL)

• pruneSingletons: (logical) Remove OTUs only observed once in all samples. (default: FALSE)

• ...: (optional) If otutable and/or metadata is a path to a delimited text file, then additional arguments are passed on to fread.

## Value

A list of class "ampvis2" with 3 to 5 elements.

## Details

The amp_load function validates and corrects the provided data frames in different ways to make them suitable for the rest of the ampvis functions. For this to work properly, the provided data frames must meet the requirements described in the following sections.

## The OTU-table

The OTU-table contains information about the OTUs, their assigned taxonomy and their read counts in each sample. The provided OTU-table must be a data frame with the following requirements:

• The rows are OTU IDs and the columns are samples.

• The last 7 columns are the corresponding taxonomy assigned to the OTUs, named "Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species".

• The OTU IDs are expected to be in either the rownames of the data frame or in a column called "OTU". Otherwise the function will stop with a message.

• The column names of the data frame are the sample IDs, exactly matching those in the metadata, (and the last 7 columns named Kingdom -> Species, of course). Thus, the first row should contain the read counts of one of the OTUs in each sample, NOT the sample IDs and taxonomy.

• Generally avoid special characters and spaces in row- and column names.

A minimal example is available with data("example_otutable").
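The requirements listed above can be checked by hand before loading. The following is a minimal sketch in base R on a made-up table with two samples; the object names (otutable, tax_levels) and all values are illustrative assumptions, and the checks are not necessarily the ones amp_load itself performs.

```r
# Hypothetical toy OTU-table: 3 OTUs, 2 samples, followed by the 7 taxonomy columns.
tax_levels <- c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")

otutable <- data.frame(
  OTU     = c("OTU_1", "OTU_2", "OTU_3"),
  S1      = c(23, 675, 780),
  S2      = c(15, 565, 733),
  Kingdom = "k__Bacteria",
  Phylum  = c("p__Chloroflexi", "p__Actinobacteria", "p__Actinobacteria"),
  Class   = "",
  Order   = "",
  Family  = "",
  Genus   = "",
  Species = "",
  check.names = FALSE,       # keep sample IDs exactly as written
  stringsAsFactors = FALSE
)

# The last 7 columns must be the taxonomy, named Kingdom -> Species.
tax_ok <- identical(tail(colnames(otutable), 7), tax_levels)

# OTU IDs must be in the rownames or in a column called "OTU" (the latter here).
ids_ok <- "OTU" %in% colnames(otutable)

# Every remaining column is a sample and should hold numeric read counts.
sample_cols <- setdiff(colnames(otutable), c("OTU", tax_levels))
counts_ok <- all(vapply(otutable[sample_cols], is.numeric, logical(1)))
```

If all three flags are TRUE, the table has the layout described above; sample_cols then holds the sample IDs that need to match the metadata.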

## The metadata

The metadata contains additional information about the samples, for example where each sample was taken, date, pH, treatment etc., which is used to compare and group the samples during analysis. The amount of information in the metadata is unlimited and it can contain any number of columns (variables). There are, however, a few requirements:

• The sample IDs must be in the first column. These sample IDs must match exactly to those in the OTU-table.

• Column classes matter: categorical variables should be loaded as either as.character() or as.factor(), and continuous variables as as.numeric(). See below.

• Generally avoid special characters and spaces in row- and column names.

If, for example, a column is named "Year" and the entries are simply entered as numbers (2011, 2012, 2013 etc.), then R will automatically consider these as numerical values (as.numeric()) and therefore the column as a continuous variable, although it is a categorical variable and should be loaded as.factor() or as.character() instead. This has consequences for the analysis, as R treats them differently. Therefore either use the colClasses = argument when loading a csv file or col_types = when loading an excel file, or manually adjust the column classes afterwards with, for example, metadata$Year <- as.character(metadata$Year).
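Both ways of fixing the class of a "Year" column can be sketched in base R as follows. The CSV content is made up for illustration and is read from a string rather than a file, so that the example is self-contained.

```r
# Hypothetical metadata as CSV text; in practice this would be a file path.
csv_text <- "SampleID,Plant,Year\nS1,Aalborg E,2011\nS2,Aalborg W,2012"

# Without colClasses, R guesses that Year is a number (a continuous variable).
guessed <- read.csv(text = csv_text, stringsAsFactors = FALSE)
guessed_class <- class(guessed$Year)

# Option 1: declare the class of the column while reading.
metadata <- read.csv(
  text = csv_text,
  colClasses = c(Year = "character"),
  stringsAsFactors = FALSE
)

# Option 2: convert the column after loading.
guessed$Year <- as.character(guessed$Year)
```

After either option, Year is a character column and will be treated as categorical.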

The amp_load function will automatically use the sample IDs in the first column as rownames, but it is important to also have an actual column with sample IDs, so that it is possible to, for example, group by that column during analysis. Any unmatched samples between the otutable and metadata will be removed.
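The effect of the sample matching can be illustrated with a small base R sketch. It only mimics the described behaviour on made-up sample IDs and is not the code amp_load uses internally; all object names are assumptions.

```r
# Hypothetical sample IDs: S3 has no metadata, S4 has no read counts.
otu_samples <- c("S1", "S2", "S3")
metadata <- data.frame(
  SampleID = c("S1", "S2", "S4"),
  Plant    = c("Aalborg E", "Aalborg W", "Aalborg W"),
  stringsAsFactors = FALSE
)

# Use the first column as rownames while keeping it as a regular column.
rownames(metadata) <- metadata[[1]]

# Samples present in both tables, and the ones that would be dropped.
shared  <- intersect(otu_samples, metadata$SampleID)
dropped <- setdiff(union(otu_samples, metadata$SampleID), shared)

# Metadata restricted to the matched samples.
metadata_matched <- metadata[shared, , drop = FALSE]
```

Checking dropped before loading is a quick way to spot misspelled sample IDs that would otherwise silently disappear from the analysis.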

A minimal example is available with data("example_metadata").

## Reference sequences

A fasta file with the raw sequences can be loaded as well, which will then be available in the refseq element of the ampvis2 object. These sequences will not be used in any ampvis2 function other than the two subset functions amp_subset_samples and amp_subset_taxa, so that they can be exported with amp_export_fasta. The fasta file is loaded with the read.FASTA function from the ape package.
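As a minimal sketch of what reading such a file looks like, the following writes a made-up FASTA file with two sequences to a temporary path and reads it with read.FASTA from the ape package. The sequences and names are invented, and naming the sequences after the OTU IDs is an assumption made for illustration.

```r
# Write a hypothetical two-sequence FASTA file to a temporary location.
fasta_path <- tempfile(fileext = ".fa")
writeLines(c(">OTU_1", "ACGTACGT", ">OTU_2", "TTGACCA"), fasta_path)

# ape is needed for read.FASTA(); skip gracefully if it is not installed.
if (requireNamespace("ape", quietly = TRUE)) {
  refseq <- ape::read.FASTA(fasta_path)  # object of class "DNAbin"
  seq_names <- names(refseq)             # sequence names from the ">" header lines
} else {
  seq_names <- NULL
}
```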

## See also

amp_load, amp_subset_samples, amp_subset_taxa

## Examples

# Be sure to use the correct function to load your .csv files, see ?read.table()
if (FALSE) {
library(ampvis2)

# Read the OTU-table as a data frame. It is important to set check.names = FALSE.
myotutable <- read.delim("data/otutable.txt", check.names = FALSE)

# Read the metadata, often an excel sheet. If .csv make sure the first column will be kept and NOT
# loaded as rownames! The top row should be loaded as column names.
# The path below is a placeholder, replace it with the path to your own file.
mymetadata <- readxl::read_excel("data/metadata.xlsx", col_names = TRUE)

# Combine the data with amp_load() to make it compatible with ampvis2 functions.
# Uncomment the fasta line to load reference sequences (not required).
d <- amp_load(
  otutable = myotutable,
  metadata = mymetadata
  # , fasta = "path/to/fastafile.fa" # optional
)

# Show a short summary about the data by simply typing the name of the object in the console
d
}

# Minimal example metadata:
data("example_metadata")
example_metadata
#> # A tibble: 8 x 4
#>   SampleID    Plant     Date                 Year
#>   <chr>       <chr>     <dttm>              <dbl>
#> 1 16SAMP_3893 Aalborg E 2014-02-06 00:00:00  2014
#> 2 16SAMP_3913 Aalborg E 2014-07-03 00:00:00  2014
#> 3 16SAMP_3941 Aalborg E 2014-08-19 00:00:00  2014
#> 4 16SAMP_3946 Aalborg E 2014-11-13 00:00:00  2014
#> 5 16SAMP_3953 Aalborg W 2014-02-04 00:00:00  2014
#> 6 16SAMP_4591 Aalborg W 2014-05-05 00:00:00  2014
#> 7 16SAMP_4597 Aalborg W 2014-08-18 00:00:00  2014
#> 8 16SAMP_4603 Aalborg W 2014-11-12 00:00:00  2014
# Minimal example otutable:
data("example_otutable")
example_otutable
#>        16SAMP_3893 16SAMP_3913 16SAMP_3941 16SAMP_3946 16SAMP_3953 16SAMP_4591
#> OTU_1           23          15         273          51         127         190
#> OTU_2          675         565         331         411         430         780
#> OTU_3          780         733         405         199        1346        1114
#> OTU_4          272         233        1434         256         736        1338
#> OTU_5          560         339         509         598         223         145
#> OTU_6          906         766         133         390         232        1458
#> OTU_7          297         218         418         130        1354         198
#> OTU_8           28           8         155          72         156         101
#> OTU_9            0           0           9           0          19          25
#> OTU_10         373         256          19         415          43         102
#>        16SAMP_4597 16SAMP_4603     Kingdom            Phylum
#> OTU_1          220          83 k__Bacteria    p__Chloroflexi
#> OTU_2          699         820 k__Bacteria p__Actinobacteria
#> OTU_3         1630         112 k__Bacteria p__Actinobacteria
#> OTU_4         1224         564 k__Bacteria p__Proteobacteria
#> OTU_5          212        1619 k__Bacteria    p__Chloroflexi
#> OTU_6          560         287 k__Bacteria     p__Firmicutes
#> OTU_7          283         116 k__Bacteria p__Actinobacteria
#> OTU_8          151          25 k__Bacteria    p__Nitrospirae
#> OTU_9           58           0 k__Bacteria  p__Bacteroidetes
#> OTU_10          73         138 k__Bacteria  p__Bacteroidetes
#>                        Class                 Order                Family
#> OTU_1              c__SJA-15           o__C10_SB1A           f__C10_SB1A
#> OTU_2      c__Actinobacteria      o__Micrococcales f__Intrasporangiaceae
#> OTU_3      c__Acidimicrobiia   o__Acidimicrobiales    f__Microthricaceae
#> OTU_4  c__Betaproteobacteria      o__Rhodocyclales     f__Rhodocyclaceae
#> OTU_5        c__Anaerolineae     o__Anaerolineales    f__Anaerolineaceae
#> OTU_6             c__Bacilli    o__Lactobacillales  f__Carnobacteriaceae
#> OTU_7      c__Acidimicrobiia   o__Acidimicrobiales    f__Microthricaceae
#> OTU_8          c__Nitrospira      o__Nitrospirales     f__Nitrospiraceae
#> OTU_9    c__Sphingobacteriia o__Sphingobacteriales     f__Saprospiraceae
#> OTU_10   c__Sphingobacteriia o__Sphingobacteriales     f__Saprospiraceae
#>                              Genus         Species
#> OTU_1     g__Candidatus Amarilinum             s__
#> OTU_2              g__Tetrasphaera             s__
#> OTU_3     g__Candidatus Microthrix             s__
#> OTU_4             g__Dechloromonas             s__
#> OTU_5  g__Candidatus Villogracilis             s__
#> OTU_6              g__Trichococcus             s__
#> OTU_7     g__Candidatus Microthrix             s__
#> OTU_8                g__Nitrospira s__sublineage I
#> OTU_9                 g__QEDR3BF09             s__
#> OTU_10                     g__MK04             s__