This is a short guide to the basics of ampvis2, how to load data, as well as a few basic visualisation functions using example data. A complete explanation of all functions, their purpose, different arguments and examples can be found in the Functions tab. First, install ampvis2 as described on the front page.

Loading data

ampvis2 imports data directly from any of the commonly used QIIME, mothur, and UPARSE amplicon pipelines. Both QIIME and mothur (through the make.biom script) can export data in the BIOM format and can be loaded with the amp_import_biom function. Data from the UPARSE pipeline can be imported by the amp_import_uparse function. The data is then loaded into a single ampvis2 object by the amp_load() function, which results in a list of data frames, see the below illustration of the data structure and typical workflow of ampvis2:

To demonstrate how to load data from raw CSV files download and unzip this minimal example data. The otutable.csv and metadata.csv files can then be imported into R by an appropriate read function:

myotutable <- read.csv("example_otutable.csv", check.names = FALSE)
mymetadata <- read.csv("example_metadata.csv", check.names = FALSE)

Note that there are multiple types of CSV files. To change the separator to fx a comma you can adjust with the argument sep = ",". Refer to the documentation of read.table. To load excel files use the readxl::read_excel function instead.

After the two files have been loaded into R, check that the resulting data frames have been loaded correctly (click them in the working environment to the right under “Data”), so that either the rownames or a column named “OTU” contains the OTU ID’s in the loaded otutable, and that the first column in the metadata are the matching sample ID’s. It is highly recommended to read the documentation of amp_load() before loading data.

Now combine the data using amp_load(), which checks the data and combines it into a single ampvis2 object, making it easier to manipulate, filter and subset all elements of the data at once for analysis:

d <- amp_load(otutable = myotutable,
              metadata = mymetadata)

The individual dataframes in the list can be explored with View(d$metadata).

Filtering and subsetting

With the ampvis2 package comes a large example data set with 573 samples taken from the activated sludge from 55 Danish Wastewater Treatment Plants in the period 2006-2013, which can be loaded with data("MiDAS"). Simply typing the name of any ampvis2 object in the console will show a short summary of the data:

## ampvis2 object with 5 elements. 
## Summary of OTU table:
##      Samples         OTUs  Total#Reads    Min#Reads    Max#Reads Median#Reads 
##          658        14969     20890850        10480        46264        31800 
##    Avg#Reads 
##     31749.01 
## Assigned taxonomy:
##       Kingdom        Phylum         Class         Order        Family 
##   14969(100%) 14477(96.71%) 12737(85.09%) 11470(76.63%)  9841(65.74%) 
##         Genus       Species 
##   7380(49.3%)     28(0.19%) 
## Metadata variables: 5 
##  SampleID, Plant, Date, Year, Period

If you have loaded the raw DNA sequences of the OTUs from a FASTA file (with the fasta = argument) you can also get a short summary by typing the name of the ampvis2 object followed by $refseq:

## 14969 DNA sequences in binary format stored in a list.
## Mean sequence length: 472.922 
##    Shortest sequence: 425 
##     Longest sequence: 525 
## Labels:
## OTU_1
## OTU_2
## OTU_3
## OTU_4
## OTU_5
## OTU_6
## ...
## Base composition:
##     a     c     g     t 
## 0.261 0.225 0.319 0.194 
## (Total: 7.08 Mb)

The loaded data can be subsetted based on variables in the metadata using the amp_subset_samples() function, which can then be stored as a new object and analysed separately:

MiDASsubset <- amp_subset_samples(MiDAS, Plant %in% c("Aalborg West", "Aalborg East"))
## 590 samples and 5512 OTUs have been filtered 
## Before: 658 samples and 14969 OTUs
## After: 68 samples and 9457 OTUs

or for a more complex subset, you can subset based on two or more variables using “&” to separate the conditions, or simply use the function more than once. The “!” (logical NOT operator) can be thought of as “except” and is useful to remove fx outliers. Furthermore, the minreads = 10000 argument removes any sample(s) with total amount of reads below the chosen threshold:

MiDASsubset <- amp_subset_samples(MiDAS, Plant %in% c("Aalborg West", "Aalborg East") & !SampleID %in% c("16SAMP-749"), minreads = 10000)
## 591 samples and 5539 OTUs have been filtered 
## Before: 658 samples and 14969 OTUs
## After: 67 samples and 9430 OTUs

The amp_subset_taxa() function instead subsets based on the taxonomy, where you simply provide a vector with the taxa you are interested in, separated by a comma:

MiDAS_Chloroflexi_Actinobacteria <- amp_subset_taxa(MiDAS, tax_vector=c("p__Chloroflexi", "p__Actinobacteria"))
## 12245 OTUs have been filtered 
## Before: 14969 OTUs
## After: 2724 OTUs

The taxonomic rank is indicated by fx "p__" for phylum and "g__" for genus etc, followed by the name of the taxon (case-sensitive, first letter almost always capital). To filter individual OTUs simply provide the OTU name(s) as-is in a vector, fx c("OTU_1206").


All ampvis2 plots are generated using the ggplot2 package. You can change the look of the plots to better suit your needs, add more layers to the plots and use other ggplot2 functions in combination with ampvis plots if needed. Refer to the ggplot2 documentation for more information. amp_heatmap() by default aggregates to phylum level and shows the top 10 phyla, ordered by mean read abundance across all samples:

amp_heatmap(MiDASsubset, group_by = "Plant")

There are many arguments you can use to suit your needs, for a full list see the reference (click the function names to go to its reference page). For example, you can manually select the level at which to aggregate, how many to show, add additional higher level taxonomic information, group the samples differently by the metadata, hide the values, change the colors and scaling, and much more. You can also adjust the text labels for better readability or adjust the positioning of the legend (adjusting ggplot2 plots is always done with “+” after the actual ampvis2 function):

            group_by = "Plant",
            facet_by = "Year",
            tax_aggregate = "Genus",
            tax_add = "Phylum",
            tax_show = 25,
            color_vector = c("white", "darkred"),
            plot_colorscale = "sqrt",
            plot_values = FALSE) +
  theme(axis.text.x = element_text(angle = 45, size=10, vjust = 1),
        axis.text.y = element_text(size=8),


amp_boxplot() generates boxplots, again ordered by mean read abundance across all samples:


The arguments you can provide are similar to those used in amp_heatmap() and other ampvis2 functions:

            group_by = "Period",
            tax_show = 5,
            tax_add = "Phylum")


The amp_ordinate() function has been expanded to support 7 different ordination methods, various data transformations and interactive plots by using Plotly. By default any OTU with an abundance no higher than 0.1% in any sample is removed, which drastically improves the calculation time. You can of course adjust this threshold manually by changing the filter_species = 0.1 argument. Other than this, there are only four main arguments that are involved in the actual calculations, the rest are just various plotting features. These four are type = "", transform = "", distmeasure = "", and lastly constrain = "" for constrained ordination (only used in Redundancy Analysis (RDA) or Canonical Correspondence Analysis (CCA)).

When analysing microbial community composition data it is recommended to use the Hellinger transformation (see Legendre et al, 2001 or Numerical Ecology for details) for most types of ordination methods except the distance-based ordination methods (Principal Coordinates Analysis (PCoA) and non-Metric Multidimensional Scaling (nMDS)), where you also have to select a distance measure manually by the distmeasure = "" argument, for example Bray-Curtis dissimilarities:

             type = "pcoa",
             distmeasure = "bray",
             sample_color_by = "Plant",
             sample_colorframe = TRUE,
             sample_colorframe_label = "Plant") + theme(legend.position = "blank")

It is also possible to track changes over time by setting the sample_trajectory and sample_trajectory_group arguments, to for example reveal temporal patterns per WWTP:

             type = "pcoa",
             distmeasure = "bray",
             sample_color_by = "Plant",
             sample_colorframe_label = "Plant", 
             sample_trajectory = "Date", 
             sample_trajectory_group = "Plant")

And lastly an example of constrained ordination by Canonical Correspondence Analysis (CCA), which in this case reveals how samples taken at different seasonal periods (constrain = "Period") of the year can explain the data:

ordinationresult <- amp_ordinate(MiDASsubset, 
             type = "CCA",
             constrain = "Period",
             transform = "Hellinger",
             sample_color_by = "Period",
             sample_shape_by = "Plant",
             sample_colorframe = TRUE,
             sample_colorframe_label = "Period",
             detailed_output = TRUE)

Notice that the plot can be saved as a more detailed object (detailed_output = TRUE), so additional details about the ordination result can be obtained for evaluation, fx a screeplot by ordinationresult$screeplot or some more raw data by ordinationresult$model.

Read more

There are numerous other functions to try out as well as complete documentation for each function highlighted above. Go explore in the Functions tab.