Example pipeline setup and execution
Preliminary setup and common instructions
1. Setting up the MiCoNE environments
Before running the MiCoNE pipeline, install the required environments:
micone install
Warning
This command can take a long time (several hours) because MiCoNE installs all of the conda environments.
To install only a subset of the environments, specify them with the -e option:
micone install -e <env1>
The list of supported environments can be found in the Installing the environments section.
2. Initializing the pipeline template
To set up the Nextflow template for the desired workflow, use the micone init command:
micone init -w <workflow> -o <pipeline_dir>
This initializes the workflow in the pipeline_dir folder. For a list of supported workflows, see the Initializing the pipeline template section.
Detailed information about the various files in the pipeline folder can be found in the Configuration and the pipeline template section.
3. Downloading data and setting up the pipeline template
Download the data directory from here and put it under <pipeline_dir>/nf_micone/data.
Update the sample_sheet.csv and metadata.json files in the base <pipeline_dir> directory to reflect the samples and metadata of the data you wish to analyze.
Update the nextflow.config file if you wish to change the default configuration. The default configuration files can be found here, and the supported configuration options are listed in the tables in the Configuration and the pipeline template section.
Note
Example configurations used for the manuscript can be found in the scripts/runs folder of the MiCoNE-pipeline-paper repository.
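Before moving on to step 4, it can help to confirm that the files from step 3 are in place. The function below is a minimal sketch, not part of MiCoNE: check_pipeline_dir is a hypothetical helper that only checks that <pipeline_dir>/nf_micone/data exists and that sample_sheet.csv, metadata.json and nextflow.config are present in the base directory. It does not validate their contents.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MiCoNE): verify the step 3 layout.
# Assumption: data lives in nf_micone/data, and sample_sheet.csv,
# metadata.json and nextflow.config sit in the base pipeline directory.
check_pipeline_dir() {
    local dir="$1"
    local missing=0 f
    # Data directory downloaded in step 3
    if [ ! -d "$dir/nf_micone/data" ]; then
        echo "missing directory: $dir/nf_micone/data" >&2
        missing=1
    fi
    # Files in the base directory that you edit in step 3
    for f in sample_sheet.csv metadata.json nextflow.config; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing file: $dir/$f" >&2
            missing=1
        fi
    done
    # 0 when everything is present, 1 otherwise
    return "$missing"
}
```

For example, check_pipeline_dir my_pipeline && echo "ready" prints each missing path to stderr and returns a non-zero status if anything is absent (my_pipeline is a placeholder name).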
4. Run the pipeline
To run the pipeline, use the run.sh script in <pipeline_dir>:
conda activate micone
# To run the code locally
bash run.sh
# To run the code on the cluster using the scheduler
qsub run.sh
Full pipeline workflow
First, follow steps 1-3 in the Preliminary setup and common instructions section.
Let us assume that you have multiplexed, paired-end 16S sequence data from three runs (run1, run2, and run3) stored in the <pipeline_dir>/seqs folder. For each run, the pipeline needs the following files:
forward.fastq.gz: Forward reads
reverse.fastq.gz: Reverse reads
barcodes.fastq.gz: Barcodes
mapping.tsv: Mapping file
sample_metadata.tsv: Sample metadata file
Warning
Do not rename these files. The pipeline may fail if the file names are changed.
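Because the pipeline expects these exact file names, a quick check before running can catch a renamed or missing file. This is a hedged sketch: check_run_inputs is a hypothetical helper (not a MiCoNE command), and it assumes the <root>/<run>/seqs/*.fastq.gz and <root>/<run>/*.tsv layout used by the example sample_sheet.csv in this section.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MiCoNE): check that each run directory
# under <root> holds the five inputs with their expected names.
# Assumption: the layout <root>/<run>/seqs/*.fastq.gz and <root>/<run>/*.tsv
# mirrors the example sample_sheet.csv.
check_run_inputs() {
    local root="$1"
    shift
    local run f status=0
    for run in "$@"; do
        for f in seqs/forward.fastq.gz seqs/reverse.fastq.gz seqs/barcodes.fastq.gz \
            mapping.tsv sample_metadata.tsv; do
            if [ ! -f "$root/$run/$f" ]; then
                echo "[$run] missing $f" >&2
                status=1
            fi
        done
    done
    # Non-zero if any expected file was missing
    return "$status"
}
```

Run from <pipeline_dir>, check_run_inputs sequences run1 run2 run3 would report any of the 15 expected files (five per run) that are missing.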
An example sample_sheet.csv file will look like this:
| id | run | forward | reverse | barcodes | mapping | sample_metadata |
|---|---|---|---|---|---|---|
| id1 | run1 | sequences/run1/seqs/forward.fastq.gz | sequences/run1/seqs/reverse.fastq.gz | sequences/run1/seqs/barcodes.fastq.gz | sequences/run1/mapping.tsv | sequences/run1/sample_metadata.tsv |
| id2 | run2 | sequences/run2/seqs/forward.fastq.gz | sequences/run2/seqs/reverse.fastq.gz | sequences/run2/seqs/barcodes.fastq.gz | sequences/run2/mapping.tsv | sequences/run2/sample_metadata.tsv |
| id3 | run3 | sequences/run3/seqs/forward.fastq.gz | sequences/run3/seqs/reverse.fastq.gz | sequences/run3/seqs/barcodes.fastq.gz | sequences/run3/mapping.tsv | sequences/run3/sample_metadata.tsv |
Note
These files must be in formats supported by qiime2. For more information about the supported formats, see the qiime2 documentation.
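Writing the sample sheet by hand becomes error-prone as the number of runs grows. The function below is a sketch that reproduces the example table above. write_full_sample_sheet is hypothetical, and it assumes that sample_sheet.csv is comma-separated with a header row using the column names shown, that paths are relative to <pipeline_dir> under sequences/<run>/, and that ids are numbered id1, id2, ... in the order the runs are given.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MiCoNE): write sample_sheet.csv for the
# full pipeline with one row per run.
# Assumptions: comma-separated output with a header row, paths relative to
# <pipeline_dir> under sequences/<run>/, and ids numbered id1, id2, ...
write_full_sample_sheet() {
    local out="$1"
    shift
    local i=1 run base
    echo "id,run,forward,reverse,barcodes,mapping,sample_metadata" > "$out"
    for run in "$@"; do
        base="sequences/$run"
        echo "id$i,$run,$base/seqs/forward.fastq.gz,$base/seqs/reverse.fastq.gz,$base/seqs/barcodes.fastq.gz,$base/mapping.tsv,$base/sample_metadata.tsv" >> "$out"
        i=$((i + 1))
    done
}
```

Running write_full_sample_sheet sample_sheet.csv run1 run2 run3 writes a header plus one row per run, matching the rows of the example table.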
Network inference workflow
Before running this workflow, make sure that your OTU tables include taxonomy metadata and sample metadata. If they do not, you must run the workflow starting from the TA (taxonomy assignment) step.
First, follow steps 1-3 in the Preliminary setup and common instructions section.
Let us assume that you have three OTU tables (id1, id2, and id3) that you wish to analyze. For each OTU table, the pipeline needs the following files:
otu_table.tsv: OTU table in .tsv format
obs_metadata.tsv: Taxonomy assignments in .tsv format. It must contain the following columns: “Kingdom”, “Phylum”, “Class”, “Order”, “Family”, “Genus”, “Species”. The lower-level columns at the end of this list can be dropped if you have grouped your taxonomy at a higher level.
sample_metadata.tsv: Sample metadata file
children_map.json: A file that maps the current taxonomy ids to ids at a lower taxonomic level. It can be an empty JSON object ({}) if you wish to ignore this field.
Warning
Do not rename these files. The pipeline may fail if the file names are changed.
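The column requirement for obs_metadata.tsv can be checked from the header line alone. This sketch assumes a tab-separated header. check_obs_metadata is a hypothetical helper that confirms every rank from “Kingdom” down to a chosen lowest rank (default “Species”) appears as a column name; it ignores column order and any extra columns, such as an id column.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MiCoNE): check that the tab-separated
# header of an obs_metadata.tsv contains every rank from Kingdom down to a
# chosen lowest rank (default: Species). Column order and extra columns,
# such as an id column, are not checked.
check_obs_metadata() {
    local file="$1" lowest="${2:-Species}"
    local header rank tab
    tab=$(printf '\t')
    header=$(head -n 1 "$file")
    for rank in Kingdom Phylum Class Order Family Genus Species; do
        # Wrap the header in tabs so a rank only matches a whole column name
        case "${tab}${header}${tab}" in
            *"${tab}${rank}${tab}"*) ;;
            *)
                echo "$file: missing column $rank" >&2
                return 1
                ;;
        esac
        # Stop once the lowest requested rank has been found
        if [ "$rank" = "$lowest" ]; then
            return 0
        fi
    done
    return 0
}
```

For a table grouped at the genus level, check_obs_metadata inputs/id1/obs_metadata.tsv Genus would accept a header without a “Species” column.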
An example sample_sheet.csv file will look like this:
| id | otu_table | obs_metadata | sample_metadata | children_map |
|---|---|---|---|---|
| id1 | inputs/id1/otu_table.tsv | inputs/id1/obs_metadata.tsv | inputs/id1/sample_metadata.tsv | inputs/id1/children_map.json |
| id2 | inputs/id2/otu_table.tsv | inputs/id2/obs_metadata.tsv | inputs/id2/sample_metadata.tsv | inputs/id2/children_map.json |
| id3 | inputs/id3/otu_table.tsv | inputs/id3/obs_metadata.tsv | inputs/id3/sample_metadata.tsv | inputs/id3/children_map.json |
Note
Example data can be found here.
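The sample sheet for the network inference workflow can also be generated from the inputs/<id> directories. In this sketch, write_ni_sample_sheet is a hypothetical helper. It assumes a comma-separated sample_sheet.csv with a header row and paths relative to <pipeline_dir> (run it from there), and it writes an empty JSON object ({}) as children_map.json for any id that lacks one, which the input description for this workflow allows when you wish to ignore that field.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MiCoNE): write sample_sheet.csv for the
# network inference workflow from inputs/<id>/ directories.
# Assumptions: comma-separated output with a header row, paths relative to
# the current directory (run it from <pipeline_dir>), and {} is an
# acceptable placeholder for a missing children_map.json.
write_ni_sample_sheet() {
    local out="$1"
    shift
    local id dir
    echo "id,otu_table,obs_metadata,sample_metadata,children_map" > "$out"
    for id in "$@"; do
        dir="inputs/$id"
        # Only create a placeholder; never overwrite an existing children map
        if [ ! -f "$dir/children_map.json" ]; then
            echo '{}' > "$dir/children_map.json"
        fi
        echo "$id,$dir/otu_table.tsv,$dir/obs_metadata.tsv,$dir/sample_metadata.tsv,$dir/children_map.json" >> "$out"
    done
}
```

Running write_ni_sample_sheet sample_sheet.csv id1 id2 id3 from <pipeline_dir> produces rows matching the example sample sheet for this workflow.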