Example pipeline setup and execution
Preliminary setup and common instructions
1. Setting up the MiCoNE environments
Before executing the MiCoNE pipeline, we need to install the environments:
micone install
Warning
This command will take a considerable amount of time (several hours), as MiCoNE will install all the conda environments.
If you wish to install only a subset of the environments, you can specify the environments to install using the -e option:
micone install -e <env1>
The list of supported environments can be found in the Installing the environments section.
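For example, to install only the environments for specific tools (the environment names below are hypothetical; substitute names from the Installing the environments section):
# Hypothetical environment names -- replace them with names listed
# in the Installing the environments section
micone install -e sparcc
micone install -e spieceasi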
2. Initializing the pipeline template
To set up the Nextflow workflow template for the desired workflow, you can use the micone init command:
micone init -w <workflow> -o <pipeline_dir>
This initializes the workflow in the <pipeline_dir> folder. For a list of supported workflows, see the Initializing the pipeline template section.
Detailed information about the various files in the pipeline folder can be found in the Configuration and the pipeline template section.
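For instance, to initialize a template into a folder named micone_pipeline (both the workflow name full and the folder name are illustrative assumptions; use a workflow listed in the Initializing the pipeline template section):
# "full" is assumed here as a workflow name for illustration;
# substitute one of the supported workflows
micone init -w full -o micone_pipeline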
3. Downloading data and setting up the pipeline template
- Download the data directory from here and put it under <pipeline_dir>/nf_micone/data (see the sketch after this list).
- Update the sample_sheet.csv and metadata.json files in the base <pipeline_dir> directory to reflect the samples and metadata of the data you wish to analyze.
- Update the nextflow.config file if you wish to make any changes to the default configuration. The default configuration files can be found here, and the supported configuration options are listed in the tables in the Configuration and the pipeline template section.
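As a concrete sketch, placing the downloaded data could look like this (the archive name micone-data.tar.gz is a placeholder; use the actual name of the downloaded file):
# Unpack the downloaded archive into the expected data location
# (micone-data.tar.gz is a hypothetical file name)
mkdir -p <pipeline_dir>/nf_micone/data
tar -xzf micone-data.tar.gz -C <pipeline_dir>/nf_micone/data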
Note
Example configurations used for the manuscript can be found in the scripts/runs folder of the MiCoNE-pipeline-paper repository.
4. Running the pipeline
To run the pipeline, you can use the run.sh script in the <pipeline_dir>:
conda activate micone
# To run the code locally
bash run.sh
# To run the code on the cluster using the scheduler
qsub run.sh
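Under the hood, run.sh wraps a Nextflow invocation. A minimal sketch of such a launcher is shown below (the entry point main.nf is an assumption; inspect the generated run.sh for the actual command):
# Sketch of a Nextflow launcher; main.nf is an assumed entry point --
# check the run.sh generated by micone init for the real invocation
nextflow run main.nf -c nextflow.config -resume
The -resume flag lets Nextflow reuse cached results if the pipeline is restarted after an interruption.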
Full pipeline workflow
First follow the instructions in steps 1-3 in the Preliminary setup and common instructions section.
Let us assume that you have multiplexed paired-end 16S sequence data from three runs (run1, run2, and run3) stored in the <pipeline_dir>/seqs folder. To run the pipeline you will need the following files for each run (a layout sketch follows the list):
- forward.fastq.gz: Forward reads
- reverse.fastq.gz: Reverse reads
- barcodes.fastq.gz: Barcodes
- mapping.tsv: Mapping file
- sample_metadata.tsv: Sample metadata file
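The sample sheet below references these files through per-run paths that appear to be relative to the pipeline directory. Creating the skeleton for run1 might look like this (repeat for run2 and run3):
# Create the per-run layout referenced by the sample sheet (run1 shown)
mkdir -p <pipeline_dir>/sequences/run1/seqs
# Then place:
#   sequences/run1/seqs/forward.fastq.gz
#   sequences/run1/seqs/reverse.fastq.gz
#   sequences/run1/seqs/barcodes.fastq.gz
#   sequences/run1/mapping.tsv
#   sequences/run1/sample_metadata.tsv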
Warning
Keep the file names as they are. The pipeline might have issues if the file names are changed.
An example sample_sheet.csv file will look like this:
id | run | forward | reverse | barcodes | mapping | sample_metadata
---|---|---|---|---|---|---
id1 | run1 | sequences/run1/seqs/forward.fastq.gz | sequences/run1/seqs/reverse.fastq.gz | sequences/run1/seqs/barcodes.fastq.gz | sequences/run1/mapping.tsv | sequences/run1/sample_metadata.tsv
id2 | run2 | sequences/run2/seqs/forward.fastq.gz | sequences/run2/seqs/reverse.fastq.gz | sequences/run2/seqs/barcodes.fastq.gz | sequences/run2/mapping.tsv | sequences/run2/sample_metadata.tsv
id3 | run3 | sequences/run3/seqs/forward.fastq.gz | sequences/run3/seqs/reverse.fastq.gz | sequences/run3/seqs/barcodes.fastq.gz | sequences/run3/mapping.tsv | sequences/run3/sample_metadata.tsv
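If you prefer to generate the sample sheet from the shell rather than editing it by hand, a heredoc reproducing the table above would look like this (assuming a plain comma-separated file with a header row matching the column names):
# Write the sample sheet shown above; assumes plain CSV with a header
cat > sample_sheet.csv <<'EOF'
id,run,forward,reverse,barcodes,mapping,sample_metadata
id1,run1,sequences/run1/seqs/forward.fastq.gz,sequences/run1/seqs/reverse.fastq.gz,sequences/run1/seqs/barcodes.fastq.gz,sequences/run1/mapping.tsv,sequences/run1/sample_metadata.tsv
id2,run2,sequences/run2/seqs/forward.fastq.gz,sequences/run2/seqs/reverse.fastq.gz,sequences/run2/seqs/barcodes.fastq.gz,sequences/run2/mapping.tsv,sequences/run2/sample_metadata.tsv
id3,run3,sequences/run3/seqs/forward.fastq.gz,sequences/run3/seqs/reverse.fastq.gz,sequences/run3/seqs/barcodes.fastq.gz,sequences/run3/mapping.tsv,sequences/run3/sample_metadata.tsv
EOF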
Note
These files must follow the formats supported by qiime2. For more information about the supported formats, see the qiime2 documentation.
Network inference workflow
Before running this workflow, make sure that your OTU tables have taxonomy metadata and sample metadata information. If they do not, you must run the workflow starting from the TA (taxonomy assignment) step.
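If you do need to start from the TA step, you can initialize a template for the corresponding partial workflow (the workflow name ta_op_ni below is an assumption; check the Initializing the pipeline template section for the exact supported names):
# "ta_op_ni" is assumed as the name of the workflow that starts at the
# taxonomy assignment step; verify against the supported workflow list
micone init -w ta_op_ni -o <pipeline_dir>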
First follow the instructions in steps 1-3 in the Preliminary setup and common instructions section.
Let us assume that you have three sets of OTU tables (id1, id2, and id3) that you wish to analyze. To run the pipeline you will need the following files for each set (a setup sketch follows the list):
- otu_table.tsv: OTU table in .tsv format
- obs_metadata.tsv: Taxonomy assignments in .tsv format. It must contain the following columns: “Kingdom”, “Phylum”, “Class”, “Order”, “Family”, “Genus”, “Species”. The lower-rank columns can be dropped if you have grouped your taxonomy at a higher level.
- sample_metadata.tsv: Sample metadata file
- children_map.json: A file that maps the current taxonomy ids to those at a lower taxonomic level. This can be an empty JSON object if you wish to ignore this field.
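As a sketch, you could initialize the taxonomy metadata header and an empty children_map.json like this (the header only covers the required taxonomy columns named above; the full obs_metadata.tsv layout, including any OTU identifier column, is an assumption to check against the example data):
# Header with the required taxonomy columns (tab-separated); your
# table will likely also need an OTU identifier column -- check the
# example data for the exact layout
printf 'Kingdom\tPhylum\tClass\tOrder\tFamily\tGenus\tSpecies\n' > obs_metadata.tsv
# An empty JSON object is enough if you wish to ignore children_map
echo '{}' > children_map.json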
Warning
Keep the file names as they are. The pipeline might have issues if the file names are changed.
An example sample_sheet.csv file will look like this:
id | otu_table | obs_metadata | sample_metadata | children_map
---|---|---|---|---
id1 | inputs/id1/otu_table.tsv | inputs/id1/obs_metadata.tsv | inputs/id1/sample_metadata.tsv | inputs/id1/children_map.json
id2 | inputs/id2/otu_table.tsv | inputs/id2/obs_metadata.tsv | inputs/id2/sample_metadata.tsv | inputs/id2/children_map.json
id3 | inputs/id3/otu_table.tsv | inputs/id3/obs_metadata.tsv | inputs/id3/sample_metadata.tsv | inputs/id3/children_map.json
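Before launching, it can be worth checking that every file referenced in the sample sheet exists; a small sketch (assuming a comma-separated sample_sheet.csv with a header row, as above, in which every column after id is a file path):
# Report any missing files referenced in sample_sheet.csv; assumes
# comma-separated values with a header row
tail -n +2 sample_sheet.csv | while IFS=, read -r id files; do
  for f in ${files//,/ }; do
    [ -e "$f" ] || echo "missing: $f (sample $id)"
  done
done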
Note
Example data can be found here.