
Welcome to AGB 2025 Microbiota Classification Pipeline

Repository for the AGB 2025 common class project.

AGB 2025 is a dockerised Nextflow workflow that turns raw stool sequencing reads into clinically actionable labels (healthy vs non-healthy) and rich microbiome analytics. Built by the students of the AGB_2025 course.


Documentation  Wiki

The wiki covers the pipeline context, the sample processing steps, and the decisions made in each module.


Quick start (Docker edition)

Prerequisites

Docker ≥ 24
  macOS (Homebrew): brew install --cask docker, then launch Docker Desktop
  Ubuntu / Debian: sudo apt install docker.io
  Ensure the Docker daemon is running and your user has access to it.

Nextflow ≥ 23.10
  macOS (Homebrew): brew install nextflow
  Ubuntu / Debian: curl -s https://get.nextflow.io | bash && chmod +x nextflow && sudo mv nextflow /usr/local/bin/
  The pipeline pulls everything in containers.

Memory
  At least 4 GB of available system memory is required for Kraken2.
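
To confirm the prerequisites are in place, a quick check with standard Docker and Nextflow CLI calls:

docker --version      # expect 24.x or newer
nextflow -version     # expect 23.10 or newer
docker info > /dev/null 2>&1 && echo "Docker daemon is reachable"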

1 · Setup

Before running the pipeline you must (i) clone the repository, (ii) make the custom Docker image available, (iii) download the reference databases, and (iv) create a run.

1.1 Clone the repository

git clone https://github.com/egenomics/agb2025.git
cd agb2025

1.2 Docker image

This pipeline uses a custom Docker image (agb2025-python) to merge metadata with MultiQC output using pandas and csvkit. To build the image, run the following in the terminal:

docker build -t agb2025-python -f Dockerfile .

The image is used in the process that merges metadata.tsv with multiqc_fastqc.txt. This step will fail unless agb2025-python is built locally beforehand.
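
A small guard (a sketch, not part of the repository) that builds the image only if it is not already present:

docker image inspect agb2025-python > /dev/null 2>&1 \
  || docker build -t agb2025-python -f Dockerfile .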

1.3 Download the reference databases

In addition, two reference databases are required: Kraken2 and the SILVA classifier. To install them, run:

chmod +x INSTALLME.sh
./INSTALLME.sh

The INSTALLME.sh script will save both databases in databases/.
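
As a quick sanity check (the exact subdirectory names depend on what INSTALLME.sh downloads), you can confirm that both databases landed in databases/ and have a plausible size:

ls databases/
du -sh databases/*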

1.4 Create the run

To test the development version of the pipeline (main.nf), a run needs to be created, sample raw data downloaded, and the corresponding metadata generated. Executing the create_run.sh script creates a local folder called runs/<run_id>/ following the run naming convention. This folder will contain 15 paired fastqs in raw_data/ and a metadata.tsv in metadata/. As long as you keep the folder and reuse the same run_id, this step only needs to be done once.

chmod +x create_run.sh
./create_run.sh

This script is only needed to simulate a real sequencing run during development.

After executing ./create_run.sh, a runs/<run_id>/ folder will be created. Copy the run_id; it is needed to run the pipeline. You can also take it from the last line of the script's output.
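
Since the run_id is printed as the last line of output, it can also be captured into a shell variable directly (a sketch assuming that line contains only the bare run_id):

RUN_ID=$(./create_run.sh | tail -n 1)
echo "Created run: $RUN_ID"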

2 · Running the pipeline

After copying the run_id, use this command to run the pipeline:

nextflow run main.nf --run_id <run_id> --sampling_depth <number> --auto_rarefaction TRUE -profile docker
# e.g. nextflow run main.nf --run_id R01310525 --sampling_depth 10000 --auto_rarefaction TRUE -profile docker

To resume an interrupted run and skip already-completed tasks, we highly recommend using the -resume flag:

nextflow run main.nf --run_id <run_id> --sampling_depth <number> --auto_rarefaction TRUE -profile docker -resume
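
Putting it together with the run_id captured earlier (a sketch; the sampling depth of 10000 is an arbitrary example value):

nextflow run main.nf --run_id "$RUN_ID" --sampling_depth 10000 --auto_rarefaction TRUE -profile docker -resume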

Scripts Overview

  • create_run.sh – prepares a run folder with raw fastq files and metadata.
  • INSTALLME.sh – automatically downloads and extracts the Kraken2 and SILVA classifier databases.

3 · Data Visualization

To explore the pipeline outputs interactively, you can launch the integrated Shiny app and visualize taxonomy profiles and sample metadata.

3.1 Launching the Shiny App

To start the app, give execution permissions and run the launch script:

chmod +x shiny_app.sh
./shiny_app.sh

Once the app is running, a browser window will open where you can manually upload your processed files for visualization.

3.2 Using Pipeline Output

To directly visualize the output generated by the pipeline, you need to convert QIIME-formatted files to the long format expected by the app. To do so, use the following script:

python scripts/convert_qiime_to_long.py [taxonomy.tsv path] [feature_table.tsv path] [output_file path]

This will create a .tsv file suitable for upload into the Shiny app.
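
For example, pointing the script at a run's outputs (these paths are hypothetical; substitute the actual locations of your taxonomy and feature table files):

python scripts/convert_qiime_to_long.py \
  outputs/run_R01310525/taxonomy.tsv \
  outputs/run_R01310525/feature_table.tsv \
  outputs/run_R01310525/long_format.tsv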


Notes

  • Filtered fastqs and the outputs/ directory are not pushed to GitHub due to size limits.
  • MultiQC summary merging into the metadata is done via csvkit tools and logged in outputs/run_<run_id>/.
