A modern, high-performance pipeline for analyzing barcoded amplicon sequencing data with Unique Molecular Identifiers (UMIs).
This package is a complete modernization of the original UMIErrorCorrect published in Clinical Chemistry (2022).
- High Performance: Parallel processing of genomic regions and fastp-based preprocessing.
- Modern Tooling: Built with typer, pydantic, loguru, and hatch.
- Easy Installation: Fully PEP 621 compliant, installable via pip or uv.
- Comprehensive: From raw FASTQ to error-corrected VCFs and consensus statistics.
- Robust: Extensive test suite and type safety.
- bwa for alignment
fastp is highly recommended for preprocessing but not mandatory. If fastp is not installed, or if you run with --no-fastp, the pipeline falls back to cutadapt and performs adapter trimming only.
The --no-qc flag disables the quality control steps. If QC is enabled (the default) but fastqc or multiqc is not installed, the pipeline emits a warning and still finishes successfully.
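Before a run, it can help to check which of the external tools mentioned above are actually on your PATH. The following is a minimal sketch and not part of umierrorcorrect2 itself: the helper names `find_tools` and `describe_setup` are hypothetical, and the executable names are assumed to match the tool names used in this README.

```python
import shutil

# Assumption: executable names match the tool names used in this README.
REQUIRED = ("bwa",)
OPTIONAL = ("fastp", "cutadapt", "fastqc", "multiqc")


def find_tools(names, which=shutil.which):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: which(name) for name in names}


def describe_setup(found):
    """Return notes that mirror the fallback behavior described above.

    Hypothetical helper: it only reports what is missing and does not
    call or configure the pipeline in any way.
    """
    notes = []
    if found.get("bwa") is None:
        notes.append("bwa not found: it is used for alignment")
    if found.get("fastp") is None:
        notes.append("fastp not found: expect cutadapt adapter trimming only")
    missing_qc = [t for t in ("fastqc", "multiqc") if found.get(t) is None]
    if missing_qc:
        notes.append(
            "QC tools not found (" + ", ".join(missing_qc) + "): "
            "expect a warning unless --no-qc is used"
        )
    return notes


if __name__ == "__main__":
    for note in describe_setup(find_tools(REQUIRED + OPTIONAL)):
        print(note)
```

If the script prints nothing, every tool it looks for was found on PATH. The `which` parameter exists only so the lookup can be replaced in tests.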
Use uv for lightning-fast installation:
# Installs globally
uv tool install umierrorcorrect2
# Install in your venv
uv pip install umierrorcorrect2
Or standard pip:
pip install umierrorcorrect2
The command-line tool is named umierrorcorrect2. Run the full pipeline on a single sample:
umierrorcorrect2 run \
-r1 sample_R1.fastq.gz \
-r2 sample_R2.fastq.gz \
-r hg38.fa \
-o results/
Run the pipeline on multiple samples in a folder (searches recursively for FASTQ files):
umierrorcorrect2 run \
-i folder_with_fastq_files/ \
-r hg38.fa \
-o results/
For detailed instructions, see the User Guide or run:
umierrorcorrect2
- User Guide: Detailed usage instructions for all commands.
- Docker Guide: Running with containers.
- Implementation Details: Architecture and design overview.
Osterlund T., Filges S., Johansson G., Stahlberg A. UMIErrorCorrect and UMIAnalyzer: Software for Consensus Read Generation, Error Correction, and Visualization Using Unique Molecular Identifiers, Clinical Chemistry, 2022. doi:10.1093/clinchem/hvac136