Pipeline for UMI error correction and variant calling from amplicon sequencing data

sfilges/umierrorcorrect2


UMIErrorCorrect2

Requires Python 3.10+. License: MIT.

A modern, high-performance pipeline for analyzing barcoded amplicon sequencing data with Unique Molecular Identifiers (UMI).

This package is a complete modernization of the original UMIErrorCorrect published in Clinical Chemistry (2022).

Key Features

  • High Performance: Parallel processing of genomic regions and fastp-based preprocessing.
  • Modern Tooling: Built with typer, pydantic, loguru, and hatch.
  • Easy Installation: Fully PEP 621 compliant, installable via pip or uv.
  • Comprehensive: From raw FASTQ to error-corrected VCFs and consensus statistics.
  • Robust: Extensive test suite and type safety.

Dependencies

Mandatory

  • bwa for alignment

Optional

  • fastp for preprocessing
  • fastqc for quality control
  • multiqc for quality control / report aggregation

fastp is highly recommended for preprocessing but not required. If fastp is not installed, or you run with --no-fastp, the pipeline falls back to cutadapt, which performs adapter trimming only.

The --no-qc flag disables the quality control steps. If QC is enabled (the default) but fastqc or multiqc is not installed, the pipeline logs a warning and still finishes successfully.
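Before a run, it can be useful to see which of the external tools listed above are on your PATH. The following is a minimal sketch using only the Python standard library. The helper names `find_tools` and `missing_tools` are hypothetical and not part of umierrorcorrect2, and the pipeline's own detection logic may differ.

```python
import shutil

# Tool names are taken from the Dependencies list above.
REQUIRED_TOOLS = ("bwa",)
OPTIONAL_TOOLS = ("fastp", "fastqc", "multiqc")


def find_tools(names):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_tools(names):
    """Return the tool names that could not be found on PATH."""
    return [name for name, path in find_tools(names).items() if path is None]


if __name__ == "__main__":
    missing_required = missing_tools(REQUIRED_TOOLS)
    missing_optional = missing_tools(OPTIONAL_TOOLS)
    if missing_required:
        print("Missing required tools:", ", ".join(missing_required))
    if missing_optional:
        print("Missing optional tools:", ", ".join(missing_optional))
    if not missing_required and not missing_optional:
        print("All tools found.")
```

A missing optional tool here corresponds to the fallback and warning behaviour described above; a missing bwa means alignment cannot run.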

Installation

Use uv for lightning-fast installation:

# Install as a standalone tool, available globally
uv tool install umierrorcorrect2

# Or install into the active virtual environment
uv pip install umierrorcorrect2

Or standard pip:

pip install umierrorcorrect2
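To confirm that the package landed in the current Python environment, you can query its installed version. This is a small sketch using the standard library; it assumes the distribution name matches the `umierrorcorrect2` name used in the install commands, and `installed_version` is a hypothetical helper. A `uv tool install` places the package in its own isolated environment, so this check only reflects installs made into the interpreter running the script.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("umierrorcorrect2")
    if found is None:
        print("umierrorcorrect2 is not installed in this environment.")
    else:
        print("umierrorcorrect2 version:", found)
```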

Quick Start

The command-line tool is named umierrorcorrect2. Run the full pipeline on a single sample:

umierrorcorrect2 run \
    -r1 sample_R1.fastq.gz \
    -r2 sample_R2.fastq.gz \
    -r hg38.fa \
    -o results/

Run the pipeline on multiple samples in a folder (searches recursively for FASTQ files):

umierrorcorrect2 run \
    -i folder_with_fastq_files/ \
    -r hg38.fa \
    -o results/
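If you drive the pipeline from a Python script, for example to loop over several runs, the two invocations above can be wrapped with `subprocess`. The sketch below uses only the flags shown in this section (`-r1`, `-r2`, `-i`, `-r`, `-o`). The functions `build_run_command` and `run_pipeline` are hypothetical helpers, not part of the package, and requiring both read files in single-sample mode is an assumption that simply mirrors the example above.

```python
import subprocess


def build_run_command(reference, output_dir, read1=None, read2=None, input_dir=None):
    """Build the argument list for `umierrorcorrect2 run`.

    Pass either input_dir (folder mode) or both read1 and read2 (single sample).
    """
    cmd = ["umierrorcorrect2", "run"]
    if input_dir is not None:
        cmd += ["-i", str(input_dir)]
    elif read1 is not None and read2 is not None:
        cmd += ["-r1", str(read1), "-r2", str(read2)]
    else:
        raise ValueError("Provide input_dir, or both read1 and read2.")
    cmd += ["-r", str(reference), "-o", str(output_dir)]
    return cmd


def run_pipeline(**kwargs):
    """Run the pipeline and raise CalledProcessError on a non-zero exit code."""
    cmd = build_run_command(**kwargs)
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch is safe to try.
    print(" ".join(build_run_command(
        reference="hg38.fa",
        output_dir="results/",
        read1="sample_R1.fastq.gz",
        read2="sample_R2.fastq.gz",
    )))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.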

For detailed instructions, see the User Guide or run:

umierrorcorrect2

Documentation

Citation

Osterlund T., Filges S., Johansson G., Stahlberg A. UMIErrorCorrect and UMIAnalyzer: Software for Consensus Read Generation, Error Correction, and Visualization Using Unique Molecular Identifiers, Clinical Chemistry, 2022. doi:10.1093/clinchem/hvac136
