Krewlyzer: Comprehensive cfDNA Feature Extraction Toolkit

Krewlyzer is a high-performance toolkit for extracting biological features from cell-free DNA (cfDNA) sequencing data. It is designed for cancer genomics, liquid biopsy research, and clinical bioinformatics.

Built with Python and Rust. The compute-intensive core is written in Rust and exposed to Python through PyO3, delivering 5-50x speedups over pure Python.

Tip

Full Documentation: msk-access.github.io/krewlyzer

Why Krewlyzer?

Cancer cells leave molecular fingerprints in your blood. Krewlyzer finds them.

The Fragmentomics Advantage

Traditional Liquid Biopsy	Fragmentomics with Krewlyzer
Look for specific mutations	Analyze how DNA is cut
Need prior knowledge of tumor	Works without knowing mutations
Miss ~50% of early cancers	Detect more cancers, earlier

Key insight: Tumor DNA fragments are shorter (~145bp) than healthy DNA (~166bp). Krewlyzer quantifies this difference and extracts ML-ready features.
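To make the size difference concrete, here is a minimal sketch of a short-to-long fragment ratio computed from a list of fragment lengths. It is not Krewlyzer's implementation: the function name `short_fragment_ratio` and the bin edges (100-150 bp and 151-220 bp) are illustrative assumptions chosen only so that ~145 bp fragments fall in the short bin and ~166 bp fragments in the long bin.

```python
from typing import Iterable, Tuple


def short_fragment_ratio(
    lengths: Iterable[int],
    short: Tuple[int, int] = (100, 150),   # assumed bin, inclusive bounds
    long: Tuple[int, int] = (151, 220),    # assumed bin, inclusive bounds
) -> float:
    """Return (# fragments in the short bin) / (# fragments in the long bin).

    Fragments outside both bins are ignored.
    """
    n_short = 0
    n_long = 0
    for n in lengths:
        if short[0] <= n <= short[1]:
            n_short += 1
        elif long[0] <= n <= long[1]:
            n_long += 1
    if n_long == 0:
        raise ValueError("no fragments fall in the long bin")
    return n_short / n_long


if __name__ == "__main__":
    # Toy input: two tumor-like fragments and four healthy-like fragments.
    print(short_fragment_ratio([145, 145, 166, 166, 166, 166]))
```

A higher ratio means a larger share of short fragments. The actual bins and normalization used by the `fsr` command are described in the feature documentation.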

What You Get

Feature	Clinical Use
Fragment size ratios	Tumor burden estimation
Cutting patterns	Tissue of origin identification
Nucleosome positioning	Epigenetic profiling
Mutation-specific sizes	MRD monitoring

New to cfDNA? Read "What is Cell-Free DNA?" for background.

Quick Install

# Docker (recommended - all data bundled)
docker pull ghcr.io/msk-access/krewlyzer:latest

# Clone + Install (development)
git clone https://github.com/msk-access/krewlyzer.git && cd krewlyzer
git lfs pull && pip install -e .

# pip + Data Clone (custom environments)
pip install krewlyzer
git clone --depth 1 https://github.com/msk-access/krewlyzer.git ~/.krewlyzer-data
cd ~/.krewlyzer-data && git lfs pull
export KREWLYZER_DATA_DIR=~/.krewlyzer-data/src/krewlyzer/data

Note

pip users: The KREWLYZER_DATA_DIR env var is required to locate bundled assets. See Installation Guide for details.
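For pip installs, a wrapper script can verify the variable before launching a run. The sketch below shows one way to do that check. The helper name `resolve_data_dir` is hypothetical, and this is not how Krewlyzer itself resolves its assets; it only confirms that the variable is set and points at an existing directory.

```python
import os
from pathlib import Path


def resolve_data_dir(env_var: str = "KREWLYZER_DATA_DIR") -> Path:
    """Return the directory named by env_var, or raise a descriptive error."""
    raw = os.environ.get(env_var)
    if not raw:
        raise RuntimeError(
            f"{env_var} is not set; export it to point at the cloned data directory"
        )
    # Expand a leading "~" in case the value was set without shell expansion.
    path = Path(raw).expanduser()
    if not path.is_dir():
        raise FileNotFoundError(f"{env_var} points to a missing directory: {path}")
    return path


if __name__ == "__main__":
    try:
        print(f"Using data directory: {resolve_data_dir()}")
    except (RuntimeError, FileNotFoundError) as err:
        print(f"Data directory check failed: {err}")
```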

Quick Start

# Run all fragmentomics features
krewlyzer run-all -i sample.bam --reference hg19.fa --output results/

# Generate unified JSON for ML pipelines
krewlyzer run-all -i sample.bam --reference hg19.fa --output results/ --generate-json

# Individual tools
krewlyzer extract -i sample.bam -r hg19.fa -o output/
krewlyzer fsc -i output/sample.bed.gz -o output/

# Panel data (MSK-ACCESS) with target regions
krewlyzer run-all -i sample.bam -r hg19.fa -o results/ \
    --target-regions panel_targets.bed \
    --pon-model msk-access.pon.parquet

Features

Command	Description	Output
`extract`	Extract fragments from BAM	`.bed.gz`
`motif`	End motif & MDS scores	`.EndMotif.tsv`, `.MDS.tsv`
`fsc`	Fragment size coverage	`.FSC.tsv`
`fsr`	Fragment size ratios	`.FSR.tsv`
`fsd`	Size distribution by arm	`.FSD.tsv`
`wps`	Windowed protection score	`.WPS.parquet`
`ocf`	Orientation-aware fragmentation	`.OCF.tsv`
`region-entropy`	TFBS/ATAC size entropy	`.TFBS.tsv`, `.ATAC.tsv`
`uxm`	Fragment-level methylation	`.UXM.tsv`
`mfsd`	Mutant vs wild-type sizes	`.mFSD.tsv`
`build-pon`	Build Panel of Normals	`.pon.parquet`
`run-all`	All features in one pass	All outputs
`--generate-json`	Unified JSON for ML	`.features.json`
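When wiring these outputs into a downstream pipeline, it can help to check which files a run actually produced. The sketch below uses the suffixes from the table above and assumes each output is named `<sample><suffix>` inside the results directory, as in `output/sample.bed.gz` from the Quick Start. The `find_outputs` helper and the `FEATURE_SUFFIXES` mapping are hypothetical and not part of Krewlyzer; verify the naming against your own results before relying on it.

```python
from pathlib import Path
from typing import Dict, List

# Per-sample output suffixes, copied from the Features table.
FEATURE_SUFFIXES: Dict[str, List[str]] = {
    "extract": [".bed.gz"],
    "motif": [".EndMotif.tsv", ".MDS.tsv"],
    "fsc": [".FSC.tsv"],
    "fsr": [".FSR.tsv"],
    "fsd": [".FSD.tsv"],
    "wps": [".WPS.parquet"],
    "ocf": [".OCF.tsv"],
    "region-entropy": [".TFBS.tsv", ".ATAC.tsv"],
    "uxm": [".UXM.tsv"],
    "mfsd": [".mFSD.tsv"],
    "generate-json": [".features.json"],
}


def find_outputs(results_dir: str, sample: str) -> Dict[str, List[Path]]:
    """Map each command to the output files that exist for this sample.

    Assumes files are named <sample><suffix> directly under results_dir.
    """
    root = Path(results_dir)
    found: Dict[str, List[Path]] = {}
    for command, suffixes in FEATURE_SUFFIXES.items():
        existing = [
            root / f"{sample}{suffix}"
            for suffix in suffixes
            if (root / f"{sample}{suffix}").is_file()
        ]
        if existing:
            found[command] = existing
    return found


if __name__ == "__main__":
    for command, paths in find_outputs("results", "sample").items():
        print(command, [p.name for p in paths])
```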

Panel Mode (`--target-regions`)

For targeted sequencing panels such as MSK-ACCESS:

krewlyzer run-all -i sample.bam -r hg19.fa -o results/ \
    --target-regions panel_targets.bed

GC model: Trained on off-target fragments (unbiased)
Outputs: Split into .tsv (off-target) and .ontarget.tsv
Auto-PON: Use -A xs2 to auto-load bundled PON for z-scores
ML negatives: Use -A xs2 --skip-pon to output raw features (no z-scores)
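For readers unfamiliar with PON-based z-scores, the idea is to express a sample's feature value as a number of standard deviations from the mean of the same feature across the normal samples. The sketch below is the generic textbook z-score, not Krewlyzer's implementation; the function name `pon_z_score` is hypothetical, and Krewlyzer's PON model may apply additional corrections that are not shown here.

```python
from statistics import mean, stdev
from typing import Sequence


def pon_z_score(value: float, normals: Sequence[float]) -> float:
    """Generic z-score of one feature value against a panel of normal values."""
    if len(normals) < 2:
        raise ValueError("need at least two normal samples to estimate spread")
    mu = mean(normals)
    sigma = stdev(normals)  # sample standard deviation (n - 1 denominator)
    if sigma == 0:
        raise ValueError("normal samples have zero variance for this feature")
    return (value - mu) / sigma


if __name__ == "__main__":
    # Toy values: the normals average 2.0 with a sample standard deviation of 1.0.
    print(pon_z_score(4.0, [1.0, 2.0, 3.0]))
```

Skipping the PON, as with `--skip-pon`, corresponds to keeping the raw `value` rather than the normalized score.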

Documentation

Getting Started - 5-minute quickstart
Installation - Docker, pip, development
Usage Guide - CLI reference
Feature Details - Per-feature documentation
Nextflow Pipeline - Batch processing
