Krewlyzer is a high-performance toolkit for extracting biological features from cell-free DNA (cfDNA) sequencing data. Designed for cancer genomics, liquid biopsy research, and clinical bioinformatics.
Built with Python and Rust: the compute-intensive core is written in Rust and exposed to Python through PyO3, delivering 5-50x speedups over pure Python.
Tip
Full Documentation: msk-access.github.io/krewlyzer
Cancer cells leave molecular fingerprints in your blood. Krewlyzer finds them.
| Traditional Liquid Biopsy | Fragmentomics with Krewlyzer |
|---|---|
| Look for specific mutations | Analyze how DNA is cut |
| Need prior knowledge of tumor | Works without knowing mutations |
| Miss ~50% of early cancers | Detect more cancers, earlier |
Key insight: Tumor-derived cfDNA fragments tend to be shorter (~145 bp) than fragments from healthy cells (~166 bp). Krewlyzer quantifies this difference and extracts ML-ready features.
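As a rough illustration of the size difference described above, the sketch below computes a short-to-long fragment ratio from a list of fragment lengths. The bin boundaries (100-150 bp short, 151-220 bp long) and the function name `short_long_ratio` are assumptions chosen for illustration. They are not taken from Krewlyzer's implementation, which may bin fragments differently.

```python
from typing import Iterable

# Hypothetical bin boundaries in bp (inclusive); assumptions, not Krewlyzer's.
SHORT_BIN = (100, 150)
LONG_BIN = (151, 220)


def short_long_ratio(lengths: Iterable[int]) -> float:
    """Return (# short fragments) / (# long fragments) for the given lengths."""
    short = long_ = 0
    for n in lengths:
        if SHORT_BIN[0] <= n <= SHORT_BIN[1]:
            short += 1
        elif LONG_BIN[0] <= n <= LONG_BIN[1]:
            long_ += 1
    if long_ == 0:
        raise ValueError("no fragments fall in the long bin")
    return short / long_


# Toy data: the tumor-like sample has more fragments near 145 bp.
healthy_like = [166, 167, 165, 170, 145, 168]
tumor_like = [145, 144, 166, 146, 143, 167]
print(short_long_ratio(healthy_like))  # 1 short fragment, 5 long
print(short_long_ratio(tumor_like))    # 4 short fragments, 2 long
```

A higher ratio means a larger share of short fragments, which is the direction of shift the key insight describes.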
| Feature | Clinical Use |
|---|---|
| Fragment size ratios | Tumor burden estimation |
| Cutting patterns | Tissue of origin identification |
| Nucleosome positioning | Epigenetic profiling |
| Mutation-specific sizes | MRD monitoring |
New to cfDNA? Read "What is Cell-Free DNA?" for background.
# Docker (recommended - all data bundled)
docker pull ghcr.io/msk-access/krewlyzer:latest
# Clone + Install (development)
git clone https://github.com/msk-access/krewlyzer.git && cd krewlyzer
git lfs pull && pip install -e .
# pip + Data Clone (custom environments)
pip install krewlyzer
git clone --depth 1 https://github.com/msk-access/krewlyzer.git ~/.krewlyzer-data
cd ~/.krewlyzer-data && git lfs pull
export KREWLYZER_DATA_DIR=~/.krewlyzer-data/src/krewlyzer/data

Note
pip users: The KREWLYZER_DATA_DIR environment variable is required to locate bundled assets. See the Installation Guide for details.
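Before launching a long run from a pip install, it can help to confirm that the variable is set and points at a real directory. The following is a minimal pre-flight sketch. The helper name `resolve_data_dir` is hypothetical, and this is not how Krewlyzer itself resolves its data directory; only the variable name `KREWLYZER_DATA_DIR` comes from the instructions above.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_data_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the directory named by KREWLYZER_DATA_DIR, or raise if unusable.

    Hypothetical helper for illustration; not part of the krewlyzer package.
    """
    env = os.environ if env is None else env
    raw = env.get("KREWLYZER_DATA_DIR")
    if not raw:
        raise RuntimeError("KREWLYZER_DATA_DIR is not set")
    path = Path(raw).expanduser()  # expand a leading ~ as the shell would
    if not path.is_dir():
        raise RuntimeError(f"KREWLYZER_DATA_DIR is not a directory: {path}")
    return path
```

Calling `resolve_data_dir()` with no arguments reads the current process environment. Passing a mapping makes the check easy to test.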
# Run all fragmentomics features
krewlyzer run-all -i sample.bam --reference hg19.fa --output results/
# Generate unified JSON for ML pipelines
krewlyzer run-all -i sample.bam --reference hg19.fa --output results/ --generate-json
# Individual tools
krewlyzer extract -i sample.bam -r hg19.fa -o output/
krewlyzer fsc -i output/sample.bed.gz -o output/
# Panel data (MSK-ACCESS) with target regions
krewlyzer run-all -i sample.bam -r hg19.fa -o results/ \
--target-regions panel_targets.bed \
--pon-model msk-access.pon.parquet

| Command | Description | Output |
|---|---|---|
| extract | Extract fragments from BAM | .bed.gz |
| motif | End motif & MDS scores | .EndMotif.tsv, .MDS.tsv |
| fsc | Fragment size coverage | .FSC.tsv |
| fsr | Fragment size ratios | .FSR.tsv |
| fsd | Size distribution by arm | .FSD.tsv |
| wps | Windowed protection score | .WPS.parquet |
| ocf | Orientation-aware fragmentation | .OCF.tsv |
| region-entropy | TFBS/ATAC size entropy | .TFBS.tsv, .ATAC.tsv |
| uxm | Fragment-level methylation | .UXM.tsv |
| mfsd | Mutant vs wild-type sizes | .mFSD.tsv |
| build-pon | Build Panel of Normals | .pon.parquet |
| run-all | All features in one pass | All outputs |
| --generate-json | Unified JSON for ML | .features.json |
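For cohorts with many samples, the `run-all` invocations shown earlier can be scripted. The sketch below assembles the command line for each BAM and optionally executes it. It uses only flags that appear in this README (`-i`, `-r`, `-o`, `--target-regions`, `--generate-json`). The function names and the one-output-directory-per-sample layout are assumptions for illustration, not part of Krewlyzer.

```python
import subprocess
from pathlib import Path
from typing import Iterable, List, Optional


def build_run_all_cmd(
    bam: str,
    reference: str,
    outdir: str,
    target_regions: Optional[str] = None,
    generate_json: bool = False,
) -> List[str]:
    """Assemble a `krewlyzer run-all` argument list (hypothetical helper)."""
    cmd = ["krewlyzer", "run-all", "-i", str(bam), "-r", str(reference), "-o", str(outdir)]
    if target_regions is not None:
        cmd += ["--target-regions", str(target_regions)]
    if generate_json:
        cmd.append("--generate-json")
    return cmd


def run_batch(
    bams: Iterable[str], reference: str, out_root: str, dry_run: bool = True
) -> List[List[str]]:
    """Build one command per BAM; execute each only when dry_run is False."""
    cmds = []
    for bam in bams:
        # Assumed layout: one output directory per sample, named after the BAM.
        outdir = Path(out_root) / Path(bam).stem
        cmd = build_run_all_cmd(bam, reference, str(outdir), generate_json=True)
        cmds.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmds


# Dry run: print the commands without executing anything.
for cmd in run_batch(["s1.bam", "s2.bam"], "hg19.fa", "results"):
    print(" ".join(cmd))
```

Passing the arguments as a list rather than a shell string avoids quoting problems with unusual file names. For larger batches, the Nextflow pipeline listed in the documentation is likely the better fit.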
For targeted sequencing panels (MSK-ACCESS):
krewlyzer run-all -i sample.bam -r hg19.fa -o results/ \
--target-regions panel_targets.bed

- GC model: Trained on off-target fragments (unbiased)
- Outputs: Split into .tsv (off-target) and .ontarget.tsv
- Auto-PON: Use -A xs2 to auto-load bundled PON for z-scores
- ML negatives: Use -A xs2 --skip-pon to output raw features (no z-scores)
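To make the z-score idea concrete: a z-score expresses how far a sample's feature value lies from the mean of the normals, in units of the normals' standard deviation. The sketch below shows that generic calculation on toy numbers. It is not Krewlyzer's PON model, whose contents and file layout are not described here, and the feature name `short_ratio` and function name `pon_z_scores` are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, Sequence


def pon_z_scores(
    sample: Dict[str, float], normals: Dict[str, Sequence[float]]
) -> Dict[str, float]:
    """Return z = (x - mean) / sd per feature, against values from normal samples."""
    z = {}
    for name, value in sample.items():
        ref = normals[name]
        mu = mean(ref)
        sd = stdev(ref)  # sample standard deviation (n - 1 denominator)
        if sd == 0:
            raise ValueError(f"zero variance among normals for feature {name!r}")
        z[name] = (value - mu) / sd
    return z


# Toy numbers only: three normals and one test sample.
normals = {"short_ratio": [1.0, 2.0, 3.0]}
sample = {"short_ratio": 5.0}
print(pon_z_scores(sample, normals))
```

With `--skip-pon`, per the list above, this normalization step is skipped and raw feature values are written instead.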
- Getting Started - 5-minute quickstart
- Installation - Docker, pip, development
- Usage Guide - CLI reference
- Feature Details - Per-feature documentation
- Nextflow Pipeline - Batch processing
If you use Krewlyzer, please cite:
- DELFI (FSR): Cristiano S, et al. Nature 2019
- WPS: Snyder MW, et al. Cell 2016
- OCF: Sun K, et al. Genome Res 2019
- UXM: Loyfer N, et al. Nature 2022
See Citation & Scientific Background for full references.
GNU Affero General Public License v3.0 (AGPL-3.0). See LICENSE.
Developed by Ronak Shah (@rhshah) at Memorial Sloan Kettering Cancer Center.