FastKM

FastKM is a lightweight C++ tool for fast k-mer marker lookup in long reads. It uses a minimal perfect hash function (MPHF) for lookup and a compact probabilistic fingerprint to control false positives. It scans FASTQ/FASTA long-read sequences and reports per-read marker statistics (hit counts, mean and standard deviation of inter-hit distances, span coverage) across one or more k-mer databases (e.g., per-haplotype marker sets at multiple k sizes).

What it does

Given:

- a list of k-mer databases (one per file / k / label), and
- a gzipped long-read file,

FastKM:

- builds an MPHF index for each k-mer set (constant-time queries),
- scans each long read using a rolling hash (ntHash),
- checks both forward and reverse-complement k-mers,
- outputs a tabular matrix with per-read features per database:
  - n_* = number of marker hits
  - m_* = mean distance between consecutive hits (bp)
  - s_* = stddev of distances (bp)
  - cov_* = approximate span coverage (%) based on first/last hit positions
  - size_* = k-mer size
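
The per-read features above can be illustrated with a minimal sketch. It is not FastKM's implementation: a std::unordered_set of strings stands in for the MPHF plus fingerprint, plain substr stands in for the ntHash rolling hash, and the names MarkerStats, revcomp, scan_read and summarize are hypothetical. The coverage formula (first hit start to last hit end, divided by read length) and the use of the population standard deviation are assumptions about how cov_* and s_* might be computed.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Per-read summary for one database (fields mirror the n_/m_/s_/cov_ columns).
struct MarkerStats {
    std::size_t n = 0;  // number of marker hits
    double mean = 0.0;  // mean distance between consecutive hits (bp)
    double sd = 0.0;    // stddev of those distances (bp), population form (assumed)
    double cov = 0.0;   // span coverage in percent (assumed formula, see summarize)
};

// Reverse complement of a DNA string; any non-ACGT base becomes 'N'.
std::string revcomp(const std::string& s) {
    std::string rc(s.rbegin(), s.rend());
    for (char& c : rc) {
        switch (c) {
            case 'A': c = 'T'; break;
            case 'C': c = 'G'; break;
            case 'G': c = 'C'; break;
            case 'T': c = 'A'; break;
            default:  c = 'N'; break;
        }
    }
    return rc;
}

// Start positions of k-mers that match a marker in either orientation.
// The hash set is a stand-in for the MPHF index; substr replaces rolling hashing.
std::vector<std::size_t> scan_read(const std::string& read, std::size_t k,
                                   const std::unordered_set<std::string>& markers) {
    assert(k > 0);
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i + k <= read.size(); ++i) {
        const std::string fwd = read.substr(i, k);
        if (markers.count(fwd) || markers.count(revcomp(fwd))) hits.push_back(i);
    }
    return hits;
}

// Summarize sorted hit positions for a read of length read_len.
MarkerStats summarize(const std::vector<std::size_t>& hits, std::size_t k,
                      std::size_t read_len) {
    MarkerStats st;
    st.n = hits.size();
    if (hits.empty() || read_len == 0) return st;
    // Assumed span: from the start of the first hit to the end of the last hit.
    st.cov = 100.0 * static_cast<double>(hits.back() - hits.front() + k) /
             static_cast<double>(read_len);
    if (hits.size() < 2) return st;  // no distances to average
    double sum = 0.0, sumsq = 0.0;
    for (std::size_t i = 1; i < hits.size(); ++i) {
        const double d = static_cast<double>(hits[i] - hits[i - 1]);
        sum += d;
        sumsq += d * d;
    }
    const double m = static_cast<double>(hits.size() - 1);
    st.mean = sum / m;
    st.sd = std::sqrt(std::max(0.0, sumsq / m - st.mean * st.mean));
    return st;
}
```

Calling scan_read once per database and passing the result to summarize yields one group of n_*, m_*, s_* and cov_* values per read; size_* is simply the k of that database.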

Input formats

1) K-mer database list (argument 1)

A text file where each line has 3 fields. For example, the kmerdb.txt file has the following content:

/trio_data/unique-mers/uk15/hapA_only_kmers.txt 15 A
/trio_data/unique-mers/uk15/hapB_only_kmers.txt 15 B
/trio_data/unique-mers/uk18/hapA_only_kmers.txt 18 A
/trio_data/unique-mers/uk18/hapB_only_kmers.txt 18 B
/trio_data/unique-mers/uk21/hapA_only_kmers.txt 21 A
/trio_data/unique-mers/uk21/hapB_only_kmers.txt 21 B
/trio_data/unique-mers/uk24/hapA_only_kmers.txt 24 A
/trio_data/unique-mers/uk24/hapB_only_kmers.txt 24 B

The columns are:

- column 1: path to a file with unique k-mers
- column 2: k-mer size
- column 3: haplotype label
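
As an illustration of this three-column layout, the following sketch reads such a list into memory. It assumes the fields are separated by whitespace, as in the example above; the names KmerDbEntry and parse_kmerdb_list are hypothetical and are not taken from FastKM.cpp. Blank or malformed lines are skipped here, which is a choice made for this sketch rather than documented FastKM behavior.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One line of the k-mer database list: path, k-mer size, haplotype label.
struct KmerDbEntry {
    std::string path;
    int k = 0;
    std::string label;
};

// Parse a whitespace-separated database list from any input stream.
// Lines that do not yield three valid fields are skipped (sketch behavior).
std::vector<KmerDbEntry> parse_kmerdb_list(std::istream& in) {
    std::vector<KmerDbEntry> entries;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        KmerDbEntry e;
        if (!(fields >> e.path >> e.k >> e.label)) continue;  // blank or malformed
        if (e.k <= 0) continue;                               // k must be positive
        entries.push_back(e);
    }
    return entries;
}
```

Passing a std::ifstream opened on kmerdb.txt would return one entry per database, in file order.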

2) Long reads file (argument 2)

Gzipped FASTQ/FASTA is supported via kseq + zlib. Reads shorter than 500 bp are skipped.

Run the code

./FastKM kmerdb.txt long-reads.fastq.gz <number_of_cores>

Citation

If you use FastKM in academic work, please cite the associated repository and (if applicable) the manuscript where FastKM is described.

License

MIT License.

Contact

Maintainer: Alex Di Genova
Issues/feature requests: please open a GitHub issue in this repository.
