This project is an academic analysis of advanced error correction techniques used in DNA-based data storage systems. As demand for digital storage outgrows conventional media, DNA emerges as a promising, ultra-dense, and durable medium. However, its synthesis, storage, and sequencing introduce errors that conventional media rarely exhibit: insertions and deletions as well as substitutions.
Our research paper explores how classical and modern error correction codes (ECCs) are adapted or redesigned to cope with these challenges, balancing redundancy, accuracy, and cost.
- Encoding Strategies: Translating binary data to quaternary nucleotide sequences using robust mappings.
- Channel Modeling: Treating DNA synthesis, storage, and sequencing processes as noisy channels with biological constraints.
- Biological Error Patterns: Analyzing the types and probabilities of insertions, deletions (indels), and substitutions in DNA sequences.
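
The encoding and channel-modeling ideas above can be illustrated with a minimal Python sketch. It is not the scheme analyzed in the paper: the 2-bit mapping in `BITS_TO_BASE`, the function names, and the default error probabilities are all assumptions chosen for illustration, and the channel treats every base independently, which real synthesis and sequencing pipelines do not.

```python
import random

# Assumed 2-bit-per-nucleotide mapping (hypothetical). Practical schemes
# usually add constraints such as GC balance and homopolymer limits,
# which this sketch ignores.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Map each byte to four nucleotides, most significant bit pair first."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Invert encode(); assumes an error-free strand with length divisible by 4."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def noisy_channel(strand, p_sub=0.01, p_ins=0.005, p_del=0.005, rng=None):
    """Apply independent per-base deletion, insertion, and substitution errors.

    The probabilities are illustrative placeholders, not measured rates.
    """
    rng = rng or random.Random()
    out = []
    for base in strand:
        r = rng.random()
        if r < p_del:
            continue  # deletion: the base is dropped
        if r < p_del + p_ins:
            out.append(rng.choice("ACGT"))  # insertion: extra base before this one
            out.append(base)
        elif r < p_del + p_ins + p_sub:
            # substitution: replace with one of the three other bases
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)
```

For example, `encode(b"Hi")` yields `"CAGACGGC"`, and `decode` recovers `b"Hi"` from it. Passing the strand through `noisy_channel` can change its length, which is why indels are harder to correct than substitutions: a single inserted or deleted base shifts every later symbol.
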
We studied and compared the performance of several ECCs tailored to DNA storage:
- Hamming Codes: Correct a single substitution error per codeword with low redundancy; they offer no protection against indels on their own.
- Low-Density Parity Check (LDPC) Codes: High performance with iterative decoding, robust against complex error patterns.
- Convolutional Codes: Suitable for continuous streams; typically combined with interleaving to handle burst errors.
- Hybrid Schemes: Proposed combinations of multiple codes and reverse complement-aware encoding to enhance robustness.
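
As a concrete instance of the simplest code in the list above, here is a minimal sketch of a standard Hamming(7,4) code in Python. It works on bits rather than nucleotides, the function names are hypothetical, and it only illustrates the substitution case: a Hamming(7,4) code corrects any single flipped bit per 7-bit codeword but cannot resynchronize after an insertion or deletion.

```python
def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword.

    Layout by 1-indexed position: p1 p2 d1 p3 d2 d3 d4.
    """
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(c):
    """Correct at most one flipped bit, then return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    # The syndrome, read as a binary number, is the 1-indexed error position.
    error_pos = s1 + 2 * s2 + 4 * s3
    if error_pos:  # 0 means no error detected
        c[error_pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]
```

For example, `hamming74_encode([1, 0, 1, 1])` gives `[0, 1, 1, 0, 0, 1, 1]`, and flipping any one of those seven bits still decodes to `[1, 0, 1, 1]`. Two flips in the same codeword are miscorrected, which is one motivation for the stronger LDPC and hybrid schemes listed above.
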
- Moksha Choksi
- Dhanika Kothari
- Kavya Veer
- Sindhuja Babu