Survey of information encoding techniques for DNA
File(s)
1906.11062v1.pdf (2.15 MB)
Working paper
Author(s)
Heinis, Thomas
Type
Working Paper
Abstract
Key to DNA storage is encoding the information into a sequence of nucleotides
before it can be synthesised for storage. The definition of such an encoding,
or mapping, must adhere to multiple design restrictions. First, not all
possible sequences of nucleotides can be synthesised. Homopolymers, i.e., runs
of the same nucleotide, longer than two nucleotides, for example, cannot be
synthesised without potential errors. Similarly, the G-C content of the
resulting sequences should be higher than 50%. Second, given that synthesis is
expensive, the encoding must map as many bits as possible to one nucleotide.
Third, synthesis (as well as sequencing) is error prone, leading to
substitutions, deletions and insertions. An encoding must therefore be designed
to be resilient to errors through error correction codes or replication.
Fourth, for the purpose of computation and selective retrieval, encodings
should result in substantially different sequences across all data, even for
very similar data. In the following we discuss the history and evolution of
encodings.
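The first restriction can be illustrated with a minimal sketch that checks a candidate sequence against the two synthesis constraints named in the abstract. The function names and the default thresholds (`max_run=2`, `min_gc=0.5`) are assumptions chosen to mirror the wording above; they are not taken from the paper itself.

```python
from itertools import groupby


def max_homopolymer_run(seq: str) -> int:
    """Length of the longest run of one repeated nucleotide in seq."""
    return max((len(list(group)) for _, group in groupby(seq)), default=0)


def gc_content(seq: str) -> float:
    """Fraction of nucleotides in seq that are G or C (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return sum(1 for nucleotide in seq if nucleotide in "GC") / len(seq)


def satisfies_constraints(seq: str, max_run: int = 2, min_gc: float = 0.5) -> bool:
    """Check the two synthesis constraints described in the abstract.

    Assumed thresholds: no homopolymer longer than max_run, and a G-C
    content strictly higher than min_gc.
    """
    return max_homopolymer_run(seq) <= max_run and gc_content(seq) > min_gc
```

With these assumed thresholds, a sequence such as `GCGCAT` passes both checks, while `GGGCCA` is rejected because it contains a run of three G nucleotides.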
Date Issued
2019-06-24
Citation
2019
Publisher
arXiv
Copyright Statement
© 2019 The Author(s)
Sponsor
Commission of the European Communities
Identifier
http://arxiv.org/abs/1906.11062v1
Grant Number
893320
Subjects
q-bio.QM
cs.DB
cs.DS
cs.IT
math.IT
Publication Status
Published