An organism's genome serves as a genetic blueprint, storing all the information needed to build and maintain the organism. Genomes often come in multiple copies, each inherited from one of the organism's ancestors. Because of mutation and recombination events, these copies differ genetically; each copy is called a haplotype. The analysis of haplotypes plays an important role in genetics, medicine, and various other disciplines. For example, viruses such as HIV change their genomes very quickly during an infection. As a result, they populate their host as a cloud of closely related virus strains: a viral quasispecies. This diversity allows the virus to adapt to its environment, which makes the infection hard to cure. Accurate reconstruction of each of the individual viral haplotypes causing the infection could lead to improved treatment plans and the development of novel medicines.
We present several approaches to haplotype reconstruction that operate in a "de novo" fashion, meaning that our methods do not require any prior information about the genome content. This type of approach avoids bias toward previously known genomes and allows for the discovery and assembly of novel haplotypes. We present new techniques to address the computational challenges that come with de novo genome assembly. Combined, our tools form the first de novo approach to full-length viral quasispecies reconstruction and achieve an accuracy beyond that of any existing method.