An enrichment protocol and analysis pipeline for long read sequencing of the hepatitis B virus transcriptome

Hepatitis B virus (HBV) is one of the smallest human DNA viruses, and its 3.2 kb genome encodes multiple overlapping open reading frames, making its transcriptome challenging to dissect. Previous studies have combined quantitative PCR and next-generation sequencing to identify viral transcripts and splice junctions; however, the fragmentation and selective amplification used in short read sequencing preclude the resolution of full-length RNAs. Our study coupled an oligonucleotide enrichment protocol with state-of-the-art long read sequencing (PacBio) to identify the repertoire of HBV RNAs. This methodology provides sequencing libraries in which up to 25% of reads are of viral origin and enables the identification of canonical (unspliced), non-canonical (spliced) and chimeric viral-human transcripts. Sequencing RNA isolated from de novo HBV-infected cells or from cells transfected with 1.3× overlength HBV genomes allowed us to assess the viral transcriptome and to annotate 5′ truncations and polyadenylation profiles. The two HBV model systems showed excellent agreement in the pattern of major viral RNAs; however, differences were noted in the abundance of spliced transcripts. Viral-host chimeric transcripts were identified and were more common in the transfected cells. Enrichment capture and PacBio sequencing allow the assignment of canonical and non-canonical HBV RNAs using an open-source analysis pipeline that enables accurate mapping of the HBV transcriptome.
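
The three transcript classes named above (unspliced, spliced and chimeric viral-human) can be illustrated with a minimal sketch. It is not the study's pipeline: it assumes each long read has already been aligned to a combined human plus HBV reference, that the viral contig is called `HBV` (a hypothetical name), and that introns appear as `N` operations in the CIGAR string, as defined in the SAM format. It also ignores the circular genome, so reads spanning the linearised origin would need extra handling in practice.

```python
import re
from collections import Counter

# Matches one CIGAR operation, e.g. "500M" or "1200N".
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
HBV_REF = "HBV"  # hypothetical name of the viral contig in the reference


def classify_read(alignments):
    """Classify one read from its (reference_name, cigar) alignment records.

    `alignments` holds the primary alignment plus any supplementary ones.
    """
    refs = {ref for ref, _ in alignments}
    if HBV_REF not in refs:
        return "host"
    if len(refs) > 1:
        # Parts of the read align to both the viral and a human contig.
        return "chimeric"
    for _, cigar in alignments:
        # An 'N' operation marks a skipped reference region, i.e. an intron.
        if any(op == "N" for _, op in CIGAR_OP.findall(cigar)):
            return "spliced"
    return "unspliced"


def viral_fraction(reads):
    """Return the fraction of reads with any alignment to the viral contig."""
    counts = Counter(classify_read(a) for a in reads.values())
    total = sum(counts.values())
    return (total - counts["host"]) / total if total else 0.0


# Toy alignment records, invented purely for illustration.
example_reads = {
    "r1": [("HBV", "3200M")],
    "r2": [("HBV", "500M1200N800M")],
    "r3": [("HBV", "900M300S"), ("chr5", "900S300M")],
    "r4": [("chr1", "1500M")],
}
```

Here `viral_fraction(example_reads)` plays the role of the "proportion of reads of viral origin" statistic quoted in the abstract, computed on toy data rather than real libraries.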

All Keywords
Author keywords: HBV, RNA splicing, PacBio, long read sequencing, transcriptome assembly