A cell's gene expression profile (which RNAs and proteins are present in the cell) defines its identity. It ensures that cells that all contain the same DNA sequence can do very different things: a muscle cell can contract, an immune cell can (help) clear pathogens, and a neuron can transmit signals. A key step in gene expression is transcription: the process of copying parts of the DNA into RNA. RNA Polymerase II (RNAPII) is the machine that transcribes all protein-coding genes into messenger RNAs (mRNAs). Transcription is a tightly regulated multi-step process, because an imbalance in any of the steps will cause gene expression changes, which can result in disease.
One key "decision point" happens during early elongation: will RNAPII go into productive elongation, or will it terminate early? In recent years, we have learnt that early termination is not uncommon during transcription at protein-coding genes, yet clearly enough RNAPII goes into productive elongation to generate the needed full-length mRNAs. How the correct balance between termination and elongation is achieved is currently unclear.
Over the last 10-15 years, it has become apparent that RNAPII also transcribes many regions of the genome into non-coding RNAs. Some of these have well-established functions, such as small nuclear RNAs and microRNAs. The largest number, though, are generated from places in the genome called 'enhancers'. There, RNAPII produces enhancer RNAs, or eRNAs. Moreover, RNAPII transcribes a non-coding upstream antisense RNA (uaRNA, also called PROMPT) next to most protein-coding genes. The function of transcribing these eRNAs and uaRNAs remains an active topic of investigation and debate. What we do know is that most of these non-coding RNAs are much shorter than a typical mRNA, meaning that the termination/elongation balance is shifted much more towards early termination. This may be important for genome stability, as RNAPII transcription running rampant throughout the genome would cause collisions and subsequent DNA damage.
With our research, we aim to understand how the balance between early termination and productive elongation is achieved at both protein-coding genes and non-coding loci. By understanding this key step in gene expression, we will open new avenues to study how this process gets misregulated in disease and how this can be exploited for therapeutic gain.
In the Vlaming group, we will study the regulation of this elongation/termination balance both from the DNA perspective and from the perspective of protein regulators.
To study which elements in the transcribed sequence (encoded in the DNA or RNA) control the fate of RNAPII, I developed the INSERT-seq approach during my postdoctoral fellowship.
Schematic representation of the INSERT-seq approach. At a reporter locus, a library of inserts is introduced by CRISPR–Cas9. Using different strategies, the effects of each insert on RNA abundance (steady-state or nascent) and protein expression are determined in high throughput.
Using this approach, I found that the composition of the transcribed sequence is a critical determinant of RNAPII elongation potential. I identified that the high GC content of the early transcribed regions of protein-coding genes favors transcription elongation. The GC content of most uaRNAs and eRNAs is much lower, and this contributes to their early termination. Furthermore, the presence of splice elements is important: not only does the process of splicing stimulate transcription, but the 5' splice site (5'SS) can also promote transcription on its own.
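To make the quantity discussed above concrete: GC content is simply the fraction of G and C bases in a sequence. The short Python sketch below illustrates how it could be computed, overall and in sliding windows along a transcribed region. It is not part of the INSERT-seq analysis; the function names, the window and step sizes, and the two example sequences are hypothetical and chosen only for illustration.

```python
def gc_content(seq: str) -> float:
    """Fraction of G/C among the unambiguous bases (A, C, G, T/U) in seq."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T") + seq.count("U")
    if gc + at == 0:
        raise ValueError("sequence contains no A/C/G/T/U bases")
    return gc / (gc + at)


def windowed_gc(seq: str, window: int = 50, step: int = 10):
    """GC content in sliding windows, returned as (start, gc_fraction) pairs."""
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    # Made-up sequences for illustration only (not real loci).
    gc_rich = "GCGGCCGCGAGCTGCGGCGCCATGGCGGCG"
    at_rich = "ATTTAAGATATTTCATAAATTGTAATATAA"
    print(f"GC-rich example: {gc_content(gc_rich):.2f}")
    print(f"AT-rich example: {gc_content(at_rich):.2f}")
```

Ambiguous bases such as N are ignored here, which is one of several reasonable conventions; a window made up entirely of ambiguous bases would raise an error rather than return a value.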
This work forms a foundation for future work in the lab. We have evidence that additional sequence elements have a role in dictating the elongation/termination balance, and through a combination of additional INSERT-seq screens and advanced data analysis, we will uncover these elements. For these novel sequences, we will decipher in what contexts they act and how their signals are conveyed to RNAPII.
In parallel, we will use CRISPR screening to identify new protein regulators that differentially control coding and non-coding transcription. For these proteins, we will study their genome-wide transcriptional effects through cutting-edge nascent RNA sequencing approaches, and will uncover what underlies their target specificity.
Here is the recording of a seminar Hanneke presented about her postdoctoral work in the Fragile Nucleosome series: