Splice it up

    Abdullah Kahraman is harnessing long-read sequencing to identify new biomarkers and drug targets

    Sequential improvements

    Prof. Dr. Kahraman, your team at the FHNW School of Life Sciences are analysing patterns in gene sequences to understand and predict disease. Tell us how gene sequencing has evolved.

    The first generation of DNA sequencing began in 1977 when Fred Sanger published his method of sequencing fragmented DNA molecules. Later, in 1987, it became possible to automate Sanger’s sequencing method, speeding up the process to a scale where the sequencing of the entire human genome could be initiated. Before the first draft of the human genome was announced in 2000, a second generation of DNA sequencers was developed that could sequence millions of tiny DNA fragments in a massively parallel fashion. Now, the third generation of DNA sequencing machines can sequence single long DNA fragments that, in combination with single-cell and location information, can give us an unprecedented view of how DNA and RNA molecules underlie human biology and diseases.

    Alternative splices reveal biomarkers of disease

    What types of information can be found using long-read sequencing?

    When RNA is transcribed from DNA, non-coding regions (introns) in the RNA sequence are cleaved off (spliced out). The remaining coding regions (exons) can be combined in different ways into different messenger RNAs. This cellular process, called alternative splicing, is the reason why the small number of genes in our DNA can generate all the different proteins that cells in our body need.
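
    As a toy illustration of why a few exons can yield many transcripts, the sketch below enumerates the exon combinations of a single hypothetical gene. It is a deliberate simplification that assumes only exon skipping, with the first and last exon always retained; real splicing also involves alternative splice sites, intron retention and other events. The exon sequences and the function name `enumerate_isoforms` are invented for this example.

```python
from itertools import combinations


def enumerate_isoforms(exons, constitutive):
    """List every transcript obtained by skipping any subset of optional exons.

    exons        -- exon sequences in genomic order (toy data, not a real gene)
    constitutive -- indices of exons that are always kept
    """
    optional = [i for i in range(len(exons)) if i not in constitutive]
    isoforms = []
    # Skip 0, 1, 2, ... optional exons; exon order is never changed.
    for k in range(len(optional) + 1):
        for skipped in combinations(optional, k):
            kept = [exons[i] for i in range(len(exons)) if i not in skipped]
            isoforms.append("".join(kept))
    return isoforms


# Hypothetical four-exon gene; exons 0 and 3 are assumed to be always included.
exons = ["AUGGCU", "GAUCCA", "UUCGGA", "CUGUAA"]
isoforms = enumerate_isoforms(exons, constitutive={0, 3})

for transcript in isoforms:
    print(transcript)
```

    With two optional exons this gives four distinct transcripts, and each additional optional exon doubles that number under the same assumption.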

    In cancer cells, alternative splicing is often broken. As a result, proteins are produced that promote the survival and growth of tumour cells. With second-generation sequencing, accurately identifying the complete sequence of long transcripts has always been challenging. However, long-read sequencing now enables the determination of the entire sequence of individual transcripts, providing us with an unprecedented opportunity to explore the full diversity of RNA molecules in normal and cancer cells.

    In collaboration with the Functional Genomics Center in Zurich, my team uses the newest sequencing technologies to study tumour development and therapy resistance in cancer patients. We develop software, machine learning methods and databases, and integrate the novel datasets with mutational, structural and protein expression data. Our goal is to identify patterns that can predict disease progression and drug response, aiming to detect cancers early enough that patients can be treated before their tumours spread and form metastases.

    Form and function

    What are some challenges in analysing RNA sequences?

    Figuring out if a transcript isoform is a driver of cancer is not easy. Although long-read sequencing allows us to examine the diversity of all cell transcripts, it also tends to reveal many novel transcripts for which no biological function is known. Our first studies suggest that most of these transcripts are technical artifacts or the result of unfinished splicing. Understanding which transcripts are biologically relevant and which are only artifacts is an important scientific question that we address in my research group using our broad expertise in machine learning and omics data analysis.

    Targeting transcripts

    What types of drugs or therapies can be developed to target disrupted alternative splicing?

    There are currently two types of therapies for disrupted splicing. One class of drugs targets the protein-RNA complex called the spliceosome, while the other binds to pathogenic RNA molecules to modify their splicing. Spliceosome inhibitors are approved therapies for cancer patients who have mutated spliceosome genes. Splicing modifiers, in contrast, are antisense RNA and small molecule drugs that target splicing events in rare diseases, for example, the drug Risdiplam by Roche. It’s a small molecule that can activate the gene SMN2 by inducing the inclusion of an exon in its RNA, thereby restoring muscle motor function in Spinal Muscular Atrophy patients.

    The next frontier: Whole protein sequencing

    Which technologies will shape the future of our genetic understanding of cancer?

    We currently lack a thorough understanding of proteins and protein complexes in cancer. The problem with current proteomics methods is that they can only detect single short peptides from long protein sequences. I believe, therefore, that the next frontier will be whole protein sequencing. A promising technology in this field is nanopore sequencing. Uncovering entire protein sequences will be a game changer for target discovery, not only as a tool to validate alternative splicing identifications but also to assess protein expression and regulatory modifications of proteins.

    At the same time, machine learning and artificial intelligence will become fundamental technologies for future cancer treatment in hospitals. Clinicians are already using large language models to write structured diagnostic reports or automatically detect tumour cells in biopsy images. Artificial intelligence agents in hospitals are enabling the collection and integration of heterogeneous data from patients and will help clinicians identify the best therapy path for their patients. My team is involved in developing both machine learning and AI agents through collaborations and contributions to flagship grant proposals. We hope our work will improve the treatment journey of cancer patients in hospitals and contribute to fighting this devastating disease.

    Key facts

    Partners:

    University of Zürich and University Hospital Zürich, Functional Genomics Center Zürich, University Hospital Basel

    Financing:

    Krebsliga Zürich, EMDO Grant, SNF Practice to Science