Splice it up

Abdullah Kahraman is harnessing long-read sequencing to identify new biomarkers and drug targets

Sequential improvements

Prof. Dr. Kahraman, your team at the FHNW School of Life Sciences are analysing patterns in gene sequences to understand and predict disease. Tell us how gene sequencing has evolved.

The first generation of DNA sequencing began in 1977 when Fred Sanger published his method of sequencing fragmented DNA molecules. Later, in 1987, it became possible to automate Sanger’s sequencing method, speeding up the process to a scale where the sequencing of the entire human genome could be initiated. Before the first draft of the human genome was announced in 2000, a second generation of DNA sequencers was developed that could sequence millions of tiny DNA fragments in a massively parallel fashion. Now, the third generation of DNA sequencing machines can sequence single long DNA fragments that, in combination with single-cell and location information, can give us an unprecedented view of how DNA and RNA molecules underlie human biology and diseases.

Alternative splices reveal biomarkers of disease

What types of information can be found using long-read sequencing?

When RNA is transcribed from DNA, non-coding regions in the RNA sequence are cleaved off (spliced out). The remaining coding regions can be combined in different ways to form different messenger RNAs. This cellular process, called alternative splicing, is the reason why the small number of genes in our DNA can generate all the different proteins that cells in our body need.

In cancer cells, alternative splicing is often broken. As a result, proteins are produced that promote the survival and growth of tumour cells. With second-generation sequencing, accurately identifying the complete sequence of long transcripts has always been challenging. However, long-read sequencing now enables the determination of the entire sequence of individual transcripts, providing us with an unprecedented opportunity to explore the full diversity of RNA molecules in normal and cancer cells.

In collaboration with the Functional Genomics Center in Zurich, my team uses the newest sequencing technologies to study tumour development and therapy resistance in cancer patients. We develop software, machine learning models and databases, and integrate the novel datasets with mutational, structural and protein expression data. Our goal is to identify patterns that can predict disease progression and drug response, aiming to detect cancers early enough that patients can be treated before their tumours spread into metastases.

Abdullah Kahraman uses machine learning and AI, aiming to detect cancer before tumours spread into metastases.

Ribonucleic acid (RNA) contains coding and non-coding regions. Coding regions go on to become messenger RNA (mRNA), which is responsible for transmitting the genetic information required to produce proteins.

Form and function

What are some challenges in analysing RNA sequences?

Figuring out if a transcript isoform is a driver of cancer is not easy. Although long-read sequencing allows us to examine the diversity of all cell transcripts, it also tends to reveal many novel transcripts for which no biological function is known. Our first studies suggest that most of these transcripts are technical artifacts or the result of unfinished splicing. Understanding which transcripts are biologically relevant and which are only artifacts is an important scientific question that we address in my research group using our broad expertise in machine learning and omics data analysis.

Targeting transcripts

What types of drugs or therapies can be developed to target disrupted alternative splicing?

There are currently two types of therapies for disrupted splicing. One class of drugs targets the protein-RNA complex called the spliceosome, while the other binds to pathogenic RNA molecules to modify their splicing. Spliceosome inhibitors are approved therapies for cancer patients who have mutated spliceosome genes. Splicing modifiers, in contrast, are antisense RNA and small molecule drugs that target splicing events in rare diseases, for example, the drug Risdiplam by Roche. It’s a small molecule that can activate the gene SMN2 by inducing the inclusion of an exon in its RNA, thereby restoring muscle motor function in spinal muscular atrophy patients.

The next frontier: Whole protein sequencing

Which technologies will shape the future of our genetic understanding of cancer?

We currently lack a thorough understanding of proteins and protein complexes in cancer. The problem with current proteomics methods is that they can only detect single short peptides from long protein sequences. I believe, therefore, that the next frontier will be whole protein sequencing. A promising technology in this field is nanopore sequencing. Uncovering entire protein sequences will be a game changer for target discovery, not only as a tool to validate alternative splicing identifications but also to assess protein expression and regulatory modifications of proteins.

At the same time, machine learning and artificial intelligence will become fundamental technologies for future cancer treatment in hospitals. Clinicians are already using large language models to write structured diagnostic reports and AI models to automatically detect tumour cells in biopsy images. Artificial intelligence agents in hospitals are enabling the collection and integration of heterogeneous data from patients and will help clinicians identify the best therapy path for their patients. My team is involved in developing both machine learning models and AI agents through collaborations and contributions to flagship grant proposals. We hope our work will improve the treatment journey of cancer patients in hospitals and contribute to fighting this devastating disease.

Key facts
Partners:	University of Zürich and University Hospital Zürich, Functional Genomics Center Zürich, University Hospital Basel
Financing:	Krebsliga Zürich, EMDO Grant, SNF Practice to Science