Monday, 25 November 2019, 14:00–16:00

Room TD 9.02, Building 9, Université de Montpellier, Campus Triolet

Improving clinical omics applications using k-mer approaches

The past decade has seen a dramatic increase in the amount of sequencing data produced to dissect human disease and biology. However, the number of clinically actionable discoveries produced from these data is remarkably low. A pessimistic view would count only the 59 genes designated as medically actionable by the American College of Medical Genetics. In retrospect, this result is unsurprising: many genetic contributions to human disease arise from interactions among a large number of small variations, yet most studies are too small to have the statistical power to infer these interactions. The difficulty of assembling clinical studies with large cohorts is further compounded by major computational hurdles in exploiting them. These data are large, sensitive and heterogeneous. As a consequence, they cannot circulate freely between research labs, they are difficult to analyze with currently available software, and data from different studies are difficult to integrate.

In this seminar, I will show how we use k-mer based approaches to address several issues in the use and interpretation of health-related sequencing data. Specifically, I will show how we can tune simple machine learning techniques to explore sequencing data without requiring a reference genome, apply dimension reduction techniques that make it easy to integrate data from multiple sources, and implement indexing strategies that store more compact versions of sequencing data. Finally, I will present software that lets research groups easily exchange the specific parts of sequencing data that are relevant to disease. These exchanged data are highly compressed and, importantly, preserve patient anonymity.


William Ritchie heads the Intelligence artificielle et régulation génique (Artificial Intelligence and Gene Regulation) team at the Institute of Human Genetics (IGH). The lab looks for opportunities in medical and fundamental biology data where information theory and machine learning can make a substantial impact. Its focus has shifted from the analysis of non-coding RNAs, such as microRNAs and intron-retaining RNAs, to a more global analysis of patient omics data.