Algorithms for causal learning and comparative analysis for genomic data

Machnik, Nick Noel

Algorithms for causal learning and comparative analysis for genomic data - Institute of Science and Technology Austria 2024

Thesis

Abstract
Acknowledgements
About the Author
List of Collaborators and Publications
Table of Contents
List of Figures
List of Tables
List of Algorithms
List of Abbreviations
1 Background
2 Causal inference for multiple risk factors and diseases from genomics data
3 Detection of age-specific genetic effects
4 Comparison of Hi-C experiments using structural similarity
5 Conclusion
Bibliography

This thesis consists of two pieces of work in the broader field of computational biology, both of which are methods for the analysis of large-scale biological data, implemented in efficient software. Chapter 2 introduces statistical software for causal discovery and inference from observed genetic marker and phenotypic trait data. We explore in simulation how well the method can fine-map genetic effects, find the correct causal structure among tens of traits and millions of genetic markers, and infer the causal effect size for the discovered causal relations. We then apply the method to 8 million markers and 17 traits from the UK Biobank and show that many relationships found with other methods are likely due to the effects of hidden confounders. Chapter 3 describes how this method can be applied to longitudinal data. I show how one can incorporate the background knowledge present in the known order of measurements to improve the accuracy of the causal discovery process, and explore the method’s ability to identify age-specific genetic effects and how the error rates of this recovery are influenced by missing data due to different censoring mechanisms. Chapter 4 introduces statistical software for the comparison of chromatin contact maps based on the structural similarity index. We explore the robustness of the method to noise and to size differences between the compared maps, show how it can measure evolutionary conservation of topological features by providing a similarity ranking of syntenic regions, and finally show how it can detect alterations in 3D genome structure due to genetic mutations in samples of medical relevance.
