Large-scale Proteomics for Disease Prediction, Health Evaluation, and Personalized Medicine
Files
phd_thesis_thjodbjorg_20251130.pdf (21.54 MB)
Date
Authors
Publisher
University of Iceland, School of Engineering and Natural Sciences, Faculty of Electrical and Computer Engineering
Abstract
Biomarkers derived from plasma proteomics enable personalized medicine by guiding prevention and treatment, selecting clinical trial participants, and evaluating therapeutic response. Plasma protein levels are influenced by both genetic and environmental factors and reflect current health, whereas genotypes are stable. Proteins are therefore an important source of biomarkers of current health that can be used to monitor disease progression and regression. With advances in high-throughput technologies, such as the Olink and SomaScan platforms, it is now possible to measure thousands of proteins in blood across tens of thousands of individuals.
Here, large-scale proteomics data are integrated with machine learning to develop protein-based biomarkers that capture individual variation in health and disease risk. Using large plasma proteomics datasets of ∼5,000 (SomaScan) or ∼3,000 (Olink) plasma protein levels, we identified protein-disease associations and derived protein risk scores (ProtRSs) for more than a hundred diseases. For many diseases, the ProtRSs considerably improved on baseline prediction, while for others the improvement was negligible. To disentangle genetic and environmental contributions, we analysed genotype-adjusted plasma protein levels. Generally, this adjustment strengthened the association with disease phenotypes, suggesting that changes in plasma protein levels are usually consequences of disease rather than causes.
ProtRSs for death, atherosclerotic cardiovascular disease (ASCVD) events, and coronary artery disease (CAD) were developed further and robustly tested against established baselines. Non-linear models and feature selection models were tested, but Lasso-penalized linear models were generally among the best-performing models. The mortality risk score outperformed predictors based on conventional mortality risk factors and correlated with measures of frailty in an independent dataset. In ASCVD and CAD prediction, ProtRSs significantly, albeit modestly, improved established models in independent datasets. For CAD prediction, a polygenic risk score (PRS) for CAD also improved upon established risk models, with the best performance achieved when the ProtRS and PRS were combined, while a metabolite risk score did not add further benefit.
In addition, we used plasma protein levels to determine organ age, i.e., the biological age of organs, and compared these with chronological age to calculate organ age gaps. Positive organ age gaps were associated with multiple diseases, and negative age gaps with good health, though the degree of organ specificity varied across organs. We further separated organ-specific ageing from ageing shared across all organs, generating organ-specific age gaps that showed higher organ specificity and a shared age gap that was a stronger predictor of mortality than any individual organ age gap. This organ age approach could help with understanding age- and disease-related changes in organs, but it currently has limitations that make direct assumptions about changes with age difficult.
On the whole, this work gives insight into the uses and limitations of plasma proteomics for clinical applications and medical research in general.
Description
Keywords
Ageing, Personalized medicine, Proteomics, Machine learning