Penalized Models for Cancer Prognosis and Treatment Response

Mojtaba Kanani Sarcheshmeh, MSc student – University of Calgary

Supervisor – Qingrun Zhang

Collaborator – Dr. Xingyi Guo and his group at Vanderbilt University Medical Center

Project Description: This project aims to develop a novel mathematical model, together
with a scalable implementation, for high-dimensional feature selection that
integrates prior domain knowledge. In particular, we will develop hierarchical
penalized regression techniques by extending Group Lasso [1], LassoNet [2], and
Graphical Lasso [3]. The application will be to identify key genomic signatures and
gene networks associated with cancer prognosis and treatment response. Group Lasso
selects variables in groups, which can identify pathways or co-regulated gene sets
linked to clinical outcomes. Graphical Lasso estimates a sparse inverse covariance
matrix, which can uncover co-expression networks and gene interactions. LassoNet, a
lasso-penalized neural network, selects predictive multi-gene signatures from
high-dimensional data. By integrating these methods, we aim to gain a
comprehensive view of the interacting terms that are critical to cancer
progression and treatment. Building on our existing publications [4–7], we
will focus on theoretical and algorithmic innovations that extend these methods to
the high-dimensional data settings common in cancer research. In collaboration with
other professors in Math/Stats and the Cumming School of Medicine at the
University of Calgary, we will also apply the methods to the large-scale cancer
omics data that we have access to through the TCGA and CPTAC consortia,
which are among the largest repositories of cancer data in the world [8–33].
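
As a minimal illustration of the group-wise selection that Group Lasso performs, the sketch below applies group soft-thresholding, the proximal step associated with the group lasso penalty, to a small set of coefficients. It is an illustrative sketch under stated assumptions, not the model proposed in this project: the gene names, pathway groupings, coefficient values, and penalty strength are all hypothetical, and the penalty uses the common sqrt(group size) weighting.

```python
import math


def group_soft_threshold(coefs, groups, lam):
    """Apply group soft-thresholding for the penalty
    lam * sum_g sqrt(p_g) * ||beta_g||_2, where p_g is the size of group g.

    coefs  : dict mapping feature name -> coefficient
    groups : dict mapping group name -> list of feature names (non-overlapping)
    lam    : penalty strength (>= 0)
    """
    shrunk = dict(coefs)
    for members in groups.values():
        # Euclidean norm of the coefficients belonging to this group.
        norm = math.sqrt(sum(coefs[f] ** 2 for f in members))
        threshold = lam * math.sqrt(len(members))
        # A group whose norm is below the threshold is zeroed as a whole;
        # otherwise every member is shrunk by the same factor.
        scale = max(0.0, 1.0 - threshold / norm) if norm > 0 else 0.0
        for f in members:
            shrunk[f] = scale * coefs[f]
    return shrunk


# Hypothetical coefficients for five genes in two illustrative pathways.
coefs = {"GENE_A": 0.9, "GENE_B": -1.2, "GENE_C": 0.05, "GENE_D": -0.03, "GENE_E": 0.02}
groups = {
    "pathway_1": ["GENE_A", "GENE_B"],
    "pathway_2": ["GENE_C", "GENE_D", "GENE_E"],
}

shrunk = group_soft_threshold(coefs, groups, lam=0.1)

# A pathway counts as selected if at least one of its genes keeps a nonzero coefficient.
selected = [g for g, members in groups.items() if any(shrunk[f] != 0.0 for f in members)]
print(selected)
```

With these illustrative numbers, the weakly supported group (pathway_2) is removed in its entirety, while the coefficients in pathway_1 are shrunk but retained. This all-or-nothing behaviour at the group level is what makes the penalty suited to selecting pathways rather than individual genes.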

References
[1] Ming Yuan and Yi Lin. “Model selection and estimation in regression with grouped variables”. In: Journal of
the Royal Statistical Society Series B: Statistical Methodology 68.1 (2006), pp. 49–67.
[2] Ismael Lemhadri et al. “Lassonet: A neural network with feature sparsity”. In: Journal of Machine Learning
Research 22.127 (2021), pp. 1–29.
[3] Jerome Friedman, Trevor Hastie, and Robert Tibshirani. “Sparse inverse covariance estimation with the
graphical lasso”. In: Biostatistics 9.3 (2008), pp. 432–441.
[4] Pathum Kossinna et al. “Stabilized COre gene and Pathway Election uncovers pan-cancer shared pathways
and a cancer-specific driver”. In: Science Advances 8.51 (2022), eabo2846. doi: 10.1126/sciadv.abo2846.
eprint: https://www.science.org/doi/pdf/10.1126/sciadv.abo2846. url:
https://www.science.org/doi/abs/10.1126/sciadv.abo2846.
[5] Jingni He, Qing Li, and Qingrun Zhang. “rvTWAS: identifying gene–trait association using sequences by
utilizing transcriptome-directed feature selection”. In: Genetics 226.2 (Nov. 2023), iyad204. issn: 1943-2631.
doi: 10.1093/genetics/iyad204. eprint:
https://academic.oup.com/genetics/article-pdf/226/2/iyad204/56724856/iyad204.pdf. url:
https://doi.org/10.1093/genetics/iyad204.
[6] Dinghao Wang et al. “cLD: Rare-variant linkage disequilibrium between genomic regions identifies novel
genomic interactions”. In: PLOS Genetics 19.12 (Dec. 2023), pp. 1–21. doi:
10.1371/journal.pgen.1011074. url: https://doi.org/10.1371/journal.pgen.1011074.
[7] Qing Li et al. “XA4C: eXplainable representation learning via Autoencoders revealing Critical genes”. In:
PLOS Computational Biology 19.10 (Oct. 2023), pp. 1–19. doi: 10.1371/journal.pcbi.1011476. url:
https://doi.org/10.1371/journal.pcbi.1011476.
[8] Katherine A Hoadley et al. “Cell-of-origin patterns dominate the molecular classification of 10,000 tumors from
33 types of cancer”. In: Cell 173.2 (2018), pp. 291–304.
[9] Tathiane M Malta et al. “Machine learning identifies stemness features associated with oncogenic
dedifferentiation”. In: Cell 173.2 (2018), pp. 338–354.
[10] Ashton C Berger et al. “A comprehensive pan-cancer molecular study of gynecologic and breast cancers”. In:
Cancer Cell 33.4 (2018), pp. 690–705.
[11] Angel Garcia-Diaz et al. “Interferon receptor signaling pathways regulating PD-L1 and PD-L2 expression”. In:
Cell Reports 19.6 (2017), pp. 1189–1201.
[12] Pornpimol Charoentong et al. “Pan-cancer immunogenomic analyses reveal genotype-immunophenotype
relationships and predictors of response to checkpoint blockade”. In: Cell Reports 18.1 (2017), pp. 248–262.
[13] Theo A Knijnenburg et al. “Genomic and molecular landscape of DNA damage repair deficiency across the
cancer genome atlas”. In: Cell Reports 23.1 (2018), pp. 239–254.
[14] Hui Zhang et al. “Integrated proteogenomic characterization of human high-grade serous ovarian cancer”. In:
Cell 166.3 (2016), pp. 755–765.
[15] Francisco Sanchez-Vega et al. “Oncogenic signaling pathways in the cancer genome atlas”. In: Cell 173.2 (2018),
pp. 321–337.
[16] Mahmoud Ghandi et al. “Next-generation characterization of the cancer cell line encyclopedia”. In: Nature
569.7757 (2019), pp. 503–508.
[17] Cancer Genome Atlas Research Network et al. “Comprehensive genomic characterization defines human
glioblastoma genes and core pathways”. In: Nature 455.7216 (2008), pp. 1061–1068.
[18] “Pan-cancer analysis of whole genomes”. In: Nature 578.7793 (2020), pp. 82–93.
[19] Cancer Genome Atlas Research Network et al. “Comprehensive molecular characterization of urothelial bladder
carcinoma”. In: Nature 507.7492 (2014), p. 315.
[20] Florent Petitprez et al. “B cells are associated with survival and immunotherapy response in sarcoma”. In:
Nature 577.7791 (2020), pp. 556–560.
[21] Beth A Helmink et al. “B cells and tertiary lymphoid structures promote immunotherapy response”. In: Nature
577.7791 (2020), pp. 549–555.
[22] Cyriac Kandoth et al. “Mutational landscape and significance across 12 major cancer types”. In: Nature
502.7471 (2013), pp. 333–339.
[23] Cancer Genome Atlas Research Network et al. “Integrated genomic analyses of ovarian carcinoma”. In: Nature
474.7353 (2011), p. 609.
[24] Jianfang Liu et al. “An integrated TCGA pan-cancer clinical data resource to drive high-quality survival
outcome analytics”. In: Cell 173.2 (2018), pp. 400–416.
[25] Matthew H Bailey et al. “Comprehensive characterization of cancer driver genes and mutations”. In: Cell 173.2
(2018), pp. 371–385.
[26] Bing Zhang et al. “Proteogenomic characterization of human colon and rectal cancer”. In: Nature 513.7518
(2014), pp. 382–387.
[27] Philipp Mertins et al. “Proteogenomics connects somatic mutations to signalling in breast cancer”. In: Nature
534.7605 (2016), pp. 55–62.
[28] Jeffrey W Tyner et al. “Functional genomic landscape of acute myeloid leukaemia”. In: Nature 562.7728 (2018),
pp. 526–531.
[29] Yang Liu et al. “Comparative molecular analysis of gastrointestinal adenocarcinomas”. In: Cancer Cell 33.4
(2018), pp. 721–735.
[30] Francesca Petralia et al. “Pan-cancer proteogenomics characterization of tumor immunity”. In: Cell (2024).
[31] Kuan-lin Huang et al. “Pathogenic germline variants in 10,389 adult cancers”. In: Cell 173.2 (2018), pp. 355–370.
[32] Rikke B Holmgaard et al. “Tumor-expressed IDO recruits and activates MDSCs in a Treg-dependent manner”.
In: Cell Reports 13.2 (2015), pp. 412–424.
[33] Liya Ding et al. “PARP inhibition elicits STING-dependent antitumor immunity in Brca1-deficient ovarian
cancer”. In: Cell Reports 25.11 (2018), pp. 2972–2980.
[34] Qingrun Zhang. Google Scholar. url:
https://scholar.google.co.uk/citations?user=H_iXVUEAAAAJ&hl=en.