Predictive and Causal Modeling of Health Risks and Health Outcomes from Clinical, Socio-demographic, and Other Data
(funded in part by grants from the National Science Foundation and the National Institutes of Health)
There is increasing recognition that environmental and contextual factors can significantly affect health outcomes in diseases such as cancer, obesity, diabetes, and heart disease. The advent of "big data" offers enormous potential for understanding and predicting health risks, intervention outcomes, and personalized treatments, and ultimately for improving population health through integrative analysis of heterogeneous, fine-grained, richly structured, longitudinal patient data. This project brings together an interdisciplinary team of researchers to understand the clinical, behavioral, biomedical, environmental, and contextual (e.g., socio-demographic) factors that contribute to increased risk of specific diseases, e.g., breast cancer, and to develop evidence-based practices for supplemental screening, as well as behavioral or clinical interventions to mitigate that risk. The project leverages the Penn State Digital Collaboratory for Precision Health Research, a secure infrastructure for integrating geo-coded electronic health record (EHR) data with other (e.g., socio-demographic) data in compliance with data access and use policies. The key methodological and informatics innovations in the project concern the development of novel algorithms and tools for predictive modeling of health risks and health outcomes by integrating clinical, biomedical, environmental, socio-demographic, and behavioral data. Work in progress is aimed at:
- Developing customizable, auditable, modular software workflows, compliant with data access and use policies, for integrating electronic health records (EHR) data with selected environmental, behavioral, biomedical, and socio-demographic data
- Applying these workflows to securely assemble and share data sets that can be used to address specific clinical, biomedical, or population health research questions
- Developing novel algorithms for predictive and causal modeling of health risks from the resulting "big data"
- Identifying the environmental, contextual, behavioral, health status, and health care factors that are reliable predictors of health risks, health outcomes, or effective interventions, through integrative analysis of clinical, environmental, behavioral, biomedical, and socio-demographic data
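
As a rough illustration of what an auditable, policy-compliant integration step from the first two aims might look like, the following minimal sketch links geo-coded EHR rows to tract-level socio-demographic data and releases only approved fields. It is not the project's actual workflow: the field names, the `ALLOWED_FIELDS` policy, the `AuditLog` class, and the toy records are all hypothetical and made up for this example.

```python
from dataclasses import dataclass, field

# Hypothetical policy: the only fields a given study is approved to receive.
ALLOWED_FIELDS = {"age", "diagnosis", "median_income", "pct_uninsured"}


@dataclass
class AuditLog:
    """Hypothetical in-memory record of what each workflow step did."""
    entries: list = field(default_factory=list)

    def record(self, step, detail):
        self.entries.append((step, detail))


def link_by_tract(ehr_rows, tract_rows, log):
    """Attach tract-level socio-demographic fields to each geo-coded EHR row."""
    by_tract = {row["tract"]: row for row in tract_rows}
    linked = []
    for row in ehr_rows:
        context = by_tract.get(row["tract"])
        if context is None:
            # Rows that cannot be linked are skipped, and the skip is logged.
            log.record("link", f"no tract data for patient {row['patient_id']}")
            continue
        linked.append({**row, **context})
    log.record("link", f"linked {len(linked)} of {len(ehr_rows)} rows")
    return linked


def apply_policy(rows, allowed, log):
    """Keep only approved fields, dropping identifiers and geocodes."""
    released = [{k: v for k, v in row.items() if k in allowed} for row in rows]
    log.record("policy", f"released fields: {sorted(allowed)}")
    return released


# Made-up toy records, for illustration only.
ehr = [
    {"patient_id": "p1", "tract": "T100", "age": 54, "diagnosis": "breast cancer"},
    {"patient_id": "p2", "tract": "T200", "age": 61, "diagnosis": "none"},
    {"patient_id": "p3", "tract": "T999", "age": 47, "diagnosis": "none"},
]
tracts = [
    {"tract": "T100", "median_income": 52000, "pct_uninsured": 8.5},
    {"tract": "T200", "median_income": 38000, "pct_uninsured": 14.2},
]

log = AuditLog()
dataset = apply_policy(link_by_tract(ehr, tracts, log), ALLOWED_FIELDS, log)
for step, detail in log.entries:
    print(step, "-", detail)
```

Keeping linkage and policy filtering as separate functions that both write to the same log is one way to make each step modular and auditable; a real deployment would also need access control, de-identification review, and persistent logging, none of which is shown here.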