
Comparative Analysis of Biomolecular Networks

(funded in part by a grant from the USDA and in part by a National Science Foundation IGERT fellowship)

Network models are playing an increasingly important role in the interpretation of complex interactions among genes, proteins, regulatory RNAs, small ligands, and other signaling agents. In particular, comparative analysis of network models of biomolecular interactions across different species or tissues has emerged as an important tool for identifying conserved modules, predicting the functions of specific genes or proteins, and studying the evolution of biological processes, among other applications. Hence, there is a growing need for scalable, modular, and extensible algorithms and software for construction, querying, and comparative analysis of diverse types of biomolecular networks.

Our work has led to the development of a suite of modular, scalable, and customizable graph kernel based algorithms, and their open source implementations (as part of BiNA), for aligning protein-protein interaction networks and gene co-expression networks (Towfic et al., 2009). Graph kernels allow the computation of a global alignment of networks to be broken down into a set of local graph kernel computations, which contributes to the scalability of the technique. Some results of this research include:

  • Application of comparative protein-protein interaction network analyses to reliably distinguish orthologs from paralogs (Towfic et al., 2010) and of comparative gene co-expression network analyses to identify B-cell ligand processing pathways (Towfic et al., 2012).
  • Characterization of gene expression changes during the onset of photosynthesis (Lonosky et al., 2004) and during the differentiation of retinal stem cells into rod photoreceptors (Hecker et al., 2010).
  • Characterization of differences in the proteomes of murine retinal and brain-derived progenitor cells (Dunn-Thomas et al., 2008).
  • Development of databases and software tools for capture, analysis, annotation, and integration of gene expression data with other types of ‘omics’ data (Couture et al., 2009).
  • Development of BioNetwork Bench, an open source, user-friendly suite of database and software tools for constructing, querying, and analyzing gene and protein network models. BioNetwork Bench currently supports a broad class of gene and protein network models (e.g., weighted and unweighted undirected graphs, and multi-graphs). BioNetwork Bench enables biologists to analyze public as well as private gene expression, macromolecular interaction, and annotation data; interactively query gene expression datasets; integrate data from multiple networks; query multiple networks for interactions of interest; and store and selectively share the data as well as the results of analyses. BioNetwork Bench is fully interoperable with Cytoscape, a popular open-source software suite for visualizing macromolecular interaction networks (Kohutyuk et al., 2012; Hecker et al., 2008).
  • Development and analysis of a machine learning algorithm for inference of temporal Boolean network models from multivariate time series data, with applications to inference of genetic networks from gene expression data (Silvescu and Honavar, 2001).
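
To make the idea of inferring a Boolean network from time series concrete, the following is a minimal sketch and not the algorithm of Silvescu and Honavar (2001). It assumes binarized expression data, considers only dependencies of a gene's state at time t+1 on other genes' states at time t, and uses a brute-force search for the smallest input set that is consistent with the data. The function names, the `max_inputs` parameter, and the toy three-gene rule are all hypothetical.

```python
from itertools import combinations, product


def consistent_table(trajectories, inputs, target):
    """Return a truth table mapping the inputs' states at time t to the
    target's state at time t+1, or None if the data contradict every
    such mapping (same input state, different target outcomes)."""
    table = {}
    for series in trajectories:
        for now, nxt in zip(series, series[1:]):
            key = tuple(now[i] for i in inputs)
            if table.setdefault(key, nxt[target]) != nxt[target]:
                return None
    return table


def smallest_input_set(trajectories, target, n_genes, max_inputs):
    """Try input sets of increasing size; return the first consistent one."""
    for k in range(max_inputs + 1):
        for inputs in combinations(range(n_genes), k):
            table = consistent_table(trajectories, inputs, target)
            if table is not None:
                return inputs, table
    return None  # no consistent rule with at most max_inputs inputs


def infer_boolean_network(trajectories, max_inputs=2):
    """Infer, for every gene, a smallest consistent input set and its rule."""
    n_genes = len(trajectories[0][0])
    return {
        target: smallest_input_set(trajectories, target, n_genes, max_inputs)
        for target in range(n_genes)
    }


# Hypothetical toy system: x0' = x1, x1' = x0 AND x2, x2' = NOT x2.
def step(state):
    x0, x1, x2 = state
    return (x1, x0 & x2, 1 - x2)


# One short two-point time series per possible starting state.
trajectories = [[s, step(s)] for s in product((0, 1), repeat=3)]
network = infer_boolean_network(trajectories)
for gene, (inputs, table) in network.items():
    print(f"gene {gene}: inputs {inputs}, rule {table}")
```

The search is exponential in `max_inputs`, so a sketch like this is only practical for small input sets; with sparse data, several input sets may be equally consistent and the first one found is returned.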
Work in progress is aimed at:
  • Further development of BiNA, our modular, extensible, and scalable suite of graph kernel based algorithms, to enable alignment of richer networks, including (a) undirected graphs that contain multiple types of links (e.g., interaction, co-localization, etc. in the case of protein-protein interaction networks), or multiple types of nodes (e.g., in the case of macromolecular interaction networks that simultaneously model the interactions among proteins, RNA, DNA, etc.), or both; (b) directed graphs with one or more types of links (e.g., up- or down-regulation of one gene by another in the case of transcriptional networks), or one or more types of nodes, or both, as in the case of richly annotated signaling networks and metabolic networks; (c) the weighted counterparts of undirected graphs (e.g., gene expression correlation networks) as well as of directed graphs; and (d) undirected or directed multi-graphs with multiple links between nodes, as well as variants that accommodate sets of labels on nodes (e.g., Gene Ontology functional annotation, subcellular localization, etc.) and links, together with their weighted counterparts.
  • Systematic evaluation of the graph kernel based network alignment algorithms (including comparisons with existing algorithms) on several benchmark datasets, across representative applications including: identifying differences in patterns of biomolecular interactions across different species or tissues; identifying conserved modules or subnetworks; predicting the functions of specific genes or proteins (and identifying functional orthologs); and compensating for limited experimental data concerning biomolecular interactions within one species through transfer of information from another species.
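
The decomposition of a global alignment into local kernel computations, described above, can be illustrated with a small sketch. This is not BiNA's implementation or API: the feature map (counts of distance and degree pairs in each node's k-hop neighborhood), the cosine-normalized linear kernel, the greedy one-to-one matching, and the two toy networks are all assumptions chosen for brevity.

```python
from collections import Counter, deque
from math import sqrt


def neighborhood_features(graph, root, hops=2):
    """Count (distance from root, node degree) pairs for all nodes
    within `hops` edges of `root`, found by breadth-first search."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if dist[node] == hops:
            continue  # do not expand beyond the neighborhood radius
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return Counter((d, len(graph[n])) for n, d in dist.items())


def kernel(f, g):
    """Cosine-normalized dot product of two feature count vectors."""
    dot = sum(count * g[key] for key, count in f.items())
    norm = sqrt(sum(c * c for c in f.values()) * sum(c * c for c in g.values()))
    return dot / norm if norm else 0.0


def greedy_align(g1, g2, hops=2):
    """Score every cross-network node pair with the local kernel, then
    match pairs greedily in order of decreasing score (one-to-one)."""
    f1 = {u: neighborhood_features(g1, u, hops) for u in g1}
    f2 = {v: neighborhood_features(g2, v, hops) for v in g2}
    scored = [(kernel(f1[u], f2[v]), u, v) for u in g1 for v in g2]
    scored.sort(key=lambda item: item[0], reverse=True)
    alignment, used = {}, set()
    for score, u, v in scored:
        if u not in alignment and v not in used:
            alignment[u] = v
            used.add(v)
    return alignment


# Two hypothetical, structurally identical toy networks (adjacency sets).
net_a = {"h": {"x", "y", "t"}, "x": {"h"}, "y": {"h"}, "t": {"h", "e"}, "e": {"t"}}
net_b = {"H": {"X", "Y", "T"}, "X": {"H"}, "Y": {"H"}, "T": {"H", "E"}, "E": {"T"}}

alignment = greedy_align(net_a, net_b)
print(alignment)
```

Each node pair is scored independently from its own neighborhood, so the scoring step parallelizes naturally. Richer networks (typed links, node labels, weights) would need a feature map that also encodes those attributes.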