Department of Biomedical Informatics
Joel Saltz, MD, PhD, Chair
The National Library of Medicine defines biomedical informatics as the intersection of basic informational and computing sciences with an application domain in health care and biomedicine. The Department of Biomedical Informatics (BMI) is a collaboration of computer scientists, image analysis/computer vision specialists, systems biologists/bioinformaticians and clinical informaticians who apply their skills and interests to problems at the interface between the physical/computational sciences and the biological sciences. Departmental researchers apply distributed and parallel computing techniques to data querying, retrieval and integration, imaging, simulation, medical informatics and computational biology, and they develop middleware and optimizations to grid-enable projects in the biological, medical and physical sciences. Over the past year, BMI received 15 awards from federal sponsors with total funding exceeding $2.2 million, in addition to ongoing support of over $1 million.
Ongoing Research Programs:
- High End, Data Intensive and Grid Computing Research: Faculty and staff researchers in the multiscale computing area work to develop middleware technology and techniques to enable management, sharing and manipulation of data at multiple scales across heterogeneous, dynamic collections of storage and computation systems. Some application areas include:
  - Large-scale collaborative biomedical clinical studies
  - Analysis of gene expression and functional imaging information
  - Imaging, analysis and simulation of oil reservoirs and data-driven control of oil production
  - Analysis of satellite data
  - Analysis of multiresolution, multiple-grid simulation datasets
- Image Analysis/Computer Vision: The imaging research group focuses on in vivo imaging middleware, microscopic and radiologic image registration and analysis, computer vision, machine learning, medical imaging, generalized principal component analysis, geometric theories of computer vision, and symmetry-based recognition and matching.
- Systems Biology/Bioinformatics: The systems biology/bioinformatics group focuses on bioinformatic analysis of gene regulation involving chromatin, transcription factor interactions with DNA, promoter analysis and miRNA. Another line of investigation is the development of computational and evolutionary sciences in a comparative genomics context, including the development of novel phylogenetic methods to correlate genotypes and phenotypes, and to find diagnostic polymorphisms among organisms. This group has begun to study molecular changes associated with zoonoses and pandemics from a whole-genome perspective, with emphasis on coronaviruses (SARS) and influenza (avian and other influenza strains).
Research Accomplishments of 2006
- Grid Computing/Middleware Development (caGrid 1.0 Release): A national team of researchers (led by Scott Oster, MS; Shannon Hastings, MS; Steve Langella, MS; Tahsin Kurc, PhD; and Joel Saltz, MD, PhD) from the Multiscale Computing Laboratory in the BMI released the second distribution of caGrid 1.0, a suite of tools, resources and computer software that enables researchers around the world to tap into libraries of data and genetic information that, until now, have been largely inaccessible. The BMI group is the lead developer site for caGrid 1.0, an integral part of the cancer Biomedical Informatics Grid (caBIG) announced by the National Cancer Institute (NCI) in 2004. The caBIG network will allow scientists at cancer centers, medical centers and research laboratories worldwide to share information and analytic capabilities efficiently and securely. To accomplish this, the caBIG program has developed common applications, tools, data and analytical resources, information standards and grid software infrastructure that enable programs and databases at remote institutions to interact quickly. caGrid 1.0 is the unifying architecture and operating environment for systems and applications in caBIG. The caGrid 1.0 release contains new features such as: a tool for rapidly developing caBIG-compatible data and analytical grid services; tools for administering a security infrastructure; and a portal providing a dynamic view of services running on caGrid, along with information about research institutions and service providers participating in caBIG. Other collaborators on the caGrid 1.0 project include the National Cancer Institute Center for Bioinformatics, University of Chicago/Argonne National Laboratory, Duke Comprehensive Cancer Center, ScenPro Inc., SemanticBits, LLC, Science Applications International Corp. and Booz Allen Hamilton.
- In Vivo Imaging (IVI) Middleware Development (GridIMAGE): The IVI Middleware group, led by Joel Saltz, MD, PhD; Metin Gurcan, PhD; Tony Pan, MS; and Ashish Sharma, PhD, developed a Grid-aware image reviewing system (GridIMAGE) that allows practitioners to: select images from multiple geographically distributed DICOM servers; send those images to a group of human readers and computer-assisted detection (CAD) algorithms; and compare interpretations from human readers and CAD algorithms. GridIMAGE was developed on the National Cancer Institute caGrid infrastructure and is designed to help identify lung nodules on thoracic computed tomography. It enables researchers and clinicians to share datasets and CAD analytical resources, and it allows human readers to view and specify regions of interest. The infrastructure can support any type of distributed review. caGrid data and analytical services are used to link DICOM image databases and CAD systems, and to interact with human readers. Because the framework is service-oriented and distributed, GridIMAGE can be deployed within a single institution as an enterprise system (linking multiple DICOM servers and CAD algorithms) as well as in a Grid environment (linking the resources of collaborating research groups). It also enables cooperative imaging groups to perform image-interpretation tasks associated with research.
The current implementation of GridIMAGE carries out pulmonary nodule detection for thoracic computed tomography (CT) images. The practitioner queries a set of image databases and selects studies, series or images to be interpreted. For each image, the practitioner can specify one or more radiologists and the CAD algorithms to be used to identify candidate nodules. Pulmonary nodule candidates are identified as three-dimensional structures and are defined using parameters that specify a boundary and/or centroid. A graphical user interface (GUI) has been developed to display CAD findings on CT images as overlays. Using the GUI, the practitioner can: query, select and preview subsets of distributed lung CT databases to be included in a study; select CAD algorithms and human readers for the analysis; and visualize human annotations and markups along with those produced by CAD algorithms.
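The comparison of reader and CAD findings described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the GridIMAGE implementation: the `NoduleCandidate` class, the `match_candidates` function, the greedy pairing rule and the 5 mm tolerance are all hypothetical, and candidates are represented by their centroid alone.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class NoduleCandidate:
    """A nodule candidate reduced to its centroid (hypothetical structure)."""
    source: str          # reader ID or CAD algorithm name (illustrative labels)
    centroid_mm: tuple   # (x, y, z) position in millimetres


def match_candidates(reader_marks, cad_marks, tol_mm=5.0):
    """Greedily pair each reader mark with the nearest unused CAD mark.

    A pair counts as agreement when the centroids lie within tol_mm.
    Returns (matched pairs, reader-only marks, CAD-only marks).
    """
    unused_cad = list(cad_marks)
    matched, reader_only = [], []
    for r in reader_marks:
        # Nearest remaining CAD candidate, or None if none are left.
        best = min(unused_cad,
                   key=lambda c: dist(r.centroid_mm, c.centroid_mm),
                   default=None)
        if best is not None and dist(r.centroid_mm, best.centroid_mm) <= tol_mm:
            matched.append((r, best))
            unused_cad.remove(best)
        else:
            reader_only.append(r)
    return matched, reader_only, unused_cad
```

Marks that end up in the reader-only or CAD-only lists are the disagreements a practitioner would want to review as overlays; a production system would more likely compare boundaries as well as centroids.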
- Bioinformatics/Systems Biology (Adapting Google Earth to Track the Spread of Avian Influenza): Daniel Janies, PhD, and his group have designed an interactive map of the spread of the avian flu virus (H5N1) that, for the first time, incorporates information from viral genomes, geography and evolution to track the spread of the virus among various hosts. The map will help predict where the next outbreak is likely to occur. As part of the process, the group tested hypotheses about key strains, or genotypes, of the virus that appear to be heading west and have the ability to infect humans.
The team used software to build an evolutionary map of the virus’s mutations. They projected their evolutionary map onto the globe using Keyhole Markup Language (KML), the format supported by Google™ Earth. Similar to a legend found on a road map, the evolutionary map uses colors and symbols to indicate which types of hosts carry the virus and the distribution of dangerous genotypes that could infect humans. Designers used TimeSpan, a KML element supported by Google™ Earth, to animate the westward spread of the virus from Asia to Europe and Africa over the past decade. Clicking on a specific viral isolate or junction in the interactive map generates a window revealing diagnostic mutations in the virus that distinguish one strain from another. All data in the map are linked to other resources at the National Institutes of Health’s GenBank. This enables the comparison of findings about viruses in the real world with pre-existing laboratory hypotheses. With the evolutionary tree visualized on the globe, questions can be asked about the virus’s geography, hosts and mutations that enable transmission from birds to mammals.
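To make the KML and TimeSpan mechanism concrete, the following is a minimal sketch using only the Python standard library. It is not the group's software: the `isolates_to_kml` function and the dictionary keys are hypothetical, and the two example isolates are made-up placeholders rather than data from the study. Each isolate becomes a Placemark whose TimeSpan tells Google Earth from when to display it.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def isolates_to_kml(isolates):
    """Return a KML string with one time-stamped Placemark per isolate.

    Each isolate is a dict with the (hypothetical) keys:
    name, host, lon, lat, begin.
    """
    ET.register_namespace("", KML_NS)  # serialize with a default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for iso in isolates:
        pm = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(pm, f"{{{KML_NS}}}name").text = iso["name"]
        ET.SubElement(pm, f"{{{KML_NS}}}description").text = f"host: {iso['host']}"
        # TimeSpan drives the time slider; only a begin date is set here.
        span = ET.SubElement(pm, f"{{{KML_NS}}}TimeSpan")
        ET.SubElement(span, f"{{{KML_NS}}}begin").text = iso["begin"]
        point = ET.SubElement(pm, f"{{{KML_NS}}}Point")
        # KML coordinates are written as longitude,latitude,altitude.
        coords = f"{iso['lon']},{iso['lat']},0"
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = coords
    return ET.tostring(kml, encoding="unicode")


# Made-up placeholder isolates, for illustration only.
example = [
    {"name": "isolate-A", "host": "avian", "lon": 114.2, "lat": 22.3, "begin": "1997"},
    {"name": "isolate-B", "host": "avian", "lon": 32.9, "lat": 39.9, "begin": "2006"},
]
print(isolates_to_kml(example))
```

Saving the printed text to a `.kml` file and opening it in Google Earth should show the placemarks appearing in sequence as the time slider advances. Drawing the branches of an evolutionary tree would additionally require line geometry between ancestor and descendant positions, which this sketch omits.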
The team studied genomic sequence data from 351 isolates of the virus. They were especially interested in discovering whether certain hosts were carrying discrete forms of the virus and which viruses carried mutations enabling transmission to humans. The visualization was useful for generating hypotheses, which were subsequently tested by applying statistical tests to the evolutionary tree. This allowed the investigators to ask whether mutations were associated with particular hosts or geographic regions merely by chance, or whether the associations tracked adaptations of the virus to new hosts and new regions. Janies and colleagues found no genotypes involving mutations in hemagglutinin (HA) or neuraminidase (NA) that were significantly associated with any specific type of host. They did, however, find a strong association involving the Lysine-627 genotype in an internal protein, polymerase basic 2 (PB2), a genotype that is linked to increased replication and virulence of H5N1 in laboratory mice.
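The question of whether a mutation and a host type co-occur more often than chance would predict can be illustrated with a simple label-permutation test. This is a deliberately simplified stand-in, not the tree-based tests the group applied: it treats isolates as independent and ignores their shared ancestry, which tends to overstate significance. The `permutation_p_value` function and the toy data below are hypothetical.

```python
import random


def permutation_p_value(has_mutation, hosts, target_host, n_perm=10000, seed=0):
    """Estimate how often chance alone gives at least the observed overlap.

    has_mutation: list of bools, one per isolate
    hosts:        list of host labels, same order as has_mutation
    Returns (observed co-occurrence count, one-sided p-value estimate).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible

    def co_occurrence(host_labels):
        # Number of isolates that carry the mutation AND come from target_host.
        return sum(1 for m, h in zip(has_mutation, host_labels)
                   if m and h == target_host)

    observed = co_occurrence(hosts)
    shuffled = list(hosts)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any real link between mutation and host
        if co_occurrence(shuffled) >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)


# Toy data, invented for illustration: 5 mammalian and 15 avian isolates.
toy_mutation = [True] * 5 + [False] * 15
toy_hosts = ["mammal"] * 5 + ["bird"] * 15
print(permutation_p_value(toy_mutation, toy_hosts, "mammal"))
```

A small p-value says only that the observed overlap is unlikely under random relabeling. A tree-aware test would instead count independent changes along branches, so that a single ancestral mutation inherited by many related isolates is not counted many times.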
- Nucleosome Positions Predicted Through Comparative Genomics: DNA sequence has long been recognized as an important contributor to nucleosome positioning, which has the potential to regulate access to genes. The extent to which the nucleosomal architecture at promoters is delineated by the underlying sequence is being worked out. In collaboration with a group of investigators led by Dr. Franklin Pugh at Penn State, Ilya Ioshikhes, PhD, used comparative genomics to report a genome-wide map of nucleosome positioning sequences (NPSs) located in the vicinity of all Saccharomyces cerevisiae genes (Letters, Nature Genetics 38(10):1210-1215). The group found that the underlying DNA sequence provides a very good predictor of nucleosome locations that have been experimentally mapped to a small fraction of the genome. Notably, distinct classes of genes possess characteristic arrangements of NPSs that may be important for their regulation. In particular, genes that have a relatively compact NPS arrangement over the promoter region tend to have a TATA box buried in an NPS and tend to be highly regulated by chromatin modifying and remodeling factors. Ioshikhes conducted the computational correlation searches.
- High Performance Computing: Umit Catalyurek, PhD, extended his work in combinatorial algorithms and parallel computing. Combinatorial algorithms are an enabling technology for scientific computing, especially for large-scale problems and high-performance computing. Partitioning and load balancing are important issues in parallel scientific computing because they determine the performance and efficiency of large-scale parallel computing clusters. His work involves developing computational hypergraph models for applications with irregular data dependencies. Hypergraphs provide more general abstractions than graphs; hence, they are more flexible in modeling complex problems. Following Catalyurek’s early work, hypergraphs are used today for workload partitioning in parallel processing. The explosion of data in all scientific fields, especially biomedical areas, necessitates parallel computational systems with tens or hundreds of processors, whose efficiency is vastly improved by the types of models and algorithms developed by Catalyurek and collaborators.
This work laid the foundation for Catalyurek’s collaborative DOE SciDAC application, the CSCAPES (Combinatorial Scientific Computing and Petascale Simulations) Institute. Only four SciDAC institutes were funded nationwide. Led by Old Dominion University (ODU), the CSCAPES Institute is a collaboration among researchers at ODU, Sandia National Laboratories, Argonne National Laboratory, The Ohio State University and Colorado State University. The era of petascale computing is looming and has enormous potential for scientific simulation, but it also presents challenges. Petascale machines are likely to have hundreds of thousands of processors, complex memory hierarchies and relatively poor network performance. Scientific applications that will run on these machines will involve complex multiscale or multiphase physics, adaptive meshes and sophisticated numerical methods. Harnessing the potential of high-end computers to solve such complex problems is a challenge that the CSCAPES Institute is addressing. The Institute aims to develop and deploy fundamental technologies in high-performance computing. It will work with other SciDAC research groups to integrate software tools into other codes.
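As a small illustration of the hypergraph view of workload partitioning mentioned in this item, the sketch below scores a partition by one commonly used objective, the connectivity-minus-one cut, and reports per-part load. It is a toy evaluation routine under stated assumptions, not code from Catalyurek's tools: the function names, the six-task hypergraph and both partitions are hypothetical. Each net (hyperedge) groups the tasks that share one data item, and a net that touches several parts implies communication between them.

```python
def connectivity_cut(nets, part_of):
    """Sum over all nets of (number of parts the net touches - 1).

    nets:    list of nets, each a list of vertex IDs (tasks sharing a data item)
    part_of: dict mapping vertex ID -> part (processor) index
    """
    total = 0
    for pins in nets:
        parts_touched = {part_of[v] for v in pins}
        total += len(parts_touched) - 1  # 0 when the net stays inside one part
    return total


def part_loads(weights, part_of, k):
    """Total vertex weight assigned to each of the k parts."""
    loads = [0] * k
    for v, w in weights.items():
        loads[part_of[v]] += w
    return loads


# Hypothetical toy hypergraph: 6 unit-weight tasks, 4 nets.
nets = [[0, 1, 2], [2, 3], [3, 4, 5], [0, 5]]
weights = {v: 1 for v in range(6)}

clustered = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}    # keeps most nets together
interleaved = {0: 0, 1: 1, 2: 0, 3: 1, 4: 0, 5: 1}  # splits every net

print(connectivity_cut(nets, clustered), part_loads(weights, clustered, 2))      # 2 [3, 3]
print(connectivity_cut(nets, interleaved), part_loads(weights, interleaved, 2))  # 4 [3, 3]
```

Both partitions balance the load equally, but the clustered one cuts fewer nets. A partitioner searches for assignments that keep this cut low while holding the loads within a balance tolerance.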