CNARI

Computational Neuroscience Applications Research Infrastructure

Human Neuroscience Laboratory
Department of Neurology
Department of Psychology
Computation Institute

The University of Chicago

Summary

Bioinformatics Infrastructure for Large Scale Studies of Aphasia Recovery: Large prospective studies of aphasia recovery that incorporate anatomical, physiological, and behavioral data are virtually non-existent. This has a significant impact on virtually all research into the diagnosis, prognosis, and treatment of aphasia, since we do not know the natural course of the disease, and thus cannot adequately inform patients and families or assess the effects of therapeutic interventions. We believe that the complexities of data management, particularly regarding anatomical and physiological data, represent a major stumbling block to the design and execution of such studies. With such diverse sources of information as demographic and medical data, cognitive and linguistic test results, electrophysiological recordings, and many types of brain images, it is hard enough to perform single case studies that attempt to relate these data to each other, let alone studies that include statistically meaningful numbers of participants. Even when the problem is restricted to a single data type, such as functional MRI data, we do not have the ability to scale up the methods used in individual subjects to larger groups. Both the large volume of data and the complexity of data processing cause difficulties. We thus propose to build computational infrastructure (R21 phase) to facilitate the prospective investigation of aphasia recovery (R33 phase). The infrastructure is based on the use of (a) database technology to represent diverse data types within a single representational framework; and (b) "grid" computing to distribute data and data processing over many storage devices and computers, using software developed in federally (NSF) funded basic computational research that allows investigators to express complex data processing algorithms in a convenient manner. 
The longitudinal aphasia study will use structural and functional MRI and diffusion tensor imaging, along with language and cognitive measures, to characterize the natural course of physiological and behavioral recovery from aphasia. The physiology of recovery will be quantified in neural network models of individual patient imaging data and their mathematical "fit" to normative templates derived from imaging data on healthy age-matched adults. The changes in these models over time will be related to the behavioral changes to construct a theory of recovery. The computational infrastructure will provide the means to encode the diverse types of data needed for aphasia recovery research in such a way that complex queries involving multiple data types (e.g., brain activation and language performance) can be posed easily, and that queries requiring significant computer processing (e.g., peak detection in imaging time series) can be answered quickly thanks to grid computing. Finally, this infrastructure and data will be shared, so that a user of the system from virtually anywhere could pose such questions using the relational database query interface.
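
As a rough illustration of the kind of query that spans multiple data types, the sketch below joins brain activation values with language performance scores in a small relational database. It is not the CNARI schema: the table names (activation, language_score), the column names, the region labels, and every data value are hypothetical and made up for the example, and Python's built-in sqlite3 module stands in for whatever database system the project actually uses.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical two-table schema.

    All table names, column names, and values are invented for illustration;
    they are not taken from the CNARI database.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE activation (
            participant_id TEXT,
            session        INTEGER,
            region         TEXT,
            peak_signal    REAL
        );
        CREATE TABLE language_score (
            participant_id TEXT,
            session        INTEGER,
            naming_score   REAL
        );
        """
    )
    conn.executemany(
        "INSERT INTO activation VALUES (?, ?, ?, ?)",
        [
            ("P01", 1, "left_IFG", 0.4),
            ("P01", 2, "left_IFG", 1.2),
            ("P02", 1, "left_IFG", 0.9),
            ("P02", 1, "right_IFG", 1.5),
        ],
    )
    conn.executemany(
        "INSERT INTO language_score VALUES (?, ?, ?)",
        [
            ("P01", 1, 35.0),
            ("P01", 2, 52.0),
            ("P02", 1, 41.0),
        ],
    )
    return conn


def sessions_with_activation(conn, region, min_signal):
    """Return (participant, session, peak signal, naming score) rows for
    sessions whose activation in `region` is at least `min_signal`."""
    query = """
        SELECT a.participant_id, a.session, a.peak_signal, s.naming_score
        FROM activation AS a
        JOIN language_score AS s
          ON s.participant_id = a.participant_id
         AND s.session = a.session
        WHERE a.region = ? AND a.peak_signal >= ?
        ORDER BY a.participant_id, a.session
    """
    return conn.execute(query, (region, min_signal)).fetchall()


conn = build_demo_db()
rows = sessions_with_activation(conn, "left_IFG", 0.8)
for row in rows:
    print(row)
```

The point of the design is that the imaging-derived measure and the behavioral measure share the same participant and session keys, so a single join relates them without any file-by-file bookkeeping.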

Funding

NATIONAL INSTITUTE ON DEAFNESS AND OTHER COMMUNICATION DISORDERS
Grant Number: 1R21DC008638-01
Project Title: Bioinformatics Infrastructure for Large Scale Studies of Aphasia Recovery

Additional Support
The Swift System underlying the CNARI infrastructure is supported by the NATIONAL SCIENCE FOUNDATION under Grant OCI-0721939.
Applications of CNARI to motor stroke recovery and advanced computational techniques are supported by the JAMES S. MCDONNELL FOUNDATION.

Members

Uri Hasson
Mike Wilde

Tibi Stef-Praun
Sarah Kenny

Steve Small
Ian Foster

Links

Email to all CNARI members and associates: cnari@ci.uchicago.edu

Participate in the CNARI Wiki

R21 Final Report 23 June 2008

Publications

Accelerating Medical Research using the Swift Workflow System

Stef-Praun, T., Clifford, B., Foster, I., Hasson, U., Hategan, M., Small, S. L., Wilde, M., & Zhao, Y. (2007). Accelerating Medical Research using the Swift Workflow System. Studies in Health Technology and Informatics, 126, 207-216. (download full paper pdf)

Abstract. Both medical research and clinical practice are starting to involve large quantities of data and to require large-scale computation, as a result of the digitization of many areas of medicine. For example, in brain research -- the domain that we consider here -- a single research study may require the repeated processing, using computationally demanding and complex applications, of thousands of files corresponding to hundreds of functional MRI studies. Execution efficiency demands the use of parallel or distributed computing, but few medical researchers have the time or expertise to write the necessary parallel programs.
The Swift system addresses these concerns. A simple scripting language, SwiftScript, provides for the concise high-level specification of workflows that invoke various application programs on potentially large quantities of data. The Swift engine provides for the efficient execution of these workflows on sequential computers, parallel computers, and/or distributed grids that federate the computing resources of many sites. Last but not least, the Swift provenance catalog keeps track of all actions performed, addressing vital bookkeeping functions that so often cause difficulties in large computations.
To illustrate the use of Swift for medical research, we describe its use for the analysis of functional MRI data as part of a research project examining the neurological mechanisms of recovery from aphasia after stroke. We show how SwiftScript is used to encode an application workflow, and present performance results that demonstrate our ability to achieve significant speedups on both a local parallel computing cluster and multiple parallel clusters at distributed sites.

Improving the Analysis, Storage and Sharing of Neuroimaging Data using Relational Databases and Distributed Computing

Hasson, U., Skipper, J. I., Wilde, M. J., Nusbaum, H. C., & Small, S. L. (2008). Improving the analysis, storage and sharing of neuroimaging data using relational databases and distributed computing. NeuroImage, 39(2), 693-706. (download full paper pdf)

Abstract. The increasingly complex research questions addressed by imaging research impose substantial demands on computational infrastructures. These infrastructures need to support management of massive amounts of data in a way that affords rapid and precise data analysis, to allow collaborative research, and to achieve these aims securely and with minimum management overhead. Here we present an approach that overcomes many current limitations in data analysis and data sharing. This approach is based on open source database management systems that support complex data queries as an integral part of data analysis, flexible data sharing, and parallel and distributed data processing using cluster computing and Grid computing resources. We assess the strengths of these approaches as compared to current frameworks based on flat file storage. We then describe in detail the implementation of such a system at the University of Chicago, and provide a concrete description of how it was used to enable a complex analysis of fMRI time series data.

Analyzing neuroimaging data in Grid environments using relational databases and a dedicated workflow language

Hasson, U., Andric, M., Kenny, S., Wilde, M. J., & Small, S. L. (2008). Analyzing neuroimaging data in Grid environments using relational databases and a dedicated workflow language. To be presented at the First INCF Congress of NeuroInformatics, Stockholm, Sweden, September 2008.

Short Abstract: Neuroimaging research imposes substantial demands on computational infrastructures. Already, these infrastructures need to support management of massive amounts of data while at the same time affording rapid analysis, access to highly specific subsets of data, and secure remote access for collaborators. We have recently described an architecture that is used in practice to achieve these goals, which relies on distributed database management systems. Here we present two recently introduced and central components of this system: (a) an extension of this architecture that utilizes the SWIFT workflow system to enable analysis of DBMS-stored neuroimaging data on GRID sites (e.g., TeraGrid), and (b) an interface between this system and commonly used neuroimaging GUIs (AFNI, SUMA) that makes it possible to immediately graphically depict the results of database queries. (download full abstract pdf)


small@uchicago.edu