The Brain-Image Database Project:
Morphometrics Research




Goals

Given medical images (structure, S) and corresponding clinical (function, F) variables:

Methods



Generate RAVENS maps, and collect clinical data (function variables), for each subject
Discretize images to reflect dilation or contraction at each voxel; convert function variables to categorical variables
Examine voxels for associations with the function variable(s) across subjects
Cluster voxels based on similar structure-function conditional-probability distributions


Our approach to morphological analysis is based on a Bayesian-network representation of multivariate morphology-function associations. Preprocessing generates registered maps; these maps indicate voxelwise volumetric changes relative to a (registration) standard. Since algorithms that generate Bayesian networks from a database require that either all variables be Gaussian or that they all be discrete, and given concerns in the literature about the distribution of residual spatial variability after high-dimensional warping, we discretize the clinical and voxel data, and obtain categorical maps and categorical clinical information. In the implementation described here, we discretized voxels into one of two classes: dilated and contracted. We could, however, discretize into three or more classes, for example, including unchanged as a third class; our approach does not require that categorical variables be binary. In addition, although we consider here the presence of one clinical variable for the sake of clarity, our approach has no inherent limitation on the number of clinical or voxel variables.
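The discretization step can be illustrated with a minimal sketch. The function names, the reference value, and the tolerance band below are hypothetical; the rule that values above the reference count as dilation is an assumption, since the text does not state how the cut point is chosen.

```python
from typing import List, Optional, Sequence


def discretize_voxel(value: float, reference: float,
                     tolerance: Optional[float] = None) -> str:
    """Map one voxel value to a category relative to a reference value.

    Assumption: values above the reference denote dilation and values
    below it denote contraction; the source does not state the rule.
    """
    if tolerance is None:
        # Two-class scheme: every voxel is either dilated or contracted.
        return "dilated" if value >= reference else "contracted"
    # Three-class scheme: values inside the tolerance band are unchanged.
    if value > reference + tolerance:
        return "dilated"
    if value < reference - tolerance:
        return "contracted"
    return "unchanged"


def discretize_map(values: Sequence[float], reference: float,
                   tolerance: Optional[float] = None) -> List[str]:
    """Discretize every voxel of one subject's map."""
    return [discretize_voxel(v, reference, tolerance) for v in values]


# Toy map with three voxels (made-up values, for illustration only).
subject_map = [1.20, 0.80, 1.05]
print(discretize_map(subject_map, reference=1.0))
print(discretize_map(subject_map, reference=1.0, tolerance=0.1))
```

Passing a tolerance switches from the two-class scheme to the three-class scheme that includes unchanged.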

Given the categorical information described above, consisting of voxelwise morphological changes and corresponding clinical variables of interest, we can employ any of several methods for generating a Bayesian network from a database. Given the vast number of variables potentially involved, we designed a heuristic algorithm for detecting voxels v for which a particular function (clinical) variable f has the same conditional-probability distribution given v. We first examine each voxel variable for a possible association with a particular clinical variable (e.g., presence or absence of schizophrenia); we call the voxel variable with the strongest association with the clinical variable the leader variable, and add this association to the Bayesian network. We then search for all voxel variables that are associated with the clinical variable, yet are rendered conditionally independent of the clinical variable given the leader variable; this step groups voxels that potentially induce similar conditional-probability distributions for f when f is conditioned on any one of these voxels.
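One pass of the leader search described above might look like the following sketch. It is not the project's implementation: the text does not say which association measure is used, so the Pearson chi-square statistic, the stratified conditional test, the default thresholds, and all function names are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple


def chi_square(x: Sequence, y: Sequence) -> float:
    """Pearson chi-square statistic for two categorical sequences."""
    n = len(x)
    if n == 0:
        return 0.0
    joint = Counter(zip(x, y))
    count_x, count_y = Counter(x), Counter(y)
    stat = 0.0
    for a in count_x:
        for b in count_y:
            expected = count_x[a] * count_y[b] / n
            stat += (joint[(a, b)] - expected) ** 2 / expected
    return stat


def conditional_chi_square(x: Sequence, y: Sequence, z: Sequence) -> float:
    """Sum of chi-square statistics of x versus y within each level of z."""
    total = 0.0
    for level in set(z):
        idx = [i for i, zi in enumerate(z) if zi == level]
        total += chi_square([x[i] for i in idx], [y[i] for i in idx])
    return total


def find_leader_group(
    voxels: Dict[str, Sequence],
    clinical: Sequence,
    assoc_threshold: float = 3.84,  # assumed: 0.05 critical value, 1 df
    indep_threshold: float = 5.99,  # assumed: 0.05 critical value, 2 df
) -> Tuple[Optional[str], List[str]]:
    """Return the leader voxel and the voxels screened off by it."""
    scores = {name: chi_square(v, clinical) for name, v in voxels.items()}
    associated = [n for n in voxels if scores[n] > assoc_threshold]
    if not associated:
        return None, []
    leader = max(associated, key=scores.get)
    group = [
        name for name in associated
        if name != leader
        and conditional_chi_square(voxels[name], clinical, voxels[leader])
        <= indep_threshold
    ]
    return leader, group


# Toy data: 20 subjects, binary clinical variable, three binary voxels.
f = [1] * 10 + [0] * 10
toy_voxels = {
    "A": list(f),            # perfectly associated with f
    "B": [0] + f[1:],        # same as A except for one subject
    "C": [1, 0] * 10,        # unrelated to f
}
print(find_leader_group(toy_voxels, f))
```

The default thresholds assume binary variables; with more categories the degrees of freedom, and therefore the critical values, would change.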

Here is a more detailed description of the algorithm:


We then apply one of two methods for determining whether these conditional-probability distributions are, in fact, similar; we call the first method Bayesian thresholding (BT), and the second method Bayesian clustering (BC). BT applies a threshold difference to a candidate distribution; if that distribution is within that difference of the conditional distribution of the leader variable given the clinical variable, then the distributions are considered equivalent, and the voxels are clustered. In contrast to BT, BC does not require a threshold; this approach is based on a method for latent-variable induction for Bayesian networks. The induced variable is a class indicator, and effectively labels voxels as being similar to the leader variable or not. A third approach, which we have not explored yet, consists of applying a multivariate clustering algorithm to the conditional-probability distributions, the goal being to find distributions that are similar to each other, which would indicate that the corresponding voxels have similar associations to the clinical variable. A fourth approach, promising theoretically but not implemented, is based on modeling the metadistribution over conditional-probability distributions as a Dirichlet distribution, as was done for derivation of the K2 algorithm.
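The thresholding idea behind BT can be sketched as follows. This is an illustration under stated assumptions, not the project's code: it compares estimates of P(f | v) for a candidate voxel and the leader voxel, uses the maximum absolute difference as the distance, and smooths the estimates with a pseudo-count alpha. The function names, the distance measure, and alpha are all hypothetical choices.

```python
from typing import Dict, Hashable, Sequence

Distribution = Dict[Hashable, Dict[Hashable, float]]


def conditional_distribution(
    voxel: Sequence,
    clinical: Sequence,
    voxel_states: Sequence,
    clinical_states: Sequence,
    alpha: float = 1.0,  # assumed pseudo-count; must be > 0
) -> Distribution:
    """Estimate P(f | v) as {voxel state: {clinical state: probability}}."""
    table: Distribution = {}
    for vs in voxel_states:
        counts = {fs: alpha for fs in clinical_states}
        for v, f in zip(voxel, clinical):
            if v == vs:
                counts[f] += 1
        total = sum(counts.values())
        table[vs] = {fs: c / total for fs, c in counts.items()}
    return table


def within_threshold(candidate: Distribution, leader: Distribution,
                     threshold: float) -> bool:
    """True if every probability differs from the leader's by <= threshold."""
    return all(
        abs(candidate[vs][fs] - leader[vs][fs]) <= threshold
        for vs in leader
        for fs in leader[vs]
    )


# Toy data: 20 subjects, binary clinical variable and binary voxels.
f = [1] * 10 + [0] * 10
leader_voxel = list(f)
candidate_voxel = [0] + f[1:]   # differs from the leader in one subject
unrelated_voxel = [1, 0] * 10

states = (0, 1)
p_leader = conditional_distribution(leader_voxel, f, states, states)
p_candidate = conditional_distribution(candidate_voxel, f, states, states)
p_unrelated = conditional_distribution(unrelated_voxel, f, states, states)

print(within_threshold(p_candidate, p_leader, threshold=0.1))
print(within_threshold(p_unrelated, p_leader, threshold=0.1))
```

The outcome depends directly on the chosen threshold, which is the parameter that BC is described as avoiding.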

After removing the leader variable and all other voxel variables found to have similar associations with the clinical variable, we repeat the search for another leader variable, until no more leader variables exist, at which point the algorithm returns a group of voxels (a region of interest) for each leader variable. Note that we have not mentioned spatial constraints in this description. Although there are implicit spatial constraints during image-processing steps (due to smoothing, for example), there are no such requirements during the data-mining process.
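The repeated leader search can be sketched as a loop over the remaining voxels. The association score and similarity test are passed in as callables because the text leaves their exact form open; the toy stand-ins used here (agreement fractions) are hypothetical and are not the measures used in the project.

```python
from typing import Callable, Dict, List, Sequence


def find_regions(
    voxels: Dict[str, Sequence],
    clinical: Sequence,
    association: Callable[[Sequence, Sequence], float],
    is_similar: Callable[[Sequence, Sequence, Sequence], bool],
    min_association: float,
) -> Dict[str, List[str]]:
    """Repeat the leader search until no sufficiently associated voxel remains.

    Returns one group of voxel names (a region of interest) per leader.
    No spatial constraint is applied when grouping.
    """
    remaining = dict(voxels)
    regions: Dict[str, List[str]] = {}
    while remaining:
        scores = {n: association(v, clinical) for n, v in remaining.items()}
        leader = max(scores, key=scores.get)
        if scores[leader] < min_association:
            break  # no more leader variables exist
        members = [
            n for n in remaining
            if n != leader
            and scores[n] >= min_association
            and is_similar(remaining[n], remaining[leader], clinical)
        ]
        regions[leader] = [leader] + members
        for n in regions[leader]:
            del remaining[n]  # remove the leader and its group
    return regions


def agreement(x: Sequence, y: Sequence) -> float:
    """Fraction of subjects on which two sequences agree."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


def toy_association(v: Sequence, f: Sequence) -> float:
    # Hypothetical stand-in: 0 for chance-level agreement, 1 for perfect
    # agreement or perfect disagreement.
    return abs(agreement(v, f) - 0.5) * 2


def toy_is_similar(v: Sequence, leader: Sequence, f: Sequence) -> bool:
    # Hypothetical stand-in for the BT or BC similarity decision.
    return agreement(v, leader) >= 0.9


f = [1] * 10 + [0] * 10
toy_voxels = {
    "A": list(f),
    "B": [0] + f[1:],
    "C": [1 - x for x in f],
    "D": [1, 0] * 10,
}
print(find_regions(toy_voxels, f, toy_association, toy_is_similar, 0.5))
```

Each iteration removes at least the leader, so the loop terminates after at most as many passes as there are voxels.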

We have begun to evaluate these methods using BLSA data and simulated data. We have constructed two simulated data sets: one containing a simple atrophy-deficit association, and one containing a nonlinear association among morphological changes in two noncontiguous regions and clinical deficit. We have also applied these methods to cross-sectional BLSA data, and will apply them to longitudinal BLSA data, which pose a more difficult challenge, since longitudinal morphological changes are more subtle than cross-sectional changes in the BLSA.
