Combining multiple approaches for gene microarray classification

[abstract]

Motivation: A microarray measures the expression levels of tens of thousands of genes, producing a feature vector that is high-dimensional and contains much irrelevant information. This dimensionality degrades classification performance. Moreover, datasets typically contain few training samples, which leads to the "curse of dimensionality" problem. It is therefore essential to find good methods for reducing the size of the feature set.

Results: In this paper, we propose a method for gene microarray classification that combines different feature reduction approaches to improve classification performance. Using a support vector machine (SVM) as our classifier, we examine an SVM trained on a set of selected genes; an SVM trained on the feature set obtained by the Neighborhood Preserving Embedding (NPE) feature transform; a set of SVMs trained on orthogonal wavelet coefficients obtained with different mother wavelets; a set of SVMs trained on texture descriptors extracted from the microarray, treating it as an image; and an ensemble that combines the best of the feature extraction methods listed above. The positive results reported confirm that combining different feature extraction methods greatly enhances system performance. The experiments were performed on several different datasets, and our results (expressed as both accuracy and area under the ROC curve) show that the proposed approach compares favorably with the state of the art.
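
The ensemble idea described in the Results can be illustrated with a minimal Python sketch. It assumes a simple sum rule over standardized SVM decision values; the abstract does not state which fusion rule is used, so the rule, the function names (zscore, sum_rule, predict), and the score values below are all hypothetical and serve only to show how classifiers trained on different feature sets could be combined.

```python
from statistics import mean, pstdev


def zscore(scores):
    """Standardize one classifier's scores to zero mean and unit variance."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # A constant-output classifier carries no information; map it to zeros.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def sum_rule(score_lists):
    """Fuse per-classifier score lists by averaging standardized scores per sample.

    Assumption: a plain sum (mean) rule; the source does not specify the rule.
    """
    if not score_lists:
        raise ValueError("need at least one classifier")
    n = len(score_lists[0])
    if any(len(s) != n for s in score_lists):
        raise ValueError("all classifiers must score the same samples")
    normalized = [zscore(s) for s in score_lists]
    return [mean(column) for column in zip(*normalized)]


def predict(fused, threshold=0.0):
    """Map fused scores to class labels +1 / -1."""
    return [1 if s > threshold else -1 for s in fused]


# Hypothetical decision values for four test samples from three SVMs,
# each trained on a different feature representation.
gene_svm = [1.2, -0.4, 0.3, -1.1]      # selected genes
npe_svm = [0.8, 0.1, -0.2, -0.9]       # NPE-transformed features
wavelet_svm = [2.0, -1.0, 1.0, -2.0]   # wavelet coefficients

fused = sum_rule([gene_svm, npe_svm, wavelet_svm])
labels = predict(fused)
```

Standardizing each classifier's scores before averaging keeps a classifier with a larger output range (here the hypothetical wavelet SVM) from dominating the fused decision.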

Availability: The MATLAB code of the proposed approach is publicly available at bias.csr.unibo.it/nanni/micro.rar.

Keywords:
