Wavelet images and Chou's pseudo amino acid composition for protein classification


The last decade has seen an explosion in the collection of protein data. To realize the potential offered by this wealth of data, it is important to develop machine systems capable of classifying proteins and extracting features from them. Reliable machine systems for protein classification offer many benefits, including the promise of finding novel drugs and vaccines. In developing our system, we analyze and compare several feature extraction methods for protein classification that compute texture descriptors from a wavelet representation of the protein. We then feed these texture-based representations of the protein into an AdaBoost ensemble of neural networks or a support vector machine classifier. In addition, we perform experiments that combine our feature extraction methods with a standard method based on Chou's pseudo amino acid composition. Using several datasets, we show that our best approach outperforms standard methods.
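
For readers unfamiliar with the standard baseline named above, the following is a minimal sketch of a pseudo amino acid composition in the spirit of Chou's formulation: 20 normalized residue frequencies followed by lam sequence-order correlation terms. It is not the code released with the paper. The function name `pseaac`, the defaults `lam=3` and `w=0.05`, and the use of a single physicochemical property (the Kyte-Doolittle hydropathy index) are assumptions made for illustration; Chou's original definition averages several properties, and the paper may use a different set.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index. Using this single scale is an assumption
# made for illustration; other property sets can be substituted.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardize(scale):
    """Rescale a property to zero mean and unit variance over the 20 residues."""
    values = [scale[a] for a in AMINO_ACIDS]
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return {a: (scale[a] - mean) / std for a in AMINO_ACIDS}


def pseaac(sequence, lam=3, w=0.05):
    """Return a (20 + lam)-dimensional pseudo amino acid composition vector.

    lam (number of correlation tiers) and w (weight of the sequence-order
    terms) are hypothetical defaults chosen for this sketch.
    """
    seq = sequence.upper()
    if any(a not in KYTE_DOOLITTLE for a in seq):
        raise ValueError("sequence contains a non-standard residue")
    if not 0 <= lam < len(seq):
        raise ValueError("lam must be non-negative and below the sequence length")

    h = standardize(KYTE_DOOLITTLE)
    n = len(seq)

    # Conventional amino acid composition: occurrence frequency per residue.
    freqs = [seq.count(a) / n for a in AMINO_ACIDS]

    # k-th tier correlation factor: mean squared property difference between
    # residues that are k positions apart along the chain.
    thetas = []
    for k in range(1, lam + 1):
        total = sum((h[seq[i]] - h[seq[i + k]]) ** 2 for i in range(n - k))
        thetas.append(total / (n - k))

    # Joint normalization so that all components sum to one.
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]
```

Calling `pseaac("MKTAYIAKQR", lam=3)` yields a vector of 23 non-negative components that sum to one. A vector of this kind could be concatenated with the texture descriptors before being passed to a classifier.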

The Matlab code of the proposed protein descriptors is available at bias.csr.unibo.it/nanni/wave.rar.

Keywords: Protein classification; machine learning; ensemble of classifiers; support vector machines.
