Protein classification using texture descriptors extracted from the protein backbone image


In this work, we propose a method for protein classification that combines different texture descriptors extracted from the 2-D distance matrix obtained from the 3-D tertiary structure of a given protein. Instead of considering all atoms in the protein, the distance matrix is calculated using only those atoms that belong to the protein backbone. The positive results reported in this paper offer further experimental confirmation that the distance matrix contains sufficient information for describing a protein. Moreover, we show that combining features extracted from the primary structure with features extracted from the distance matrix increases the performance of our classification system. We demonstrate this finding by evaluating the performance of an ensemble of classifiers that uses the combined features. The classifiers used in our experiments are support vector machines and random subspace ensembles of support vector machines. The experimental results, validated on three different datasets (protein fold recognition, DNA-binding protein recognition, and biological process and molecular function recognition) with different texture feature extraction methods (variants of local binary patterns, Radon transform based approaches, and Haralick descriptors), demonstrate the effectiveness of the proposed approach. Particularly interesting are the results in the classification of 27 types of structural properties, where our proposed approach achieves a significant improvement over other reported methods.

Keywords: protein classification; texture descriptors; primary structure; local binary patterns; Radon transform; Haralick features; support vector machines.
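
As a rough illustration of the idea summarized in the abstract, the following Python sketch builds a 2-D distance matrix from a list of backbone atom coordinates and computes a basic 8-neighbour local binary pattern histogram over it. It is a minimal sketch under stated assumptions, not the implementation used in the paper: the function names `distance_matrix` and `lbp_histogram` and the toy coordinates are hypothetical, and the plain LBP operator shown here stands in for the LBP variants mentioned above.

```python
import math


def distance_matrix(coords):
    """Return the matrix of pairwise Euclidean distances between 3-D points."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def lbp_histogram(image):
    """Normalized 256-bin histogram of basic 8-neighbour LBP codes.

    Only interior pixels are coded, so the image needs at least 3 rows
    and 3 columns. This is the plain LBP operator, used here as an
    illustrative stand-in for the variants named in the abstract.
    """
    # Neighbour offsets, visited clockwise from the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0] * 256
    rows, cols = len(image), len(image[0])
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            center = image[i][j]
            code = 0
            for bit, (di, dj) in enumerate(offsets):
                # Set the bit when the neighbour is at least as large as the center.
                if image[i + di][j + dj] >= center:
                    code |= 1 << bit
            hist[code] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Hypothetical toy coordinates standing in for backbone atom positions.
toy_backbone = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0),
                (3.0, 0.0, 0.0), (4.0, 0.0, 0.0)]

D = distance_matrix(toy_backbone)
features = lbp_histogram(D)
```

The resulting `features` vector is the kind of fixed-length texture descriptor that could then be passed to a classifier such as a support vector machine.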
