Bird and whale species identification using sound images


Image-based identification of animals usually relies on their appearance, but images can also be used to identify animals in other ways, for example by representing the sounds they make as images. In this paper, we present a novel and effective approach for the automated identification of birds and whales using some of the best texture descriptors in the computer vision literature. The visual features of a sound are built starting from the audio file and are extracted from images constructed from different spectrograms and from harmonic and percussion images. These images are divided into subwindows, from which sets of texture descriptors are extracted. Experiments on a dataset of bird vocalizations targeted for species recognition and a dataset of right whale calls targeted for whale detection (as well as three well-known benchmarks for music genre classification) demonstrate that the fusion of different texture features enhances performance. The experiments also demonstrate that the fusion of different texture features with audio features is not only comparable with existing audio signal approaches but also yields a statistically significant improvement over some of the stand-alone audio features. The code for the experiments will be publicly available at

Keywords: audio classification, animal identification, texture, image processing, acoustic features, ensemble of classifiers, pattern recognition.
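
The pipeline summarized in the abstract (audio file, spectrogram image, subwindows, texture descriptors) can be illustrated with a minimal sketch. It is not the implementation used in the paper: the basic 8-neighbour local binary pattern (LBP) stands in for the texture descriptors, and the function names, frame length, hop size, 2x2 subwindow grid, and synthetic chirp signal are all assumptions chosen for illustration. The harmonic and percussion images are not reproduced here.

```python
import numpy as np


def spectrogram_image(signal, frame_len=256, hop=128):
    """Log-magnitude spectrogram scaled to 0..255 (rows = frequency, cols = time)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1)).T
    log_mag = np.log1p(mag)
    span = log_mag.max() - log_mag.min()
    scaled = (log_mag - log_mag.min()) / (span if span > 0 else 1.0)
    return (255 * scaled).astype(np.uint8)


def lbp_histogram(img):
    """Normalized 256-bin histogram of basic 8-neighbour LBP codes.

    Assumption: plain LBP is only a stand-in for the paper's descriptors.
    """
    img = img.astype(np.int32)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    codes = np.zeros_like(center)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy : h - 1 + dy, 1 + dx : w - 1 + dx]
        codes |= (neighbour >= center).astype(np.int32) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()


def subwindow_features(img, rows=2, cols=2):
    """Split the image into a rows x cols grid and concatenate per-window histograms."""
    feats = []
    for band in np.array_split(img, rows, axis=0):
        for block in np.array_split(band, cols, axis=1):
            feats.append(lbp_histogram(block))
    return np.concatenate(feats)


# Hypothetical input: a one-second synthetic chirp sampled at 8 kHz,
# used here in place of a real bird or whale recording.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * (500 + 1500 * t) * t)

img = spectrogram_image(signal)
features = subwindow_features(img)
print(img.shape, features.shape)
```

Under these assumptions, each subwindow contributes one 256-bin histogram, so the 2x2 grid yields a 1024-dimensional feature vector that could then be passed to a classifier or fused with vectors from other descriptors.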