Birds Part One
I decided to have a crack at an app to identify birds from their song. The idea is to capture lots of songs, train a machine learning algorithm on them and use the resulting model to predict which bird sang a new song sample.
To start off I had a wander along the river capturing some bird song on my phone. This gave me a mono audio file which I loaded up in an audio editor.
Good machine learning requires your training data to have good features. This means you must give the algorithm meaningful signals to look at, rather than throwing all the data you can find at it and hoping for the best.
Bird song is made up of a series of differently pitched sounds of varying volume at different intervals. The waveform of the audio gives some indication of the length and volume of each sound, but not the pitch. For identifying birds I think the pitch and length of the sounds will be important. The volume will probably be less useful since it will depend on how far away the bird is (though it may be useful for telling one bird from another in the same recording).
A spectrogram shows the pitch (or frequency) of the sound (the y axis), as well as its length (the x axis) and the volume (the intensity of the colour). It actually shows the smooth changes in pitch of the bird song. Looking through the recording it's easy to tell the difference between the call of a swift and that of a robin by the shape of the lines on the spectrogram. It also allows you to distinguish other sounds (like low-frequency wind or road noise), with the potential to remove them. An isolated image of the spectrogram of the song, converted to a black and white bitmap, could be a good thing to feed into a learning algorithm as training data.
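As a rough sketch of the bitmap idea, the snippet below takes a spectrogram as a list of columns of magnitudes, blanks everything below a cutoff frequency (to drop wind and road rumble) and thresholds what is left into black and white pixels. The function name `to_bitmap`, the 1000 Hz cutoff and the 50% threshold are all made-up values for illustration, not something I have tuned against real recordings.

```python
import math

def to_bitmap(spec, bin_hz, min_hz=1000.0, threshold=0.5):
    """Turn a spectrogram into 0/1 pixels.

    spec      -- list of columns, each a list of magnitudes (index = frequency bin)
    bin_hz    -- width of one frequency bin in Hz
    min_hz    -- assumed cutoff; bins below this are blanked (hypothetical value)
    threshold -- fraction of the loudest kept magnitude that counts as "on"
    """
    first_bin = math.ceil(min_hz / bin_hz)
    # Find the loudest value among the bins we keep, so low rumble
    # does not set the threshold.
    kept = [m for col in spec for m in col[first_bin:]]
    peak = max(kept) if kept else 0.0
    if peak == 0.0:
        return [[0] * len(col) for col in spec]
    cutoff = peak * threshold
    return [
        [1 if (b >= first_bin and m >= cutoff) else 0 for b, m in enumerate(col)]
        for col in spec
    ]

# Tiny hand-made example: two time slices, four bins of 500 Hz each.
# Bin 0 holds loud low-frequency noise that should be removed.
example = [[9.0, 1.0, 0.2, 8.0],
           [9.0, 0.5, 7.0, 0.1]]
bitmap = to_bitmap(example, bin_hz=500.0)
```

A fixed fraction of the peak is about the crudest threshold possible; a quiet bird next to a loud one would vanish, so a per-column or adaptive threshold may well work better.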
Buoyed by this success I found a free library for doing Fast Fourier Transforms and whipped up some quick code to create spectrograms from audio. There is scope for eking out some extra detail, but after very little tweaking I was quite pleased with the results:
The call of a robin
The call of a swift
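For anyone curious what building a spectrogram involves, here is a minimal sketch rather than the code I actually used. It slices the audio into overlapping windows, runs an FFT on each and keeps the magnitudes. To keep it self-contained it uses a small hand-rolled radix-2 FFT instead of a library, and the window size of 256, the hop of 128 and the Hann window are assumptions picked for the example.

```python
import cmath
import math

def fft(samples):
    """Recursive radix-2 FFT. len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def spectrogram(samples, window_size=256, hop=128):
    """Return a list of columns (one per time slice).

    Each column holds the magnitudes of bins 0..window_size/2,
    i.e. frequencies from 0 Hz up to half the sample rate.
    """
    # Hann window to reduce smearing between frequency bins.
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * i / (window_size - 1))
            for i in range(window_size)]
    columns = []
    for start in range(0, len(samples) - window_size + 1, hop):
        frame = [samples[start + i] * hann[i] for i in range(window_size)]
        spectrum = fft(frame)
        columns.append([abs(c) for c in spectrum[: window_size // 2 + 1]])
    return columns

# Synthetic check: a pure 2 kHz tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 2000 * t / sample_rate) for t in range(2048)]
spec = spectrogram(tone)
peak_bin = max(range(len(spec[0])), key=lambda b: spec[0][b])
peak_hz = peak_bin * sample_rate / 256  # bin index -> frequency in Hz
```

Each bin is `sample_rate / window_size` Hz wide, so a bigger window gives finer pitch detail but blurs the timing, which is the main trade-off to play with.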
There are still quite a few problems to overcome, such as:
- detecting when a call starts and stops
- isolating the calls of individual birds
- getting enough training data for the classifier
- correctly labelling the training data (I am no bird expert)
- whether everything can run well enough on a phone
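On the first of those problems, one simple thing to try (I have not tested it on real recordings) is to add up the energy in each time slice and treat any run above a threshold as a call, merging runs separated by only a short pause. The function name `find_calls`, the threshold and the `min_gap` value below are hypothetical.

```python
def find_calls(energies, threshold, min_gap=2):
    """Return (start, end) frame index pairs, end exclusive.

    energies  -- one loudness value per time slice
    threshold -- assumed level above which a slice counts as "bird"
    min_gap   -- number of quiet slices needed to end a call
    """
    segments = []
    start = None   # index where the current call began
    quiet = 0      # consecutive quiet slices seen inside a call
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_gap:
                # The call ended just after the last loud slice.
                segments.append((start, i - quiet + 1))
                start = None
                quiet = 0
    if start is not None:
        # Recording ended mid-call; trim any trailing quiet slices.
        segments.append((start, len(energies) - quiet))
    return segments

energies = [0, 0, 5, 6, 0, 7, 0, 0, 0, 4, 4, 0]
calls = find_calls(energies, threshold=3)
print(calls)  # [(2, 6), (9, 11)]
```

This does nothing about two birds singing at once, which is the harder problem.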
It's a promising start though. Watch this space for more bird-related excitement.