Monday, October 5, 2009

Using pylucene to index audio files

Lucene is an efficient full-text indexing library. I used it to index the tags of my audio files so that I can launch mplayer, or any other command-line audio player, without resorting to complex, time-consuming 'find' commands to build playlists.
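To give an idea of what such an index does, here is a toy sketch in plain Python. It is not Lucene and not the tsearch.py script: it only illustrates an inverted index that maps tag words to file paths. The function names and the sample tracks are made up for the illustration.

```python
from collections import defaultdict


def build_index(tracks):
    """Map each lowercased tag word to the set of file paths that contain it."""
    index = defaultdict(set)
    for path, tags in tracks.items():
        for value in tags.values():
            for word in value.lower().split():
                index[word].add(path)
    return index


def search_any(index, words):
    """Return the sorted paths matching at least one word (an OR query)."""
    hits = set()
    for word in words:
        # .get() avoids creating empty entries for unknown words.
        hits |= index.get(word.lower(), set())
    return sorted(hits)


# Hypothetical sample data: file path -> tag dictionary.
tracks = {
    "/music/a.mp3": {"artist": "Slayer", "title": "Raining Blood"},
    "/music/b.mp3": {"artist": "Example Band", "title": "Love Song"},
    "/music/c.mp3": {"artist": "Other Band", "title": "I Hate Mondays"},
}

index = build_index(tracks)
print("\n".join(search_any(index, ["love", "hate"])))
```

A real Lucene index adds much more on top of this (on-disk storage, per-field search, a query parser, scoring), but the lookup idea is the same.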

Here is a quick'n'dirty solution:


The search function is quite simple too:



A quick demo:
Indexing the music database:

time ./tsearch.py index /home/fv/music/SANE /home/fv/musicindex
16701 files indexed 
Done 16702
./tsearch.py index /home/fv/music/SANE /home/fv/musicindex  21,67s user 10,18s system 3% cpu 14:14,36 total
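The indexing pass has to walk the music directory and pick out the audio files before reading their tags. A minimal sketch of that traversal follows, assuming the file extension is enough to recognise an audio file. The extension list and the function name are assumptions, and the original script may do this differently.

```python
import os

# Assumption: these extensions identify audio files in the collection.
AUDIO_EXTENSIONS = {".mp3", ".ogg", ".flac"}


def find_audio_files(root):
    """Yield the path of every file under root that has an audio extension."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            extension = os.path.splitext(name)[1].lower()
            if extension in AUDIO_EXTENSIONS:
                yield os.path.join(dirpath, name)
```

Calling find_audio_files("/home/fv/music/SANE") would yield one path per audio file, each of which can then be handed to the tag reader and the indexer.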

Searching:

time ./tsearch.py search "love OR hate" /home/fv/musicindex > playlist.m3u
./tsearch.py search "love OR hate" /home/fv/musicindex  0,52s user 0,09s system 14% cpu 4,310 total
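Redirecting the output to playlist.m3u works because a plain M3U playlist is just a list of file paths, one per line. A minimal sketch of that output step, with a hypothetical write_playlist helper and made-up hits:

```python
import sys


def write_playlist(paths, out):
    """Write one path per line, which is the plain M3U playlist layout."""
    for path in paths:
        out.write(path + "\n")


# Hypothetical search hits, for illustration only.
hits = [
    "/music/b.mp3",
    "/music/c.mp3",
]

# Writing to stdout lets the shell redirect the result into a .m3u file.
write_playlist(hits, sys.stdout)
```

The resulting file can then be played with something like mplayer -playlist playlist.m3u.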

A more complex search:


time ./tsearch.py search "love OR hate OR (title: rain in blood AND artist: slayer)" /home/fv/musicindex > playlist.m3u
./tsearch.py search  /home/fv/musicindex  0,48s user 0,07s system 74% cpu 0,730 total

This code sample is not perfect; consider it a proof of concept rather than a ready-to-use solution.

The full code:



The next step is to clean up my audio file tags by retrieving tags from the Last.fm web service and storing them in the "genre" tag, then to automatically retrieve song lyrics and index them using the same method.