I used pymavlink to produce an easily-readable format, and then wrote a program in Scala to produce a *.wav file. The most tedious part is interpolating the data from the log entries to produce the waveform (it is straightforward and works well even with plain linear interpolation). I am sure an easier way exists for the second part, but I generally hate using libraries for small problems.
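A minimal sketch of that second step, assuming the log has already been reduced to (timestamp-in-seconds, value) pairs; the names, the structure, and the 16-bit mono PCM output are my own choices here, not taken from the original program:

```scala
import java.io.{ByteArrayInputStream, File}
import javax.sound.sampled.{AudioFileFormat, AudioFormat, AudioInputStream, AudioSystem}

object LogToWav {
  // Linearly interpolate irregular (time, value) samples onto a fixed sample rate.
  def resample(samples: Vector[(Double, Double)], rateHz: Double): Vector[Double] = {
    require(samples.size >= 2, "need at least two log samples")
    val t0 = samples.head._1
    val t1 = samples.last._1
    val n  = ((t1 - t0) * rateHz).toInt
    var i  = 0
    (0 until n).toVector.map { k =>
      val t = t0 + k / rateHz
      while (i < samples.length - 2 && samples(i + 1)._1 < t) i += 1
      val (ta, va) = samples(i)
      val (tb, vb) = samples(i + 1)
      val f = if (tb > ta) (t - ta) / (tb - ta) else 0.0
      va + f * (vb - va)
    }
  }

  // Normalize to [-1, 1] and write 16-bit signed mono PCM via javax.sound.sampled.
  def writeWav(values: Vector[Double], rateHz: Float, out: File): Unit = {
    val peak  = values.map(v => math.abs(v)).max max 1e-9
    val bytes = new Array[Byte](values.length * 2)
    values.zipWithIndex.foreach { case (v, k) =>
      val s = (v / peak * Short.MaxValue).toInt.toShort
      bytes(2 * k)     = (s & 0xff).toByte          // little-endian sample
      bytes(2 * k + 1) = ((s >> 8) & 0xff).toByte
    }
    val fmt    = new AudioFormat(rateHz, 16, 1, true, false)
    val stream = new AudioInputStream(new ByteArrayInputStream(bytes), fmt, values.length)
    AudioSystem.write(stream, AudioFileFormat.Type.WAVE, out)
  }
}
```

The output sample rate is whatever you pick; anything at or above the log's effective rate keeps all the information, and the normalization just makes the result audible without clipping.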
One non-obvious thing is to use the right data source from the logs. The most detailed data is currently available in the form of batch logging, in the ISBH/ISBD entries, but those log the sensors cyclically with a one-second period, so they do not produce a coherent sound picture. The better source is the VSTB entries, aka the video-stabilization log, which is streamed all the time; the sampling frequency is lower, but it is enough to hear what is happening.
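For completeness, here is a rough sketch of pulling one gyro axis out of the text dump and feeding it to the code above. The line format and the field names (TimeUS, GyrX) are assumptions about what the pymavlink dump of VSTB entries looks like, not a description of the original program:

```scala
import java.io.File
import scala.io.Source

object ExtractVstb {
  // Matches "Name : number" pairs on a dumped log line (assumed dump format).
  private val field = """(\w+)\s*:\s*([-+\d.eE]+)""".r

  def main(args: Array[String]): Unit = {
    val samples = Source.fromFile(args(0)).getLines()
      .filter(_.contains("VSTB"))                    // keep only video-stabilization entries
      .map { line =>
        val kv = field.findAllMatchIn(line).map(m => m.group(1) -> m.group(2).toDouble).toMap
        (kv("TimeUS") / 1e6, kv("GyrX"))             // timestamp in seconds, one gyro axis
      }
      .toVector

    val wave = LogToWav.resample(samples, 8000.0)
    LogToWav.writeWav(wave, 8000f, new File("vibration.wav"))
  }
}
```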