Soliloquist's Blog : All Conversations are One-Way.

Sunday, December 24, 2006

WAVE, MP3, and MIDI, what's the difference?

For a long time I've been curious about how to convert an MP3 file into a MIDI file so that it would take up less space on my hard disk, my cellphone, or my internet bandwidth. I like to edit sound files: combine them, mix and strip them, convert them into WMA, MP3, OGG, or something else. But it would be ridiculous to do all that without understanding the very basic concepts behind it. So here it is. I hope this post can teach us something useful. Hehe.. Don't hesitate to ask or comment.


For sound you need an instrument and a 'musician' (in the broadest sense of these words). If you'd like to hear a piano, you need a piano and someone playing it. If you love the sound of breaking glass, the neighbours' window will do fine as an instrument, and their son, throwing a baseball very wide, could be the great musician to satisfy you ;-).

Let's stick to the piano. When the pianist is playing the piano, little hammers strike the piano strings. The strings start to vibrate and make the air molecules around them vibrate. The air molecules pass this on to other air molecules, until finally this vibration of air hits your eardrums. Sound is how you (your brain) interpret this vibration.

The continuous 'flow of vibrations' from the sound source (to your ears) is called the sound wave. It can be represented on paper (or on screen) as a wavy line, although for 'normal' sound this line is very intricate.

When two or more instruments are played at the same time, all the vibrations coming from those instruments are mixed (in the air), so there will still be only ONE sound wave (an even more complex one than from the piano alone).
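To make the "only ONE sound wave" idea a bit more concrete, here is a minimal sketch that treats each wave as a list of numbers (how a wave becomes numbers is covered in the part about digital recording). The two pure tones, the `tone` and `mix` helpers and the 8000 values per second are all assumptions made up for this illustration; real instruments produce far more complex waves.

```python
import math

SAMPLE_RATE = 8000  # values per second; an arbitrary choice for this sketch


def tone(frequency_hz, n_values, amplitude=0.4):
    """Return n_values of a pure sine tone as floats between -1 and 1."""
    return [
        amplitude * math.sin(2 * math.pi * frequency_hz * i / SAMPLE_RATE)
        for i in range(n_values)
    ]


def mix(*waves):
    """Mix waves by adding them value by value, roughly what the air does."""
    return [sum(values) for values in zip(*waves)]


piano = tone(440.0, 80)    # stand-in for one instrument
violin = tone(660.0, 80)   # stand-in for a second instrument
combined = mix(piano, violin)

print(len(combined))  # 80: still one wave, the same length as each input
```

Once the two lists are added together, nothing in `combined` says which part came from which instrument.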


When you want to save sound, so you can hear it later, you can record the sound wave in several possible ways.
In the past people could only save sound in an analogue way. The sound wave was 'printed' on a tape or a (vinyl) disc, a bit like the way you would draw a shaky, wavy line on paper.
Nowadays it's possible to record sound digitally. To do that, the sound wave is 'cut' into thin slices, called samples. Each of these samples gets a value, depending on its position on the 'wavy line'. This way we 'convert' an analogue sound wave into a string of values. But don't forget, this long string of numbers (values) still represents the sound wave, the vibration of the air. No more, no less. This string of numbers, which can be stored on CD, hard disk or tape, is called a WAVE file.
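The 'cutting into slices' can be sketched in a few lines. This is only an illustration: the `sample_wave` and `slow_sine` names are made up here, and the slow 1 Hz wave cut into 8 slices is chosen purely to keep the output readable, not because real audio looks like this.

```python
import math


def sample_wave(wave, seconds, samples_per_second):
    """Cut `seconds` of a continuous wave into slices and read one value per slice."""
    count = round(seconds * samples_per_second)
    return [wave(i / samples_per_second) for i in range(count)]


def slow_sine(t):
    """A 1 Hz sine wave: one full up-and-down swing per second."""
    return math.sin(2 * math.pi * t)


# 8 slices per second is far too few for real audio; it just keeps the list short.
samples = sample_wave(slow_sine, seconds=1, samples_per_second=8)
print([round(value, 3) for value in samples])
```

The printed list rises to 1, falls to -1 and comes back: the wavy line, written down as numbers.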

Two things are important in this process: the sample rate and the sample value.
The sample rate tells you how many samples are taken per second of sound, i.e. into how many slices a second of sound wave is cut. More samples per second (thinner slices) mean better preservation of sound quality. A typical sample rate for really good quality is 44,100 samples per second.
Then these samples will be given a value. To be able to make a good distinction between the various samples you need a broad range of numbers.

Think about the athletes that run the 100 metres. If we could only measure their time in full seconds, the numbers 1 through 16 would be sufficient. The good ones would all do the 100 metres in 10 seconds, so they would all be world champion. Since we don't want that, we measure their time in thousandths of a second, which gives us a broad range of 16,000 numbers (in 16 seconds) to make a good distinction between the athletes.

We need something of that same order when we assign values to samples. Since computers work with bytes and 1 byte (256 numbers) is not really enough for reasons of quality, we use 2 bytes per sample, which gives us 65,536 numbers to choose from.
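Here is a small sketch of why the range of numbers matters. The `quantize` helper is hypothetical, and it simplifies things by mapping a value between -1 and 1 onto signed whole numbers; actual file formats differ in the details of how they store 1-byte and 2-byte samples.

```python
def quantize(value, bits):
    """Map a value between -1.0 and 1.0 to the nearest signed whole number."""
    max_level = 2 ** (bits - 1) - 1          # 127 for 8 bits, 32767 for 16 bits
    clipped = max(-1.0, min(1.0, value))     # keep out-of-range input inside the scale
    return round(clipped * max_level)


print(2 ** 8, 2 ** 16)                           # 256 65536
print(quantize(0.3, 8), quantize(0.3004, 8))     # 38 38
print(quantize(0.3, 16), quantize(0.3004, 16))   # 9830 9843
```

With 1 byte the two slightly different samples get the same number, like two sprinters both clocked at '10 seconds'. With 2 bytes they stay apart.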

Now you also know why (quality) WAVE files are huge. For one second of sound you need 44,100 x 2 = 88,200 bytes, and that is just one channel. For stereo you have to double that of course, which brings you to a total of 176,400 bytes for one second of sound. A minute of sound will cost you roughly 10.5 megabytes.
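The same sum, written out so you can change the numbers yourself. The constant names are just labels chosen for this sketch; the values are the ones from the text.

```python
SAMPLE_RATE = 44_100      # samples per second
BYTES_PER_SAMPLE = 2      # 16-bit samples
CHANNELS = 2              # stereo

bytes_per_second = SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS
bytes_per_minute = bytes_per_second * 60

print(bytes_per_second)   # 176400
print(bytes_per_minute)   # 10584000, a little over 10.5 million bytes
```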


MP3 is the file extension for MPEG Audio compressed files (MPEG-1 Audio Layer III). An .mp3 file holds the same kind of sound data as a WAVE file, but compressed in a very special way. Maybe you have heard of file compression methods, or maybe you even use a program like PKZIP or WinZip to make .zip files yourself. MP3, however, uses a completely different compression method.

When you compress a file and turn it into a .zip file, nothing is left out. It's a method to save ALL data in a smart way using less space. There are lots of possibilities to do that, but let me give you one very simple example.

When there are 40 dashes in a standard file, they are written as:
----------------------------------------
taking 40 bytes of space.

Another way of writing these 40 dashes is: 40x- (40 times -) which only takes 4 bytes of space. The compression ratio in this example is 10:1, which is, as you will understand, quite exceptional and certainly not the average for a complete file.
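This trick is usually called run-length encoding. Below is a minimal sketch of it; the function names are made up for this post, and real .zip programs use considerably smarter methods than this one.

```python
from itertools import groupby


def rle_encode(text):
    """Collapse each run of identical characters into a (count, character) pair."""
    return [(len(list(group)), char) for char, group in groupby(text)]


def rle_decode(runs):
    """Rebuild the original text from the (count, character) pairs."""
    return "".join(char * count for count, char in runs)


def as_text(runs):
    """Write the pairs in the '40x-' notation used above."""
    return "".join(f"{count}x{char}" for count, char in runs)


dashes = "-" * 40
runs = rle_encode(dashes)

print(as_text(runs))                     # 40x-
print(len(dashes), len(as_text(runs)))   # 40 4
print(rle_decode(runs) == dashes)        # True
```

The last line is the important one: after decoding, every single dash is back.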

The advantage is that ALL data is still there, although the file takes up less space. The downside is that a .zip file has to be 'unzipped' before you can use it, which means that (after 'unzipping' it) it will take up the same amount of space as it did before it was 'zipped'. In addition, 'zipping' a WAVE file will not gain you very much: a compression ratio of 2:1 at the most.
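You can try both points with Python's `zlib` module, which implements the DEFLATE method that .zip files commonly use. This is only a sketch with stand-in data: the 'noisy' bytes are random numbers, not real audio, and they represent the extreme case of data with no repetition at all. A real WAVE file sits somewhere between the two.

```python
import random
import zlib

repetitive = b"-" * 4000   # very repetitive data, like the dashes above

rng = random.Random(0)     # fixed seed so every run uses the same bytes
noisy = bytes(rng.getrandbits(8) for _ in range(4000))   # no patterns to exploit

for label, data in (("repetitive", repetitive), ("noisy", noisy)):
    packed = zlib.compress(data)
    restored = zlib.decompress(packed)
    print(label, len(data), "->", len(packed), "bytes; identical after unzip:", restored == data)
```

The repetitive data shrinks to a tiny fraction, the noisy data hardly shrinks at all, and in both cases the unzipped result is exactly the original.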

The compression method that is used to make .mp3 files is totally different. In this method some things are actually left out, but in a very smart way, so you won't notice (hear) it. Information that is not important is stripped. Based on research into human perception, the encoder decides what information is important and what can be discarded.

When a sound wave hits your eardrums, the incoming data is analyzed by your brain. The brain interprets the sound and filters out irrelevant information, which means you just don't hear everything that is in the sound wave.

Another simple example:
You're listening to James Blunt on your headphones. Now turn off the walkman. You can hear everything that's going on around you. The headphones over (or in ;-) your ears do not really block the sound that is coming from the 'outside'. Turn the walkman back on and listen to James Blunt again. This time you won't hear 'outside' sounds, although they're still there. The music on your headphones is so loud in comparison to the 'outside' sound that this 'outside' sound is filtered out by your brain.

MPEG Audio compression does this job for you. It's called "perceptual coding." This is quite clever, because the information that would be stripped by your own "brain-filter" anyway no longer needs to occupy hard disk space or internet bandwidth. You have to be a bit careful though, because if you encode at a very high compression ratio, MPEG also strips information that is audible. With 'light' compression (up to a ratio of approximately 12:1) you won't hear the difference between the .mp3 file and the uncompressed original. Compression ratios of 12:1 without losing quality are pretty normal for MPEG Audio compression.
The disadvantage of MPEG Audio compression is that a lot of processing power is required to encode and play files.
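To get a feeling for what 12:1 means in practice, here is the arithmetic for CD-quality stereo sound (44,100 samples per second, 2 bytes per sample, 2 channels). The constant names are only labels for this sketch.

```python
CD_BYTES_PER_SECOND = 44_100 * 2 * 2   # sample rate x bytes per sample x channels
RATIO = 12                             # the 'light' compression ratio from the text

wave_bits_per_second = CD_BYTES_PER_SECOND * 8
mp3_bits_per_second = wave_bits_per_second / RATIO
mp3_bytes_per_minute = CD_BYTES_PER_SECOND * 60 / RATIO

print(wave_bits_per_second)   # 1411200
print(mp3_bits_per_second)    # 117600.0
print(mp3_bytes_per_minute)   # 882000.0
```

So a minute of music drops from about 10.5 megabytes to under 1 megabyte. The roughly 118 kbit/s that comes out is in the neighbourhood of the 128 kbit/s setting many MP3 encoders offer.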


Let's go back to the pianist we met in the section about WAVE. We see him play the piano ('commanding' the piano) and we hear the sound. We already saw that we can record this sound (see WAVE).
Suppose I don't like the piano player and I want to get rid of him (for whatever reason), but I still like to hear that piano play the tunes. In that case I must record the actions ('commands') of the piano player and find a way to execute these 'commands' upon the piano. Well, they thought of a thing like this ages ago and developed the player piano, also called pianola. The 'commands' of the pianist were recorded on a roll of paper (the piano roll) by punching holes in the paper at exactly the right places. That way a 'smart' mechanism could play the piano. These piano rolls, representing a sequence of 'commands', are in a way the first MIDI files.

Today's techniques give us many more possibilities and we don't need the roll of paper anymore, but the idea is about the same. In a MIDI file we record (lay down) all the 'commands' of the musicians playing their instruments. So there is no sound in a MIDI file, there are only 'commands'. In MIDI these 'commands' are called messages or events.
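To show how small such a 'command' is, here is a sketch that builds the two most basic MIDI messages as raw bytes. In MIDI, a Note On message starts with the status byte 0x90 plus the channel number and a Note Off with 0x80 plus the channel, each followed by a note number and a velocity (how hard the key is hit). The helper names and the chosen velocity are made up for this example, and a complete MIDI file also stores timing and header data that is left out here.

```python
def note_on(channel, note, velocity):
    """Build a 3-byte Note On message: 'press this key, this hard'."""
    _check(channel, note, velocity)
    return bytes([0x90 | channel, note, velocity])


def note_off(channel, note, velocity=0):
    """Build a 3-byte Note Off message: 'release this key'."""
    _check(channel, note, velocity)
    return bytes([0x80 | channel, note, velocity])


def _check(channel, note, velocity):
    """Reject values that do not fit in a MIDI message."""
    if not 0 <= channel <= 15:
        raise ValueError("channel must be 0..15")
    if not (0 <= note <= 127 and 0 <= velocity <= 127):
        raise ValueError("note and velocity must be 0..127")


MIDDLE_C = 60  # MIDI note number for middle C

press = note_on(0, MIDDLE_C, 100)
release = note_off(0, MIDDLE_C)
print(press.hex(), release.hex())   # 903c64 803c00
print(len(press) + len(release))    # 6
```

Six bytes describe pressing and releasing one key, however long the note lasts. One second of CD-quality WAVE sound needs 176,400 bytes.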

How can I convert a WAVE file to a MIDI file?

If the question is: Is it possible to have the computer 'convert' a WAVE file to a MIDI file in such a way that the MIDI file (when played back) sounds like the original WAVE file? The answer is: NO!
These are completely different concepts. It's like asking: How can I convert a cake back into 'the separate operations of the baker' AND 'the original ingredients (eggs, sugar, butter, flour, etc.)'?
A MIDI file is a sequence of commands to control one or more pieces of equipment (synthesizers most of the time). These commands are not sounds, they are recorded operations to DO something (mostly to GENERATE sound).
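Going from commands to sound is the easy direction, and a toy version fits in a few lines. The sketch below is an assumption-laden illustration, not how a real synthesizer works: the 'commands' are simple (start, duration, note) tuples rather than real MIDI messages, each note becomes a plain sine tone, and 8000 samples per second keeps the numbers small. The pitch formula is the standard equal-temperament one, with MIDI note 69 tuned to 440 Hz.

```python
import math

SAMPLE_RATE = 8000  # samples per second; an arbitrary choice for this sketch


def note_frequency(note):
    """Convert a MIDI note number to a pitch in Hz (note 69 = A at 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)


def render(events, total_seconds):
    """Turn (start_s, duration_s, note) commands into one mixed list of samples."""
    samples = [0.0] * round(total_seconds * SAMPLE_RATE)
    for start_s, duration_s, note in events:
        frequency = note_frequency(note)
        first = round(start_s * SAMPLE_RATE)
        last = min(len(samples), first + round(duration_s * SAMPLE_RATE))
        for i in range(first, last):
            t = (i - first) / SAMPLE_RATE
            samples[i] += 0.3 * math.sin(2 * math.pi * frequency * t)
    return samples


# Three commands: two short notes in a row, one long note underneath them.
events = [(0.0, 0.5, 60), (0.5, 0.5, 64), (0.0, 1.0, 67)]
samples = render(events, total_seconds=1.0)
print(len(events), "commands ->", len(samples), "samples")   # 3 commands -> 8000 samples
```

Every sample in the result is the sum of whatever notes were sounding at that instant, so the three commands are no longer visible in the 8000 numbers.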

A WAVE file IS sound. It is the recording of a sound wave. It is the mix of all the given things (instruments, voices, background noises) you could have heard at the moment of recording. A lot of info (in fact almost all of it), that you need for a MIDI file, is lost. Like with the cake. When the cake is at your table, all data about the baking process is gone.

There is a lot of (continuous) discussion about WAV-to-MID conversion done by computer software. Don't be confused by people who say it can be done or that it is (or should be) possible. You'll hear all kinds of academic twaddle in this respect, like FFT, one of the most popular buzzwords (which, by the way, stands for Fast Fourier Transform), or some other kind of fancy gobbledygook. The problem is a lot harder than these theorists would like you to believe.

To people, some sounds are music. We can like the sound of 50 musicians playing 50 instruments at the same time because, for us humans, the notes played by these 50 musicians are related in some way. To us it's music; to a computer it's just noise. Because of this 'relation' between instruments that we humans hear in music, we can distinguish separate instruments (or instrument groups like violins) more or less. I say more or less, because when the orchestra of 50 musicians is playing at full strength, it will be impossible to pick out all the 'moves' of every individual instrument. Nevertheless, our ability to discern the different instruments fairly well in general enables us to 'translate' a piece of music (by just listening to it) into a MIDI file, one that can come close to the original piece on playback.

A computer (program) does not have that ability, that sense. It cannot even distinguish music from noise. To the computer (program) it's just sound, and we ask it to unravel that. If you'd like to know what that means, try to imagine the following: there are 50 musicians on stage, all wearing hearing protection so they can't hear each other. They all start playing a different piece of music at the same time. Do you have any idea how that sounds? It's still only those 50 musicians you liked so much before, but do you think you could make a MIDI file out of it this time?

1 comment:

  1. I have been using MIDI samples for a while now. I used to use WAV files, but I find that MIDI files are much more compatible with most music software and allow a great amount of customisation or editing in the software.