Can someone provide me with information about programming musical instrument emulators? As an example, see Smule's Ocarina app for the iPhone.
I am unable to find sufficient information on this topic. Running with the ocarina app as an example, how are the individual notes produced? Since the results depend on the strength of breath and which "holes" are held down, some of it must be handled programmatically, but is the whole sound generated programmatically, or does it use a sound sample (or several samples) on the back end and modify that?
Are there any resources on this topic? All of my searches come up with information on how to play music (just standard audio) or how to make music (in music editing software), but none on how to do what is shown in that video.
Responses needn't be strictly related to ocarinas, though I wouldn't mind if they were.
That particular musical instrument sounds to me like it's a fairly simple synthesis module, based perhaps on a square wave or FM, with a reverb filter tacked on. So I'm guessing it's artificially generated sound all the way down. If you were going to build one of these instruments yourself, you could use a sample set as your basis instead if you wished. There's another possibility I'll mention a ways below.
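For context, the "square wave all the way down" approach can be surprisingly small. Here's a rough sketch of a naive square-wave oscillator; the names and numbers are mine, not anything from the actual app:

#include <math.h>

const float SAMPLE_RATE = 44100.0f;

// Naive square wave: flips between +1 and -1 once per cycle.
// (A real instrument would band-limit this and add a reverb on top.)
float squareSample(float phase) {
    return (phase < 0.5f) ? 1.0f : -1.0f;
}

// Render numSamples of a note at freqHz; returns the phase so the next
// chunk can continue where this one left off.
float renderNote(float freqHz, float* out, int numSamples, float phase) {
    for (int i = 0; i < numSamples; ++i) {
        out[i] = squareSample(phase);
        phase += freqHz / SAMPLE_RATE;
        if (phase >= 1.0f) phase -= 1.0f;   // wrap once per cycle
    }
    return phase;
}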
Dealing with breath input: The breath input is generally translated to a value that represents the air pressure on the input microphone. This can be done by taking small chunks of the input audio signal and calculating the peak or RMS of each chunk. I prefer RMS, which is calculated by something like:
#include <math.h>

const int BUFFER_SIZE = 1024;  // just for purposes of this example
float buffer[BUFFER_SIZE];     // 1 channel of float samples between -1.0 and 1.0

// ... fill buffer with one chunk of the microphone input ...

float rms = 0.0f;
for (int i = 0; i < BUFFER_SIZE; ++i) {
    rms += buffer[i] * buffer[i];    // accumulate the sum of squares
}
rms = sqrtf(rms / BUFFER_SIZE);      // root mean square of the chunk
In MIDI, this value is usually transmitted as continuous controller 2 (CC2, the breath controller) with a value between 0 and 127. That value is then used to continuously control the volume of the output sound. (On the iPhone, MIDI may or may not be used internally, but the concept is the same. I'll call this value CC2 from here on out regardless.)
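A minimal sketch of that mapping, assuming the rms value from the snippet above (the noise floor and response curve are placeholders you'd tune by ear):

// Map a chunk's RMS (roughly 0.0..1.0) to a MIDI-style breath value (0..127),
// gating out microphone hiss below a small threshold.
int rmsToCC2(float rms) {
    const float NOISE_FLOOR = 0.02f;      // tune for your microphone
    if (rms < NOISE_FLOOR) return 0;
    if (rms > 1.0f) rms = 1.0f;
    return (int)(rms * 127.0f + 0.5f);
}

// The CC2 value then continuously scales the instrument's output gain.
float cc2ToGain(int cc2) {
    return cc2 / 127.0f;                  // linear; a squared curve often feels more natural
}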
Dealing with key presses: The key presses in this case are probably just mapped directly to the notes that they correspond to. These would then be sent as new note events to the instrument. I don't think there's any fancy modeling there.
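For a four-hole ocarina-style layout, that mapping can be as simple as a lookup table indexed by which holes are covered. The note numbers below are placeholders, not Smule's actual fingering chart:

// Hypothetical 4-hole fingering table: each bit means "this hole is covered",
// and the resulting 4-bit index picks a MIDI note number.
const int FINGERING_TO_NOTE[16] = {
    84, 83, 81, 79, 78, 77, 76, 74,   // placeholder pitches only
    73, 72, 71, 69, 67, 65, 64, 62
};

int noteForFingering(bool hole0, bool hole1, bool hole2, bool hole3) {
    int index = (hole0 ? 1 : 0) | (hole1 ? 2 : 0) | (hole2 ? 4 : 0) | (hole3 ? 8 : 0);
    return FINGERING_TO_NOTE[index];   // send this as a new note event
}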
Other forms of control: The Ocarina instrument uses the tilt of the iPhone to control vibrato rate and depth. This is usually modeled simply by a low-frequency oscillator (LFO) that's scaled, offset, and multiplied with the output of the rest of your instrument to produce a fluttering volume effect. The same LFO can also be applied to the instrument's pitch to make it fluctuate. (This can be hard to do right if you're working with samples, but relatively easy if you're using waveforms.) Fancy MIDI wind controllers also track finger pressure and bite-down pressure, and can expose those as parameters for you to shape your sound with.
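A sketch of that scale/offset/multiply step; the rate and depth here would come from the tilt readings, and all names are illustrative:

#include <math.h>

const float SAMPLE_RATE = 44100.0f;
const float TWO_PI = 6.28318530718f;

// LFO-based vibrato: a slow sine wave, scaled and offset so it stays between
// (1 - depth) and 1, then multiplied with the synth output for a volume flutter.
float vibratoGain(long sampleIndex, float rateHz, float depth) {
    float lfo = sinf(TWO_PI * rateHz * sampleIndex / SAMPLE_RATE);  // -1..1
    return 1.0f - depth * 0.5f * (lfo + 1.0f);
}

// Per-sample use (pseudocode):
//   out[n] = synthSample(n) * vibratoGain(n, tiltRateHz, tiltDepth);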
Breath instruments 201: There are some tricks that people pull to make sounds more expressive when they are controlled by a breath controller:
Breath instruments 301: And then we get to the fun stuff: how to simulate overblowing, timbre change, partial fingering, etc. like a real wind instrument can do. There are several approaches I can think of here: