Generating Audio UI
every time i do linear audio (i.e. create a quicktime soundtrack), i always have the same reaction: "this is SO much easier than doing interactive audio!" i envy the hollywood SFX guys their ability to craft a sequence of sounds to tell a story, convey an emotion, or rock you outa your chair ...
but it got me thinking about how to generate an Audio UI that sounds more like the ones in the movies.

here's the thing: every time you see/hear anybody using a computer, on big screen or small, there's always an Audio UI, and it always sounds great! it's completely complementary to the task being performed, indeed, is specifically designed to inform the viewer "fingerprint matched" or "displaying and highlighting requested data" or "prepare for jump to hyperspace". it's also rendered in high resolution by magical speakers, perfectly mixed and balanced for the ambient environment, and never gets in the way of the dialog.
it's like that woody allen - annie hall - marshall mcluhan joke:

"boy, if life were only like this!"
in the real world, alas, it don't work like that. most computers (if the system sounds are enabled at all) basically go "fifth up = good, fifth down = bad" -- they're like the incredible hulk: extremely powerful, limited vocabulary.
even mobile devices, which tend to have more sophisticated Audio UIs (i.e. ringtones), are still pretty simple ... and not just phones, anymore. i recently picked up a lovely red Cybershot camera, which sports a touchscreen AND a very nice, very transparent, very Sony, set of system sounds.
the more complex yet amazingly harmonious Sidekick Audio UI (he said, blowing on his fingertips) actually only consists of about 16 different sounds, plus a few standards (like "sad clown" low-battery). i get some extra mileage using combinations, but it's still "one event = one audio file," again and again and again, so it's no surprise that many customers run silent (except for the ringtones, of course).
it's the curse of video game repetition!! it's space invaders circa 1981, and every time you hit Fire!, it plays that same exact low-rez "lazer" waveform, over and over and over and over and over and AMAN! that's why it's so annoying!!!
game audio guys will do whatever's necessary to avoid that kind of repetition, and numerous techniques have been developed to prevent it. when you run around in grand theft auto or halo, you don't hear the same two footstep samples over and over -- the footfall sounds change depending on many different factors (speed, weight, terrain, material, etc etc etc).
in fact, since you must vary the sounds over time, how to do it becomes "a big question with a short answer" => you vary them "appropriately".
you can vary them in a way that conveys information to the user about game states (now i'm running in boots over concrete), or survival tactics (there's an enemy in the grass over there). you can vary them so they sound more like the natural world (each cricket chirp slightly different than the others). you can vary them completely at random, simply to increase the amount of variation ...
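a minimal sketch of the "vary them at random" idea, in python. the file names, the ±50 cent detune and the 3 dB volume range are all made-up values for illustration, not taken from any real game engine:

```python
import random

def pick_variation(samples, last, rng):
    """Pick a sample at random, avoiding the one just played when possible."""
    choices = [s for s in samples if s != last] or list(samples)
    name = rng.choice(choices)
    # small random detune and volume drop, so even a reused file sounds fresh
    pitch_cents = rng.uniform(-50.0, 50.0)
    gain_db = rng.uniform(-3.0, 0.0)
    return name, pitch_cents, gain_db

rng = random.Random(1981)  # seeded only so the demo is repeatable
footsteps = ["step_a.wav", "step_b.wav", "step_c.wav"]  # hypothetical files
last = None
sequence = []
for _ in range(8):
    last, cents, gain = pick_variation(footsteps, last, rng)
    sequence.append(last)
```

the one rule here is "never the same file twice in a row"; a real game would also pick the sample pool from speed, terrain, material and so on.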
SO (and here comes the big leap) how about if you did the exact same thing for Audio UI?
how about if you generated sounds based on the context of your interaction with the device. for example, instead of that same frackin' doorslam whenever someone logs off AOL IM, the system could generate an "exit" sound based on the guy's screen name ...
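one way this could work, as a rough sketch: hash the screen name and turn the first few bytes into notes. everything below (the pentatonic scale, the four-note length, the falling contour to suggest "leaving") is an assumption for illustration, not how any IM client actually does it:

```python
import hashlib

PENTATONIC = [0, 2, 4, 7, 9]  # major pentatonic, as semitone offsets

def exit_motif(screen_name, length=4, root_midi=72):
    """Derive a short falling motif (MIDI note numbers) from a screen name."""
    digest = hashlib.sha256(screen_name.lower().encode("utf-8")).digest()
    notes = []
    for b in digest[:length]:
        # each byte picks one of two octaves and one of five scale degrees
        octave, degree = divmod(b % 10, 5)
        notes.append(root_midi + 12 * octave + PENTATONIC[degree])
    # highest note first, so the phrase falls and reads as an "exit"
    return sorted(notes, reverse=True)

motif = exit_motif("pdx")
```

the same name always gets the same motif, so you could learn to recognize who just logged off, while different names get different ones.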
how about varying the sound of typing using a stochastic algorithm, so instead of playing the same beep for every keystroke, the device played subtle melodies based on what you were texting. you could vary the pitch, the volume ("exclamation point" louder than "comma"), the length ("period" shorter than "underscore"), by letter, by modifier key, by phase of the moon ... or by all of the above.
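a sketch of that keystroke mapping, with pitch, volume and length each chosen per character and a little random jitter on the volume. the scale, the volume table and the millisecond values are invented for the example:

```python
import random

SCALE = [0, 2, 4, 7, 9]                       # pentatonic semitone offsets
LOUDNESS = {"!": 1.0, "?": 0.9, ".": 0.6, ",": 0.4}
LENGTH_MS = {".": 40, ",": 60, "_": 200}

def key_sound(ch, rng, shift=False):
    """Map one typed character to (midi_note, volume, length_ms)."""
    degree = ord(ch.lower()) % len(SCALE)     # the letter picks the scale degree
    octave = 1 if (shift or ch.isupper()) else 0  # modifier key: up an octave
    midi = 60 + 12 * octave + SCALE[degree]
    # jitter the volume slightly so repeated keys are not identical
    volume = LOUDNESS.get(ch, 0.5) * rng.uniform(0.9, 1.0)
    length_ms = LENGTH_MS.get(ch, 80)
    return midi, round(volume, 3), length_ms

rng = random.Random(0)
melody = [key_sound(ch, rng) for ch in "hello, world!"]
```

because every pitch comes from one scale, the "melody" stays consonant no matter what gets typed; phase of the moon is left as an exercise.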
that way, the Audio UI becomes more like a game soundtrack: never played the same way twice, always changing based on user interaction -- but that's just the beginning ...
imagine if the system could predict what you'd *expect* to hear in a given situation. let's say you command-F a word on a web page: if found, generate a "there it is" sound based on the search criteria; if not, generate a "hmm, not there" sound ... based on the same criteria. the two sounds would be different but complementary.
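a tiny sketch of "different but complementary": build one set of notes from the search term, then play it rising for a hit and falling for a miss. the scale and the root note are arbitrary choices for the example:

```python
SCALE = [0, 2, 4, 7, 9]  # pentatonic semitone offsets

def search_motif(term, found, root_midi=67):
    """Return MIDI notes for a search result: rising if found, falling if not."""
    # the distinct characters of the term choose which scale degrees are used
    degrees = sorted({ord(c) % len(SCALE) for c in term.lower() if not c.isspace()})
    notes = [root_midi + SCALE[d] for d in degrees]
    return notes if found else notes[::-1]

hit = search_motif("audio", found=True)
miss = search_motif("audio", found=False)
```

both sounds share the same notes, so they are recognizably about the same query; only the direction tells you the answer.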
imagine if the system knew you were displaying a set of responses to a query: it could then play the little "plink plink plink" as the entries appeared, like they always do in the movies. you could then be fairly sure that the next "enter" would actually mean "select this item," and so the system could generate a "selected" sound, instead of just playing the "pushed a button" beep. here's one of my favorite examples of this:

The Governator selects the appropriate response to an inquiry
(note the horrible things YouTube compression does to high frequency UI FX)
and imagine if your Audio UI listened to and sampled its own audio environment, the way mockingbird calls mimic car horns and chainsaws. in fact, as i write this, a mockingbird in my yard is doing his "listen to me! my song is good! my genes are strong!" mating ritual, and wow, talk about audio inventiveness and musical variation! gimme an Audio UI that is one tenth as fun to listen to as that, and i'll be a happy camper ...
- pdx
Categories: Audio
Read more entries by Peter Drescher.

The first thing I do when I install a new application is to switch off ALL user interface sounds. I find them distracting, annoying and often overly loud.
Now, this is partly due to the fact that they're badly produced, implemented or just plain 'wrong', but also because I don't WANT sonic feedback when using computers.
I am a sound designer in the games industry, and the irony that struck me when reading your article was that pretty much the ONLY sound effects we purposely don't put randomisation on are the UI sounds. There are many reasons for this, including the conventions that have been set by existing UI sounds, but also because people require familiarity with UI sounds to associate them with their actions, so we need consistency. Also, they are often tonal, and by adding pitch variation you open up a hornet's nest where things either sound dissonant or you are forced to put things in a scale, further complicating things when playing alongside existing audio.
Having said that, I agree that things could be more context sensitive. A 'select' sound can be the same every time... but wouldn't it be better if it was responsive and representative of what you were selecting, for example.
Unfortunately, sound has been considered an afterthought by most media (film, tv, games), but in things such as user interfaces there just isn't any attention given to it at all, which is a shame as audio can be such a great tool. A missed opportunity, I feel.
I'll still keep my UI sounds switched off for now though :D
mr riley,
i'm glad you like the sidekick audio UI, and of course, others do as well (including janet jackson, as noted in a previous blog) ... but i wonder what percentage of sidekick users have your obvious good taste in audio. certainly, the system sounds can be completely annoying to the guy sitting next to you, and running in stealth mode is thus more polite (and easier to get away with) during meetings.
nevertheless, people expect computers, cell phones, and consoles, to make some sort of noise during use, in part because that's what they do in the movies! these days, a device with NO audio UI would seem quaint, like a phone with no screen.
but getting an interactive audio system (many files, played in various ways, based on user input) to sound like a movie soundtrack (one file, played sequentially, synced to picture) is a challenge my game audio brethren struggle with every day. i just want to use what they've learned to make the UI a little more fun to listen to ...
The Newton sort of did this for tap sounds, which repeated a 17-note pattern.
And I leave the sounds on with my Sidekick; they're great. One thing I especially find it useful for is when deleting a large number of email messages; it's easier for me to count sounds than watch the screen. It doesn't hurt that people around me think I'm playing a video game instead of processing email. :-)