The Digital Ear

Audio-to-MIDI Conversion Software - The Digital Ear

Overview

Recently, I became frustrated with my Creative AWE64 card and MIDI scoring software. I was disappointed by the huge difference between human and (untweaked) computer performance, especially for breath and bowed instruments. So I looked into ways to spice things up. I experimented with various controllers, including breath and pitch wheel. I also delved into software techniques involving Cakewalk CAL scripts, CSound programming, and Superconductor. However, I found all of these approaches quite limiting: they are either too difficult and time-consuming to use effectively, or they do not deliver the quality I am after.

Eventually, I became aware that the subtle nuance I wanted when realizing my compositions could be attained (easily) only by recording live human performance, and that it could be captured to MIDI only through sophisticated (and thus expensive) MIDI-enabled instruments. But this was at direct odds with my whole reason for choosing to compose and realize music with a personal computer, a cheap sound card, and scoring software: I wanted an inexpensive yet powerful system for making and distributing music. And I wanted one that did not require massive digital storage, organizing a group of musicians, maintaining a practice space, or any of the other real-world inconveniences that such an endeavor typically imposes.

Then I did a bit more thinking, and I had a wonderful insight: vocal performance is the least expensive, most natural, and most direct way of expressing desired performance parameters. After that, I started to track down software that could translate audio, such as a voice recording, to MIDI performance data. This article describes the three best audio-to-MIDI offerings I have found to date.

The Digital Ear:  "Best Results; A Little Slow But Easiest To Use."

Of the three, I like Digital Ear the best. This program comes closest to capturing the full expression found in singing, wind, string, and other continuously variable pitch instruments. The product takes an original approach to conversion: it captures the initial note of each unbroken phrase and uses MIDI pitch wheel data to capture the rest of the pitch, pitch slur, and vibrato elements for that phrase. Marvelously, it also captures the continuous volume of the performance on the MIDI Sound Volume control. It also has the option of capturing the brightness of the performance on any control # of your choosing, defaulting to #74 (MIDI Sound Brightness). Unfortunately, Digital Ear does not currently do polyphonic capture. But for the moment this does not really matter to me: I am most concerned with adding life to the AWE64's blown and bowed instruments, which does not really require polyphonic capture.
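To make the "initial note plus pitch wheel" idea concrete, here is a minimal Python sketch of how one unbroken phrase could be encoded that way. It is an illustration only, not Digital Ear's actual algorithm: the function names, the list-of-Hz input format, and the assumed pitch-bend range of +/- 2 semitones are all hypothetical choices made for the example.

```python
import math

BEND_RANGE_SEMITONES = 2.0   # assumption: synth bend range is +/- 2 semitones
BEND_CENTER = 8192           # 14-bit pitch-bend value meaning "no bend"

def hz_to_midi(freq_hz):
    """Convert a frequency in Hz to a fractional MIDI note number (A4 = 440 Hz = 69)."""
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)

def phrase_to_events(freq_track_hz):
    """Encode one unbroken phrase (a list of pitch estimates in Hz) as a single
    note number plus a list of 14-bit pitch-bend values, one per pitch estimate."""
    pitches = [hz_to_midi(f) for f in freq_track_hz]
    note = int(round(pitches[0]))      # anchor note: nearest semitone to the first estimate
    bends = []
    for p in pitches:
        offset = p - note              # deviation from the anchor note, in semitones
        value = BEND_CENTER + round(offset / BEND_RANGE_SEMITONES * 8192)
        bends.append(max(0, min(16383, value)))   # clamp to the legal 14-bit range
    return note, bends

# A phrase that starts on A4 and slurs up one semitone.
note, bends = phrase_to_events([440.0, 440.0 * 2 ** (1 / 12)])
print(note, bends)
```

With these assumptions, slurs and vibrato stay inside one sustained note instead of being chopped into separate note events. The sketch also shows the main limitation of the approach: any pitch excursion wider than the synth's bend range is clamped.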

Conclusion: The Digital Ear software, in my opinion, produces fantastic results that are well worth the $80 price tag. The only things I would like to see improved or added are:

  1. The ability to record volume data to an arbitrary control #.  A major reason for this is that the AWE64 produces glitches when responding to volume data but not when responding to equivalent expression controller data. Don't ask me why.
  2. Improved pitch tracking, especially for fine-grained vibrato work.  Perhaps the power tools that are made available by registering the software do this.  Alas, I have not yet registered the software.
  3. The ability to change the beats per minute of the captured work during preview, before exporting to MIDI.
  4. The ability to do real-time capture and act as a MIDI input source for recording software.  It is probably asking too much to do this with no feedback delay, as would be required to use the tool in live performance, but even a 40-80 ms delay would be very tolerable for studio recording and for live tuning of the various capture parameters for optimum performance.

Unfortunately, the demo version of the Digital Ear only does about 2 seconds of capture on each .wav file you submit.  The people at Digital Ear must know they have a good product.  However, the included demo MIDIs are really cool, and I have proven to myself from several two-second conversions that the product really does work as advertised.
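Until the first wish-list item (volume on an arbitrary control #) exists, the same effect can be approximated by post-processing the exported MIDI data. The following Python sketch rewrites volume (controller 7) messages as expression (controller 11) messages. The `(tick, status, data1, data2)` tuple format and the function name are hypothetical, chosen only to keep the example self-contained; a real script would read and write the events with whatever MIDI file tool is at hand.

```python
VOLUME_CC = 7        # MIDI volume controller number
EXPRESSION_CC = 11   # MIDI expression controller number

def remap_controller(events, src=VOLUME_CC, dst=EXPRESSION_CC):
    """Return a copy of `events` with controller `src` rewritten as `dst`.
    Each event is a hypothetical (tick, status, data1, data2) tuple."""
    out = []
    for tick, status, data1, data2 in events:
        # Status bytes 0xB0-0xBF are control change messages on channels 1-16.
        is_control_change = (status & 0xF0) == 0xB0
        if is_control_change and data1 == src:
            data1 = dst
        out.append((tick, status, data1, data2))
    return out

captured = [
    (0, 0x90, 60, 100),   # note on, left untouched
    (10, 0xB0, 7, 90),    # volume on channel 1, becomes expression
    (30, 0xB0, 74, 50),   # brightness, left untouched
]
print(remap_controller(captured))
```

Only control change messages are touched, so a note-on for note number 7 is not mistaken for a volume message.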


Intelliscore: "Second Best Results; Hardest And Slowest To Use"

Intelliscore (at http://www.intelliscore.net/) is the most sophisticated of the three in that it can do polyphony.  This sounded promising at first, but I found that it was quite difficult to get good results.  There are quite a few parameters that one needs to specify, and it seems that one must set them all "just right" for the software to do a good job.  The fact that these settings must be changed, based on the instrument being captured and the material being performed, does not help much either.  In fact, I played with the settings provided for converting each demo sample, and even slight modifications resulted in completely unusable output.  Add the fact that the software only understands fixed-pitch instruments, and this is a product that I find too limited and difficult for regular use.  Also, the software takes a long time to do the conversion (even on a 500 MHz machine).  So in the trial-and-error process of finding settings that work for a particular sample, you can end up spending quite a bit of time with this program before you start to get even slightly usable results.

Conclusion: Despite its sophisticated ability to do polyphonic audio-to-MIDI conversion, this software is not what I am after at the moment.  If I want to capture polyphonic percussion-class instrument performance, I get better results from a good MIDI keyboard.  And if I want to capture continuously variable pitch instruments, I am S.O.L.  Plus, I lose any dynamics that are not related to initial velocity.  And timbre is just plain ignored.  Of course, the product is OK for someone who has no ear but wants to capture recorded piano work to MIDI.  And it might be good for a more advanced musician who wants to capture recorded keyboard work that is too difficult to transcribe by other methods.  Anyway, the idea is great and I look forward to more work and refinement in this area from the people at Intelliscore.


Inst2Midi: "Worst Results; Best Integration With Other MIDI Products, Fastest Results"

In my opinion, Inst2Midi currently has the most potential of the three.  Disappointingly, though, it currently yields the worst conversion results.  I feel Inst2Midi has the most potential because it does real-time conversion and acts as a live MIDI input source for recording software.  Add to that the fact that the "Nerds" have ambitions to provide real-time polyphonic capture of live performance to MIDI, and it could spell "Killer App."  Unfortunately, I have had little success getting good results.  Apparently the software can do an "acceptable" job if set up properly; at least, this is what the provided demo audio and corresponding MIDI files suggest.  The biggest problem I have with this product is that it uses the standard keyboard-biased approach of extracting only the start time, quantized pitch, velocity, and duration information that any cheap MIDI keyboard can produce.  This produces very irritating results because slurs and vibrato tend to turn into messy-sounding gliss, trill, or grace-note ornamentation, or worse... much worse.  Add to this the poor choice of hysteresis settings (note-locking stability) that the program exhibits, and you are very likely to get noise rather than music in all but the most subdued and controlled performances.
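As a rough illustration of what note-locking hysteresis means, here is a minimal Python sketch. It is not taken from Inst2Midi; the function name, the fractional-MIDI-pitch input, and the 0.7 semitone margin are assumptions made for the example. The idea is to hold the current note until the detected pitch strays more than a margin away from it, rather than re-rounding every pitch estimate to the nearest semitone.

```python
def lock_notes(pitch_track, margin=0.7):
    """Quantize fractional MIDI pitches to note numbers with hysteresis.
    The current note is held until the pitch moves more than `margin`
    semitones away from it (the margin value is an illustrative assumption)."""
    locked = []
    current = None
    for p in pitch_track:
        if current is None or abs(p - current) > margin:
            current = int(round(p))    # re-lock onto the nearest semitone
        locked.append(current)
    return locked

# Vibrato wobbling across the boundary between notes 60 and 61.
vibrato = [60.4, 60.6, 60.4, 60.6]
print([int(round(p)) for p in vibrato])   # plain rounding flips between 60 and 61
print(lock_notes(vibrato))                # hysteresis holds a single note
```

Too small a margin lets vibrato turn into trills, which matches the behavior described above, while too large a margin makes genuine semitone steps get missed. That trade-off is why the setting matters so much.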

Conclusion: Once again: why not just perform the same stuff from a MIDI-enabled keyboard?  To me, the whole idea of converting audio to MIDI is to capture as much as possible of the great expressive quality that only continuously variable pitch, volume, and timbre instruments offer.  I can see this software as an interesting tool for guitarists or single-voice instrumentalists who want to capture simple solo-melody performances or control synthesizers in a live setting, but they had better be pretty careful: it does not take too much exuberance in one's playing to start getting lots of wrong MIDI notes.

Conclusion: The results of Digital Ear are really quite impressive.  A little more work and the results could be absolutely stunning.  The other products offer tantalizing features, but they have a long time to spend at the drawing board before they will find a space on my "regularly used" software list.

In Pursuit of Realistic Timbres

On a related line of thought, it is amazing what one can actually do with an AWE64 card.  Of course, this requires one to dig down into the technical specs and to spend considerable time and effort adding pitch, expression controller, velocity, and other MIDI data.  Also, avoiding the AWE64's internal reverb and chorus effects and then post-processing with a good software or hardware reverb can really help the final realization.

I did dig into the specs, spent the time, and produced 40 bars of highly tweaked MIDI data, and I was quite pleased with the initial result.  But I was also quite disappointed to discover that the AWE64 has no implementation of brightness response to volume, expression, or aftertouch data.  This renders performance on a touch-sensitive keyboard all but useless without some complicated scheme to convert these control streams to some combination of NRPN control data.

Now I have to figure out some scripts to do all the stuff I did by hand.  Of course, it should not be too difficult, really.  Most of the tweaks were performed using the following simple rules:

  1. Most sung and blown instrument performances have a short pitch rise at the beginning of a breath and a short pitch fall at the end of a breath.
  2. It sounds quite expressive and natural to occasionally portamento-slur blown or bowed notes, especially when the intervals are smaller than an octave and the phrase is not too regular or staccato.
  3. The harder an instrument is initially blown, bowed, or plucked, the more likely the instrument is to sound sharp during its attack phase on that note.  Typically, the note then falls back to its proper pitch when sustained.
  4. Most longer blown, sung, or bowed notes are sounded initially louder than the overall perceived volume of playing and then fall back to a softer sustain, ending with an increased vibrato and volume toward the introduction of the next note in the phrase.  Since the overall perception of relative loudness is lowered by the intermediate softer sustain, such notes must be played with a MIDI velocity setting higher than if no such processing were added.
  5. Regular, repeating rhythmic phrases are typically performed louder on each first beat of the meter and/or possibly on others, depending on feel.
  6. Slow phrases (especially string or horn ensemble) sound best when there is well placed swell and ebb (typically at the beginning and end of phrases, respectively), using expression controller data.
  7. Fast phrases (especially string or horn ensemble) sound best when there is well placed swell and ebb using velocity data.
  8. Staccato string passages and percussion instruments sound best when their dynamics are modulated by velocity information rather than expression controller data because AWE64 GM MIDI synth voices link timbre to velocity rather than volume, expression controller or aftertouch.
  9. AWE64 single horn parts sound best when you stick to fanfare-type phrases, as longer notes do not respond too well to the volume and expression controllers.  (There is no expression controller->filter programming capability in the 8000 chip, the drivers, or whatever, so the brightness does not increase with volume as it does with real horns and more expensive synthesizers.)  I will do some more experiments with this to see what can be done to get natural-sounding legato horn passages, but for now I am simply choosing to avoid such arrangements.  Switching to horn ensemble can help, but is rather limiting if you want to do a legato horn solo.
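As a starting point for scripting such rules, here is a minimal Python sketch of rules 3 and 5. It is only an illustration under stated assumptions: the function names, the `(start_beat, pitch, velocity)` note tuples, the 30 cent maximum sharpness, the +/- 200 cent bend range, and the velocity boost of 12 are all hypothetical values, not figures from the hand-tweaked data.

```python
def attack_bend_curve(velocity, steps=8, max_cents=30.0, bend_range_cents=200.0):
    """Rule 3 sketch: the harder a note is struck, the sharper it starts,
    then it settles back to pitch. Returns `steps` (>= 2) 14-bit pitch-bend
    values that decay linearly to the center value 8192."""
    start_cents = max_cents * velocity / 127.0
    curve = []
    for i in range(steps):
        cents = start_cents * (1.0 - i / (steps - 1))   # linear decay to 0 cents
        curve.append(8192 + round(cents / bend_range_cents * 8192))
    return curve

def accent_downbeats(notes, beats_per_bar=4, boost=12):
    """Rule 5 sketch: raise the velocity of notes starting on the first beat
    of a bar. Each note is a hypothetical (start_beat, pitch, velocity) tuple."""
    out = []
    for start_beat, pitch, velocity in notes:
        if start_beat % beats_per_bar == 0:
            velocity = min(127, velocity + boost)       # keep within MIDI's 0-127 range
        out.append((start_beat, pitch, velocity))
    return out

print(attack_bend_curve(127, steps=3))
print(accent_downbeats([(0, 60, 100), (1, 62, 100), (4, 64, 120)]))
```

The bend values would be emitted as pitch wheel events spread over the first few ticks of each note. The other rules could follow the same pattern, with each rule as a small function that takes note data and returns adjusted notes or controller curves.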
