Audio to MIDI Conversion

Music recognition software "listens" to tunes, then transcribes them

Introduction

As noted in parts one and two of our feature on Digital Ear, Intelliscore and Inst2Midi, the technology now exists for your computer to "listen" to an audio file in MP3 or WAV format and figure out the notes and, in some cases, harmonies being played. The resulting data is then converted into MIDI format, yielding a transcription that can then be edited or, using a third-party scoring utility, notated as standard sheet music. In this article, we'll look at several other competitors to these titles, in an effort to find the best programs currently available.
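To make the idea of "figuring out the notes" concrete, here is a minimal Python sketch of one small step that any audio-to-MIDI converter has to perform in some form: mapping a detected frequency to the nearest MIDI note number. It is not taken from any of the programs reviewed here. The function names are hypothetical, and the sketch assumes equal temperament with A4 tuned to 440 Hz.

```python
import math

A4_HZ = 440.0   # assumed reference tuning for A4
A4_MIDI = 69    # MIDI note number of A4

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def frequency_to_midi(freq_hz):
    """Return the MIDI note number nearest to a frequency in Hz."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    # Each semitone is a factor of 2**(1/12), so 12 * log2(ratio)
    # gives the distance from A4 in semitones.
    return round(A4_MIDI + 12 * math.log2(freq_hz / A4_HZ))


def midi_to_name(note):
    """Return a readable name such as 'C4' for a MIDI note number."""
    octave = note // 12 - 1  # convention in which note 60 is C4
    return f"{NOTE_NAMES[note % 12]}{octave}"


if __name__ == "__main__":
    for freq in (261.63, 440.0, 880.0):
        note = frequency_to_midi(freq)
        print(freq, note, midi_to_name(note))
```

Detecting the frequencies in the first place, especially several at once, is the hard part, and it is where the programs below differ most.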

How we tested

We tested each program using the same three source files, chosen to represent types of music suitable for transcription with an audio-to-MIDI program: Forest Fugue by Michael Hurst, First Snows of Winter by Lori Pappajohn and The Drop by Peter Gabriel. The first song tests timing precision and the ability to accurately transcribe counterpoint movement; the second features a slow-moving flute melody over a briskly plucked harp -- a test of the programs' abilities (or lack thereof) to distinguish between multiple voices. The excerpt from The Drop, from the Peter Gabriel album Up, is a good example of chord clusters, rich in harmonics. All three examples are representative of the kinds of audio files most likely to produce an acceptable transcription. Songs with drums or percussion, slurred phrases or excessive amounts of distortion are less likely to be successful.

We used Sonic Foundry's Sound Forge 6.0 to convert the audio tracks to the various formats supported by each program. All titles were tested on Windows XP, except where noted. All three test files are available for download, in the event you wish to conduct your own tests with these titles.

WIDI (Windows)

WIDI 2.7 dramatically improves upon earlier releases; we had previously tested v2.3 with poor results. This time, all three of our test files were converted well, although the output could not be saved, due to restrictions of the trial version. Unfortunately, the program's interface, while relatively simple, still manages to be one of the least intuitive of any tested, with obscure, badly drawn icons (the "Recognition settings" icon, for example, is an S with a checkmark on it!) lining a browser-like window. US$33.

Digital Ear (Windows)

As noted in our in-depth review elsewhere on this site, Digital Ear makes no excuses for lacking polyphonic transcription support. Its primary aim is the expressive capture of live solo instruments, via microphone or pre-recorded audio files. Still, we discovered a few interesting things during this round of tests. Digital Ear 4.02 did not work with a standard 44.1 kHz stereo WAV (uncompressed PCM) file produced by Steinberg's WaveLab 3. The same format, when output from Sound Forge, worked as expected. The real-time edition of the software does, however, have the intriguing potential to allow a human voice to control a MIDI instrument. Standard and real-time editions are $79.95 and $119.95, respectively, from Epinoisis Software.

Inst2MIDI (Windows)

This program is a ZDNet Editor's Pick, for reasons that apparently have little to do with the fact that it produced some of the worst results in our tests. It's a monophonic transcriber with what we'd characterize as an awful interface. Reviewed elsewhere on this site. It's available from http://www.nerds.de/

WAV2MIDI 1.2 (DOS)

Some kind of booby prize must surely be awarded to this free (DOS) command-line program by Guenter Nagler, which produced a translation of our Forest Fugue source WAV file so utterly demented, we felt compelled to call the results "EvilFugue." Although the output bears no resemblance whatsoever to the source file, the results are interesting, in a suspenseful horror movie sort of way. We'll be using this one for a scary soundtrack next Halloween.

AmazingMIDI (Windows)

Polyphonic, PCM WAV sources only. US$29, from Araki Software. Trial version limits translation time to 30 seconds. The graphic representation of the transcoded WAV file looks nice, coloured in different shades of blue, but it is almost totally useless. You can zoom in or out and drag the view around, but it is otherwise uneditable. Transcription quality of our sample from The Drop was about average -- somewhat below the level we think most users would find useful.

Example output: AMA-The Drop.mid

Intelliscore (Windows)

Monophonic or polyphonic recognition, (too) many adjustable parameters. Requires a fair amount of configuration to produce useful results. Reviewed elsewhere on this site. Prices range from US$65 (Standard version, Email version), $75 (Boxed version): Standard Direct $89 (Email version), $99 (Boxed version): Polyphonic Direct.

AutoScore (Windows 9x, Mac 8.x or earlier)

US$99 and up. This program was one of the few titles available for both Windows and Macintosh. Now, the developer's website says "Our commitment to excellence and to our customers requires us to discontinue all Macintosh versions of Autoscore." Funny Macintosh customer support, that. Things aren't much better on the PC side. The company says Autoscore is not compatible with Windows NT, 2000, or XP. Visit www.wildcat.com for details on the company's offerings.

Seventh String Transcribe! (Windows, Mac, Mac OS X)

Another Mac/Windows offering is the shareware title Transcribe!, from Seventh String. This is the only title we've seen supporting Mac OS X. It takes a substantially different approach than the other titles mentioned here. Rather than attempting automatic audio-to-MIDI conversion, it provides a number of tools designed to aid the musician in learning a piece of music. These tools include the ability to slow down audio files (MP3, WAV, and AIFF files are supported), isolate and loop specific song segments and analyze chords, complete with a graphic representation of relative amplitudes of specific frequencies (e.g., which notes are played louder). We were disappointed to discover that the program lacked audio-to-MIDI conversion capabilities (the program's author unabashedly admits that "Transcribe! takes no interest in MIDI files") until we found that the tools provided instead actually do a fine job at helping anyone with a little musical training accomplish a proper transcription themselves.

Akoff Music Composer 2.0 (Windows)

This easy-to-use program provides results somewhat more musically pleasing than those achieved by WIDI 2.7. Its interface is straightforward and the performance speedy. It is polyphonic, but accepts WAV files only. MIDI output, with selectable instrument type, can be previewed from within the program. Pitch bend is supported. Keyboard-style graphical interface for range filtering. Occasionally unstable.

Akoff output example:

  • AKF-Forest_Fugue.mid

AudioToMidi (Windows)

This free program hails from Russia. It's part of TallStick's Sound Project. This program, which supports a surprisingly robust feature list, handles the widest array of input formats of any of the programs tested and, unlike WIDI, does not need to transcode MP3 files to WAV in order to work with them. Results were good, although some manual tweaking of the default settings may be required for complex polyphonic scores. Fortunately, such tweaks are easy and fairly intuitive to accomplish. Keyboard-style graphical interface for range filtering. A monophonic mode (with several alternate settings files) is also provided.

Here are samples of its MIDI output of test files #1 and #2:

  • A2M-Forest_Fugue.mid
  • A2M-First_Snow.mid
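
The range filtering mentioned above can be pictured as a simple post-processing pass over the detected notes. The Python sketch below is only an illustration of the idea, not AudioToMidi's or Akoff's actual method: the function name, the tuple layout (note number, start time, duration) and the default range of MIDI notes 36 to 84 are all assumptions made for the example.

```python
def filter_note_range(notes, lowest=36, highest=84):
    """Keep only notes whose MIDI number lies within [lowest, highest].

    notes is a list of (note_number, start_seconds, duration_seconds)
    tuples; this layout is a hypothetical one chosen for the sketch.
    """
    return [n for n in notes if lowest <= n[0] <= highest]


if __name__ == "__main__":
    detected = [
        (30, 0.0, 1.0),   # low rumble, likely a spurious detection
        (60, 0.5, 1.0),   # middle C
        (90, 1.0, 1.0),   # high overtone mistaken for a note
    ]
    print(filter_note_range(detected))
```

Narrowing the range to the span the instrument can actually play is one way to discard overtones that a transcriber has mistaken for separate notes.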

Conclusion

As is the case in almost any software category, there's always a trade-off between power and ease of use. None of the polyphonic conversion programs we tested featured the ability to convert vibrato into pitch-bend information, or convert timbre into filter cutoff information, the way a tool such as Digital Ear can do. Nevertheless, for quick analysis of piano parts and instrumental solos, these polyphonic transcription programs are more flexible. Even in pieces with monophonic instrumental motifs, our tests of the best polyphonic transcribers yielded results at least as good as titles optimized for monophonic instruments, such as Autoscore or Digital Ear. Remember, however, that even the best of these utilities is imperfect. MIDI transcriptions end up on a single channel and, in the case of WIDI, the trial version can't save at all.
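
For readers curious what converting vibrato into pitch-bend information involves, the following Python sketch shows the arithmetic in its simplest form. It is not Digital Ear's algorithm. It assumes equal temperament with A4 at 440 Hz and a receiving synthesizer set to a bend range of plus or minus two semitones (a common default, but still an assumption), and the function name is hypothetical. MIDI pitch bend is a 14-bit value from 0 to 16383, with 8192 meaning no bend.

```python
import math

PITCH_BEND_CENTER = 8192   # 14-bit value meaning "no bend"
PITCH_BEND_MAX = 16383     # largest 14-bit value


def pitch_bend_value(freq_hz, note, bend_range_semitones=2.0):
    """Return the 14-bit pitch-bend value that moves `note` to `freq_hz`.

    bend_range_semitones is the synth's bend range in each direction;
    the default of 2 is an assumption about the receiving instrument.
    """
    # Nominal frequency of the MIDI note (A4 = note 69 = 440 Hz assumed).
    note_hz = 440.0 * 2 ** ((note - 69) / 12)
    # Deviation of the measured pitch from the note, in semitones.
    semitones = 12 * math.log2(freq_hz / note_hz)
    value = PITCH_BEND_CENTER + round(semitones / bend_range_semitones * 8192)
    # Clamp deviations that exceed the bend range.
    return max(0, min(PITCH_BEND_MAX, value))


if __name__ == "__main__":
    # A vibrato wobbling about a quarter tone either side of A4.
    for freq in (427.5, 440.0, 452.9):
        print(freq, pitch_bend_value(freq, 69))
```

A transcriber working this way would emit one note-on message for the sustained note and a stream of pitch-bend messages tracking the measured frequency, rather than a flurry of separate notes.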

Akoff Composer 2.0, AudioToMidi and WIDI 2.7 produced the best automatic results, in roughly that order. Akoff Composer is a good choice for those seeking immediate gratification with a simple solution. It works well with microphone input and produced highly listenable results from our test files. We are a little worried about the occasional hangs we encountered while using the program, though. WIDI usually produced good results with the default settings, but it suffers from a lackluster interface you may find clumsy once you start tweaking recognition settings to eliminate unwanted key ranges or discordant overtones. Considering the free nature of AudioToMidi and the fact that it is stable, relatively easy to tweak, yet surprisingly full-featured, AudioToMidi is our Editor's Choice, while Transcribe! earns an honorable mention for its common-sense approach to the tasks at hand.
