Speech Recognition for PCs

Speech! Speech!
Voice recognition has the potential to radically change the way we use computers. After all, who among us can type 140 words per minute? These programs can. Current speech recognition programs have the capability to recognize continuous speech -- that is, speech that is -- not -- like -- this.

Here are some of the latest tools that allow us to dictate letters or edit and format documents just by talking into a microphone.

New offerings from the Major Players
Microsoft has so far taken only tentative steps toward fully integrating voice dictation features into its word processor. Demonstrations of its "Next Generation Windows Services" during the June 2000 rollout of its "dot-NET" initiative suggest that natural language input capabilities won't be fully integrated into Windows until the release of Office 10 and the version of the OS code-named "Blackcomb," currently expected in 2002.

The company has been working with Belgian firm Lernout & Hauspie (www.lhs.com), the parent company of speech recognition software leader Dragon Systems, on voice technology but, according to a recent PC Week article, says speech technology is not yet reliable enough to include in its Office products.

Lotus and Corel, the other major players in the office suite market, apparently feel otherwise. Lotus SmartSuite Millennium Edition uses IBM's ViaVoice and Corel WordPerfect Suite 8 uses the market-leading NaturallySpeaking from Dragon Systems Inc. of Newton, Mass. to add continuous-speech voice dictation to their word processors.

In the interim, Microsoft's website provides a downloadable 60-day trial version of Kurzweil VoiceCommands, an add-on for Office 97 that allows you to perform tasks in Microsoft Word by issuing spoken commands. With Kurzweil VoiceCommands, you can automate routine tasks, make features of Word more accessible, and eliminate time-consuming sequences of actions.

For example, instead of selecting text and then clicking Convert Text to Table on the Table menu, you can say, "Convert the next three paragraphs to a table" to accomplish this task. And since you can frequently complete several steps by issuing one command, you don't need to know all the menu and dialog box names in Word.

At the end of the 60-day trial period, the software stops working. For information about the full retail version, call Lernout & Hauspie Speech Products at (800) 380-1234. Update: Hopefully, L&H staff will be around to take the calls. Lernout & Hauspie filed for bankruptcy protection in Nov. 2000, amid charges of "misuse" of as much as $30 million by an executive in the company's Korean office, and a slumping stock that has seen the company's share price plummet from nearly $70 in March to about $3.50 by Nov.

Indeed, L&H surprised many on March 28, 2000, when it signed a definitive agreement to acquire Dragon Systems Inc. of Newton, Mass. At the time, the two companies said they planned to use existing resources to create a "brain trust" that would draw on the best of the technologies developed in both companies to work on projects designed to accelerate the use of speech and language in emerging and mainstream consumer markets.

Following this announcement, the last edition of the NaturallySpeaking voice recognition software developed solely by Dragon Systems hit the shelves in Aug. 2000. Dubbed NaturallySpeaking Version 5.0, the product focuses on a streamlined interface with a simplified toolbar and support for Intel's Pentium 4 microprocessor. PC World has additional details.

Dragon Systems offers its NaturallySpeaking product in several different versions: A starter edition dubbed "Essentials" handles Web browsing, e-mail, and chat, and is priced at US$59. On the high end is the US$249 Preferred USB edition, which is aimed at notebook users and includes a USB sound card. NaturallySpeaking Preferred, which adds the ability to dictate directly into Microsoft Word and certain other word processors, sells for C$169. Other releases, with specialized dictionaries for technical and legal trades, are also available.

In fact, I dictated the first draft of this entire review into Dragon NaturallySpeaking by simply speaking into a headset microphone. I did not type a single word. I was able to specify punctuation, spell out specialized words or acronyms like "SCSI" and even save the file entirely by speaking into the microphone. The program has the capability to make menu selections such as "click file save" via voice commands as well as performing dictation tasks. You can also give commands to move the cursor around in your document. For example, you can say "move forward two words" and the insertion point will (hopefully) move to the position you specify.
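To make the idea of a spoken navigation command concrete, here is a minimal sketch of how a phrase like "move forward two words" might be turned into a new insertion point once the recognizer has produced text. It is an illustration only, not how NaturallySpeaking works internally: the function names, the command pattern, and the small number vocabulary are all assumptions made up for this example.

```python
import re

# Hypothetical vocabulary: spoken number words mapped to integers.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
                "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}

# Hypothetical command grammar: "move forward|back <number> word(s)".
COMMAND = re.compile(r"^move (forward|back) (\w+) words?$")


def word_starts(text):
    """Return the character offset at which each word in text begins."""
    return [m.start() for m in re.finditer(r"\S+", text)]


def move_cursor(text, cursor, utterance):
    """Return the new cursor offset after a spoken 'move ...' command.

    Utterances that do not match the grammar leave the cursor unchanged.
    """
    match = COMMAND.match(utterance.lower().strip())
    if not match:
        return cursor
    direction, count_word = match.groups()
    count = NUMBER_WORDS.get(count_word)
    if count is None:
        return cursor

    starts = word_starts(text)
    if direction == "forward":
        targets = [s for s in starts if s > cursor]
        if not targets:
            return len(text)  # already in the last word: go to the end
        return targets[min(count, len(targets)) - 1]

    targets = [s for s in starts if s < cursor]
    if not targets:
        return 0  # already at the first word: stay at the start
    return targets[-min(count, len(targets))]


sentence = "I did not type a single word"
position = move_cursor(sentence, 0, "move forward two words")
print(sentence[position:])
```

Anything the sketch does not recognize, such as "click file save," simply leaves the cursor where it was; a real product would hand such phrases to a separate menu-command handler or treat them as dictation.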

Similarly, if a word is recognized incorrectly (and despite the many merits of this program, it does make occasional errors), you can simply say the word you wish to correct, followed by "correct that," and a dialog box of the closest matches will open. Then you can simply say the number of the line the word is on, say "click okay," and you are back in business.

NaturallySpeaking can also format text with commands for centering, adding bullets, making text bold or italicized, etc.

Dragon says its BestMatch technology, a new feature found in release 3.0 and later, improves NaturallySpeaking's already impressive recognition by 25 percent.

Meanwhile, Lernout & Hauspie released Voice Xpress, which uses natural language to provide advanced formatting and user interface navigation capabilities, as well as dictation roughly comparable to that of Dragon's offerings. Unlike NaturallySpeaking Personal Edition, Voice Xpress supports multiple users -- a boon for families. Like ViaVoice, it can speak text back, and users can dictate directly into Word 7 or Word 97.

In our tests, Voice Xpress seemed to offer the best mixture of features, although its recognition accuracy was not quite up to par with that of NaturallySpeaking, even after extensive amounts of training. Also, it was quite slow to load and seemed a little more sluggish at recognition than its competitors. L&H says Voice Xpress requires a 200 MHz PC with MMX, 130 MB of disk space and 32MB RAM (48 on NT) to work.

L&H says the next version of Voice Xpress will incorporate features from the technologies it acquired from Dragon Systems. Again, we have to assume that the company's current financial woes and Chapter 11 status may affect these plans.

IBM's ViaVoice Gold is generally considered to offer a little less accuracy than the Dragon system (we'd rate it as approximately the same as Voice Xpress in recognition accuracy), but it has, as we noted in our previous review, some features not present in Dragon's product. The Gold version (dubbed "Pro" in the Release 7 "Millennium Edition") adds UI navigation features and improved correction capabilities sadly lacking in the original release. IBM's ViaVoice Millennium Edition further improves the program's recognition capabilities (primarily by increasing the program's active vocabulary to a whopping 2 million words, up from 64,000 words in 1998's ViaVoice Gold, or a paltry 22,000 in the original release) and adds correction and navigation features comparable to those of NaturallySpeaking and Voice Xpress. ViaVoice Millennium Edition requires a minimum Pentium (or AMD) 233 with 48MB RAM (Win95/98; 64MB RAM for Windows NT) and 310MB of disk space.

Following this version is ViaVoice 8.0, which shipped at the end of August 2000. As usual, it promises improved accuracy and usability. Details at http://www-4.ibm.com/software/speech

Not to be outdone by voice-capable word-processor offerings from IBM and Microsoft, Corel released a version of its flagship word processor called Corel WordPerfect Suite 8 with Dragon NaturallySpeaking, which, as the name implies, has built-in dictation capabilities courtesy of technology licensed from Dragon. When we tested it, we were impressed with the ability not only to dictate text into the word processor, but also to format the text, change fonts and perform corrections. Corel's offering is based on Dragon's standalone NaturallySpeaking product.

Corel bet heavily on Dragon's technology. Indeed, Dr. Michael Cowpland, when he was the president and chief executive officer at Corel Corporation, equipped the company's entire world-wide sales force with multimedia-enabled Versa portables from NEC to facilitate effective demos of the program for its customers. We're pleased that Corel chose to integrate Dragon's market-leading speech technology -- it's nice to see Microsoft have to play catch-up once in a while.

Dragon and L&H aren't the only ones making deals, either. IBM has licensed its ViaVoice to a number of vendors, including Olympus, which ships a custom version with its D1000 digital minirecorder, and Creative Labs, which bundles ViaVoice "Special Edition" with its Sound Blaster Live Platinum sound card. IBM also licenses ViaVoice technology for use in various telephony apps and embedded computers. In late June 2000, it plans to begin selling ViaVoice Dictation for Linux, which will sell for about US$60 retail.

And if C$69 or so for Point and Speak, the least expensive package from Dragon Systems, is too much to spend on what some people will probably conclude is folly, there's always FreeSpeech from Philips Electronics. This program can be downloaded for a free seven-day trial, and was recently upgraded to support "voice surfing" of the Internet. If you like it (we didn't), the price to keep using it is only US$39. FreeSpeech requires Microsoft Windows 95, 98, or NT; a 166-MHz Pentium MMX (200-MHz recommended); 32MB RAM with Windows 95 or 98 (64MB recommended), 48MB with Windows NT (96MB recommended); and a SoundBlaster-compatible sound card and microphone. Get it from www.speech.be.philips.com.

Speech Recognition on the Mac
In Dec. 1999, Apple users finally got a speech recognition solution of their own from IBM, with ViaVoice for Macintosh. This product was subsequently updated in mid-2000 for better performance and accuracy. According to some reports, the Mac version of ViaVoice now outsells the PC edition.

Another Mac-based speech recognition product now shipping is iListen, from MacSpeech. It is based on the Philips FreeSpeech engine.

Interestingly, Dragon had previously committed to releasing an "industry leading" speech recognition system for the Mac. However, no product ever shipped, although Dragon was showing its NaturallySpeaking product for Mac at Comdex Fall '99. (The company blamed "unexpected difficulties" for the change in plans. No doubt!)

A historical footnote: Apple first developed a speech command recognition system for its Quadra series of computers years ago, but apparently failed to patent it. Another company patented a similar technology shortly afterwards and Apple is said to have been locked in a legal miasma ever since. The latest Mac OS 9 software from Apple includes basic voice command capabilities, but lacks the full text-input capabilities of the IBM product.

Now here's a shame. General Magic's "Portico" service is a speaker-independent speech recognition system that's not going to be available in Canada anytime soon. Launched in the US in July 1998, Portico knocked our socks off during a live demo. It is essentially a "virtual assistant," designed for today’s mobile professional workforce. It enables users to access, retrieve and redistribute information across computer and telephone networks using any telephone and a normal speaking voice through a technology the company calls "magicTalk." Portico integrates voice mail, email, address books, and calendars as well as public information from the Internet, major wire services and other sources. Check out the press releases and amazing demos at www.generalmagic.com.

- Graeme Bennett

For Further Reading:

  • [Nov. 29, 2000] MacCentral: L&H seeks bankruptcy protection.
  • More info on voice command software is featured in our article on Talking to your Computer.
  • The University of Southern California has reportedly created the world's first machine system that can recognize spoken words "better than humans can."
  • [June 21, 2000] CNet: IBM overhauls voice-recognition strategy
  • [Jan. 24, 2002] New York Times: The Last Word in Dictation. Period.
