Don’t Speak Twice, It’s All Right

July 17th, 2006

Andy Lee sent me a bunch of excellent feedback about FlexTime, and let me know about a strange, 100% reproducible crashing bug. If you configure FlexTime such that both the ending cue of one activity and the starting cue of the one that follows are “Speak Text” cues, then the application crashes.

First thought: damn, I’m glad I put a beta out. Second thought: good lord, what have I done!?

Unfortunately, the bug is not in my code. I was able to reproduce the problem quite easily with the simplest of command line tools:

#import <Carbon/Carbon.h>

int main(void)
{
    Str255 string1 = "\pHello";
    Str255 string2 = "\pHello Again";

    SpeakString(string1);
    SpeakString(string2);   /* string1 is still being spoken when this runs */
    return 0;
}
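
A side note on the `\p` escape that opens each string in the tool above: it marks a Pascal string literal, a compiler extension (GCC wants `-fpascal-strings`) that stores the length in the first byte instead of ending the text with a NUL. `Str255` is a 256-byte array laid out that way, which is what SpeakString expects. Here is a minimal sketch of building the same buffer by hand, in case the extension isn't available. The `make_pstring` name is made up for illustration and is not part of Carbon.

```c
#include <stddef.h>
#include <string.h>

/* Fill a 256-byte, Str255-style buffer from a C string: byte 0 holds the
 * length and bytes 1..length hold the characters (no terminating NUL).
 * Returns 0 on success, or -1 if src is NULL or longer than 255 bytes.
 * make_pstring is a hypothetical helper, not an Apple API. */
int make_pstring(unsigned char dst[256], const char *src)
{
    size_t len;

    if (src == NULL)
        return -1;

    len = strlen(src);
    if (len > 255)
        return -1;              /* one length byte cannot count past 255 */

    dst[0] = (unsigned char)len;
    memcpy(dst + 1, src, len);
    return 0;
}
```

A buffer filled this way should be interchangeable with a `"\pHello"` literal when passed to SpeakString, assuming the usual `unsigned char[256]` definition of `Str255`.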

You may not have realized that it was quite so simple to accomplish spoken text on a Mac. Unfortunately, the simplicity is deceptive, since compiling and running the above tool results in a nasty crash:

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x00000000
0x0020da94 in MTBEAudioUnitSoundOutput::BufferComplete ()
(gdb) bt
#0  0x0020da94 in MTBEAudioUnitSoundOutput::BufferComplete ()
#1  0x7006e9bc in AUScheduledSoundPlayerEntry ()
#2  0x700097c8 in DefaultOutputAUEntry ()
#3  0x700049d0 in DefaultOutputAUEntry ()
#4  0x700da1a8 in dyld_stub__keymgr_get_and_lock_processwide_ptr ()
#5  0x90bd9d24 in CallComponent ()
#6  0x942647a0 in AudioUnitUninitialize ()
#7  0x94178368 in AudioUnitNodeInfo::Uninitialize ()
#8  0x94178300 in AudioUnitGraph::Uninitialize ()
#9  0x941786a4 in AudioUnitGraph::Dispose ()
#10 0x941785f0 in DisposeAUGraph ()
#11 0x0020e0a8 in MTBEAudioUnitSoundOutput::~MTBEAudioUnitSoundOutput ()
#12 0x0020a78c in SpeechChannelManager::~SpeechChannelManager ()
#13 0x002215f4 in SECloseSpeechChannel ()
#14 0x91997974 in KillSpeechChannel ()
#15 0x91995088 in KillPrivateChannels ()
#16 0x919969d8 in SpeakString ()
#17 0x00002cec in main ()

SpeakString has been around for a long time: long before Mac OS X, and long before CoreAudio, where the crash appears to be happening. I would guess it didn’t use to crash, but when it was ported to Mac OS X, something got overlooked and now it leads to whammy land.

OK, so how do I work around the problem? It is clearly related to attempting to speak text while some text is already speaking. Maybe if I could coddle the Speech Manager a little bit, I could prevent it from crashing.

From the Speech Synthesis Manager Reference documentation for SpeakString, we see that the behavior for overlapping speech is (supposed to be) very well defined:

“If SpeakString is called while a prior string is still being spoken, the sound currently being synthesized is interrupted immediately. Conversion of the new text into speech is then begun. If you pass a zero-length string (or, in C, a null pointer) to SpeakString, the Speech Synthesis Manager stops any speech previously being synthesized by SpeakString without generating additional speech. If your application uses SpeakString, it is often a good idea to stop any speech in progress whenever your application receives a suspend event. (Calling SpeakString with a zero-length string has no effect on speech channels other than the one managed internally by the Speech Synthesis Manager for the SpeakString function.)”

Translation: what we’re doing is supposed to work. But maybe by overdoing it we can achieve the desired goal. If Mac OS X falls down on the “interrupting immediately” behavior, perhaps we can manually stop any previous sound to help it keep its bearings. According to the documentation, calling “SpeakString(NULL)” should effectively cancel playback. Unfortunately, injecting it into my simple crash case changes nothing. Worse, when I add it to my live application, I observe a new failure path. The text “pure virtual method called” is printed to the console, with the following backtrace:

(gdb) x/s $r4
0x52c1ad8 <_zn24mtmbraisedsinecrossfader7scoeffse +5916>:	 "pure virtual method called\n"
(gdb) bt
#0  0x90014ac0 in write ()
#1  0x052b2814 in MTFEClone::VisitCommand ()
#2  0x05287604 in MTBEWorker::WorkLoop ()
#3  0x052866f4 in MTBEWorkerStartMPTask ()
#4  0x90bc1900 in PrivateMPEntryPoint ()
#5  0x9002bc28 in _pthread_body ()

Well, this can is getting wormier and wormier. It is starting to look like I won’t be able to take advantage of the ease and simplicity of SpeakString. Ten years ago, sure. But in 2006 SpeakString es muy sucky. It’s probably time to start looking at the more advanced speech API, where I’m responsible for managing my own speech channels. With responsibility also comes (we hope) the ability to save ourselves from certain doom.

But let’s say I just need to stick with SpeakString, because I have a demo in 5 minutes, or users are just screaming bloody murder about this bug. There is a crude workaround that takes all the asynchronous fun out of speech, but also prevents the crash. By explicitly waiting for the Speech Manager to be done with any previous speech, I can prevent it from maiming itself:

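#import <Carbon/Carbon.h>

int main(void)
{
    Str255 string1 = "\pHello";
    Str255 string2 = "\pHello Again";

    /* Spin until the Speech Manager is idle before and after each request. */
    while (SpeechBusy()) { ; }
    SpeakString(string1);
    while (SpeechBusy()) { ; }
    SpeakString(string2);
    while (SpeechBusy()) { ; }
    return 0;
}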

This also “works” in FlexTime, for some definition of “working.” But it can cause hideous stalls in the playback UI, since I’m blocking there for an indeterminate length of time. Passable in a beta release, but not acceptable for a finished product.

Sigh. I’m going to have to do real work. But you don’t have to. RSSafeSpeaker is a simple singleton class designed to make worry-free overlapping speech easy for the Cocoa programmer. Instead of trying to manage a number of open speech channels, this class takes the approach that it’s “good enough” to just allocate and deallocate a channel for every speech made. Obviously for some purposes this will not be suitable, and you’ll want to manage a pool of open channels. For the “everyday, get this done easily” use though, I hope you’ll find this class handy. Rewriting our previous example using RSSafeSpeaker:

NSString* string1 = @"Hello";
NSString* string2 = @"Hello Again";

[[RSSafeSpeaker sharedInstance] speakString:string1];
[[RSSafeSpeaker sharedInstance] speakString:string2];

No crashes! And I get to use NSString. Everything is better. This is a good example of a situation where the shortcomings of Apple’s API caused me grief and made me go to a lot of extra work. But it’s also a situation where the extra work won’t be for naught. It’s a good idea for me to use the “deeper” speech APIs, because it’s inevitable that I’ll want to have finer control over the playback effects in my application. It was just a lot easier to choose “SpeakString” as the quickest solution. If anything else persnickety comes up, I’ll be in an excellent position to respond quickly and effectively. All in all, time well spent!

Oh, and in case anybody was worried, I did report the crashing bug to Apple (rdar://problem/4633582).

Update: Oh man, don’t I feel like a dork! I somehow missed the presence of NSSpeechSynthesizer altogether. Thanks to Jim Correia for pointing it out to me via email. It does seem to work, and doesn’t crash. Of course, now that I’ve got my own infrastructure in place, I might as well keep using it, since it will ultimately give me more control over the playback options. But NSSpeechSynthesizer does seem a better choice for most purposes.

It looks like each NSSpeechSynthesizer corresponds with a “speech channel,” so if you actually want to overlap voices (instead of just causing the previous speech to be canceled), you’d need to allocate multiple speech synths (similarly to how my RSSafeSpeaker allocates a speech channel for each request).

Thanks again to Jim for sharing this! I am embarrassed to have overlooked it…

5 Responses to “Don’t Speak Twice, It’s All Right”

  1. Eric Albert Says:

    You might want to file a separate bug for the pure virtual method problem, since the fix for that is probably different from the fix for the crash.

  2. rentzsch Says:

    This is reproducible in all Cocoa apps that have an NSTextField and don’t go out of their way to disable the “Start Speaking” menu items (both the menu-bar one and the contextual one). Just start speaking some text and, while it’s speaking, do it again. I run into this on a regular basis since I often have my Mac read back my blog postings to catch errors.

    Bonus cool: Crash Reporter itself has a selectable NSTextField, so it’s simple to crash Crash Reporter using this method. Sadly crashreporterd stops Crash Reporter from reporting Crash Reporter crashes :-)

  3. Daniel Jalkut Says:

    Wow … that’s pretty amazing. In that case I’m especially surprised this bug has survived so long …

  4. Casey Fleser Says:

    Beware: NSSpeechSynthesizer is quite leaky (radar://4387934). So plan on allocating one and keeping it around for the life of the app.

  5. Andy Lee Says:

    Nice sleuthing! And I love the reference to one of my favorite songs.

    I somehow missed the presence of NSSpeechSynthesizer

    See, if you were completely speech-ignorant like me you could have searched for “speech” in AppKiDo, and NSSpeechSynthesizer would have been second on the list, after NSSpeechRecognizer. :)
