I have an ASP.NET MVC application with a controller action that takes a string as input and responds with a WAV file of the synthesized speech. Here is a simplified example:
public async Task<ActionResult> Speak(string text)
{
    // Run the blocking synthesis call on a thread-pool thread.
    Task<FileContentResult> task = Task.Run(() =>
    {
        using (var synth = new System.Speech.Synthesis.SpeechSynthesizer())
        using (var stream = new MemoryStream())
        {
            synth.SetOutputToWaveStream(stream);
            synth.Speak(text);
            // ToArray() copies only the bytes written; GetBuffer() would
            // also return the unused tail of the internal buffer.
            var bytes = stream.ToArray();
            return File(bytes, "audio/x-wav");
        }
    });
    return await task;
}
The application (and this action method in particular) runs fine on Windows Server 2008 R2 servers, Windows Server 2012 (non-R2) servers, and my Windows 8.1 dev PC. It also runs fine on a standard Azure 2012 R2 virtual machine. However, when I deploy it to three 2012 R2 servers (its eventual permanent home), the action method never produces an HTTP response: the IIS worker process maxes out one of the CPU cores indefinitely. There is nothing in the Event Viewer, and nothing jumps out at me when watching the server with Procmon. I've attached to the process with remote debugging, and the synth.Speak(text) call never returns. As soon as the synth.Speak(text) call is executed, I see the runaway w3wp.exe process in the server's Task Manager.
My first inclination was to believe some process was interfering with speech synthesis in general on the servers, but the Windows Narrator works correctly, and a simple console app like this also works correctly:
static void Main(string[] args)
{
    var synth = new System.Speech.Synthesis.SpeechSynthesizer();
    synth.Speak("hello");
}
So I can't blame speech synthesis on the server in general. Is there a problem in my code, or something unusual in the IIS configuration? How can I make this controller action work correctly on these servers?
This is a simple way to test the action method (you just have to get the url value right for your routing):
<div>
    <input type="text" id="txt" autofocus />
    <button type="button" id="btn">Speak</button>
</div>
<script>
    document.getElementById('btn').addEventListener('click', function () {
        var text = document.getElementById('txt').value;
        var url = window.location.href + '/speak?text=' + encodeURIComponent(text);
        var audio = document.createElement('audio');
        var canPlayWavFileInAudioElement = audio.canPlayType('audio/wav');
        var bgSound = document.createElement('bgsound');
        bgSound.src = url;
        var canPlayBgSoundElement = bgSound.getAttribute('src');
        if (canPlayWavFileInAudioElement) {
            // probably Firefox and Chrome
            audio.setAttribute('src', url);
            audio.setAttribute('autoplay', '');
            document.getElementsByTagName('body')[0].appendChild(audio);
        } else if (canPlayBgSoundElement) {
            // internet explorer
            document.getElementsByTagName('body')[0].appendChild(bgSound);
        } else {
            alert('This browser probably can\'t play a wav file');
        }
    });
</script>
I found that I can reproduce the issue on other servers, including Azure VMs, so I ruled out the possibility of an issue with our particular environment.
I also found that the code works fine on 2012 R2 if the application pool runs under an identity that is an admin on the server and has previously logged into the server. After a very long process of ruling out permissions issues, I concluded that something in the logon process enables the TTS API calls to work correctly. (Whatever it is, I wasn't able to find it by digging through Procmon traces.) Fortunately, ApplicationPoolIdentity can get the same logon behavior: open "Advanced Settings" for the app pool in IIS and set Load User Profile to True.
The identity that runs the app pool also needs permission to read HKU\.Default\Software\Microsoft\Speech. This can be granted to ApplicationPoolIdentity by using the local server for the location and IIS APPPOOL\.Net v4.5 for the username (where .Net v4.5 is the name of the application pool).
Once read permission to the reg key is granted, and the app pool is configured to load user profile, the above code works fine. Tested on Azure VMs and vanilla 2012 R2 from MSDN ISOs.
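For reference, the Load User Profile setting from the IIS Manager dialog is stored as the loadUserProfile attribute of the pool's processModel element in %windir%\System32\inetsrv\config\applicationHost.config. The fragment below is a minimal sketch of what that entry might look like for a pool named .Net v4.5 (the example name used above). The managedRuntimeVersion and identityType values are assumptions for illustration, and your file will contain other pools and attributes.

```xml
<!-- Sketch of the relevant part of applicationHost.config.
     The pool name and the other attribute values are assumptions;
     only loadUserProfile="true" is the setting discussed here. -->
<system.applicationHost>
  <applicationPools>
    <add name=".Net v4.5" managedRuntimeVersion="v4.0">
      <processModel identityType="ApplicationPoolIdentity"
                    loadUserProfile="true" />
    </add>
  </applicationPools>
</system.applicationHost>
```

Checking this entry is a quick way to confirm that the setting was actually saved for the pool your site uses. The registry permission is a separate step and is not stored in this file.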