Search code examples
sqlspeech-recognitionergonomicsnaturallyspeakingcode-by-voice

Can I write SQL using speech recognition?


I have wrist pain when I type and I would like to start writing SQL statements, stored procedure, and views using speech recognition.


Solution

  • Yes. SQL is well-suited to speech recognition (as well-suited as a programming language can be, that is), given it's limited vocabulary and sentence-like structure. Aside from formatting the SQL so that it looks nice, I can dictate it much faster than typing. Dictating code isn't for everyone, however. It can be quite frustrating in the beginning. The people who try this and stick with it will probably be those who have no other choice.

    I use Dragon NaturallySpeaking 10 Professional. The Professional version has the tools that are needed to create a custom vocabulary like this. Version 9 should work fine, also. It's expensive, so try to get the company you work for to pay for it if possible. Get a decent headset microphone also. The one that comes with NaturallySpeaking isn't good enough (but you may want to try it first to see if it works for you). KnowBrainer is a good place for microphone recommendations.

    2009-01-05 Update: I have added tips below specific to dictating in SQL Server Management Studio.

    2012-01-04 Update: I have been keeping track of Microsoft's WSR for quite a while now, hoping tools would be added to easily create a completely custom vocabulary from scratch like I am doing in this tutorial with NaturallySpeaking. Unfortunately, it appears that this can only be done through the API (SAPI). I don't have the time to write that code, so I will continue to use NaturallySpeaking to write code until something better comes along.

    Preparation

    Clean up your database names and code

    Dictating "SELECT PT_17, PT_28, PT_29 FROM HIK.dbo.PATINFO" would be a pain in the butt, but I guess it would be possible. You would have to set a lot of pronunciations, since NaturallySpeaking would have no idea how "PT_17" would sound. This would be preferable for dictation:

    SELECT Patient.FirstName, Patient.MiddleName, Patient.LastName FROM Claim.dbo.Patient AS Patient WHERE Patient.LastName LIKE '%smith%'
    

    I switched to my TSQL vocabulary to dictate the above statement. Everything up to the LIKE statement is spoken just as it appears. '%smith%' was dictated as "open-single-quote percent-sign sierra mike india tango hotel percent-sign close-single-quote [PAUSE] compound-that". Using consistent table aliases and always preceding fields with them helps improve accuracy, since NaturallySpeaking keeps statistics of how often one word appears near another.

    Create a word list of SQL keywords

    Put one word on each line. You can optionally follow a word with a backslash (\) and a pronunciation. NaturallySpeaking uses a small backup dictionary of words to determine the pronunciation of words you add to a vocabulary, so it has no problem figuring out how SELECT, FROM, and WHERE are pronounced. It can sometimes figure out a compound word, and it makes its best guess for something like XACT_ABORT. I would provide pronunciations for cases like these. The database you use will determine what words the list contains - check your documentation for a list of keywords. Your list will look something like this, but be much longer.

    SELECT
    WHERE
    FROM
    XACT_ABORT\exact-abort
    MAXDOP
    NOLOCK\no-lock
    LEN
    RETURNS
    CURSOR
    MONEY    
    

    Also add these words

    \New-Line
    \New-Paragraph
    \All-Caps
    \All-Caps-On
    \All-Caps-Off
    \Cap
    \Caps-On
    \Caps-Off
    \No-Caps
    \No-Caps-On
    \No-Caps-Off
    \No-Space
    \No-Space-On
    \No-Space-Off
    \space-bar
    \tab-key
    a\alpha
    b\bravo
    c\charlie
    d\delta
    e\echo
    f\foxtrot
    g\golf
    h\hotel
    i\india
    j\juliet
    k\kilo
    l\lima
    m\mike
    n\november
    o\oscar
    p\papa
    q\quebec
    r\romeo
    s\sierra
    t\tango
    u\uniform
    v\victor
    w\whiskey
    x\xray
    y\yankee
    z\zulu
    PM
    AM
    one
    two
    three
    four
    five
    six
    seven
    eight
    nine
    ten
    eleven
    twelve
    thirteen
    fourteen
    fifteen
    sixteen
    seventeen
    eighteen
    nineteen
    twenty
    thirty
    fourty
    fifty
    sixty
    seventy
    eighty
    ninety
    hundred
    thousand
    million
    billion
    trillion
    

    Keep this list around, since you'll probably modify it several times and re-create your vocabulary to get it the way you like it.

    Create a word list of your database object names

    This is how I do it in SQL Server:

    SELECT DISTINCT * FROM 
    (
    SELECT DISTINCT [name] FROM Database1.[dbo].[sysobjects] WHERE xtype not IN ('F', 'S', 'PK', 'D', 'UQ') 
    UNION 
    SELECT DISTINCT column_name AS [name] FROM Database1.information_schema.[columns]
    UNION
    SELECT DISTINCT [name] FROM Database2.[dbo].[sysobjects] WHERE xtype not IN ('F', 'S', 'PK', 'D', 'UQ') 
    UNION 
    SELECT DISTINCT column_name AS [name] FROM Database2.information_schema.[columns]
    ...
    ) AS UnionTable
    

    Copy and paste the results into a text file.

    Create pronunciations for your database object names

    Use the same format for pronunciations as listed above. An easy way to create these is to use a regex search and replace function. In SQL Server Management Studio or Visual Studio the following (non-standard) regex will create pronunciations for two word mixed case names.

    Find: ^{[A-Z][a-z]+}{[A-Z][a-z]+}$
    Replace: \0\\\1-\2
    

    Review the pronunciations and clean up anything that doesn't look right. For acronyms, ASP becomes `A.S.P.'. Keep this list around, as well. If you decide to make vocabularies for other programming languages, you will probably include these words if you're a database developer.

    Create a text document that contains all of your SQL code (views, procedures, etc.)

    SQL Server:

    SELECT * FROM Database1.dbo.[View] UNION SELECT * FROM Database1.dbo.Routine UNION
    SELECT * FROM Database2.dbo.[View] UNION SELECT * FROM Database2.dbo.Routine 
    ...
    ORDER BY [Name]
    

    Remove comments and literal strings. Regex search and replace works well for this.

    Build your vocabulary

    Install NaturallySpeaking and create a new user if you have not already.

    Create a new vocabulary

    Click on "NaturallySpeaking | Manage Vocabularies...". Click New. Name the vocabulary something appropriate, such as "SQL". Base it on "Base General - Empty Dictation". When it asks you if you want to scan your email or documents, click cancel.

    Import words

    Click "Words | Import". Add the two word lists you created and import them.

    Adapt to writing style

    Click "Tools | Accuracy Center". Click "Add words from your documents to the vocabulary". Use the default settings, and select the document you created which contains your code.

    Try dictating some SQL

    The first thing you'll probably want to dictate is a select statement. Keep in mind that SELECT is what you use to begin a command in NaturallySpeaking that selects text. Because of this, you'll want to say "Cap" before dictating it so NaturallySpeaking doesn't get confused. That's it. Well, at least enough to get you started. Modify your word lists, pronunciations, and word properties as needed. There are other things you can do to increase accuracy and the speed at which you can dictate. As I think of them, I will edit this post and add them here.

    Tips for dictating into SQL Server Management Studio

    If you dictate into SQL Server Management Studio, you may notice very slow performance. Try the following to alleviate this:

    • Turn off all toolbars (create macros to access commonly used functionality)
    • Keep as few panes and documents open as possible
    • Keep only one database open at a time
    • Hide search results after you're done with them (Ctrl+R)
    • If all else fails, close and reopen management studio
    • Display the tab stops in the edit window to make it easier to format your SQL.

    Query Analyzer from SQL Server 2000 does not have these issues.