Search code examples
sqlrrodbc

Getting precise percentages from a SQL Query in R using the RODBC Connection


I am trying to create a function in R using the RODBC package to loop through five tables in a SQL database to find the count of rows for each year for each variable, the % null for each year for each var, and the % missing for each year for each var. I am having trouble creating a single function to give me this accurate output. I created a function that outputs the total count and percent null, but cannot seem to get it to produce an accurate percent missing directly - it seems to be rounding to whole integers, and not consistently rounding up or down. Below is my code. Any help with this would be much appreciated.

PctNull <- sqlQuery(channel, "select 
                [EVENT_YEAR] AS 'YEAR', 
                COUNT(*) AS 'TOTAL',
                (((COUNT(CASE WHEN MOTHER_EDUCATION_TRENDABLE = -1 THEN 1 END))*100)/COUNT(*)) AS 'PctMiss',
                (((COUNT(*) - COUNT(MOTHER_EDUCATION_TRENDABLE))*100)/COUNT(*)) AS 'PctNull'


                from [GA_CMH].[dbo].[BIRTHS]

                GROUP BY [EVENT_YEAR]
                ORDER BY [EVENT_YEAR]")

Here is my output and desired format, however I would like to improve my PctMiss accuracy:


Solution

  • This is a known SQL Server situation where integer columns if used in expressions must be converted to decimals which you can do so implicitly by using at least one decimal value in your expression or explicitly by using CAST or CONVERT.

    For implicit conversion, multiply your COUNT() value by 100.00 (with 2 decimal values) or entire value with 1.00:

    PctNull <- sqlQuery(channel, 
                        "SELECT [EVENT_YEAR] AS 'YEAR', 
                                COUNT(*) AS 'TOTAL',
                                (((COUNT(CASE WHEN MOTHER_EDUCATION_TRENDABLE = -1 
                                              THEN 1 
                                         END)) * 100.00) / COUNT(*)) AS 'PctMiss',
                                (((COUNT(*) - COUNT(MOTHER_EDUCATION_TRENDABLE)) * 100.00) /  
                                   COUNT(*)) AS 'PctNull'
                         FROM [GA_CMH].[dbo].[BIRTHS]
                         GROUP BY [EVENT_YEAR]
                         ORDER BY [EVENT_YEAR]")
    

    For explicit conversion, specifically declare the type and precision using CAST:

    (((CAST(COUNT(CASE WHEN MOTHER_EDUCATION_TRENDABLE = -1 
                       THEN 1 
                  END)) * 100) AS DECIMAL(10,2)) / COUNT(*))
    

    Or CONVERT:

    (((CONVERT(DECIMAL(10,2), COUNT(CASE WHEN MOTHER_EDUCATION_TRENDABLE = -1 
                                         THEN 1 
                                    END)) * 100)) / COUNT(*))