Search code examples
c++sqlsql-server

Creating statistical data from a table


I have a table with 20 columns of measurements. I would like 'convert' the table into a table with 20 rows with columns of Avg, Min, Max, StdDev, Count types of information. There is another question like this but it was for the 'R' language. Other question here.

I could do the following for each column (processing the results with C++):

    Select Count(Case When [avgZ_l1] <= 0.15 and avgZ_l1 > 0 then 1 end) as countValue1, 
Count(case when [avgZ_l1] <= 0.16 and avgZ_l1 > 0.15 then 1 end) as countValue2,
Count(case when [avgZ_l1] <= 0.18 and avgZ_l1 > 0.16 then 1 end) as countValue3, 
Count(case when [avgZ_l1] <= 0.28 and avgZ_l1 > 0.18 then 1 end) as countValue4,
Avg(avgwall_l1) as avg1, Min(avgwall_l1) as min1, Max(avgZ_l1) as max1, 
STDEV(avgZ_l1) as stddev1, count(*) as totalCount  from myProject.dbo.table1

But I do not want to process the 50,000 records 20 times (once for each column). I thought there would be away to 'pivot' the table onto its side and process the data at the same time. I have seen examples of the 'Pivot' but they all seem to pivot on a integer type field, Month number or Device Id. Once the table is converted I could then fetch each row with C++. Maybe this is really just 'Insert into ... select ... from' statements.
Would the fastest (execution time) approach be to simply create a really long select statement that returns all the information I want for all the columns? We might end up with 500,000 rows. I am using C++ and SQL 2014.

Any thoughts or comments are welcome. I just don't want have my naive code to be used as a shining example of how NOT to do something... ;)...


Solution

  • If your table looks the same as the code that you sent in r then the following query should work for you. It selects the data that you requested and pivots it at the same time.

    create table #temp(ID int identity(1,1),columnName nvarchar(50));
    insert into #temp 
    SELECT COLUMN_NAME as columnName
    FROM myProject.INFORMATION_SCHEMA.COLUMNS               -- change myProject to the name of your database. Unless myProject is your database
    WHERE TABLE_NAME = N'table1';                           --change table1 to your table that your looking at. Unless table1 is your table
    declare @TableName nvarchar(50) = 'table1';             --change table1 to your table again 
    
    declare @loop int = 1;
    declare @query nvarchar(max) = '';
    declare @columnName nvarchar(50);
    declare @endQuery nvarchar(max)='';
    while (@loop <= (select count(*) from #temp))
        begin
            set @columnName = (select columnName from #temp where ID = @loop);
            set @query = 'select t.columnName, avg(['+@columnName+']) as Avg ,min(['+@columnName+']) as min ,max(['+@columnName+'])as max ,stdev(['+@columnName+']) as STDEV,count(*) as totalCount from '+@tablename+' join   #temp t on t.columnName = '''+@columnName+''' group by t.columnName';
            set @loop += 1;
            set @endQuery += 'union all('+ @query + ')';
        end;
    set @endQuery = stuff(@endQuery,1,9,'')
    Execute(@endQuery);
    drop table #temp;
    

    It creates a #temp table which stores the values of your column headings next to an ID. It then uses the ID when looping though the number of columns that you have. It then generates a query which selects what you want and then unions it together. This query will work on any number of columns meaning that if you add or remove more columns it should give the correct result.

    With this input:

     age   height_seca1 height_chad1 height_DL weight_alog1
    1   19         1800         1797       180           70
    2   19         1682         1670       167           69
    3   21         1765         1765       178           80
    4   21         1829         1833       181           74
    5   21         1706         1705       170          103
    6   18         1607         1606       160           76
    7   19         1578         1576       156           50
    8   19         1577         1575       156           61
    9   21         1666         1665       166           52
    10  17         1710         1716       172           65
    11  28         1616         1619       161           66
    12  22         1648         1644       165           58
    13  19         1569         1570       155           55
    14  19         1779         1777       177           55
    15  18         1773         1772       179           70
    16  18         1816         1809       181           81
    17  19         1766         1765       178           77
    18  19         1745         1741       174           76
    19  18         1716         1714       170           71
    20  21         1785         1783       179           64
    21  19         1850         1854       185           71
    22  31         1875         1880       188           95
    23  26         1877         1877       186          106
    24  19         1836         1837       185          100
    25  18         1825         1823       182           85
    26  19         1755         1754       174           79
    27  26         1658         1658       165           69
    28  20         1816         1818       183           84
    29  18         1755         1755       175           67
    

    It will produce this output:

                        avg     min     max     stdev   totalcount
    age                 20      17      31      3.3     29 
    height_seca1        1737    1569    1877    91.9    29 
    height_chad1        1736    1570    1880    92.7    29 
    height_DL           173     155     188     9.7     29 
    weight_alog1        73      50      106     14.5    29 
    

    Hope this helps and works for you. :)