I have a collection of data that looks as follows:
id   name      c1     c2     c3      c4     ...   c50
------------------------------------------------------
1    string1   0.1    0.32   0.54    -1.2   ...   2.3
2    string2   0.12   0.12   -0.34   2.45   ...   1.3
...
(millions of records)
So I have an id column, a string column, and then 50 floating-point columns.
Only one type of query will ever be run against this data; in a traditional SQL SELECT statement it would look like this:
SELECT name FROM table WHERE ((a1-c1)+(a2-c2)+(a3-c3)+...+(a50-c50)) > 1;
where a1, a2, a3, etc. are values that are generated before the query is sent (they are not stored in the data table).
My question is this: does anyone have any recommendations as to what type of database would handle this type of query the fastest? I have used SQL Server (which is majorly slow), so I am looking for other opinions.
Would there be a way to optimize SQL Server for this type of query? I have also been curious about column-store databases such as MonetDB, or perhaps a document-store database such as MongoDB. Does anyone have any suggestions?
Many thanks, Brett
You can continue using SQL Server: add a persisted computed column that calculates the sum of all the value columns, and index it.
ALTER TABLE tablename ADD SumOfAllColumns AS (c1 + c2 + ... + c50) PERSISTED
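An index on that column then supports a range seek on the precomputed sum. As a sketch (the index name here is just illustrative), including name makes it a covering index, so the query can be answered from the index alone:

CREATE INDEX IX_tablename_SumOfAllColumns
    ON tablename (SumOfAllColumns)
    INCLUDE (name);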
Then, since (a1-c1)+(a2-c2)+...+(a50-c50) > 1 is equivalent to (a1+a2+...+a50) - (c1+c2+...+c50) > 1, you can rearrange your query as:
SELECT name FROM tablename WHERE SumOfAllColumns < a1+a2+a3+...+a50 - 1
This query will be able to use the index on the computed column and should find the relevant rows quickly.
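As a minimal sketch (the parameter name and value are assumed for illustration), you can sum the a-values on the application side and pass them in as a single parameter, so the predicate becomes a plain range comparison against the indexed column:

DECLARE @SumOfA float = 12.34;  -- a1 + a2 + ... + a50, computed before the query is sent

SELECT name
FROM tablename
WHERE SumOfAllColumns < @SumOfA - 1;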