What is the most performance efficient way to load a single (or a few) wide rows from Cassandra to C#? My wide rows have 10.000-100.000 columns. The primary keys consists of several values but the column key is a single string and the column value is a single counter (see the schema below).
Using "tracing on" in the cqlsh I can see that Cassandra can select a wide row with 17.000 columns in 44 m, but loading this data all the way into C# using the Datastax driver takes 700 ms. Is there a faster way? I need to load the full wide row in 50-100ms. (Is there a more native way? A way minimizing the network traffic? A faster driver? Another configuration of the driver? Or something else?)
I actually do not need all 17.000 columns. I just need the columns where ‘support’ >= 2 or the top 1000 columns sorted descending by ‘support’. But since ‘support’ is my column value I don't know of any way to query like this in CQL.
This is my table:
CREATE TABLE real_time.grouped_feature_support (
algorithm_id int,
group_by_feature_id int,
select_feature_id int,
group_by_feature_value text,
select_feature_value text,
support counter,
PRIMARY KEY ((algorithm_id, group_by_feature_id, select_feature_id, group_by_feature_value), select_feature_value)
This is my way to access the data using the Datastax driver:
var table = session.GetTable<GroupedFeatureSupportDataEntry>();
var query = table.Where(x => x.CustomerAlgorithmId == customerAlgorithmId
&& x.GroupByFeatureId == groupedFeatureId
&& myGroupedFeatureValues.Contains(x.GroupByFeatureValue)
&& x.GroupByFeatureValue == groupedFeatureValue
&& x.SelectFeatureId == selectFeatureId)
.Select(x => new
{
x.GroupByFeatureValue,
x.SelectFeatureValue,
x.Support,
})
.Take(1000000);
var result = query.Execute();
If you are looking for the best performance when retrieving a large result set you should not use a mapping component like Linq-to-cql or any other.
You can retrieve the rows using the technique documented on the driver readme, in your case it would be something like:
var query = "SELECT * from grouped_feature_support WHERE" +
" algorithm_id = ? AND group_by_feature_id = ? " +
" AND select_feature_id = ? AND group_by_feature_value = ?";
//Prepare the query once in your application lifetime
var ps = session.Prepare(query);
//Reuse the prepared statement by binding different parameters to it
var rs = session.Execute(ps.Bind(parameters));
foreach (var row in rs)
{
//The enumerator will yield all the rows from Cassandra
//Retrieving them in the back in blocks of 5000 (determined by the pagesize).
}
//You can also use a IEnumerable<T> Linq Extensions to filter
var filteredRows = rs.Where(r => r.GetValue<long>("support") > 2);