
GeoMesa: Using statistics in Redis


I am trying to use GeoMesa with Redis. I thought that the Redis data store enables statistics in GeoMesa by default.

My Redis GeoMesa schema:

./geomesa-redis describe-schema -u localhost:6379 -c geomesa -f SignalBuilder                                                                                   
INFO  Describing attributes of feature 'SignalBuilder'
geo           | Point   (Spatio-temporally indexed) (Spatially indexed)
time          | Date    (Spatio-temporally indexed) (Attribute indexed)
cam           | String  (Attribute indexed) (Attribute indexed)
imei          | String  
dir           | Double  
alt           | Double  
vlc           | Double  
sl            | Integer 
ds            | Integer 
dir_y         | Double  
poi_azimuth_x | Double  
poi_azimuth_y | Double  

User data:
  geomesa.attr.splits     | 0
  geomesa.feature.expiry  | time(30 days)
  geomesa.id.splits       | 0
  geomesa.index.dtg       | time
  geomesa.indices         | z3:7:3:geo:time,z2:5:3:geo,attr:8:3:time,attr:8:3:cam,attr:8:3:cam:time
  geomesa.stats.enable    | true
  geomesa.table.partition | time
  geomesa.z.splits        | 0
  geomesa.z3.interval     | week

From the docs (https://www.geomesa.org/documentation/stable/user/datastores/query_planning.html#stats-collected):

Stat generation can be enabled or disabled through the simple feature type user data using the key geomesa.stats.enable

Cached statistics, and thus cost-based query planning, are currently only implemented for the Accumulo and Redis data stores.

The following stats are collected:
  • Total count
  • Min/max (bounds) for the default geometry, default date and any indexed attributes
  • Histograms for the default geometry, default date and any indexed attributes
  • Frequencies for any indexed attributes
  • ...
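For reference, the geomesa.stats.enable key from the excerpt lives in the user-data part of a feature type spec, after the ';'. The sketch below uses an abbreviated, hypothetical spec (only a few of SignalBuilder's attributes, with cam marked for attribute indexing); the create-schema line is shown as a comment only and is assumed to match your setup.

```shell
#!/bin/sh
# Abbreviated, hypothetical SignalBuilder spec (not the full schema above).
# Attributes come before the ';', user data (key='value') after it;
# geomesa.stats.enable='true' turns on cached stat generation.
SPEC="*geo:Point:srid=4326,time:Date,cam:String:index=true,imei:String,dir:Double;geomesa.stats.enable='true'"
echo "$SPEC"

# Creating a schema from it (not executed here; connection values assumed):
#   geomesa-redis create-schema -u localhost:6379 -c geomesa -f SignalBuilder -s "$SPEC"
```

Setting the value to 'false' instead disables stat generation for that feature type.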

Why does the response time increase as the amount of data increases?

./geomesa-redis export -u localhost:6379 -c geomesa -f SignalBuilder -q "cam like '%' and bbox(geo,38,56,39,57)" --hints STATS_STRING='Enumeration(cam)'
INFO  Running export - please wait...
id,stats:String,*geom:Geometry
stat,"{""5798a065-d51e-47a1-b04b-ab48df9f1324"":203215}",POINT (0 0)
INFO  Feature export complete to standard out for 1 features in 2056ms

The same request, after more data was ingested:

./geomesa-redis export -u localhost:6379 -c geomesa -f SignalBuilder -q "cam like '%' and bbox(geo,38,56,39,57)" --hints STATS_STRING='Enumeration(cam)'

INFO  Running export - please wait...
id,stats:String,*geom:Geometry
stat,"{""5798a065-d51e-47a1-b04b-ab48df9f1324"":595984}",POINT (0 0)
INFO  Feature export complete to standard out for 1 features in 3418ms
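Putting the two runs side by side: the matching count grows from 203215 to 595984 (about 2.9×) while the export time grows from 2056 ms to 3418 ms (about 1.7×). As a rough two-point illustration only, assuming the time is linear in the number of matching features n, fitting t = a + b·n through both runs gives:

```latex
b = \frac{(3418 - 2056)\ \text{ms}}{595984 - 203215}
  = \frac{1362\ \text{ms}}{392769}
  \approx 3.5\ \mu\text{s per feature},
\qquad
a = 2056\ \text{ms} - b \cdot 203215 \approx 1350\ \text{ms}
```

The per-feature term is the part that grows with the data, which fits a stat computed by scanning every matching feature rather than read from a cached value.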

How can I tell whether statistics are being collected and stored, and whether they are used when I request stats through query hints such as STATS_STRING='MinMax(time)' or STATS_STRING='Enumeration(cam)'?

Also, how do I use sampling with GeoTools? I tried the following:

geomesa-cassandra export -P 10.200.217.24:9042 -u cassandra -p cassandra \
-k geomesa -c gsm_events -f SignalBuilder \
-q "cam like '%' and time DURING 2021-12-27T16:50:38.004Z/2022-01-26T16:50:38.004Z" \
--hints SAMPLE_BY='cam';SAMPLING=0.000564

But it does not work. Thanks for any help.
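One likely reason the sampling command above fails is shell quoting: an unquoted ';' ends the command, so the tool only receives SAMPLE_BY=cam, and SAMPLING=0.000564 is then run as a separate shell variable assignment. The sketch below shows this with a hypothetical stand-in function (show_args) that just prints the arguments it receives; it assumes, from the CLI's usage text, that --hints takes key=value pairs separated by ';' in a single argument.

```shell
#!/bin/sh
# Hypothetical stand-in for geomesa-cassandra: print each argument it receives.
show_args() {
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

# Unquoted: ';' terminates the command, so only "SAMPLE_BY=cam" is passed;
# "SAMPLING=0.000564" then just sets a shell variable named SAMPLING.
echo "unquoted:"
show_args --hints SAMPLE_BY='cam';SAMPLING=0.000564

# Quoted: the whole hint string reaches the tool as one argument.
echo "quoted:"
show_args --hints "SAMPLE_BY=cam;SAMPLING=0.000564"
```

With real tooling, the corresponding change would be to pass `--hints "SAMPLE_BY=cam;SAMPLING=0.000564"`; whether the Cassandra store then honors both hints is a separate question this sketch does not answer.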


Solution

  • When you run an export with a stats query hint, GeoMesa always runs a query and computes the stat over the matching features, which is why the time grows with the amount of data. If you want to use the cached statistics, use the stats-* commands instead. In code, use the stats method that all GeoMesa data stores implement.
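A minimal sketch of pointing the stats-* commands at the same schema, assuming the connection values from the question and that geomesa-redis provides stats-count, stats-bounds and stats-top-k with the -a, -q and --no-cache options described in the GeoMesa CLI docs. The run, stats and inspect_stats helpers are hypothetical; by default the commands are only printed, and setting RUN=1 executes them.

```shell
#!/bin/sh
# Assumed setup: geomesa-redis on PATH; override these via the environment.
GEOMESA=${GEOMESA:-geomesa-redis}
URL=${URL:-localhost:6379}
CATALOG=${CATALOG:-geomesa}
FEATURE=${FEATURE:-SignalBuilder}

# Print the command, or execute it when RUN=1.
run() {
  if [ "${RUN:-0}" = 1 ]; then
    "$@"
  else
    echo "$*"
  fi
}

# Invoke one stats-* subcommand with the shared connection parameters.
stats() {
  sub=$1
  shift
  run "$GEOMESA" "$sub" -u "$URL" -c "$CATALOG" -f "$FEATURE" "$@"
}

inspect_stats() {
  # Count from the cached stats (may be an estimate when a filter is given).
  stats stats-count -q "bbox(geo,38,56,39,57)"
  # Min/max of the default date attribute, from the cached bounds.
  stats stats-bounds -a time
  # Most frequent values of the indexed cam attribute, from the cached stats.
  stats stats-top-k -a cam
  # The same count, computed by scanning the data instead of using the cache.
  stats stats-count -q "bbox(geo,38,56,39,57)" --no-cache
}

inspect_stats
```

If the cached count comes back quickly regardless of data size while the --no-cache count slows down as data grows, the stats are being collected and used. If the cached values look missing or stale, the stats-analyze command should recompute them (check that your GeoMesa version ships it for Redis).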