To create indexes, GeoMesa creates multiple tables in HBase. I have a question about how they are kept consistent:
I am planning to use GeoMesa with HBase (backed by S3) to store my geospatial data; the data size can grow from terabytes to petabytes.
How reliable is GeoMesa at keeping the primary table and the index tables in sync?
HBase Tables:
catalog1
catalog1_node_id_v4 (Main Table)
catalog1_node_z2_geom_v5 (Index Table)
catalog1_node_z3_geom_lastUpdateTime_v6 (Index Table)
catalog1_node_attr_identifier_geom_lastUpdateTime_v8 (Index Table)
GeoMesa Schema:
geomesa-hbase describe-schema -c catalog1 -f node
INFO Describing attributes of feature 'node'
key | String
namespace | String
identifier | String (Attribute indexed)
versionId | String
nodeId | String
latitude | Integer
longitude | Integer
lastUpdateTime | Date (Spatio-temporally indexed)
tags | Map
geom | Point (Spatio-temporally indexed) (Spatially indexed)
User data:
geomesa.index.dtg | lastUpdateTime
geomesa.indices | z3:6:3:geom:lastUpdateTime,z2:5:3:geom,id:4:3:,attr:8:3:identifier:geom:lastUpdateTime
GeoMesa does not do anything to sync indices - generally this should be taken care of in your ingest pipeline.
If you have a reliable feature ID tied to a given input feature, then you can write that feature multiple times without causing duplicates. During ingest, if a batch of features fails due to a transient issue, then you can just re-write them to ensure that the indices are correct.
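As a rough illustration of why a stable feature ID makes re-writes safe, the sketch below derives an ID deterministically from fields of the input record and writes to an in-memory stand-in for a keyed table. Everything in it is hypothetical: `feature_id`, the choice of `namespace` and `identifier` as the stable key, and the `dict` used as a table are assumptions for illustration, not the GeoMesa API.

```python
import uuid


def feature_id(record: dict) -> str:
    """Derive a deterministic ID from fields that identify the input.

    Assumption: 'namespace' and 'identifier' together form a stable key.
    """
    key = f"{record['namespace']}/{record['identifier']}"
    return str(uuid.uuid5(uuid.NAMESPACE_URL, key))


def write(table: dict, record: dict) -> None:
    """Stand-in for a keyed write: the same ID overwrites instead of appending."""
    table[feature_id(record)] = record


table = {}
record = {"namespace": "osm", "identifier": "node-42", "latitude": 10, "longitude": 20}
write(table, record)
write(table, record)  # simulated re-try of the same input
print(len(table))  # prints 1
```

If the ID were generated randomly on each write (for example with `uuid.uuid4()`), the second write would add a second row, which is the duplicate case the answer warns about.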
For HBase, when you call flush or close on a feature writer, the pending mutations will be sent to the cluster. Once that method returns successfully, the data has been persisted to HBase. If an exception is thrown, you should re-try the failed features. If there are subsequent HBase failures, you may need to recover write-ahead logs (WALs) as per standard HBase operation.
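The re-try step could look something like the sketch below. It is not GeoMesa code: `write_batch` is a hypothetical callable assumed to write every feature in the batch and then flush, and `TransientWriteError` is a made-up exception standing in for whatever your writer raises on a failed flush. Re-writing the whole batch is only safe if the feature IDs are stable, as described above.

```python
import time


class TransientWriteError(Exception):
    """Hypothetical marker for a failure that is worth re-trying."""


def write_batch_with_retry(write_batch, batch, max_attempts=3,
                           base_delay=0.5, sleep=time.sleep):
    """Call write_batch(batch) until it succeeds or attempts run out.

    Returns the number of attempts used. Re-raises the last error if
    every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            write_batch(batch)
            return attempt
        except TransientWriteError:
            if attempt == max_attempts:
                raise  # give up and surface the failure to the caller
            sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff


# Usage with a fake writer that fails twice before succeeding.
calls = []


def flaky_write(batch):
    calls.append(list(batch))
    if len(calls) < 3:
        raise TransientWriteError("simulated flush failure")


attempts = write_batch_with_retry(flaky_write, ["f1", "f2"], sleep=lambda s: None)
print(attempts)  # prints 3
```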
A feature may also fail to be written due to validation (e.g. a null geometry). In this case, you would not want to re-try the feature as it will never ingest successfully. If you are using the GeoMesa converter framework, you can pre-validate features to ensure that they will ingest ok.
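One way to keep such features out of the re-try path is to partition the input before writing. The sketch below is a minimal, hypothetical pre-validation step and not the GeoMesa converter framework: the function name `split_valid` and the single rule (a non-null `geom` value) are assumptions for illustration.

```python
def split_valid(records):
    """Partition records into (valid, rejected) before any write is attempted.

    Assumption: the only rule checked is a non-null 'geom' value.
    """
    valid, rejected = [], []
    for r in records:
        (valid if r.get("geom") is not None else rejected).append(r)
    return valid, rejected


records = [
    {"id": "a", "geom": "POINT (1 2)"},
    {"id": "b", "geom": None},  # would never ingest, so never re-try it
]
valid, rejected = split_valid(records)
print([r["id"] for r in valid], [r["id"] for r in rejected])  # prints ['a'] ['b']
```

Only the `valid` list would then be handed to the writer; the `rejected` list can be logged or routed elsewhere for inspection.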
If you do not have an ingest pipeline already, you may want to check out geomesa-nifi, which will let you convert and validate input data, and re-try failures automatically through NiFi flows.