
Running GeoMesa HBase on AWS S3: how do I ingest / export remotely?


I am running GeoMesa HBase on an EMR cluster, set up as described here. I'm able to SSH into the master node and ingest / export from there. How would I ingest / export the data remotely, for example from a Lambda function (preferably with a Python solution)? Right now, for the ingest part, I'm running a Lambda function that just sends a shell command over SSH:

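import paramiko
k = paramiko.RSAKey.from_private_key_file("/path/to/key.pem")  # hypothetical key path
c = paramiko.SSHClient()
c.set_missing_host_key_policy(paramiko.AutoAddPolicy())  # accept the master's host key
c.connect(hostname=host, username="ec2-user", pkey=k)
stdin, stdout, stderr = c.exec_command("geomesa-hbase ingest <file_to_ingest_on_S3> ...")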

But I imagine I should be able to ingest / export remotely without using SSH. I've been searching for a solution for days, with no luck so far.


Solution

  • You can ingest or export remotely just by running GeoMesa code on any machine that can reach the cluster's HBase. This could mean installing the GeoMesa command-line tools there, or using the GeoTools API from a processing framework of your choice. GeoServer can also serve remote queries, but it is typically used for interactive (not bulk) querying.

    There isn't any out-of-the-box solution for ingest/export via AWS Lambda, but you could create a Docker image with the GeoMesa command-line tools and invoke that.

    Also note that the command-line tools support ingest and export via a map/reduce job, which lets you run a distributed process on the cluster from your local install.
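As a rough sketch of running the command-line tools from a remote box (optionally as a map/reduce job), the Python helpers below build `geomesa-hbase ingest` and `geomesa-hbase export` command lines and run them with `subprocess`. This assumes the GeoMesa HBase tools are installed and on `PATH` on the machine running the script, and that the EMR cluster's `hbase-site.xml` (plus the Hadoop configuration, for distributed mode) is visible to them. The flag spellings (`-c`, `-f`, `-s`, `-C`, `-q`, `-F`, `-o`, `--run-mode distributed`) follow the GeoMesa CLI documentation as best understood here and may differ between versions, so check them with `geomesa-hbase help ingest` and `geomesa-hbase help export`. The function names are hypothetical.

```python
import shlex
import subprocess
from typing import List, Optional

# Assumptions: `geomesa-hbase` is on PATH and can see the EMR cluster's
# hbase-site.xml. Flag names should be verified against `geomesa-hbase help <cmd>`.


def ingest_args(catalog: str, feature: str, spec: str, converter: str,
                files: List[str], distributed: bool = False) -> List[str]:
    """Build argv for `geomesa-hbase ingest` (hypothetical helper)."""
    args = ["geomesa-hbase", "ingest",
            "-c", catalog,      # HBase catalog table
            "-f", feature,      # feature type name
            "-s", spec,         # SimpleFeatureType spec or name of a configured one
            "-C", converter]    # converter definition or name
    if distributed:
        # Submit the ingest as a map/reduce job instead of running it in-process
        args += ["--run-mode", "distributed"]
    return args + list(files)


def export_args(catalog: str, feature: str, cql: str = "INCLUDE",
                fmt: str = "csv", output: Optional[str] = None) -> List[str]:
    """Build argv for `geomesa-hbase export` (hypothetical helper)."""
    args = ["geomesa-hbase", "export",
            "-c", catalog,
            "-f", feature,
            "-q", cql,          # ECQL filter; INCLUDE matches everything
            "-F", fmt]          # output format, e.g. csv or geojson
    if output is not None:
        args += ["-o", output]  # without -o the tools write to stdout
    return args


def run(args: List[str]) -> str:
    """Run the command, raise CalledProcessError on failure, return stdout."""
    print("running:", shlex.join(args))
    result = subprocess.run(args, check=True, capture_output=True, text=True)
    return result.stdout
```

For example, `run(export_args("my_catalog", "my_feature", "BBOX(geom, -10, -10, 10, 10)", "geojson", "out.json"))` would export matching features to a local file, and `run(ingest_args("my_catalog", "my_feature", "my_spec", "my_converter", ["s3://my-bucket/data.csv"], distributed=True))` would submit a distributed ingest. The catalog, feature, spec, converter, attribute and bucket names are all placeholders.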
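For the Docker route, one possible arrangement (not something GeoMesa provides) is to push the image with the command-line tools to ECR, register it as an ECS task definition, and have a small Python Lambda launch it as a Fargate task with boto3's `run_task`, overriding the container command with the GeoMesa invocation. The cluster, task definition, container name and subnet environment variables below are hypothetical. The sketch assumes the task definition sets no `ENTRYPOINT`, so the override carries the full argv including `geomesa-hbase`, and that the chosen subnets can reach the EMR cluster's ZooKeeper and RegionServers.

```python
import os
from typing import Any, Dict, List

# Hypothetical configuration, supplied through Lambda environment variables.
CLUSTER = os.environ.get("ECS_CLUSTER", "geomesa-tools")
TASK_DEF = os.environ.get("TASK_DEFINITION", "geomesa-cli")
CONTAINER = os.environ.get("CONTAINER_NAME", "geomesa")
SUBNETS = [s for s in os.environ.get("SUBNETS", "").split(",") if s]


def build_run_task_request(command: List[str]) -> Dict[str, Any]:
    """Keyword arguments for ECS RunTask that replace the container's command."""
    return {
        "cluster": CLUSTER,
        "taskDefinition": TASK_DEF,
        "launchType": "FARGATE",
        "count": 1,
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": SUBNETS,  # must be able to reach HBase on the EMR cluster
                "assignPublicIp": "DISABLED",
            }
        },
        "overrides": {
            "containerOverrides": [{"name": CONTAINER, "command": command}]
        },
    }


def handler(event, context):
    """Lambda entry point: event["command"] is the geomesa-hbase argv to run."""
    import boto3  # imported lazily so the request builder works without boto3

    response = boto3.client("ecs").run_task(**build_run_task_request(event["command"]))
    # Return started task ARNs (for later polling) and any launch failures
    return {
        "taskArns": [t["taskArn"] for t in response.get("tasks", [])],
        "failures": response.get("failures", []),
    }
```

Launching a task rather than running the tools inside the Lambda itself sidesteps Lambda's 15-minute execution limit for large ingests. The handler returns as soon as the task is started, so completion has to be tracked separately, for example with `ecs.describe_tasks`.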