I am trying to migrate data from a Oracle Database (12+) with NiFi to HDFS. The processors "QueryDatabaseTable" and "ExecuteSQL" cant seem to handle geometry data type (SDO_GEOMETRY).
I'm getting the error "Unkown SQL Type 2002, cannot be converted to avro type"
When alternatively converting the data type to a long string using the "sdo_util.to_wktgeometry()" function, doesnt seem to be possible resp. for the first processor and the second gives the error : "SDO_UTIL.TO_WKTGEOMETRY() illegal character"
Does anybody have tips about migrating Geometry data types from Oracle (12+)by NiFi? A line string is enough information from the Geometry Object (for now)
I imagine NiFi does not know how to handle any kind of object types defined in relational databases. And obviously much less understanding complex types like spatial types. Even if it did understand object types, you would be left with something hard to use since it would expose the internals of the type and you need a deep inspection of the Oracle manuals to decode the content in order to use the information.
The simplest is definitely to get the spatial types in a serialized string or binary notation. You have a choice between multiple notations and two approaches to fetch those:
Using object type methods:
Important: note that you must use a table alias in order to invoke object methods!
WKT:
SQL> select c.location.get_wkt() from us_cities c where city='New York';
C.LOCATION.GET_WKT()
-------------------------------------------------------------------------------
POINT (-73.943849 40.6698)
1 row selected.
GML:
SQL> select c.location.get_gml() from us_cities c where city='New York';
C.LOCATION.GET_GML()
-------------------------------------------------------------------------------
<gml:Point srsName="EPSG:4326" xmlns:gml="http://www.opengis.net/gml"><gml:coordinates decimal="." cs="," ts=" ">-73.943849,40.6698 </gml:coordinates></gml:Point>
1 row selected.
GeoJSON:
SQL> select c.location.get_geojson() from us_cities c where city='New York';
C.LOCATION.GET_GEOJSON()
-------------------------------------------------------------------------------
{ "type": "Point", "coordinates": [-73.943849, 40.6698] }
1 row selected.
Using functions:
WKT
SQL> select sdo_util.to_wktgeometry(location) from us_cities c where city='New York';
SDO_UTIL.TO_WKTGEOMETRY(LOCATION)
-------------------------------------------------------------------------------
POINT (-73.943849 40.6698)
1 row selected.
GML
SQL> select sdo_util.to_gmlgeometry(location) from us_cities c where city='New York';
SDO_UTIL.TO_GMLGEOMETRY(LOCATION)
-------------------------------------------------------------------------------
<gml:Point srsName="EPSG:4326" xmlns:gml="http://www.opengis.net/gml"><gml:coordinates decimal="." cs="," ts=" ">-73.943849,40.6698 </gml:coordinates></gml:Point>
1 row selected.
GeoJSON
SQL> select sdo_util.to_geojson(location) from us_cities c where city='New York';
SDO_UTIL.TO_GEOJSON(LOCATION)
-------------------------------------------------------------------------------
{ "type": "Point", "coordinates": [-73.943849, 40.6698] }
1 row selected.
In those examples I show only points. Geometries can of course be more complex: lines, multi-lines, polygons, multi-polygons, polygon with voids. Also 3D structures: surfaces and solids ...
For those geometries that are points, you can just extract the X and Y coordinates straight from the objects. Again note the use of aliases.
select city, c.location.sdo_point.x longitude, c.location.sdo_point.y latitude
from us_cities c;
CITY LONGITUDE LATITUDE
------------------------------ ---------- ----------
New York -73.943849 40.6698
Los Angeles -118.4112 34.112101
....
Sioux Falls -96.7301 43.544201
Simi Valley -118.7513 34.26305
195 rows selected.
For other geometries (lines etc) then you need to pass via a text serialization.