GeoEnrichIP processor in NiFi


I'm trying to use the GeoEnrichIP processor as part of a NiFi flow. I've been following the documentation at https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-enrich-nar/1.6.0/org.apache.nifi.processors.GeoEnrichIP/ without luck.

I'm trying to attach the GeoEnrichIP processor to the output of a ConvertRecord processor.

ConvertRecord (JSON) ---> GeoEnrichIP

In the configuration for GeoEnrichIP, I've added an attribute for the IP address field. The field to enrich is host_address, but I'm not getting anything in my output. I don't think I'm correctly referencing the host_address field, which contains the IP address.

How do I properly reference the host_address field so that its IP address is enriched with geolocation data?

Thanks


Solution

  • For GeoEnrichIP, the field you want to enrich on must be an Attribute of the FlowFile, not part of the FlowFile content (e.g. inside a record).

    The IP Address Attribute property must contain the name of the Attribute.

    If the IP is in the FlowFile content, you'll need to extract the IP and put the value in an Attribute.

    There are a few ways to do this, depending on your use case - but there's also an alternative approach.

    1. If every FlowFile contains only a SINGLE Record, then you can use EvaluateJsonPath to extract the IP and create an Attribute.
    2. If every FlowFile contains MULTIPLE Records, with completely random IP addresses, you could use SplitJson to split each Record into its own FlowFile and then EvaluateJsonPath (this is usually a pattern to avoid!).
    3. If every FlowFile contains MULTIPLE Records, but the IP is one of a smaller set of common IP addresses, then you could use PartitionRecord to bucket Records into FlowFiles with a common IP Attribute.
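
To make the difference between options 1 and 3 concrete, here is a minimal sketch in plain Python. It is not NiFi code and uses no NiFi API; the function names and sample data are hypothetical, and the only thing taken from the question is the field name host_address. It only models what ends up in the FlowFile Attributes in each case.

```python
import json
from collections import defaultdict


def extract_ip_attribute(content, field="host_address"):
    """Model of option 1: the content holds ONE JSON record, and the IP is
    copied out of the content into an attribute map (the role that
    EvaluateJsonPath plays in the flow)."""
    record = json.loads(content)
    return {field: record[field]}


def partition_by_ip(content, field="host_address"):
    """Model of option 3: the content holds MANY records, which are grouped so
    that each group shares one IP value (the role that PartitionRecord plays).
    Each key stands for the attribute value of one outgoing FlowFile."""
    groups = defaultdict(list)
    for record in json.loads(content):
        groups[record[field]].append(record)
    return dict(groups)


# Hypothetical sample data; the addresses come from a documentation range.
single = '{"host_address": "192.0.2.10", "status": 200}'
attributes = extract_ip_attribute(single)

many = json.dumps([
    {"host_address": "192.0.2.10", "status": 200},
    {"host_address": "192.0.2.20", "status": 404},
    {"host_address": "192.0.2.10", "status": 500},
])
partitions = partition_by_ip(many)
```

In both cases the IP ends up outside the content, as a value keyed by name, which is what GeoEnrichIP needs. With many distinct IPs, the partitioned output degenerates into one FlowFile per Record, which is why option 2 is usually worth avoiding.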

    However, rather than using GeoEnrichIP, you could instead use LookupRecord with an IPLookupService. In this way, you can handle either SINGLE or MULTIPLE Records per FlowFile and you do not need to deal with Attributes, instead relying on data within the Record itself. This handles all 3 cases listed above.
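
As a rough illustration of why the record-based approach covers all three cases, the sketch below models a per-Record lookup in plain Python. Again, this is not NiFi code: enrich_records, FAKE_GEO_DB and the geo result field are hypothetical names, and the dictionary merely stands in for a real GeoIP database.

```python
def enrich_records(records, lookup, key_field="host_address", result_field="geo"):
    """Return copies of the records, each with the lookup result stored under
    result_field when the lookup finds a match. No attributes are involved:
    the key is read from the record and the result is written back to it."""
    enriched = []
    for record in records:
        out = dict(record)  # shallow copy so the input records stay untouched
        result = lookup(record.get(key_field))
        if result is not None:
            out[result_field] = result
        enriched.append(out)
    return enriched


# Hypothetical stand-in for a GeoIP database, keyed by IP address.
FAKE_GEO_DB = {
    "192.0.2.10": {"country": "Exampleland", "city": "Sampleville"},
}

records = [
    {"host_address": "192.0.2.10", "status": 200},
    {"host_address": "192.0.2.20", "status": 404},
]
enriched = enrich_records(records, FAKE_GEO_DB.get)
```

The same loop works whether the list holds one Record or thousands, and regardless of how many distinct IPs appear, so no splitting or partitioning step is needed first.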

    I wrote a post about using LookupRecord here, if you need more details on how to use it. It's a very powerful processor for enrichment workflows.