I am importing data using the DataImportHandler (DIH) and need to parse a string, capture two numbers, and then populate a field of type=location (which accepts a "lat,long" coordinate pair). The logical thing to do is:
<field column="latLong"
regex="Latitude is ([-\d.]+)\s+ Longitude is ([-\d.]+)\s+"
replaceWith="$1,$2" />
It seems the DIH only knows about a single capture group, so $2 is never used.
Has anyone ever used more than one capture with the regexTransformer? Searching the documentation didn't provide any examples of $2 or $3. What gives, O ye priests of Solr?
It is not true that the Solr DIH does not understand $2, $3, etc.
I just tried this. I added the following to my DIH data-config.xml:
<entity name="foo"
transformer="RegexTransformer"
query="SELECT list_id FROM lists WHERE list_id = ${Lists.id}">
<field column="firstLastNum"
regex="^(\d).*?(\d)$"
replaceWith="$1:$2"
sourceColName="list_id"/>
</entity>
and then added the field to my schema.xml:
<field name="firstLastNum" type="string" indexed="true" stored="true"/>
When I indexed a document with list_id = 390, firstLastNum was 3:0 which is indeed correct.
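The same result can be reproduced outside Solr. The sketch below assumes the RegexTransformer's replaceWith behaves like Java's `Matcher.replaceAll` (the class name `CaptureGroupDemo` and the helper `replace` are made up for illustration, they are not part of Solr):

```java
import java.util.regex.Pattern;

public class CaptureGroupDemo {
    // Hypothetical helper: applies a regex plus replacement the way
    // java.util.regex does, with $1, $2, ... referring to capture groups.
    static String replace(String regex, String replaceWith, String value) {
        return Pattern.compile(regex).matcher(value).replaceAll(replaceWith);
    }

    public static void main(String[] args) {
        // Same pattern and replacement as in the entity above, applied to "390".
        System.out.println(replace("^(\\d).*?(\\d)$", "$1:$2", "390")); // prints 3:0
    }
}
```

If both groups show up here, the regex engine itself is not the limiting factor, and the pattern or the input is the more likely culprit.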
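I suspect the issue is an incorrect regex that matches only the first part and not the second. Note that your pattern requires whitespace followed by a literal space before "Longitude", and at least one whitespace character after the second number, so it will not match if there is only a single space between the two parts or if the string ends right after the longitude. Maybe try this regex: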
regex="Latitude is ([-\d.]+)\s*Longitude is ([-\d.]+)"
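To see the difference between the two patterns, here is a minimal sketch in plain Java. The sample input string is an assumption (I don't know what your actual data looks like), and the class and method names are hypothetical:

```java
import java.util.regex.Pattern;

public class LatLongRegexDemo {
    // Pattern from the question: "\s+ " needs whitespace AND a literal space,
    // and the trailing "\s+" needs whitespace after the longitude.
    static final String ORIGINAL = "Latitude is ([-\\d.]+)\\s+ Longitude is ([-\\d.]+)\\s+";
    // Relaxed pattern suggested above.
    static final String RELAXED = "Latitude is ([-\\d.]+)\\s*Longitude is ([-\\d.]+)";

    // Hypothetical helper: rewrites each match as "lat,long".
    static String toLatLong(String regex, String value) {
        return Pattern.compile(regex).matcher(value).replaceAll("$1,$2");
    }

    public static void main(String[] args) {
        // Assumed sample input: single space separator, no trailing whitespace.
        String sample = "Latitude is 12.5 Longitude is -45.25";
        System.out.println(toLatLong(ORIGINAL, sample)); // no match, input printed unchanged
        System.out.println(toLatLong(RELAXED, sample));  // prints 12.5,-45.25
    }
}
```

With the original pattern the input comes back untouched because nothing matches, which can look as if $2 were being ignored.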
Another reason could be that latLong is of location type and $1,$2 is of string type, but I am not sure about that.