Solr RegexTransformer - parse space-separated file

Hi, I have a file with the following contents. The character '.' denotes a space.

I want to use the DataImportHandler to parse this data into three fields. This is what I have so far:

<entity name="iCode" processor="LineEntityProcessor" url="file.csv"
        transformer="RegexTransformer">

  <field column="code" regex="^(\w*)" sourceColName="rawLine" />
  <field column="fruit" regex="(\W)\b.*" sourceColName="rawLine" />
  <field column="color" regex="(\w*)\s*$" sourceColName="rawLine" />
</entity>

The import runs successfully, but no documents are created in Solr. I believe the regexes are not correct.

Any ideas how I can get this to work?


  • Try

    <field column="code" regex="^(\S+)" sourceColName="rawLine" />
    <field column="fruit" regex="(\S+)(?=\s+\S+\s+$)" sourceColName="rawLine" />
    <field column="color" regex="(\S+)(?=\s+$)" sourceColName="rawLine" />
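    • The first matches the run of non-whitespace characters at the beginning of the line.
    • The second matches a run of non-whitespace characters that is followed by whitespace, one more run of non-whitespace characters, and the whitespace at the end of the line. The lookahead keeps everything after the match out of the result.
    • The third matches a run of non-whitespace characters that is followed only by the whitespace at the end of the line, again leaving that whitespace out of the result.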
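
To check the three patterns outside Solr, here is a minimal sketch in Python. It makes a few assumptions. The sample line `A01 apple red` followed by trailing spaces is hypothetical, since the actual file contents are not shown. The `parse` helper is made up for illustration. The fruit group is written as `(\S+)` so that it captures the whole token rather than only its last character. Solr uses Java regular expressions, but `\S`, `\s`, `$` and lookaheads should behave the same way in Python's `re` for input like this.

```python
import re

# Patterns from the answer; the fruit group is (\S+) so the whole token is captured.
CODE = re.compile(r"^(\S+)")
FRUIT = re.compile(r"(\S+)(?=\s+\S+\s+$)")
COLOR = re.compile(r"(\S+)(?=\s+$)")


def parse(raw_line):
    """Return (code, fruit, color); a field is None when its pattern finds no match."""
    fields = []
    for pattern in (CODE, FRUIT, COLOR):
        match = pattern.search(raw_line)
        fields.append(match.group(1) if match else None)
    return tuple(fields)


# Hypothetical sample line with trailing spaces (an assumption about the data).
print(parse("A01 apple red  "))  # ('A01', 'apple', 'red')

# Without trailing whitespace the two lookaheads cannot match.
print(parse("A01 apple red"))    # ('A01', None, None)
```

The second call shows that the lookaheads depend on whitespace at the end of every line. If some lines might end right after the last token, changing `\s+$` to `\s*$` in the second and third patterns is one way to accept both forms.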