
How to convert a text file with delimited fields into Solr documents


I have a text file that contains the following data:

    andy~1234;M~64365113~2P3VWU3H10~~
    mike~4152;M~64365113~2P3VWU3H10~0.6~MG
    lesa~4512;F,PM~~N/A~16~MG
    riky~7845;M,PM2~~N/A~3.99~MG

I want to convert it into Solr documents as follows:

  1. Each row becomes one <doc> document in Solr.
  2. '~' is the delimiter that separates the fields (<field>) of each document.

Do I need to use the DataImportHandler to handle this kind of file? If so, which DataImportHandler setup would work? I've read about LineEntityProcessor, but I don't understand how to apply it to my problem.


Solution

  • Assuming that you know the field names (the lines contain only the values), here's an example of how you can do that using a FileDataSource + LineEntityProcessor + ScriptTransformer:

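    <dataConfig>
        <dataSource encoding="UTF-8" type="FileDataSource" name="file-datasource"/>
        <script><![CDATA[
            // Hypothetical field names, in the order the values appear on a line.
            // Replace them with the field names defined in your schema.
            var fieldNames = ['name', 'code', 'field3', 'field4', 'field5', 'field6'];

            function parse(row)
            {
                // LineEntityProcessor puts each line of the file in the "rawLine" column.
                // String(...) turns it into a JavaScript string, so split() keeps empty
                // fields (Java's String.split would drop the trailing ones).
                var values = String(row.get('rawLine')).split('~');

                for (var i = 0; i < fieldNames.length && i < values.length; i++) {
                    // Skip empty values, e.g. the trailing "~~" on the first line.
                    if (values[i] !== '') {
                        row.put(fieldNames[i], values[i]);
                    }
                }

                // Drop the raw line so it is not sent to Solr as a field of its own.
                row.remove('rawLine');
                return row;
            }
        ]]></script>
        <document>
            <!-- FileDataSource expects a plain file path, not a file:// URL -->
            <entity name="jc"
                processor="LineEntityProcessor"
                url="/path/to/your/file.txt"
                dataSource="file-datasource"
                transformer="script:parse"/>
        </document>
    </dataConfig>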
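Because the transformer function only reads and writes the row map, its logic can be tried outside Solr before running the import. The sketch below is a hypothetical test harness, not part of DIH: `makeRow` stands in for the `java.util.Map` row that DIH passes in (only `get`, `put` and `remove` are mimicked), and `parse` uses the same hypothetical field names as above. It runs under Node.js; inside Solr the script is executed by the JVM's JavaScript engine instead.

```javascript
// Hypothetical field names; replace them with the fields in your schema.
var fieldNames = ['name', 'code', 'field3', 'field4', 'field5', 'field6'];

// Same logic as the ScriptTransformer function in the data-config.
function parse(row) {
  var values = String(row.get('rawLine')).split('~');
  for (var i = 0; i < fieldNames.length && i < values.length; i++) {
    if (values[i] !== '') {
      row.put(fieldNames[i], values[i]);
    }
  }
  row.remove('rawLine');
  return row;
}

// Minimal stand-in for the row map that LineEntityProcessor produces:
// it starts with a single "rawLine" entry holding the text of one line.
function makeRow(line) {
  var data = new Map([['rawLine', line]]);
  return {
    get: function (key) { return data.get(key); },
    put: function (key, value) { data.set(key, value); },
    remove: function (key) { data.delete(key); },
    toObject: function () { return Object.fromEntries(data); },
  };
}

// Run one sample line through the transformer and print the resulting fields.
var row = parse(makeRow('lesa~4512;F,PM~~N/A~16~MG'));
console.log(row.toObject());
```

For the "lesa" line this prints the name, code, field4, field5 and field6 values; field3 is absent because that position is empty, and the other values keep their positions because the empty string between "~~" is preserved by the split.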