hadoop solr lucidworks

"Hadoop-Solr Lucidworks Project" retrieve input name-path


I am using this project: https://github.com/lucidworks/hadoop-solr. Does anyone know which value holds the name (or the path) of the document that is being processed? I want to expose this value in Solr Admin by adding a field for it to my schema. Is this possible?

Example: I want to be able to see the name of the document that each query result came from.

I am running the project with this command:

    hadoop jar solr-hadoop-job-2.2.5.jar \
        com.lucidworks.hadoop.ingest.IngestJob \
        -Dlww.commit.on.close=true -DcsvDelimiter= \
        -cls com.lucidworks.hadoop.ingest.CSVIngestMapper -c spyros1 \
        -i /usr/local/hadoop/input \
        -of com.lucidworks.hadoop.io.LWMapRedOutputFormat \
        -s http://127.0.1.1:8983/solr

Solution

  • This worked for me:

    hadoop jar solr-hadoop-job-2.2.5.jar com.lucidworks.hadoop.ingest.IngestJob \
        -Dlww.commit.on.close=true \
        -Dcom.lucidworks.hadoop.ingest.RegexIngestMapper.regex="\\w+" \
        -Dcom.lucidworks.hadoop.ingest.RegexIngestMapper.groups_to_fields=0=match_ss \
        -cls com.lucidworks.hadoop.ingest.RegexIngestMapper \
        -c collection1 -i /path/* -s http://127.0.1.1:8983/solr \
        -of com.lucidworks.hadoop.io.LWMapRedOutputFormat
    

    Also see this for more info.
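
    To get a feel for what the regex and `groups_to_fields=0=match_ss` settings in the command above select, here is a rough local illustration. It only approximates the job: it assumes the mapper applies the regex to each input line and maps every whole match (group 0) to `match_ss`, which I have not checked against the project source. The `extract_matches` helper and the sample line are made up for the example.

    ```shell
    #!/bin/sh
    # Hypothetical helper: print every \w+ match in the input, one per line.
    # [[:alnum:]_]+ is the POSIX spelling of \w+ for ASCII input.
    extract_matches() {
      grep -oE '[[:alnum:]_]+'
    }

    # Sample input line (invented for illustration).
    printf 'hello world, foo_1\n' | extract_matches
    # prints:
    # hello
    # world
    # foo_1
    ```

    Under that assumption, each printed token corresponds to one value that would end up in the `match_ss` field for this line.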