
How to add an extra field in HBase through Nutch?


I am writing a Nutch plugin that runs at fetch time. It performs some analysis on the fetched webpages, and the results need to be stored in HBase alongside the corresponding webpage. I am not sure how to add an extra field, or how to write data to that field from Nutch.


Solution

  • If you want to add additional fields while indexing in Solr:

    If the values of the additional fields are fixed (static), you can use Nutch's index-static plugin.

    It lets you add any number of fields together with their contents.

    Step 1:

    The index.static property has to be set in nutch-site.xml.

    Step 2:

    Add the index.static property, listing each field and its value:

    <property>
      <name>index.static</name>
      <value>first_field:value,second_field:value</value>
      <description>
        Used by plugin index-static to add fields with static data at indexing time.
        You can specify a comma-separated list of fieldname:fieldcontent per Nutch job.
        Each fieldcontent can have multiple values separated by space, e.g.,
        field1:value1.1 value1.2 value1.3,field2:value2.1 value2.2 ...
        It can be useful when collections can't be created by URL patterns,
        like in subcollection, but on a job-basis.
      </description>
    </property>
    

    Step 3:

    Add a definition for each new field to Solr's schema.xml.
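
    As a rough sketch, the definitions for the two example fields from Step 2 might look like the following. It assumes a fieldType named string is already declared in the schema; the stored, indexed, and multiValued settings are illustrative choices, not requirements. multiValued is set here because a single field can carry several space-separated values.

    ```xml
    <!-- Hypothetical entries for the <fields> section of schema.xml -->
    <!-- The "string" type and the attribute values are assumptions; adjust to your schema -->
    <field name="first_field" type="string" stored="true" indexed="true" multiValued="true"/>
    <field name="second_field" type="string" stored="true" indexed="true" multiValued="true"/>
    ```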

    Step 4:

    Enable the index-static plugin in the plugin.includes property.
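
    For illustration, the property in nutch-site.xml could look like this. The rest of the plugin list is only an assumed example; keep whatever value your installation already uses and add static to the index-(...) group.

    ```xml
    <property>
      <name>plugin.includes</name>
      <!-- Assumed example value; the only relevant change is "static" inside index-(...) -->
      <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor|static)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
    </property>
    ```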

    Alternatively, you can follow https://wiki.apache.org/nutch/WritingPluginExample-1.2 to write your own plugin.