Tags: python, solr, sunburnt

KeyError: 'id' when trying to index documents to Solr using sunburnt


I am trying to index a few text files into Solr using sunburnt. Below is my code:

import hashlib

import httplib2
import sunburnt

solr_url = "http://localhost:8983/solr"
h = httplib2.Http(cache="/var/tmp/solr_cache")
solr_instance = sunburnt.SolrInterface(url=solr_url, http_connection=h)

# webpages is an iterable of (url, title, webpage) tuples
for url, title, webpage in webpages:
    html_id = hashlib.md5(url).hexdigest()  # stable ID derived from the URL
    doc = {"id": html_id, "content": webpage, "title": title}
    solr_instance.add(doc)

try:
    solr_instance.commit()
except:
    print "Could not Commit Changes to Solr, check the log files."
else:
    print "Successfully committed changes"

But when I run this, I get the error below:

  File "/Users/ananya/Desktop/dbms project/code/extractText/ExtractText.py", line 94, in index_to_Solr
    solr_instance = sunburnt.SolrInterface(url=solr_url, http_connection=h)

  File "/Users/ananya/anaconda/lib/python2.7/site-packages/sunburnt/sunburnt.py", line 166, in __init__
    self.init_schema()

  File "/Users/ananya/anaconda/lib/python2.7/site-packages/sunburnt/sunburnt.py", line 177, in init_schema
    self.schema = SolrSchema(schemadoc, format=self.format)

  File "/Users/ananya/anaconda/lib/python2.7/site-packages/sunburnt/schema.py", line 417, in __init__
    if self.unique_key else None

KeyError: 'id'

I am very new to Solr. Please help me. Do I need to make any changes to the schema file? If yes, please let me know how.

Thanks.


Solution

  • If you're using Solr 4.8 or later, this is a bug in sunburnt 0.6.

    The fork of sunburnt by arafalov has a patch that fixed it for me.

    Try:

    git clone https://github.com/arafalov/sunburnt.git
    cd sunburnt
    python setup.py install # optionally with --user
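
If you want to check whether your setup is affected before switching forks, here is a rough sketch. My understanding (an assumption, not checked against every release) is that sunburnt 0.6 only looks for <field> definitions nested inside a <fields> wrapper, while the example schema.xml shipped with Solr 4.8 and later puts them directly under <schema>. sunburnt then finds no fields at all, and looking up the uniqueKey fails with KeyError: 'id'. The helper name describe_schema_layout and the two sample schemas are made up for illustration; substitute the contents of your own schema.xml.

```python
import xml.etree.ElementTree as ET


def describe_schema_layout(schema_xml):
    """Report where <field> elements sit in a schema.xml string.

    Hypothetical helper for diagnosis only; it is not part of sunburnt or Solr.
    """
    root = ET.fromstring(schema_xml)
    # Old layout: <schema><fields><field .../></fields></schema>
    wrapped = [f.get("name") for f in root.findall("./fields/field")]
    # Newer layout: <schema><field .../></schema>
    top_level = [f.get("name") for f in root.findall("./field")]
    unique_key = root.findtext("./uniqueKey")
    if unique_key is not None:
        unique_key = unique_key.strip()
    return {
        "unique_key": unique_key,
        "wrapped_fields": wrapped,
        "top_level_fields": top_level,
        # Assumption: sunburnt 0.6 can only resolve the key if this is True.
        "unique_key_is_wrapped": unique_key in wrapped,
    }


# Illustrative schemas (not copied from any Solr distribution).
OLD_STYLE = """<schema name="example" version="1.5">
  <fields>
    <field name="id" type="string" indexed="true" stored="true"/>
  </fields>
  <uniqueKey>id</uniqueKey>
</schema>"""

NEW_STYLE = """<schema name="example" version="1.5">
  <field name="id" type="string" indexed="true" stored="true"/>
  <uniqueKey>id</uniqueKey>
</schema>"""

old_report = describe_schema_layout(OLD_STYLE)
new_report = describe_schema_layout(NEW_STYLE)
```

If unique_key_is_wrapped comes back False for your real schema.xml, you are probably hitting this bug. Wrapping the <field> elements in <fields> (and the <fieldType> elements in <types>) may also get past it, but the patched fork is the only fix confirmed here.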