ruby-on-rails, ruby, solr, rubygems, rsolr

Sunspot Solr Reindexing failing due to illegal characters


Solr is failing to reindex my site, and my production log shows the following error:

bundle exec rake sunspot:solr:reindex
rake aborted!
RSolr::Error::Http: RSolr::Error::Http - 400 Bad Request
Error: Illegal character ((CTRL-CHAR, code 12))
 at [row,col {unknown-source}]: [155,1]

I am not sure where this 'illegal character' is coming from or how to find it. I'd appreciate any help, as it is causing a 500 server error on my app right now. Thank you, and let me know if more information is needed.

(Rails 3.2, RSolr 1.0.10)


Solution

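  • This is usually caused by bad data in your database. Control character 12 is the form feed character (0x0C), which XML 1.0 does not allow, so Solr rejects the XML update request that contains it. If you're using MySQL, you can find any instances of control character 12 with a query like this (substitute your own table and column names for table and col):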

    SELECT * FROM table WHERE col REGEXP CHAR(12);

    Then you can remove the character from the content of any matched rows and reindex.

    You could also do something like this to remove the control characters:

    UPDATE table SET col=REPLACE(col, CHAR(12), '') WHERE col REGEXP CHAR(12);
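
The same cleanup can also be done on the Ruby side, which may help keep the character from being reintroduced after the database has been fixed. The following is a minimal sketch, not part of Sunspot or RSolr: the constant and the helper name strip_xml_control_chars are hypothetical, and it assumes the text is a validly encoded string. It removes the control characters that XML 1.0 forbids while leaving tab, line feed, and carriage return in place.

```ruby
# Hypothetical helper (not part of Sunspot or RSolr).
# Matches the control characters that XML 1.0 forbids: 0x00-0x08, 0x0B,
# 0x0C (form feed, the "code 12" in the error), and 0x0E-0x1F.
# Tab (0x09), line feed (0x0A), and carriage return (0x0D) are legal in
# XML, so they are deliberately excluded from the pattern.
XML_ILLEGAL_CONTROL_CHARS = /[\x00-\x08\x0B\x0C\x0E-\x1F]/

def strip_xml_control_chars(text)
  # Pass nil through untouched so the helper is safe on empty columns.
  return text if text.nil?

  text.gsub(XML_ILLEGAL_CONTROL_CHARS, '')
end

# "\f" is Ruby's escape for the form feed character (code 12).
dirty = "page one\fpage two\n"
puts strip_xml_control_chars(dirty).inspect
```

In a Rails model, a helper like this could be called from a before_save callback on the affected text columns, so that content pasted in later cannot break the next reindex. That placement is a suggestion rather than something the original answer prescribes.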