Tags: solr, char, invalid-characters

How can I find the character at a given position in an XML file?


I'm using Solr, and it returned an HTTP 500 error saying there is an invalid UTF-8 middle byte at char 139212. How can I go to this character to see what the problem is?


Solution

  • If you have the file sitting on your filesystem and you're on a Unix/Linux-type system, you could try something like this on the command line:

    $ head -c 139300 <filename> | tail -c 1000
    

    This tells head to output the first 139300 bytes of the file. Since position 139212 falls near the end of that range, you should be able to see the offending character in context, or at least figure out which section or block it is in. Because that is still a lot of data, I added | tail -c 1000 to keep only the last 1000 of those 139300 bytes, so you don't have to scroll as much output through your terminal. Keep in mind that head and tail count bytes, while the error reports a character position; if the file contains multibyte UTF-8 characters before that point, the byte offset will be somewhat larger, so widen the window if you don't spot the problem.

    If you generated this XML yourself, I'd recommend adding XML validation or at least illegal character detection so you can avoid the problem in the future.
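An invalid byte often renders as a blank or a replacement glyph in the terminal, so it can help to look at the raw byte values around the reported position. The sketch below uses the POSIX od utility; show_bytes is a hypothetical helper name, and the 32-byte lead-in and 64-byte window are arbitrary choices. It treats the offset as a 0-based byte offset.

```shell
#!/bin/sh
# Hypothetical helper: dump the bytes around a given offset of a file.
# Usage: show_bytes FILE OFFSET
# OFFSET is treated as a 0-based byte offset; the dump starts 32 bytes
# earlier (or at the start of the file) and covers 64 bytes.
show_bytes() {
  file=$1
  offset=$2
  # Start a little before the offset so the surrounding context is visible.
  start=$(( offset > 32 ? offset - 32 : 0 ))
  # -A d : print addresses in decimal
  # -t x1: print every byte as a hex value (shows the exact bad byte)
  # -c   : also print each byte as a character or escape
  # -j   : number of bytes to skip; -N: number of bytes to dump
  od -A d -t x1 -c -j "$start" -N 64 "$file"
}
```

For example, show_bytes myfile.xml 139212 (myfile.xml is a placeholder) prints the addresses in decimal, so you can line them up directly with the position from the error message.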
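For the illegal-character detection suggested in this answer, one lightweight option is to let iconv re-decode the file as UTF-8 and check its exit status. This is a minimal sketch, assuming an iconv binary is installed; check_utf8 is a hypothetical helper name. It only checks the encoding, not XML well-formedness (a parser-based tool such as libxml2's xmllint --noout covers both).

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) only if FILE is valid UTF-8.
# Usage: check_utf8 FILE
# iconv exits with a non-zero status when it meets a byte sequence that is
# not valid UTF-8; GNU iconv also prints the byte position of the first
# bad sequence on stderr. The converted text itself is discarded.
check_utf8() {
  iconv -f UTF-8 -t UTF-8 "$1" > /dev/null
}
```

In a generation pipeline you might run something like check_utf8 out.xml || exit 1 (out.xml is a placeholder) before posting the file to Solr, so a bad file is rejected early instead of producing a 500 error.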