I'm following the Hello Prediction example of the Google Prediction API.
Unfortunately, the training file language_id.txt appears to be corrupted. I tried downloading it with both Google Chrome and Firefox and got the same result (see screenshot):
I think this is why my tests do not work: I always get back English with a score of 1.0 for the example string "Muy Bueno".
...
{
"label": "English",
"score": "1.000000"
},
...
Where can I get a usable language_id.txt test file, or is there anything else I can do?
EDIT: My guess is that the file was not stored in UTF-8 format on the Google server?
The file is in UTF-8, but it doesn't declare an encoding, so a browser displaying it assumes the default HTTP charset, ISO-8859-1.
I'm not sure why you're actually getting a corrupted copy (if I view it in Chrome, it appears corrupt, but saving it results in a correct UTF-8-encoded file), but perhaps you could try another mechanism to download it?
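As one possible "other mechanism", here is a minimal Python sketch that fetches the raw bytes without any browser charset guessing and checks whether they decode as UTF-8. The helper names `download_bytes` and `is_valid_utf8` are made up for illustration, and the URL in the usage comment is a placeholder, not the real location of the file. The last two lines only demonstrate what UTF-8 text looks like when it is misread as ISO-8859-1, which is probably what you are seeing in the browser.

```python
import urllib.request


def download_bytes(url: str) -> bytes:
    """Fetch a URL and return the body as raw, undecoded bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Illustration only: UTF-8 bytes misread as ISO-8859-1 look "corrupted".
sample = "¿Qué tal?".encode("utf-8")
mojibake = sample.decode("iso-8859-1")  # 'Â¿QuÃ© tal?'

# Hypothetical usage (placeholder URL, replace with the real one):
# data = download_bytes("https://example.com/language_id.txt")
# if is_valid_utf8(data):
#     with open("language_id.txt", "wb") as f:
#         f.write(data)  # write the bytes unchanged, no re-encoding
```

If `is_valid_utf8` returns True for the downloaded bytes, the file on the server is fine and only the browser's display of it is misleading.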