My question is simple: can Pig (Hadoop) handle EBCDIC files? I have some of them and I'd like to process them with Pig on the Hadoop platform.
At the moment I've saved the file and am trying to load it as follows:
A = LOAD '/user/enrico/FilesForPigs/IRIS.txt' AS (f1,f2,f3);
The LOAD statement seems to work, but when I type DUMP A; I get an error.
EDIT:
Following Donald's advice, I am trying to write a Java program to do the conversion; in particular, I am trying to create my own LOAD function.
I have the following problem in the code:
    @Override
    public InputFormat getInputFormat() {
        return new TextInputFormat();
    }
This is the example I found, but TextInputFormat is not right for my case. Do you know how I can solve that?
Thanks
No, the default storage mechanism (PigStorage) assumes the data is ASCII text separated by tabs. You can use PigStorage(',') to change the delimiter to something like a comma, but that does not change the character encoding it expects.
You have two options:
Maybe someone else has already implemented this, but a quick Google search didn't turn up anything.
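If you go the Java conversion route, the decoding step itself needs nothing beyond the JDK. Here is a minimal sketch, assuming the file uses IBM code page 037 (US/Canada EBCDIC) and holds only character fields. The class name EbcdicToText and the sample bytes are made up for illustration, and your files may well use a different code page such as IBM1047 or IBM500. Binary fields like packed decimals cannot be decoded this way.

```java
import java.nio.charset.Charset;

public class EbcdicToText {
    // Assumption: the data is encoded in IBM code page 037.
    // Charset.forName throws UnsupportedCharsetException if the JVM
    // was built without the extended charsets.
    private static final Charset EBCDIC = Charset.forName("IBM037");

    /** Decodes the raw bytes of one EBCDIC record into a Java String. */
    public static String decode(byte[] record) {
        return new String(record, EBCDIC);
    }

    public static void main(String[] args) {
        // Hypothetical sample record: the EBCDIC bytes for "HELLO".
        byte[] record = {
            (byte) 0xC8, (byte) 0xC5, (byte) 0xD3, (byte) 0xD3, (byte) 0xD6
        };
        System.out.println(decode(record)); // prints HELLO
    }
}
```

The same decode call could sit inside a custom loader's getNext(), or run as a one-off step that rewrites the file as delimited text that plain PigStorage can read.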