I have data in the following format:
"header1","header2","header3",...
"value11","value12","value13",...
"value21","value22","value23",...
....
What is the best way to parse it in Scalding? I have over 50 columns altogether, but I am only interested in some of them. I tried importing it with Csv("file"), but that doesn't work.
The only solution that comes to mind is to parse it manually with TextLine and disregard the line with offset == 0. But I'm sure there must be a better solution.
In the end I solved it by parsing each line manually as follows:
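def tipPipe = TextLine("tip").read.mapTo('line -> ('field1, 'field5)) { line: String =>
  // Split on the "," separator between quoted fields.
  val arr = line.split("\",\"")
  // The first field keeps its leading quote after the split, so strip it.
  // Rows with fewer than 88 fields yield "unknown" for the fifth column.
  (arr(0).replace("\"", ""), if (arr.size >= 88) arr(4) else "unknown")
}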
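The splitting step above can also be pulled out into plain Scala, so that columns are picked by header name rather than by position. This is only a rough sketch, not part of Scalding: `QuotedCsv`, `splitQuoted`, `headerIndex` and `select` are hypothetical names, and it assumes every field is quoted and no field contains the sequence `","`.

```scala
object QuotedCsv {
  // Split one line of the form "a","b","c" into its unquoted fields.
  // Assumption: every field is quoted and none contains the sequence ",".
  def splitQuoted(line: String): Array[String] =
    line.stripPrefix("\"").stripSuffix("\"").split("\",\"", -1)

  // Map each header name to its column index.
  def headerIndex(header: String): Map[String, Int] =
    splitQuoted(header).zipWithIndex.toMap

  // Pick the wanted columns by name; an unknown name or a row that is
  // too short yields "unknown" for that column.
  def select(index: Map[String, Int], wanted: Seq[String])(line: String): Seq[String] = {
    val fields = splitQuoted(line)
    wanted.map { name =>
      index.get(name).filter(_ < fields.length).map(fields(_)).getOrElse("unknown")
    }
  }
}
```

Inside the job, the function passed to mapTo could then call something like QuotedCsv.select(index, Seq("header1", "header5"))(line), provided the header line is known up front or read separately. The header row itself would still have to be dropped, for example by its offset as mentioned in the question.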