I have a file that I want to feed to an MLlib algorithm, so I am following the example and doing something like:
val data = sc.textFile(my_file).map { line =>
  val parts = line.split(",")
  Vectors.dense(parts.slice(1, parts.length).map(x => x.toDouble))
}
This works, except that sometimes a feature is missing: one column of some row has no data, and I want to throw such rows away.
So I want to do something like this:
map { line =>
  if (containsMissing(line)) { skipLine }
  else { ... } // same as before
}
How can I implement this skipLine action?
You can use the filter function to filter out such lines:
val data = sc.textFile(my_file)
  .filter(_.split(",").length == cols)
  .map { line =>
    // your code
  }
This assumes the variable cols holds the number of columns in a valid row.
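One caveat with the length check: String.split(",") drops trailing empty fields but keeps empty fields in the middle, so a row like "id,,2.0" still has the expected number of columns and would later fail in toDouble. A minimal sketch of a stricter variant follows, assuming a hypothetical helper named parseFeatures (not part of Spark or MLlib) that returns None for any row with a missing or non-numeric feature. Like the code in the question, it skips the first column.

```scala
import scala.util.Try

// Hypothetical helper: returns the feature values of a row, or None when the
// row has the wrong number of columns or any feature is empty or non-numeric.
def parseFeatures(line: String, cols: Int): Option[Array[Double]] = {
  // A limit of -1 keeps trailing empty fields, so "id,1.0," yields 3 parts.
  val parts = line.split(",", -1)
  if (parts.length != cols) None
  // toDouble throws NumberFormatException on "", which Try turns into None.
  else Try(parts.drop(1).map(_.trim.toDouble)).toOption
}

// Possible use with the RDD from the question (assumes sc, my_file and cols exist):
// val data = sc.textFile(my_file)
//   .flatMap(line => parseFeatures(line, cols))
//   .map(values => Vectors.dense(values))
```

Because flatMap discards the None results, it plays the role of the skipLine action: filtering and parsing happen in a single pass, and each line is split only once.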