In Scalding we can easily work with matrices using the Matrix API, like this:
val matrix = Tsv(path, ('row, 'col, 'val))
.read
.toMatrix[Long,Long,Double]('row, 'col, 'val)
But how can I convert a matrix into that (row, column, value) format from the dense format we usually write? Is there an elegant way? That is, from
1 2 3
3 4 5
5 6 7
to
1 1 1
1 2 2
1 3 3
2 1 3
2 2 4
2 3 5
3 1 5
3 2 6
3 3 7
I need this to perform operations on very large matrices, and I don't know the number of rows and columns in advance (would it be possible to give the sizes in the file, NxM for example?).
I tried to do something with TextLine( args("input") )
but I don't know how to get the line number. I want to convert the matrix on Hadoop; maybe there are other ways to deal with this format? Is it possible with Scalding?
The answer below is not mine; it is the OP's own answer, which was originally posted in the question.
Here's what I've done, which outputs what I wanted:
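// Runs inside a Scalding Job body (requires: import com.twitter.scalding._).
// Assumes 'offset identifies the source line; prev/pos are per-task mutable state.
var prev: Long = 0
var pos: Long = 1
val zeroInt = 0
val zeroDouble = 0.0
TextLine( args("a") )
  // emit one tuple per whitespace-separated number on each line
  .flatMap('line -> 'number) { line : String => line.split("\\s+") }
  .mapTo(('offset, 'line, 'number) -> ('row, 'col, 'v)) {
    x: (Long, String, String) =>
      val (offset, line, number) = x
      // column counter: restart at 1 whenever a new line (new offset) begins
      pos = if(prev == (offset + 1)) pos + 1 else 1
      prev = offset + 1
      (offset + 1, pos, number) }
  // drop zero entries so the output stays sparse
  .filter('row, 'col, 'v) {
    x: (Long, Long, String) =>
      val (row, col, v) = x
      (v != zeroInt.toString) && (v != zeroDouble.toString) }
  .write(Tsv(args("c")))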
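To check the intended result on a small input without running a job, here is a minimal plain-Scala sketch of the same dense-to-triples conversion. It is not Scalding code and is not distributed. The object name `DenseToTriples` and the helper `toTriples` are made up for illustration, and it assumes whitespace-separated numeric rows with 1-based indices and zero entries dropped, as in the pipeline above.

```scala
// Plain-Scala sketch (no Scalding): dense text rows -> (row, col, value) triples.
// DenseToTriples and toTriples are hypothetical names used only for this example.
object DenseToTriples {
  // Returns 1-based (row, col, value) triples, skipping zero entries.
  def toTriples(lines: Seq[String]): Seq[(Long, Long, Double)] =
    for {
      (line, r) <- lines.zipWithIndex                          // r = 0-based row index
      (tok, c)  <- line.trim.split("\\s+").toSeq.zipWithIndex  // c = 0-based column index
      if tok.nonEmpty                                          // ignore blank lines
      v = tok.toDouble
      if v != 0.0                                              // keep the output sparse
    } yield (r + 1L, c + 1L, v)

  def main(args: Array[String]): Unit = {
    val dense = Seq("1 2 3", "3 4 5", "5 6 7")
    // Print tab-separated triples, one per line.
    toTriples(dense).foreach { case (r, c, v) => println(s"$r\t$c\t$v") }
  }
}
```

Here the row index comes from `zipWithIndex` over the lines in memory, so this only works when the whole input is read in order by a single process; the Scalding version above has to rely on the 'offset field instead.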