I have an array of comma-separated numbers, as shown:
a:{108,109,110,112,114,115,116,118}
I need output like this:
a:{108-110, 112, 114-116, 118}
I am trying to group consecutive numbers into ranges joined by "-". For example, 108,109,110 are consecutive, so I get 108-110. 112 is a separate entry; 114,115,116 again form a sequence, so I get 114-116. 118 is separate and treated as such.
I am doing this in Spark. I wrote the following code:
import scala.collection.mutable.ArrayBuffer
def Sample(x:String):ArrayBuffer[String]={
  val x1 = x.split(",")
  var a:Int = 0
  var present=""
  var next:Int = 0
  var yrTemp = ""
  var yrAr= ArrayBuffer[String]()
  var che:Int = 0
  var storeV = ""
  var p:Int = 0
  var q:Int = 0
  var count:Int = 1
  while(a < x1.length)
  {
    yrTemp = x1(a)
    if(x1.length == 1)
    {
      yrAr+=x1(a)
    }
    else
    if(a < x1.length - 1)
    {
      present = x1(a)
      if(che == 0)
      {
        storeV = present
      }
      p = x1(a).toInt
      q = x1(a+1).toInt
      if(p == q)
      {
        yrTemp = yrTemp
        che = 1
      }
      else
      if(p != q)
      {
        yrTemp = storeV + "-" + present
        che = 0
        yrAr+=yrTemp
      }
    }
    else
    if(a == x1.length-1)
    {
      present = x1(a)
      yrTemp = present
      che = 0
      yrAr+=yrTemp
    }
    a = a+1
  }
  yrAr
}
val SampleUDF = udf(Sample(_:String))
I am getting the output as follows:
a:{108-108, 109-109, 110-110, 112, 114-114, 115-115, 116-116, 118}
I am not able to figure out where I am going wrong. Could you please help me correct this? TIA.
Here's another way:
def rangeToString(a: Int, b: Int) = if (a == b) s"$a" else s"$a-$b"
def reduce(xs: Seq[Int], min: Int, max: Int, ranges: Seq[String]): Seq[String] = xs match {
  case y +: ys if (y - max <= 1) => reduce(ys, min, y, ranges)
  case y +: ys => reduce(ys, y, y, ranges :+ rangeToString(min, max))
  case Seq() => ranges :+ rangeToString(min, max)
}
def output(xs: Array[Int]) = reduce(xs, xs.head, xs.head, Vector()) //.toArray
Which you can test:
println(output(Array(108,109,110,112,114,115,116,118)))
// Vector(108-110, 112, 114-116, 118)
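Since the question starts from a comma-separated String rather than an Array[Int], here is a minimal sketch of an adapter around output. The name groupCsv is hypothetical, and the trimming, the empty-input guard, and the sorting step are my assumptions (output calls xs.head, so it would throw on an empty array, and the grouping expects ascending input).

```scala
// Definitions repeated from the answer so the sketch is self-contained.
def rangeToString(a: Int, b: Int): String = if (a == b) s"$a" else s"$a-$b"

def reduce(xs: Seq[Int], min: Int, max: Int, ranges: Seq[String]): Seq[String] = xs match {
  case y +: ys if (y - max <= 1) => reduce(ys, min, y, ranges)
  case y +: ys => reduce(ys, y, y, ranges :+ rangeToString(min, max))
  case Seq() => ranges :+ rangeToString(min, max)
}

def output(xs: Array[Int]): Seq[String] = reduce(xs.toSeq, xs.head, xs.head, Vector())

// Hypothetical adapter: parse the comma-separated input, then group it.
def groupCsv(csv: String): Seq[String] =
  csv.split(",").map(_.trim).filter(_.nonEmpty) match {
    case Array() => Seq.empty                          // assumption: empty input gives no ranges
    case parts   => output(parts.map(_.toInt).sorted)  // assumption: sort before grouping
  }

println(groupCsv("108,109,110,112,114,115,116,118"))
```

With Spark on the classpath, this could presumably be wrapped the same way as in the question, e.g. udf(groupCsv _), but I have not run it inside Spark.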
Basically this is a tail-recursive function, i.e. you take your "variables" as the input, and the function calls itself with updated "variables" on each loop. So here xs is your array, min and max are integers used to keep track of the lowest and highest numbers of the current range, and ranges is the output sequence of Strings that gets added to when required.
The first pattern (y being the first element, and ys being the rest of the sequence, because that's how the +: extractor works) is matched if there's at least one element (ys can be an empty list) and it follows on from the previous maximum.
The second is if it doesn't follow on, and needs to reset the minimum and add the completed range to the output.
The third case is where we've got to the end of the input and just output the result, rather than calling the loop again.
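To check that the three cases really do form a loop, the same logic can be written with the @tailrec annotation, which makes the compiler reject the method if a recursive call is not in tail position. This is only a sketch: the names safeOutput and loop are mine, and returning an empty result for empty input is an assumption the original does not make.

```scala
import scala.annotation.tailrec

def rangeToString(a: Int, b: Int): String = if (a == b) s"$a" else s"$a-$b"

def safeOutput(xs: Seq[Int]): Seq[String] = {
  @tailrec // compile error if any recursive call below is not a tail call
  def loop(rest: Seq[Int], min: Int, max: Int, ranges: Vector[String]): Vector[String] =
    rest match {
      // next element extends (or repeats) the current range: just move max
      case y +: ys if y - max <= 1 => loop(ys, min, y, ranges)
      // gap found: emit the finished range and start a new one at y
      case y +: ys => loop(ys, y, y, ranges :+ rangeToString(min, max))
      // input exhausted: emit the last open range
      case _ => ranges :+ rangeToString(min, max)
    }

  xs match {
    case head +: tail => loop(tail, head, head, Vector.empty)
    case _            => Vector.empty // assumption: empty input yields no ranges
  }
}

println(safeOutput(Seq(108, 109, 110, 112, 114, 115, 116, 118)))
// Vector(108-110, 112, 114-116, 118)
```

Like the original, this assumes the input is already sorted in ascending order.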
Internet karma points to anyone who can work out how to eliminate the duplication of ranges :+ rangeToString(min, max)!
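One possible way to avoid repeating ranges :+ rangeToString(min, max), offered as a sketch rather than the definitive answer: collect the ranges as (min, max) pairs with foldLeft and format them once at the end. The name outputFold is hypothetical. It keeps the same y - max <= 1 test, so it also assumes sorted input.

```scala
def rangeToString(a: Int, b: Int): String = if (a == b) s"$a" else s"$a-$b"

def outputFold(xs: Seq[Int]): Seq[String] =
  xs.foldLeft(Vector.empty[(Int, Int)]) {
    // y continues the last range: replace that range with a widened copy
    case (done :+ last, y) if y - last._2 <= 1 => done :+ ((last._1, y))
    // no range yet, or a gap: start a new single-element range
    case (acc, y) => acc :+ ((y, y))
  }.map { case (lo, hi) => rangeToString(lo, hi) } // format each range exactly once

println(outputFold(Seq(108, 109, 110, 112, 114, 115, 116, 118)))
// Vector(108-110, 112, 114-116, 118)
```

A side effect of starting from an empty accumulator is that empty input returns an empty result instead of failing on xs.head.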