I am working on compression algorithm for which I want to replace all consecutive number with its mathematical form which is not logical mathematically but my algorithm will know and convert it to original form.
Suppose I have string:
string input = "732183900000000000002389288888888888888";
Did you see it have 0000000000 and 8888888888888 are major consecutive duplicates.
And now I want to convert those to:
//convert 000000000 to 0*9. Means 9 times 0.
//convert 888888888 to 8*9. Means 8 times 0.
string output = "7321839" +
"0*13" +
"23892" +
"8*14";
//or
string output = "7321839-0*13-23892-8*14";
Points to consider:
Any language that works on windows will be accepted. For me main thing is algorithm.
Please keep performance in mind as it would be used for big files.
Regex might be a bit convoluted for this given the rules for dashes (although not impossible by any means),
Seemingly, you want the following
Here is a fairly efficient C# O(n) implementation with StringBuilder
, which inurn should allow you to work with exceedingly large strings with minimal allocations
Given
public static string Shorten(string value)
{
var sb = new StringBuilder(value.Length);
int i, last;
var isLastGroup = false;
void Write()
{
var isGroup = i - last > 1;
var getDash = last == 0 || isLastGroup ? "" : "-";
sb.Append(isGroup ? $"{getDash}{value[last]}*{i - last}{(i != value.Length ? "-" : "")}" : value[last].ToString());
isLastGroup = isGroup;
last = i;
}
for (i = 0, last = 0; i < value.Length; i++)
if (value[last] != value[i])
Write();
Write();
return sb.ToString();
}
Tests
Console.WriteLine(Shorten("1"));
Console.WriteLine(Shorten("111"));
Console.WriteLine(Shorten("1112"));
Console.WriteLine(Shorten("1222"));
Console.WriteLine(Shorten("12233344445555512345"));
Results
1
13
13-2
1-23
1-22-33-44-5*5-12345