Search code examples
algorithmcompressionnumber-formatting

Replace consecutive numbers with mathematical form


I am working on compression algorithm for which I want to replace all consecutive number with its mathematical form which is not logical mathematically but my algorithm will know and convert it to original form.
Suppose I have string:

string input = "732183900000000000002389288888888888888";

Did you see it have 0000000000 and 8888888888888 are major consecutive duplicates.
And now I want to convert those to:

//convert 000000000 to 0*9. Means 9 times 0.
//convert 888888888 to 8*9. Means 8 times 0.
string output = "7321839" +
                "0*13"    +
                "23892"   +
                "8*14";
//or
string output = "7321839-0*13-23892-8*14";

Points to consider:
Any language that works on windows will be accepted. For me main thing is algorithm.
Please keep performance in mind as it would be used for big files.


Solution

  • Regex might be a bit convoluted for this given the rules for dashes (although not impossible by any means),

    Seemingly, you want the following

    1. Groups of the same number greater than the count of 1
    2. No prefix dash
    3. No suffix dash
    4. No double dashes (speculative)

    Here is a fairly efficient C# O(n) implementation with StringBuilder, which inurn should allow you to work with exceedingly large strings with minimal allocations

    Given

    public static string Shorten(string value)
    {
        var sb = new StringBuilder(value.Length);
        int i, last;
        var isLastGroup = false;
    
        void Write()
        {
            var isGroup = i - last > 1;
            var getDash = last == 0 || isLastGroup ? "" : "-";
            sb.Append(isGroup ? $"{getDash}{value[last]}*{i - last}{(i != value.Length ? "-" : "")}" : value[last].ToString());
            isLastGroup = isGroup;
            last = i;
        }
    
        for (i = 0, last = 0; i < value.Length; i++)
            if (value[last] != value[i])
                Write();
    
        Write();
    
        return sb.ToString();
    }
    

    Tests

    Console.WriteLine(Shorten("1"));
    Console.WriteLine(Shorten("111"));
    Console.WriteLine(Shorten("1112"));
    Console.WriteLine(Shorten("1222"));
    Console.WriteLine(Shorten("12233344445555512345"));
    

    Results

    1
    13
    1
    3-2
    1-23
    1-2
    2-33-44-5*5-12345

    Full Demo Here