Theoretically, how much can you compress this 256-byte string containing only "F" and "G"?
FGFFFFFFGFFFFGGGGGGGGGGGGGFFFFFGGGGGGGGGGGGFFGFGGGFFFGGGGGGGGFFFFFFFFFFFFFFFFFFFFFGGGGGGFFFGFGGFGFFFFGFFGFGGFFFGFGGFGFFFGFGGGGFGGGGGGGGGFFFFFFFFGGGGGGGFFFFFGFFGGGGGGGFFFGGGFFGGGGGGFFGGGGGGGGGFFGFFGFGFFGFFGFFFFGGGGFGGFGGGFFFGGGFFFGGGFFGGFFGGGGFFGFGGFFFGFGGF
While I don't see a real-world application, it is intriguing that compression algorithms like gzip, bzip2, and deflate are at a disadvantage in this case.
Well, I have this answer, and the C# code below demonstrates it:
using System;

public class Program
{
    public static void Main()
    {
        string testCase = "FGFFFFFFGFFFFGGGGGGGGGGGGGFFFFFGGGGGGGGGGGGFFGFGGGFFFGGGGGGGGFFFFFFFFFFFFFFFFFFFFFGGGGGGFFFGFGGFGFFFFGFFGFGGFFFGFGGFGFFFGFGGGGFGGGGGGGGGFFFFFFFFGGGGGGGFFFFFGFFGGGGGGGFFFGGGFFGGGGGGFFGGGGGGGGGFFGFFGFGFFGFFGFFFFGGGGFGGFGGGFFFGGGFFFGGGFFGGFFGGGGFFGFGGFFFGFGGF";
        uint[] G = new uint[8]; // 256 bits, one bit per character

        // Pack: 'F' (0x46) has lowest bit 0 and 'G' (0x47) has lowest bit 1,
        // so the lowest bit of each character encodes the symbol.
        for (int i = 0; i < testCase.Length; i++)
            G[i / 32] |= (uint)(testCase[i] & 1) << (i % 32);

        for (int i = 0; i < 8; i++)
            Console.WriteLine(G[i]);

        // Unpack: OR each stored bit back onto 'F' (70) to recover 'F' or 'G'.
        //G 71 0100 0111
        //F 70 0100 0110
        string gTestCase = string.Empty;
        for (int i = 0; i < 256; i++)
            gTestCase += (char)(((G[i / 32] >> (i % 32)) & 1) | 70);

        Console.WriteLine(testCase);
        Console.WriteLine(gTestCase);
        if (testCase == gTestCase)
            Console.WriteLine("OK.");
    }
}
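As a rough check of the "theoretically" part: if we model the string as 256 independent draws of 'F' or 'G' (a simplifying assumption of mine, not something the question states), the zeroth-order Shannon entropy tells us how far below 256 bits we could hope to go. A minimal sketch:

using System;

public static class EntropyEstimate
{
    // Zeroth-order Shannon entropy estimate, in bits, for a two-symbol string.
    public static double BitsNeeded(string s)
    {
        int gCount = 0;
        foreach (char ch in s)
            if (ch == 'G') gCount++;

        double p = (double)gCount / s.Length;   // frequency of 'G'
        if (p == 0.0 || p == 1.0) return 0.0;   // a constant string carries no information

        double bitsPerSymbol = -p * Math.Log(p, 2) - (1 - p) * Math.Log(1 - p, 2);
        return s.Length * bitsPerSymbol;
    }
}

When the 'F'/'G' counts are roughly balanced, BitsNeeded(testCase) comes out close to 256, so under this simple model the bit-packing above is already near the limit; only exploiting longer-range structure (if any exists) could beat it.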
It may sound silly, but I have the following idea for improving the algorithm so that this 256-bit number can be compressed even further:
(Note: the points below are separate topics of discussion, but all related to compressing the 256-byte string further.)
From my understanding of Microsoft's implementation of Decimal, 96-bit + 96-bit = 128-bit decimal. This implies that a 192-byte string containing any two distinct characters can be encoded as a 128-bit number instead of a 192-bit number. Correct?
My questions are:
Can I do the same with 256-byte strings (by splitting each of them into a pair of two numbers before adding those two as a Decimal shorter than 256 bits)?
How do I decode the above-mentioned 128-bit Decimal back to a pair of 96-bit numbers, while keeping the compressed data smaller than 192 bits?
Sorry for my previous rather vague question.
The following code demonstrates how to add two 96-char "binary" strings into one 128-char binary string.
public static string AddBinary(string a, string b) // up-to-96-char binary strings
{
    int[] x = { 0, 0, 0 };
    int[] y = { 0, 0, 0 };
    string c = String.Empty;

    // Parse each binary string (most significant bit first) into three 32-bit words.
    for (int z = 0; z < a.Length; z++)
        x[z / 32] |= ((byte)a[a.Length - z - 1] & 1) << (z % 32);
    for (int z = 0; z < b.Length; z++)
        y[z / 32] |= ((byte)b[b.Length - z - 1] & 1) << (z % 32);

    decimal m = new decimal(x[0], x[1], x[2], false, 0); // 96-bit mantissa, positive, scale 0
    decimal n = new decimal(y[0], y[1], y[2], false, 0); // 96-bit mantissa, positive, scale 0
    decimal k = decimal.Add(m, n);
    int[] l = decimal.GetBits(k); // 128 bits: mantissa lo/mid/hi + flags word
    Console.WriteLine(k);

    // Render all 128 bits, most significant first; shift as unsigned so that a set
    // top bit of an int word comes out as '1' rather than a bogus character.
    for (int z = 127; z >= 0; z--)
        c += (char)((((uint)l[z / 32] >> (z % 32)) & 1) | 48);

    return c.Contains("1") ? c.TrimStart('0') : "0";
}
"96-bit + 96-bit = 128-bit decimal."
That is a misunderstanding. A Decimal is a 96-bit integer mantissa, a sign, and an exponent from 0 to 28 (about 5 bits) that forms a scaling factor for the mantissa.
Addition goes from 2 × (1 + 5 + 96) bits to 1 × (1 + 5 + 96) bits, with inevitable rounding errors and overflow.
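To see that layout concretely, the four 32-bit words returned by decimal.GetBits can be dumped directly: the first three hold the 96-bit mantissa, and the fourth packs the scale in bits 16-23 and the sign in bit 31. A minimal sketch (the DumpDecimal helper is just an illustration):

using System;

public static class DecimalLayout
{
    public static void DumpDecimal(decimal d)
    {
        int[] bits = decimal.GetBits(d);     // lo, mid, hi, flags
        int scale = (bits[3] >> 16) & 0xFF;  // exponent, 0..28
        bool negative = (bits[3] & int.MinValue) != 0;
        Console.WriteLine($"{d}: mantissa = {bits[2]:X8} {bits[1]:X8} {bits[0]:X8}, scale = {scale}, negative = {negative}");
    }
}

// DumpDecimal(1.5m) -> mantissa ...0000000F, scale = 1 (i.e. 15 / 10^1)
// DumpDecimal(-8m)  -> mantissa ...00000008, scale = 0, negative = True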
You can't easily get the summands back from a sum. For starters, addition is symmetric, so there is no earthly way of knowing which of the two summands came first and which came second; beyond that, many different pairs produce the same sum (2 + 6 and 3 + 5 both give 8).
Paul Hankin mentioned the programmer's variant of compressibility: Kolmogorov complexity.
In all fairness, you'd have to add to the 256 bits of your re-encoding the size of a program that turns those bits back into the original string.
(As would gzip, bzip2, and deflate (and LZW); decoders for "pure LZ" can be very small. The usual escape is to define a file format, including a recognisable header.)
Lasse V. Karlsen mentioned one consequence of the pigeonhole principle: to tell each combination of 192 bits from every other one, you need no fewer than 2^192 codes.
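To make that counting argument concrete, a small sketch of my own with BigInteger: there are 2^0 + 2^1 + ... + 2^191 = 2^192 - 1 bit strings strictly shorter than 192 bits, one fewer than the 2^192 possible inputs, so any scheme that shortens every 192-bit input would have to map two different inputs to the same code.

using System;
using System.Numerics;

public static class PigeonholeCheck
{
    public static void Main()
    {
        BigInteger inputs = BigInteger.Pow(2, 192);           // distinct 192-bit strings
        BigInteger shorterCodes = BigInteger.Pow(2, 192) - 1; // all strings of length 0..191
        Console.WriteLine(inputs > shorterCodes);             // True: not enough shorter codes to go around
    }
}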