Tags: algorithm, math, compression, theory, sequences

Compression methods


I know most compression methods rely on repetition in the data in order to be effective. For example the string "AAAAAaaaQWERTY" can be represented as "5A3aQWERTY" losslessly, or as something like "8aqwerty" lossily (these are just illustrations, not actual working methods). As far as I know, all compression algorithms count on repeats of *constant* strings of characters.
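For concreteness, a minimal run-length encoder in the spirit of the "5A3a" scheme above might look like this (a sketch, not a real codec; the `min_run` threshold is an arbitrary choice made here):

```python
from itertools import groupby

def rle(s, min_run=3):
    """Replace runs of >= min_run identical characters with
    '<count><char>'; leave shorter runs verbatim."""
    out = []
    for ch, grp in groupby(s):
        n = len(list(grp))
        out.append(f"{n}{ch}" if n >= min_run else ch * n)
    return "".join(out)

print(rle("AAAAAaaaQWERTY"))   # → 5A3aQWERTY
```

Note that it only pays off when the input actually contains long runs; "QWERTY" passes through unchanged.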

Here comes the problem with the string "abcdefghijklmnopqrstuvwxyz". Here nothing repeats, but as you can see the information in the string can be represented far more compactly: in regex-like notation it would be "[a-z]", or maybe "for(x=0;x<26;++x){ascii(97+x)}".

Consider also the string "0149162536496481100121" - it can be represented by "for(x=0;x<12;++x){x*x}".

The string "ABEJQZer" can be represented by "for(x=0;x<8;++x){ascii(65+x*x)}".
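The three pseudocode loops translate directly to runnable Python, with the loop bounds needed to reproduce each string exactly:

```python
# Tiny "decompressors": each one-liner regenerates an example string
# from a formula instead of stored characters.

alphabet = "".join(chr(97 + x) for x in range(26))     # ascii(97) = 'a'
squares  = "".join(str(x * x) for x in range(12))      # 0, 1, 4, ..., 121
letters  = "".join(chr(65 + x * x) for x in range(8))  # ascii(65) = 'A'

print(alphabet)   # → abcdefghijklmnopqrstuvwxyz
print(squares)    # → 0149162536496481100121
print(letters)    # → ABEJQZer
```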

The last two are examples of knowing an algorithm that can reproduce the original string. I know that in general (efficient) algorithms take far less space than the data they can produce.

Like SVG images (which contain only drawing instructions in the file), where the file is often smaller than the equivalent JPEG.

My question is: is there a compression method that takes the data and tries to find efficient algorithms which can represent it? Like vectorizing a raster image (as http://vectormagic.com/ does), but working with other kinds of data too.

Consider audio data (it can be compressed lossily) - some audio editors (Audacity for example) have project files containing information like "generate a 120 Hz constant frequency with 0.8 amplitude from time 0 to 2 minutes 45.6 seconds" (Audacity stores this in XML format). This metadata takes very little memory, and when the project is exported to WAV or MP3, the program "renders" the information into actual samples in the exported format.
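As a sketch (not Audacity's actual code or file format - the function name and default sample rate here are just assumptions for illustration), "rendering" such a parametric description into samples takes only a few lines:

```python
import math

def render_tone(freq_hz, amplitude, duration_s, sample_rate=44100):
    """Render a constant-frequency sine tone to raw float samples,
    the way an editor turns 'generate 120 Hz at amplitude 0.8'
    metadata into actual audio data on export."""
    n = int(duration_s * sample_rate)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]

# A one-line description expands into tens of thousands of samples.
samples = render_tone(120.0, 0.8, 2.0)
print(len(samples))   # → 88200
```

The asymmetry is the point: the description is a handful of bytes, while the rendered output grows linearly with duration.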

In that case the compressor would reverse the rendering process: take the WAV or MP3 file, figure out what algorithms can represent the samples (if it's lossy, the algorithms need only produce some approximation of the samples - like vectormagic.com approximates the image), and produce a compressed file.

I understand that compression time would be unbelievably long, but do such (or similar) compression algorithms exist?


Solution

  • All compression methods are like that: the output is a set of parameters for a fixed set of algorithms that renders the input, or something similar to the input.

    For example, the MP3 audio codec breaks the input into blocks of 576 samples, converts each block into frequency-amplitude space, and prunes what a human being cannot hear. The output is equivalent to "during the next 13 milliseconds, play frequencies x, y, z with amplitudes a, b, c". This works well for audio data, and the similar approach used in JPEG works well for photographic images.

    Similar methods can be applied to the cases you mention. Sequences such as 987654 or 010409162536, for example, are successive values of polynomials, and can be represented by the coefficients of those polynomials: the first one as (9, -1) for 9-x, and the second one as (1, 2, 1) for 1+2x+x².

    The choice of algorithm(s) used to model the input tends to be fixed for simplicity, and tailored to the use case. For example, if you are processing photographic images taken with a digital camera, there is little point in even attempting to produce vectorial output.
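To make the frequency-amplitude idea concrete, here is a toy analysis of one block of samples using a naive DFT. Real codecs use an MDCT/FFT plus psychoacoustic pruning; the block size, sample rate, and test tone here are only illustrative:

```python
import cmath, math

N = 576        # MP3-style block size
rate = 48000   # samples per second
# One block of a pure 1000 Hz tone at amplitude 0.5.
samples = [0.5 * math.sin(2 * math.pi * 1000 * i / rate) for i in range(N)]

# Naive O(N^2) DFT over the meaningful half of the spectrum.
spectrum = [sum(samples[n] * cmath.exp(-2j * math.pi * k * n / N)
                for n in range(N)) for k in range(N // 2)]

# The whole block reduces to one dominant frequency-amplitude pair.
k = max(range(N // 2), key=lambda i: abs(spectrum[i]))
print(k, round(k * rate / N))   # → 12 1000
```

576 real samples collapse to "one component at 1000 Hz", which is why this representation compresses tonal audio so well.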
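The polynomial representation of sequences can also be recovered mechanically, e.g. by solving the Vandermonde system exactly with rational arithmetic (a sketch; `fit_poly` is a name invented here, not a library function):

```python
from fractions import Fraction

def fit_poly(seq):
    """Find the lowest-degree polynomial p with p(x) = seq[x] for
    x = 0..n-1.  Returns coefficients [c0, c1, ...] of
    c0 + c1*x + c2*x^2 + ... as exact Fractions."""
    n = len(seq)
    # Augmented Vandermonde matrix: rows [x^0, x^1, ..., x^(n-1) | seq[x]].
    m = [[Fraction(x) ** j for j in range(n)] + [Fraction(seq[x])]
         for x in range(n)]
    # Gauss-Jordan elimination in exact arithmetic.
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    coeffs = [m[i][n] / m[i][i] for i in range(n)]
    while len(coeffs) > 1 and coeffs[-1] == 0:
        coeffs.pop()   # drop trailing zeros -> minimal degree
    return coeffs

print(fit_poly([9, 8, 7, 6, 5, 4]))      # coefficients of 9 - x
print(fit_poly([1, 4, 9, 16, 25, 36]))   # coefficients of 1 + 2x + x^2
```

Six digits of input become two or three small coefficients, matching the (9, -1) and (1, 2, 1) representations above.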