I've seen some programs showing amazing, highly detailed 3D scenes with soundtracks, but what shocked me is that they are all smaller than 64 kB! How do these programs work?
They generate their content procedurally, i.e. they don't ship 3D models, bitmaps, or sample-based audio files, but generate them from code or from some low-detail representation.
Using self-similarity (fractals) and building complex data by combining simple building blocks and formulas is usually the key to a compact representation.
The audio could be stored in some MIDI-like format where only the notes are stored, and the actual sound is synthesized at runtime.
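To make the note-list idea concrete, here is a minimal sketch of how a tiny score could be turned into samples. The score, the sample rate, and the function names are all made up for illustration; real demos use far more elaborate software synthesizers, so treat this only as the basic principle.

```python
import math

SAMPLE_RATE = 8000  # assumed sample rate in samples per second

# Hypothetical score: (MIDI note number, duration in seconds)
SCORE = [(69, 0.25), (72, 0.25), (76, 0.5)]

def note_frequency(midi_note):
    """Convert a MIDI note number to Hz (note 69 is A4 = 440 Hz)."""
    return 440.0 * 2.0 ** ((midi_note - 69) / 12.0)

def render(score, sample_rate=SAMPLE_RATE):
    """Render the score to a list of float samples in [-1, 1]."""
    samples = []
    for midi_note, duration in score:
        freq = note_frequency(midi_note)
        count = int(duration * sample_rate)
        for i in range(count):
            t = i / sample_rate
            envelope = 1.0 - i / count  # simple linear fade-out per note
            samples.append(envelope * math.sin(2.0 * math.pi * freq * t))
    return samples

audio = render(SCORE)
```

The three tuples in `SCORE` are the only "music data" stored; the one second of audio is computed from them.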
The textures are generated by combining filters, fractals, and so on. Google for "Perlin noise" for a simple example; it shows how to create very different textures from the same noise function.
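As a rough illustration of noise-based textures, the sketch below sums several octaves of smoothed lattice noise. Note that this is value noise, a simpler relative of Perlin noise, not Perlin's gradient noise itself; the hash constants, function names, and parameters are arbitrary choices for this example.

```python
import math

def lattice(ix, iy, seed=0):
    """Deterministic pseudo-random value in [0, 1) for an integer grid point."""
    h = (ix * 374761393 + iy * 668265263 + seed * 1442695041) & 0xFFFFFFFF
    h = ((h ^ (h >> 13)) * 1274126177) & 0xFFFFFFFF
    return ((h ^ (h >> 16)) & 0xFFFFFFFF) / 2**32

def smooth(t):
    """Smoothstep curve so the interpolation has no visible grid creases."""
    return t * t * (3.0 - 2.0 * t)

def value_noise(x, y, seed=0):
    """Bilinearly interpolate the lattice values around (x, y)."""
    ix, iy = math.floor(x), math.floor(y)
    fx, fy = smooth(x - ix), smooth(y - iy)
    v00, v10 = lattice(ix, iy, seed), lattice(ix + 1, iy, seed)
    v01, v11 = lattice(ix, iy + 1, seed), lattice(ix + 1, iy + 1, seed)
    top = v00 + fx * (v10 - v00)
    bottom = v01 + fx * (v11 - v01)
    return top + fy * (bottom - top)

def fractal_noise(x, y, octaves=4, seed=0):
    """Sum octaves: each one has double the frequency and half the amplitude."""
    total, norm, amplitude, frequency = 0.0, 0.0, 1.0, 1.0
    for octave in range(octaves):
        total += amplitude * value_noise(x * frequency, y * frequency, seed + octave)
        norm += amplitude
        amplitude *= 0.5
        frequency *= 2.0
    return total / norm  # normalized back into [0, 1)

def make_texture(size=32, scale=4.0):
    """Grayscale texture as a size x size grid of floats in [0, 1)."""
    return [[fractal_noise(x / size * scale, y / size * scale)
             for x in range(size)] for y in range(size)]

texture = make_texture()
```

Changing `octaves`, `scale`, or the seed, or feeding the result through other functions (thresholds, sine waves, color ramps), gives very different looking textures from the same few lines.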
3D models probably have some geometric description using formulas, and the detail is added with techniques similar to procedural textures.
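For instance, a whole mesh can come out of a parametric formula. The following is only a sketch using a torus as the shape; the function name and default parameters are invented here, and actual demos may describe their geometry quite differently.

```python
import math

def torus_mesh(major_radius=1.0, minor_radius=0.3, rings=16, sides=8):
    """Generate torus vertices and triangle indices from the parametric formula."""
    vertices = []
    for i in range(rings):
        u = 2.0 * math.pi * i / rings          # angle around the main ring
        for j in range(sides):
            v = 2.0 * math.pi * j / sides      # angle around the tube
            r = major_radius + minor_radius * math.cos(v)
            vertices.append((r * math.cos(u), r * math.sin(u),
                             minor_radius * math.sin(v)))

    triangles = []
    for i in range(rings):
        next_i = (i + 1) % rings               # wrap around to close the ring
        for j in range(sides):
            next_j = (j + 1) % sides           # wrap around to close the tube
            a = i * sides + j
            b = next_i * sides + j
            c = next_i * sides + next_j
            d = i * sides + next_j
            triangles.append((a, b, c))        # each quad becomes two triangles
            triangles.append((a, c, d))
    return vertices, triangles

vertices, triangles = torus_mesh()
```

Four numbers go in, and 128 vertices and 256 triangles come out; raising `rings` and `sides` increases the detail without storing any extra data.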
And most use some runtime unpacker, i.e. the normal executable is larger than the limit and gets compressed with an exe packer. Demos usually don't use UPX, but specialized packers which have a very small loader/unpacker and might even leak memory (who cares about memory leaks if you can save a few bytes?).
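The packing principle can be imitated in a few lines, with the big caveat that this is only an analogy: a real exe packer compresses machine code and ships a tiny native decompression stub, while this sketch uses Python's standard `zlib` module and a made-up stand-in program.

```python
import zlib

# Stand-in for the "real" program: some source text with lots of redundancy.
PROGRAM_SOURCE = ("result = sum(i * i for i in range(10))\n"
                  + "# padding that compresses well\n" * 50)

def pack(source):
    """Compress the program; this is what would be stored in the file."""
    return zlib.compress(source.encode("utf-8"), 9)

def load_and_run(packed):
    """The 'loader': decompress in memory, then execute the result."""
    source = zlib.decompress(packed).decode("utf-8")
    namespace = {}
    exec(source, namespace)
    return namespace

packed = pack(PROGRAM_SOURCE)
namespace = load_and_run(packed)
print(len(PROGRAM_SOURCE), "bytes unpacked,", len(packed), "bytes packed")
```

Only `packed` plus the loader would need to fit within the size limit; the full program exists only in memory after unpacking.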