Back to General and Gameplay Programming

MP3-Beating Compression

kieren_j · 2000-05-06T13:18:55

You probably don''t believe me, but if you''re at all interested in my new "CAR" compression alogrithm, check this out: The strange thing is, it works better on compressed files! Zipping an MP3 file gives you 99% of original, but check this out! **** TESTS ON UNCOMPRESSED FILES **** TXT File Example TXT File: 1,318,671 Savings: 1,308,940 CAR File: 9,731 Percent: 0.7% WAV File Example WAV File: 8,362,354 Savings: 8,323,477 CAR File: 38,877 Percent: 0.5% EXE File Example EXE File: 216,064 Savings: 213,336 CAR File: 2,728 Percent: 1.3% **** TESTS ON ALREADY-COMPRESSED FILES **** MP3 File Example MP3 File: 4,961,773 Savings: 4,945,669 CAR File: 16,104 Percent: 0.3% MPG File Example MPG File: 5,976,068 Savings: 5,946,909 CAR File: 29,159 Percent: 0.5% If you didn''t see it first time, I compressed an MP3 file from 5 meg to 16kb. What CAR actually does is obviously a complete secret, but I''m really really excited about it! I''ve been thinking of how to do it for years - but now, yay! (I figured it out playing around in QB, of all things!). What I want to know is basically are there any sites that are relatively easy to understand that tell you how to do: Huffman Compression LZW Compression "Textbook" RLE Compression (I only know PCX''s RLE) I know that you use binary trees and nodes and so on but I have no idea for a software implementation! Anyways you probably don''t believe me, but I just wanna try to make the compression better. Thanks from a very very excited Kieren Johnstone --------------- kieren_j

General and Gameplay Programming Programming

Started by kieren_j April 06, 2000 01:58 PM

494 comments, last by kieren_j 24 years, 8 months ago

LackOfKnack

122

April 10, 2000 05:42 AM

Not yet. I''m 50-50 on this one. As said before, 3b was a gross exaggeration, and 400k was closer to reality. I believe that it may be possible to rearrange things and look up patterns from a huge number of them to achieve these numbers, but if it is possible, the compression time and program size do not seem right. You can fit more than the mathematical theories say, but only if the compressor itself contains many megabytes of patterns, and if it had to process all of them, the program would go a lot slower than (s)he claimed.

So, I think it is possible, but not at those speeds. Of if you have those speeds, it is not possible. You can''t have both, in my opinion.

Lack

Christianity, Creation, metric, Dvorak, and BeOS for all!

Lack
Christianity, Creation, metric, Dvorak, and BeOS for all!

osmanb

2,082

April 10, 2000 09:40 AM

No, you're not thinking the pattern idea through completely (I've been down this road before... when I was still very ignorant on the subject.) OK, imagine that he has some lookup table of "big" patterns with some small index. The index contains M bits, and the patterns themselves contain N bits, where N >> M.

OK, you can only represent 2^M patterns, which is astronomically smaller than 2^N. Lookup tables don't buy you anything. To represent all 2^N patterns of some size, you lookup table must contain 2^N entries, which means that your index must have N bits... You're now using the same amount of information. If you plan on not having all 2^N patterns in your table, then you need some way of knowing which patterns are there:

1) They're implicit in the program. OK... this will only work on certain (types of) files. And will work on a relatively small percentage of files. On the files that can traditionally not be compressed (ie near random data) you're still not going to make any progress, because you hae no way of intelligently choosing which patterns to place in the table. And with the random data, any pattern of N bits is equally likely, so any guess you make is just as likely to be wrong as right.

2) Store the table in the compressed file. From the file sizes he's quoting I don't think this is what he's doing, and it wouldn't really work either. Unless the patterns are repeated very often, and are large (which is incredibly unlikely) you're going to waste more space in overhead than you'll gain otherwise. Besides, normal compression algorithms already can do wonders on files with those properties.

Finally, you've claimed that the limits of information theory can be exceeded? (Sorry, the reply screen doesn't have your message, so I'm going from memory). Hogwash. If you're referring to the fact that data compression is possible at all, then you're sorely misunderstanding the concept. I dare you to try the following experiment (note, not a good idea, unless you have near infinite time and storage):

1) Create a cmoplete set of files, all of a fixed length (say 1kb). The set should enumerate every bit pattern 1kb in length.

2) Run every one of those files through any compression program, and check the average size of the output files.

Like I said, this is an infeasible experiment, but you can simulate it fairly well by using a large number of randomly generated files in step 1. No compression program will perform well. I've never even tried this, but I'm willing to make that statement.

When you get down to it, you need N bits to store N bits of information. The only reason data compression works at all is because we deal in data filled with patterns and a general lack of randomness.

Must I make my plea again? Don't encourage him. (Although I am starting to enjoy lecturing on compression and information theory...)

-Brian

Edited by - osmanb on 4/10/00 9:42:58 AM

Harvester

122

April 10, 2000 09:53 AM

well.... have you guys checked out the latest microsoft multimedia compression system???? Well... it just beats out everything i''ve seen so far!

I don''t remember its name, but i recall something like
''MS Jump start''.

If i recall well, it can save a wav file that is 60MB, in some kbytes!!!!

I remember i saw on the CD (at work), a video clip (complete song), that was compressed for high quality, and its size was some Kbytes (again!!!)

I couldn''t beleive it, but.... its true! I dunno how they did it, but they simply did it.

Pretty soon you''ll see it around the web. It was developed for making the web more interactive...

... LEMMINGS ... LEMMINGS ... LEMMINGS ... LEM..... SpLaSh!...Could this be what we stand like before the mighty One?Are we LeMmIngS or WhAt!? ;)

Loqi

122

April 10, 2000 10:46 AM

Bah! Who needs math, and rock-solid logic? We have impossible compression theories, and a bunch of people so carried away with the idea that they ignore all common sense!

Ridcully

122

April 10, 2000 10:56 AM

another thing that one might say is that there ARE compressions that work better than they should be able to do.
MP3 and JPEG for example and if it''s true this microsoft multimedia compression.
but that is only because of one thing: these algorithms use the fact that we as humans don''t see/hear the difference of certain multimedia data that has a loss of quality.
these type of compression doesn''t actually COMPRESS but it uses certain things that are unique to the format it compresses. for example MP3 removes sounds that are so low we can''t actually hear them
all of these things have one in common, they LOOSE data. you can''t revert from a 3MB MP3 to the original 30MB WAV.
and this is what instantly made me not believe kieren_j, he claimes his general purpose packer would be better than mp3. and if you think about it that can''t be true

Pythius

122

April 10, 2000 11:33 AM

Actually, JPEG and MP3 *DO* compress the data, but you are right that they rely on the fact that humans can''t notice the difference to a certain degree. Each requires this, because it limits how much they actually need to compress (meaning fewer bits per symbol). JPEG first breaks the RGB into Chrominance/Luminance. Since we don''t notice the difference in luminance as much, they first of all take this down to 1/4 the size of the actual image and stretch it at a loss of quality. Then it uses a DCT (discrete cosine transform) to get the symbols in a proper order in blocks of 8x8. This is compiled in a table, which is then passed into an RLE compressor, since that is optimal on these tables. Believe me - the JPEG is very much compression, but it still is based off the fact that we as humans don''t notice. MPEG and such use the same principles, that we can only notice a small difference of frequency at a time, thus allowing us to use a single bit to represent either an up or down motion along the frequency wave. Then this is compressed. Both use compression to the fullest.

Pythius

"The object of war is not to die for your country, but to make the other bastard die for his"

Muzzafarath

146

April 10, 2000 12:46 PM

Isn''t that called "destructive compression"?

/. Muzzafarath

I'm reminded of the day my daughter came in, looked over my shoulder at some Perl 4 code, and said, "What is that, swearing?" - Larry Wall

Anonymous

April 10, 2000 01:56 PM

Okay Kieren,

I''m not planning on bashing what you may or may not have figured out *but* the method you posted will make absolutely no difference to the suitability for compression of your data in the vast majority of cases.

Now there are situations when it would help. A text file filled with the same character or short sequence of characters for instance would suddenly be very well suited to a form of bit level compression.

However most binary files do not have a common repeating pattern in them. This is why lossless compression doesn''t tend to work well on EXE files and why repeatedly Zipping files is a waste of time. Your published step which involved rearranging bit orders would be a complete waste of time here. Many binary files will have seemingly random bit orders. This being the case where is the wonderful piece of code that reduced an EXE file by a factor of approx. 100. That alone would be amazing but it has nothing to do with the stuff you posted.

Ralph

LackOfKnack

122

April 10, 2000 04:06 PM

Okay, I was a little stressed when I wrote that, but I still stand solidly on my 50-50 indecision

. The more patterns you store within the compressor, the smaller you can make the file but the longer it takes, to a point. Then, like you said, you get to where N bytes needs N bytes just to describe itself. However, if you can rearrange the bits so they''re more compressable, that''d be small and fast, but you aren''t actually compressing the data yet.

Someone said in sarcasm that if you moved all the ones to the beginning of the file, yeah, it''d compress real good. But say you had a set of 32 patterns, and you masked them over the bits and then chose the one for that file that caused the random bits to be arranged into more compressable patterns, you''d need only the first 5 bits of the file to say which of the 32 patterns you need to reverse-overlay, and then go to it. If ZIP is lossless, then this works perfectly under most circumstances.

So, I think he may be on to something, but he may not. I won''t say either way. The ''amazing'' part that you think is impossible has little to do with information theory, since he''s not actually compressing anything.

Excuse me if I sound like an idiot.

Lack

Christianity, Creation, metric, Dvorak, and BeOS for all!

Lack
Christianity, Creation, metric, Dvorak, and BeOS for all!

abe_bcs

122

April 11, 2000 02:46 AM

Can you post an SFX with CAR compressed an uncompressed bitmaps somewhere?

/ // / |< <-

MP3-Beating Compression

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

MP3-Beating Compression

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

Reticulating splines