Not yet. I''m 50-50 on this one. As said before, 3b was a gross exaggeration, and 400k was closer to reality. I believe that it may be possible to rearrange things and look up patterns from a huge number of them to achieve these numbers, but if it is possible, the compression time and program size do not seem right. You can fit more than the mathematical theories say, but only if the compressor itself contains many megabytes of patterns, and if it had to process all of them, the program would go a lot slower than (s)he claimed.
So, I think it is possible, but not at those speeds. Of if you have those speeds, it is not possible. You can''t have both, in my opinion.
Lack
Christianity, Creation, metric, Dvorak, and BeOS for all!
MP3-Beating Compression
No, you're not thinking the pattern idea through completely (I've been down this road before... when I was still very ignorant on the subject.) OK, imagine that he has some lookup table of "big" patterns with some small index. The index contains M bits, and the patterns themselves contain N bits, where N >> M.
OK, you can only represent 2^M patterns, which is astronomically smaller than 2^N. Lookup tables don't buy you anything. To represent all 2^N patterns of some size, you lookup table must contain 2^N entries, which means that your index must have N bits... You're now using the same amount of information. If you plan on not having all 2^N patterns in your table, then you need some way of knowing which patterns are there:
1) They're implicit in the program. OK... this will only work on certain (types of) files. And will work on a relatively small percentage of files. On the files that can traditionally not be compressed (ie near random data) you're still not going to make any progress, because you hae no way of intelligently choosing which patterns to place in the table. And with the random data, any pattern of N bits is equally likely, so any guess you make is just as likely to be wrong as right.
2) Store the table in the compressed file. From the file sizes he's quoting I don't think this is what he's doing, and it wouldn't really work either. Unless the patterns are repeated very often, and are large (which is incredibly unlikely) you're going to waste more space in overhead than you'll gain otherwise. Besides, normal compression algorithms already can do wonders on files with those properties.
Finally, you've claimed that the limits of information theory can be exceeded? (Sorry, the reply screen doesn't have your message, so I'm going from memory). Hogwash. If you're referring to the fact that data compression is possible at all, then you're sorely misunderstanding the concept. I dare you to try the following experiment (note, not a good idea, unless you have near infinite time and storage):
1) Create a cmoplete set of files, all of a fixed length (say 1kb). The set should enumerate every bit pattern 1kb in length.
2) Run every one of those files through any compression program, and check the average size of the output files.
Like I said, this is an infeasible experiment, but you can simulate it fairly well by using a large number of randomly generated files in step 1. No compression program will perform well. I've never even tried this, but I'm willing to make that statement.
When you get down to it, you need N bits to store N bits of information. The only reason data compression works at all is because we deal in data filled with patterns and a general lack of randomness.
Must I make my plea again? Don't encourage him. (Although I am starting to enjoy lecturing on compression and information theory...)
-Brian
Edited by - osmanb on 4/10/00 9:42:58 AM
OK, you can only represent 2^M patterns, which is astronomically smaller than 2^N. Lookup tables don't buy you anything. To represent all 2^N patterns of some size, you lookup table must contain 2^N entries, which means that your index must have N bits... You're now using the same amount of information. If you plan on not having all 2^N patterns in your table, then you need some way of knowing which patterns are there:
1) They're implicit in the program. OK... this will only work on certain (types of) files. And will work on a relatively small percentage of files. On the files that can traditionally not be compressed (ie near random data) you're still not going to make any progress, because you hae no way of intelligently choosing which patterns to place in the table. And with the random data, any pattern of N bits is equally likely, so any guess you make is just as likely to be wrong as right.
2) Store the table in the compressed file. From the file sizes he's quoting I don't think this is what he's doing, and it wouldn't really work either. Unless the patterns are repeated very often, and are large (which is incredibly unlikely) you're going to waste more space in overhead than you'll gain otherwise. Besides, normal compression algorithms already can do wonders on files with those properties.
Finally, you've claimed that the limits of information theory can be exceeded? (Sorry, the reply screen doesn't have your message, so I'm going from memory). Hogwash. If you're referring to the fact that data compression is possible at all, then you're sorely misunderstanding the concept. I dare you to try the following experiment (note, not a good idea, unless you have near infinite time and storage):
1) Create a cmoplete set of files, all of a fixed length (say 1kb). The set should enumerate every bit pattern 1kb in length.
2) Run every one of those files through any compression program, and check the average size of the output files.
Like I said, this is an infeasible experiment, but you can simulate it fairly well by using a large number of randomly generated files in step 1. No compression program will perform well. I've never even tried this, but I'm willing to make that statement.
When you get down to it, you need N bits to store N bits of information. The only reason data compression works at all is because we deal in data filled with patterns and a general lack of randomness.
Must I make my plea again? Don't encourage him. (Although I am starting to enjoy lecturing on compression and information theory...)
-Brian
Edited by - osmanb on 4/10/00 9:42:58 AM
well.... have you guys checked out the latest microsoft multimedia compression system???? Well... it just beats out everything i''ve seen so far!
I don''t remember its name, but i recall something like
''MS Jump start''.
If i recall well, it can save a wav file that is 60MB, in some kbytes!!!!
I remember i saw on the CD (at work), a video clip (complete song), that was compressed for high quality, and its size was some Kbytes (again!!!)
I couldn''t beleive it, but.... its true! I dunno how they did it, but they simply did it.
Pretty soon you''ll see it around the web. It was developed for making the web more interactive...
I don''t remember its name, but i recall something like
''MS Jump start''.
If i recall well, it can save a wav file that is 60MB, in some kbytes!!!!
I remember i saw on the CD (at work), a video clip (complete song), that was compressed for high quality, and its size was some Kbytes (again!!!)
I couldn''t beleive it, but.... its true! I dunno how they did it, but they simply did it.
Pretty soon you''ll see it around the web. It was developed for making the web more interactive...
... LEMMINGS ... LEMMINGS ... LEMMINGS ... LEM..... SpLaSh!...Could this be what we stand like before the mighty One?Are we LeMmIngS or WhAt!? ;)
Bah! Who needs math, and rock-solid logic? We have impossible compression theories, and a bunch of people so carried away with the idea that they ignore all common sense!
another thing that one might say is that there ARE compressions that work better than they should be able to do.
MP3 and JPEG for example and if it''s true this microsoft multimedia compression.
but that is only because of one thing: these algorithms use the fact that we as humans don''t see/hear the difference of certain multimedia data that has a loss of quality.
these type of compression doesn''t actually COMPRESS but it uses certain things that are unique to the format it compresses. for example MP3 removes sounds that are so low we can''t actually hear them
all of these things have one in common, they LOOSE data. you can''t revert from a 3MB MP3 to the original 30MB WAV.
and this is what instantly made me not believe kieren_j, he claimes his general purpose packer would be better than mp3. and if you think about it that can''t be true
MP3 and JPEG for example and if it''s true this microsoft multimedia compression.
but that is only because of one thing: these algorithms use the fact that we as humans don''t see/hear the difference of certain multimedia data that has a loss of quality.
these type of compression doesn''t actually COMPRESS but it uses certain things that are unique to the format it compresses. for example MP3 removes sounds that are so low we can''t actually hear them
all of these things have one in common, they LOOSE data. you can''t revert from a 3MB MP3 to the original 30MB WAV.
and this is what instantly made me not believe kieren_j, he claimes his general purpose packer would be better than mp3. and if you think about it that can''t be true
Actually, JPEG and MP3 *DO* compress the data, but you are right that they rely on the fact that humans can''t notice the difference to a certain degree. Each requires this, because it limits how much they actually need to compress (meaning fewer bits per symbol). JPEG first breaks the RGB into Chrominance/Luminance. Since we don''t notice the difference in luminance as much, they first of all take this down to 1/4 the size of the actual image and stretch it at a loss of quality. Then it uses a DCT (discrete cosine transform) to get the symbols in a proper order in blocks of 8x8. This is compiled in a table, which is then passed into an RLE compressor, since that is optimal on these tables. Believe me - the JPEG is very much compression, but it still is based off the fact that we as humans don''t notice. MPEG and such use the same principles, that we can only notice a small difference of frequency at a time, thus allowing us to use a single bit to represent either an up or down motion along the frequency wave. Then this is compressed. Both use compression to the fullest.
Pythius
Pythius
"The object of war is not to die for your country, but to make the other bastard die for his"
Isn''t that called "destructive compression"?
/. Muzzafarath
/. Muzzafarath
I'm reminded of the day my daughter came in, looked over my shoulder at some Perl 4 code, and said, "What is that, swearing?" - Larry Wall
April 10, 2000 01:56 PM
Okay Kieren,
I''m not planning on bashing what you may or may not have figured out *but* the method you posted will make absolutely no difference to the suitability for compression of your data in the vast majority of cases.
Now there are situations when it would help. A text file filled with the same character or short sequence of characters for instance would suddenly be very well suited to a form of bit level compression.
However most binary files do not have a common repeating pattern in them. This is why lossless compression doesn''t tend to work well on EXE files and why repeatedly Zipping files is a waste of time. Your published step which involved rearranging bit orders would be a complete waste of time here. Many binary files will have seemingly random bit orders. This being the case where is the wonderful piece of code that reduced an EXE file by a factor of approx. 100. That alone would be amazing but it has nothing to do with the stuff you posted.
Ralph
I''m not planning on bashing what you may or may not have figured out *but* the method you posted will make absolutely no difference to the suitability for compression of your data in the vast majority of cases.
Now there are situations when it would help. A text file filled with the same character or short sequence of characters for instance would suddenly be very well suited to a form of bit level compression.
However most binary files do not have a common repeating pattern in them. This is why lossless compression doesn''t tend to work well on EXE files and why repeatedly Zipping files is a waste of time. Your published step which involved rearranging bit orders would be a complete waste of time here. Many binary files will have seemingly random bit orders. This being the case where is the wonderful piece of code that reduced an EXE file by a factor of approx. 100. That alone would be amazing but it has nothing to do with the stuff you posted.
Ralph
Okay, I was a little stressed when I wrote that, but I still stand solidly on my 50-50 indecision. The more patterns you store within the compressor, the smaller you can make the file but the longer it takes, to a point. Then, like you said, you get to where N bytes needs N bytes just to describe itself. However, if you can rearrange the bits so they''re more compressable, that''d be small and fast, but you aren''t actually compressing the data yet.
Someone said in sarcasm that if you moved all the ones to the beginning of the file, yeah, it''d compress real good. But say you had a set of 32 patterns, and you masked them over the bits and then chose the one for that file that caused the random bits to be arranged into more compressable patterns, you''d need only the first 5 bits of the file to say which of the 32 patterns you need to reverse-overlay, and then go to it. If ZIP is lossless, then this works perfectly under most circumstances.
So, I think he may be on to something, but he may not. I won''t say either way. The ''amazing'' part that you think is impossible has little to do with information theory, since he''s not actually compressing anything.
Excuse me if I sound like an idiot.
Lack
Christianity, Creation, metric, Dvorak, and BeOS for all!
Someone said in sarcasm that if you moved all the ones to the beginning of the file, yeah, it''d compress real good. But say you had a set of 32 patterns, and you masked them over the bits and then chose the one for that file that caused the random bits to be arranged into more compressable patterns, you''d need only the first 5 bits of the file to say which of the 32 patterns you need to reverse-overlay, and then go to it. If ZIP is lossless, then this works perfectly under most circumstances.
So, I think he may be on to something, but he may not. I won''t say either way. The ''amazing'' part that you think is impossible has little to do with information theory, since he''s not actually compressing anything.
Excuse me if I sound like an idiot.
Lack
Christianity, Creation, metric, Dvorak, and BeOS for all!
This topic is closed to new replies.
Advertisement
Popular Topics
Advertisement
Recommended Tutorials
Advertisement