Tuesday 3 April 2018

compression - Why is a 7zipped file larger than the raw file?




Possible Duplicate:
Why doesn't ZIP Compression compress anything?



I tried 7zipping an .exe file but it actually became larger.




Is this the expected result?



Answer



It comes down to a concept called entropy (see Wikipedia's article on entropy in information theory).
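To get a feel for entropy, here is a minimal Python sketch that measures the order-0 (single-byte) Shannon entropy of some data in bits per byte. The function name `byte_entropy` and the sample inputs are hypothetical, chosen only for illustration. This measure is a rough indicator, not a limit on what a real compressor achieves: it ignores patterns that span several bytes, so the repeated sentence below scores around 4 bits per byte even though a compressor can shrink it far more by exploiting the repetition. Random bytes, like well-compressed data, score close to the maximum of 8.

```python
import math
import os
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Order-0 Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # iterating over bytes yields ints 0..255
    # H = sum over observed byte values of p * log2(1 / p), with p = count / total
    return sum((c / total) * math.log2(total / c) for c in counts.values())


text = b"the quick brown fox jumps over the lazy dog " * 1000
noise = os.urandom(len(text))
print(f"repetitive text: {byte_entropy(text):.2f} bits/byte")
print(f"random bytes:    {byte_entropy(noise):.2f} bits/byte")
```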


The basic idea is this: if there existed a compression operation that could always make a file smaller, you could apply it to its own output over and over, and it would eventually reduce any file to 0 bytes while still retaining all the data. But this is absurd, because 0 bytes cannot convey any information at all. So there cannot exist a lossless compression algorithm that always makes its input smaller. Put more precisely, there are 2^n distinct files of n bits but only 2^n - 1 files shorter than n bits (counting the empty file), so at least two different n-bit inputs would have to compress to the same output, and the decompressor could not tell which one to restore.
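The counting (pigeonhole) argument can be checked with a few lines of Python. This is a small sketch; the helper name `count_strings` is hypothetical. For every length n it compares how many n-bit inputs exist with how many strictly shorter outputs exist, and there is always exactly one output too few.

```python
def count_strings(n_bits: int) -> tuple[int, int]:
    """Return (inputs of exactly n bits, possible outputs of fewer than n bits)."""
    exactly_n = 2 ** n_bits
    # 2**0 + 2**1 + ... + 2**(n-1) = 2**n - 1 (this includes the empty string)
    shorter = sum(2 ** k for k in range(n_bits))
    return exactly_n, shorter


for n in range(1, 9):
    inputs, outputs = count_strings(n)
    # One more input than there are shorter outputs, so at least two inputs
    # would have to share an output and the compression could not be reversed.
    print(f"n={n}: {inputs} inputs, only {outputs} shorter outputs")
```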


Because of this, every lossless compression program you will ever use leaves some inputs the same size or makes them larger. For any compression algorithm you design or use, some inputs will come out smaller and some will not.
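A quick way to see this in practice is to feed random bytes, which have almost no redundancy, to real compressors. This sketch uses Python's standard `zlib` (Deflate, the algorithm Zip normally uses) and `lzma` (the xz container with LZMA2) modules as stand-ins for Zip and 7-Zip; it does not produce actual .zip or .7z files. Both outputs come out larger than the input.

```python
import lzma
import os
import zlib

# Random bytes have (almost) no redundancy for a compressor to remove.
data = os.urandom(100_000)

sizes = {
    "raw": len(data),
    "zlib": len(zlib.compress(data, 9)),
    "xz/LZMA2": len(lzma.compress(data)),
}
for name, size in sizes.items():
    print(f"{name:>9}: {size} bytes")
```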


Already-compressed data is generally a terrible candidate for further compression, because the first pass has already removed most of the redundancy that lossless algorithms exploit, and most lossless algorithms rely on the same basic principles to find it. It is possible to compress poorly compressed data even further, but this is less effective than compressing the original data with the best available algorithm in the first place.
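As a minimal sketch of compressing already-compressed data, the Python snippet below runs LZMA over some highly redundant sample text and then runs LZMA again over its own output. The sample text is hypothetical (random words from a tiny vocabulary). The first pass shrinks the text dramatically; the second pass finds essentially nothing left to remove, so the container's own overhead makes the result no smaller and typically a little larger.

```python
import lzma
import random

# Hypothetical sample input: 50,000 words drawn from a small vocabulary,
# which is highly redundant and therefore very compressible.
rng = random.Random(0)
vocab = ["entropy", "archive", "header", "dictionary", "payload", "checksum"]
text = " ".join(rng.choice(vocab) for _ in range(50_000)).encode("ascii")

once = lzma.compress(text)
twice = lzma.compress(once)  # compress the already-compressed output again

print(f"raw:   {len(text)} bytes")
print(f"once:  {len(once)} bytes")
print(f"twice: {len(twice)} bytes")
```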


For example, if you had a 100 MB text file and compressed it using the regular Zip algorithm (Deflate), it might get down to 50 MB. If you then compressed the Zip file with LZMA2, you might get it down to 40 or 45 MB, because LZMA has a higher compression ratio than Zip for most compressible data. So it stands to reason that it can also shrink Zip output somewhat, because Zip doesn't squeeze all of the redundancy out of the data. But if you eliminate the Zip step entirely and compress the raw text with LZMA2, you may get it even smaller, potentially on the order of 30 to 35 MB (these are just made-up numbers to illustrate the concept).
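The same comparison can be tried on a small scale with Python's standard `zlib` (Deflate) and `lzma` (LZMA2) modules. This is a sketch under stated assumptions: the sample text is hypothetical and much smaller than the 100 MB in the example, and the byte counts it prints are whatever this sample produces, not the illustrative figures above. Whether the "deflate+lzma" step shrinks the Deflate output at all depends on the input; compressing the raw text with LZMA directly should still come out smallest.

```python
import lzma
import random
import zlib

# Hypothetical stand-in for a large text file.
rng = random.Random(1)
vocab = ["the", "compression", "ratio", "depends", "on", "redundancy", "in", "data"]
text = " ".join(rng.choice(vocab) for _ in range(100_000)).encode("ascii")

zipped = zlib.compress(text, 9)        # Deflate, the algorithm Zip normally uses
zip_then_lzma = lzma.compress(zipped)  # LZMA2 applied to the Deflate output
lzma_direct = lzma.compress(text)      # LZMA2 applied to the raw text

for name, blob in [("raw", text), ("deflate", zipped),
                   ("deflate+lzma", zip_then_lzma), ("lzma only", lzma_direct)]:
    print(f"{name:>13}: {len(blob)} bytes")
```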


In the case of the binary you're trying to compress, the archive is larger because the 7-Zip file format has to create its own internal structure and pack the already-compressed executable's data into it. That structure includes things like file headers and coder settings (such as the dictionary size). This overhead is usually more than offset by the savings from compressing the data itself, but it appears that your executable is already compressed with some form of LZMA; otherwise, 7-Zip would likely shrink it or increase its size only very slightly, rather than adding 2 MB (which is a lot).
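Python's standard library cannot write .7z archives, so this sketch uses the xz container from the `lzma` module as an analogy for container overhead; the 7z format's own headers differ in layout and size. It compresses the same incompressible payload (random bytes standing in for an already-packed executable, which is an assumption) once as a bare LZMA2 stream and once inside an xz container, and reports the difference. The payload itself does not shrink, and the container adds a small, fixed amount of header, index, and checksum data on top.

```python
import lzma
import os

# Stand-in for an already-compressed executable: random bytes are incompressible.
payload = os.urandom(200_000)

lzma2 = [{"id": lzma.FILTER_LZMA2, "preset": 6}]
raw_stream = lzma.compress(payload, format=lzma.FORMAT_RAW, filters=lzma2)
xz_file = lzma.compress(payload, format=lzma.FORMAT_XZ, filters=lzma2)

print(f"payload:            {len(payload)} bytes")
print(f"raw LZMA2 stream:   {len(raw_stream)} bytes")
print(f"xz container:       {len(xz_file)} bytes")
print(f"container overhead: {len(xz_file) - len(raw_stream)} bytes")
```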

