when I stumbled onto Haruhiko Okumura’s lzss.c, but I was really intrigued. Every time I’ve seen anything to do with compression, the source has been insanely massive.
Except for this.
Including the ‘main’ portion of the source, it’s 180 lines long, or 4.3 KB. That’s microscopic by today’s standards. On OS X I get a 13 KB executable, compared to 76 KB for gzip or 1.6 MB for 7za.
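To give a feel for why LZSS fits in so few lines, here is a minimal round-trip sketch in C. It is my own illustration, not Okumura’s code: it assumes a stream format modeled on lzss.c (a 4096-byte ring buffer pre-filled with spaces, matches of 3 to 18 bytes, one flag byte per eight items where a set bit means a literal byte and a clear bit means a 2-byte position/length reference). The function names lzss_encode and lzss_decode are hypothetical, and the encoder swaps lzss.c’s binary-tree match search for a plain brute-force scan.

```c
#include <stddef.h>
#include <string.h>

/* Parameters modeled on Okumura's lzss.c (assumed, not copied). */
#define N 4096        /* ring buffer size: positions fit in 12 bits */
#define F 18          /* longest match length */
#define THRESHOLD 2   /* matches of THRESHOLD bytes or fewer are sent as literals */

/* Hypothetical greedy encoder using a brute-force match search.
   Emits one flag byte per group of 8 items (bit set = literal byte,
   bit clear = 2-byte reference). out needs room for n + n / 8 + 1 bytes. */
size_t lzss_encode(const unsigned char *in, size_t n, unsigned char *out) {
    size_t op = 0, p = 0;
    while (p < n) {
        size_t flagpos = op++;              /* reserve the flag byte for this group */
        unsigned char flags = 0;
        for (int bit = 0; bit < 8 && p < n; bit++) {
            size_t best_len = 0, best_q = 0;
            size_t start = p > N - 1 ? p - (N - 1) : 0;  /* only bytes still in the ring */
            for (size_t q = start; q < p; q++) {
                size_t len = 0;
                /* overlapping matches are fine: the decoder copies byte by byte */
                while (len < F && p + len < n && in[q + len] == in[p + len])
                    len++;
                if (len > best_len) {
                    best_len = len;
                    best_q = q;
                }
            }
            if (best_len > THRESHOLD) {
                /* ring slot where the decoder stored input byte best_q */
                unsigned pos = (unsigned)((N - F + best_q) & (N - 1));
                out[op++] = (unsigned char)(pos & 0xff);
                out[op++] = (unsigned char)(((pos >> 4) & 0xf0) | (best_len - (THRESHOLD + 1)));
                p += best_len;
            } else {
                flags |= (unsigned char)(1u << bit);
                out[op++] = in[p++];
            }
        }
        out[flagpos] = flags;
    }
    return op;
}

/* Decoder for the same format; returns the number of bytes written to out. */
size_t lzss_decode(const unsigned char *in, size_t inlen, unsigned char *out, size_t outcap) {
    unsigned char ring[N];
    unsigned flags = 0;
    size_t r = N - F, ip = 0, op = 0;
    memset(ring, ' ', N - F);               /* window starts out filled with spaces */
    memset(ring + N - F, 0, F);
    for (;;) {
        if (((flags >>= 1) & 0x100) == 0) { /* all 8 flag bits consumed */
            if (ip >= inlen) break;
            flags = in[ip++] | 0xff00u;     /* high byte tracks the remaining bits */
        }
        if (flags & 1) {                     /* literal byte */
            if (ip >= inlen || op >= outcap) break;
            unsigned char c = in[ip++];
            out[op++] = c;
            ring[r] = c;
            r = (r + 1) & (N - 1);
        } else {                             /* (position, length) reference */
            if (ip + 1 >= inlen) break;
            unsigned i = in[ip++];
            unsigned j = in[ip++];
            i |= (j & 0xf0) << 4;            /* 12-bit ring position */
            j = (j & 0x0f) + THRESHOLD;      /* copy j + 1 bytes */
            for (unsigned k = 0; k <= j; k++) {
                if (op >= outcap) return op;
                unsigned char c = ring[(i + k) & (N - 1)];
                out[op++] = c;
                ring[r] = c;
                r = (r + 1) & (N - 1);
            }
        }
    }
    return op;
}
```

Encoding "abcabcabc" with this sketch gives three literals followed by one 2-byte reference back to the first ‘a’ with a length of 6, for 6 bytes total. The decoder needs nothing but its own 4 KB ring buffer, which goes a long way toward explaining how these programs stay so small. The brute-force search is the simple but slow part: it can scan up to 4095 earlier positions for every input byte, which is why lzss.c uses binary trees to find matches instead.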
And googling around, I found a few other variations. So I figured it would be slightly fun to have a ‘bakeoff’ with the ‘traditional’ Calgary Corpus, which includes several different types of data.
Unsurprisingly, 7zip is the best of the bunch.
$ ./testcomp.sh
compiling……..
cleaning up
unzipping
running…….
The winner (smallest) is:
261057 6 Jun 20:18 book1.7z
53167 6 Jun 20:18 geo.7z
9472 6 Jun 20:18 obj1.7z
17322 6 Jun 20:18 paper1.7z
43596 6 Jun 20:18 pic.7z
15060 6 Jun 20:18 progl.7z
16748 6 Jun 20:18 trans.7z
30716 6 Jun 20:18 bib.7z
169894 6 Jun 20:18 book2.7z
119399 6 Jun 20:18 news.7z
61758 6 Jun 20:18 obj2.7z
27310 6 Jun 20:18 paper2.7z
12605 6 Jun 20:18 progc.7z
10428 6 Jun 20:18 progp.7z
But the source to 7zip is unwieldy at best. So how did the small LZSS variants do?
Honestly, I’m surprised gzip put up a good fight. Bzip2 and 7zip really fought for the top. The surprise to me was lzhuf leading the old stuff, which has its roots back in 1988/1989. So let’s look at the data without anything modern in the way.
So from the numbers, we can see that lzs2 and lzs3 perform almost identically, with lzs and lzs4 at the bottom. Now when we look at time, we get something different.
Both lzs and lzs4 take eight or more seconds! So they are both out: I’m shopping for something good and fast, and taking this long is out of the question. So it comes down to how complicated lzhuf2, lzs2 and lzs3 are.
Source | Size (bytes) |
lzs.c | 4360 |
lzs4.c | 4632 |
lzs2.c | 8308 |
lzs3.c | 12844 |
lzhuf.c | 18323 |
lzhuf2.c | 22556 |
While lzs.c is still pretty impressive for its size, for what I’m going to try, though, I’m going to use lzs2.c: it’s 8 KB and seems to fit the bill.
For anyone who’s interested in running this on their own, here is the package. I only tested on OS X; it may run on other UNIX systems, or it may not. Extract it and run ‘testcomp.sh’, and it may even work! The only thing I had to add on OS X was ‘-Wno-return-type’ when compiling, as clang doesn’t like ancient source like this…
P.S. ZFS’s LZJB is shockingly tiny as well:
http://www.opensource.apple.com/source/zfs/zfs-59/zfs_kext/zfs/lzjb.c
Wow, another one I’ll have to check out! Thanks for pointing this one out to me!
There’s also lz4 [0], miniz/tinfl [1], minlzo [2], snappy [3], fastlz [4], quicklz [5], density [6].
[0] https://github.com/Cyan4973/lz4
[1] https://code.google.com/p/miniz/
[2] http://www.oberhumer.com/opensource/lzo/
[3] https://code.google.com/p/snappy/
[4] https://code.google.com/p/fastlz/
[5] http://www.quicklz.com/
[6] https://github.com/centaurean/density