I forget what I was looking for

when I stumbled onto Haruhiko Okumura’s lzss.c, but I was really intrigued.  Every time I’ve seen anything to do with compression, it’s insanely massive.

Except for this.

Including the ‘main’ portion of the source, it’s 180 lines long.  4.3Kb.  That’s microscopic by today’s standards.  On OS X I get a 13kb executable, compared to 76kb for gzip, or 1.6MB for 7za.

And googling around I found a few other variations.  So I figured it would be slightly fun to have a ‘bakeoff’ with the ‘tradtional’ Calgary Corpus, which includes some variable types of data.

Unsurprisingly, 7zip is the best of the bunch.

$ ./testcomp.sh
compiling……..
cleaning up
unzipping
running…….The winner (smallest) is :
261057 6 Jun 20:18 book1.7z
53167 6 Jun 20:18 geo.7z
9472 6 Jun 20:18 obj1.7z
17322 6 Jun 20:18 paper1.7z
43596 6 Jun 20:18 pic.7z
15060 6 Jun 20:18 progl.7z
16748 6 Jun 20:18 trans.7z
30716 6 Jun 20:18 bib.7z
169894 6 Jun 20:18 book2.7z
119399 6 Jun 20:18 news.7z
61758 6 Jun 20:18 obj2.7z
27310 6 Jun 20:18 paper2.7z
12605 6 Jun 20:18 progc.7z
10428 6 Jun 20:18 progp.7z

But the source to 7zip is unwieldy at best.  So how did the small lzss and variants stuff do?

Compression percentage
Compression percentage

Honestly I’m surprised gzip put up a good fight.  Bzip2 & 7zip really fought for the top, The surprise to me was lzhuf leading the old stuff, which has it’s roots back in 1988/1989.  So let’s look at the data without anything modern in the way.

Old Compression only
Old Compression only

So from the numbers, we can see that lzs2 and lz3 run almost identical, with lzs & lzs4 at the bottom.  Now when we look at time, we get something different.

Compression duration
Compression duration

Both lzs & lzs4 take eight or more seconds!  So they are both out, as I’m shopping for something good/fast, and taking this long is out of the question!  So it comes down to how complicated lzhuf2, lzs2 and lzs3 are.

[table]
source,lines
lzs.c,4360
lzs4.c,4623
lzs2.c,8308
lzs3.c,12844
lzhuf.c,18323
lzhuf2.c,22556
[/table]

While lzs.c is still pretty impressive for the size, for what I’m going to try thought, I’m going to use lzs2.c as it’s 8kb, and seems to fit the bill.

For anyone who’s interested in running this on their own, here is the package.  I only tested on OS X, it may run on other UNIX stuff, it may not.  Extract it and run ‘testcomp.sh’.  And it may even work!  The only thing on OS X I had to add was ‘-Wno-return-type’ for compiling, as clang doesn’t like ancient source like this…

3 thoughts on “I forget what I was looking for”

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.