Lempel-Ziv applet

Discussion:

Lempel-Ziv applet

(too old to reply)

Alex Lopez-Ortiz

2004-03-04 21:48:00 UTC

You can see an applet showing interactive lempel ziv at

http://www.cs.sfu.ca/cs/CC/365/li/squeeze/LZW.html

Click on the "LZW Compression Applet" button. Type a string such
as "the theorem theme" and watch how and when (a) a code is outputted
and (b) a new pair of entry is added to the dictionary.

--
Alex Lopez-Ortiz alopez-***@uwaterloo.ca
http://db.uwaterloo.ca/~alopez-o Assistant Professor
School of Computer Science University of Waterloo

Chris Ingram

2004-03-04 22:05:14 UTC

Permalink

It is worth mentioning that the algorithm used here is slightly different
from our notes. In particular, it actively prevents duplicates from being
added to the dictionary.

So the "YO! YOU! YOUR YOYO!" example works as ours does (with a slightly
different initial dictionary) it will generate a different result for
"aaaaaaaaa".

Post by Alex Lopez-Ortiz
You can see an applet showing interactive lempel ziv at
http://www.cs.sfu.ca/cs/CC/365/li/squeeze/LZW.html
Click on the "LZW Compression Applet" button. Type a string such
as "the theorem theme" and watch how and when (a) a code is outputted
and (b) a new pair of entry is added to the dictionary.
--
http://db.uwaterloo.ca/~alopez-o Assistant Professor
School of Computer Science University of Waterloo

--
Christopher Ingram (519) 888-4567 x6816
University of Waterloo, Lecturer DC 2518

Ali Piccioni

2004-03-05 23:29:27 UTC

Permalink

I do not understand why we allow for duplicates to be inserted into the
dictionary. It seems to me that checking for duplicates would be cheap
and easy to implement.

Post by Chris Ingram
It is worth mentioning that the algorithm used here is slightly different
from our notes. In particular, it actively prevents duplicates from being
added to the dictionary.
So the "YO! YOU! YOUR YOYO!" example works as ours does (with a slightly
different initial dictionary) it will generate a different result for
"aaaaaaaaa".

--
Christopher Ingram (519) 888-4567 x6816
University of Waterloo, Lecturer DC 2518

Alex Lopez-Ortiz

2004-03-06 01:22:48 UTC

Permalink

Post by Ali Piccioni
I do not understand why we allow for duplicates to be inserted into the
dictionary. It seems to me that checking for duplicates would be cheap
and easy to implement.

Correct. However it is a common shortcut. Last year we saw two versions of LZ
one avoiding duplicates, another with duplicates, but it might have created
more confusion rather than helping.

Alex

Post by Ali Piccioni

--
Christopher Ingram (519) 888-4567 x6816
University of Waterloo, Lecturer DC 2518

--
Alex Lopez-Ortiz alopez-***@uwaterloo.ca
http://db.uwaterloo.ca/~alopez-o Assistant Professor
School of Computer Science University of Waterloo

Chris Ingram

2004-03-06 04:04:21 UTC

Permalink

Alex sortof punted on the point, so I will explain what I talked about in
my section.

For the decoder, it is extremely simple to implement the decoding
algorithm if they do not have to check for duplicates. Duplicate checking
requires the decoder to build the same trie, which greatly increases the
amount of work for the decoder, not to mention severely increases the
space requirement. And the trie isn't used to do the decoding, it is
merely to determine whether or not to insert the next code into the table.

Should you ever check for duplicates? If you truly want the smallest
compressed file (although the improvement is not very dramatic) and the
decoder has the time/space to handle it, then you would definately add
in the check.

Post by Ali Piccioni
I do not understand why we allow for duplicates to be inserted into the
dictionary. It seems to me that checking for duplicates would be cheap
and easy to implement.

--
Christopher Ingram (519) 888-4567 x6816
University of Waterloo, Lecturer DC 2518

Alex Lopez-Ortiz

2004-03-06 04:47:43 UTC

Permalink

Post by Chris Ingram
For the decoder, it is extremely simple to implement the decoding
algorithm if they do not have to check for duplicates. Duplicate checking
requires the decoder to build the same trie, which greatly increases the
amount of work for the decoder, not to mention severely increases the
space requirement.

This used to be a concern back in 1977 when LZW was first introduced and
memory was at a premium. In this day and age of digital watches running
linux it is not so important anymore.

Post by Chris Ingram
Should you ever check for duplicates? If you truly want the smallest
compressed file (although the improvement is not very dramatic) and the
decoder has the time/space to handle it, then you would definately add
in the check.

This is correct, the literature reports that the savings are usually minor
(you might want to implement this yourself and compare the two results).

Alex

Post by Chris Ingram

Post by Ali Piccioni
I do not understand why we allow for duplicates to be inserted into the
dictionary. It seems to me that checking for duplicates would be cheap
and easy to implement.

--
Christopher Ingram (519) 888-4567 x6816
University of Waterloo, Lecturer DC 2518

--
Alex Lopez-Ortiz alopez-***@uwaterloo.ca
http://db.uwaterloo.ca/~alopez-o Assistant Professor
School of Computer Science University of Waterloo

Chris Ingram

2004-03-06 07:37:47 UTC

Permalink

Post by Alex Lopez-Ortiz
This used to be a concern back in 1977 when LZW was first introduced and
memory was at a premium. In this day and age of digital watches running
linux it is not so important anymore.
This is correct, the literature reports that the savings are usually minor
(you might want to implement this yourself and compare the two results).

Sniff... I wish my watch ran linux :(

I have already had someone come and discuss their bonus plan with me, and
this is a perfect time to elaborate more on what can be done.

We talked in class about a couple of more elaborate solutions to the
dictionary filling problem, and they are ideal candidates to get the full
bonus marks.

However, this very problem of "how much more compression would we get
removing duplicates" can also be your angle. Now, the code change is
fairly minimal, so your bonus rests more on
- getting good data to quantify the range of compression improvement
- identifying the extra resource penalties on the decoder (time, space,
etc.)
- analyzing for what situations this improvement is good and bad
(this is the real important point)

Similar "analysis-oriented" bonus improvements can also be attempted.

For instance, what compression gain would be achieved by inserting the
string "ts" with the next code instead of just the "tc" that the basic
algorithm uses (line 11 on slide 15-6)?
What are the resource penalties and what other quirks to the
implementation does this scheme introduce?

Do not hesitate to contact myself or Alex to run any other ideas that you
are thinking about if you are unsure if it will be substantial enough.
Chances are, we'll think of wonderful analysis questions that will make it
feasible!

--
Christopher Ingram (519) 888-4567 x6816
University of Waterloo, Lecturer DC 2518