Discussion:
Suffix Skip Array and BM Pseudocode Questions
(too old to reply)
Maros Hluska
2011-07-19 02:41:34 UTC
Permalink
I don't understand some cases for computing the suffix skip array. If
the pattern is P = baba, what would be the array?

I had S[0] = -2, S[1] = -1, S[2] = ?, S[3] = ?

With S[2], we have the suffix a, so then we can match it to a prefix
like this?
a
baba
Now it's behind the pattern and at position (-2) + 1, so S[2] = -2?

With S[3], I am not at all sure what to do. It would give the suffix P[i
+ 1..m - 1] = P[4..3] = empty string?

If I use this suffix skip array, and follow the pseudocode on slide 33
of module 8, the algorithm seems to enter an infinite loop when I use
the text T = 'aaaaaaaaaaba' and P = 'baba'

Maros Hluska
Maros Hluska
2011-07-19 02:42:53 UTC
Permalink
Post by Maros Hluska
I don't understand some cases for computing the suffix skip array. If
the pattern is P = baba, what would be the array?
I had S[0] = -2, S[1] = -1, S[2] = ?, S[3] = ?
With S[2], we have the suffix a, so then we can match it to a prefix
like this?
a
baba
Now it's behind the pattern and at position (-2) + 1, so S[2] = -2?
With S[3], I am not at all sure what to do. It would give the suffix P[i
+ 1..m - 1] = P[4..3] = empty string?
If I use this suffix skip array, and follow the pseudocode on slide 33
of module 8, the algorithm seems to enter an infinite loop when I use
the text T = 'aaaaaaaaaaba' and P = 'baba'
Maros Hluska
I forgot to mention that I used S[3] = 2.

Maros Hluska
Daniel S. Roche
2011-07-19 06:55:09 UTC
Permalink
First I have to give a reminder not to post any potential solutions
here. Since the text and pattern you gave are within the bounds for
problem 4, this might be part of a correct solution. (Feel free to email
or make an appointment for office hours if you want to discuss such things.)
Post by Maros Hluska
With S[3], I am not at all sure what to do. It would give the suffix
P[i + 1..m - 1] = P[4..3] = empty string?
Yes, that suffix is the empty string. In general, S[m-1] will always be
just the index of the last character in P that is *not* equal to P[m-1].
Post by Maros Hluska
If I use this suffix skip array, and follow the pseudocode on slide 33
of module 8, the algorithm seems to enter an infinite loop when I use
the text T = 'aaaaaaaaaaba' and P = 'baba'
Hmm, when I run this based on the pseudocode there, and based on the
values you have mentioned for your suffix array, I don't get an infinite
loop.

However, I won't discuss that particular example further here because it
could be too helpful in P4. Instead, I'll give you an example on a
different pattern:
P = aabaabbaa

In determining S[i], we are basically looking for the latest place where
we could have a match of NOT P[i] followed by the suffix
P[i+1],P[i+2],...,P[m-1]. For each i I will show these matches. I will
indicate NOT some character by writing in uppercase. So for instance 'B'
means 'any character except b'.

i=0
-7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8
a a b a a b b a a
A a b a a b b a a
S[0] = -7

i=1
-6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8
a a b a a b b a a
A b a a b b a a
S[1] = -6

i=2
-5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8
a a b a a b b a a
B a a b b a a
S[2] = -5

i=3
-4 -3 -2 -1 0 1 2 3 4 5 6 7 8
a a b a a b b a a
A a b b a a
S[3] = -4

i=4
-3 -2 -1 0 1 2 3 4 5 6 7 8
a a b a a b b a a
A b b a a
S[4] = -3

i=5
0 1 2 3 4 5 6 7 8
a a b a a b b a a
B b a a
S[5] = 1

i=6
-1 0 1 2 3 4 5 6 7 8
a a b a a b b a a
B a a
S[6] = -1

i=7
0 1 2 3 4 5 6 7 8
a a b a a b b a a
A a
S[7] = 6

i=8
0 1 2 3 4 5 6 7 8
a a b a a b b a a
A
S[8] = 6


So all in all we have S = [-7, -6, -5, -4, -3, 1, -1, 6, 6]. Note that I
actually think it's easier to construct this array from the back
forward, i.e. start at S[m-1], which is the opposite of the order I have
above. But of course either way works.

I really hope this is useful and helps clear things up.

-Dan Roche
CS 240 Instructor, Section 003

Loading...