Quad Tree range search complexity

Discussion:

(too old to reply)

Pavel Petrenko

2011-08-06 15:29:37 UTC

Permalink

Can someone please explain why the complexity of a range search in a
range search is Big Theta(n + h)?

in a quadtree*

Pavel Petrenko

2011-08-06 15:57:58 UTC

Permalink

Post by Pavel Petrenko

Can someone please explain why the complexity of a range search in a
range search is Big Theta(n + h)?

in a quadtree*

Particularly the 'n' part.

Thanks :)

cs240

2011-08-06 17:43:28 UTC

Permalink

h is the height of the quadtree, and n is the number of points in the
quadtree.

In the worst case you have to search down to the lowest level of the
tree. This is where the Theta(h).

Even though you may not report every point, you may still need to check
every point.

1. if (T is a leaf) then
2. if (T.point in R) then
3. report T.point

So in the worst case you need to check every point hence Theta(n)

Therefore Theta(n+h)

--
Patrick Lee
CS240 Tutor

Pavel Petrenko

2011-08-06 18:21:28 UTC

Permalink

Post by cs240
h is the height of the quadtree, and n is the number of points in the
quadtree.
In the worst case you have to search down to the lowest level of the
tree. This is where the Theta(h).
Even though you may not report every point, you may still need to check
every point.
1. if (T is a leaf) then
2. if (T.point in R) then
3. report T.point
So in the worst case you need to check every point hence Theta(n)
Therefore Theta(n+h)

Ok, makes sense. Can you clarify why this is not the case when
performing a range-search in a kd-tree (2d), which has complexity O(k +
root(n)), where k is the number of keys reported (and not the total
number of keys)?

Pavel Petrenko

2011-08-06 18:45:30 UTC

Permalink

Post by Pavel Petrenko
Ok, makes sense. Can you clarify why this is not the case when
performing a range-search in a kd-tree (2d), which has complexity O(k +
root(n)), where k is the number of keys reported (and not the total
number of keys)?

Never mind that, I think I get it. Isn't it possible that that (almost)
all points are at a height of h (which can be infinitely large)? So
wouldn't the complexity be O(n*h)?

cs240

2011-08-07 03:46:58 UTC

Permalink

Post by Pavel Petrenko

The running time of range search in a kd tree is O(k + U)

Where k is number of reported points and U is the number of unsuccessful
searches.

U is bounded by root(n).

See slide 16 of module 7 for details.

Post by Pavel Petrenko
Never mind that, I think I get it. Isn't it possible that that (almost)
all points are at a height of h (which can be infinitely large)? So
wouldn't the complexity be O(n*h)?

I'm assuming your asking about quad-trees.

1. if (T is a leaf) then
2. if (T.point in R) then
3. report T.point
4. for each child C of T do
5. if C.region intersects R is not empty then
6. QTree-RangeSearch(C, R)

The analysis hard to explain.
You can view the algo. this way:
Traverse down the quad-tree until you reach a region that has 2 or more
children that intersects the range.
At most every point in this region will have to be visited and checked.

The traversal costs at most h (traverse until leaf node).
The visiting costs at most n (every point).

Therefore Theta(n+h) is the cost.

--
Patrick Lee
CS240 Tutor

Pavel Petrenko

2011-08-07 05:05:58 UTC

Permalink

Post by cs240
I'm assuming your asking about quad-trees.
1. if (T is a leaf) then
2. if (T.point in R) then
3. report T.point
4. for each child C of T do
5. if C.region intersects R is not empty then
6. QTree-RangeSearch(C, R)
The analysis hard to explain.
Traverse down the quad-tree until you reach a region that has 2 or more
children that intersects the range.
At most every point in this region will have to be visited and checked.
The traversal costs at most h (traverse until leaf node).
The visiting costs at most n (every point).
Therefore Theta(n+h) is the cost.

Okay, suppose we have an example like this:

Loading Image...

The range (indicated in red), intersects all 4 children of the region.
We will need to check at most all these points. However, a lot of these
points can be be very deep (depth of tree). So wouldn't the cost in this
case be O(n*h)?

Sorry for all the nagging. Thanks in advance for your help!

cs240

2011-08-08 00:43:15 UTC

Permalink

Post by Pavel Petrenko
http://oi56.tinypic.com/2qk84jm.jpg
The range (indicated in red), intersects all 4 children of the region.
We will need to check at most all these points. However, a lot of these
points can be be very deep (depth of tree). So wouldn't the cost in this
case be O(n*h)?

You can do a pre-order traversal to visit every node.
The running time of the pre-order traversal will be bounded by the
number of edges in the tree.

In a complete quad-tree
the number of edges = sum of (i = 1 to h) 4^i = 4^(h+1) - 1
height is bounded by log(n)

4^(logn + 1) - 1 = n + 1 - 1 = n

so the pre-order traversal costs O(n)

--
Patrick Lee
CS240 Tutor

Pavel Petrenko

2011-08-08 16:45:19 UTC

Permalink

Post by cs240
You can do a pre-order traversal to visit every node.
The running time of the pre-order traversal will be bounded by the
number of edges in the tree.
In a complete quad-tree
the number of edges = sum of (i = 1 to h) 4^i = 4^(h+1) - 1
height is bounded by log(n)
4^(logn + 1) - 1 = n + 1 - 1 = n
so the pre-order traversal costs O(n)

I don't see how I can get

from:
#edges = sum of (i = 1 to h) 4^i = 4^(h+1) - 1

to:
height is bounded by log(n)

In the analysis that you've shown, it seems like n = #edges... In
assignment #4, we showed that we can create a quadtree of height 8 using
3 points, and we discovered that the height of the tree can be very
large, depending on the proximity of certain points. So I don't see how
its possible to bound the height of the tree using the number of points
in it (n).

Additionally, if the height was bounded by log(n), then the stated
running time, O(n+h) would be O(n+log(n)), which is simply O(n).