Studies in Applied Data Structures
Research output: Thesis › Doctoral Thesis (compilation)
Abstract
The design of efficient data structures is of primary importance in creation of theoretical algorithms as well their more tangible descendants, computer programs. In this dissertation we study computational aspects of data structures and their respective algorithms from a theoretical viewpoint, which are however of direct importance in the implementation of solutions for realworld problems. We present results for the following problems:
In tolerancing, the OutOfRoundness factor determines the relative circularity of planar shapes. We show that the Minimum Radial Separation algorithm given by Le and Lee runs in Theta(n^2) time even for convex polygons. Furthermore, we present an optimal O(n) time algorithm to compute the Minimum Radial Separation of convex polygons, which represents not only a factor n improvement over the previously best known algorithm, but also a factor of log n improvement over Le and Lee's conjectured complexity for the problem.
We consider the general problem of (2dimensional) range reporting allowing arbitrarily convex queries. We show that using a traditional approach, a polylogarithmic query time can not be achieved unless more than linear space is used. Our arguments are based on a new nontrivial lower bound in a new model of computation, Layered Partitions, which can be used to describe all known algorithms for processing range queries, as well as many other data structures used to represent multidimensional data. We show that Omega((log n)/(log T(n))) partitions must be used to allow queries in O(T(n) + k) time, for n total and k reported elements, and for any growing function T(n).
We discuss an intrinsic generalization of the suffix tree, designed to index a string of length n which has a natural partitioning into m multicharacter substrings or words. This word suffix tree} represents only the m suffixes that start at word boundaries. Since traditional suffix tree construction algorithms rely heavily on the fact that all suffixes are inserted, construction of a word suffix tree is nontrivial, in particular when only O(m) construction space is allowed. We solve this problem, presenting an algorithm with O(n) expected running time. In applications that require strict node ordering, an additional cost of sorting O(m') characters arises, where m' is the number of distinct words. In either case, this is a significant improvement over previously known solutions. Furthermore, when the alphabet is small, we may assume that the n characters in the input string occupy o(n) machine words. We illustrate that this can allow a word suffix tree to be built in sublinear time.
We propose a new data structure for storing sparse matrices which are too large to fit entirely within main memory. This data structure is optimized to use the computer's page size and is arranged in order to be able to efficiently handle random access and updates useful for a wide range of matrix operations. We also present several variations on an ancillary structure which greatly decreases the probability of unnecessary page faults when accessing the structure, even when the size of main memory is extremely limited. We assert that these data structures are easy to implement and provide very good results in practice.
In tolerancing, the OutOfRoundness factor determines the relative circularity of planar shapes. We show that the Minimum Radial Separation algorithm given by Le and Lee runs in Theta(n^2) time even for convex polygons. Furthermore, we present an optimal O(n) time algorithm to compute the Minimum Radial Separation of convex polygons, which represents not only a factor n improvement over the previously best known algorithm, but also a factor of log n improvement over Le and Lee's conjectured complexity for the problem.
We consider the general problem of (2dimensional) range reporting allowing arbitrarily convex queries. We show that using a traditional approach, a polylogarithmic query time can not be achieved unless more than linear space is used. Our arguments are based on a new nontrivial lower bound in a new model of computation, Layered Partitions, which can be used to describe all known algorithms for processing range queries, as well as many other data structures used to represent multidimensional data. We show that Omega((log n)/(log T(n))) partitions must be used to allow queries in O(T(n) + k) time, for n total and k reported elements, and for any growing function T(n).
We discuss an intrinsic generalization of the suffix tree, designed to index a string of length n which has a natural partitioning into m multicharacter substrings or words. This word suffix tree} represents only the m suffixes that start at word boundaries. Since traditional suffix tree construction algorithms rely heavily on the fact that all suffixes are inserted, construction of a word suffix tree is nontrivial, in particular when only O(m) construction space is allowed. We solve this problem, presenting an algorithm with O(n) expected running time. In applications that require strict node ordering, an additional cost of sorting O(m') characters arises, where m' is the number of distinct words. In either case, this is a significant improvement over previously known solutions. Furthermore, when the alphabet is small, we may assume that the n characters in the input string occupy o(n) machine words. We illustrate that this can allow a word suffix tree to be built in sublinear time.
We propose a new data structure for storing sparse matrices which are too large to fit entirely within main memory. This data structure is optimized to use the computer's page size and is arranged in order to be able to efficiently handle random access and updates useful for a wide range of matrix operations. We also present several variations on an ancillary structure which greatly decreases the probability of unnecessary page faults when accessing the structure, even when the size of main memory is extremely limited. We assert that these data structures are easy to implement and provide very good results in practice.
Details
Authors  

Organisations  
Research areas and keywords  Subject classification (UKÄ) – MANDATORY
Keywords

Original language  English 

Qualification  Doctor 
Awarding Institution  
Supervisors/Assistant supervisor 

Award date  1998 Oct 8 
Publisher 

Print ISBNs  9162831550 
Publication status  Published  1998 
Publication category  Research 
Bibliographic note
Defence details
Date: 19981008
Time: 10:15
Place: E:1406
External reviewer(s)
Name: Boyar, Joan
Title: Dr
Affiliation: Odense

Article: Kurt Swanson, DerTsai Lee, and Vanban L. Wu.An optimal algorithm for roundness determination on convex polygons.Computational Geometry: Theory and Applications, 5(4):225235, 1995.
Article: Arne Andersson and Kurt Swanson.On the difficulty of range searching.Computational Geometry: Theory and Applications, 8(3):115122, 1997.
Article: Arne Andersson, N. Jesper Nilsson, and Kurt Swanson.Suffix trees on words.Algorithmica. To appear.
Article: Arne Andersson and Kurt Swanson.Efficient representation of sparse matrices in secondary memory.Submitted.