Description

Perfect Hash-Table

Author

Kon Lovett

Version

Requires

Usage

(require-extension perfect-hash)

Download

perfect-hash.egg

Documentation

Aims to provide a SRFI-69 compatible API backed by a perfect hash table. This API is not suitable for all datasets, however, because the table size can grow astronomically.

The choice of hash function is crucial to minimizing hash collisions. The SRFI-69 hash functions are adequate, but better results might be obtained with a 'bounded' function from the "hashes" egg.
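
To make the growth problem concrete, here is a minimal, self-contained sketch that searches for the smallest slot count at which a set of hash values is collision-free. It is an illustration only and not the egg's algorithm: the names collision-free? and smallest-perfect-slots are hypothetical, and the slot index is assumed to be the hash value modulo the slot count.

```scheme
;; Illustrative only: not the perfect-hash egg's implementation.
;; Returns #t when every value in HASHES lands in a distinct slot of a
;; table with SLOTS entries (slot index = hash modulo SLOTS).
(define (collision-free? hashes slots)
  (let ((used (make-vector slots #f)))
    (let loop ((hs hashes))
      (cond ((null? hs) #t)
            ((vector-ref used (modulo (car hs) slots)) #f)
            (else
             (vector-set! used (modulo (car hs) slots) #t)
             (loop (cdr hs)))))))

;; Smallest slot count, starting at the number of keys, that gives each
;; hash value its own slot. Assumes HASHES holds distinct non-negative
;; integers; with duplicate values the search would never terminate.
(define (smallest-perfect-slots hashes)
  (let loop ((slots (max 1 (length hashes))))
    (if (collision-free? hashes slots)
        slots
        (loop (+ slots 1)))))

(smallest-perfect-slots '(3 7 12))          ; => 6
(smallest-perfect-slots '(0 420 840 1260))  ; => 11
```

Even in these tiny cases the slot count is two to three times the number of keys, and hash values that share many common factors push it higher still.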

SRFI-69 Compatible API

procedure: (make-perfect-hash-table [EQUAL equal?] [HASH hash] [SIZE 1])

SRFI-69

procedure: (alist->perfect-hash-table ALIST [EQUAL equal?] [HASH hash])

SRFI-69

procedure: (perfect-hash-table? OBJECT)

SRFI-69

procedure: (perfect-hash-table-equivalence-function PHT)

SRFI-69

procedure: (perfect-hash-table-hash-function PHT)

SRFI-69

procedure: (perfect-hash-table-ref PHT KEY [THUNK])

SRFI-69

procedure: (perfect-hash-table-ref/default PHT KEY DEFAULT)

SRFI-69

procedure: (perfect-hash-table-set! PHT KEY VALUE)

SRFI-69

procedure: (perfect-hash-table-delete! PHT KEY)

SRFI-69

procedure: (perfect-hash-table-exists? PHT KEY)

SRFI-69

procedure: (perfect-hash-table-update! PHT KEY FUNCTION [THUNK])

SRFI-69

procedure: (perfect-hash-table-update!/default PHT KEY FUNCTION DEFAULT)

SRFI-69

procedure: (perfect-hash-table-size PHT)

SRFI-69

procedure: (perfect-hash-table-keys PHT)

SRFI-69

procedure: (perfect-hash-table-values PHT)

SRFI-69

procedure: (perfect-hash-table-walk PHT PROCEDURE)

SRFI-69

procedure: (perfect-hash-table-fold PHT FUNCTION INITIAL)

SRFI-69

procedure: (perfect-hash-table->alist PHT)

SRFI-69

procedure: (perfect-hash-table-copy PHT)

SRFI-69

procedure: (perfect-hash-table-merge! PHT1 PHT2)

SRFI-69
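
A short usage sketch of the procedures listed above, assuming they behave like their SRFI-69 counterparts as the entries state. The table name and the keys are made up for illustration.

```scheme
(require-extension perfect-hash)

;; Build a table with the default equivalence and hash functions.
(define colors (make-perfect-hash-table))

(perfect-hash-table-set! colors 'red   "#ff0000")
(perfect-hash-table-set! colors 'green "#00ff00")
(perfect-hash-table-set! colors 'blue  "#0000ff")

;; Look up a present key and an absent key, each with a default.
(define green   (perfect-hash-table-ref/default colors 'green #f))
(define missing (perfect-hash-table-ref/default colors 'cyan 'none))

;; Test membership, then remove an entry and count what is left.
(define has-red? (perfect-hash-table-exists? colors 'red))
(perfect-hash-table-delete! colors 'red)
(define entries (perfect-hash-table-size colors))
```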

Parameters

parameter: perfect-hash-table-count-maximum

The maximum table size, in slots. When the maximum is reached an error is signaled, and the offending perfect-hash-table is left corrupt.

parameter: perfect-hash-table-grow-bias

Additive amount by which the vector storage is increased.

parameter: perfect-hash-table-grow-factor

Multiplicative amount by which the vector storage is increased.
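
The sketch below shows how a multiplicative factor and an additive bias could drive storage growth toward a maximum. How the egg actually combines the two parameters is not documented here, so the multiply-then-add rule, the names next-slot-count and growth-sequence, and the numbers used are all assumptions made for illustration.

```scheme
;; Hypothetical growth step: multiply by FACTOR, then add BIAS.
;; The egg may combine its two parameters differently.
(define (next-slot-count slots factor bias)
  (+ (* slots factor) bias))

;; Lists the successive slot counts from START that do not exceed LIMIT.
;; Assumes each step strictly increases the count (for example FACTOR >= 1
;; with BIAS > 0); otherwise the loop would not terminate.
(define (growth-sequence start factor bias limit)
  (let loop ((slots start) (acc '()))
    (if (> slots limit)
        (reverse acc)
        (loop (next-slot-count slots factor bias)
              (cons slots acc)))))

(growth-sequence 1 2 1 100)  ; => (1 3 7 15 31 63)
```

Under this rule only six growth steps fit below a limit of 100 slots, which suggests why a maximum is worth setting deliberately.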

Procedures

procedure: (perfect-hash-table-count PHT)

Returns the number of table slots. This can be used to monitor the memory usage. A large difference between SIZE and COUNT signals poor table density.

procedure: (perfect-hash-table-perfect? PHT)

Is the hash perfect?

procedure: (make-perfect-hash-table-accessor PHT [THUNK])

Returns a reference procedure for the PHT, (-> object object). THUNK is called when the referenced item is not found.

The state of the PHT is cached. Lookups are performed using the state of the PHT at the point of invocation.
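
A usage sketch for the procedures in this section, using only the signatures documented above. It assumes the accessor returns the result of THUNK for a missing key, in the manner of SRFI-69's hash-table-ref; the table contents and variable names are made up.

```scheme
(require-extension perfect-hash)

(define pht
  (alist->perfect-hash-table '((one . 1) (two . 2) (three . 3))))

;; Entries stored versus slots allocated: a ratio near 1 means a dense
;; table, a small ratio means most of the vector storage is unused.
(define density
  (/ (perfect-hash-table-size pht)
     (perfect-hash-table-count pht)))

;; Create the accessor once, then call it with a key for each lookup.
;; Assumption: the THUNK's result is returned for keys that are not found.
(define lookup
  (make-perfect-hash-table-accessor pht (lambda () #f)))

(define two     (lookup 'two))
(define nothing (lookup 'four))
```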

Issues

In practice this perfect hash table is useful only for small datasets (fewer than 100 entries). While '*-exists?' is quicker than the SRFI-69 version for all tested datasets, '*-ref' is about the same speed as the SRFI-69 version, and the vector storage cost can be much larger.

An accessor created by 'make-perfect-hash-table-accessor' is always faster than 'perfect-hash-table-ref'.

License

Copyright (c) 2006, Kon Lovett.  All rights reserved.

Permission is hereby granted, free of charge, to any person obtaining a
copy of this software and associated documentation files (the "Software"),
to deal in the Software without restriction, including without limitation
the rights to use, copy, modify, merge, publish, distribute, sublicense,
and/or sell copies of the Software, and to permit persons to whom the
Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included
in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.