Description

A dictionary data structure based on counting Bloom filters.

Author

Ivan Raikov

Version

Requires

Usage

(require-extension sfht)

Download

sfht.egg

Documentation

The sfht library is an implementation of the Shared-node Fast Hash Table (SFHT) data structure described by Song, et al., in

_Fast Hash Table Lookup Using Extended Bloom Filter: An Aid to Network Processing_. (SIGCOMM'05)
.

This code defines an sfht object that implements a dictionary mapping of keys to values. The object responds to messages for querying, insertion of new elements, and deletion of existing elements. The interface of the sfht object is particularly suitable for situations where the keys are represented by bit vectors or vectors of fixnum values.

A counting Bloom filter is a Bloom filter that has been extended so that each bit of the filter has a counter associated with it. Upon insertion or deletion of an element, the counter is incremented or decremented, respectively. In order to find an element efficiently, we need to compute the k hash values, read the counters at the k locations, determine the smallest bucket size, and perform a linear search of that bucket for the element.

SFHT procedures

The sfht object is created by a make-sfht function, the only user-visible function defined in this egg:

procedure: make-sfht:: N P MAKE-RNG-STATE RANDOM! KEY->VECTOR KEY-VECTOR-REF KEY-VECTOR-LENGTH [KEY-EQUAL?] -> SELECTOR

where

MAKE-RNG-STATE is a user-supplied function that takes in an integer argument and returns an RNG state value.
RANDOM! is a user-supplied function that generates a random positive integer, given a state value, which is expected to be mutated.
KEY->VECTOR is a user-supplied function that takes a key value and returns a vector.
KEY-VECTOR-REF is a user-supplied function that retrieves an element from the vector returned by KEY-VECTOR.
KEY-VECTOR-LENGTH is a user-supplied function that returns the length of the key vector.
KEY->EQUAL? is a user-supplied predicate that takes two keys and returns #t if they are equal. The default function used is equal?

The returned selector procedure can take one of the following arguments:

'get returns a procedure LAMBDA KEY . DEFAULT-CLAUSE which searches the hash table for an association with a given KEY, and returns a (key . value) pair of the found association. If an association with KEY cannot be located in the hash table, the PROC returns the result of evaluating the DEFAULT-CLAUSE. If the default clause is omitted, an error is signaled. KEY must be comparable to the keys in the hash table by the KEY-EQUAL? predicate specified when the hash table was created)
'empty? returns #t if the hash table is empty
'size returns the size (the number of associations) in the hash table
'clear! removes all associations from the hash table (thus making it empty)
'put! returns a procedure LAMBDA KEY VALUE which, given a KEY and a VALUE, adds the corresponding association to the hash table. If an association with the same KEY already exists, its value is replaced with the VALUE. The return value is #f.
'delete! returns a procedure LAMBDA KEY . DEFAULT-CLAUSE which searches the hash table for an association with a given KEY, deletes it, and returns a (key . value) pair of the found and deleted association. If an association with the KEY cannot be located in the hash table, the PROC returns the result of evaluating DEFAULT-CLAUSE. If the default clause is omitted, an error is signaled.
'debugprint prints out all the contents the Bloom filter, for debug purposes

Examples

(require-extension iset)
(require-extension sfht)
(require-extension random-swb)

(define sfht (make-sfht 100000 0.0001 
			(lambda (i) (make-swb-random-state i (fx+ i 17)))
			swb:random!
			integer->bit-vector 
			(compose (lambda (x) (if x 1 0)) bit-vector-ref)
			bit-vector-length))

((sfht 'put!) 1 'one)
((sfht 'get))

License

Copyright 2007 Ivan Raikov and the Okinawa Institute of Science and Technology

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
at your option any later version.

This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
General Public License for more details.

A full copy of the GPL license can be found at
<http://www.gnu.org/licenses/>.