hashtable.anubis 9.87 KB

Edit Raw Blame History


 *Project*                             The Anubis Project

 *Title*                         Creating and using hash tables.

 *Copyright*                     Copyright (c) Alain Prouté 2004.


 *Author*       Alain Prouté


   --------------------------------- table of Contents -----------------------------------

   *** (1) Creating a hash table.
   *** (2) Reading the value of an entry.
   *** (3) Updating/creating an entry.
   *** (4) Deleting an entry.

   ---------------------------------------------------------------------------------------

   One  of the  most obvious  ideas  for creating  a dictionary  is  to create  a list  of
   'definitions'  or 'entries'.   Each entry  is made  of two  parts: a  character string,
   representing  the word  defined by  the  entry, and  the definition  proper, i.e.   the
   description of the  meaning (or meanings) of  this word.  This kind of  thing is called
   'A-list' (association list) in  Lisp for example. This will work, but  will not be very
   efficient. Indeed, finding  a definition in the dictionary  requires inspecting all the
   entries until the right one is found.

   In  order to  be  efficient  (in computation  speed),  you should  better  use a  'hash
   table'. This  may be 100  or 1000 times  faster than using a  list of entries.   A hash
   table is  something like a partial function  (i.e. not necessarily defined  for all the
   elements of  the source type)  except that  the value associated  to an element  of the
   source  type may  be changed  during the  execution.  This  is a  good way  of creating
   dictionaries  which have  to be  enriched dynamically,  and into  which access  must be
   fast. (Another solution being B-trees.)

   A  hash table  dispatches  the  entries of  the  dictionary into  a  certain number  of
   'classes'. The  efficiency of the hash table  depends on this number.   The higher this
   number, the faster the access to the data. However, the higher this number, the greater
   the memory used by the hash table.  A  good compromise is to choose a number of classes
   which is between the half and the tenth of the typical number of entries you think your
   dictionary may have in general.

   However,  what  you provide  is  not  the  number of  classes  itself  but its  base  2
   logarithm. For example, if you choose 10 for this logarithm, the number of classes will
   be 1024 (i.e.  2^10),  which is very good for a dictionary  supposed to hold about 5000
   entries.

   Hash tables are of the following type (which should be considered as opaque):

public type HashTable($S,$T):...

   where $S and $T are the source  and target types (of our partial function). Notice that
   $T may  be arbitrary, but that  $S must be  serializable.  The hash table  associates a
   datum of type $T to any (recorded) datum of type $S.


   *** (1) Creating a hash table.

   You can create an empty (i.e. with no entry at all) hash table with:

public define HashTable($S,$T)
   create_hashtable
     (
       Word32    lnc       // base 2 logarithm of the number of classes
     ).


   The memory used by the hashtable when it is empty is about:

        4 * (2^lnc)      bytes

   For example, if you  choose lnc = 10, the empty table will use  4096 bytes of memory as
   soon as  it is created. This is  not so much for  most machines, but values  like 16 or
   more may make problems.

   Notice that the value of 'lnc' has no impact on the number of entries you can create in
   the hash  table. The  number of entries  is illimited.  The choice of  'lnc' is  just a
   matter of performances.


   Now, you want to use your hash table. The hash table accepts the following operations:

     - reading the content of an entry,
     - updating/creating a new entry,
     - deleting an entry,

   All the  functions declared  below are  protected (by the  'protect' mecanism)  so that
   several virtual machines may use the same hash table without corrupting the hash table.


   *** (2) Reading the value of an entry.

public define Maybe($T)
   read_entry
     (
       HashTable($S,$T)      hashtable,
       $S                    entry_name
     ).

   'read_entry' returns 'failure' if no entry  exists with the given name.  Otherwise, the
   value of the  entry is returned within  a 'success' (see definition of  type 'Maybe' in
   'predefined.anubis').


   *** (3) Updating/creating an entry.

public define One
   update_entry
     (
       HashTable($S,$T)           hashtable,
       $S                         entry_name,
       Maybe($T) -> $T            update
     ).

   If you want to update an entry,  you provide the function 'update' of type Maybe($T) ->
   $T,  which  receive  as  its  operand,  either  (see  definition  of  type  'Maybe'  in
   'predefined.anubis'):

      - 'failure' if the entry does not exist,
      - 'success(e)' if the entry exists and has value 'e'.

   This function 'update' must return the new value of the entry. Notice that if the entry
   does not  exist, it  is created. Actually,  'update_entry' must  be used to  create new
   entries.


   *** (4) Deleting an entry.

public define One
   delete_entry
     (
       HashTable($S,$T)      hashtable,
       $S                    entry_name
     ).

   This  function   simply  deletes   the  entry  whose   name  is  'entry_name',   if  it
   exists. Otherwise, nothing happens.


   --- That's all for the public part ! --------------------------------------------------


   --------------------------------- table of Contents -----------------------------------
   *** [1] Managing lists of entries.
      *** [1.1] Adding/updating an entry in a list of entries.
      *** [1.2] Getting the value of an entry from a list of entries.
      *** [1.3] Deleting an entry from a list of entries.
   *** [2] Creating the hash table.
   *** [3] Using the hash table.
      *** [3.1] Updating/creating entries in a hash table.
      *** [3.2] Reading an entry from a hash table.
      *** [3.3] Deleting an entry from a hash table.
   ---------------------------------------------------------------------------------------

   The hash  table is just a  table (i.e. a multiple  variable) of lists  of entries. Each
   entry is  a pair of  type 'Entry($S,$T)'.  Each  list represents a 'class'  of entries.
   The  class  to  which  an  entry  belongs  is  identified  by  the  'simple_hash'  (see
   'predefined.anubis') of the name of the entry.  The hash table is more efficient than a
   single list of  entries, because the lists  of entries in the hash  table are typically
   very short (the  average length is the quotient  of the total number of  entries by the
   number of classes).

type Entry($S,$T):
   entry($S,$T).

public type HashTable($S,$T):
   hashtable  (Word32,                               // lnc
               MVar(List(Entry($S,$T)))).           // entries


   *** [1] Managing lists of entries.

   First of all we need some functions for managing lists (classes) of entries:

     - adding/updating an entry,
     - getting an entry,
     - deleting an entry.


      *** [1.1] Adding/updating an entry in a list of entries.

define List(Entry($S,$T))
   update_entry
     (
       List(Entry($S,$T))       l,
       $S                       entry_name,
       Maybe($T) -> $T          update
     ) =
   if l is
     {
       [ ] then
         [entry(entry_name,update(failure))],

       [h . t] then if h is entry(name1,value1) then
         if entry_name = name1
         then [entry(name1,update(success(value1))) . t]
         else [h . update_entry(t,entry_name,update)]
     }.


      *** [1.2] Getting the value of an entry from a list of entries.

define Maybe($T)
   get_value
     (
       List(Entry($S,$T))        l,
       $S                        entry_name
     ) =
   if l is
     {
       [ ] then failure,
       [h . t] then if h is entry(name1,value1) then
         if name1 = entry_name
         then success(value1)
         else get_value(t,entry_name)
     }.


      *** [1.3] Deleting an entry from a list of entries.

define List(Entry($S,$T))
   delete_entry
     (
       List(Entry($S,$T))        l,
       $S                        entry_name
     ) =
   if l is
     {
       [ ] then [ ],
       [h . t] then if h is entry(name1,_) then
         if name1 = entry_name
         then t
         else [h . delete_entry(t,entry_name)]
     }.


   *** [2] Creating the hash table.

public define HashTable($S,$T)
   create_hashtable
     (
       Word32    lnc       // base 2 logarithm of the number of classes
     ) =
   hashtable(lnc,mvar(1<<to_Int(lnc),[])).


   *** [3] Using the hash table.


      *** [3.1] Updating/creating entries in a hash table.

public define One
   update_entry
     (
       HashTable($S,$T)           htb,
       $S                         name,
       Maybe($T) -> $T            update
     ) =
   if htb is hashtable(lnc,entries) then
   with      index = simple_hash(lnc,name),
   protect   (entries(index) <- update_entry(*entries(index),name,update)).


      *** [3.2] Reading an entry from a hash table.

public define Maybe($T)
   read_entry
     (
       HashTable($S,$T)      htb,
       $S                    name
     ) =
   if htb is hashtable(lnc,entries) then
   with      index = simple_hash(lnc,name),
   protect   (get_value(*entries(index),name)).


      *** [3.3] Deleting an entry from a hash table.

public define One
   delete_entry
     (
       HashTable($S,$T)      htb,
       $S                    name
     ) =
   if htb is hashtable(lnc,entries) then
   with      index = simple_hash(lnc,name),
   protect   (entries(index) <- delete_entry(*entries(index),name)).