hashtable.anubis
9.87 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
*Project* The Anubis Project
*Title* Creating and using hash tables.
*Copyright* Copyright (c) Alain Prouté 2004.
*Author* Alain Prouté
--------------------------------- table of Contents -----------------------------------
*** (1) Creating a hash table.
*** (2) Reading the value of an entry.
*** (3) Updating/creating an entry.
*** (4) Deleting an entry.
---------------------------------------------------------------------------------------
One of the most obvious ideas for creating a dictionary is to create a list of
'definitions' or 'entries'. Each entry is made of two parts: a character string,
representing the word defined by the entry, and the definition proper, i.e. the
description of the meaning (or meanings) of this word. This kind of thing is called
'A-list' (association list) in Lisp for example. This will work, but will not be very
efficient. Indeed, finding a definition in the dictionary requires inspecting all the
entries until the right one is found.
In order to be efficient (in computation speed), you should better use a 'hash
table'. This may be 100 or 1000 times faster than using a list of entries. A hash
table is something like a partial function (i.e. not necessarily defined for all the
elements of the source type) except that the value associated to an element of the
source type may be changed during the execution. This is a good way of creating
dictionaries which have to be enriched dynamically, and into which access must be
fast. (Another solution being B-trees.)
A hash table dispatches the entries of the dictionary into a certain number of
'classes'. The efficiency of the hash table depends on this number. The higher this
number, the faster the access to the data. However, the higher this number, the greater
the memory used by the hash table. A good compromise is to choose a number of classes
which is between the half and the tenth of the typical number of entries you think your
dictionary may have in general.
However, what you provide is not the number of classes itself but its base 2
logarithm. For example, if you choose 10 for this logarithm, the number of classes will
be 1024 (i.e. 2^10), which is very good for a dictionary supposed to hold about 5000
entries.
Hash tables are of the following type (which should be considered as opaque):
public type HashTable($S,$T):...
where $S and $T are the source and target types (of our partial function). Notice that
$T may be arbitrary, but that $S must be serializable. The hash table associates a
datum of type $T to any (recorded) datum of type $S.
*** (1) Creating a hash table.
You can create an empty (i.e. with no entry at all) hash table with:
public define HashTable($S,$T)
create_hashtable
(
Word32 lnc // base 2 logarithm of the number of classes
).
The memory used by the hashtable when it is empty is about:
4 * (2^lnc) bytes
For example, if you choose lnc = 10, the empty table will use 4096 bytes of memory as
soon as it is created. This is not so much for most machines, but values like 16 or
more may make problems.
Notice that the value of 'lnc' has no impact on the number of entries you can create in
the hash table. The number of entries is illimited. The choice of 'lnc' is just a
matter of performances.
Now, you want to use your hash table. The hash table accepts the following operations:
- reading the content of an entry,
- updating/creating a new entry,
- deleting an entry,
All the functions declared below are protected (by the 'protect' mecanism) so that
several virtual machines may use the same hash table without corrupting the hash table.
*** (2) Reading the value of an entry.
public define Maybe($T)
read_entry
(
HashTable($S,$T) hashtable,
$S entry_name
).
'read_entry' returns 'failure' if no entry exists with the given name. Otherwise, the
value of the entry is returned within a 'success' (see definition of type 'Maybe' in
'predefined.anubis').
*** (3) Updating/creating an entry.
public define One
update_entry
(
HashTable($S,$T) hashtable,
$S entry_name,
Maybe($T) -> $T update
).
If you want to update an entry, you provide the function 'update' of type Maybe($T) ->
$T, which receive as its operand, either (see definition of type 'Maybe' in
'predefined.anubis'):
- 'failure' if the entry does not exist,
- 'success(e)' if the entry exists and has value 'e'.
This function 'update' must return the new value of the entry. Notice that if the entry
does not exist, it is created. Actually, 'update_entry' must be used to create new
entries.
*** (4) Deleting an entry.
public define One
delete_entry
(
HashTable($S,$T) hashtable,
$S entry_name
).
This function simply deletes the entry whose name is 'entry_name', if it
exists. Otherwise, nothing happens.
--- That's all for the public part ! --------------------------------------------------
--------------------------------- table of Contents -----------------------------------
*** [1] Managing lists of entries.
*** [1.1] Adding/updating an entry in a list of entries.
*** [1.2] Getting the value of an entry from a list of entries.
*** [1.3] Deleting an entry from a list of entries.
*** [2] Creating the hash table.
*** [3] Using the hash table.
*** [3.1] Updating/creating entries in a hash table.
*** [3.2] Reading an entry from a hash table.
*** [3.3] Deleting an entry from a hash table.
---------------------------------------------------------------------------------------
The hash table is just a table (i.e. a multiple variable) of lists of entries. Each
entry is a pair of type 'Entry($S,$T)'. Each list represents a 'class' of entries.
The class to which an entry belongs is identified by the 'simple_hash' (see
'predefined.anubis') of the name of the entry. The hash table is more efficient than a
single list of entries, because the lists of entries in the hash table are typically
very short (the average length is the quotient of the total number of entries by the
number of classes).
type Entry($S,$T):
entry($S,$T).
public type HashTable($S,$T):
hashtable (Word32, // lnc
MVar(List(Entry($S,$T)))). // entries
*** [1] Managing lists of entries.
First of all we need some functions for managing lists (classes) of entries:
- adding/updating an entry,
- getting an entry,
- deleting an entry.
*** [1.1] Adding/updating an entry in a list of entries.
define List(Entry($S,$T))
update_entry
(
List(Entry($S,$T)) l,
$S entry_name,
Maybe($T) -> $T update
) =
if l is
{
[ ] then
[entry(entry_name,update(failure))],
[h . t] then if h is entry(name1,value1) then
if entry_name = name1
then [entry(name1,update(success(value1))) . t]
else [h . update_entry(t,entry_name,update)]
}.
*** [1.2] Getting the value of an entry from a list of entries.
define Maybe($T)
get_value
(
List(Entry($S,$T)) l,
$S entry_name
) =
if l is
{
[ ] then failure,
[h . t] then if h is entry(name1,value1) then
if name1 = entry_name
then success(value1)
else get_value(t,entry_name)
}.
*** [1.3] Deleting an entry from a list of entries.
define List(Entry($S,$T))
delete_entry
(
List(Entry($S,$T)) l,
$S entry_name
) =
if l is
{
[ ] then [ ],
[h . t] then if h is entry(name1,_) then
if name1 = entry_name
then t
else [h . delete_entry(t,entry_name)]
}.
*** [2] Creating the hash table.
public define HashTable($S,$T)
create_hashtable
(
Word32 lnc // base 2 logarithm of the number of classes
) =
hashtable(lnc,mvar(1<<to_Int(lnc),[])).
*** [3] Using the hash table.
*** [3.1] Updating/creating entries in a hash table.
public define One
update_entry
(
HashTable($S,$T) htb,
$S name,
Maybe($T) -> $T update
) =
if htb is hashtable(lnc,entries) then
with index = simple_hash(lnc,name),
protect (entries(index) <- update_entry(*entries(index),name,update)).
*** [3.2] Reading an entry from a hash table.
public define Maybe($T)
read_entry
(
HashTable($S,$T) htb,
$S name
) =
if htb is hashtable(lnc,entries) then
with index = simple_hash(lnc,name),
protect (get_value(*entries(index),name)).
*** [3.3] Deleting an entry from a hash table.
public define One
delete_entry
(
HashTable($S,$T) htb,
$S name
) =
if htb is hashtable(lnc,entries) then
with index = simple_hash(lnc,name),
protect (entries(index) <- delete_entry(*entries(index),name)).