Description of the Anubis Compiler
This file contains a description of the Anubis compiler together with a discussion on
how the next version of the compiler should be designed.
-------------------------------- Table of Contents ------------------------------------
*** (1) Overall description.
*** (1.1) Modules of the compiler.
*** (1.2) How it works.
*** (1.2.1) Phase 1.
*** (1.2.2) Phase 2.
*** (1.3) Garbage collection.
*** (2) Tools.
*** (2.1) Lisp-like representation.
*** (2.2) Exceptions to the Lisp-like 'Expr' representation.
*** (2.3) Catalog of all internal data representations.
*** (2.3.1) Internal representation of types.
*** (2.3.2) Internal representation of terms.
*** (2.3.3) Internal representation of interpretations.
*** (2.3.4) Internal representation of type implementations.
*** (2.3.5) Internal representation of virtual machine instructions.
*** (2.3.5.1) Pseudo instructions.
*** (2.3.5.2) Computing instructions.
*** (2.3.5.3) Garbage collector instructions.
*** (2.3.5.4) Closure instructions.
*** (2.3.5.5) Equality instructions.
*** (2.3.5.6) Dynamic variables instructions.
*** (2.3.5.7) Serialising/unserialising.
*** (2.3.5.8) Starting a new virtual machine.
*** (2.3.5.9) Obsolete and soon obsolete instructions.
*** (2.4) Unification.
*** (2.5) Organization of the knowledge base.
*** (2.6) Indexation of symbols.
*** (3) The lexer.
*** (3.1) Generalities.
*** (3.2) Finding source files.
*** (3.3) The 'read' stack.
*** (3.4) Remembering already read files.
*** (3.5) Computing visibilities.
*** (4) The parser.
*** (5) Computing interpretations of terms.
*** (5.1) Overview.
*** (5.2) Interpreting applicative terms.
*** (5.3) Interpreting conditionals.
*** (5.4) Interpreting selective conditionals.
*** (6) Implementation of types.
*** (6.1) Implementation of primitive types.
*** (6.1.1) 'String'.
*** (6.1.2) 'ByteArray'.
*** (6.1.3) 'Int32'.
*** (6.1.4) 'Float'.
*** (6.1.5) 'Listener'.
*** (6.2) Implementation of defined types.
*** (6.3) Implementation of functional types.
*** (6.4) Implementation of address types.
*** (6.4.1) File and network address types.
*** (6.4.2) Dynamic variable address types.
*** (6.5) Implementation of C structure pointer types.
*** (7) Code generation.
*** (7.1) General conventions.
*** (7.2) Elimination of terminal recursion.
*** (7.3) Generating the code.
*** (7.3.1) Protect.
*** (7.3.2) Lock.
*** (7.3.3) Global symbols.
*** (7.3.4) Strings, Int32, small data and Floats.
*** (7.3.5) Local symbols.
*** (7.3.6) Microlocal symbols.
*** (7.3.7) Closures.
*** (7.3.8) Applicative terms.
*** (7.3.9) Conditionals.
*** (7.3.10) Selective conditionals.
*** (7.3.11) Local definitions ('with ...').
*** (7.3.12) Delegate.
*** (7.3.13) Serialize/Unserialize.
*** (7.4) Peephole optimisation.
*** (8) Garbage collector code.
*** (9) Equality code.
*** (10) Serialisation/unserialisation.
*** (10.1) How it works.
*** (10.2) Descriptions of small types.
*** (10.3) Pseudo-instructions set.
*** (10.4) Serialising.
*** (10.5) Unserialising.
*** (11) Outputting the code (making an *.adm file).
*** (12) How to make a system of macros?
*** (12.1) Overview.
*** (12.2) The trick.
*** (12.3) How it works.
*** (12.4) An example.
*** (13) Proposals for a dynamically defragmenting memory management system.
*** (13.1) How it works.
*** (13.2) Moving an allocated segment within memory.
*** (13.3) Compatibility with segregated lists.
*** (13.4) Defragmenting memory.
*** (13.5) This allocator seen from the outside.
*** (13.5.1) Hidden words.
*** (13.5.2) Tools (interface to the allocator).
*** (13.6) Permanent data.
*** (13.7) Fake references.
*** (13.8) Only one main segment?
*** (14) Static garbage-collection.
*** (14.1) The facts.
*** (14.2) Moving 'copy' and 'delete' instructions.
*** (14.3) Moving 'copy' instructions downwards.
*** (14.4) Should functions eat their arguments?
*** (14.5) Moving 'delete' instructions upwards.
*** (15) Voluntary multitasking.
---------------------------------------------------------------------------------------
*** (1) Overall description.
*** (1.1) Modules of the compiler.
The Anubis compiler is made of the following main parts:
- the lexer (lexer.y)
- the parser (grammar.y)
- a module for analysing type definitions (typedef.c)
- a module for analysing data definitions (opdef.c)
- the interpretations module (this is not an interpreter) (interp.c)
- the compiler proper (generation/optimization of symbolic code) (compile.c)
- the module for computing the implementations of data types (implem.c)
- the output module (making the '*.adm' files) (output.cpp)
It also contains secondary modules. Some of them are always used:
- a module for deciding if a type is recursive (rectype.c)
- tools for working on data types (unify.c, typetools.c, typewidth.c, typecmp.c, unknowns.c)
- computing the implicit destructors (destruct.c)
- computing the deletion code (delcode.c)
- computing the equality code (eqcode.c)
- a module containing various tools for manipulating expressions (expr.cpp)
- a system for managing the memory of the compiler itself (mallocz.c)
- a set of messages and functions for printing messages (msgtexts.c, show.c)
- a set of predefined paragraphs (predef.c, predef.anubis, predefined.anubis)
- tools for computing the size and binary format of instructions (vminstr.c)
Some others are used only on demand (options of the compiler):
- making an HTML index of all public symbols (index.c)
- outputting a C version of some Anubis types together with the constructors (dumpct.c)
- making a 'polish' translation (polish.c)
- producing a symbolic code dump (*.sc) (symcode.c)
*** (1.2) How it works.
The compiler works in two phases:
- phase 1: reading source files, checking paragraphs, computing the interpretations.
- phase 2: generating a module for each 'global' paragraph.
*** (1.2.1) Phase 1.
During phase 1, the compiler reads source files and does the following for each
paragraph:
1. reading the paragraph (using the parser and the lexer). This may produce a
syntax error.
2. depending on the kind of paragraph:
- 'read' paragraph: checking whether the file referred to has already been read. The
file is read only if it has not already been read. In either case, visibility is set
up correctly. See the section on the lexer below.
- 'replaced by' paragraph: this is the same as 'read' paragraphs, except that
visibility is computed differently. See the section on the lexer below.
- 'type' paragraph: the type definition is checked. See the section 'Checking a
type definition' below.
- 'define' paragraph: the definition is checked. See the section 'Checking a datum
definition' below.
- 'C constructor' paragraph: an Anubis type is translated into code usable from
within a C program (for all these paragraphs, the output is written to the files
'construc.h.out' and 'construc.c.out').
At most one error is reported per paragraph. If an error is found, the analysis
restarts at the next paragraph.
*** (1.2.2) Phase 2.
Phase 2 is performed only if there is no error after phase 1. Phase 2 performs the
following tasks:
- check that all type definitions are complete. Since a type may be defined using
several paragraphs, it is possible to define incomplete types (not all alternatives
have been defined).
- determine which types are infinite (recursive).
- dump C versions of types (if required).
- make the modules (*.adm).
- dump implementations of types (if required).
- print statistics (if required).
- print possible errors after phase 2.
*** (1.3) Garbage collection.
Data are implemented in two different ways:
- directly
- indirectly
'Directly' means that what we have in hand is the datum itself, and 'indirectly' means
that what we have in hand is a pointer to a segment of memory containing the datum.
Data may be 'created', 'duplicated' and 'destroyed'. When we duplicate (or 'copy') a
direct datum, we have no choice: we just copy all the bits of the datum. When we
duplicate an indirect datum, we have the choice between:
- allocating a new segment of memory, and copying the content of the original segment
into the new segment (this is called 'real' copy),
- just duplicating the pointer (this is called 'virtual' copy).
Notice that these methods are semantically equivalent for abstract data. For concrete
data (for example: dynamic variables, connections, etc.), indirect implementation and
virtual copy are required.
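The two copy methods can be sketched as follows (a minimal illustration, not the
actual Anubis VM code; 'U32' is the VM's word type mentioned later, and the function
names are ours). Reference counting, introduced below, is ignored here:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned int U32;   /* the VM's 32-bit word type */

/* 'Real' copy: allocate a fresh segment and copy all the words of the
   original segment into it. The result shares nothing with the original. */
U32 *real_copy(const U32 *seg, size_t nwords) {
    U32 *fresh = malloc(nwords * sizeof(U32));
    memcpy(fresh, seg, nwords * sizeof(U32));
    return fresh;
}

/* 'Virtual' copy: just duplicate the pointer; the segment is now shared. */
const U32 *virtual_copy(const U32 *seg) {
    return seg;
}
```

Both yield a semantically identical datum for abstract data; only the virtual copy
creates sharing.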
Of course, virtual copy is faster, but it creates 'shared data'. When a (necessarily
indirect) datum is shared, we have several pointers pointing to the same segment of
memory. Then, the problem is just to know when the segment is no longer used, so that we
can return it to the pool of memory. This problem is easily solved by 'reference
counting', which works as follows.
The first 32 bits word of any allocated segment is reserved for reference counting. It
contains an unsigned integer, called the 'counter' (for this segment). This counter is
either:
- zero, in which case, the segment is 'permanent' and will never be freed,
- non zero, in which case its value is the number of pointers pointing to this
segment.
Permanent segments are created in general (always ?) at compile time. For example, if
you write a character string into your source text, the compiler creates a permanent
segment for holding this string (this segment is within the program code itself). Of
course the counter of a permanent datum is never changed and remains always 0.
Non permanent data are created dynamically at run time. The function
'AllocateDataSegment' (see the virtual machine source code) receives the number of 32
bits words to allocate, and returns the pointer to the segment (actually cast to
'U32'). The first 32 bits word of the segment (the counter) is already initialized to
1.
When we need to make a virtual copy of a (non permanent) indirect datum, we just have
to increment the counter of this segment (so that it always represents the number of
(significant) pointers to the segment). Notice that this will never overflow, because
an overflow would mean a number of pointers bigger than the total memory can
hold.
When we want to delete a (non permanent) indirect datum, we perform a 'virtual
deletion'. This means that we decrement the counter. If the counter becomes 0, the
segment may be freed.
But freeing the segment is not as simple as this, because the segment itself may
contain pointers which are counted within other segments. Freeing the segment means
destroying these pointers, so that we must also decrement the corresponding
counters. We see that this process is recursive, because destroying a pointer may lead
to decrement a counter to zero, and to free another segment, and so on ...
Hence, the process of freeing the segments must be coded as a (recursive) program. This
is what we call 'garbage collector code'. This code is generated by the compiler and
depends only on the definition of types. Actually, there is one garbage collector
function for each defined type which may contain indirect data (i.e. 'mixed' and
'large' types). Some other types (like functional types) require special garbage
collector functions.
*** (2) Tools.
*** (2.1) Lisp-like representation.
The Anubis compiler uses a particular style of programming inspired by Lisp. The
advantage is that this system is very flexible. The disadvantage is that, not being
typed, it is not very safe. I think I will never do that again. It works as follows.
The type 'Expr' represents all symbolic 'expressions' to be manipulated by the
compiler. This includes tokens generated by the lexer, syntactical trees generated by
the parser, interpretations generated by the interpretations module, contexts,
environments, symbolic code, ...
Actually 'Expr' is just a clone of U32, the type of unsigned 32 bits integers. These
integers are 'dynamically' typed (using the low bits) as follows:
0 mod 2 ---------------------------------> (signed) integers
1 mod 2 ---> 1 mod 32 ------------------> tags
3 mod 32 ------------------> temporary pairs
5 mod 32 ------------------> strings
7 mod 32 ---> 7 mod 64 ---> user defined type variables
39 mod 64 ---> internal type variables (unknowns)
9 mod 32 ------------------> permanent pairs
In other words:
xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxx0 integer
xxxxxxxx xxxxxxxx xxxxxxxx xxx00001 tags
xxxxxxxx xxxxxxxx xxxxxxxx xxx00011 temporary pairs
xxxxxxxx xxxxxxxx xxxxxxxx xxx00101 strings
xxxxxxxx xxxxxxxx xxxxxxxx xx000111 user defined type variables (like $T)
xxxxxxxx xxxxxxxx xxxxxxxx xx100111 unknowns
xxxxxxxx xxxxxxxx xxxxxxxx xxx01001 permanent pairs
Notice, for example, that pairs (both temporary and permanent) use their 5 lowest bits
for the tag. Hence, the maximal number of temporary pairs is 2^27, and the same for
permanent pairs. This makes at most 134217728 pairs of each sort. For temporary pairs, this is
not a problem, but for permanent pairs, this may be a problem. Nevertheless, since a
pair occupies 8 bytes, an array of permanent pairs containing 134217728 pairs would
contain 2^30 bytes, i.e. 1 gigabyte. So, with a machine of less than 1 gigabyte of
memory, no problem will happen (except that very big programs may not be compiled).
A good program should be protected against such problems independently of the size of
available memory. Examining the code I see that this is not the case for Anubis
1. Indeed, the function 'pcons' (defined in 'expr.cpp'), which is used for making a new
permanent pair, has no protection against an enlargement of the array of permanent
pairs above the limit. This should be fixed as soon as possible. (Similarly, 'cons' for
temporary pairs is not well protected).
All 'Expr's except temporary and permanent pairs are called 'atoms'. Pairs, either
temporary or permanent, are just made of two 'Expr's, and represented (in the comments
of the source of the compiler), as follows:
(a . b)
where 'a' and 'b' are two 'Expr's respectively called the 'head' and 'tail' of the
pair. The macro 'cons' enables the construction of such pairs.
As in Lisp, lists are represented using pairs and the tag 'nil' (the empty list) as
follows (for a list made of 4 'Expr's):
(a b c d)
which is actually a shortcut for:
(a . (b . (c . (d . nil))))
Such lists may be easily constructed using the macros 'list1', 'list2', ... defined in
'compil.h'.
Now, an 'Expr' like (a . (b . (c . d))) is written '(a b c . d)', and called a
'multiple cons'. A set of macros, called 'mcons3', 'mcons4', ... enable a direct
construction of such data.
Examples of use and meaning of these macros (and some others):
The C term represents
---------------------------------------------------------------
nil ( )
cons(a,b) (a . b)
mcons3(a,b,c) (a b . c)
mcons4(a,b,c,d) (a b c . d)
etc...
list1(a) (a) same as (a . nil)
list2(a,b) (a b) same as (a b . nil)
etc...
If x = (a b c d e):
car(x) a
second(x) b
third(x) c
etc...
cdr(x) (b c d e)
cdr2(x) (c d e)
cdr3(x) (d e)
etc...
Notice that 'mcons2' does not exist because it is the same as 'cons'.
'consp(x)' is 1 if and only if x is a pair (i.e. not an atom). Otherwise, it is 0. This
predicate ('consp' is a traditional Lisp abbreviation for 'cons predicate') is very
often used as follows:
while(consp(x))
{
    Expr head = car(x);
    ... do something with the head of x ...
    x = cdr(x); // continue with the tail of x
}
// in most cases, when we arrive here, we have x == nil (the empty list).
Normally, pairs are created as temporary pairs, and made permanent only if
needed. The point is that, while working on a paragraph, the compiler constructs
data which will not be part of the final result. When the analysis of a paragraph is
finished, only the data to be kept are made permanent (using the function 'save'), and
the array of temporary pairs is reset to zero for the next paragraph. This is how
memory is managed in the compiler (this has nothing to do with garbage collection in a
virtual machine).
Integers are coded on 31 bits (bits 1 to 31), and the bit number 0, whose value is 0,
means that this 'Expr' is an integer. The file 'compil.h' contains all the macros
needed to test the type of an 'Expr', or to construct 'Expr's.
Tags are used for representing simple atomic concepts. For example, the tag 'nil'
represents the empty list. They are also used for 'tagging' non atomic data. For
example, when the function
(Int32 i, String s) |-> a
is read from a source file, the parser constructs the following 'Expr':
(lambda <lc> (([Int32] . i) ([String] . s)) . [a])
where '[Int32]' is a representation of type 'Int32' (similarly for 'String'), and
'[a]' a representation of the term 'a'.
Here, the tag 'lambda' represents the arrow '|->'.
*** (2.2) Exceptions to the Lisp-like 'Expr' representation.
Actually, this system has exceptions. Normally, all tokens returned by the lexer have a
value of type 'Expr', i.e. a datum containing its own (dynamic) type. Examining such a
value it is possible to know if we have an integer, a string, a pair, etc...
However, the two tokens 'yy__integer' and 'yy__char' have a value of type 'integer'.
This means that when the lexer reads a number (regular expression: [0-9]+), it returns
the token 'yy__integer', with a value to be interpreted as a C integer, not as a
Lisp-like integer. We must be careful because for example the integer 6 is represented
like this in the two systems:
...0110 (C integer: this is not an 'Expr')
...01100 (Lisp-like integer: this is an 'Expr')
^
|
+--------- this bit (0) means that this 'Expr' is an integer
The reason for these exceptions is that we want the compiler to be able to read 32 bits
integers, not only 31 bits integers from source files.
The parser contains the following grammar rule:
Term: yy__integer %prec prec_symbol { $$ = mcons3(integer,linecol(),$1); }
This means that when an integer has been read by the lexer, the parser constructs the
following multiple cons as a representation of this integer:
(integer <lc> . <n>)
where <n> is the value of the integer as a C integer, not as a Lisp-like integer.
As a consequence, we must always remember that a list whose head is the tag 'integer'
always has the form: (integer <lc> . <n>), where <n> is not an 'Expr'. It would be
meaningless to apply a predicate like 'is_integer' to <n>, and the result would be
unpredictable.
The exceptions are the following (where only <Cint> is not an 'Expr'):
(integer <lc> . <Cint>)
(small_datum <type> . <Cint>)
(anb_int32 <lc> . <Cint>)
(load_word32 . <Cint>)
(I hope there is no other exception, but I'm not absolutely sure).
Of course, the existence of these exceptions is very bad. A better solution would be
to represent (say) the integer 746542 as the list (7 4 6 5 4 2), where the elements of
the list would be Lisp-like integers. In the compilation step, the compiler may
transform this representation into a packed integer of 32, 64 bits,... maybe with a
warning in the case of an overflow. Actually, this is what happens already at a smaller
scale. Indeed, the current version of the compiler finds 3 interpretations for the
integer 180 (of types 'Int8', 'Int32' and 'Float'), but only 2 for 4567, because this
number is greater than 255.
Added 2007/01/16: The tag 'integer' is now obsolete. There are two new tags:
integer_10
integer_16
They are used as follows:
(integer_10 <lc> bigit ... bigit)
(integer_16 <lc> bigit ... bigit)
where in both cases the bigits are Lisp-like integers which are bigits in base
256. The reason why there are two tags is just for knowing how to print the number in
an error message.
*** (2.3) Catalog of all internal data representations.
The compiler represents the following data (internally):
- 'types' (as produced by the parser)
- 'terms' (as produced by the parser)
- 'interpretations' (as produced by 'term_interpretations' and related functions)
- 'implementations' (as produced by 'type_implementation' and related functions)
- 'instructions' (as produced by 'compile_term' and related functions)
Note: All data are represented either by Lisp-like atoms (string, tags, ...), or by
lists. In the case of a list, the head of list is always a tag.
*** (2.3.1) Internal representation of types.
<type> :=
yy__Symbol just a type name in the form of a Lisp-like string
For example, the types 'One', 'Bool' etc ... are in this category. The file
'compil.h' contains the definitions of a lot of 'predefined strings'. Within the C
source, such symbols may be represented by the C symbols 'pdstr_One', 'pdstr_Bool'
etc... ('pdstr_' for 'predefined strings').
Equivalently, 'pdstr_Bool' may be replaced by 'new_string("Bool")'. This has exactly
the same effect, except that 'pdstr_Bool' is faster at execution (of the compiler).
Remark: 'new_string' (defined in 'expr.cpp') does not allocate a new segment for the
string if the string already exists. In this case, it returns the already existing
Lisp-like string.
All types (without parameters) whose names are created by the user are also
represented like this.
yy__utvar user type variable (type parameter)
Type parameters have a special Lisp-like representation as atoms (see above).
type_String a Lisp-like tag representing the type 'String'
type_ByteArray idem
type_Int32 idem
type_Float idem
type_Listener idem
The above being more 'primitive', they are represented by Lisp-like tags, instead of
Lisp-like strings.
(type_RAddr . <type>) represents 'RAddr(T)'; type_RAddr is a Lisp-like tag
(type_WAddr . <type>) idem
(type_RWAddr . <type>) idem
(type_GAddr . <type>) idem (obsolete: global variable)
(type_Var . <type>) idem (dynamic variable)
(type_MVar . <type>) idem (multiple dynamic variable)
The above are called 'address types'.
(app_ts <name> . <types>) type scheme applied to type operands
'app_ts' is for 'apply type scheme'. For example, 'List(String)' is represented as:
(app_ts new_string("List") type_String)
and 'Result(One,String)' as:
(app_ts new_string("Result") pdstr_One type_String)
Notice that there is no dot in the above Exprs. '(app_ts new_string("List")
. type_String)' is erroneous, because the second tail ('cdr2') of the list must be a
list of types, not a single type.
(type_struct_ptr . <id>) encapsulated C structure type
<id> is a Lisp-like integer identifying the structure.
(functype <types> . <type>) functional type
<types> is the list of the representations of the arguments types, and <type> is the
representation of the target type.
There is still another sort of 'types' represented internally: the 'type
unknowns'. They are also a special sort of Lisp-like Exprs. They are not generated by
the parser, but in general by the interpretation functions. When a symbol is found,
representing a scheme, all the type variables in this scheme are replaced by fresh
unknowns. Of course, there is a generator of fresh unknowns: 'fresh_unknown'.
From this we may conclude that defined types are those which are represented either by
a string or by (app_ts ...).
*** (2.3.2) Internal representation of terms.
Note: <lc> (generated by the lexer) is a packed version of the triplet
(filename,line,column). There are macros for extracting the components:
file_in(<lc>) a C char *
line_in(<lc>) a C int
col_in(<lc>) a C int
As usual, <Cint> represents a C integer (i.e. not a Lisp-like integer).
<term> :=
(alert <lc> . <filename>) <filename> is a Lisp-like string
(protect <lc> . <term>)
(lock <lc> <term> . <term>)
(debug_avm <lc> . <term>)
(avm <lc> . AVM) (AVM is explained below)
(terminal <lc> . <term>)
(symbol <lc> . <Lisp-like string>)
(integer <lc> . <Cint>)
(fpnum <lc> <mantissa> . <exponent>) (<mantissa> and <exponent> are Lisp-like integers)
(of_type <lc> <type> . <term>) (explicit typing)
(string <lc> . <string>) (<string> is a Lisp-like string)
(with <lc> <symbol> <value> . <term>)
(delegate <lc> <term> . <term>)
(lambda <lc> <FArgs> . <term>) (|->)
(rec_lambda <lc> <name> <FArgs> . <term>) (|-f->)
(app <lc> <term> . <terms>) (applicative term)
(cond <lc> <term> . <cases>) (conditional)
(select_cond <lc> <term> <case> . <term>) (selective conditional)
(serialize <lc> . <term>)
(unserialize <lc> . <term>)
The following are still used, but should become obsolete (and replaced by system calls).
(anb_read ...)
(anb_write ...)
(anb_exchange ...)
(connect_to_file ...)
(connect_to_IP ...)
(wait_for ...)
*** (2.3.3) Internal representation of interpretations.
The function 'term_interpretations' (in 'interp.c') computes the list of all
interpretations of a term (in a given appropriate context). An interpretation is a
pair:
(head . env)
where 'head' is called an 'interpretation head', and where env is an 'environment'.
Here, we describe interpretation heads.
Unfortunately, I've used the same tags as for terms. It would be safer to use different
tags. Furthermore, tags for some sort of expressions should have a uniform prefix, for
example:
typ_ for types
ter_ for terms
int_ for interpretations
imp_ for implementations
ins_ for instructions
The tags should also be numbered. This will limit the risk of forgetting a tag in a
switch. Hence for example, we should have the following tags for terms:
ter_00_alert
ter_01_protect
ter_02_lock
etc...
and a dummy one for marking the end of the list:
ter_20_dummy
Interpretations heads are the following:
(protect <lc> . <head>)
(lock <lc> <term> . <term>)
(avm <lc> . <instructions>)
(debug_avm <lc> . <head>)
(terminal <lc> . <head>)
(operation <lc> <opid> <name> <parms> <type> . <types>)
(string <lc> . <string>)
(anb_int32 <lc> . <Cint>)
(small_datum <type> . <Cint>)
(fpnum <lc> <mantissa> . <exponent>)
(local <name> <i> . <type>)
(micro_local <name> <d> <i> . <type>)
(closure <lc> (f_micro_ctxt <fname> <ftype> (<sym> . <type>) ...) <args> . <body>)
(app <lc> <head> . <heads>)
(cond <lc> <head> ((<name> (<sym> . <type>) ...) <lc> . <head>) ...)
(select_cond_interp <lc> <test head> <index> <clause head> <head then> . <head else>)
(with <lc> <symbol> <head> . <head>)
(delegate <lc> <head> . <head>)
(serialize <lc> . <head>)
(unserialize <lc> <type> . <head>)
The following are still used but should become obsolete:
(anb_read ...)
(anb_write ...)
(anb_exchange ...)
(wait_for ...)
*** (2.3.4) Internal representation of type implementations.
Type implementations are computed in 'implem.c'. We never compute an implementation for
a scheme, but only for an actual type (i.e. with no type parameter). For example,
'Maybe(Int8)' has an implementation different from 'Maybe(String)', but 'Maybe($T)' has
no implementation at all.
In the future, it would be nice to have quantification 'à la Girard' on types:
forall $T ...
exists $T ...
In this case, we would have to implement for example the type:
forall $T Maybe($T)
which has no parameter, since $T is bound in this expression.
Internal representations of implementations are the following. Again the same tags are
used, which is quite unfortunate.
<implem> :=
type_String
type_ByteArray
type_Int32
type_Float
(functype <implem> . <implems>)
(type_RAddr . <implem>)
(type_WAddr . <implem>)
(type_RWAddr . <implem>)
(type_GAddr . <implem>) (obsolete)
(type_MVar . <implem>)
It is very unfortunate that dynamic variables are not implemented as '(type_Var
. <implem>)'. Instead, this type is implemented as a large type. I don't understand
how this can work properly (but it seems to work miraculously). This should be
changed of course.
(type_struct_ptr . <id>)
(small_type <nalt> <iw> <alt geom> ... <alt geom>)
(mixed_type <nalt> <iw> <alt geom> ... <alt geom>)
(large_type <nalt> <iw> <alt geom> ... <alt geom>)
where:
<alt geom> :=
(small_alt (<imp> <offset> . <width>) ... (<imp> <offset> . <width>))
(mixed_alt (<imp> <offset> . <width>) ... (<imp> <offset> . <width>))
(large_alt (<imp> <offset> . <width>) ... (<imp> <offset> . <width>))
*** (2.3.5) Internal representation of virtual machine instructions.
*** (2.3.5.1) Pseudo instructions.
These instructions are of size 0.
(label . <addr>) position within the code
(ret_point . <addr>) same as label
(header ...) used for decorating the .sc files
(comment ...) idem
(context ...) idem
*** (2.3.5.2) Computing instructions.
(glue_index . <index>) used for small types
(glue_mixed_index . <index>) used for mixed types
(store_index . <index>) used for large types
These instructions are used for constructing a datum of a defined type. They put the
right index (i.e. alternative number of the datum) into the datum.
For a small datum, 'glue_index' puts the index within the least significant bits (of
R). This is the first operation when constructing the datum, so that the content of R
is actually overwritten.
For a mixed datum, the index is put into the 2 least significant bits by
'glue_mixed_index'.
For a large datum, the index (a byte) is put at offset 4 (just after the counter) into
the segment.
(glue . <bit width>) used for small and mixed types
(store_0 . ?) used for mixed and large types
(store_1 . ?)
(store_2 . ?)
(store_4 . ?)
These instructions are used for putting the components into the datum under
construction (of some defined type).
The instruction '(glue . <bit width>)' ORes the register R with a left shift of the top
of stack. The number of bits of the shift is given by the operand.
'(store_? . <offset>)' stores a word of 0, 1, 2 or 4 bytes at the offset given by the
operand within the segment pointed to by R.
(index_direct . <bit width>)
index_indirect
These instructions are used by conditionals. A datum of a defined type is in the
register 'R', and they put its index (alternative number) into the index register
'I'. The first one is used for small and mixed data, the second one for large data.
(switch <addr> ... <addr>)
This instruction checks the content of the index register 'I', and jumps to the
corresponding address. It is used by conditionals for jumping to the right case.
(unglue <bit width> . <right shift>)
(unstore <offset> . <n>) where <n> is a Lisp-like integer equal to 0, 1, 2 or 4
(unstore_copy . <offset>)
(unstore_copy_mixed <offset> . <mask>)
(unstore_copy_ptr . <offset>)
(unstore_copy_function . <offset>)
These instructions are used by conditionals for pushing the values of resurgent symbols
onto the stack.
'(unglue <bit width> . <right shift>)' is used for extracting a component from a small
datum. The datum is pushed into the stack.
'unstore' instructions are similar but for indirect data. Some of them need to make a
virtual copy of the datum to be extracted.
(jmp . <addr>)
This simple 'jump' instruction is used at the end of cases in conditionals for jumping
to the next code (which is common to all cases).
(apply . k)
(ret . k)
When '(apply . k)' is about to be executed, the stack contains (starting at the top of
stack) the arguments of a function and a return address. The function to be 'applied'
to these arguments is in 'R'.
This instruction acts in two different ways depending on the nature of the function,
which may be either 'top level' or a 'closure'. See the section on closures and
functions for more explanations.
'(ret . k)' is just an ordinary return instruction (except that it stops the virtual
machine if the return address is zero).
'k' is the number of arguments of the function.
(load_Int32 . <Cint>)
(load_float <mantissa> . <exponent>)
(string . <string index>)
These instructions are used for loading constants into 'R'.
(check_stack . <n>)
This instruction checks if the stack is big enough for pushing '<n>' 32 bits words. If
it is not, it puts the virtual machine in the state 'needs_bigger_stack' and gives up,
so that the scheduler may enlarge the stack of this machine.
(push_addr . <addr>)
This instruction pushes a return address on top of stack.
(address . <addr>)
The same but the address is put into 'R'.
push
This instruction pushes the content of 'R' onto the stack.
(select_index_direct <bit width> <index> . <addr>)
(select_index_indirect <index> . <addr>)
These instructions are generated for selective conditionals.
(peek <name> . <depth>)
(micro_peek <name> <depth> . <micro depth>)
The instruction 'peek' takes a 32 bits word at depth '<depth>' into the stack and puts
it into 'R'. The datum is not virtually copied. The compiler generates a 'copy'
instruction to follow this one, if needed, depending on the type of the datum.
'micro_peek' is similar except that the datum to be put into 'R' is located into a
micro context. The micro context is at depth <depth> into the stack, and the datum is
at position <micro depth> into the micro context.
protect
(unprotect . <addr>)
unlock
lock
odd_align
This instruction is put in front of the code of a top level function if this code would
be at an even address.
pop1
pop2
pop3
These instructions are just removing 1, 2 or 3 32 bits words from the top of stack.
invalid
This instruction should never be executed. It is generated in the garbage collector
code for small alternatives in mixed types. This means that the garbage collector code
must never be called for a datum belonging to small alternative of a mixed type. Other
instructions with a <mask> operand must use this mask to avoid such a call.
give_up
This instruction makes the virtual machine give up as if its slice of time was
finished.
start_debug_avm
stop_debug_avm
These instructions set and unset a flag. When this flag is set, the virtual machine
prints every executed instruction along with other information.
finish
(program . <instructions>)
do_alert
*** (2.3.5.3) Garbage collector instructions.
The null 32 bits word '0' is always a valid datum (of any type) for the garbage
collector. Of course, this datum doesn't need to be collected, but the garbage
collector may be passed such a datum.
Allocation of memory:
(alloc . <number of bytes>)
The pointer to the resulting segment is put in R. The counter (first 32 bits word of
the segment) always contains 1 after the allocation.
Making virtual copies. The following instructions are used for making a virtual copy of
a possibly indirect datum. This amounts to incrementing the counter which is at the
beginning of the segment. However, these instructions are generated on behalf of the
type of the datum only, and this is not enough to know if the counter exists (some data
are direct). Hence, the existence of several variants for these instructions. Also,
these instructions always check if the counter is zero. If it is the case, the datum
is permanent, and the counter is not incremented.
Copying virtually within the stack. The datum is at position <depth> in the stack (0 is
the top of stack).
(copy_stack_ptr . <depth>)
The counter always exists (except if the datum is the null word). The counter is
incremented if it is non zero.
(copy_stack_function . <depth>)
The datum represents a function, i.e. the address of some piece of code if it is odd,
and a pointer to a closure if it is even. The counter exists only in the second case.
(copy_stack_mixed <depth> . <mask>)
The <mask> has at most 4 bits. Each bit indicates if the corresponding alternative is
large or small. The counter exists only if the alternative is large.
Virtually copying in R. This is the same except that the datum to be copied is in R.
copy_ptr
copy_function
(copy_mixed . <mask>)
The next one should become obsolete:
copy
It's almost the same as 'copy_ptr' except that it does not check for permanent
data. Should disappear and be replaced by 'copy_ptr'.
Virtual deletion within the stack.
(collapse . <depth>)
This instruction deletes the datum at position <depth> into the stack. The datum is
direct, so that the instruction has just to move the <depth> words at top of stack one
slot downwards, and adjust the stack pointer. For example, (collapse . 3) works as
follows:
the stack before the stack after
a b c d e f g .... a b c e f g ...
('d' has disappeared)
The next instructions delete virtually a datum which is in R. These instructions are
generated only for those types which have indirect data.
del_ptr
'del_ptr' expects a pointer in R to a segment not containing any indirect data. In this
case, the deletion amounts to decrementing the counter pointed to by R. However, R may
contain 0 (in this case the instruction does nothing), or the counter may be zero (in
this case, the datum is permanent and the instruction does nothing). Otherwise, if the
counter becomes zero, the segment is freed.
del_conn
This is similar to 'del_ptr' except that the segment pointed to represents a
'connection' (file or network). The instruction works as 'del_ptr', except that if the
counter becomes zero, it also closes the connection.
del_function
This is similar to 'del_ptr', except that functions are either addresses of code (if
odd) or pointers to closures (if even). In the first case nothing is done. In the
second case, 'del_function' works like 'del_ptr', except that the address of the
deletion code is found within the closure itself instead of as an operand of the
instruction. This is because the types of the data in the micro context cannot be
known from the type of the function.
(del . <addr>)
'(del . <addr>)' expects a datum belonging to a large type in R. '<addr>' is the
address of the deletion code for this type. The instruction checks if R is zero. In
this case it does nothing. Otherwise, R contains a pointer to a valid segment. If the
counter is zero (permanent datum) the instruction does nothing. Otherwise, the counter
is decremented. If the counter becomes zero, a return address and the datum (in R) are
pushed onto the stack, and the deletion code at '<addr>' is jumped to.
(del_mixed <mask> . <addr>)
This is the same as '(del . <addr>)' except that the datum in R belongs to a mixed
type. The mask is used to check if the datum is direct or indirect. If it is direct,
nothing is done. If it is indirect, the same as for '(del . <addr>)' is performed.
(del_struct_ptr . <id>)
This instruction is the same one but for C structure encapsulation pointers. Somewhat
curiously, it is coded in the compiler ('compile.c'), but has never been generated (the
proof: it is not coded in 'vminstr.c'). This is probably due to the fact that
encapsulated C structures are manipulated only from within 'predefined.anubis'. Hence,
this is not a big problem, because only a modification of 'predefined.anubis' can lead
to generate this instruction.
(del_mvar . <addr>)
This instruction is even more phantomatic than the previous one. This is more
problematic, because multiple variables may be manipulated by anyone. Hence, this
instruction should be coded as soon as possible.
Note: We should also have a '(del_var . <addr>)'. The type 'Var' is very badly
implemented as a large type (one of my worst ideas). Everything concerning this type
should be examined and corrected. I suspect that this creates at least memory leaks,
but maybe also more serious problems.
(del_stack_ptr . <depth>)
(del_stack_conn . <depth>)
(del_stack_function . <depth>)
(del_stack <depth> . <addr>)
(del_stack_mixed <depth> <mask> . <addr>)
(del_stack_struct_ptr <id> . <depth>)
(del_stack_var <depth> . <addr>) (phantom)
(del_stack_mvar <depth> . <addr>)
The above instructions are the same as the previous ones, except that the datum to be
virtually deleted is in the stack at depth <depth> instead of being in R.
mvar_slots_del_ptr
mvar_slots_del_conn
mvar_slots_del_function
(mvar_slots_del . <addr>)
(mvar_slots_del_mixed <mask> . <addr>)
(mvar_slots_del_struct_ptr . <id>)
(mvar_slots_del_var . <addr>)
(mvar_slots_del_mvar . <addr>)
Again, these instructions are the same as above except that they have to delete
virtually the data in all the slots of the multiple variable.
indirect_del_ptr
indirect_del_conn
indirect_del_function
(indirect_del . <addr>)
(indirect_del_var . <addr>) (phantom)
(indirect_del_mvar . <addr>)
(indirect_del_mixed <mask> . <addr>)
(indirect_del_struct_ptr . <id>)
Again, these instructions are working like the above except that the datum to be
virtually deleted is pointed to by the top of stack.
(increment_del . <byte width>)
This instruction is used when we are really deleting an indirect datum. A pointer to
some position within the segment to be freed is on top of stack. This instruction just
increments this pointer by the number of bytes indicated by its operand.
del_index_direct
This instruction expects a datum of a mixed type on top of stack, belonging to a large
alternative. This datum is a 32 bits word which may be decomposed as follows:
pppppppp pppppppp pppppppp ppppppii
i.e. into a field of 30 bits and a field of 2 bits. The two bits 'ii' give the number of
the alternative the datum belongs to, and
pppppppp pppppppp pppppppp pppppp00
is the pointer to the segment. The role of the instruction is:
- put 'ii' into the index register I
- replace 'ii' by '00' on top of stack
- duplicate the top of stack.
The reason why we duplicate the top of stack is that one pointer will be incremented
for walking into the segment, and the other one will be used for freeing the segment.
del_index_indirect
This instruction is similar to 'del_index_direct', except that it concerns a large
datum. The index is read from the data segment, not from the manipulation word. The
pointer is also duplicated on top of stack.
free_seg_0
free_seg_1
These two instructions are just calling the 'FreeDataSegment' function of the memory
allocator. The first one transmits the word at *(SP-1) (top of stack: depth 0 in the
stack), whilst the second one transmits the word at *(SP-2) (depth 1 in the stack).
*** (2.3.5.4) Closure instructions.
These instructions are used for constructing new closures.
(put_copy_direct <depth> . <position>)
(put_copy_indirect <depth> . <position>)
(put_copy_function <depth> . <position>)
(put_copy_mixed <mask> <depth> . <position>)
A memory segment (the closure under construction) is pointed to by 'R'. The <depth>
operand is the stack depth of the datum to be put. The <position> operand is the
position where the datum must be put in the micro context of the closure. The offset
(in bytes) within the closure where the datum must be put is 4*(3+<position>), i.e.
<position> is 0 for the first datum of the micro context, 1 for the next one, etc...
For indirect data, the counter is incremented, because this is a duplication.
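The offset arithmetic can be written out explicitly (the helper name is illustrative, not from the sources): the first three 32-bit words of the closure hold the counter, the code label and the deletion label, so the micro context starts at byte 12.

```c
#include <assert.h>
#include <stdint.h>

/* The closure segment begins with three 32-bit words -- the counter,
   the code label and the deletion label -- so the datum at <position>
   in the micro context lives at byte offset 4*(3 + <position>). */
static uint32_t micro_context_offset(uint32_t position)
{
    return 4u * (3u + position);
}
```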
(put_micro_copy_direct <depth> <micro depth> . <position>)
(put_micro_copy_indirect <depth> <micro depth> . <position>)
(put_micro_copy_function <depth> <micro depth> . <position>)
(put_micro_copy_mixed <mask> <depth> <micro depth> . <position>)
Same as above, except that the datum is located into another micro context which is at
depth <depth> in the stack. The datum itself is at position <micro depth> in this other
micro context. Again <micro depth> is 0 for the first datum in the micro context, 1 for
the next one, etc...
(put_closure_labels <code> . <del code>)
This instruction puts its two operands at offsets 4 and 8 into the closure (pointed to
by 'R').
Below is an example of how a closure is constructed. First of all, here is the source
text (copy-pasted from the sources of our web site):
(DBRow(MTxt) dbrt) |-> if dbrt is dbrow(_,_,t) then
if t is mtxt(_,v,time) then
success(mtxt(new_name,v,time))
and below is the code generated for the construction of the closure (copy-pasted from
the .sc file):
| ; |-> in '/home/alp/www_dev/anubis-language/_1_6/translate_item.anubis' at line 451
| ; function code at label 31092
146195 | alloc 16
| ;--- stack ------------------------
| ; 0 (return address)
| ; 1 i LangInfo
| ; 2.0 tdb DB(TDB)
| ; 2.1 old_name String
| ; 2.2 new_name String
| ; 3 (return address)
| ;----------------------------------
146197 | put_micro_copy_indirect 2 2 0
146201 | put_closure_labels 31092 31093
'alloc 16' allocates four 32-bit words for the closure: counter, code, del and just one
datum in the micro context (the value of 'new_name').
Remark that 'new_name' is the sole local symbol occurring free in the above term, and
that its value is within an older micro context located at depth 2 in the stack. The
value of 'new_name' is at position 2 in this older micro context, so that the right
instruction for constructing the new micro context is:
146197 | put_micro_copy_indirect 2 2 0
which means: 'take the datum at position 2 in the micro context at depth 2, and put it
at position 0 in the new micro context' (of course, count a virtual copy).
('indirect' because the type is 'String'). The last instruction:
146201 | put_closure_labels 31092 31093
puts the address (label 31092) of the code for executing the function and the address
(label 31093) of the garbage collector code for deleting the closure itself.
*** (2.3.5.5) Equality instructions.
In the next version of Anubis, equality code should be generated as generic code (like
for implicit destructors), so that all the instructions below will become obsolete.
Furthermore, equality (with Boolean value) should be generated only for 'separable'
types, i.e. types for which this boolean equality, in the mathematical sense, is
computable. From a topos-theoretic viewpoint, this means that it should be generated
only for types 'T' such that the internal equality '(T,T) -> Omega' factors through the
canonical arrow 'Bool -> Omega' (sending 'true' onto 'true' and 'false' onto 'false').
However, I don't know if there is an algorithm able to decide this property. I know
only a recursive class of types having this property. Of course, internal equality
exists for all (abstract) types and may be defined by the Leibniz rule.
In Anubis 1, equality code is generated for all types (except 'Float', but this may be
the result of some misunderstanding). For concrete types (address types and several
primitive types, structure pointer types) equality is physical (compare the addresses
of the data). For functional types it is also physical, despite the fact that in some
cases, we could be more clever (when we know that the product of the types of the
arguments is finite). Theoretically, boolean equality should not be generated for
functional types in most cases.
So, we have something special to generate only for defined types. The principle of
comparison is as follows for 'x' and 'y' to be equal:
1. 'x' and 'y' must belong to the same alternative,
2. if so, the components must be equal two by two.
Of course, the second condition shows that this is a recursive process. Hence, an
'equality code' is generated for each defined type. This code uses special
instructions, which is quite stupid, because ordinary instructions would have done the
job (maybe somewhat more slowly).
(false_jmp . <addr>)
(true_jmp . <addr>)
Put either 'false' (i.e. 0) or 'true' (i.e. 1) in 'R' and jump (unconditionally) to the
operand address.
(jmp_false . <addr>)
Jump to address if the content of 'R' is 'false'.
(jmp_eq_stack . <addr>)
This instruction compares the two words on top of stack, puts 1 (true) in R if they are
equal and performs a jump to 'addr'. Otherwise the next instruction is executed, and R
remains empty.
(jmp_neq_indexes_large . <addr>)
(jmp_neq_indexes_mixed <bit width> <mask> . <addr>)
(jmp_neq_string . <addr>)
(jmp_neq_byte_array . <addr>)
(jmp_neq 0 . <addr>)
(jmp_neq 1 . <addr>)
(jmp_neq 2 . <addr>)
(jmp_neq 4 . <addr>)
(eq . <depth>)
This instruction tests the equality of the two words at depths '<depth>' and '<depth>+1'
in the stack. The register R is set to 1 (true) if they are equal, and to 0 (false)
otherwise.
eq_string
eq_byte_array
Tests the equality of the two strings or byte arrays on top of stack (depths 0 and 1),
and puts either 'true' (if equal) or 'false' (if not equal) in R. Of course, the
strings or byte arrays are walked through.
push_eq_data
(increment_eq . <byte width>)
*** (2.3.5.6) Dynamic variables instructions.
(get_var_handler . <addr>)
(get_mvar_handler . <addr>)
get_var_monitors
get_mvar_monitors
push_mvar_length
remove_monitor
ret_if_zero
get_vv
xchg_vv
get_mvv
xchg_mvv
create_var
free_var_seg
free_mvar_seg
*** (2.3.5.7) Serialising/unserialising.
(serialize . <addr>)
(unserialize . <addr>)
*** (2.3.5.8) Starting a new virtual machine.
(start <depth> . <addr>)
This instruction is generated by the keyword 'delegate' and starts a new virtual
machine. '<depth>' is the number of slots on top of the stack of the old machine which
should be copied into the stack of the new machine. No virtual copies are needed,
because they are already performed (by 'copy_stack_...' instructions). The new machine
starts execution at the instruction following this one, whilst the old machine jumps to
the operand address.
*** (2.3.5.9) Obsolete and soon obsolete instructions.
The following are still used but should become obsolete (used for global variables):
(create_vars ...)
(gv_address ...)
(init_gv ...)
(del_gv ...)
get_gvv
xchg_gvv
initialization_address
variables_deletion_address
The following have never been used (experimental?):
(call ...)
location
(code_for ...)
(type_list ...)
push_before_eq
(dec3 . <addr>)
*** (2.4) Unification.
Since the Anubis language allows the use of type parameters (parametric polymorphism),
we need to be able to unify types.
'Environments' are lists of the form:
((v_1 . T_1) ... (v_k . T_k))
where each 'v_i' is an 'unknown', and where each 'T_i' is a type (which may also
contain unknowns). Such an environment just says that:
v_1 has value T_1
...
v_k has value T_k
If a pair is of the form '(v_i . v_j)', i.e. if the value of 'v_i' is the unknown
'v_j', we assume that 'v_i > v_j' (recall that they are 'Expr's, hence integers). This
is to avoid some kind of circularity.
Because of the Lisp-like representation, unification is fairly simple. It works as
follows, where 'u(A,B)/E' is the result of unifying 'A' with 'B' relatively to
environment 'E'. The result is a new environment. '$1' and '$2' represent two arbitrary
unknowns.
u($1,$1)/E = E
u($1,$2)/E = (($1 . $2) . E) (if $1 > $2)
u($1,$2)/E = (($2 . $1) . E) (otherwise)
u($1,T)/E = (($1 . T) . E)
u(A,A)/E = E (for any atomic A)
| not_unifiable if u(A1,B1)/E = not_unifiable
u((A1 . B1),(A2 . B2))/E = |
| u(B1,B2)/(u(A1,A2)/E) otherwise
u(A,B)/E = not_unifiable in all other situations
We have no problem with the fake 'Expr' <Cint> because we unify only types, and types
never contain a <Cint>.
It may be the case that the two expressions to be unified are relative to different
environments. In that case we just have to merge the two environments before unifying
the two expressions.
We also want to avoid circularities, and to that end we use a non-circularity test
(like some Prolog systems do). The problem is that the above rules would give, for
example, the following:
u($1,Maybe($1))/E = (($1 . Maybe($1)) . E)
But this is not correct, because a type 'T' cannot be equal to 'Maybe(T)'. Hence, when
we have to compute 'u($1,T)/E', we use the function 'depends_on' to test if 'T' depends
on '$1' directly or indirectly. If it is the case, the unification fails.
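The rules above, together with the non-circularity test, can be sketched in C (a simplified model: in the compiler, 'Expr's are Lisp integers, whereas the struct below, the "unknowns are atoms >= 1000" convention and all names except 'depends_on' are assumptions of this sketch):

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Expr {
    int atom;                  /* nonzero: atomic value; 0: a pair */
    struct Expr *car, *cdr;
} Expr;

typedef struct Env {           /* ((v . T) . rest) as a linked list */
    Expr *var, *val;
    struct Env *rest;
} Env;

#define FAIL ((Env *)1)        /* stand-in for 'not_unifiable' */

static int is_unknown(const Expr *e) { return e->atom >= 1000; }

static Expr *lookup(Env *env, const Expr *v)
{
    for (; env && env != FAIL; env = env->rest)
        if (env->var->atom == v->atom) return env->val;
    return NULL;
}

/* The non-circularity test: does 't' depend on the unknown 'v',
   directly or indirectly through the environment? */
static int depends_on(Env *env, Expr *t, Expr *v)
{
    if (t->atom) {
        if (t->atom == v->atom) return 1;
        if (is_unknown(t)) {
            Expr *val = lookup(env, t);
            return val ? depends_on(env, val, v) : 0;
        }
        return 0;
    }
    return depends_on(env, t->car, v) || depends_on(env, t->cdr, v);
}

static Env *bind_unknown(Expr *v, Expr *t, Env *env)
{
    Env *e = malloc(sizeof *e);
    e->var = v; e->val = t; e->rest = env;
    return e;
}

static Env *unify(Expr *a, Expr *b, Env *env)
{
    if (env == FAIL) return FAIL;
    if (a->atom && b->atom && a->atom == b->atom)
        return env;                              /* u($1,$1), u(A,A)  */
    if (is_unknown(a)) {
        if (depends_on(env, b, a)) return FAIL;  /* circularity       */
        if (is_unknown(b) && b->atom > a->atom)  /* the greater       */
            return bind_unknown(b, a, env);      /* unknown is bound  */
        return bind_unknown(a, b, env);
    }
    if (is_unknown(b)) return unify(b, a, env);
    if (a->atom || b->atom) return FAIL;         /* atom vs pair      */
    return unify(a->cdr, b->cdr, unify(a->car, b->car, env));
}
```

With this sketch, unifying $1 with a pair containing $1 (the 'Maybe($1)' situation) yields FAIL, while unifying two unknowns binds the greater one, as in the rules above.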
Important note: Unification is used each time we have to compare types. Indeed, a type
with unknowns is a type which is not yet completely known. The comparison may succeed if
some unknowns are instantiated. This is why comparing types without unifying is
generally an error.
Also, all the computations of unification and interpretations of terms are purely
deterministic (no side effects at all). This guarantees that when we try several
possibilities, there is no clash between them.
It is often the case that we want to dereference a type relatively to its
environment. This means replacing (recursively) all type unknowns by their values (when
they have one). This is performed by the command 'dereference_type'.
*** (2.5) Organization of the knowledge base.
The compiler stores the result of checking the paragraphs into several arrays:
struct Type_struct *types; (array of type definitions)
struct Operation_struct *operations; (array of definitions, formerly called 'operations')
struct Variable_struct *variables; (array of global variables (obsolete))
See the definitions of these structures in 'compil.h'.
Checked paragraphs are stored into these arrays, which work like stacks. The stack
pointers are called 'next_type' and 'next_operation'.
*** (2.6) Indexation of symbols.
In a previous version of Anubis, when the interpretations of a symbol were searched
for, the arrays of types or operations were searched sequentially. During the year
2003 (as far as I can remember), it appeared that this was too slow (especially with
Olivier's program, containing thousands of types and symbols). I decided to index
these arrays by a hashing method. The result was that the compiler became twice as
fast.
The 'symbol index' is just an array of 256 arrays of 'SymbolIndexEntry'. When a symbol
is searched for, its 'hash' is first computed. This hash is an U8 (computed by
'u8_symbol_hash' in 'interp.c'). With this hash we may go directly to the right array
of 'SymbolIndexEntry'.
This secondary array is searched for an entry corresponding to our symbol (each entry
contains the name of a symbol). When such an entry is found, we get the index of the
actual definition in the array of types or operations.
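The two-level lookup can be sketched in C as follows (the bucket layout, the hash function body and 'find_definition' are assumptions of this sketch; the real hash is 'u8_symbol_hash' in 'interp.c', and the real entries carry more fields):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    const char *name;     /* symbol name stored in the entry        */
    int definition_index; /* index into the types/operations arrays */
} SymbolIndexEntry;

typedef struct {
    SymbolIndexEntry *entries;
    int count;
} SymbolBucket;

/* The symbol index: 256 arrays of SymbolIndexEntry, one per U8 hash. */
static SymbolBucket symbol_index[256];

/* A simple stand-in for 'u8_symbol_hash': any U8-valued hash works. */
static uint8_t u8_hash(const char *name)
{
    uint8_t h = 0;
    while (*name) h = (uint8_t)(h * 31 + (uint8_t)*name++);
    return h;
}

/* Hash first, then scan only the matching secondary array. */
static int find_definition(const char *name)
{
    SymbolBucket *b = &symbol_index[u8_hash(name)];
    for (int i = 0; i < b->count; i++)
        if (strcmp(b->entries[i].name, name) == 0)
            return b->entries[i].definition_index;
    return -1;            /* not found */
}
```

The point of the scheme is that the sequential scan is confined to one of 256 buckets instead of the whole array of definitions.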
In the future this system should be replaced by a B-tree, indexed on names of symbols.
*** (3) The lexer.
*** (3.1) Generalities.
The role of the lexer is essentially to cut the input into 'tokens'. 'lexer.y' is a
FLEX file, which must be compiled by 'flex'. This produces the file 'lex.yy.c' which
becomes part of the C source.
However, the lexer performs several other tasks:
- handling 'read' and 'replaced by' paragraphs, computing visibilities,
- counting lines and columns, and level of nested comments,
- working on several special tokens (strings, floats, named arrows,...).
The lexer has several 'states' (in the sense of FLEX). They are the following:
- INITIAL (used when reading comments outside paragraphs)
- COM (used for reading /* ... */ comments)
- PAR (used for reading paragraphs)
- INCL (used for reading file names after either 'read' or 'replaced by')
- STR (used for reading strings)
- STRTL (used when a string is too long)
- CONF (used for reading the configuration file)
- CONFCOM (used for reading comments within the configuration file)
The names of all the tokens returned by the lexer are prefixed by 'yy__'. They are
declared in 'grammar.y'.
*** (3.2) Finding source files.
The compiler must identify files clearly, not only by the name, because different files
in different directories may have the same name. Notice that the path which is given
after the keyword 'read' is not enough for identifying the file. This is because this
path is itself relative.
The compiler keeps in memory the absolute path of a 'current' directory. Notice that
this current directory is a notion internal to Anubis. It has nothing to do with the
current directory in the sense of the host operating system. Anyway, the Anubis
compiler never changes the current directory in the sense of the host system.
When the compiler starts, the 'current' directory is the directory from which the
Anubis compiler has been launched.
When the compiler needs to find the source file corresponding to a path, either the
path given on the command line, or a path found after a keyword 'read' or 'replaced
by', it looks for the file using the following absolute paths successively:
<current directory>/<given path>
<my_anubis directory>/<given path>
<anubis directory>/<given path>
except if <given path> is absolute. In this case only this path is used.
When the file has been found, the directory of this file becomes the new 'current'
directory.
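The lookup order can be sketched in C as follows (a hedged illustration: all names are invented for the sketch, and the existence test is passed as a parameter so the logic can be shown without touching the real filesystem):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* A fake existence test for demonstration: it recognises exactly one
   path, stored in 'existing_file'. */
static const char *existing_file;

static int fake_exists(const char *path)
{
    return existing_file && strcmp(path, existing_file) == 0;
}

typedef int (*exists_fn)(const char *path);

/* Try <current>/<given>, <my_anubis>/<given>, <anubis>/<given> in
   order, except when <given> is absolute, in which case it is used
   alone. Returns 1 and fills 'out' on success. */
static int resolve_source(const char *given, const char *current_dir,
                          const char *my_anubis_dir, const char *anubis_dir,
                          exists_fn exists, char *out, size_t outsz)
{
    if (given[0] == '/') {                    /* absolute: used alone */
        snprintf(out, outsz, "%s", given);
        return exists(out);
    }
    const char *dirs[3] = { current_dir, my_anubis_dir, anubis_dir };
    for (int i = 0; i < 3; i++) {             /* the three prefixes   */
        snprintf(out, outsz, "%s/%s", dirs[i], given);
        if (exists(out)) return 1;
    }
    return 0;
}
```

Note that the first directory that yields an existing file wins, which is why the 'current' directory shadows the installation directories.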
*** (3.3) The 'read' stack.
(also called 'include stack' in the source of the compiler).
Of course, when a 'read' or 'replaced by' keyword is encountered, the compiler must
read another file, but must be able to come back for reading the rest of the current
file. This is why it needs a stack.
Each 'frame' in this stack contains the following information:
- a datum of type YY_BUFFER_STATE, needed by FLEX/BISON for each file,
- the absolute path of the file currently read,
- the current line number in the file currently read,
- the absolute path of the 'current' directory for the file currently read.
Note: at the time of writing this, this stack is actually made of 4 distinct stacks,
one for each item. These stacks are growing in parallel, and hence share a unique stack
pointer. It would be a good idea to replace this by a single stack whose slots would be
a C structure.
When the end of a file is reached, a frame is popped from the stack, and the reading of
the 'calling' file may be continued.
At the time of writing this text, this stack may contain at most 100 frames.
*** (3.4) Remembering already read files.
The compiler remembers which files have been read. Indeed, during the same compilation,
it may encounter a request for reading the same file several times. Of course, the file
is read only once, and the compiler just has to adjust visibility.
The absolute paths of the files already read are stored into the array
'already_included' (of at most 'max_already_included' = 1024 paths, at the time of
writing this text).
*** (3.5) Computing visibilities.
A file B is visible from a file A only if the file A contains one of the paragraphs:
read B
replaced by B
Normally, a file containing a 'replaced by' paragraph does not contain any other
paragraph (but it may contain off paragraph comments). This kind of file is called a
'relay' file, and is used when the location of a file of the distribution is changed.
'read' is not transitive. In other words, if we have
read B
in file A, and
read C
in file B, this does not make file C visible from file A. However, if we have:
read B
in file A and
replaced by C
in file B, C is visible from file A.
The visibility information is stored in several arrays. First of all, notice that
each file is identified by an 'id' which is the position of its path in the array
'already_included'. Notice that 'predefined.anubis' (which is 'pre-read') has id 0.
- the array 'number_of_visible' gives for each file id, the number (of type U16) of
files which are visible from this one.
- the array 'is_relay_file' is an array of booleans (actually an array of U8). For
each file it indicates if the file is a relay file or not.
- the two dimensional array 'visibility'. This array provides some kind of
'visibility stack' for each file id. For example, if the number of files visible from
the file whose file id is idA is 3, and if the ids of these files are idA, idB and
idC, we have:
visibility[idA][0] = idA
visibility[idA][1] = idB
visibility[idA][2] = idC
Notice that since a file is always visible from itself, the first entry into this stack
always contains the id of the file to which this stack belongs.
When file B becomes visible from file A, idB is pushed into the visibility stack of
file A (if not already present). Such stacks are never popped off, because if B is
visible from A, it remains visible indefinitely. Notice that such stacks never need to
be enlarged, because the number of files read cannot be greater than
'max_already_included', which is the size of these stacks.
So, the question is just to know when a file becomes visible from another one. First of
all, each file must be visible from itself. Hence this is registered the first time the
file is encountered. As a particular case, 'predefined.anubis' is encountered before
any other one. Furthermore, it must be visible from all other files.
When a relay file becomes visible from some file, it must transmit its visibility to
the file it is pointing to.
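The visibility stacks and the "push if not already present" rule can be sketched in C (sizes are reduced for the sketch; the real arrays are dimensioned by 'max_already_included' = 1024, and the function names are invented here):

```c
#include <assert.h>
#include <stdint.h>

#define MAX_FILES 16   /* stand-in for max_already_included = 1024 */

/* number of files visible from each file id (U16 in the compiler) */
static uint16_t number_of_visible[MAX_FILES];
/* visibility[idA] is the 'visibility stack' of file idA            */
static int visibility[MAX_FILES][MAX_FILES];

static int is_visible(int from, int target)
{
    for (int i = 0; i < number_of_visible[from]; i++)
        if (visibility[from][i] == target) return 1;
    return 0;
}

/* Push 'target' onto the visibility stack of 'from', unless it is
   already there. Stacks only grow; visibility is never revoked. */
static void make_visible(int from, int target)
{
    if (!is_visible(from, target))
        visibility[from][number_of_visible[from]++] = target;
}
```

Registering a file as visible from itself first, as the text describes, guarantees that slot 0 of each stack holds the file's own id.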
*** (4) The parser.
The parser (grammar.y) is a BISON file. It provides the function 'yy_parse' used in
'main.cpp'. It contains essentially the following parts:
- Declaration of tokens and non terminals with their types.
- Precedence and association rules.
- Grammar rules.
The rules of precedence and association should not be changed (and their order is
important) because if they are changed, files which are currently compiling ok could
have syntax errors.
Each grammar rule has an 'action' part (between { and }) which constructs the
syntactical tree corresponding to the rule. These actions are the official source of
information for the Lisp-like format of terms.
*** (5) Computing interpretations of terms.
*** (5.1) Overview.
The Anubis language allows overloading of symbols and type parameters. These features
introduce many ambiguities which must be resolved.
A term may have several distinct interpretations. For example, when you write the
number:
2
it has at least 3 interpretations, one as an 'Int8', one as an 'Int32' and one as a
'Float'. Similarly, a symbol or a binary operator like '+', may have several
definitions, hence several interpretations.
When atomic terms (like numbers and symbols) are combined together for forming compound
terms, like applicative terms, conditionals,... the compound term may itself have
several interpretations, depending on which interpretation we choose for the atomic
subterms of this term.
Hence, what we need is a (necessarily recursive) function for computing the list of all
interpretations of a term. This function is called 'term_interpretations' and has
several auxiliary functions. All these functions are defined in 'interp.c'.
Also terms must be interpreted relative to a 'context'. A context is normally just a
set of declarations. But in the case of the Anubis compiler, a context is more
precisely an accurate representation of what will be the top of stack at run time. Such
a context is represented as a list, with the head of list being the most recently
pushed item.
For example, if we want to interpret a term like this one in some context C:
with x = a, t
we must first get the list of all interpretations of 'a' relative to C. For each
interpretation of 'a', we may compute the list of interpretations of 't', but it must
be relative to '((x . T) . C)', where 'T' is the type found for this interpretation of
'a', because after 'a' is computed (at run time) its value is pushed into the stack.
(Actually, in the case of this example, the compiler requires that 'a' has only one
interpretation. This is for making the source files more explicit and easier to read
for humans.)
*** (5.2) Interpreting applicative terms.
An applicative term is just made of a function applied to arguments. For example, it
may be:
f(a,b,c)
(or 'a + b'). Such a term is represented internally as:
(app <lc> [f] . ([a] [b] [c]))
where <lc> is a Lisp integer ('Expr') from which the name of the source file, together
with the line and the column of the term may be recovered.
In order to interpret this term, the compiler must first compute all interpretations of
the function itself. It also computes all the interpretations of the tuple of
arguments. This may lead to a very big number of interpretations. For example, if 'a',
'b' and 'c' have respectively 2, 4 and 7 interpretations, this yields 56
interpretations for the tuple '(a,b,c)'. If the number of interpretations of a tuple
exceeds say one million, the compiler may become very slow. The remedy for the time
being (coming from the user) is to type each argument explicitly in order to reduce
this number of interpretations.
Now, for each interpretation of the function and each interpretation of the tuple of
arguments, we must unify the signature of the function (type of the tuple of arguments)
with the type of the actual tuple of arguments. If this unification fails, this
combination is impossible. This is how the number of interpretations may stay at a
reasonable level.
I've tried to proceed in another way. Since for each interpretation of the function, we
get types for the arguments, I've tried to recompute the interpretations of the typed
arguments in each case. But experiments showed that this is even slower.
Added 2006/12/10: Finally, I found the right solution. It is so simple that I don't
understand why I didn't see it at once. Actually, I know why. This is because from the
very beginning I wrote a function for computing the interpretations of a tuple (which
is used for applicative terms and for bodies of conditionals), and I never had the idea
to question the necessity of this function. It is actually not required, and is the
cause of the problem. Here is the solution:
Instead of computing the list of interpretations of the tuple of arguments, we just
compute the lists of interpretations of each argument. For example, if we have 3
arguments having respectively 2, 4 and 7 interpretations, this makes only 2+4+7 = 13
interpretations (dispatched into 3 lists) instead of 56 interpretations (in a single
list). Now for each interpretation of the function we select for each argument the
compatible interpretations. This will lead to a list of interpretations of the
applicative term for each interpretation of the function. In any case, we will have to
handle much less interpretations than with the current algorithm.
We could also enhance the error message in case no combination makes it possible to
interpret the applicative term. The current error message (number 'E021') is very
troublesome for most users. We could do better, as follows:
- 1. Print one section for each interpretation of the function.
- 2. In each section, print information only on incompatible arguments, not on all
arguments.
*** (5.3) Interpreting conditionals.
The test is first interpreted. The compiler requires (again for making the source files
more explicit and easier to read for humans) that the test has one and only one
interpretation. For each interpretation of the test, the type of the test is known. Of
course all interpretations whose type is not a type with alternatives are
rejected. However, components in this type may still be unknown, the essential point
being that the alternatives are known.
Next, the compiler analyses the cases. We must have one case per alternative, and in
the same order.
For each alternative, the compiler checks that the name of the case is the same as the
name of the alternative, and that the number of resurgent symbols is the same as the
number of components. If a resurgent symbol is explicitly typed, its type is unified
with the type of the corresponding component. If this unification fails, an error
message is sent.
For each case, the compiler constructs a new context for interpreting the body of the
case. Indeed, resurgent symbols are declarations which enrich the context. This may
yield several interpretations for each body of case. The compiler constructs the list
of all interpretations of the tuple of bodies. Again (as for applicative terms) this
may lead to a big number of interpretations and slow down the compiler or even exhaust
the memory.
For each such interpretation of the tuple of bodies, the types of the bodies are
unified. This is because in a conditional, the bodies must all have the same type
(which will be the type of the conditional). When the unification fails, the tuple is
rejected. This leads in general to a reasonable number of interpretations of the tuple
of bodies.
Added 2006/12/10: As for applicative terms, this method may be enhanced. First of all,
we may compute the lists of interpretations of all bodies, but without constructing a
list of interpretations of the tuple of bodies. Then we just have to find all tuples of
interpretations of the bodies within which all bodies have the same type. This may be
done by induction on the number of bodies. For 1 body this is obvious. For n bodies (n
> 1), begin (induction hypothesis) by computing all compatible interpretations of n-1
bodies (say all but the first one). Then for each interpretation of the first body
select only the tuples of interpretations of the others whose type unify with the type
of the first one.
Even better for performance: Order the lists of interpretations of bodies according to
their lengths, the longest one first, and apply the above algorithm. That way the
first unifications of types of bodies will be performed on bodies having the smallest
number of interpretations, leading more quickly to the type to be found for all
bodies. In particular, if just one body is explicitly typed, and if this leads to only
one interpretation of that body, the type of all bodies will be known at once, and the
computation will be much faster.
*** (5.4) Interpreting selective conditionals.
The selected alternative is found using the name of the head of case. If there are
several alternatives with this name, the compiler checks the number of resurgent
symbols. If there are several alternatives with the same name and same number of
resurgent symbols, the compiler uses the declarations (if any) of the types of the
resurgent symbols. These types are unified with the actual types taken from the
definition of the type of the test.
Of course, if there is an 'else' case, the type of this case must be the same as the
type of the body of the selected case. This is checked by unification.
*** (6) Implementation of types.
Notice that we implement only 'actual types', not type schemes (for the time being; see
the discussion on quantified types). An actual type is a type with no unknowns and no
parameters.
Types in Anubis 1 are of the following sorts:
- primitive types: String, Int32, Listener, ...
- defined types: types which are defined by a paragraph, even if this paragraph is in
predef.c or predefined.anubis.
- functional types: (T_1,...,T_k) -> U
- address types: RAddr(T), WAddr(T), RWAddr(T), GAddr(T) (obsolete), Var(T), MVar(T).
- structure pointer types: These types are used for encapsulating C structures
coming from linked libraries (like SSL or SQLite3).
*** (6.1) Implementation of primitive types.
There is a particular implementation for each primitive type.
*** (6.1.1) 'String'.
A string in Anubis 1 is implemented as a pointer:
*
|
V
+-------+-------------------------------+---+
| cnt | a b c d .... | 0 |
+-------+-------------------------------+---+
0 4
The first 4 bytes are the counter for the garbage collector. Notice that strings which
are defined statically in the source files, have a 0 at this place, meaning that the
string is permanent (never collected).
Starting at offset 4, are the characters of the string. A trailing 0 is added at the
end, like in C.
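The layout above can be illustrated with a small C sketch (the helper names are invented for this illustration; the real strings are produced by the memory allocator and the static-data emitter):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Build a segment laid out as an Anubis 1 string: a 4-byte counter at
   offset 0, the characters from offset 4, and a trailing 0 like in C. */
static unsigned char *make_anubis_string(const char *s)
{
    size_t n = strlen(s);
    unsigned char *seg = malloc(4 + n + 1);
    uint32_t cnt = 1;                 /* one virtual copy; 0 would mean
                                         a permanent (static) string   */
    memcpy(seg, &cnt, 4);             /* counter for the collector     */
    memcpy(seg + 4, s, n + 1);        /* characters plus trailing 0    */
    return seg;
}

/* The characters start right after the counter. */
static const char *anubis_string_chars(const unsigned char *seg)
{
    return (const char *)(seg + 4);
}
```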
*** (6.1.2) 'ByteArray'.
A byte array is a pointer as follows:
*
|
V
+-------+-------+-------------------------------+
| cnt | n | a b c d .... |
+-------+-------+-------------------------------+
0 4 8
where again 'cnt' is the counter, and 'n' the number of bytes in the byte array (not
including 'cnt' and 'n'). The bytes are stored starting at offset 8. There is no
trailing 0.
*** (6.1.3) 'Int32'.
An 'Int32' is just a C 'int' (of 32 bits, of course), i.e. a 32-bit word containing
the number itself.
*** (6.1.4) 'Float'.
Because data in Anubis 1 cannot have more than 32 bits, double precision floating point
numbers are implemented indirectly (which is very bad for performance):
*
|
V
+--------+-----------------+
| cnt | IEEE754 number |
+--------+-----------------+
0 4 12
where again 'cnt' is the counter for the garbage collector.
*** (6.1.5) 'Listener'.
Despite the fact that 'Listener' is a primitive type, it is implemented exactly like
network address types. See below.
*** (6.2) Implementation of defined types.
Defined types (types with alternatives, defined by paragraphs beginning with 'type' or
'public type') are implemented according to 3 different methods: 'small', 'mixed' and
'large'.
First of all, the width of the type is computed. The width is the number of bits
required for accommodating all data of this type. It is computed as follows.
Some kind of base-2 logarithm of the number of alternatives is computed. This is the
smallest number of bits required for accommodating all alternative numbers (also called
'indexes'). It is:
0 for at most 1 alternative (example: 'Empty' or 'One')
1 for 2 alternatives (example: 'Bit' or 'Bool')
2 for 3 to 4 alternatives
3 for 5 to 8 alternatives
4 for 9 to 16 alternatives
etc...
Since the maximal number of alternatives is limited to 256, the width of an index field
is at most 8.
This logarithm of the number of alternatives is called the 'index width' for this type.
Now, each alternative has its own width. It is the sum of the index width of the type,
and the widths of all components of the alternative. For example, the type
'Maybe(Int8)' has 2 alternatives:
failure
success(Int8)
the index width is 1 (2 alternatives). The width of the first alternative is 1 (= 1 +
0; no component), and the width of the second alternative is 9 (= 1 + 8), because
'Int8' is of width 8 (as expected).
Now, the width of the type is the supremum of the widths of its alternatives (0 for
empty types). So, it is 9 for 'Maybe(Int8)' for example.
These computations are performed in 'typewidth.c'.
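The computation can be sketched in C (a simplified model of what 'typewidth.c' does: here the component widths of each alternative are given directly as sums, instead of being computed recursively from the component types):

```c
#include <assert.h>

/* Index width: smallest number of bits accommodating all alternative
   numbers, i.e. ceil(log2(alternatives)); 0 for 0 or 1 alternative. */
static int index_width(int alternatives)
{
    int w = 0;
    while ((1 << w) < alternatives) w++;
    return w;
}

/* Theoretical width of a (non-recursive) type: the supremum over the
   alternatives of (index width + sum of the component widths). */
static int type_width(int alternatives, const int *component_sums)
{
    int iw = index_width(alternatives), sup = 0;
    for (int i = 0; i < alternatives; i++) {
        int w = iw + component_sums[i];
        if (w > sup) sup = w;
    }
    return sup;
}
```

Running this on 'Maybe(Int8)' (two alternatives, with component width sums 0 for 'failure' and 8 for 'success(Int8)') reproduces the width 9 computed in the text.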
Notice that this width is not well defined in the case of a recursive type. For a
recursive type (or a cross recursive type), the width is infinite (which corresponds to
the fact that the type itself is infinite (if it is not empty)). This is why we do not
compute the width of recursive types. Hence, we need a tool for detecting if a type is
recursive (maybe cross recursive through other types). This tool is defined in
'rectype.c'. Notice that a recursive type which is empty (this may happen: see the
discussion on 'NonEmptyList' in 'predefined.anubis') is still considered as recursive
by Anubis 1, and its width is still infinite.
The width of a type is used to determine the number of bits needed for holding a datum
of this type. Above a certain limit, and in particular for recursive types, the type
must be implemented indirectly, i.e. through a pointer to a segment.
The width discussed above will be called the 'theoretical width'.
Hence, we have another notion: that of 'real width'. The real width, for Anubis 1 is an
integer between 0 and 33 included. The value 33 means that the theoretical width is at
least 33, and that the type will be implemented indirectly. Values from 0 to 32 mean
that the type will be implemented directly within that number of bits.
Of course, allowing 64 bits or 128 bits direct data in the future, means updating all
these programs.
If the real width of a type is at most 32, the type is 'small'. It cannot be recursive,
and all its components, and the components of its components, and so on, are small. Any
datum of this type may be represented on 'width' bits.
If the real width is 33, there are two cases:
- the number of alternatives is at most 4: the type is 'mixed'.
- the number of alternatives is 5 or more: the type is 'large'.
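The small/mixed/large decision can be summed up in a few lines of C (a sketch; the
names 'impl_kind' and 'classify' are invented for illustration):

```c
typedef enum { TYPE_SMALL, TYPE_MIXED, TYPE_LARGE } impl_kind;

/* 'real_width' is the clamped width, between 0 and 33 included;
   33 means "theoretical width at least 33, implemented indirectly". */
static impl_kind classify(int real_width, int n_alternatives)
{
    if (real_width <= 32)
        return TYPE_SMALL;                 /* direct, within 32 bits */
    return (n_alternatives <= 4) ? TYPE_MIXED : TYPE_LARGE;
}
```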
If a type is large, all data of this type are represented indirectly, i.e. as a
pointer to a memory segment:
*
|
V
+----------+---+--------------+--------+---------------+
| cnt | i | c1 | c2 | c3 |
+----------+---+--------------+--------+---------------+
0 4 5
The first 4 bytes of the segment are always occupied by the counter ('cnt') of the
garbage collector. The next byte (at offset 4) is the alternative index for this
datum. It indicates to which alternative the datum belongs. Starting at offset 5 are
the representations of the components of the datum.
Since all pointers returned by the memory allocator are multiples of 4 (the two least
significant bits are always 0), these two bits are available for any purpose. I've
decided to use them for storing the alternative index (this is why this method cannot
be applied if there are more than 4 alternatives). This has the advantage of avoiding a
pointer to a segment for some alternatives. For example, the type 'List(T)' (for any
type 'T') is mixed. The first alternative has width 1, the second alternative has an
infinite width. Hence, it is tempting to avoid the use of an indirection for the first
alternative. This is what the 'mixed' implementation realizes.
Hence, the empty list (for any type 'T') is just the 32 bits word 0, while a non empty
list looks like this:
pppppppp pppppppp pppppppp pppppp01
where the least significant two bits (01) indicate alternative number 1 (second
alternative), and where:
pppppppp pppppppp pppppppp pppppp00
is the pointer:
*
|
V
+----------+-------+---------------+
| cnt | head | tail |
+----------+-------+---------------+
0 4
Notice that there is no alternative number at offset 4 (since it is already in the
least significant bits of the datum itself), and that components are stored starting at
offset 4. 'tail' always occupies 4 bytes, but 'head' may occupy 0, 1, 2 or 4 bytes.
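Assuming 4-byte-aligned pointers as described, tagging and untagging a mixed datum can
be sketched like this (the helper names are hypothetical, not the actual VM code):

```c
#include <stdint.h>

/* Store a 2-bit alternative index in the low bits of an aligned pointer. */
static uint32_t mix_tag(uint32_t ptr, unsigned alt)   /* alt in 0..3 */
{
    return ptr | alt;            /* ptr is a multiple of 4 */
}

static unsigned mix_alt(uint32_t d) { return d & 3u; }   /* alternative index */
static uint32_t mix_ptr(uint32_t d) { return d & ~3u; }  /* the bare pointer  */
```

For 'List(T)', the empty list is the word 0 (alternative 0, no segment), and a non
empty list is a segment pointer ORed with 1.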
*** (6.3) Implementation of functional types.
Functions are implemented in two different ways depending on how they are defined.
1. If the function is defined by a paragraph, it is just represented by the address
of its code.
2. If the function is defined by the arrow '|->' (or by the recursive arrow '|-f->'),
it is represented as a 'closure'. A closure is a pointer to a segment like this:
*
|
V
+----------+----------+----------+----------+----------+----------+...
| cnt | code | del | x1 | x2 | x3 |
+----------+----------+----------+----------+----------+----------+... etc
0 4 8 12 16 20 24
where 'cnt' is the counter of the garbage collector as usual. All slots in a closure
occupy 4 bytes. The slot 'code' is the address of the code of the function. The slots
'x1', 'x2', etc... contain the values of the symbols which were present in the context
at the time the function (i.e. the closure) was created, and which are referred to from
within the body of the function definition. This part of the closure is called the
'micro context'. The slot 'del' contains the address of the garbage collector code able
to properly destroy the closure itself.
When we have a function at hand (i.e. a 32 bits word representing a datum whose type is
functional), we must be able to decide if the function is implemented as a simple code
address or as a closure. Again we take advantage of the fact that pointers to memory
segments are divisible by 4. If the pointer is even (least significant bit = 0), the
function is a closure. If it is odd (least significant bit = 1) it is a simple code
address. For this reason the code of a function defined at top level (by a paragraph)
is always positioned at an odd address. This is the reason for the existence of the
instruction 'odd_align' (which is of course of size 1 and never executed).
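In C-like terms the test is a single bit inspection (illustrative names only, not the
actual VM code):

```c
#include <stdint.h>

/* Least significant bit decides: even = closure pointer (segments are
   aligned on 4), odd = plain code address (top-level functions are
   assembled at odd addresses thanks to 'odd_align'). */
static int is_closure(uint32_t fn)      { return (fn & 1u) == 0; }
static int is_code_address(uint32_t fn) { return (fn & 1u) == 1; }
```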
*** (6.4) Implementation of address types.
There are two sorts of address types:
1. File and network address types: 'RAddr(T)', 'WAddr(T)', 'RWAddr(T)', 'Listener'
2. Dynamic variable address types: 'Var(T)', 'MVar(T)'.
We ignore GAddr(T), which is obsolete.
*** (6.4.1) File and network address types.
A datum of such a type is a pointer:
*
|
V
+----------+---+---+---+---+---------------------------------+
| cnt | t | m | _ | _ | 12 bytes |
+----------+---+---+---+---+---------------------------------+
0 4 5 6 7 8 20
As usual the first 4 bytes are for the counter of the garbage collector. The next four
bytes are:
t type indicator
m mode indicator
_ not used
_ not used
The next 12 bytes starting at offset 8 are used depending on the value of the type
indicator.
The type indicator may take the following values ('conn' is for 'connection', a synonym
(!?) of 'address'):
conn_invalid invalid connection (should never happen)
conn_file file on disk
conn_listener listening TCP/IP connection (server: type 'Listener')
conn_network client TCP/IP connection
conn_ssl_network client SSL connection
conn_local (obsolete: local variable)
conn_global (obsolete: global variable)
conn_stdin standard input
conn_stdout standard output
conn_stderr standard error output
The values of the above are defined in 'vm.h'.
The mode is an OR of the following flags:
conn_read readable connection
conn_write writable connection
conn_read_pw (experimental: connection with password access)
conn_write_pw (idem)
in_progress (connection not yet up)
The 12 bytes after offset 8 are used as follows:
Files (conn_file):
+---+---+---+---+---+---+---+---+---+---+---+---+
| * | | | | | | | | |
+---+---|---+---+---+---+---+---+---+---+---+---+
V
FILE (in the sens of C)
Network (conn_network, conn_listener):
+---+---+---+---+---+---+---+---+---+---+---+---+
| socket handle | | | | | | | | |
+---+---+---+---+---+---+---+---+---+---+---+---+
SSL connection (conn_ssl_network):
+---+---+---+---+---+---+---+---+---+---+---+---+
| (SSL *) | TCP/IP handle | (char *) |
+---+---|---+---+---+---+---+---+---+---|---+---+
V V
SSL struct of OpenSSL server common name
Standard connections (conn_stdin, conn_stdout, conn_stderr):
+---+---+---+---+---+---+---+---+---+---+---+---+
| | | | | | | | | | | | | (no byte used)
+---+---+---+---+---+---+---+---+---+---+---+---+
*** (6.4.2) Dynamic variable address types.
A simple dynamic variable (type 'Var(T)') is a pointer:
*
|
V
+-----------+-----------+-----------+-----------+
| cnt | value | n | * |
+-----------+-----------+-----------+-----|-----+
0 4 8 12 | 16
V
monitors table
The segment contains 4 32 bits words:
- at offset 0: the counter
- at offset 4: the current value of the variable
- at offset 8: size of monitors table (number of slots)
- at offset 12: pointer to monitors table
'n' should be 0 exactly when the pointer to monitors table is NULL. When the variable
is created (by instruction 'create_var'), n is 0 and the pointer to monitors table is
NULL.
A monitor is just a datum of type 'One -> One' (most often it is a closure).
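A hypothetical C view of this layout (the field names are invented; the monitor table
pointer is kept as a 32 bits word, since the VM is a 32 bits machine):

```c
#include <stdint.h>
#include <stddef.h>

/* Sketch of the Var(T) segment described above. */
typedef struct {
    uint32_t cnt;       /* garbage collector counter          (offset 0)  */
    uint32_t value;     /* current value of the variable      (offset 4)  */
    uint32_t n;         /* size of monitors table, in slots   (offset 8)  */
    uint32_t monitors;  /* 32-bit VM pointer to monitors table,
                           NULL exactly when n == 0           (offset 12) */
} var_segment;
```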
Multiple dynamic variables (type 'MVar(T)'). This is again a pointer:
*
|
V
+----------+----------+----------+----------+----------+-----------------------
| cnt | ns | nm | * | nbw | data ...
+----------+----------+----------+----|-----+----------+-----------------------
0 4 8 12 V 16 20
monitors table
In this segment, we have:
- at offset 0: garbage collector counter
- at offset 4: number of slots for data in the variable
- at offset 8: number of slots for monitors
- at offset 12: pointer to the table of monitors
- at offset 16: normalized bit width (see below)
- at offset 20: data
A multiple dynamic variable is nothing else than a 'mutable array'. This array has a
size (number of data of type T that we can put in the array).
Normally, we should arrange things so that data slots have just the size required for
type T. For example, if we have a multiple variable of type 'MVar(Bool)', each slot
should occupy just 1 bit, not 32 bits. If we have a multiple variable of type
'MVar(Maybe(Int8))', each data slot should occupy 9 bits.
Because of byte boundaries, we use only powers of 2 for the number of bits of a
slot. Hence, for example, in a multiple variable of type 'MVar(Maybe(Int8))', each data
slot will be of 16 bits, the least number of bits which is a power of 2 and at least
equal to 9.
This number of bits is called the 'normalized bit width'. It may be computed as follows
in C:
if (bw <= 0) nbw = 0; else
if (bw <= 1) nbw = 1; else
if (bw <= 2) nbw = 2; else
if (bw <= 4) nbw = 4; else
if (bw <= 8) nbw = 8; else
if (bw <= 16) nbw = 16; else
nbw = 32;
where 'bw' is the real bit width, and 'nbw' the normalized bit width.
However, for the time being, Anubis 1 sets nbw = 32. This should be worked out, and
maybe, we should also consider 24 as an acceptable normalized bit width.
*** (6.5) Implementation of C structure pointer types.
A datum of type '£StructPtr(Name)' is as follows:
*
|
V
+--------+--------+
| cnt | * |
+--------+----|---+
0 4 | 8
V
C structure
i.e. it is a pointer to a segment of 8 bytes containing only a counter for the
garbage collector, and a pointer to the actual C structure of name 'Name'. Names of C
structures are declared in 'bytecode.h'.
*** (7) Code generation.
The function 'compile_term' (in 'compile.c') generates byte code from
interpretations. Below, we describe how this code is generated.
*** (7.1) General conventions.
Code generation is a recursive process (just because interpretations themselves are of
a recursive nature). The code resulting from the compilation of an interpretation
should behave always the same way with respect to the stack and the register R. Our
conventions are the following.
First of all, consider an interpretation '(head . env)'. We may of course replace all
occurrences of a type unknown in 'head' by its value as given by 'env'. Hence, we
consider only interpretation heads, and assume that they contain no unknowns.
An interpretation head must be compiled relative to a context. This context is a
precise description of what the stack will contain at run time. We represent contexts by
lists, the head of the list being the top of stack. Each element in the list corresponds to
a slot in the stack. Furthermore, the stack in Anubis 1 is homogeneous. Each slot is a
32 bits word, able to contain a datum of any type.
The Exprs which are listed in a context are the following:
(<symbol> . <type>)
This means that this slot contains a datum of type <type>, and that its name is
<symbol>.
(ret . <arity>)
This means that this slot contains a return address, corresponding to a call to a
function of <arity> arguments.
(f_micro_ctxt <fname> <ftype> (<symbol> . <type>) ...)
This slot contains a 'closure' (i.e. function defined by '|->'). '<fname>' is either
'nil' (in the case of '|->'), or "f" (in the case of '|-f->'). '<ftype>' is the type of
the function. The list of '(<symbol> . <type>)' is the micro context proper. It
indicates in which order the data remembered by the function are placed in the memory
segment of the closure.
(argument . <type>)
This means that the slot is occupied by a datum which has been pushed onto the stack
to serve as an argument to a function call. This datum is anonymous, but its type is
known.
For example, you may see below how the compiler prints the context:
(
(argument . "SSL_Connection")
(argument . "Bool")
(argument . "DenialOfService")
(ret . ...)
("connection" . "SSL_Connection")
(f_micro_ctxt nil ... ("sites" app_ts "List" "Web_Site_Description")
("dos" . "DenialOfService"))
(ret . ...)
)
into a '.sc' file:
| ;--- stack ------------------------
| ; 0 SSL_Connection
| ; 1 Bool
| ; 2 DenialOfService
| ; 3 (return address)
| ; 4 connection SSL_Connection
| ; 5.0 sites List(Web_Site_Description)
| ; 5.1 dos DenialOfService
| ; 6 (return address)
| ;----------------------------------
If '<head>' is an interpretation head, and 'Ctxt' is a context, we denote by:
[<head>]Ctxt
the sequence of instructions resulting from compiling '<head>' relative to the context
'Ctxt'.
Convention 1: The stack after execution of the sequence of instructions in
'[<head>]Ctxt' is the same as before this execution.
Convention 2: After the execution of '[<head>]Ctxt', R contains the value of '<head>'.
*** (7.2) Elimination of terminal recursion.
Since the language has no notion of loop, it is of fundamental importance to eliminate
terminal recursion. Roughly speaking, this amounts to replacing the sequence:
(apply . k)
(ret . 0)
by:
(collapse . k)
(apply . k)
Indeed, when the first sequence is about to be executed, the situation is as
follows:
stack R
--------------------------------------------
a_1 ... a_k r s ... f
where 'a_1,...,a_k' are the arguments of the function, 'r' the return address and 'f'
the function itself. 's' is the return address of the previous call.
'(apply . k)' will perform a jump to the code of the function (found in 'R'), and after
the return from this function call, 'a_1 ... a_k r' have been removed from the stack
which becomes:
s ...
Then the instruction '(ret . 0)' removes 's' from the stack and jumps to address 's',
so that the stack becomes:
...
The other sequence of instructions works as follows. '(collapse . k)' removes 'r' from
the stack, which becomes:
a_1 ... a_k s ...
Then '(apply . k)' calls the function 'f', and after the return of this call, the stack
is:
...
as expected.
Actually, what this does is just avoid performing the instruction '(ret . ?)' twice in
a row (once with address 'r' and once with address 's').
In practice, the situation is more complex because, in general, we have the arguments of
a previous call between 'r' and 's', so that the stack looks like this:
a_1 ... a_k r b_1 ... b_i s ...
When we return from the call to 'f', the stack becomes:
b_1 ... b_i s ...
and the next instruction cannot be '(ret . i)' (except if i == 0), because the top of
stack 'b_1' is not a return address. In this situation, terminal recursion can still
be eliminated. If the recursion is terminal, the next instructions are:
(del_stack ...)
...
(del_stack ...)
(ret . i)
(as many 'del_stack' or 'collapse' as there are arguments, i.e. 'i'). At that point the
machine deletes the arguments from the stack and performs the 'ret'.
So the general situation is the following sequence of instructions:
(apply . k)
(del_stack_? . <depth>)
(collapse . <depth>)
...
(ret . i)
and the stack is the following:
stack R
--------------------------------------------
a_1 ... a_k r b_1 ... b_i s ... f
We replace this sequence by the following one:
(collapse . k)
(del_stack_? . k+<depth>)
(collapse . k+<depth>)
...
(apply . k)
Here is how it works:
Instructions Stack
----------------------------------------------------------------------------------
a_1 ... a_k r b_1 b_2 b_3 ... b_i s ...
(collapse . k) a_1 ... a_k b_1 b_2 b_3 ... b_i s ...
(del_stack_? . k+<depth>) a_1 ... a_k b_2 b_3 ... b_i s ...
(collapse . k+<depth>) a_1 ... a_k b_3 ... b_i s ...
...
a_1 ... a_k s ...
(apply . k) ...
The main difference with the original code is just that 'b_1 ... b_i' are deleted
before the call to 'f'.
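The basic rewrite (the case i == 0, without intervening arguments) can be sketched as a
peephole pass over a toy instruction array; all names here are invented for
illustration and do not come from the actual compiler:

```c
typedef enum { I_APPLY, I_RET, I_COLLAPSE } opcode;
typedef struct { opcode op; int arg; } instr;

/* Rewrite the terminal-call pattern  (apply . k)(ret . 0)
   into                               (collapse . k)(apply . k)
   in place.  Returns 1 if the rewrite was performed at position i. */
static int eliminate_tail_call(instr *code, int i, int len)
{
    if (i + 1 < len &&
        code[i].op == I_APPLY &&
        code[i + 1].op == I_RET && code[i + 1].arg == 0) {
        int k = code[i].arg;
        code[i]     = (instr){ I_COLLAPSE, k };  /* drop 'r' first     */
        code[i + 1] = (instr){ I_APPLY,    k };  /* then jump into 'f' */
        return 1;
    }
    return 0;
}
```

The general case, with 'i' intervening arguments, additionally shifts the 'del_stack'
and 'collapse' instructions before the 'apply', as shown in the trace above.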
*** (7.3) Generating the code.
*** (7.3.1) Protect.
The interpretation head has the following form:
(protect <lc> . <head>)
The code generated is:
[(protect <lc> . <head>)]Ctxt =
label a:
protect
[<head>]Ctxt
unprotect a
<end code>
The 'protect' instruction is of size 2 (2 bytes). The first byte is the instruction
name 'protect', and the second byte is the 'lock'. This byte is zero when the code is
generated.
At run time, the instruction 'protect' checks the value of the lock (which may be 0 or
1). If it is 1, the subsequent code is locked, and the instruction gives up. If it is
0, the piece of code is not locked. So the instruction puts 1 in the lock so that the
subsequent code is now locked, and IP (Instruction Pointer) is incremented by 2.
The subsequent code '[<head>]Ctxt' is executed. Then the instruction 'unprotect' is
reached. This instruction puts 0 back into the byte at offset 'a+1' (i.e. into the lock) so
that the code is now unlocked.
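The semantics of 'protect'/'unprotect' can be sketched as follows (a toy model, not the
actual VM code; it assumes, as described above, that the lock byte lives in the code
area at offset ip+1 and is writable):

```c
#include <stdint.h>

/* Returns the new IP on success, or -1 when the section is already
   locked ("the instruction gives up"). */
static long exec_protect(uint8_t *code, long ip)
{
    if (code[ip + 1] != 0)
        return -1;        /* lock is 1: subsequent code is locked */
    code[ip + 1] = 1;     /* lock the subsequent code             */
    return ip + 2;        /* 'protect' is of size 2               */
}

/* 'unprotect a' clears the lock byte at offset a+1. */
static void exec_unprotect(uint8_t *code, long a)
{
    code[a + 1] = 0;
}
```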
*** (7.3.2) Lock.
*** (7.3.3) Global symbols.
Global symbols are those which are defined by paragraphs.
*** (7.3.4) Strings, Int32, small data and Floats.
*** (7.3.5) Local symbols.
*** (7.3.6) Microlocal symbols.
*** (7.3.7) Closures.
The code generated must construct a closure. Hence, it begins by the instruction 'alloc'.
[(closure <lc> (f_micro_ctxt fname ftype (sym . type)...) <args> . <body>)]Ctxt =
alloc 3+n 32 bits words (n = size of micro context)
put_copy y_0 in closure (at 3+0 word)
...
put_copy y_n-1 in closure (at 3+n-1 word)
put_closure_labels a d
<end code>
'n' is the number of values to be put in the micro context. Each value is taken from
the stack, virtually copied (if needed), and put at its place in the micro context.
This is what the instructions 'put_copy ...' are doing. However these instructions
are specialised depending on the type of the datum to be put. The function
'get_closure_copy' returns the list of these instructions. It receives the description
of the micro context of the closure to be constructed, and the current context
(description of the current stack).
When this is done, the closure is almost ready. Only the code of the function and the
code for garbage collection of the closure are missing. These codes are put into the
closure by the instruction 'put_closure_labels'.
Of course, these two codes must also be generated by the compiler. The code of the
function is generated as follows:
label a:
[<body>]<args>+(<micro_context>)+(ret . k)
del args
del_closure
ret k
When a function represented by a closure is called the stack is prepared as follows:
a_1 a_2 ... a_k c r ...
(the top of stack on the left side), where 'a_1' ... 'a_k' are the arguments (first
argument on top of stack), 'c' the closure itself, and 'r' the return address. Just
before the function returns the stack must become:
r ...
So the body of the function is compiled relatively to a context (denoted
<args>+(<micro_context>)+(ret . k) above) representing the above stack. When this code
has finished, the value to be returned by the function is in R. At that point, we just
have to delete (virtually) the arguments and the closure from the stack, and return.
*** (7.3.8) Applicative terms.
*** (7.3.9) Conditionals.
*** (7.3.10) Selective conditionals.
*** (7.3.11) Local definitions ('with ...').
*** (7.3.12) Delegate.
*** (7.3.13) Serialize/Unserialize.
*** (7.4) Peephole optimisation.
*** (8) Garbage collector code.
*** (9) Equality code.
*** (10) Serialisation/unserialisation.
Serialisation is also called 'marshalling'. Anubis 1 has a built-in
serialisation/unserialisation mechanism. In the next version of Anubis, this mechanism
should be written in Anubis using macros. It will be much simpler (and safer).
In principle any abstract datum could be serialised. In the current system, only the
data of the following types are serialisable:
- the primitive types: String, ByteArray, Int, Float,
- defined types, provided that all their components are serialisable.
Functional types should in principle be serialisable, but they are not in Anubis 1.
This would require a more sophisticated mechanism, which also has something to do with
dynamically loadable libraries.
Serialisation is not performed by a piece of code, but by a piece of 'pseudo-code'.
The same 'pseudo-code' is used for serialising and for unserialising. The
'pseudo-instructions' have just a different execution. Hence the compiler produces (if
needed) only one such pseudo-code per serialisable type.
Actually the pseudo-code is essentially a description of the type of the data to be
serialised/unserialised.
Note (2006/08/16): I just noticed, looking at the '.sc' file of the web site, that the
compiler generates pseudo code paragraphs like this one:
---------|--------------------------------------------------------
|
| * * * type implementation pseudo code * * *
|
42901 | label 590
| ;--- stack ------------------------
| ; 0 Var(List(Int32))
| ; 1 (return address)
| ;----------------------------------
42901 | type_large_switch 1336
42907 | label 1336
42907 | large_alt_begin 9
42909 | indirect_type_mixed 2 591
42915 | pop1
42916 | large_alt_end 0
42918 | jmp 1335
42923 | label 1335
42923 | swap
42924 | ret 1
---------|--------------------------------------------------------
Strange, isn't it? Because 'Var(...)' is not serialisable! However, I also checked
that such pieces of code are never called (by 'serialize/unserialize' instructions). I
also checked with the following example:
global define One
truc
(
List(String) args
) =
forget(serialize(var(args))).
that the compiler doesn't see the problem. This is probably due to the very bad
implementation of 'Var(...)' which is implemented as a large type, not as a 'Var'
type. This must be fixed as soon as possible. The best way is to implement 'Var(...)'
correctly.
*** (10.1) How it works.
Serialising means 'transforming into a sequence of bytes'. This sequence of bytes (a
byte array) may be saved on disk, sent over a network connection etc... Of course,
serialisation is interesting because it doesn't lose information. Serialised data may
be unserialised.
Now, when we are unserialising a byte array, expecting a datum of a certain type, we
are not always sure that the byte array is the result of the serialisation of a datum
of this type. Hence, unserialisation may produce errors. This is why the primitive
'unserialize' returns a datum of type 'Maybe($T)'.
Notice that when a datum is serialised, its type is not written into the serialisation.
Hence, when unserialising a datum we need to know the type of the datum. In other
words, the type is the 'key' (or the 'grammar') for unserialising.
Serialisation is fairly simple. It works as follows:
- String: the characters of the string, followed by a zero byte are written.
- ByteArray: a 4 bytes word is written in little endian order, representing the
number of bytes in the byte array. Then the bytes of the byte array are written.
- Int: The first byte is the sign (0 = positive, 1 = negative). The next 4 bytes are
the number of bigits (little endian). The subsequent groups of 4 bytes are the bigits
(most significant first, so that, unfortunately, the bytes are ordered as little
endian, whilst the bigits are ordered as big endian!).
- Float: The floating point number is written in the IEEE754 8 bytes format (and
stored little endian!).
- small type: The datum is written either as a 0 byte, 1 byte, 2 bytes or 4 bytes
word.
- mixed type: The index of the alternative is first written in the form of a
byte. Then the components are serialised. But in the case of a small alternative, the
components are serialized as a 'small datum', i.e. a 32 bits word containing all the
datum of the alternative (index included). As a consequence, in this case the index
is redundantly serialized. See the pseudo-instruction 'indirect_type_mixed' for more
information.
- large type: Same as for mixed types.
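For the two simplest cases above, the byte formats can be sketched as follows (a
minimal sketch, without the real buffer management through 'serial_buf'/'serial_next'
and without overflow checks):

```c
#include <stdint.h>
#include <string.h>

/* String: the characters, followed by a zero byte.
   Returns the number of bytes written. */
static size_t put_string(uint8_t *out, const char *s)
{
    size_t n = strlen(s) + 1;
    memcpy(out, s, n);
    return n;
}

/* ByteArray: a 4-byte length in little endian order, then the bytes. */
static size_t put_byte_array(uint8_t *out, const uint8_t *b, uint32_t len)
{
    out[0] =  len        & 0xff;
    out[1] = (len >> 8)  & 0xff;
    out[2] = (len >> 16) & 0xff;
    out[3] = (len >> 24) & 0xff;
    memcpy(out + 4, b, len);
    return 4 + (size_t)len;
}
```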
The result of serialisation is a byte array which needs to be allocated before
serialisation begins. The virtual machine has special registers for serialisation:
U8 *serial_buf; buffer for serialisation
U32 serial_size; size of this buffer
U32 serial_next; next free position in the buffer
Bool unserial_failed; flag for unserialisation
When serialisation begins (i.e. when the instruction 'serialize' is executed), the
datum to be serialised is on top of the stack. The instruction 'serialize' has an
address operand pointing to the pseudo code for the type of this datum. This
instruction works like a call (it calls the pseudo code), hence must push a return
address. However, the return address is inserted just below the datum to be
serialized. The instruction also changes the 'work sort' of the virtual machine from
'computing' to 'serializing', and allocates a data segment of some default size whose
address is put in 'serial_buf'. 'serial_size' is initialized to the size of this
segment, and 'serial_next' is initialized to 8 = 4+4, so as to accommodate room for the
garbage collector counter and the word for the size of the resulting byte array. Of
course, if the segment cannot be allocated, the instruction gives up, waiting for next
time.
From now on, we are serializing, executing some pseudo-code. This pseudo-code ends with
the instructions 'revert_to_computing' and 'ret'.
'revert_to_computing' changes the work sort from 'serializing' to 'computing',
reallocates the resulting byte array at the right size, puts the size at the right
place in the byte array, puts the new byte array in R, and a null pointer in
'serial_buf'.
'ret' returns from the pseudo-code to the normal code.
Unserializing works as follows.
The byte array 'b' to be unserialized is in R. The 'unserialize' instruction has an
address as its unique operand:
unserialize <addr>
It is the address of a piece of pseudo-code, namely the pseudo-code for the type 'T' of
the datum to be extracted from the byte array.
What 'unserialize' does is just:
1. put (the address of) 'b' into the 'serial_buf' register, and
initialize related registers ('serial_size' and 'serial_next'),
2. put 0 in the flag 'unserial_failed'
3. push a return address 'r',
4. put the machine in the state 'unserializing',
5. jump to <addr>
Hence, just after 'unserialize' has executed, we have:
R stack serial_buf unserial_failed
(empty) r ... b 0
The execution of the pseudo code yields one of the following two situations:
case 1: unserialization succeeded
R stack serial_buf unserial_failed
d ... b 0
case 2: unserialization did not succeed
R stack serial_buf unserial_failed
d ... b 1
In both cases, we get a datum 'd' of type 'T'. However,
- if 'unserial_failed' is 0, d is the datum successfully extracted from the byte
array,
- if 'unserial_failed' is 1, d is a valid datum of type T, at least from the point of
view of the garbage collector. Nevertheless, it is non significant.
The return address 'r' has been removed from the stack. The byte array 'b' is still in
the 'serial_buf' register. At that time, the machine is still in the 'unserializing'
state. The next instruction:
revert_to_computing <width>
has the responsibility of testing 'unserial_failed', virtually deleting 'b', putting the
machine in the state 'computing', and:
case 1 (unserial_failed == 0):
- put 'd' into a 'success', so as to construct a datum of type 'Maybe(T)', and push a
zero on the stack. Hence, the situation becomes:
R stack serial_buf unserial_failed
success(d) 0 ... (empty) (empty)
case 2 (unserial_failed == 1):
- push 'd' on the stack, and put a zero ('failure' of type 'Maybe(T)') in R. Hence,
the situation becomes:
R stack serial_buf unserial_failed
failure d ... (empty) (empty)
The next instruction is:
del_stack_? 0
which will delete correctly the top of stack, hence yielding in any case the following
situation:
R stack serial_buf unserial_failed
result ... (empty) (empty)
where result is of type 'Maybe(T)'.
So, if the initial situation is:
R stack
(empty) ...
after execution of:
[term]
unserialize <addr>
revert_to_computing <width>
del_stack_? 0
it becomes:
R stack
d ...
where 'd' is the result of the unserialization (of type 'Maybe(T)').
Notice that the <width> operand is used by 'revert_to_computing' in order to construct
'success(d)'. Indeed,
- if <width> <= 31, success(d) is just (d<<1)|1
- if <width> >= 32, success(d) requires a new data segment of two words, with:
- at offset 0: counter (= 1)
- at offset 4: d
and the pointer to this data segment must be ORed with 1 (mixed index for type
'Maybe').
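The two cases can be sketched in C (a toy model: 'malloc' stands in for the VM
allocator, and 'uintptr_t' for the VM's 32 bits words, so that the sketch also runs on
a 64 bits host):

```c
#include <stdint.h>
#include <stdlib.h>

/* Build success(d) for Maybe(T), following the two cases above. */
static uintptr_t make_success(uintptr_t d, int width)
{
    if (width <= 31)
        return (d << 1) | 1;        /* small: (d<<1)|1                 */
    uintptr_t *seg = malloc(2 * sizeof *seg);
    seg[0] = 1;                     /* offset 0: counter (= 1)         */
    seg[1] = d;                     /* offset 4: d                     */
    return (uintptr_t)seg | 1;      /* pointer ORed with mixed index 1 */
}
```

Note that 'failure' needs no segment at all: it is just the word 0, which is why
'revert_to_computing' can put it directly in R in case 2.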
To summarize, the code for (unserialize <lc> <type> . <term>) is:
[<term>]
unserialize <addr>
revert_to_computing <width>
<end code>
except if the declared type is ByteArray. In that case, we have:
[<term>]
success 32 needed to put the byte array into a 'success'
<end code>
*** (10.2) Descriptions of small types.
They are lists of integers defined by the following grammar:
<na> := 8 bits integer (number of alternatives)
<iw> := 8 bits integer (index width)
<nc> := 8 bits integer (number of components)
<small type descr> := <iw> <na> <alt descr> ... (as many <alt descr> as <na>)
<alt descr> := <nc> <small type descr> ... (as many as <nc>)
Hence a small type description is a list of 8 bits integers. These integers become part
of serialization/unserialization instructions like 'type_8', 'type_16' etc... (see
'vminstr.c'). One question is: `how long may this description be?'. The answer is not
so obvious! Remember that the number of alternatives is limited to 256, and likewise
for the number of components in an alternative. Now, if 'iw' is the number of bits
required for the index, the total width of the components is at most: 'bw - iw' where
'bw' is the bit width of our type. So, let 'dl(iw,bw)' be the maximal length for a
description of a type of index width 'iw' and bit width 'bw'. We have,
dl(iw1,bw1) <= 2 + (2^iw1 * (1 + dl(iw2,bw1 - iw1)))
*** (10.3) Pseudo-instructions set.
The official list of pseudo-instructions is defined in 'bytecode.h'. Recall that these
instructions have two meanings: one for serialisation and one for unserialisation.
action while action while
serialising unserialising
---------------------------------------------------------------------------------------
swap swaps the top of stack nothing
type_0 nothing nothing
type_8
type_16
type_32
type_String
type_ByteArray
type_Int32
type_Float
indirect_type_0
indirect_type_8
indirect_type_16
indirect_type_32
indirect_type_String
indirect_type_ByteArray
indirect_type_Int32
indirect_type_Float
indirect_type_mixed
indirect_type_large
type_mixed_switch
type_large_switch
mixed_alt_begin
large_alt_begin
mixed_alt_end
large_alt_end
type_small_alt
revert_to_computing
type_mixed (obsolete; never used)
type_large (obsolete; never used)
small_mixed_alt (obsolete; never used)
*** (10.4) Serialising.
*** (10.5) Unserialising.
The machine is now doing unserialization. When unserialization begins (just after
the instruction 'unserialize'), the machine is in the following state:
R stack serial_buf unserial_failed
(empty) r ... b 0
where 'r' is a return address (to be removed by the 'ret' at the end of the pseudo
code), and where 'b' is a byte array.
During unserialization, the registers serial_buf and serial_size remain constant. On
the contrary, serial_next is incremented while the byte array is read. It always
gives the offset (relative to serial_buf) of the next byte to be read.
Below, we simulate a successful unserialization for various types. The instruction
'swap' does nothing during unserialization.
Below, 'p' is a pointer to a newly allocated segment, whose size (in bytes) is given
by the instruction ?_alt_begin. 'e' is the result of unserialization when
instructions call another pseudo code. 'pop1' does not pop anything, but puts the
content of R at the address pointed to by the top of stack, and increments that
pointer. 'indirect' instructions which do not call another pseudo code, put their
result at the address pointed to by the top of stack.
small type: Bool
R stack
50 | label 20 (empty) r ...
50 | type_8 d r ...
51 | swap
52 | (ret . 1) d ...
primitive type: String
R stack
50 | label 4 (empty) r ...
50 | type_String d r ...
51 | swap
52 | (ret . 1) d ...
mixed type: List(String)
R stack
50 | label 15 (empty) r ...
50 | (type_mixed_switch 2 30 29)
61 | label 30 (empty) r ...
61 | type_32 d r ...
62 | (jmp . 28)
67 | label 29
67 | (mixed_alt_begin . 12) (empty) p+4 p r ...
69 | indirect_type_String (empty) p+8 p r ...
70 | (indirect_type_mixed 2 . 15) e p+8 p r ...
76 | pop1 (empty) p+12 p r ...
77 | (mixed_alt_end . 1) d = p|1 r ...
79 | (jmp . 28)
84 | label 28
84 | swap d r ...
85 | (ret . 1) d ...
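In the trace above, mixed_alt_end produces d = p|1: the alternative index is OR-ed into the low bits of the segment pointer, which are free because segments are 4-byte aligned. A minimal sketch of this tagging (names are mine, not the actual implementation):

```c
#include <stdint.h>

/* Tag a 4-byte aligned segment pointer with a mixed-type alternative
   index (0..3), as in 'd = p|1' in the trace above. */
uintptr_t tag_alt(uintptr_t p, unsigned index) {
    return p | index;                    /* low 2 bits are free */
}
unsigned alt_of(uintptr_t d)  { return (unsigned)(d & 3); }
uintptr_t seg_of(uintptr_t d) { return d & ~(uintptr_t)3; }
```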
large type: Printable_tree (defined in basis.anubis)
R stack
130 | label 20 (empty) r ...
130 | (type_large_switch 5 42 41 40 39 38)
152 | label 42
152 | (large_alt_begin . 5) (empty) p+5 p r ...
154 | (large_alt_end . 0) d = p|0 r ...
156 | (jmp . 37)
161 | label 41
161 | (large_alt_begin . 13) (empty) p+5 p r ...
163 | indirect_type_String (empty) p+9 p r ...
164 | (indirect_type_large . 20) e p+9 p r ...
169 | pop1 (empty) p+13 p r ...
170 | (large_alt_end . 1) d = p|1 r ...
172 | (jmp . 37)
177 | label 40
177 | (large_alt_begin . 10) (empty) p+5 p r ...
179 | indirect_type_8 (empty) p+6 p r ...
180 | (indirect_type_large . 20) e p+6 p r ...
185 | pop1 (empty) p+10 p r ...
186 | (large_alt_end . 2) d = p|2 r ...
188 | (jmp . 37)
193 | label 39
193 | (large_alt_begin . 13) (empty) p+5 p r ...
195 | indirect_type_Int32 (empty) p+9 p r ...
196 | (indirect_type_large . 20) e p+9 p r ...
201 | pop1 (empty) p+13 p r ...
202 | (large_alt_end . 3) d = p|3 r ...
204 | (jmp . 37)
209 | label 38
209 | (large_alt_begin . 13) (empty) p+5 p r ...
211 | (indirect_type_large . 20) e p+5 p r ...
216 | pop1 (empty) p+9 p r ...
217 | (indirect_type_large . 20) e p+9 p r ...
222 | pop1 (empty) p+13 p r ...
223 | (large_alt_end . 4) d = p|4 r ...
225 | (jmp . 37)
230 | label 37
230 | swap d r ...
231 | (ret . 1) d ...
Hence, instructions work as follows during unserialization (assuming no error occurs).
'direct type' instructions:
type_0
type_8
type_16
type_32
type_Int32
type_String
type_ByteArray
type_Float
--> unserialize a datum (by reading the byte array) and put it in R
'switch' instructions:
type_mixed_switch
type_large_switch
--> unserialize an index (always a byte), check it using second operand (number of
alternatives), and jump to the right label.
'alt_begin' instructions:
mixed_alt_begin
large_alt_begin
--> allocate a segment, push the pointer, duplicate it, and add either 4 or 5 to the
copy.
'indirect non calling type' instructions:
indirect_type_0
indirect_type_8
indirect_type_16
indirect_type_32
indirect_type_Int32
indirect_type_String
indirect_type_ByteArray
indirect_type_Float
--> unserialize a datum and put it at the address pointed to by pointer on top of
stack. Increment this pointer by the size of the datum.
'indirect calling type' instructions:
indirect_type_mixed
indirect_type_large
--> Call a pseudo code. This has the effect of putting an unserialized datum in R
upon return.
'pop1'
--> put the content of R at the address pointed to by the top of stack, and increment
that pointer (always by 4).
'alt_end' instructions:
mixed_alt_end
large_alt_end
--> remove the pointer on top of stack, put the next one in R, and OR it with the
index (given by the instruction itself).
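The 'alt_begin' step described above (allocate a segment, push the pointer, duplicate it, advance the copy past the header) can be sketched in C; the toy stack and all names here are assumptions, not the real machine:

```c
#include <stdlib.h>
#include <stdint.h>
#include <stddef.h>

typedef struct { uint8_t *slots[16]; int top; } ToyStack; /* assumption */

static void push(ToyStack *s, uint8_t *v) { s->slots[s->top++] = v; }

/* ?_alt_begin: allocate 'size' bytes, push the base pointer, then push
   a copy advanced past the header (4 bytes for mixed, 5 for large). */
void alt_begin(ToyStack *s, size_t size, size_t header) {
    uint8_t *p = malloc(size);  /* the real VM uses its own allocator */
    push(s, p);                 /* base pointer, kept for alt_end     */
    push(s, p + header);        /* write cursor for the components    */
}
```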
Now, unserialization may fail, because the bytes in the byte array are arbitrary. As
soon as such a failure is detected, the flag 'unserial_failed' is set to 1. In that
case, we must continue a 'fake' unserialization, because we must complete the data
constructed so far. This is the reason why all instructions must first test the
flag.
If the flag is 0 they operate as described above. Nevertheless, since they may detect
a problem, they may also put 1 in the flag.
'Fake execution' of the above instructions is as follows:
'direct type' instructions: put a value of 0 in R as the result of the unserialization
'switch instructions': jump to the first alternative, which always exists
'alt_begin' instructions: do not allocate a segment, but do as if, by pushing a NULL
pointer and duplicating it (hence we have two NULL pointers
on top of stack).
'indirect non calling type' instructions: there are two cases:
- the two pointers on top of stack are NULL:
do not do anything
- the two pointers on top of stack are non NULL:
put a value of zero as the result of unserialization, and increment the topmost
pointer by the right value.
'indirect calling type' instructions: again there are two cases:
- the two pointers on top of stack are NULL:
do not do anything
- the two pointers on top of stack are non NULL:
do not call any pseudo code, and put 0 in R.
'pop1': two cases:
- the two pointers on top of stack are NULL:
do not do anything
- the two pointers on top of stack are non NULL:
put 0 at the address pointed to by the top of stack, increment the pointer.
'alt_end' instructions: two cases:
- the two pointers on top of stack are NULL:
remove the two NULL pointers and put a NULL pointer in R,
- the two pointers on top of stack are non NULL:
remove the topmost pointer, and put the other one in R, and OR it with the index.
Notice that any instruction which detects a failure, and puts 1 in the flag, should
then behave as if the flag was already set.
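The real/fake two-case pattern above can be sketched for 'pop1' (a hedged model; the signature and names are assumptions):

```c
#include <stdint.h>
#include <stddef.h>

/* A sketch of 'pop1' during unserialization, covering both the normal
   case and the 'fake' execution after a failure.  'top' is the pointer
   on top of the stack; it is NULL after a failed alt_begin. */
void pop1(uint32_t R, uint32_t **top, int unserial_failed) {
    if (*top == NULL) return;        /* fake execution: do nothing    */
    **top = unserial_failed ? 0 : R; /* after a failure, store 0      */
    (*top)++;                        /* always advance by one word    */
}
```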
Finally, we must also discuss 'revert_to_computing'. After the unserialization, the state of
the machine is:
R stack serial_buf unserial_failed
d ... b 0/1
where 'd' is a datum, which is significant only if unserial_failed is 0.
In any case, the byte array in serial_buf must be virtually deleted, and a zero must
be put in serial_buf (this will be checked by 'unserialize'/'serialize').
If unserial_failed is:
0 then construct 'success(d)' in R (use the <width> operand to know the
bit width of type of d), and push a zero in the stack, so that
the next instruction (del_stack_?) may operate.
1 then push R in the stack (it will be deleted by the next instruction),
and put 0 (failure of type 'Maybe(T)') in R.
Note: the sequence of instructions:
unserialize <addr>
revert_to_computing <width>
is always followed by:
del_stack_? 0 (for the type of the unserialized datum)
*** (11) Outputting the code (making an *.adm file).
*** (12) How to make a system of macros ?
*** (12.1) Overview.
Anubis has no system of macros, but should have one. This would (as already remarked)
induce many simplifications in the system, and at the same time provide powerful
programming tools for the user. The best system of macros ever created is that of the
Lisp language. This system was natural in Lisp because of the particularly simple form
of the syntax. Of course, the syntax of Anubis is much more complex, but as we shall
see, constructing a system 'à la Lisp' is still possible.
What is a language good for? It is good for representing 'things' (belonging to some
'target' world) by 'expressions'. For example, the natural language represents things
of the real world by words, sentences, etc... but also abstract things like
statements, etc... Of course, each language has a syntax (grammar) but also a
semantics. The 'semantics' is the correspondence between the expressions (the
'signifiers') and the things they represent (the 'signified').
the semantics
|
symbolic side (language) | semantic side (target world)
-------------------------------------+----------------------------------------
|
|
expression |----------------------------> thing
(signifier) | (signified)
|
Note that in most cases (programming languages, mathematics, but not the natural
language), the symbolic side is more concrete (less abstract) than the semantic
side. For example, in mathematics we have many expressions for representing the
number (say) 2, like 2, 1+1, 5-3, etc... and these expressions may be manipulated using
computation rules. So, this is in some sense very concrete. On the contrary, the
signified, i.e. the mathematical object 2, is a complete abstraction by its very
nature.
So, we use the language for manipulating the things (or at least we give ourselves the
illusion that we manipulate the things, because what we actually manipulate are the
expressions, for example when we are computing). This sometimes leads us to
write the same expressions, or at least similar ones, several times. So, we
may want to make this writing automatic. In order to do that, we need to consider the
expressions themselves as a new sort of things (the 'meta-things') and to create a new
language (the 'meta-language') for manipulating the meta-things:
meta-semantics semantics
meta-language | language | target world
-------------------------+---------------------------+--------------------------
| |
meta-expression |-----------> expression |--------------> thing
| |
| |
In most cases, the meta-language is different from the language and simpler. This
is the case for the C language, where the expressions of the meta-language are all
those which begin with: #define #if etc... (a very poor language actually).
However in Lisp, the meta-language is Lisp itself. The creators of Lisp were truly
inspired when they did that. This has several consequences:
1. The meta-language is as powerful as the language itself.
2. The language is also the meta-meta-language, the meta-meta-meta-language, etc...
Fans of Lisp (like me) know how powerful this feature is.
*** (12.2) The trick.
In order to have the language be its own meta-language, it is enough to embed the
language into its own target universe. In other words, it should be possible to
consider all expressions of the language as data of the target universe:
+------------------------------------------------+
| |
| target universe (all sorts of data) |
| |
| |
| |
| |
| |
| +--------------------------+ |
| | | |
| | the language | |
| | | |
| | (a subset of the | |
| | universe) | |
| | | |
| | | |
| +--------------------------+ |
| |
| |
| |
+------------------------------------------------+
Hence the language, being able to represent all data, may also represent the
expressions of the language itself. It may also be able to represent expression which
themselves represent expressions and so on, so that the language is at the same time
the meta-language, the meta-meta-language, and so on... This is the big trick.
*** (12.3) How it works.
Basically, we need only a set of types (called 'syntactical types') for representing
expressions, and an operator (named the 'expansion' operator) for forcing the compiler
to evaluate data of these types. We may also have a 'quoting' operator for simplifying
the writing of data of these types.
One of the main types for representing expressions in Anubis should be a type of
'terms':
public type Term:
symbol(String name),
applicative(Term function, List(Term) arguments),
function(List(Declaration) arguments, Term body),
conditional(Term test, List(Case) cases),
...
The quoting operator (denoted: ' ) should simplify the writing of data of type
'Term'. For example, instead of writing:
conditional(symbol("x"),
[
case([("failure",[])],symbol("false")),
case([("success",[symbol("u")])],
applicative(symbol("f"),[symbol("u")]))
]
)
we would have just to write this:
'if x is
{
failure then false,
success(u) then f(u)
}
(Note the quoting mark in front of this term.) The above term is not of type 'Bool'
but just of type 'Term'. Thus, the quoting mark just means: 'don't interpret this
term, just consider it as a term'.
Now, we may write down functions for producing terms (of type 'Term'), type definitions
and all sorts of 'syntactical concepts', considered as data of our semantical
universe. From now on, the expressions of the language may be manipulated as any other
kind of data.
Manipulating expressions as data is called 'meta-programming'. But this is not the
whole story, because the destiny of a datum of type 'Term' or of some similar type is
at the very end to be interpreted (evaluated).
When we want to evaluate a datum (say 't') of type 'Term', we just have to write:
@t
'@' is the 'expansion' (or 'evaluation') operator. In some sense it is the converse of
quoting. It transforms a datum of type (say) 'Term' into what this datum represents if
we consider it as an expression.
*** (12.4) An example.
The most difficult part is to design the types 'TypeDefinition', 'Definition', 'Term',
'Case', etc... They should be the perfect image of the whole language. As an example,
we consider only a small fragment of this family.
public type Type:
scheme (String name, List(Type) operands),
functional (List(Type) source_types, Type target_type),
...
public type Component:
anonymous (Type type),
named (Type type, String name).
public type Alternative:
singleton (String name),
usual (String name, List(Component) components).
public type TypeDefinition:
public (String name,
List(String) parameters,
List(Alternative) alternatives),
private (String name,
List(String) parameters,
List(Alternative) alternatives).
public type Paragraph:
type_definition(TypeDefinition),
...
Now, we may want to define this:
define TypeDefinition
enumerated_type
(
String type_name
List(String) alternative_names
) =
private(type_name,
[],
map((String n) |-> singleton(n), alternative_names)).
define List(String) my_names = ["ga","bu","zo", "meu"].
@enumerated_type("Shadok",my_names).
In this last paragraph (a 'macro-paragraph' beginning by '@'), the term:
enumerated_type("Shadok",my_names)
is of type 'TypeDefinition'. Because of the presence of '@', the compiler will expand
it into an actual paragraph, namely this one:
type Shadok:
ga,
bu,
zo,
meu.
The syntax of 'macro-paragraphs' is:
@t
(with '@' in the leftmost column) where t is a term of type 'Paragraph' or
'List(Paragraph)'. In the case of a list of paragraphs the compiler expands the term
into a sequence of paragraphs.
Of course, the expansion operator should also be usable for the expansion of terms (and
any other sort of syntactical entity), not only for paragraphs. For example, we may
write something like this:
define Term
my_cond
(
Term test
List(String) alts
) =
conditional(test,
map((String n) |-> singleton_case(n,length(n)),
alts)).
Now, we may write (within a paragraph):
@my_cond(_x,my_names)
which will be expanded into the term:
if x is
{
ga then 2,
bu then 2,
zo then 2,
meu then 3
}
where 'x' is the expansion of '_x'.
The expansion may also be implicit. For example, if a term has an interpretation whose
type is a syntactical type, the compiler may also try to interpret its expansion. And
conversely, the quoting may also be implicit. However, this may possibly lead to
problems, and it should be experimented with.
*** (13) Propositions for a dynamically defragmenting memory management system.
For the time being, the main segments of the memory allocator are never freed. This is
because, since allocated segments cannot be moved within memory, main segments of the
allocator have almost no chance of becoming empty. Anyway, no system for testing this
emptiness has been implemented.
It would be nice to be able to move an allocated segment within memory without
disturbing the program. That way, it would be possible to gather allocated memory
segments and to free big parts of memory. The system described below allows moving of
allocated segments. Compared with the current system (usual allocator and reference
counting), it has the following characteristics:
- It doesn't need more memory than the current system within the allocated segments,
  but pointers are twice as large (64 bits instead of the current 32 bits).
- Virtual copy is almost as quick as in the current system, but virtual deletion may
  be slower when there are data shared by many pointers (which does not seem to
  happen very often).
- An allocated segment may be moved within memory by the system without perturbing
  the currently running virtual machine, provided it is done in one non-interruptible
  operation (whose duration is very short).
- Allocated segments may be moved and main segments returned to the host allocator
  by a low priority mechanism.
*** (13.1) How it works.
I would like to call this system 'Reference Chaining Garbage Collector' (compare with
'Reference Counting Garbage Collector', the current system). Here is how it works.
When a new segment is allocated a 'reference' to it is created and stored somewhere.
This reference represents the datum just created. It is made of two parts (two words):
--------------------
| |
| V
| +---------+---------+
| | * | (n<<2)|1| the 'reference' (two words)
| +----|----+---------+
| |
| |
| V
| +---------+---------+----------------------------------+
| | mc | * | the newly allocated segment |
| +---------+----|----+----------------------------------+
| |
--------------------
The first part of the reference is an ordinary pointer (say 32 bits) to the useful part
of the newly allocated segment. The second part has the form (assuming pointers are 32
bits wide):
nnnnnnnn nnnnnnnn nnnnnnnn nnnnnn01
where n...n is the size of the allocated segment (in words). The segment itself has two
hidden words at the beginning. The second word contains a pointer to the reference
itself. Hence, the reference can be recovered from the allocated segment and
conversely. The size of the segment (needed only for deallocation) may be recovered
from the second word of the reference. The word marked 'mc' will be explained later.
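Under the 32-bit assumptions above, this layout could be modelled in C as follows (the struct and field names are mine, not taken from the sources):

```c
#include <stdint.h>
#include <stddef.h>

/* A 'reference' is two words: a pointer to the data part of the
   segment, and either a pointer to the next reference or, for the
   last reference in the chain, the tagged size word (n<<2)|1. */
typedef struct Reference {
    void *data;               /* -> useful part of the segment   */
    uintptr_t next_or_size;   /* next Reference *, or (n<<2)|1   */
} Reference;

/* The two hidden words at the beginning of an allocated segment. */
typedef struct SegmentHeader {
    void *mc;                 /* move code (explained later)     */
    Reference *refs;          /* head of the chain of references */
} SegmentHeader;

/* Real pointers are divisible by 4, so the size word is odd. */
int is_last_reference(const Reference *r) {
    return (r->next_or_size & 3) == 1;
}
uintptr_t size_in_words(const Reference *last) {
    return last->next_or_size >> 2;
}
```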
Now, what happens if this reference is shared, i.e. if we make a virtual copy of it ?
Here is what happens:
-------------------- --------------
| | second | |
| V reference | V first reference
| +---------+----|----+ +---------+---------+
| | * | * | | * | (n<<2)|1|
| +----|----+---------+ +----|----+---------+
| | |
| |-----------------------------
| |
| V
| +---------+---------+----------------------------------+
| | mc | * | the allocated segment |
| +---------+----|----+----------------------------------+
| |
--------------------
Hence, the references to the allocated segment are chained into a list. Since pointers
are divisible by 4, the end of the list is found by the fact that (n<<2)|1 is 1 modulo
4. The list is pointed to by the second hidden word in the allocated segment
itself. Each new reference is added in front of the list, as already shown with the
second one. For example, here is how it looks with 3 references to our segment:
------- -------- --------------
| | third | | second | |
| V reference | V reference | V first reference
| +---------+----|----+ +---------+----|----+ +---------+---------+
| | * | * | | * | * | | * | (n<<2)|1|
| +----|----+---------+ +----|----+---------+ +----|----+---------+
| | | |
| -----------------------------------------------------
| |
| V
| +---------+---------+----------------------------------+
| | mc | * | the allocated segment |
| +---------+----|----+----------------------------------+
| |
--------------------
Notice that the segment is reachable directly from any reference, and that the
allocated segment has two words used for extra 'non data' informations. In the
Reference Counting system there are also two such words: one for the size of the
segment (used by the allocator), and another one for counting the references.
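The virtual-copy step (a new reference becoming the head of the chain) can be sketched in C under the same assumed two-word model; all names here are my own:

```c
#include <stdint.h>
#include <stddef.h>

/* Minimal model of the chain shown in the diagrams above. */
typedef struct Ref {
    void *data;               /* -> data part of the segment        */
    uintptr_t next_or_size;   /* next Ref *, or (n<<2)|1 at the end */
} Ref;

/* Virtual copy: the new reference points at the same data and
   becomes the new head of the chain; the head pointer lives in the
   second hidden word of the segment. */
void virtual_copy(Ref **chain_head, Ref *new_ref, void *data) {
    new_ref->data = data;
    new_ref->next_or_size = (uintptr_t)*chain_head;
    *chain_head = new_ref;
}
```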
Now, here is how virtual deletion is performed. When a reference needs to be deleted
(i.e. a copy of the datum is virtually deleted), we must know whether or not it is the
head of the list. To that end, we first go to the allocated segment, and then follow
the list until we find our reference again, which tells us whether it is the first one
in the list. If it is the first one, we may easily determine if it is the last reference
to our segment by looking at the parity of the second word of the reference.
Case 1: There is only one reference to the allocated segment. In this case the segment
is freed.
Before deletion:
--------------------
| |
| V
| +---------+---------+
| | * | (n<<2)|1| the 'reference' (two words)
| +----|----+---------+
| |
| |
| V
| +---------+---------+----------------------------------+
| | mc | * | the allocated segment |
| +---------+----|----+----------------------------------+
| |
--------------------
After deletion: There is nothing left. The allocated segment returns to the pool of
free segments, after all the references it contains are virtually deleted (recursive
process).
Case 2: There are several references, and the reference to be deleted is the head of
list. In this case the next reference becomes the head of list:
Before deletion:
-- -------- ---........---
| | | | | |
| V | V | V
| +---------+----|----+ +---------+----|----+ +---------+---------+
| | * | * | | * | * | | * | (n<<2)|1|
| +----|----+---------+ +----|----+---------+ +----|----+---------+
| | | |
| -----------------------------------------------------
| |
| V
| +---------+---------+----------------------------------+
| | mc | * | the allocated segment |
| +---------+----|----+----------------------------------+
| |
--------------------
After deletion:
-------------------------- ---........---
| deleted | | |
| reference V | V
| +---------+----|----+ +---------+----|----+ +---------+---------+
| | * | * | | * | * | | * | (n<<2)|1|
| +----|----+---------+ +----|----+---------+ +----|----+---------+
| | |
| -----------------------------------
| |
| V
| +---------+---------+----------------------------------+
| | mc | * | the allocated segment |
| +---------+----|----+----------------------------------+
| |
--------------------
Notice that we just have to change the pointer in the second hidden word of the
segment.
Case 3: The reference to be deleted is neither the head of the list nor the last one in
the list. In this case it must be removed from the list. We illustrate as above:
After deletion:
-- ---------------------------........---
| | | deleted |
| V | reference V
| +---------+----|----+ +---------+----|----+ +---------+---------+
| | * | * | | * | * | | * | (n<<2)|1|
| +----|----+---------+ +----|----+---------+ +----|----+---------+
| | |
| -----------------------------------------------------
| |
| V
| +---------+---------+----------------------------------+
| | mc | * | the allocated segment |
| +---------+----|----+----------------------------------+
| |
--------------------
Notice that this amounts to changing a pointer in the previous element of the list.
Case 4: The reference to be deleted is the last one in the list. In this case we just
have to move the word (n<<2)|1 from the last reference to the previous one, as
illustrated below:
After deletion:
-- ---.......--
| | | | deleted
| V | V reference
| +---------+----|----+ +---------+---------+ +---------+---------+
| | * | * | | * | (n<<2)|1| | * | (n<<2)|1|
| +----|----+---------+ +----|----+---------+ +----|----+---------+
| | |
| ---------------------------
| |
| V
| +---------+---------+----------------------------------+
| | mc | * | the allocated segment |
| +---------+----|----+----------------------------------+
| |
--------------------
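The four cases can be condensed into one sketch (the chain model and names are assumptions; note that cases 3 and 4 collapse into the same pointer move, because the tagged size word (n<<2)|1 travels in the same field as the next pointer):

```c
#include <stdint.h>
#include <stddef.h>

typedef struct Ref {
    void *data;               /* -> data part of the segment        */
    uintptr_t next_or_size;   /* next Ref *, or (n<<2)|1 at the end */
} Ref;

/* Delete one reference; returns 1 when it was the only one, i.e.
   when the segment itself may now be freed (case 1). */
int virtual_delete(Ref **chain_head, Ref *r) {
    if (*chain_head == r) {
        if ((r->next_or_size & 3) == 1)        /* case 1: only one  */
            return 1;
        *chain_head = (Ref *)r->next_or_size;  /* case 2: new head  */
        return 0;
    }
    Ref *prev = *chain_head;                   /* walk to the       */
    while ((Ref *)prev->next_or_size != r)     /* predecessor       */
        prev = (Ref *)prev->next_or_size;
    prev->next_or_size = r->next_or_size;      /* cases 3 and 4     */
    return 0;
}
```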
*** (13.2) Moving an allocated segment within memory.
This system has been designed for moving segments within memory without interrupting the
program for more than the time needed to move just one segment. However, there is a
problem. Indeed, if a pointer is moved in memory, it still points to the same
position. If a reference is moved, the left hand pointer will still point to the data
part of the segment, and the right hand pointer (if any) to the next reference, but the
problem is that the right hand pointer of the previous reference (or the second hidden
word in the segment) will no longer point to our reference. In other words, moving a
reference entails that the right hand pointer in the previous reference must be updated.
Of course, this is not very difficult to perform, but again, it implies a walk through
the list of references.
A more delicate problem comes from the fact that the data part of the segment
itself may contain references. If we move this segment by a naïve copy, the chains of
these references will be broken.
A solution is that the compiler produces a ``move code'' for each sort of segment. When
called, this move code may properly move the segment, because it knows where the
references are within it. The field 'mc' in the first hidden word is the address of the
move code of the segment.
Hence, in order to move an allocated segment within memory, we just have to do the
following (without being interrupted):
- copy the segment at the new position using its move code,
- follow the chain of references, and in each reference change the pointer to the
segment.
Notice that the data part of the segment and the chain of its references cannot share a
reference. Indeed, such a reference would point to the segment containing it, which is
impossible.
Before moving the segment:
-- -------- ---........---
| | | | | |
| V | V | V
| +---------+----|----+ +---------+----|----+ +---------+---------+
| | * | * | | * | * | | * | (n<<2)|1|
| +----|----+---------+ +----|----+---------+ +----|----+---------+
| | | |
| -----------------------------------------------------
| |
| V
| +---------+---------+----------------------------------+
| | mc | * | the allocated segment |
| +---------+----|----+----------------------------------+
| |
--------------------
After the segment is moved:
-- -------- ---........---
| | | | | |
| V | V | V
| +---------+----|----+ +---------+----|----+ +---------+---------+
| | * | * | | * | * | | * | (n<<2)|1|
| +----|----+---------+ +----|----+---------+ +----|----+---------+
| | | |
| -----------------------------------------------------
| |
| V
| +---------+---------+----------------------------------+
| | mc | * | the allocated segment moved |
| +---------+----|----+----------------------------------+
| |
----------------------------------------------
Notice that only the pointers in the first part of all references have to be
changed. Of course, this whole operation must be done without being interrupted. In
other words, this process must complete before yielding (to the scheduler, probably).
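The second step of the move (walking the chain to retarget the first word of every reference) could be sketched as follows, under the same assumed two-word model:

```c
#include <stdint.h>

typedef struct Ref {
    void *data;               /* -> data part of the segment        */
    uintptr_t next_or_size;   /* next Ref *, or (n<<2)|1 at the end */
} Ref;

/* After the move code has copied the segment, every reference in the
   chain must point to the new position of the data part. */
void retarget_references(Ref *head, void *new_data) {
    for (Ref *r = head;; r = (Ref *)r->next_or_size) {
        r->data = new_data;
        if ((r->next_or_size & 3) == 1)  /* tagged size word: last */
            break;
    }
}
```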
*** (13.3) Compatibility with segregated lists.
It is known that segregated lists are a very important feature of allocators. When we
replaced our old system without segregated lists by a new one with segregated lists,
the speed of Anubis was multiplied by 3 or 4. The system of segregated lists works
as follows.
The free segments (those which are not allocated) are gathered into several lists
instead of just one. There is one list for each small length: 1 word, 2
words, ..., 20 words (say). These lists are called 'segregated lists' because the
segments they contain are all of the same length. Of course, there is another list for
allocating bigger segments. This last list is not segregated because it contains
segments of different lengths. It should be noted that most allocated segments are
small, hence the importance of segregated lists.
The allocator has a small family of pointers (maybe 21 pointers) for all these
lists. When an allocation is required, the right list is used depending on the size of
the segment to be allocated. If it is small, the segment is taken from one of the
segregated lists. Otherwise, it is taken from the non segregated list.
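A sketch of this size-based dispatch (the constant 20 follows the text; the structure and names are assumptions):

```c
#include <stddef.h>

enum { NUM_SEGREGATED = 20 };   /* lists for 1..20 words, as above */

typedef struct FreeSeg { struct FreeSeg *next; } FreeSeg;

typedef struct {
    FreeSeg *segregated[NUM_SEGREGATED]; /* [i]: (i+1)-word segments */
    FreeSeg *non_segregated;             /* bigger segments          */
} FreeLists;

/* Take a segment from the right segregated list in O(1); a NULL
   result means the caller must fall back on the non segregated list. */
FreeSeg *take_small(FreeLists *fl, size_t words) {
    if (words == 0 || words > NUM_SEGREGATED)
        return NULL;                     /* not a 'small' request   */
    FreeSeg **list = &fl->segregated[words - 1];
    FreeSeg *seg = *list;
    if (seg != NULL)
        *list = seg->next;               /* unlink the head         */
    return seg;
}
```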
Since all segments in a segregated list have the same size, the first one in the list
is always suitable. On the contrary, in a non segregated list, we must find a segment
which is big enough, and we must carve a piece from it. When a big segment is freed, it
must be put into the non segregated list and possibly glued to its neighbours in order
to make bigger free segments. It is easy to understand that this requires much more
work. Our experiments have shown that finding a segment in a non segregated list may
require examining several thousand segments. With the new system, the non
segregated list remains quite small (several tens of segments). In the previous system
the unique non segregated list generally had several thousands, or even hundreds of
thousands, of segments.
The system of 'Reference Chaining' is obviously compatible with segregated lists. We
will assume that all the segments (free and allocated) are parts of a list of bigger
'main segments'. The segregated lists and the non segregated list are mixed together
within these main segments, along with the allocated segments.
When a segregated list is empty, and a segment is required from this list, a segment is
allocated from the non segregated list and put in the chosen segregated list. When the
non segregated list is empty, a new big 'main segment' is requested from the host
system, and the useful part of this segment becomes a new segment of the non segregated
list.
Main segments are also chained together into a list.
*** (13.4) Defragmenting memory.
Now, we arrive at the big question: How to defragment memory ?
The question is to move segments so that a main segment becomes 'empty'. When a main
segment is empty, it may be returned to the host allocator. By 'empty', we mean that it
contains no allocated segment, but it may contain free segments. Of course, the lists
of free segments must be updated before the main segment is freed, i.e. the free
segments which lie within the main segment to be returned to the host must be removed
from the lists of free segments.
The purpose of defragmenting is freeing main segments. Moving an allocated segment
from one main segment to another one may result in a main segment containing no
allocated segment at all. In this case, the free segments it contains may be removed
from their respective lists, and the main segment may be freed.
When the allocator allocates a new segment, is it heavy work to determine the main
segment to which it belongs? A priori, we must follow the chain of main segments. For
each main segment we easily get its beginning and end addresses, and
we just have to compare the address of the allocated segment to this beginning and
end.
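The address comparison just described could be sketched as follows (the chain node layout is an assumption, not the actual one):

```c
#include <stddef.h>

/* Main segments are chained into a list (section 13.3); here each
   node also records the address range it covers. */
typedef struct MainSeg {
    struct MainSeg *next;
    char *begin, *end;        /* the range [begin, end) it covers */
} MainSeg;

/* Follow the chain and compare addresses, as described above. */
MainSeg *find_main_segment(MainSeg *chain, const void *addr) {
    for (MainSeg *m = chain; m != NULL; m = m->next)
        if ((const char *)addr >= m->begin && (const char *)addr < m->end)
            return m;
    return NULL;
}
```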
However, each free segment could also contain a pointer to the main segment it belongs
to. This is easily done when creating the free segments. That way, we can determine the
main segment more quickly. However, this creates a problem when the allocated segment is
deallocated, because within an allocated segment, there is no pointer to the main
segment containing it. Hence, we could have to follow the chain of main segments when
deallocating an allocated segment. This is probably not a good solution.
But, we could also have two hidden words in any allocated segment, one pointing to the
chain of references to this segment, and another one pointing to the relevant main
segment. In this case, it is easy to count, for each main segment, the number of
allocated segments it contains. When this number becomes 0, the main segment may be
freed (after the lists of free segments have been cleaned up).
-- -------- ---........---
| | | | | |
| V | V | V
| +---------+----|----+ +---------+----|----+ +---------+---------+
| | * | * | | * | * | | * | (n<<2)|1|
| +----|----+---------+ +----|----+---------+ +----|----+---------+
| | | |
| -----------------------------------------------------
| |
| V
| +---------+---------+-----------------------------------+
| | * | * | the allocated segment |
| +----|----+----|----+-----------------------------------+
| | |
--------------- V
main segment containing this one
A main segment may become 'empty' (i.e. contain no allocated segment) even if we never
move a segment within memory. Now we have to examine a strategy for moving the
allocated segments.
The most obvious strategy is to move the allocated segments from main segments which
are far from the beginning of the chain of main segments into main segments which are
near the beginning of this chain. That way, main segments which are far from the
beginning have some chance to become empty. This is probably what we should do.
*** (13.5) This allocator seen from the outside.
*** (13.5.1) Hidden words.
We discuss how other parts of the system should see and use this
allocator/garbage-collector. First of all, what is visible from the outside? The parts
which are hidden are shown like this: //////// in the next picture:
+---------+---------+ +---------+---------+ +---------+---------+
| * |/////////| | * |/////////| | * |/////////|
+----|----+---------+ +----|----+---------+ +----|----+---------+
| | |
-----------------------------------------------------
|
V
+---------+---------+-----------------------------------+
|/////////|/////////| the allocated segment |
+---------+---------+-----------------------------------+
In other words:
- Each reference has the size of two words (presumably 64 bits for the time being).
- Only the first word in each reference may be used. It is used as a pointer to the
allocated segment. The second word is manipulated by the allocator only.
- The pointer points to the data area of the allocated segment (no 'reference
counting' word).
Of course, the references may be aligned anywhere in memory, i.e. not necessarily on an
address divisible by 4. So there is no unused bit in the second (hidden) word of a
reference. On the contrary, the allocated segment may be aligned on an address
divisible by 4, or preferably by 8. In this last case, there are 3 unused bits in the
first word (the 3 bits of lowest weight). This allows us to continue using our
implementation of 'mixed' types, but with up to 8 alternatives instead of only 4. Of
course, in a clever implementation all these constants should be parameters.
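As an illustration, the 3 low-weight bits freed by an 8-byte alignment can carry the
alternative of a mixed type; the macro names below are mine:

```c
#include <stdint.h>

/* With allocated segments aligned on addresses divisible by 8, the 3 bits
   of lowest weight of the pointer word are unused and can encode the
   alternative of a 'mixed' type (up to 8 alternatives). */
typedef uintptr_t Word;

#define MIX(p, alt)  ((Word)(p) | (Word)(alt))  /* alt in 0..7            */
#define MIX_ALT(w)   ((unsigned)((w) & 7u))     /* which alternative      */
#define MIX_PTR(w)   ((w) & ~(Word)7)           /* the pointer proper     */
```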
*** (13.5.2) Tools (interface to the allocator).
Several tools should be provided by the allocator to the rest of the system:
- The 'allocate' function receives the number of words to be allocated; it also
receives the address at which the first reference should be created. It returns a
boolean saying whether the segment has been allocated or not. For example, in C:
Bool allocate(U32 *where, U32 number_of_words);
If it returns 'TRUE', this function fills up the two words at address 'where' with a
pointer to the data part of the allocated segment and with the word (n<<2)|1.
- The 'move_reference' macro or function moves the whole reference from one place to
another one:
void move_reference(U32 *source, U32 *destination);
Of course the source should point to a valid reference, and the destination points to 2
words which will be overwritten. This function must update the reference chaining in
either the allocated segment or in the previous reference of the chain.
- The 'duplicate_reference' macro or function creates a duplicate of an existing
reference:
void duplicate_reference(U32 *source, U32 *destination);
It chains the new reference in front of the list of references.
- The 'delete_reference' macro or function. This function must update the chain of
references and possibly deallocate the segment:
void delete_reference(U32 *reference, U32 del_code);
This function receives the address of a piece of deletion code. This deletion code is
specific to the type of data to be deleted. It is used when the reference to be deleted
is the last one. Of course, this is a recursive process as usual. Actually, the set of
all these pieces of code is the garbage-collector code proper.
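As a sketch of how these tools could manipulate the chain, here is a minimal C model.
The structure and function names are mine, ordinary pointers replace the tagged words
(the end-of-chain tag (n<<2)|1 is modelled by NULL for simplicity), and the
type-specific deletion code is left out:

```c
#include <stddef.h>

/* Hypothetical reconstruction of the reference chaining; names are mine. */
typedef struct Reference {
    struct Segment   *segment; /* first word: the allocated segment       */
    struct Reference *next;    /* second word: next reference in chain    */
} Reference;

typedef struct Segment {
    Reference *first_ref;      /* hidden word: head of the reference chain */
    /* the data words follow */
} Segment;

/* Rewire whatever points at 'ref' (the segment itself or the previous
   reference of the chain) to point at 'repl' instead. */
static void rewire(Reference *ref, Reference *repl)
{
    Segment *seg = ref->segment;
    if (seg->first_ref == ref) {
        seg->first_ref = repl;
    } else {
        Reference *p = seg->first_ref;  /* walk the chain to find the     */
        while (p->next != ref)          /* predecessor of 'ref'           */
            p = p->next;
        p->next = repl;
    }
}

/* Move a whole reference from 'source' to 'destination'. */
static void move_reference(Reference *source, Reference *destination)
{
    *destination = *source;             /* copy the two words             */
    rewire(source, destination);
}

/* Duplicate a reference, chaining the new one in front of the list. */
static void duplicate_reference(Reference *source, Reference *destination)
{
    Segment *seg = source->segment;
    destination->segment = seg;
    destination->next = seg->first_ref;
    seg->first_ref = destination;
}

/* Delete a reference; returns 1 when it was the last one, i.e. when the
   segment may be handed back (after the deletion code has run). */
static int delete_reference(Reference *ref)
{
    Segment *seg = ref->segment;
    rewire(ref, ref->next);
    return seg->first_ref == NULL;
}
```

Note the cost of 'move_reference' and 'delete_reference': with a singly linked chain,
finding the predecessor requires walking the chain from the segment.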
*** (13.6) Permanent data.
We want to be able to handle permanent data, i.e. data created at compile time, which
are never collected.
The second least significant bit in the (hidden) pointer to the main segment in any
data segment may be used for detecting permanent data: 0 = non permanent, 1 =
permanent. Of course, if the segment is permanent, it has actually never been
allocated, and lies within some area outside the allocator's main segments. In this
case, the pointer to the main segment will never be used (it is meaningless) and its
value is 0. Hence, that word is 1. The second word of each reference in this case is
2, indicating that the reference points to a permanent segment (this is redundant of
course, but it avoids one indirection and we have plenty of room for that). References
to permanent segments are not chained together. Of course, the hidden pointer to the
chain of references in the permanent segment doesn't exist. However, the second words
in references must exist:
+---------+---------+ +---------+---------+ +---------+---------+
| * | 2| | * | 2| | * | 2|
+----|----+---------+ +----|----+---------+ +----|----+---------+
| | |
-----------------------------------------------------
|
V
+---------+-----------------------------------+
| 1| a permanent segment |
+---------+-----------------------------------+
*** (13.7) Fake references.
We also want to be able to handle fake references. It is sometimes useful to construct
fake references (as we already did for serialising). It is understood that such fake
references are never manipulated by regular programs but only by the garbage-collector.
The value 0 for a pointer (first word of a reference) may be interpreted as a fake
reference. In a fake reference, the second word is not significant (but must be
present):
+---------+---------+
| 0| not used| a fake reference
+---------+---------+
Notice that a fake reference may be indistinguishable from a datum in a small
alternative in a mixed type. But this has no importance since the garbage-collector
will handle both in the same way (i.e. just ignore them).
*** (13.8) Only one main segment ?
It could be possible to design the system with only one main segment. In this case,
instead of adding new main segments, we just ask the host system to enlarge or shrink
the unique main segment. Of course, in this case, allocated segments must be moved
towards the beginning of the main segment. Here, the question will be to determine the
offset to the right of which nothing is allocated. This is easily done by maintaining a
pointer to this offset.
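A minimal sketch of that watermark, with assumed names (asking the host to enlarge the
segment is left out):

```c
#include <stddef.h>

/* The unique main segment keeps a watermark 'top': the offset to the
   right of which nothing is allocated.  Names are assumptions. */
typedef struct {
    unsigned char *base;   /* start of the unique main segment            */
    size_t         size;   /* current size obtained from the host system  */
    size_t         top;    /* nothing is allocated beyond this offset     */
} UniqueSegment;

static void *watermark_allocate(UniqueSegment *m, size_t nbytes)
{
    if (m->top + nbytes > m->size)
        return NULL;             /* would have to ask the host to enlarge */
    void *p = m->base + m->top;
    m->top += nbytes;            /* maintain the watermark                */
    return p;
}
```

Compaction would move allocated segments towards 'base' and lower 'top' accordingly.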
*** (14) Static garbage-collection.
Whether we use the above reference-chaining garbage-collector or the more traditional
reference-counting garbage-collector, it is in all cases worthwhile to minimize the
number of virtual copies and virtual deletions by an optimisation of the code. This is
called ``static garbage-collection''. This may improve performance drastically,
because it will eliminate most unneeded copy/delete pairs.
*** (14.1) The facts.
Consider the function 'length' defined for lists by:
public define Int32
length
(
List($T) l
) =
if l is
{
[ ] then 0,
[h . t] then 1 + length(t)
}.
In the current implementation of Anubis (version 1.8), this function increments and
decrements the counter of each node of the list 3 times. All this is wasted time.
Static garbage-collection is able to eliminate 2 of these 3 increment/decrement pairs.
There is no hope of eliminating the third one. Indeed, imagine for example that the
list given to the function 'length' as its argument is not shared, i.e. that all the
counters in the nodes of the list are 1. Then, the list must have disappeared after the
function 'length' returns. Hence, the function must be able to decrement counters. What
happens in this case is that the counter of the tail is first incremented when a
virtual copy of the tail is taken from the first node. Then the first node is
decremented to zero, so that the second node is decremented to 1, etc... Now, if the
list is shared, for example if the counter of the first node is 2 and the counters of
the others are 1, the counter of the tail is similarly incremented, but when the first
node is abandoned by the function 'length', its counter is just decremented to 1,
etc...
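To make this counting concrete, here is a small C model of a reference-counted list.
All names are mine, and the operation counts model the mechanism rather than reproduce
the exact Anubis instructions: 'length_naive' mirrors the three copy/delete pairs of
the unoptimised code, 'length_opt' keeps only the unavoidable one; both consume their
argument, as Anubis functions do.

```c
#include <stdlib.h>

/* A reference-counted cons list; 'ops' counts every increment or
   decrement of a node counter. */
typedef struct Node {
    int          count;             /* reference counter                  */
    int          head;
    struct Node *tail;              /* NULL = empty list (direct, so no   */
} Node;                             /* counter operations on it)          */

static long ops;

static Node *retain(Node *n) { if (n) { n->count++; ops++; } return n; }

static void release(Node *n)        /* cascades when a counter reaches 0  */
{
    while (n) {
        ops++;
        if (--n->count > 0) return;
        Node *t = n->tail;
        free(n);
        n = t;
    }
}

/* Unoptimised 'length': the argument is copied at entry (clR), the tail
   during unstore (ctU) and again before the recursive call (ctR); every
   copy is deleted afterwards. */
static int length_naive(Node *l)
{
    retain(l);                            /* clR                          */
    if (l == NULL) return 0;
    Node *t = retain(l->tail);            /* ctU                          */
    release(l);                           /* dlR                          */
    retain(t);                            /* ctR (copy for the call)      */
    int n = 1 + length_naive(t);
    release(t);                           /* dtS                          */
    release(l);                           /* dlS                          */
    return n;
}

/* Statically garbage-collected version: one copy of the tail per node,
   deleted by the callee consuming its argument. */
static int length_opt(Node *l)
{
    if (l == NULL) return 0;
    Node *t = retain(l->tail);
    release(l);
    return 1 + length_opt(t);
}

static Node *cons(int h, Node *t)
{
    Node *n = malloc(sizeof *n);
    n->count = 1; n->head = h; n->tail = t;
    return n;
}
```

On an unshared 3-node list, the naive version performs 17 counter operations against 7
for the optimised one; in the naive version each node after the first is incremented 3
times, as stated above.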
Now, we examine the code for the function 'length'. Here is the code produced for
'length' by Anubis 1.8, instantiated for lists of strings.
9 | label 23
| ;--- stack ------------------------
| ; 0 l List(String)
| ; 1 (return address)
| ;----------------------------------
9 | check_stack 7
14 | peek "l" 0
19 | copy_mixed 2
21 | index_direct 2
23 | _switch 24 25
33 | label 25
33 | unstore_copy_mixed 8 2
36 | del_mixed 2 15
| ;--- stack ------------------------
| ; 0 t List(String)
| ; 1 l List(String)
| ; 2 (return address)
| ;----------------------------------
42 | push_addr 27
47 | peek "t" 1
52 | copy_mixed 2
54 | push
| ;--- stack ------------------------
| ; 0 List(String)
| ; 1 (return address)
| ; 2 t List(String)
| ; 3 l List(String)
| ; 4 (return address)
| ;----------------------------------
55 | address 23 ('length' in '/home/alp/essai.anubis' at line 3)
60 | apply 1
62 | ret_point 27
62 | push
63 | load_word32 1
68 | push
| ;--- stack ------------------------
| ; 0 Int32
| ; 1 Int32
| ; 2 t List(String)
| ; 3 l List(String)
| ; 4 (return address)
| ;----------------------------------
69 | int32_plus
72 | collapse 0
77 | collapse 0
82 | del_stack_mixed 0 2 15
92 | del_stack_mixed 0 2 15
102 | ret 1
103 | label 24
| ;--- stack ------------------------
| ; 0 l List(String)
| ; 1 (return address)
| ;----------------------------------
103 | load_word32 0
108 | del_stack_mixed 0 2 15
118 | ret 1
The argument 'l' is first copied at offset 19. In the case of a non-empty list, we
jump to label 25. The tail 't' of the list 'l' is copied at offset 33, and 'l' is
virtually deleted at offset 36. 't' is copied again at offset 52, because it is present
two times in the stack (at offset 55, in slots 0 and 2). At offset 60 the function
calls itself, so that 't' is again copied at offset 19 during the call. Hence, 't' has
been virtually copied 3 times. Of course, it is also virtually deleted 3 times (one
time at offset 82, and two times at offsets 36 and 92 (or 108) during the recursive
call). This is obviously too much and one of the main causes of the slowness of Anubis
1.
*** (14.2) Moving 'copy' and 'delete' instructions.
The first principle of static garbage-collection is that a sequence of instructions
like
copy
del
may be removed, provided they act on the same counter. Indeed, the counter is
incremented, and decremented immediately. So, this has no effect.
Consequently, the question is how to move the copy instructions downwards within the
code, and the delete instructions upwards within the code, so that they disappear when
they meet. Of course, a delete instruction may move up or down to a certain point
where it is blocked. In this case, if the corresponding copy instruction is also
blocked, they will not meet. In other words, static garbage-collection will not
eliminate all copy/delete pairs. Nevertheless, it eliminates most of them, providing an
important enhancement in speed. In the case of the 'length' function, it eliminates 2
of the 3 pairs, as we shall see.
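The cancellation rule can be sketched as a tiny peephole pass; the instruction
encoding below is an assumption, not the Anubis one:

```c
#include <string.h>

/* An adjacent 'copy x' / 'del x' pair acting on the same counter has no
   net effect and may be removed. */
typedef enum { COPY, DEL, OTHER } Op;
typedef struct { Op op; int target; } Instr;

/* Remove every adjacent copy/del pair on the same target; returns the
   new length.  Rescanning after each removal lets pairs that became
   adjacent cancel too. */
static int cancel_pairs(Instr *code, int n)
{
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int i = 0; i + 1 < n; i++) {
            if (code[i].op == COPY && code[i + 1].op == DEL
                && code[i].target == code[i + 1].target) {
                memmove(&code[i], &code[i + 2],
                        (size_t)(n - i - 2) * sizeof(Instr));
                n -= 2;
                changed = 1;
                break;
            }
        }
    }
    return n;
}
```

The moving of copies downwards and deletes upwards, described next, is what brings
such pairs into adjacent positions in the first place.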
The code for a function is not linear. It looks much more like a tree, where nodes
correspond to conditionals. If the conditional has 'n' cases, the node has 'n'
branches. For example, the code for 'length' above, corresponds to the following tree,
where the numbers are the offsets of the instructions:
9 (beginning of function code)
|
|
23 (_switch)
---------------
33 103
| |
branch for non empty lists | | branch for the empty list
| |
| |
102 (ret) 118 (ret)
The reason why the branches do not join together is that optimisation using the end
code (in particular, elimination of terminal recursion) is not the same in all
branches. This is why the end code is duplicated and each branch optimized separately
with it.
Now, we consider the same tree, but with the copy/delete instructions:
9
|
19 (copy 'l' in R)
|
23
----------------------------------
33 (copy 't' during unstore) 103
36 (delete 'l' in R) |
| |
52 (copy 't' in R)
| |
82 (delete 't' in stack)
92 (delete 'l' in stack) 108 (delete 'l' in stack)
| |
102 118
We did not represent the 'collapse' instructions used for deleting Int32s, which
actually never call the garbage-collector nor increment/decrement anything, since Int32
is a direct type.
To simplify our explanations, we redraw the above tree like this:
|
clR
|
----------------
ctU |
dlR |
| |
ctR |
| |
dtS |
dlS dlS
| |
It may seem strange that l is deleted 2 times in the left hand branch and 1 time in the
right hand branch. The first copy instruction clR copies 'l' from the stack to R. The
two instructions dlR and dlS in the left hand branch delete the two copies of 'l' in R
and in the stack. On the contrary, in the right hand branch only the copy of 'l' in the
stack is deleted. This is due to the fact that in the right hand branch the compiler
knows that 'l' in R is in the first alternative of 'List(String)' (empty list). It
also knows that this first alternative is small, hence direct. This is why it does not
generate an instruction like dlR at the beginning of the right hand branch. Notice that
the instruction clR tests the alternative of 'l' and uses a mask for determining if
there is a counter or not.
*** (14.3) Moving 'copy' instructions downwards.
We examine how to move copy instructions downwards. In our example, the clR instruction
is concerned. We have the following:
19 | copy_mixed 2
21 | index_direct 2
23 | _switch 24 25
Is it true that 'copy_mixed' commutes with 'index_direct'? We may have a doubt.
Indeed, imagine that we transform the code into:
21 | index_direct 2
19 | copy_mixed 2
23 | _switch 24 25
(where, of course, offsets become meaningless, but we keep them because they allow us
to identify where instructions are coming from), and that the virtual machine gives up
precisely after 'index_direct'. Another virtual machine may have a reference to our
'l', and virtually delete this reference, so that the counter of 'l' is decremented
before having been incremented by our current virtual machine. Actually,
incrementations and decrementations may be permuted provided that the counter never
becomes zero. If a virtual machine executes a 'copy' instruction, this is because it
already has a reference to the datum. Hence, other machines cannot decrement the
counter to zero, because this would mean that this first reference of our machine would
have been deleted by another machine, which is impossible. Summarizing, if a machine
has access to a datum, even just through one path of pointers, the counter of this
datum cannot be decremented to zero by other machines.
So, this is ok. Notice that 'copy' instructions commute with most instructions, but of
course not all instructions.
Now, our 'copy_mixed' is just before the _switch:
21 | index_direct 2
19 | copy_mixed 2
23 | _switch 24 25
33 | label 25
33 | unstore_copy_mixed 8 2
36 | del_mixed 2 15
........
103 | label 24
103 | load_word32 0
108 | del_stack_mixed 0 2 15
........
Any instruction commutes with the _switch provided it is duplicated in all
branches. Hence, we will have this:
21 | index_direct 2
23 | _switch 24 25
33 | label 25
19 | copy_mixed 2
33 | unstore_copy_mixed 8 2
36 | del_mixed 2 15
........
103 | label 24
19 | copy_mixed 2
103 | load_word32 0
108 | del_stack_mixed 0 2 15
........
In other words, whichever branch we take, we still have to count a virtual copy for
'l'.
Now, 'copy_mixed' commutes with 'unstore_copy_mixed' for the reason that
incrementations of any two counters commute, even if they increment the same counter.
Hence, in the left hand branch, we have:
33 | label 25
33 | unstore_copy_mixed 8 2
19 | copy_mixed 2
36 | del_mixed 2 15
........
Now, we are at the point where 'copy_mixed' meets 'del_mixed', and both instructions
may disappear:
33 | label 25
33 | unstore_copy_mixed 8 2
........
In the right hand branch we have:
103 | label 24
19 | copy_mixed 2
103 | load_word32 0
108 | del_stack_mixed 0 2 15
118 | ret 1
which may clearly be transformed into:
103 | label 24
103 | load_word32 0
19 | copy_mixed 2
108 | del_stack_mixed 0 2 15
118 | ret 1
However, 'copy_mixed' cannot be put after 'del_stack_mixed', and more generally a
'copy' instruction cannot pass behind a 'del' instruction, even if they concern
distinct counters (and data of distinct types). Indeed, imagine that the delete
instruction concerns a segment containing a reference to the segment concerned by the
copy instruction:
+------+-------------------------+-----+
| 1 | segment to be deleted | * |
+------+-------------------------+--|--+
|
V
+------+-------------------------+
| 1 | segment to be copied |
+------+-------------------------+
If the second segment is first copied, its counter becomes 2, and is decremented to 1
by the real deletion of the first segment. On the contrary, if the first segment is
deleted first, the two segments will return to the allocator, and our reference to the
second segment will be dangling.
At that point, 'copy_mixed' seems to be meaningless, because there is no corresponding
delete instruction and no function call. However, this is just because the compiler did
not generate the dlR instruction in the right hand branch, as we have already noted. In
order to perform these optimisations, the compiler should generate these instructions.
In that case, the code would have been:
103 | label 24
19 | copy_mixed 2
| del_mixed 2 15
103 | load_word32 0
108 | del_stack_mixed 0 2 15
118 | ret 1
(even if 'del_mixed 2 15' has no effect, since at this point 'l' is the empty list),
and the instructions 'copy_mixed' and 'del_mixed' disappear.
After this 'moving downwards' of our clR instruction, our whole code becomes (where we
have removed the stack representations which may have become erroneous):
9 | label 23
9 | check_stack 7
14 | peek "l" 0
21 | index_direct 2
23 | _switch 24 25
33 | label 25
33 | unstore_copy_mixed 8 2
42 | push_addr 27
47 | peek "t" 1
52 | copy_mixed 2
54 | push
55 | address 23 ('length' in '/home/alp/essai.anubis' at line 3)
60 | apply 1
62 | ret_point 27
62 | push
63 | load_word32 1
68 | push
69 | int32_plus
72 | collapse 0
77 | collapse 0
82 | del_stack_mixed 0 2 15
92 | del_stack_mixed 0 2 15
102 | ret 1
103 | label 24
103 | load_word32 0
108 | del_stack_mixed 0 2 15
118 | ret 1
and each node of the list is incremented/decremented only 2 times. Now, we will see how
to eliminate another pair of copy/delete instructions.
First of all, we consider the 'copy_mixed' at offset 52. It is followed by a 'push'
instruction:
52 | copy_mixed 2
54 | push
The effect of 'push' is to move the content of R onto the stack, so that R becomes
empty. Hence, the counter to be incremented is not reached the same way, and the
permutation of the two instructions gives:
54 | push
52 | copy_stack_mixed 0 2
where '0' means slot 0 in the stack. Recall that the operand '2' in this instruction
is a bit mask (binary 10 in this example) indicating that the first alternative of
'List(String)' is direct (no pointer) and the second one indirect (through a pointer).
At that point, we have this:
54 | push
52 | copy_stack_mixed 0 2
55 | address 23 ('length' in '/home/alp/essai.anubis' at line 3)
60 | apply 1
62 | ret_point 27
The instruction 'copy_stack_mixed' commutes with 'address', since 'address' only puts
an address into R. Notice that 'copy_mixed' would not have commuted with 'address', but
this case will never happen since it would mean that 'address' is used with a non-empty
R, which is impossible. Hence, we have:
54 | push
55 | address 23 ('length' in '/home/alp/essai.anubis' at line 3)
52 | copy_stack_mixed 0 2
60 | apply 1
62 | ret_point 27
Now, our 'copy_stack_mixed 0 2' copies a datum which is transmitted as an argument to
the function called by 'apply'. The called function will virtually delete this
datum. Hence, the corresponding delete instruction is not in our code, but in the code
of the called function (even if it is the same function). Hence, we are blocked here.
Now, we consider the case of the instruction ctU, i.e. 'unstore_copy_mixed 8 2' at
offset 33, whose corresponding delete is 'del_stack_mixed 0 2 15' at offset 82. We
have the following between these two instructions (after our previous transformations):
33 | unstore_copy_mixed 8 2
42 | push_addr 27
47 | peek "t" 1
54 | push
55 | address 23 ('length' in '/home/alp/essai.anubis' at line 3)
52 | copy_stack_mixed 0 2
60 | apply 1
62 | ret_point 27
62 | push
63 | load_word32 1
68 | push
69 | int32_plus
72 | collapse 0
77 | collapse 0
82 | del_stack_mixed 0 2 15
First of all, 'unstore_copy_mixed 8 2' should be replaced by:
33 | unstore 8 4
33 | copy_stack_mixed 0 2
which is equivalent. So, we have this:
33 | copy_stack_mixed 0 2
42 | push_addr 27
47 | peek "t" 1
54 | push
.........
'copy_stack_mixed' commutes with 'push_addr' like this:
42 | push_addr 27
33 | copy_stack_mixed 1 2
47 | peek "t" 1
54 | push
.........
This is because the datum is now in slot 1 of the stack. It commutes with 'peek'
without modification:
42 | push_addr 27
47 | peek "t" 1
33 | copy_stack_mixed 1 2
54 | push
.........
Again the commutation with 'push' modifies the instruction, and the commutation with
'address' poses no problem:
.........
47 | peek "t" 1
52 | copy_mixed 2
54 | push
55 | address 23 ('length' in '/home/alp/essai.anubis' at line 3)
33 | copy_stack_mixed 2 2
52 | copy_stack_mixed 0 2
60 | apply 1
.........
The commutation between the two 'copy_stack_mixed' is also possible:
55 | address 23 ('length' in '/home/alp/essai.anubis' at line 3)
52 | copy_stack_mixed 0 2
33 | copy_stack_mixed 2 2
60 | apply 1
Now, we are just before 'apply 1'. Here, there is no problem for commuting, because
slot 2 in the stack is below the return address in slot 1. The position of this return
address is also the operand of 'apply', so that the decision to permute is easy to
make. However, since the called function deletes its arguments and pops its return
address from the stack, the depth operand of the instruction must be modified (it
becomes 0 here):
.........
55 | address 23 ('length' in '/home/alp/essai.anubis' at line 3)
52 | copy_stack_mixed 0 2
60 | apply 1
62 | ret_point 27
33 | copy_stack_mixed 0 2
62 | push
63 | load_word32 1
68 | push
69 | int32_plus
.........
Of course, we pass 'apply' and 'ret_point' together, because the return of the call is
just at the 'ret_point'. The subsequent 'push', 'load_word32' and 'push' may commute
like this:
.........
55 | address 23 ('length' in '/home/alp/essai.anubis' at line 3)
52 | copy_stack_mixed 0 2
60 | apply 1
62 | ret_point 27
62 | push
63 | load_word32 1
68 | push
33 | copy_stack_mixed 2 2
69 | int32_plus
.........
The instruction 'int32_plus' (a syscall) does not modify the stack. This is why its
two arguments are popped from the stack by the next two 'collapse' instructions, which
are
also easily permuted with our 'copy_stack_mixed'. Despite the fact that it is a delete
instruction, 'collapse' commutes with a copy instruction since it is just a stack
adjustment (no counter is decremented by 'collapse'). Now, we have:
.........
55 | address 23 ('length' in '/home/alp/essai.anubis' at line 3)
52 | copy_stack_mixed 0 2
60 | apply 1
62 | ret_point 27
62 | push
63 | load_word32 1
68 | push
69 | int32_plus
72 | collapse 0
77 | collapse 0
33 | copy_stack_mixed 0 2
82 | del_stack_mixed 0 2 15
.........
At that point, our copy instruction has met its corresponding delete instruction. That
it is actually the corresponding instruction is visible from the fact that they have
the same depth operand (0 in this case). Hence they disappear together; however, not
exactly, because they produce a 'collapse 0' for stack synchronization.
Finally, our code for the function 'length' is optimized as follows:
9 | label 23
9 | check_stack 7
14 | peek "l" 0
21 | index_direct 2
23 | _switch 24 25
33 | label 25
33 | unstore 8 4
42 | push_addr 27
47 | peek "t" 1
54 | push
55 | address 23 ('length' in '/home/alp/essai.anubis' at line 3)
52 | copy_stack_mixed 0 2
60 | apply 1
62 | ret_point 27
62 | push
63 | load_word32 1
68 | push
69 | int32_plus
72 | collapse 0
77 | collapse 0
| collapse 0
92 | del_stack_mixed 0 2 15
102 | ret 1
103 | label 24
103 | load_word32 0
108 | del_stack_mixed 0 2 15
118 | ret 1
and we still have one increment/decrement operation for each node of our list. The
increment is performed by 'copy_stack_mixed 0 2' at (pseudo-)offset 52, and the
corresponding decrement is performed by the called function either at offset 92 or at
offset 108.
*** (14.4) Moving 'delete' instructions upwards.
If a copy instruction cannot move downwards, we may try to move the corresponding
'delete' instruction upwards. In the case of our example, already optimized as shown
above, we know that we cannot eliminate the last pair of copy/delete instructions. If
we try to move the 'del_stack_mixed' at offset 92 upwards, it will be blocked just
below the 'copy_stack_mixed' at (pseudo-)offset 52, as already noticed. Hence, the
other 'del_stack_mixed', at offset 108, will not be able to pass the switch, because
the same instruction is not coming from the other branch.
However, what would happen in another situation? Probably, moving delete instructions
upwards is of no interest, because if a delete instruction can move upwards to meet a
copy instruction, the copy instruction can probably just as well move downwards to meet
the delete instruction. I did not check that formally, because there are too many
instructions in the Anubis 1 virtual machine.
*** (14.5) More peephole optimizations.
What we have done in the above subsections is a special case of 'peephole
optimization'. We may examine other possible similar optimizations.
*** (15) Voluntary multitasking.
In Anubis 1.8, multitasking is realized within one thread of the host (UNIX, Windows,
...). In other words, Anubis has its own multitasking system. This system works as
follows. The scheduler starts a virtual machine with a ``credit'' of instructions to
execute (for example 500). At each instruction executed, the system decrements the
credit. When the credit becomes 0, the system returns to the scheduler.
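A minimal model of this credit scheme, with assumed names (the 'instruction' executed
here is a toy):

```c
/* The scheduler hands each virtual machine a credit of instructions; the
   interpreter decrements it at every instruction and returns to the
   scheduler when it reaches 0. */
typedef struct {
    long pc;          /* instructions executed so far (toy state)         */
    long remaining;   /* work left before the machine finishes            */
} Machine;

/* Run one time slice; returns 1 if the machine still has work to do. */
static int run_slice(Machine *m, int credit)
{
    while (m->remaining > 0) {
        m->pc++;                      /* 'execute' one instruction        */
        m->remaining--;
        if (--credit == 0)            /* credit exhausted:                */
            return m->remaining > 0;  /* back to the scheduler            */
    }
    return 0;                         /* machine finished early           */
}
```

The per-instruction decrement and test in this loop is exactly the overhead the rest
of this section tries to remove.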
However, a virtual machine may give up (return to the scheduler) voluntarily. Some
instructions do that: of course the instruction 'give_up', but also the instruction
which tries to read a byte from a connection. If it cannot read the byte immediately,
it gives up (i.e. it waits).
This is not very good because of the overhead created by the decrement and test of the
credit at each instruction. A better solution would be to have an instruction
'may_give_up', for example
at the beginning of all functions, in any case at places where it is sure that such
instructions are executed often. This instruction may either give up directly, or
decrement a credit, and give up only if the credit is 0. I will call this ``voluntary
multitasking''.
Since function codes contain no loop (as we saw already they are much like trees, with
the entrance at the root and an exit at each leaf), the time needed for going from the
root to a leaf is always very small. Hence, putting the instruction 'may_give_up' at
the beginning of functions only is enough. This would also have the advantage that we
could rely on the assumption that the virtual machine will never be interrupted between
two function calls. This may be useful (maybe).
We may still enhance this solution, especially if we generate native code for the
physical machine. We may use a hardware interrupt (a timer). This interrupt would set a
flag meaning ``it's time to give up''. The 'may_give_up' instruction would just test
this flag, without having to decrement a credit counter.
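A sketch of this timer-flag variant, with assumed names; a real system would arm a
periodic timer (e.g. with setitimer on POSIX), while the test below simply raises the
signal by hand:

```c
#include <signal.h>

/* The timer interrupt only sets a flag; 'may_give_up' tests it, so there
   is no per-instruction credit counting at all. */
static volatile sig_atomic_t time_to_give_up = 0;

static void on_timer(int sig)
{
    (void)sig;
    time_to_give_up = 1;      /* ``it's time to give up'' */
}

/* Executed at the beginning of every function: a cheap test of the flag. */
static int may_give_up(void)
{
    if (!time_to_give_up)
        return 0;
    time_to_give_up = 0;      /* consume the flag                 */
    return 1;                 /* caller returns to the scheduler  */
}
```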