description.text 140 KB

   
   
   
                             Description of the Anubis Compiler
   
   
   
   This file contains a description of the Anubis compiler, together with a discussion of
   how the next version of the compiler should be designed. 

   
   -------------------------------- Table of Contents ------------------------------------
   
   *** (1) Overall description. 
      *** (1.1) Modules of the compiler. 
      *** (1.2) How it works. 
         *** (1.2.1) Phase 1. 
         *** (1.2.2) Phase 2. 
      *** (1.3) Garbage collection. 
   
   *** (2) Tools. 
      *** (2.1) Lisp-like representation. 
      *** (2.2) Exceptions to the Lisp-like 'Expr' representation.
      *** (2.3) Catalog of all internal data representations. 
         *** (2.3.1) Internal representation of types. 
         *** (2.3.2) Internal representation of terms. 
         *** (2.3.3) Internal representation of interpretations. 
         *** (2.3.4) Internal representation of type implementations.
         *** (2.3.5) Internal representation of virtual machine instructions. 
            *** (2.3.5.1) Pseudo instructions. 
            *** (2.3.5.2) Computing instructions. 
            *** (2.3.5.3) Garbage collector instructions. 
            *** (2.3.5.4) Closure instructions.    
            *** (2.3.5.5) Equality instructions.
            *** (2.3.5.6) Dynamic variables instructions. 
            *** (2.3.5.7) Serialising/unserialising.
            *** (2.3.5.8) Starting a new virtual machine. 
            *** (2.3.5.9) Obsolete and soon obsolete instructions. 
      *** (2.4) Unification. 
      *** (2.5) Organization of the knowledge base. 
      *** (2.6) Indexation of symbols. 
   
   *** (3) The lexer. 
      *** (3.1) Generalities. 
      *** (3.2) Finding source files. 
      *** (3.3) The 'read' stack. 
      *** (3.4) Remembering already read files. 
      *** (3.5) Computing visibilities. 
   
   *** (4) The parser. 
   
   *** (5) Computing interpretations of terms. 
      *** (5.1) Overview. 
      *** (5.2) Interpreting applicative terms. 
      *** (5.3) Interpreting conditionals. 
      *** (5.4) Interpreting selective conditionals. 
   
   *** (6) Implementation of types. 
      *** (6.1) Implementation of primitive types. 
         *** (6.1.1) 'String'.
         *** (6.1.2) 'ByteArray'. 
         *** (6.1.3) 'Int32'. 
         *** (6.1.4) 'Float'. 
         *** (6.1.5) 'Listener'. 
      *** (6.2) Implementation of defined types. 
      *** (6.3) Implementation of functional types. 
      *** (6.4) Implementation of address types. 
         *** (6.4.1) File and network address types. 
         *** (6.4.2) Dynamic variable address types. 
      *** (6.5) Implementation of C structure pointer types. 
   
   *** (7) Code generation. 
      *** (7.1) General conventions. 
      *** (7.2) Elimination of terminal recursion. 
      *** (7.3) Generating the code. 
         *** (7.3.1) Protect. 
         *** (7.3.2) Lock. 
         *** (7.3.3) Global symbols. 
         *** (7.3.4) Strings, Int32, small data and Floats. 
         *** (7.3.5) Local symbols. 
         *** (7.3.6) Microlocal symbols.
         *** (7.3.7) Closures.
         *** (7.3.8) Applicative terms. 
         *** (7.3.9) Conditionals.
         *** (7.3.10) Selective conditionals. 
         *** (7.3.11) Local definitions ('with ...'). 
         *** (7.3.12) Delegate. 
         *** (7.3.13) Serialize/Unserialize. 
      *** (7.4) Peephole optimisation. 
   
   *** (8) Garbage collector code. 
   
   *** (9) Equality code. 
   
   *** (10) Serialisation/unserialisation. 
      *** (10.1) How it works. 
      *** (10.2) Descriptions of small types. 
      *** (10.3) Pseudo-instructions set. 
      *** (10.4) Serialising. 
      *** (10.5) Unserialising. 
   
   *** (11) Outputting the code (making an *.adm file). 
   
   *** (12) How to make a system of macros? 
   
   ---------------------------------------------------------------------------------------
   
   
   
   
   *** (1) Overall description. 
   
   
      *** (1.1) Modules of the compiler. 
   
   The Anubis compiler is made of the following main parts: 
   
     - the lexer (lexer.y)
     - the parser (grammar.y)
     - a module for analysing type definitions (typedef.c)
     - a module for analysing data definitions (opdef.c)
  - the interpretations module (this is not an interpreter) (interp.c)
     - the compiler proper (generation/optimization of symbolic code) (compile.c)
     - the module for computing the implementations of data types (implem.c)
     - the output module (making the '*.adm' files) (output.cpp)
   
   It also contains secondary modules. Some of them are always used:
   
     - a module for deciding if a type is recursive (rectype.c)
     - tools for working on data types (unify.c, typetools.c, typewidth.c, typecmp.c, unknowns.c)
     - computing the implicit destructors (destruct.c)
     - computing the deletion code (delcode.c)
     - computing the equality code (eqcode.c)
     - a module containing various tools for manipulating expressions (expr.cpp)
     - a system for managing the memory of the compiler itself (mallocz.c)
     - a set of messages and functions for printing messages (msgtexts.c, show.c)
     - a set of predefined paragraphs (predef.c, predef.anubis, predefined.anubis)
     - tools for computing the size and binary format of instructions (vminstr.c)
   
   Some others are used only on demand (options of the compiler):
   
     - making an HTML index of all public symbols (index.c)
  - outputting a C version of some Anubis types together with the constructors (dumpct.c)
     - making a 'polish' translation (polish.c)
     - producing a symbolic code dump (*.sc) (symcode.c)
   


   
      *** (1.2) How it works. 
   
   The compiler works in two phases:
   
     - phase 1: reading source files, checking paragraphs, computing the interpretations. 
     - phase 2: generating a module for each 'global' paragraph. 
   
         *** (1.2.1) Phase 1. 
   
   During  phase 1,  the  compiler reads  source files  and  does the  following for  each
   paragraph:
   
      1. reading the paragraph (using the parser and the lexer).  This may produce a
         syntax error.
     
     2. depending on the kind of paragraph:
   
         - 'read' paragraph: checking whether the file referred to has already been read.
         The file is read only if it has not already been read; in either case, visibility
         is set up correctly. See the section on the lexer below.
   
        - 'replaced  by' paragraph: this  is the  same as  'read' paragraphs,  except that
        visibility is computed  differently. See the section on the lexer below.

        - 'type' paragraph: the  type definition is checked.  See  the section 'Checking a
        type definition' below.
   
        - 'define' paragraph: the definition is checked. See the section 'Checking a datum
        definition' below. 
   
         - 'C constructor' paragraph: an Anubis type is translated into a program usable
         from within a C program.  (The output, for all these paragraphs, is written to
         the files 'construc.h.out' and 'construc.c.out'.)
   
   Only one error is considered per paragraph. If an error is found, the analysis restarts
   at the next paragraph. 
   
   
   
         *** (1.2.2) Phase 2. 
   
   Phase 2  is performed only  if there is  no error after phase  1. Phase 2  performs the
   following tasks:
   
      - check whether all type definitions are complete. Since a type may be defined using
      several paragraphs, it is possible to define incomplete types (types for which not
      all alternatives have been defined).
   
     - determine which types are infinite (recursive). 
   
      - dump C versions of types (if required). 
   
     - make the modules (*.adm). 
   
     - dump implementations of types (if required). 
   
     - print statistics (if required). 
   
     - print possible errors after phase 2. 
   
   
   
   
   
   
      *** (1.3) Garbage collection. 
   
   Data are implemented in two different ways:
   
     - directly
     - indirectly
   
   'Directly' means that what we have in  hand is the datum itself, and 'indirectly' means
   that what we have in hand is a pointer to a segment of memory containing the datum.

   Data may be 'created', 'duplicated' and 'destroyed'.  When we duplicate (or 'copy') a
   direct datum, we have no choice: we just copy all the bits of the datum.  When we
   duplicate an indirect datum, we have the choice between:
   
     - allocating a new segment of memory, and copying the content of the original segment
       into the new segment (this is called 'real' copy), 
     - just duplicating the pointer (this is called 'virtual' copy). 
   
   Notice that these  methods are semantically equivalent for  abstract data. For concrete
   data (for example: dynamic variables, connections, etc...), indirect implementation and
   virtual copy are required.
   
   Of course, virtual copy is faster, but it creates 'shared data'.  When a (necessarily
   indirect) datum is shared, several pointers point to the same segment of memory.  The
   problem is then to know when the segment is no longer used, so that we can return it
   to the pool of memory.  This problem is easily solved by 'reference counting', which
   works as follows.
   
   The first 32-bit word of any allocated segment is reserved for reference counting.  It
   contains an unsigned integer, called the 'counter' (for this segment).  This counter
   is either:
   
      - zero, in which case the segment is 'permanent' and will never be freed, 
      - non-zero, in which case its value is the number of pointers pointing to this
        segment. 
   
   Permanent segments are in general (always?) created at compile time.  For example, if
   you write a character string in your source text, the compiler creates a permanent
   segment holding this string (this segment is within the program code itself).  Of
   course, the counter of a permanent datum is never changed and always remains 0. 
   
   Non-permanent data are created dynamically at run time.  The function
   'AllocateDataSegment' (see the virtual machine source code) receives the number of
   32-bit words to allocate, and returns a pointer to the segment (actually cast to
   'U32').  The first 32-bit word of the segment (the counter) is already initialized
   to 1.
   
   When we need to make a virtual copy of a (non-permanent) indirect datum, we just
   increment the counter of the segment (so that it always represents the number of
   (significant) pointers to the segment).  Notice that this can never overflow, because
   an overflow would require more pointers than the total memory can hold. 
   
   When we want to delete a (non-permanent) indirect datum, we perform a 'virtual
   deletion': we decrement the counter, and if the counter becomes 0, the segment may be
   freed.
   
   But freeing the segment is not quite that simple, because the segment itself may
   contain pointers which are counted within other segments.  Freeing the segment means
   destroying these pointers, so we must also decrement the corresponding counters.  This
   process is recursive, because destroying a pointer may decrement a counter to zero,
   which frees another segment, and so on.
   
   Hence, the process of freeing the segments must be coded as a (recursive) program. This
   is what we call  'garbage collector code'.  This code is generated  by the compiler and
   depends only  on the  definition of  types.  Actually, there  is one  garbage collector
   function  for each  defined type  which  may contain  indirect data  (i.e. 'mixed'  and
   'large'  types). Some  other  types  (like functional  types)  require special  garbage
   collector functions.
   
     
   
   
   
   
   *** (2) Tools. 
   
   
      *** (2.1) Lisp-like representation. 
   
   The Anubis compiler uses a particular style of programming inspired by Lisp.  The
   advantage is that this system is very flexible.  The disadvantage is that, not being
   typed, it is not very safe. I think I will never do that again. It works as follows.
   
   The  type  'Expr'  represents all  symbolic  'expressions'  to  be manipulated  by  the
   compiler. This includes  tokens generated by the lexer,  syntactical trees generated by
   the  parser,  interpretations  generated   by  the  interpretations  module,  contexts,
   environments, symbolic code, ... 
   
   Actually, 'Expr' is just a clone of U32, the type of unsigned 32-bit integers.  These
   integers are 'dynamically' typed (using their low bits) as follows:
   
   0 mod 2 ---------------------------------> (signed) integers
   1 mod 2 --->  1 mod 32 ------------------> tags
                 3 mod 32 ------------------> temporary pairs
                 5 mod 32 ------------------> strings
                 7 mod 32 --->  7 mod 64 ---> user defined type variables
                               39 mod 64 ---> internal type variables (unknowns)
                 9 mod 32 ------------------> permanent pairs
   
   In other words:
   
      xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxx0       integer
      xxxxxxxx xxxxxxxx xxxxxxxx xxx00001       tags
      xxxxxxxx xxxxxxxx xxxxxxxx xxx00011       temporary pairs
      xxxxxxxx xxxxxxxx xxxxxxxx xxx00101       strings
      xxxxxxxx xxxxxxxx xxxxxxxx xx000111       user defined type variables (like $T)
      xxxxxxxx xxxxxxxx xxxxxxxx xx010111       unknowns
      xxxxxxxx xxxxxxxx xxxxxxxx xxx01001       permanent pairs
   
   Notice, for example, that pairs (both temporary and permanent) have their 5 lowest
   bits reserved for the tag, leaving 27 significant bits.  Hence, the maximal number of
   temporary pairs is 2^27, and the same holds for permanent pairs.  This makes at most
   134217728 pairs of each sort.  For temporary pairs this is not a problem, but for
   permanent pairs it may be.  Nevertheless, since a pair occupies 8 bytes, an array of
   134217728 permanent pairs would occupy 2^30 bytes, i.e. 1 gigabyte.  So, on a machine
   with less than 1 gigabyte of memory, no problem will happen (except that very big
   programs may not be compiled). 
   
   A good program should be protected against such problems independently of the size of
   available memory.  Examining the code, I see that this is not the case for Anubis 1.
   Indeed, the function 'pcons' (defined in 'expr.cpp'), which is used for making a new
   permanent pair, has no protection against the array of permanent pairs growing beyond
   the limit.  This should be fixed as soon as possible.  (Similarly, 'cons' for
   temporary pairs is not well protected.)
   
   
   All  'Expr's except temporary  and permanent  pairs are  called 'atoms'.  Pairs, either
   temporary or permanent, are just made  of two 'Expr's, and represented (in the comments
   of the source of the compiler), as follows:
   
                                              (a . b)
   
   where 'a'  and 'b' are  two 'Expr's  respectively called the  'head' and 'tail'  of the
   pair. The macro 'cons' enables the construction of such pairs. 
   
   As in  Lisp, lists are represented  using pairs and the  tag 'nil' (the  empty list) as
   follows (for a list made of 4 'Expr's):
   
                                             (a b c d)
   
   which is actually a shortcut for:
   
                                     (a . (b . (c . (d . nil))))
   
   Such lists may be easily constructed  using the macros 'list1', 'list2', ... defined in
   'compil.h'. 
   
    Now, an 'Expr' like (a . (b . (c . d))) is written '(a b c . d)', and called a
    'multiple cons'. A set of macros, called 'mcons3', 'mcons4', ..., enables the direct
    construction of such data.
   
   Examples of use and meaning of these macros (and some others):
   
     The C term                         represents
     ---------------------------------------------------------------
     nil                                ( )
     cons(a,b)                          (a . b)
     mcons3(a,b,c)                      (a b . c)
     mcons4(a,b,c,d)                    (a b c . d)
     etc...
     list1(a)                           (a)          same as (a . nil)
     list2(a,b)                         (a b)        same as (a b . nil)
     etc...
   
     If x = (a b c d e):
     car(x)                             a
     second(x)                          b
     third(x)                           c
     etc...
   
     cdr(x)                             (b c d e)
     cdr2(x)                            (c d e)
     cdr3(x)                            (d e)
     etc...
   
      Notice that 'mcons2' does not exist because it is the same as 'cons'.
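To make the expansion of these macros concrete, here is a toy model in C of 'cons', 'nil' and the derived macros. Everything below is a hypothetical illustration; the real pairs live in the compiler's pair arrays:

```c
#include <assert.h>

/* Toy cons cells; in the compiler a pair is two 'Expr's in an array. */
typedef struct Cell { struct Cell *head, *tail; } Cell;

static Cell pool[64];
static int  used;
static Cell nil_cell;
#define nil (&nil_cell)

static Cell *cons(Cell *a, Cell *b)
{
    Cell *c = &pool[used++];
    c->head = a;
    c->tail = b;
    return c;
}

/* 'mcons2' would be the same as 'cons', hence it does not exist. */
#define mcons3(a, b, c)    cons((a), cons((b), (c)))
#define mcons4(a, b, c, d) cons((a), mcons3((b), (c), (d)))
#define list1(a)           cons((a), nil)
#define list2(a, b)        cons((a), list1(b))

#define car(x)     ((x)->head)
#define cdr(x)     ((x)->tail)
#define second(x)  car(cdr(x))
```

With these definitions, 'list2(a,b)' expands to 'cons(a, cons(b, nil))', i.e. the representation of (a b).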
   
    'consp(x)' is 1 if and only if x is a pair (i.e. not an atom); otherwise, it is 0. This
   predicate ('consp'  is a  traditional Lisp abbreviation  for 'cons predicate')  is very
   often used as follows:
   
      while(consp(x))
        {
           Expr head = car(x);
   
           ... do something with the head of x ...
   
           x = cdr(x);    // continue with the tail of x
        }
        // in most cases, when we arrive here, we have x == nil (the empty list). 
   
   
   
    Normally, pairs are created as temporary, and transformed into permanent ones only if
    needed. The point is that, while working on a paragraph, the compiler constructs data
    which will not be part of the final result. When the analysis of a paragraph is
    finished, only the data to be kept are made permanent (using the function 'save'), and
    the array of temporary pairs is reset to zero for the next paragraph. This is how
    memory is managed in the compiler (it has nothing to do with garbage collection in a
    virtual machine).
   
    Integers are coded on 31 bits (bits 1 to 31); bit number 0, whose value is 0,
    indicates that the 'Expr' is an integer. The file 'compil.h' contains all the macros
   needed to test the type of an 'Expr', or to construct 'Expr's.
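A minimal sketch of the packing and unpacking of such 31-bit integers. The macro names are hypothetical; multiplication and division are used instead of shifts so that negative values behave portably:

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t Expr;   /* hypothetical stand-in for the compiler's 'Expr' */

/* Hypothetical pack/unpack macros for 31-bit Lisp-like integers: the
   value occupies bits 1 to 31 and bit 0 is 0.  The real macros in
   'compil.h' may differ. */
#define make_integer(n)   ((Expr)((n) * 2))   /* shift the value left by one bit */
#define integer_value(e)  ((e) / 2)           /* recover the signed value */
#define is_integer(e)     (((e) & 1) == 0)    /* bit 0 is 0 for integers */
```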
   
   Tags  are used  for representing  simple atomic  concepts. For  example, the  tag 'nil'
   represents  the empty  list. They  are also  used for  'tagging' non  atomic  data. For
   example, when the function
   
                                    (Int32 i, String s) |-> a
   
   is read from a source file, the parser constructs the following 'Expr':
   
                      (lambda <lc> (([Int32] . i) ([String] . s)) . [a])
   
    where '[Int32]' is a representation of the type 'Int32' (similarly for 'String'), and
    '[a]' is a representation of the term 'a'.
   
   Here, the tag 'lambda' represents the arrow '|->'. 
   
   
   
   
   
      *** (2.2) Exceptions to the Lisp-like 'Expr' representation.
   
    Actually, this system has exceptions. Normally, all tokens returned by the lexer have
    a value of type 'Expr', i.e. a datum containing its own (dynamic) type. Examining such
    a value, it is possible to know whether we have an integer, a string, a pair, etc.
   
   However, the two tokens 'yy__integer' and 'yy__char' have a value of type 'integer'.
   
    This means that when the lexer reads a number (regular expression: [0-9]+), it returns
    the token 'yy__integer', with a value to be interpreted as a C integer, not as a
    Lisp-like integer. We must be careful because, for example, the integer 6 is
    represented as follows in the two systems:
   
          ...0110           (C integer: this is not an 'Expr')
         ...01100           (Lisp-like integer: this is an 'Expr')
                ^
                |
                +--------- this bit (0) means that this 'Expr' is an integer
   
    The reason for these exceptions is that we want the compiler to be able to read 32-bit
    integers from source files, not only 31-bit integers.
   
   The parser contains the following grammar rule:
   
      Term:    yy__integer    %prec prec_symbol  { $$ = mcons3(integer,linecol(),$1); }
   
   This means that when  an integer has been read by the  lexer, the parser constructs the
   following multiple cons as a representation of this integer:
   
                       (integer <lc> . <n>)
   
   where <n> is the value of the integer as a C integer, not as a Lisp-like integer. 
   
    As a consequence, we must always remember that a list whose head is the tag 'integer'
    always has the form (integer <lc> . <n>), where <n> is not an 'Expr'. It would be
    meaningless to apply a predicate like 'is_integer' to <n>, and the result would be
    unpredictable.
   
   The exceptions are the following (where only <Cint> is not an 'Expr'):
   
      (integer <lc> . <Cint>)
      (small_datum <type> . <Cint>)
      (anb_int32 <lc> . <Cint>)
      (load_int32 . <Cint>)
   
   (I hope there is no other exception, but I'm not absolutely sure). 

    Of course, the existence of these exceptions is very bad. A better solution would be
    to represent (say) the integer 746542 as the list (7 4 6 5 4 2), where the elements of
    the list would be Lisp-like integers. In the compilation step, the compiler could
    transform this representation into a packed integer of 32, 64 bits, ..., possibly with
    a warning in case of overflow. Actually, this already happens at a smaller scale: the
    current version of the compiler finds 3 interpretations for the integer 180 (of types
    'Int8', 'Int32' and 'Float'), but only 2 for 4567, because this number is greater than
    255.
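The proposed digit-list scheme can be sketched as follows; 'pack_digits' is a hypothetical helper, not an existing compiler function:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Pack a digit list such as (7 4 6 5 4 2) into a 32-bit integer,
   with a warning in case of overflow.  Returns 1 on success, 0 on
   overflow.  This is a sketch of the suggested scheme, not code from
   the compiler. */
static int pack_digits(const int *digits, int n, int32_t *out)
{
    int64_t acc = 0;
    int i;
    for (i = 0; i < n; i++) {
        acc = acc * 10 + digits[i];
        if (acc > INT32_MAX) {
            fprintf(stderr, "warning: literal does not fit in Int32\n");
            return 0;            /* this interpretation is discarded */
        }
    }
    *out = (int32_t)acc;
    return 1;
}
```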
   
   

   
   
   
   
      *** (2.3) Catalog of all internal data representations. 
   
   The compiler represents the following data (internally): 
   
     - 'types' (as produced by the parser)
     - 'terms' (as produced by the parser)
     - 'interpretations' (as produced by 'term_interpretations' and related functions)
     - 'implementations' (as produced by 'type_implementation' and related functions)
     - 'instructions' (as produced by 'compile_term' and related functions)

   Note: All  data are represented  either by Lisp-like  atoms (string, tags, ...),  or by
   lists. In the case of a list, the head of list is always a tag.
   
   
   
   
         *** (2.3.1) Internal representation of types. 
   
   <type> :=
   
   yy__Symbol                        just a type name in the form of a Lisp-like string
   
      For  example, the  types  'One', 'Bool'  etc ...   are  in this  category. The  file
      'compil.h' contains the  definitions of a lot of 'predefined  strings'. Within the C
      source, such symbols  may be represented by the  C symbols 'pdstr_One', 'pdstr_Bool'
      etc... ('pdstr_' for 'predefined strings').
   
       Equivalently, 'pdstr_Bool' may be replaced by 'new_string("Bool")'. This has
       exactly the same effect, except that 'pdstr_Bool' is faster at execution time (of
       the compiler).
   
      Remark: 'new_string' (defined in 'expr.cpp') does not allocate a new segment for the
      string if the  string already exists. In this case, it  returns the already existing
      Lisp-like string.
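The interning behaviour of 'new_string' can be modelled by a toy table. All names below are hypothetical; 'expr.cpp' has its own storage:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static char *string_table[256];
static int   string_count;

/* Toy interning: asking twice for the same string yields the same
   segment, so strings can later be compared by pointer equality. */
static const char *toy_new_string(const char *s)
{
    int i;
    for (i = 0; i < string_count; i++)
        if (strcmp(string_table[i], s) == 0)
            return string_table[i];      /* already exists: reuse it */
    string_table[string_count] = (char *)malloc(strlen(s) + 1);
    strcpy(string_table[string_count], s);
    return string_table[string_count++];
}
```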
   
      All  types  (without parameters)  whose  names  are created  by  the  user are  also
      represented like this.
   
   
   yy__utvar                         user type variable (type parameter)
   
      Type parameters have a special Lisp-like representation as atoms (see above). 
   
   type_String                       a Lisp-like tag representing the type 'String'
   type_ByteArray                    idem
   type_Int32                        idem
   type_Float                        idem
   type_Listener                     idem
   
      The above being more 'primitive', they are represented by Lisp-like tags, instead of
      Lisp-like strings.
   
   (type_RAddr . <type>)             represents 'RAddr(T)'; type_RAddr is a Lisp-like tag
   (type_WAddr . <type>)             idem
   (type_RWAddr . <type>)            idem
   (type_GAddr . <type>)             idem (obsolete: global variable)
   (type_Var . <type>)               idem (dynamic variable)
   (type_MVar . <type>)              idem (multiple dynamic variable)
   
      The above are called 'address types'. 
   
   (app_ts <name> . <types>)         type scheme applied to type operands
   
      'app_ts' is for 'apply type scheme'. For example, 'List(String)' is represented as:
   
                              (app_ts new_string("List") type_String)
   
      and 'Result(One,String)' as:
   
                         (app_ts new_string("Result") pdstr_One type_String)
   
   
      Notice  that  there is  no  dot in  the  above  Exprs.  '(app_ts  new_string("List")
      . type_String)' is erroneous, because the second tail ('cdr2') of the list must be a
      list of types, not a single type.
   
   (type_struct_ptr . <id>)          encapsulated  C structure type 
   
      <id> is a Lisp-like integer identifying the structure.
   
   (functype <types> . <type>)       functional type
   
      <types> is the list of the representations of the arguments types, and <type> is the
      representation of the target type.  
   
   
   There   is  still   another  sort   of  'types'   represented  internally:   the  'type
   unknowns'. They are also  a special sort of Lisp-like Exprs. They  are not generated by
    the parser, but in general by the interpretation functions. When a symbol representing
    a scheme is found, all the type variables in this scheme are replaced by fresh
    unknowns. Of course, there is a generator of fresh unknowns: 'fresh_unknown'.
   
   From this we may conclude that defined  types are those which are represented either by
   a string or by (app_ts ...). 
   
   
         *** (2.3.2) Internal representation of terms. 
    
   Note:  <lc>   (generated  by   the  lexer)   is  a  packed   version  of   the  triplet
   (filename,line,column). There are macros for extracting the components:
   
       file_in(<lc>)            a C char *
       line_in(<lc>)            a C int
       col_in(<lc>)             a C int
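A sketch of what such a packing could look like. The field widths (4 bits of file index, 20 bits of line, 8 bits of column) and the file table are guesses for illustration; the real layout is whatever the lexer implements:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical file table indexed by the top bits of <lc>. */
static const char *file_table[16] = { "predefined.anubis", "main.anubis" };

/* Pack (file, line, column) into one 32-bit word, and extract them. */
#define pack_lc(f, l, c) (((uint32_t)(f) << 28) | ((uint32_t)(l) << 8) | (uint32_t)(c))
#define file_in(lc)      (file_table[(lc) >> 28])
#define line_in(lc)      ((int)(((lc) >> 8) & 0xFFFFFu))
#define col_in(lc)       ((int)((lc) & 0xFFu))
```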
       
       
   As usual, <Cint> represents a C integer (i.e. not a Lisp-like integer). 
   
   <term> := 
   
   (alert <lc> . <filename>)           <filename> is a Lisp-like string
   (protect <lc> . <term>)
   (lock <lc> <term> . <term>) 
   (debug_avm <lc> . <term>)
   (avm <lc> . AVM)                    (AVM is explained below)
   (terminal <lc> . <term>)
   (symbol <lc> . <Lisp-like string>)
   (integer <lc> . <Cint>)
   (fpnum <lc> <mantissa> . <exponent>)    (<mantissa> and <exponent> are Lisp-like integers)
   (of_type <lc> <type> . <term>)          (explicit typing)
   (string <lc> . <string>)                (<string> is a Lisp-like string)
   (with <lc> <symbol> <value> . <term>)
   (delegate <lc> <term> . <term>)
   (lambda <lc> <FArgs> . <term>)               (|->)
   (rec_lambda <lc> <name> <FArgs> . <term>)    (|-f->)
   (app <lc> <term> . <terms>)                  (applicative term)
   (cond <lc> <term> . <cases>)                 (conditional)
   (select_cond <lc> <term> <case> . <term>)    (selective conditional)
   (serialize <lc> . <term>)
   (unserialize <lc> . <term>)
   
   The following are still used, but should become obsolete (and replaced by system calls).
   
   (anb_read ...)          
   (anb_write ...)         
   (anb_exchange ...)      
   (connect_to_file ...)   
   (connect_to_IP ...)     
   (wait_for ...)          
   
   
   
   
         *** (2.3.3) Internal representation of interpretations. 
   
   The  function   'term_interpretations'  (in  'interp.c')  computes  the   list  of  all
   interpretations of  a term  (in a  given appropriate context).  An interpretation  is a
   pair:
   
                                          (head . env)
   
   where 'head'  is called an  'interpretation head', and  where env is  an 'environment'.
   Here, we describe interpretation heads.
   
   Unfortunately, I've used the same tags as for terms. It would be safer to use different
   tags.  Furthermore, tags for some sort of expressions should have a uniform prefix, for
   example:
   
      typ_           for types
      ter_           for terms
      int_           for interpretations
      imp_           for implementations
      ins_           for instructions
   
   The tags  should also be numbered.  This will limit the  risk of forgetting a  tag in a
   switch. Hence for example, we should have the following tags for terms:
   
     ter_00_alert
     ter_01_protect
     ter_02_lock
     etc...
   
   and a dummy one for marking the end of the list:
   
     ter_20_dummy
   
  
   Interpretations heads are the following:
   
   (protect <lc> . <head>)
   (lock <lc> <term> . <term>)
   (avm <lc> . <instructions>)
   (debug_avm <lc> . <head>)
   (terminal <lc> . <head>)
   (operation <lc> <opid> <name> <parms> <type> . <types>)
   (string <lc> . <string>)
   (anb_int32 <lc> . <Cint>)
   (small_datum <type> . <Cint>)
   (fpnum <lc> <mantissa> . <exponent>)
   (local <name> <i> . <type>)
   (micro_local <name> <d> <i> . <type>)
   (closure <lc> (f_micro_ctxt <fname> <ftype> (<sym> . <type>) ...) <args> . <body>)
   (app <lc> <head> . <heads>)
   (cond <lc> <head> ((<name> (<sym> . <type>) ...) <lc> . <head>) ...) 
   (select_cond_interp <lc> <test head> <index> <clause head> <head then> . <head else>)
   (with <lc> <symbol> <head> . <head>)
   (delegate <lc> <head> . <head>)
   (serialize <lc> . <head>)
   (unserialize <lc> <type> . <head>)
   
   The following are still used but should become obsolete: 
   
   (anb_read ...)
   (anb_write ...)
   (anb_exchange ...)
   (wait_for ...)
   
   
   
   
         *** (2.3.4) Internal representation of type implementations.

    Type implementations are computed in 'implem.c'. We never compute an implementation
    for a scheme, but only for an actual type (i.e. with no type parameter). For example,
    'Maybe(Int8)' has an implementation different from 'Maybe(String)', but 'Maybe($T)'
    has no implementation at all.

   In the future, it would be nice to have quantification 'à la Girard' on types:
   
         forall $T ...
         exists $T ...
   
   In this case, we would have to implement for example the type:
   
         forall $T Maybe($T)
   
   which has no parameter, since $T is bound in this expression.
   
   Internal representations of implementations are  the following. Again the same tags are
   used, which is quite unfortunate.
   
   
   <implem> :=
   
   
   type_String
   type_ByteArray
   type_Int32
   type_Float
   
   (functype <implem> . <implems>)
   
   (type_RAddr . <implem>)
   (type_WAddr . <implem>)
   (type_RWAddr . <implem>)
   (type_GAddr . <implem>)       (obsolete)
   (type_MVar . <implem>)
   
       It is very unfortunate that dynamic variables are not implemented as '(type_Var
       . <implem>)'. Instead, this type is implemented as a large type. I don't understand
       how this can work properly (but it seems to work, miraculously). This should of
       course be changed.
   
   (type_struct_ptr . <id>)
   
   (small_type <nalt> <iw> <alt geom> ... <alt geom>)
   (mixed_type <nalt> <iw> <alt geom> ... <alt geom>)
   (large_type <nalt> <iw> <alt geom> ... <alt geom>)
   
   where:
   
   <alt geom> :=
   
   (small_alt (<imp> <offset> . <width>) ... (<imp> <offset> . <width>))
   (mixed_alt (<imp> <offset> . <width>) ... (<imp> <offset> . <width>))
   (large_alt (<imp> <offset> . <width>) ... (<imp> <offset> . <width>))
   
   
   
   
         *** (2.3.5) Internal representation of virtual machine instructions. 
   
   
            *** (2.3.5.1) Pseudo instructions. 
   
   These instructions are of size 0. 
   
   (label . <addr>)             position within the code
   (ret_point . <addr>)         same as label
   (header ...)                 used for decorating the .sc files
   (comment ...)                idem
   (context ...)                idem 
   
   
            *** (2.3.5.2) Computing instructions. 
   
   
   (glue_index . <index>)                      used for small types
   (glue_mixed_index . <index>)                used for mixed types
   (store_index . <index>)                     used for large types
   
   These instructions are used  for constructing a datum of a defined  type.  They put the
   right index (i.e. alternative number of the datum) into the datum.
   
   For a  small datum, 'glue_index' puts the  index within the least  significant bits (of
   R). This is the  first operation when constructing the datum, so  that the content of R
   is actually overwritten.
   
   For  a  mixed  datum,  the  index  is   put  into  the  2  least  significant  bits  by
   'glue_mixed_index'.
   
   For a large datum, the index (a byte)  is put at offset 4 (just after the counter) into
   the segment.
   
   
   (glue . <bit width>)                    used for small and mixed types
   (store_0 . ?)                           used for mixed and large types
   (store_1 . ?)
   (store_2 . ?)
   (store_4 . ?) 
   
   These  instructions  are  used  for   putting  the  components  into  the  datum  under
   construction (of some defined type).
   
    The instruction '(glue . <bit width>)' ORes the register R with a left shift of the
    top of stack. The number of bits of the shift is given by the operand.
   
   '(store_? . <offset>)' stores a  word of 0, 1, 2 or 4 bytes at  the offset given by the
   operand within the segment pointed to by R. 
   
   
   (index_direct . <bit width>)
   index_indirect
   
   These instructions  are used  by conditionals.   A datum of  a defined  type is  in the
   register  'R', and  they put  its index  (alternative number)  into the  index register
   'I'. The first one is used for small and mixed data, the second one for large data.
   
   
   (switch <addr> ... <addr>)
   
    This instruction checks the content of the index register 'I', and jumps to the
    corresponding address. It is used by conditionals for jumping to the right case.
   
   
   (unglue <bit width> . <right shift>)
   (unstore <offset> . <n>)      where <n> is a Lisp-like integer equal to 0, 1, 2 or 4
   (unstore_copy . <offset>)
   (unstore_copy_mixed <offset> . <mask>)
   (unstore_copy_ptr . <offset>)
   (unstore_copy_function . <offset>)
   
   These instructions are used by conditionals for pushing the values of resurgent symbols
   onto the stack.
   
    '(unglue <bit width> . <right shift>)' is used for extracting a component from a small
    datum. The datum is pushed onto the stack.
   
   'unstore' instructions are similar but for indirect  data.  Some of them need to make a
   virtual copy of the datum to be extracted.
   

   (jmp . <addr>)
   
    This simple 'jump' instruction is used at the end of cases in conditionals for jumping
    to the next code (which is common to all cases).
   
   
   (apply . k)
   (ret . k)

    When '(apply . k)' is about to be executed, the stack contains (starting at the top of
    stack) the arguments of a function and a return address. The function to be 'applied'
    to these arguments is in 'R'.

    This instruction acts in two different ways depending on the nature of the function,
    which may be either 'top level' or a 'closure'. See the section on closures and
    functions for more explanations.

    '(ret . k)' is just an ordinary return instruction (except that it stops the virtual
    machine if the return address is zero).

    'k' is the number of arguments of the function.
   
   
   (load_Int32 . <Cint>)
   (load_float <mantissa> . <exponent>)
   (string . <string index>)
   
   These instructions are used for loading constants into 'R'. 
   
   
   (check_stack . <n>)
   
    This instruction checks if the stack is big enough for pushing <n> 32-bit words. If it
    is not, it puts the virtual machine in the state 'needs_bigger_stack' and gives up, so
    that the scheduler may enlarge the stack of this machine.
   
   
   (push_addr . <addr>)  
   
   This instruction pushes a return address on top of stack. 

   
   (address . <addr>)
   
   The same but the address is put into 'R'. 
   
   
   push
   
   This instruction pushes the content of 'R' onto the stack. 
   
   
   (select_index_direct <bit width> <index> . <addr>)
   (select_index_indirect <index> . <addr>)
   
    These instructions are generated for selective conditionals.

   
   (peek <name> . <depth>)
   (micro_peek <name> <depth> . <micro depth>)
   
    The instruction 'peek' takes the 32-bit word at depth <depth> in the stack and puts it
    into 'R'. The datum is not virtually copied. The compiler generates a 'copy'
    instruction to follow this one, if needed, depending on the type of the datum.

    'micro_peek' is similar, except that the datum to be put into 'R' is located in a
    micro context. The micro context is at depth <depth> in the stack, and the datum is at
    position <micro depth> in the micro context.
   
   
   protect
   (unprotect . <addr>)
   
   
   unlock
   lock
   
   
   odd_align
   
   This instruction is put in front of the code of a top level function if this code would
   be at an even address.
   
   
   pop1
   pop2
   pop3
   
    These instructions just remove 1, 2 or 3 32-bit words from the top of stack.
   
   invalid
   
    This instruction should never be executed. It is generated in the garbage collector
    code for small alternatives in mixed types. This means that the garbage collector code
    must never be called for a datum belonging to a small alternative of a mixed type.
    Other instructions with a <mask> operand must use this mask to avoid such a call.
   
   give_up
   
    This instruction makes the virtual machine give up as if its time slice were finished.
   
   
   start_debug_avm
   stop_debug_avm
   
    These instructions set and unset a flag. When this flag is set, the virtual machine
    prints every executed instruction along with other information.
   
   
   finish
   
   (program . <instructions>)
   
   do_alert
   
   
   
   
            *** (2.3.5.3) Garbage collector instructions. 
   
    The null 32-bit word '0' is always a valid datum (of any type) for the garbage
    collector. Of course, this datum doesn't need to be collected, but the garbage
    collector may be passed such a datum.
   
   
   Allocation of memory:
   
      (alloc . <number of bytes>)
   
    The pointer to the resulting segment is put in R. The counter (the first 32-bit word
    of the segment) always contains 1 after the allocation.
   
   
    Making virtual copies. The following instructions are used for making a virtual copy
    of a possibly indirect datum. This amounts to incrementing the counter which is at the
    beginning of the segment. However, these instructions are generated on the basis of
    the type of the datum only, and this is not enough to know whether the counter exists
    (some data are direct); hence the existence of several variants of these instructions.
    Also, these instructions always check whether the counter is zero. If it is, the datum
    is permanent, and the counter is not incremented.
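The rule just described can be sketched as follows ('Segment' and 'virtual_copy' are hypothetical names for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* A segment starts with its counter; the datum's fields follow. */
typedef struct { uint32_t counter; } Segment;

/* Increment the counter, except for the null word and for permanent
   data (counter == 0), which are left untouched. */
static void virtual_copy(Segment *s)
{
    if (s == 0)
        return;              /* the null word is always a valid datum */
    if (s->counter != 0)
        s->counter++;        /* zero means permanent: do not touch it */
}
```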
   
   Copying virtually within the stack. The datum is at position <depth> in the stack (0 is
   the top of stack).
   
      (copy_stack_ptr . <depth>)
   
   The  counter always  exists (except  if the  datum is  the null  word). The  counter is
   incremented if it is non zero.
   
      (copy_stack_function . <depth>)   
   
    The datum represents a function, i.e. the address of some piece of code if it is odd,
    and a pointer to a closure if it is even. The counter exists only in the second case.
   
      (copy_stack_mixed <depth> . <mask>)
   
   The <mask> has at  most 4 bits. Each bit indicates if  the corresponding alternative is
   large or small. The counter exists only if the alternative is large.
   
   Virtually copying in R. This is the same except that the datum to be copied is in R. 
   
      copy_ptr
      copy_function
      (copy_mixed . <mask>)

   The next one should become obsolete:
   
      copy  
   
    It is almost the same as 'copy_ptr', except that it does not check for permanent data.
    It should disappear and be replaced by 'copy_ptr'.

   
   
   Virtual deletion within the stack. 

      (collapse . <depth>)

    This instruction deletes the datum at position <depth> in the stack. The datum is
    direct, so the instruction just has to move the <depth> words at the top of stack one
    slot downwards, and adjust the stack pointer. For example, (collapse . 3) works as
    follows:
   
          the stack before                          the stack after
          a b c d e f g ....                          a b c e f g ...
   
   ('d' has disappeared)
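A sketch of 'collapse', modelling the stack as an array growing upwards; the real machine's layout may differ:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* stack[*top] is the top of stack; depth 0 is the top itself. */
static void collapse(uint32_t *stack, int *top, int depth)
{
    int victim = *top - depth;              /* slot to delete */
    /* move the <depth> words above the victim one slot downwards */
    memmove(stack + victim, stack + victim + 1,
            (size_t)depth * sizeof(uint32_t));
    (*top)--;                               /* adjust the stack pointer */
}
```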

    The next instructions virtually delete a datum which is in R. These instructions are
    generated only for those types that have indirect data.
   
      del_ptr
   
    'del_ptr' expects in R a pointer to a segment not containing any indirect data. In
    this case, the deletion amounts to decrementing the counter pointed to by R. However,
    R may contain 0 (in which case the instruction does nothing), or the counter may be
    zero (in which case the datum is permanent and the instruction does nothing).
    Otherwise, the counter is decremented, and if it becomes zero, the segment is freed.
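The behaviour of 'del_ptr' can be sketched as follows (hypothetical 'Segment' layout; the real instruction works on the virtual machine's own segments):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* A segment starts with its counter. */
typedef struct { uint32_t counter; } Segment;

/* Do nothing for the null word or a permanent datum; otherwise
   decrement the counter and free the segment when it reaches zero. */
static void del_ptr(Segment *s)
{
    if (s == 0 || s->counter == 0)
        return;                 /* null word, or permanent datum */
    if (--s->counter == 0)
        free(s);                /* last virtual copy: free the segment */
}
```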
   
      del_conn
   
   This  is  similar  to  'del_ptr'  except  that the  segment  pointed  to  represents  a
   'connection' (file or network). The instruction  works as 'del_ptr', except that if the
   counter becomes zero, it also closes the connection. 
   
      del_function
   
    This is similar to 'del_ptr', except that functions are either addresses of code (if
    odd) or pointers to closures (if even). In the first case nothing is done. In the
    second case, 'del_function' works like 'del_ptr', except that the address of the
    deletion code is found within the closure itself instead of as an operand of the
    instruction. This is because the types of the data in the micro context cannot be
    known from the type of the function.
   
      (del . <addr>)
   
    '(del . <addr>)' expects in R a datum belonging to a large type. '<addr>' is the
    address of the deletion code for this type. The instruction checks if R is zero; in
    this case it does nothing. Otherwise, R contains a pointer to a valid segment. If the
    counter is zero (permanent datum), the instruction does nothing. Otherwise, the
    counter is decremented. If the counter becomes zero, a return address and the datum
    (in R) are pushed onto the stack, and the deletion code at '<addr>' is jumped to.
   
      (del_mixed <mask> . <addr>)
   
   This is  the same as  '(del . <addr>)' except  that the datum  in R belongs to  a mixed
   type. The mask is  used to check if the datum is direct or  indirect.  If it is direct,
   nothing is done. If it is indirect, the same as for '(del . <addr>)' is performed.   
   
      (del_struct_ptr . <id>)
   
    This instruction is the same but for C structure encapsulation pointers. Somewhat
    curiously, it is coded in the compiler ('compile.c'), but is never generated (the
    proof: it is not coded in 'vminstr.c'). This is probably because encapsulated C
    structures are manipulated only from within 'predefined.anubis'. Hence, this is not a
    big problem, because only a modification of 'predefined.anubis' could lead to
    generating this instruction.

      (del_mvar . <addr>)
   
    This instruction is even more of a phantom than the previous one. This is more
    problematic, because multiple variables may be manipulated by anyone. Hence, this
    instruction should be coded as soon as possible.
   
   Note:  We should  also have  a '(del_var  .  <addr>)'.   The type  'Var' is  very badly
   implemented as  a large type (one of  my worst ideas). Everything  concerning this type
   should be  examined and corrected. I suspect  that this creates at  least memory leaks,
   but maybe also more serious problems.
   
      (del_stack_ptr . <depth>)
      (del_stack_conn . <depth>)
      (del_stack_function . <depth>)
      (del_stack <depth> . <addr>)
      (del_stack_mixed <depth> <mask> . <addr>)
      (del_stack_struct_ptr <id> . <depth>)
      (del_stack_var <depth> . <addr>)           (phantom)
      (del_stack_mvar <depth> . <addr>)
   
   The above instructions are  the same as the previous ones, except  that the datum to be
   virtually deleted is in the stack at depth <depth> instead of being in R. 
   
      mvar_slots_del_ptr
      mvar_slots_del_conn
      mvar_slots_del_function
      (mvar_slots_del . <addr>)
      (mvar_slots_del_mixed <mask> . <addr>)
      (mvar_slots_del_struct_ptr . <id>)
      (mvar_slots_del_var . <addr>)
      (mvar_slots_del_mvar . <addr>)

    Again, these instructions are the same as above, except that they have to virtually
    delete the data in all the slots of the multiple variable.
   
   
      indirect_del_ptr
      indirect_del_conn
      indirect_del_function
      (indirect_del . <addr>)
      (indirect_del_var . <addr>)               (phantom)
      (indirect_del_mvar . <addr>)
      (indirect_del_mixed <mask> . <addr>)
      (indirect_del_struct_ptr . <id>)
   
    Again, these instructions work like the above, except that the datum to be virtually
    deleted is pointed to by the top of stack.
   
      (increment_del . <byte width>)
   
    This instruction is used when we are really deleting an indirect datum. A pointer to
    some position within the segment to be freed is on top of the stack. This instruction
    just increments this pointer by the number of bytes indicated by its operand.
   
      del_index_direct
   
    This instruction expects on top of the stack a datum of a mixed type, belonging to a
    large alternative. This datum is a 32-bit word which may be decomposed as follows:
   
         pppppppp pppppppp pppppppp ppppppii
   
    i.e. into a field of 30 bits and a field of 2 bits. The two bits 'ii' give the number
    of the alternative the datum belongs to, and
   
         pppppppp pppppppp pppppppp pppppp00
   
   is the pointer to the segment. The role of the instruction is:
   
       - put 'ii' into the index register I
       - replace 'ii' by '00' on top of stack
       - duplicate the top of stack. 
   
    The reason why we duplicate the top of stack is that one pointer will be incremented
    for walking through the segment, and the other one will be used for freeing the
    segment.
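The decomposition can be expressed with two one-line macros (hypothetical names):

```c
#include <assert.h>
#include <stdint.h>

/* The two low bits 'ii' are the alternative number; clearing them
   gives the pointer to the segment. */
#define index_of(w)    ((w) & 0x3u)
#define pointer_of(w)  ((w) & ~0x3u)
```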
   
      del_index_indirect
   
    This instruction is similar to 'del_index_direct', except that it concerns a large
    datum. The index is read from the data segment, not from the manipulation word. The
    pointer is also duplicated on top of stack.
     
      free_seg_0
      free_seg_1
   
    These two instructions just call the 'FreeDataSegment' function of the memory
    allocator. The first one transmits the word at *(SP-1) (top of stack: depth 0 in the
    stack), whilst the second one transmits the word at *(SP-2) (depth 1 in the stack).
   
   
   
   
   
   
            *** (2.3.5.4) Closure instructions.    

   These instructions are used for constructing new closures. 
   
   
   (put_copy_direct <depth> . <position>)
   (put_copy_indirect <depth> . <position>)
   (put_copy_function <depth> . <position>)
   (put_copy_mixed <mask> <depth> . <position>)
   
   A memory  segment (the closure under construction)  is pointed to by  'R'.  The <depth>
   operand is  the stack  depth of the  datum to  be put.  The  <position> operand  is the
   position where the  datum must be put in  the micro context of the  closure. The offset
   (in bytes)  within the closure  where the datum  must be put is  4*(3+<position>), i.e.
   <position> is 0 for the first datum of the micro context, 1 for the next one, etc...
   
   For indirect data, the counter is incremented, because this is a duplication.
   
   
   (put_micro_copy_direct <depth> <micro depth> . <position>)
   (put_micro_copy_indirect <depth> <micro depth> . <position>)
    (put_micro_copy_function <depth> <micro depth> . <position>)
   (put_micro_copy_mixed <mask> <depth> <micro depth> . <position>)
   
   Same as above, except that the datum  is located into another micro context which is at
   depth <depth> in the stack. The datum itself is at position <micro depth> in this other
   micro context. Again <micro depth> is 0 for the first datum in the micro context, 1 for
   the next one, etc...
   
   
   (put_closure_labels <code> . <del code>)

   This instruction puts its two operands at  offsets 4 and 8 into the closure (pointed to
   by 'R').

   Below is an example  of how a closure is constructed. First of  all, here is the source
   text (copy-pasted from the sources of our web site):
   
           (DBRow(MTxt) dbrt) |-> if dbrt is dbrow(_,_,t) then 
                                  if t is mtxt(_,v,time) then 
                                  success(mtxt(new_name,v,time))   
   
   and below is  the code generated for the construction of  the closure (copy-pasted from
   the .sc file):
   

         |       ; |-> in '/home/alp/www_dev/anubis-language/_1_6/translate_item.anubis' at line 451
         |       ; function code at label 31092
  146195 |    alloc 16
         |       ;--- stack ------------------------
         |       ;   0                            (return address)
         |       ;   1     i                      LangInfo
         |       ;   2.0   tdb                    DB(TDB)
         |       ;   2.1   old_name               String
         |       ;   2.2   new_name               String
         |       ;   3                            (return address)
         |       ;----------------------------------
  146197 |    put_micro_copy_indirect 2 2 0
  146201 |    put_closure_labels 31092 31093
   
   'alloc 16' allocates four 32-bit words for the closure: counter, code, del and just one
   datum in the micro context (the value of 'new_name').
   
   Remark that 'new_name' is the sole local symbol occurring free in the above term, and
   that its value is within an older micro context located at depth 2 in the stack. The
   value of 'new_name' is at position 2 in this older micro context, so that the right
   instruction for constructing the new micro context is:
   
  146197 |    put_micro_copy_indirect 2 2 0

   which means: 'take the datum at position 2  in the micro context at depth 2, and put it
   at position 0 in the new micro context' (of course, count a virtual copy). 
   
   ('indirect' because the type is 'String').  The last instruction:
   
  146201 |    put_closure_labels 31092 31093

   puts the address (label  31092) of the code for executing the  function and the address
   (label 31093) of the garbage collector code for deleting the closure itself.
   
   
   
   
   
            *** (2.3.5.5) Equality instructions.
   
   In the next version of Anubis, equality code should be generated as generic code (like
   for implicit destructors), so that all the instructions below will become obsolete.
   Furthermore, equality (with Boolean value) should be generated only for 'separable'
   types, i.e. types for which this boolean equality, in the mathematical sense, is
   computable. From a topos-theoretic viewpoint, this means that it should be generated
   only for types 'T' such that the internal equality '(T,T) -> Omega' factors through the
   canonical arrow 'Bool -> Omega' (sending 'true' onto 'true' and 'false' onto 'false').
   However, I don't know if there is an algorithm able to decide this property. I know
   only a recursive class of types having this property.  Of course, internal equality
   exists for all (abstract) types and may be defined by Leibniz's rule.
   
   In Anubis 1, equality code is generated  for all types (except 'Float', but this may be
   the result  of some  misunderstanding). For concrete  types (address types  and several
   primitive types, structure  pointer types) equality is physical  (compare the addresses
   of the data).  For functional types it  is also physical, despite the fact that in some
   cases, we  could be  more clever (when  we know that  the product  of the types  of the
   arguments  is finite).  Theoretically, boolean  equality  should not  be generated  for
   functional types in most cases.
   
   So, we  have something  special to generate  only for  defined types. The  principle of
   comparison is as follows for 'x' and 'y' to be equal:
   
     1. 'x' and 'y' must belong to the same alternative,
     2. if so, the components must be equal two by two. 
   
   Of  course, the second  condition shows  that this  is a  recursive process.  Hence, an
   'equality  code'  is   generated  for  each  defined  type.   This  code  uses  special
   instructions, which is quite stupid,  because ordinary instructions would have done the
   job (maybe somewhat more slowly).
   
 
      (false_jmp . <addr>)
      (true_jmp . <addr>)
   
   Put either 'false' (i.e. 0) or 'true' (i.e. 1) in 'R' and jump (unconditionally) to the
   operand address. 

   
      (jmp_false . <addr>)
   
   Jump to address if the content of 'R' is 'false'.
   
    
      (jmp_eq_stack . <addr>)
   
   This instruction compares the two words on top of stack, puts 1 (true) in R if they are
   equal and performs a jump to 'addr'.  Otherwise the next instruction is executed, and R
   remains empty. 

   
      (jmp_neq_indexes_large . <addr>)
      (jmp_neq_indexes_mixed <bit width> <mask> . <addr>)
   
   
   
      (jmp_neq_string . <addr>)
      (jmp_neq_byte_array . <addr>)
      (jmp_neq 0 . <addr>)
      (jmp_neq 1 . <addr>)
      (jmp_neq 2 . <addr>)
      (jmp_neq 4 . <addr>)
   
   
   
      (eq . <depth>)
   
   This instruction tests the equality of the two words at depths '<depth>' and '<depth>+1'
   in the stack.  The register R is set to 1 (true) if they are equal, and to 0 (false)
   otherwise.
   
   
   
      eq_string
      eq_byte_array
   
   Tests the equality of the two strings or byte arrays on top of stack (depths 0 and 1),
   and puts either 'true' (if equal) or 'false' (if not equal) in R. Of course, the
   strings or byte arrays are traversed.
   
   
      push_eq_data
   
   
   
   
   
   
   
      (increment_eq . <byte width>)

   
   
   
            *** (2.3.5.6) Dynamic variables instructions. 
   
      (get_var_handler . <addr>)
      (get_mvar_handler . <addr>)
      get_var_monitors
      get_mvar_monitors
      push_mvar_length
      remove_monitor
      ret_if_zero
      get_vv
      xchg_vv
      get_mvv
      xchg_mvv
      create_var
      free_var_seg
      free_mvar_seg



            *** (2.3.5.7) Serialising/unserialising.
   
      (serialize . <addr>)
      (unserialize . <addr>)
   
   
   
   
            *** (2.3.5.8) Starting a new virtual machine. 
   
      (start <depth> . <addr>)
   
   This  instruction is  generated by  the  keyword 'delegate'  and starts  a new  virtual
   machine. '<depth>' is the number of slots on  top of the stack of the old machine which
   should be  copied into  the stack  of the new  machine. No  virtual copies  are needed,
   because they are already performed  (by 'copy_stack_...' instructions). The new machine
   starts execution at the instruction following this one, whilst the old machine jumps to
   the operand address. 

   
   
             *** (2.3.5.9) Obsolete and soon-to-be obsolete instructions. 
   
   The following are still used but should become obsolete (used for global variables):
   
      (create_vars ...)   
      (gv_address ...)    
      (init_gv ...)
      (del_gv ...)
      get_gvv
      xchg_gvv
      initialization_address
      variables_deletion_address
   
   
    The following have never been used (experimental?):
   
      (call ...)  
      location 
      (code_for ...)  
      (type_list ...)  
      push_before_eq
      (dec3 . <addr>)
   
   
   
   
   
 
   
      *** (2.4) Unification. 
   
   Since the Anubis language allows  the use of type parameters (parametric polymorphism),
   we need to be able to unify types.
   
   'Environments' are lists of the form:
   
                                  ((v_1 . T_1) ... (v_k . T_k))
   
   where  each 'v_i' is  an 'unknown',  and where  each 'T_i'  is a  type (which  may also
   contain unknowns). Such an environment just says that:
   
           v_1 has value T_1
           ...
           v_k has value T_k
   
   If a  pair is of the  form '(v_i .  v_j)', i.e.  if the  value of 'v_i' is  the unknown
   'v_j', we assume that 'v_i > v_j'  (recall that they are 'Expr's, hence integers). This
   is to avoid some kind of circularity.

   Because  of the Lisp-like  representation, unification  is fairly  simple. It  works as
   follows,  where  'u(A,B)/E' is  the  result  of unifying  'A'  with  'B' relatively  to
   environment 'E'. The result is a new environment. '$1' and '$2' represent two arbitrary
   unknowns.
   
   
   u($1,$1)/E                 = E
   u($1,$2)/E                 = (($1 . $2) . E)    (if $1 > $2)
   u($1,$2)/E                 = (($2 . $1) . E)    (otherwise)
   u($1,T)/E                  = (($1 . T) . E)
   u(A,A)/E                   = E                  (for any atomic A)
                                  | not_unifiable            if u(A1,A2)/E = not_unifiable
   u((A1 . B1),(A2 . B2))/E   =  | 
                                  | u(B1,B2)/(u(A1,A2)/E)    otherwise
   u(A,B)/E                   = not_unifiable      in all other situations
   
     
   We have no problem  with the fake 'Expr' <Cint> because we  unify only types, and types
   never contain a <Cint>. 
   
   It may be the case that the two expressions to be unified are relative to different
   environments. In that case we just have to merge the two environments before unifying
   the two expressions.
   
   We also  want to avoid  circularities, and to  that end we  use a non  circularity test
   (like  some Prolog  systems).  The  problem is  that  the above  rules  would give  the
   following for example:
   
          u($1,Maybe($1))/E = (($1 . Maybe($1)) . E)
   
   But this is not correct, because a type 'T' cannot be equal to 'Maybe(T)'.  Hence, when
   we have to compute 'u($1,T)/E', we use the function 'depends_on' to test if 'T' depends
   on '$1' directly or indirectly. If it does, the unification fails. 
   
   Important note: unification is used each time we have to compare types. Indeed, a type
   with unknowns is a type which is not yet completely known. The comparison may succeed
   if some unknowns are instantiated.  This is why comparing types without unifying is
   generally an error.
   
   Also, all the computations of unification and interpretations of terms are purely
   deterministic (no side effects at all).  This guarantees that when we try several
   possibilities, there is no clash between them.
   
   It is often the case that we want to dereference a type relatively to its
   environment. This means replacing (recursively) all type unknowns by their values (when
   they have one). This is performed by the command 'dereference_type'. 
   

   
   
   
   
   
      *** (2.5) Organization of the knowledge base. 
   
   The compiler stores the result of checking the paragraphs into several arrays:
   
    struct Type_struct        *types;      (array of type definitions)
    struct Operation_struct   *operations; (array of definitions, formerly called 'operations')
    struct Variable_struct    *variables;  (array of global variables (obsolete))
   
   See the definitions of these structures in 'compil.h'. 
   
   Checked paragraphs  are stored  into these  arrays, which work  like stacks.  The stack
   pointers are called 'next_type' and 'next_operation'.
   
   
  
   
   
   
   
   
       *** (2.6) Indexation of symbols. 
  
    In a previous version of Anubis, when the interpretations of a symbol were searched
    for, the arrays of types and operations were searched sequentially.  During the year
    2003 (as far as I can remember), it appeared that this was too slow (especially with
    Olivier's program, containing thousands of types and symbols).  I decided to index
    these arrays by a hashing method.  The result was that the compiler became twice as
    fast.
   
   The 'symbol index' is just an array  of 256 arrays of 'SymbolIndexEntry'. When a symbol
   is  searched for,  its 'hash'  is first  computed.   This hash  is an  U8 (computed  by
   'u8_symbol_hash' in 'interp.c').  With this hash  we may go directly to the right array
   of 'SymbolIndexEntry'.
   
   This secondary array  is searched for an entry corresponding to  our symbol (each entry
   contains the name  of a symbol). When such an  entry is found, we get  the index of the
   actual definition in the array of types or operations.
   
   In the future this system should be replaced by a B-tree, indexed on names of symbols.
   
   
   
   
   
   
   *** (3) The lexer. 
   
   
      *** (3.1) Generalities. 
   
   The role of  the lexer is essentially to  cut the input into 'tokens'.   'lexer.y' is a
   FLEX file,  which must be compiled by  'flex'. This produces the  file 'lex.yy.c' which
   becomes part of the C source. 
   
   However, the lexer performs several other tasks: 
   
     - handling 'read' and 'replaced by' paragraphs, computing visibilities, 
   
     - counting lines and columns, and level of nested comments, 
   
     - working on several special tokens (strings, floats, named arrows,...). 
   
   
    The lexer has several 'states' (in the sense of FLEX). They are the following:
   
     - INITIAL (used when reading comments outside paragraphs)
     - COM (used for reading /* ... */ comments)
     - PAR (used for reading paragraphs)
      - INCL (used for reading file names after either 'read' or 'replaced by')
     - STR (used for reading strings)
     - STRTL (used when a string is too long)
     - CONF (used for reading the configuration file)
     - CONFCOM (used for reading comments within the configuration file)
   
   The names  of all the  tokens returned by  the lexer are  prefixed by 'yy__'.  They are
   declared in 'grammar.y'. 
   
   
      *** (3.2) Finding source files. 
   
   The compiler must identify files clearly, not only by the name, because different files
   in different directories  may have the same  name. Notice that the path  which is given
   after the keyword 'read'  is not enough for identifying the file.  This is because this
   path is itself relative. 

    The compiler keeps in memory the absolute path of a 'current' directory. Notice that
    this current directory is a notion internal to Anubis. It has nothing to do with the
    current directory in the sense of the host operating system.  Anyway, the Anubis
    compiler never changes the current directory in the sense of the host system. 
   
   When  the compiler  starts, the  'current' directory  is the  directory from  which the
   Anubis compiler has been launched.
   
   When the  compiler needs to find  the source file  corresponding to a path,  either the
   path given  on the command line,  or a path found  after a keyword  'read' or 'replaced
   by', it looks for the file using the following absolute paths successively:
   
      <current directory>/<given path>
      <my_anubis directory>/<given path>
      <anubis directory>/<given path>

   except if <given path> is absolute. In this case only this path is used.
   
   When the  file has been  found, the  directory of this  file becomes the  new 'current'
   directory. 
   
   
   
   
      *** (3.3) The 'read' stack. 
   
   (also called 'include stack' in the source of the compiler). 
   
   Of course,  when a 'read'  or 'replaced by'  keyword is encountered, the  compiler must
   read another file,  but must be able to  come back for reading the rest  of the current
   file. This is why it needs a stack.
   
    Each 'frame' in this stack contains the following information:
   
     - a datum of type YY_BUFFER_STATE, needed by FLEX/BISON for each file, 
     - the absolute path of the file currently read
     - the current line number in the file currently read, 
     - the absolute path of the 'current' directory for the file currently read. 
   
    Note: at the time of writing this, this stack is actually made of 4 distinct stacks,
    one for each item. These stacks grow in parallel, and hence share a unique stack
    pointer. It would be a good idea to replace this by a single stack whose slots would be
    a C structure.
   
    When the end of a file is reached, a frame is popped from the stack, and the reading of
    the 'calling' file may be continued. 
   
    At the time of writing this text, this stack may contain at most 100 frames.
   
   
   
   
   
      *** (3.4) Remembering already read files. 
   
   The compiler remembers which files have been read. Indeed, during the same compilation,
   it may encounter several times a request for reading the same file. Of course, the file
   is read only once, and the compiler has just to adjust visibility. 
   
    The absolute paths of the files already read are stored into the array
    'already_included' (of at most 'max_already_included' = 1024 paths, at the time of
    writing this text).
   
   
   
   
   
   
      *** (3.5) Computing visibilities. 
   
   A file B is visible from a file A only if the file A contains one of the paragraphs:
   
     read B
     replaced by B
   
    Normally, a file containing a 'replaced by' paragraph does not contain any other
    paragraph (but it may contain off-paragraph comments). This kind of file is called a
    'relay' file, and is used when the location of a file of the distribution is changed.
   
   'read' is not transitive. In other words, if we have
   
     read B
   
   in file A, and 
   
     read C
   
   in file B, this does not make file C visible from file A. However, if we have:
   
     read B
   
   in file A and 
      
     replaced by C 
   
   in file B, C is visible from file A. 
   
    The visibility information is stored into several arrays. First of all, notice that
   each file  is identified  by an 'id'  which is the  position of  its path in  the array
   'already_included'. Notice that 'predefined.anubis' (which is 'pre-read') has id 0.
   
     - the array 'number_of_visible'  gives for each file id, the number  (of type U16) of
     files which are visible from this one.
   
     - the array  'is_relay_file' is an array of  booleans (actually an array  of U8). For
     each file it indicates if the file is a relay file or not. 
   
     - the  two  dimensional  array  'visibility'.   This  array  provides  some  kind  of
     'visibility stack' for each file id. For example, if the number of files visible from
     the file whose  file id is idA is 3, and  if the ids of these files  are idA, idB and
     idC, we have:
   
          visibility[idA][0] = idA
          visibility[idA][1] = idB
          visibility[idA][2] = idC
   
   Notice that since a file is always visible from itself, the first entry into this stack
   always contains the id of the file to which this stack belongs.
   
    When file B becomes visible from file A, idB is pushed onto the visibility stack of
    file A (if not already present).  Such stacks are never popped, because if B is
    visible from A, it remains visible indefinitely. Notice that such stacks never need to
    be enlarged, because the number of files read cannot be greater than
    'max_already_included', which is the size of these stacks. 
   
   So, the question is just to know when a file becomes visible from another one. First of
   all, each file must be visible from itself. Hence this is registered the first time the
   file is  encountered. As a  particular case, 'predefined.anubis' is  encountered before
   any other one. Furthermore, it must be visible from all other files.
   
   When a relay  file becomes visible from  some file, it must transmit  its visibility to
   the file it is pointing to. 
   
   
   
   
   *** (4) The parser. 

   The parser  (grammar.y) is a  BISON file. It  provides the function 'yy_parse'  used in
   'main.cpp'. It contains essentially the following parts: 
   
     - Declaration of tokens and non terminals with their types. 
     - Precedence and association rules. 
     - Grammar rules. 
   
   The  rules of precedence  and association  should not  be changed  (and their  order is
   important) because  if they are changed,  files which are currently  compiling ok could
   have syntax errors.
   
    Each grammar rule has an 'action' part (between { and }) which constructs the
    syntactical tree corresponding to the rule.  These actions are the official source of
    information for the Lisp-like format of terms.
   
   
   
   
   *** (5) Computing interpretations of terms. 

   
      *** (5.1) Overview. 
   
   The Anubis language  allows overloading of symbols and  type parameters. These features
   introduce many ambiguities which must be resolved. 
   
   A  term may  have several  distinct interpretations.  For example,  when you  write the
   number:
   
                                          2
   
   it has  at least 3 interpretations, one  as an 'Int8', one  as an 'Int32' and  one as a
   'Float'.  Similarly,  a  symbol  or  a  binary operator  like  '+',  may  have  several
   definitions, hence several interpretations. 
   
    When atomic terms (like numbers and symbols) are combined together to form compound
    terms, like applicative terms, conditionals, etc., the compound term may itself have
    several interpretations depending on which interpretation we choose for the atomic
    subterms of this term. 
   
   Hence, what we need is a (necessarily recursive) function for computing the list of all
   interpretations  of a  term. This  function  is called  'term_interpretations' and  has
   several auxiliary functions. All these functions are defined in 'interp.c'. 
   
   Also terms must  be interpreted relative to  a 'context'. A context is  normally just a
   set  of declarations.   But in  the case  of  the Anubis  compiler, a  context is  more
   precisely an accurate representation of what will be the top of stack at run time. Such
   a context  is represented  as a list,  with the  head of list  being the  most recently
   pushed item.
   
   For example, if we want to interpret a term like this one in some context C:
   
                                     with x = a, t
   
   we must  first get  the list of  all interpretations  of 'a' relative  to C.   For each
   interpretation of 'a', we  may compute the list of interpretations of  't', but it must
   be relative to '((x . T) . C)',  where 'T' is the type found for this interpretation of
   'a', because after  'a' is computed (at run  time) its value is pushed  into the stack.
   (Actually, in  the case of this  example, the compiler  requires that 'a' has  only one
   interpretation. This  is for making the source  files more explicit and  easier to read
   for humans.)
   
   
   
   
   
      *** (5.2) Interpreting applicative terms. 
   
   An applicative term  is just made of  a function applied to arguments.  For example, it
   may be:
   
                                         f(a,b,c)         
   
   (or 'a + b'). Such a term is represented internally as:
   
                               (app <lc> [f] . ([a] [b] [c]))
   
   where <lc> is a Lisp integer ('Expr')  from which the name of the source file, together
   with the line and the column of the term may be recovered.
   
   In order to interpret this term, the compiler must first compute all interpretations of
   the  function  itself.  It  also  computes  all the  interpretations  of  the tuple  of
   arguments. This may lead to a very big number of interpretations.  For example, if 'a',
   'b'  and   'c'  have  respectively  2,   4  and  7  interpretations,   this  yields  56
   interpretations for  the tuple '(a,b,c)'. If  the number of interpretations  of a tuple
    exceeds, say, one million, the compiler may become very slow.  The remedy (coming from
    the user) is to explicitly type each argument in order to reduce this number of
    interpretations.
   
    I've tried to proceed another way. Since for each interpretation of the function, we
    get types for the arguments, I've tried to recompute the interpretations of the typed
    arguments in each case. But experimentation showed that this is even slower. For the
    future, we need some cleverer mechanism.  This problem seems similar to logical
    problems encountered in the optimisation of SQL queries. 

   Now, for each  interpretation of the function and, each interpretation  of the tuple of
   arguments, we must unify the signature of the function (type of the tuple of arguments)
   with  the type  of the  actual tuple  of arguments.   If this  unification  fails, this
   combination is  impossible. This  is how the  number of  interpretations may stay  at a
   reasonable level.
   
   
   
      *** (5.3) Interpreting conditionals. 
   
   The test is first interpreted. The compiler requires (again for making the source files
   more  explicit and  easier to  read for  humans) that  the test  has one  and  only one
   interpretation. For each interpretation of the test,  the type of the test is known. Of
   course  all   interpretations  whose  type  is   not  a  type   with  alternatives  are
   rejected. However,  components in this type  may still be unknown,  the essential being
   that alternatives are known.
   
   Next, the compiler  analyses the cases. We  must have one case per  alternative, and in
   the same order.
   
   For each alternative, the compiler checks that the  name of the case is the same as the
   name of the  alternative, and that the number  of resurgent symbols is the  same as the
   number of  components. If a resurgent symbol  is explicitly typed, its  type is unified
   with  the type  of the  coresponding  component. If  this unification  fails, an  error
   message is sent.
   
   For each case, the  compiler constructs a new context for interpreting  the body of the
   case. Indeed,  resurgent symbols are declarations  which enrich the  context.  This may
   yield several interpretations  for each body of case. The  compiler constructs the list
   of all  interpretations of the tuple of  bodies. Again (as for  applicative terms) this
   may lead to a big number of  interpretations and slow down the compiler or even exhaust
   the memory.
   
   For  each such  interpretation of  the tuple  of bodies,  the types  of the  bodies are
   unified. This  is because  in a  conditional, the bodies  must all  have the  same type
   (which will be the  type of the conditional). When the unification  fails, the tuple is
   rejected. This leads in general to  a reasonable number of interpretations of the tuple
    of bodies. Here we could easily be more clever by unifying interpretations of bodies as
   soon as they are computed, i.e. unifying the types of the first and second bodies, then
   the type of the third body with the type of the first two and so on. 
   
   
   
   
   
   
      *** (5.4) Interpreting selective conditionals. 
   
   The selected  alternative is found  using the name  of the head  of case. If  there are
   several  alternatives with  this  name, the  compiler  checks the  number of  resurgent
   symbols.   If there are  several alternatives  with the  same name  and same  number of
   resurgent symbols,  the compiler  uses the declarations  (if any)  of the types  of the
   resurgent  symbols. These  types  are unified  with  the actual  types  taken from  the
   definition of the type of the test. 
   
   Of course, if  there is an 'else' case, the  type of this case must be  the same as the
   type of the body of the selected case. This is checked by unification. 
   
   
  
   
   
   

   *** (6) Implementation of types. 
   
   Notice that we implement only 'actual types', not type schemes (for the time being; see
    the discussion on quantified types). An actual type is a type with no unknowns or
    parameters.
   
   Types in Anubis 1 are of the following sorts:
   
     - primitive types: String, Int32, Listener, ...
   
     - defined types: types which are defined by a paragraph, even if this paragraph is in
       predef.c or predefined.anubis.
   
     - functional types: (T_1,...,T_k) -> U
   
     - address types: RAddr(T), WAddr(T), RWAddr(T), GAddr(T) (obsolete), Var(T), MVar(T). 
      
      - structure pointer types: These types are used for encapsulating C structures
        coming from linked libraries (like SSL or SQLite3).
   

      *** (6.1) Implementation of primitive types. 

   There is a particular implementation for each primitive type. 

   
         *** (6.1.1) 'String'.
   
   A string in Anubis 1 is implemented as a pointer:
   
      *
      | 
      V
      +-------+-------------------------------+---+
      |  cnt  | a   b   c   d  ....           | 0 |
      +-------+-------------------------------+---+
      0       4
   
    The first 4 bytes are the counter for the garbage collector. Notice that strings which
    are defined statically in the source files have a 0 at this place, meaning that the
    string is permanent (never collected).
   
   Starting at offset 4,  are the characters of the string.  A trailing  0 is added at the
   end, like in C.
   
   
         *** (6.1.2) 'ByteArray'. 
   
   A byte array is a pointer as follows:
   
      *
      | 
      V
      +-------+-------+-------------------------------+
      |  cnt  |   n   |  a   b   c   d  ....          |
      +-------+-------+-------------------------------+
      0       4       8
   
   where again 'cnt'  is the counter, and 'n'  the number of bytes in the  byte array (not
   including  'cnt' and  'n'). The  bytes are  stored starting  at offset  8. There  is no
   trailing 0.
   
   
   
   
   
         *** (6.1.3) 'Int32'. 
   
    An 'Int32' is just a C 'int' (of 32 bits, of course), i.e. a 32-bit word containing
    the number itself.
   
   
   
   
   
         *** (6.1.4) 'Float'. 
   
    Because data in Anubis 1 cannot have more than 32 bits, double precision floating point
    numbers are implemented indirectly (which is very bad for performance):
   
      *
      |
      V
      +--------+-----------------+
      |  cnt   | IEEE754 number  |
      +--------+-----------------+
      0        4                 12
   
   where again 'cnt' is the counter for the garbage collector. 
   
   
   
   
   
         *** (6.1.5) 'Listener'. 
   
   Despite the fact that 'Listener' is a primitive type, it is implemented exactly like
   the network address types. See below.
   
   
   
   
   
   
      *** (6.2) Implementation of defined types. 
   
   Defined types (types with alternatives, defined by paragraphs beginning with 'type' or
   'public type') are implemented according to 3 different methods: 'small', 'mixed' and
   'large'.
   
   First of all, the width of the type is computed. The width is the number of bits
   required for accommodating all data of this type. It is computed as follows.

   The ceiling of the base 2 logarithm of the number of alternatives is computed. This
   is the smallest number of bits required for accommodating all alternative numbers
   (also called 'indexes'). It is:
   
      0         for at most 1 alternative       (example: 'Empty' or 'One')
      1         for 2 alternatives              (example: 'Bit' or 'Bool')
      2         for 3 to 4 alternatives
      3         for 5 to 8 alternatives
      4         for 9 to 16 alternatives
      etc...
   
   Since the maximal number of alternatives is limited to 256, the width of an index field
   is at most 8. 
   
   This logarithm of the number of alternatives is called the 'index width' for this type.
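As a small illustration (not the actual code of 'typewidth.c'), the index width can be computed like this in C:

```c
/* Smallest number of bits able to hold the indexes 0..n-1 of n
   alternatives: 0 for at most 1 alternative, 1 for 2, 2 for 3 to 4,
   3 for 5 to 8, and so on, up to 8 for the maximum of 256. */
static int index_width(int n_alternatives) {
    int w = 0;
    while ((1 << w) < n_alternatives)
        w++;
    return w;
}
```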
   
   Now, each alternative has its own width. It  is the sum of the index width of the type,
   and  the  widths  of  all  components  of  the  alternative.   For  example,  the  type
   'Maybe(Int8)' has 2 alternatives:
   
        failure
        success(Int8)
   
   the index width is 1 (2 alternatives).  The  width of the first alternative is 1 (= 1 +
   0; no  component), and  the width of  the second alternative  is 9  (= 1 +  8), because
   'Int8' is of width 8 (as expected).
   
   Now, the width of the type is the supremum of the widths of its alternatives (0 for
   empty types). So, it is 9 for 'Maybe(Int8)', for example.
   
   These computations are performed in 'typewidth.c'. 

   Notice that this width is not well defined in the case of a recursive type.  For a
   recursive type (or a cross recursive type), the width is infinite (which corresponds to
   the fact that the type itself is infinite (if it is not empty)).  This is why we do not
   compute the width of recursive types.  Hence, we need a tool for detecting if a type is
   recursive (maybe cross recursive through other types).  This tool is defined in
   'rectype.c'.  Notice that a recursive type which is empty (this may happen: see the
   discussion on 'NonEmptyList' in 'predefined.anubis') is still considered recursive
   by Anubis 1, and its width is still infinite.
   
   The width of a type is used to  determine the number of bits needed for holding a datum
   of this type.  Above a certain limit,  and in particular for recursive  types, the type
   must be implemented indirectly, i.e. through a pointer to a segment. 
   
   The width discussed above will be called the 'theoretical width'. 
   
   Hence, we have another notion: that of 'real width'. The real width, for Anubis 1, is
   an integer between 0 and 33 inclusive.  The value 33 means that the theoretical width
   is at least 33, and that the type will be implemented indirectly.  Values from 0 to 32
   mean that the type will be implemented directly within that number of bits.
   
   Of course, allowing 64 bits or 128 bits direct data in the future means updating all
   these programs.
   
   If the real width of a type is at most 32, the type is 'small'. It cannot be recursive,
   and all its components, and the components of its components, and so on, are small. Any
   datum of this type may be represented on 'width' bits.
   
   If the real width is 33, there are two cases:
   
     - the number of alternatives is at most 4: the type is 'mixed'.
     - the number of alternatives is 5 or more: the type is 'large'. 
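These classification rules can be summarised by a small helper. This is a hypothetical sketch; 'classify' is not a function of the actual compiler:

```c
typedef enum { IMPL_SMALL, IMPL_MIXED, IMPL_LARGE } Impl;

/* Real width 0..32: direct representation ('small').
   Real width 33: indirect representation; 'mixed' if the alternative
   index still fits in the two free low bits of an aligned pointer
   (at most 4 alternatives), otherwise 'large'. */
static Impl classify(int real_width, int n_alternatives) {
    if (real_width <= 32)
        return IMPL_SMALL;
    return (n_alternatives <= 4) ? IMPL_MIXED : IMPL_LARGE;
}
```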
   
   If  a type is  large, all  data of  this type  are represented  indirectly, i.e.   as a
   pointer to a memory segment:
   
      *
      |
      V
      +----------+---+--------------+--------+---------------+
      |   cnt    | i |     c1       |   c2   |      c3       |
      +----------+---+--------------+--------+---------------+
      0          4   5
   
   The first 4 bytes of the segment are always occupied by the counter ('cnt') of the
   garbage collector.  The next byte (at offset 4) is the alternative index for this
   datum. It indicates to which alternative the datum belongs. Starting at offset 5 are
   the representations of the components of the datum.
   
   Since all pointers returned by the memory allocator are multiples of 4 (the two least
   significant bits are always 0), these two bits are available for any purpose. I've
   decided to use them for storing the alternative index (this is why this method cannot
   be applied if there are more than 4 alternatives). This has the advantage of avoiding a
   pointer to a segment for some alternatives. For example, the type 'List(T)' (for any
   type 'T') is mixed. The first alternative has width 1, the second alternative has an
   infinite width. Hence, it is tempting to avoid the use of an indirection for the first
   alternative. This is what the 'mixed' implementation realizes.
   
   Hence, the empty list (for any type 'T') is just the 32 bits word 0, while a non-empty
   list looks like this:
   
        pppppppp pppppppp pppppppp pppppp01
   
   where  the  least significant  two  bits (01)  indicate  alternative  number 1  (second
   alternative), and where:
   
        pppppppp pppppppp pppppppp pppppp00
   
   is the pointer:

      *
      |
      V
      +----------+-------+---------------+
      |   cnt    | head  |     tail      |
      +----------+-------+---------------+
      0          4   
   
   Notice that  there is no  alternative number at  offset 4 (since  it is already  in the
   least significant bits of the datum itself), and that components are stored starting at
   offset 4. 'tail' always occupies 4 bytes, but 'head' may occupy 0, 1, 2 or 4 bytes. 
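The tagging scheme for mixed types described above can be sketched as follows. This is illustrative only; 'alt_index' and 'segment_of' are hypothetical names:

```c
#include <stdint.h>

/* A mixed datum is a 32 bits word. Pointers returned by the allocator
   are multiples of 4, so the two least significant bits are free to
   hold the alternative index. */
static unsigned alt_index(uint32_t datum) {
    return datum & 0x3u;            /* e.g. 0 = empty list, 1 = cons cell */
}

static uint32_t segment_of(uint32_t datum) {
    return datum & ~(uint32_t)0x3;  /* clear the tag to recover the pointer */
}
```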
  

   
   
      *** (6.3) Implementation of functional types. 
   
   Functions are implemented in two different ways depending on how they are defined. 
   
     1. If the function is  defined by a paragraph, it is just  represented by the address
        of its code.
   
     2. If the function is defined by the arrow '|->' (or by the recursive arrow '|-f->'),
        it is represented as a 'closure'. A closure is a pointer to a segment like this:
   
      *
      |
      V
      +----------+----------+----------+----------+----------+----------+...
      |   cnt    |  code    |   del    |   x1     |   x2     |   x3     |
      +----------+----------+----------+----------+----------+----------+... etc
      0          4          8          12         16         20         24
     
   where 'cnt' is the counter of the garbage collector as usual. All slots in a closure
   occupy 4 bytes. The slot 'code' is the address of the code of the function. The slots
   'x1', 'x2', etc... contain the values of the symbols which were present in the context
   at the time the function (i.e. the closure) was created, and which are referred to from
   within the body of the function definition.  This part of the closure is called the
   'micro context'. The slot 'del' contains the address of the garbage collector code able
   to properly destroy the closure itself.
     
   When we have a function at hand (i.e. a 32 bits word representing a datum whose type is
   functional), we must be able to decide if the function is implemented as a simple code
   address or as a closure. Again we take advantage of the fact that pointers to memory
   segments are divisible by 4. If the pointer is even (least significant bit = 0), the
   function is a closure. If it is odd (least significant bit = 1) it is a simple code
   address. For this reason the code of a function defined at top level (by a paragraph)
   is always positioned at an odd address.  This is the reason for the existence of the
   instruction 'odd_align' (which is of course of size 1 and never executed).
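The test described above amounts to a single bit test. A minimal sketch (hypothetical helper names):

```c
#include <stdint.h>

/* Memory segments are 4-byte aligned, so a closure pointer is even;
   top-level code is deliberately placed at odd addresses (via the
   'odd_align' instruction), so a plain code address is odd. */
static int is_closure(uint32_t fn_datum)      { return (fn_datum & 1u) == 0; }
static int is_code_address(uint32_t fn_datum) { return (fn_datum & 1u) == 1; }
```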
   
   
   
   
   
   
   
   
      *** (6.4) Implementation of address types. 
   
   There are two sorts of address types:
   
      1. File and network address types: 'RAddr(T)', 'WAddr(T)', 'RWAddr(T)', 'Listener'
      2. Dynamic variable address types: 'Var(T)', 'MVar(T)'.
   
   We ignore GAddr(T), which is obsolete. 
   
   
         *** (6.4.1) File and network address types. 
   
   A datum of such a type is a pointer:
   
      *
      |
      V
      +----------+---+---+---+---+---------------------------------+
      |   cnt    | t | m | _ | _ |           12 bytes              |
      +----------+---+---+---+---+---------------------------------+
      0          4   5   6   7   8                                 20
   
   As usual the first 4 bytes are for  the counter of the garbage collector. The next four
   bytes are:
   
       t      type indicator
       m      mode indicator
       _      not used
       _      not used
   
   The next  12 bytes starting  at offset 8  are used depending on  the value of  the type
   indicator.
   
   The type indicator may take the following values ('conn' is for 'connection', a synonym
   (!?)  of 'address'):
   
      conn_invalid             invalid connection (should never happen)
      conn_file                file on disk
      conn_listener            listening TCP/IP connection (server: type 'Listener')
      conn_network             client TCP/IP connection
      conn_ssl_network         client SSL connection
      conn_local               (obsolete: local variable)
      conn_global              (obsolete: global variable)
      conn_stdin               standard input
      conn_stdout              standard output
      conn_stderr              standard error output
   
   The values of the above are defined in 'vm.h'. 
   
   The mode is an OR of the following flags:
   
      conn_read             readable connection
      conn_write            writable connection
      conn_read_pw          (experimental: connection with password access)
      conn_write_pw         (idem)
      in_progress           (connection not yet up)
   
   The 12 bytes after offset 8 are used as follows:
   
   
   Files (conn_file):
   
               +---+---+---+---+---+---+---+---+---+---+---+---+
               |       *       |   |   |   |   |   |   |   |   |
               +---+---|---+---+---+---+---+---+---+---+---+---+
                       V
                        FILE (in the sense of C)
   
   Network (conn_network, conn_listener):
   
               +---+---+---+---+---+---+---+---+---+---+---+---+
               | socket handle |   |   |   |   |   |   |   |   |
               +---+---+---+---+---+---+---+---+---+---+---+---+
                       
   SSL connection (conn_ssl_network):
   
               +---+---+---+---+---+---+---+---+---+---+---+---+
               |  (SSL *)      | TCP/IP handle | (char *)      |
               +---+---|---+---+---+---+---+---+---+---|---+---+
                       V                               V
                        SSL struct of OpenSSL           server common name
   
   Standard connections (conn_stdin, conn_stdout, conn_stderr):
   
               +---+---+---+---+---+---+---+---+---+---+---+---+
               |   |   |   |   |   |   |   |   |   |   |   |   |   (not used)
               +---+---+---+---+---+---+---+---+---+---+---+---+
   
   

   
         *** (6.4.2) Dynamic variable address types. 

   A simple dynamic variable (type 'Var(T)') is a pointer:
   
     *
     |
     V
     +-----------+-----------+-----------+-----------+
     |   cnt     |  value    |     n     |     *     |
     +-----------+-----------+-----------+-----|-----+
     0           4           8           12    |     16
                                               V
                                               monitors table
   
   The segment contains four 32 bits words:
   
      - at offset 0: the counter
      - at offset 4: the current value of the variable
      - at offset 8: size of monitors table (number of slots)
      - at offset 12: pointer to monitors table
   
   'n' should be 0 exactly when the pointer to the monitors table is NULL. When the
   variable is created (by the instruction 'create_var'), n is 0 and the pointer to the
   monitors table is NULL.
   
   A monitor is just a datum of type 'One -> One' (most often it is a closure). 
   
   
   Multiple dynamic variables (type 'MVar(T)'). This is again a pointer:
   
     *
     |
     V
     +----------+----------+----------+----------+----------+-----------------------
     |    cnt   |    ns    |    nm    |    *     |   nbw    |       data ...              
     +----------+----------+----------+----|-----+----------+-----------------------
     0          4          8          12   V     16         20
                                           monitors table
   
   In this segment, we have:
    
      - at offset 0:    garbage collector counter
      - at offset 4:    number of slots for data in the variable
      - at offset 8:    number of slots for monitors
      - at offset 12:   pointer to the table of monitors
      - at offset 16:   normalized bit width (see below)
      - at offset 20:   data
   
   A multiple dynamic variable is nothing other than a 'mutable array'. This array has a
   size (the number of data of type T that we can put in the array).

   Normally, we should arrange things so that each data slot has just the size required
   for type T.  For example, if we have a multiple variable of type 'MVar(Bool)', each
   slot should occupy just 1 bit, not 32 bits. If we have a multiple variable of type
   'MVar(Maybe(Int8))', each data slot should occupy 9 bits.
   
   Because  of byte  boundaries, we  use only  powers of  2 for  the number  of bits  of a
   slot. Hence, for example, in a multiple variable of type 'MVar(Maybe(Int8))', each data
   slot will be  of 16 bits, the least number of  bits which is a power of  2 and at least
   equal to 9.
   
   This number of bits is called the 'normalized bit width'. It may be computed as follows
   in C:
   
     if (bw <= 0)   nbw = 0;  else
     if (bw <= 1)   nbw = 1;  else
     if (bw <= 2)   nbw = 2;  else
     if (bw <= 4)   nbw = 4;  else
     if (bw <= 8)   nbw = 8;  else
     if (bw <= 16)  nbw = 16; else
     nbw = 32;
   
   where 'bw' is the real bit width, and 'nbw' the normalized bit width. 
   
   However, for  the time being, Anubis  1 sets nbw =  32. This should be  worked out, and
   maybe, we should also consider 24 as an acceptable normalized bit width.
   
   
     
  
   
      *** (6.5) Implementation of C structure pointer types. 
   
   A datum of type '£StructPtr(Name)' is as follows:
   
      *
      |
      V
      +--------+--------+
      |  cnt   |    *   |
      +--------+----|---+
      0        4    |   8
                    V
                    C structure
      
   i.e.   it is  a pointer  to a  segment of  8 bytes  containing only  a counter  for the
   garbage collector, and a pointer  to the actual C structure of name  'Name'. Names of C
   structures are declared in 'bytecode.h'.
   
   
   
   
   
   
   *** (7) Code generation. 

   The   function    'compile_term'   (in   'compile.c')   generates    byte   code   from
   interpretations. Below, we describe how this code is generated.
   
   
      *** (7.1) General conventions. 
   
   Code generation is a recursive process (just because interpretations themselves are of
   a recursive nature).  The code resulting from the compilation of an interpretation
   should always behave the same way with respect to the stack and the register R.  Our
   conventions are the following.
   
   First of all, consider an interpretation '(head . env)'. We may of course replace all
   occurrences of a type unknown in 'head' by its value as given by 'env'. Hence, we
   consider only interpretation heads, and assume that they contain no unknowns.
   
   An interpretation head must be compiled relative to a context. This context is a
   precise description of what the stack will contain at run time. We represent contexts
   by lists, the head of the list being the top of stack. Each element in the list
   corresponds to a slot in the stack. Furthermore, the stack in Anubis 1 is homogeneous.
   Each slot is a 32 bits word, able to contain a datum of any type.
   
   The Exprs which are listed in a context are the following:
   
      (<symbol> . <type>)      
   
   This  means that  this slot  contains a  datum of  type <type>,  and that  its  name is
   <symbol>.
   
      (ret . <arity>)
   
   This means  that this  slot contains  a return address,  corresponding to  a call  to a
   function of <arity> arguments.
   
      (f_micro_ctxt <fname> <ftype> (<symbol> . <type>) ...)
   
   This slot contains a 'closure' (i.e. a function defined by '|->').  '<fname>' is either
   'nil' (in the case of '|->'), or "f" (in the case of '|-f->'). '<ftype>' is the type of
   the function. The list of '(<symbol> . <type>)' is the micro context proper. It
   indicates in which order the data remembered by the function are placed in the memory
   segment of the closure.
   
      (argument . <type>)
   
   This means that the slot is occupied by a datum which has been pushed onto the stack
   as an argument to a function call. This datum is anonymous, but its type is known.

   For example, you may see below how the compiler prints the context:
   
        (
          (argument . "SSL_Connection")
          (argument . "Bool")
          (argument . "DenialOfService")
          (ret . ...)
          ("connection" . "SSL_Connection")
          (f_micro_ctxt nil ... ("sites" app_ts "List" "Web_Site_Description") 
                                ("dos" . "DenialOfService"))
          (ret . ...)
        )
   
   into a '.sc' file:
   
         |       ;--- stack ------------------------
         |       ;   0                            SSL_Connection
         |       ;   1                            Bool
         |       ;   2                            DenialOfService
         |       ;   3                            (return address)
         |       ;   4     connection             SSL_Connection
         |       ;   5.0   sites                  List(Web_Site_Description)
         |       ;   5.1   dos                    DenialOfService
         |       ;   6                            (return address)
         |       ;----------------------------------

   
   If '<head>' is an interpretation head, and 'Ctxt' is a context, we denote by:
   
                                        [<head>]Ctxt
   
   the sequence of instructions resulting  from compiling '<head>' relative to the context
   'Ctxt'. 
   
   Convention  1:  The   stack  after  execution  of  the   sequence  of  instructions  in
   '[<head>]Ctxt' is the same as before this execution.
   
   Convention 2: After the execution of '[<head>]Ctxt', R contains the value of '<head>'. 
   
   
   
   
   
      *** (7.2) Elimination of terminal recursion. 
   
   Since the language has no notion of loop, it is of fundamental importance to eliminate
   terminal recursion. Roughly speaking, this amounts to replacing the sequence:
   
     (apply . k) 
     (ret . 0) 
   
   by:
   
     (collapse . k)
     (apply . k)
   
   Indeed, when the first sequence is about to be executed, the situation is as
   follows:
   
       stack                       R
       --------------------------------------------
       a_1 ... a_k r s ...         f
   
   where 'a_1,...,a_k' are  the arguments of the function, 'r' the  return address and 'f'
   the function itself. 's' is the return address of the previous call. 
   
   '(apply . k)' will perform a jump to the code of the function (found in 'R'), and after
   the return from  this function call, 'a_1 ...  a_k r' have been removed  from the stack
   which becomes:
   
                     s ...
   
   Then the instruction '(ret  . 0)' removes 's' from the stack  and jumps to address 's',
   so that the stack becomes:
   
                       ...
   
   The other sequence of instructions works  as follows. '(collapse . k)' removes 'r' from
   the stack, which becomes:
   
         a_1 ... a_k s ...
   
   Then '(apply . k)' calls the function 'f', and after the return of this call, the stack
   is:
   
                       ...
   
   as expected. 
   
   Actually, what this does is just avoid performing the instruction '(ret . ?)' twice
   in a row (once with address 'r', and once with address 's').
   
   Actually, the situation is more complex because, in general, we have the arguments of
   a previous call between 'r' and 's', so that the stack looks like this:
   
       a_1 ... a_k r b_1 ... b_i s ... 
   
   When we return from the call to 'f', the stack becomes:
   
                     b_1 ... b_i s ... 
   
   and the next instruction  cannot be '(ret . i)' (except if i ==  0), because the top of
   stack 'b_1' is  not a return address.  In this situation,  terminal recursion can still
   be eliminated.  If the recursion is terminal, the next instructions are:
   
      (del_stack ...)
      ...
      (del_stack ...)
      (ret . i)
   
   (as many 'del_stack' or 'collapse' as there are arguments, i.e. 'i'). At that point the
   machine deletes the arguments from the stack and performs the 'ret'.
   
   So the general situation is the following sequence of instructions:
   
      (apply . k) 
      (del_stack_? . <depth>)
      (collapse . <depth>)
      ...
      (ret . i) 
  
   and the stack is the following:
   
       stack                                   R
       --------------------------------------------
       a_1 ... a_k r b_1 ... b_i s ...         f
   
   We replace this sequence by the following one:
   
      (collapse . k) 
      (del_stack_? . k+<depth>)
      (collapse . k+<depth>)
      ...
      (apply . k) 
   
   Here is how it works: 

      Instructions                              Stack
      ----------------------------------------------------------------------------------
                                                a_1 ... a_k r b_1 b_2 b_3 ... b_i s ...
      (collapse . k)                              a_1 ... a_k b_1 b_2 b_3 ... b_i s ...
      (del_stack_? . k+<depth>)                       a_1 ... a_k b_2 b_3 ... b_i s ...
      (collapse . k+<depth>)                              a_1 ... a_k b_3 ... b_i s ...
      ...
                                                                      a_1 ... a_k s ...
      (apply . k)                                                                   ...
   
   The main  difference with  the original  code is just  that 'b_1  ... b_i'  are deleted
   before the call to 'f'.
   
   
   
   
   
   
   
   
   
      *** (7.3) Generating the code. 
   
   
   
         *** (7.3.1) Protect. 
   
   The interpretation head has the following form:
   
      (protect <lc> . <head>)
   
   The code generated is:
   
       [(protect <lc> . <head>)]Ctxt = 
   
         label a:
            protect
            [<head>]Ctxt
            unprotect a
            <end code>
   
   The 'protect' instruction  is of size 2  (2 bytes).  The first byte  is the instruction
   name 'protect', and the  second byte is the 'lock'. This byte is  zero when the code is
   generated.
   
   At run time, the instruction 'protect' checks the value of the lock (which may be 0 or
   1). If it is 1, the subsequent code is locked, and the instruction gives up. If it is
   0, the piece of code is not locked. So the instruction puts 1 in the lock so that the
   subsequent code is now locked, and IP (Instruction Pointer) is incremented by 2.
   
   The subsequent code '[<head>]Ctxt' is executed. Then the instruction 'unprotect' is
   reached. This instruction puts 0 into the byte at offset 'a+1' (i.e. into the lock) so
   that the code is unlocked again.
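The lock mechanism can be modelled in C as follows. This is an illustrative sketch, not the actual interpreter loop; note that the lock byte lives inside the generated code itself and is patched at run time:

```c
/* 'protect' is 2 bytes: the opcode, then a lock byte (0 when the code
   is generated). Returns 1 if the protected region was entered. */
static int try_protect(unsigned char *code, unsigned *ip) {
    if (code[*ip + 1] != 0)
        return 0;          /* already locked: the instruction gives up */
    code[*ip + 1] = 1;     /* lock the subsequent code */
    *ip += 2;              /* IP is incremented by 2 */
    return 1;
}

/* 'unprotect a' clears the lock byte at offset a+1. */
static void do_unprotect(unsigned char *code, unsigned a) {
    code[a + 1] = 0;
}
```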
   
   
   
         *** (7.3.2) Lock. 
   
         *** (7.3.3) Global symbols. 
   
   Global symbols are those which are defined by paragraphs. 
   
   
         *** (7.3.4) Strings, Int32, small data and Floats. 
   
         *** (7.3.5) Local symbols. 
   
         *** (7.3.6) Microlocal symbols.
   
         *** (7.3.7) Closures.
   
   The code generated must construct a closure. Hence, it begins with the instruction 'alloc'.
   
       [(closure <lc> (f_micro_ctxt fname ftype (sym . type)...) <args> . <body>)]Ctxt =
   
          alloc 3+n                 32 bits words (n = size of micro context)
          put_copy y_0              in closure (at 3+0 word)
          ...
          put_copy y_n-1            in closure (at 3+n-1 word)
          put_closure_labels a d
          <end code>
   
   'n' is the  number of values to be put  in the micro context. Each  value is taken from
   the stack,  virtually copied (if needed),  and put at  its place in the  micro context.
   This is  what the instructions 'put_copy  ...'  are doing.   However these instructions
   are  specialised  depending  on  the  type  of  the datum  to  be  put.   The  function
   'get_closure_copy' returns the list of these instructions.  It receives the description
   of  the micro  context  of  the closure  to  be constructed,  and  the current  context
   (description of the current stack).
      
   When this is done, the closure is almost ready. Only the code of the function and the
   code for garbage collection of the closure are missing.  These codes are put into the
   closure by the instruction 'put_closure_labels'.
   
   Of course,  these two codes  must also be  generated by the  compiler. The code  of the
   function is generated as follows:
   
       label a: 
          [<body>]<args>+(<micro_context>)+(ret . k)
          del args
          del_closure
          ret k
      
   When a function represented by a closure is called, the stack is prepared as follows:
   
                       a_1 a_2 ... a_k  c  r ...
   
   (the top  of stack on the  left side), where 'a_1'  ... 'a_k' are  the arguments (first
   argument on  top of stack), 'c'  the closure itself,  and 'r' the return  address. Just
   before the function returns the stack must become:
   
                                           r ...
   
   So the body of the function is compiled relative to a context (denoted
   <args>+(<micro_context>)+(ret . k) above) representing the above stack. When this code
   has finished, the value to be returned by the function is in R.  At that point, we just
   have to delete (virtually) the arguments and the closure from the stack, and return.
  
   
   
         *** (7.3.8) Applicative terms. 
   
         *** (7.3.9) Conditionals.
   
         *** (7.3.10) Selective conditionals. 
   
         *** (7.3.11) Local definitions ('with ...'). 
   
         *** (7.3.12) Delegate. 
   
         *** (7.3.13) Serialize/Unserialize. 
   
   
   
   
   
   
   
      *** (7.4) Peephole optimisation. 
   
   
   
   
   
   
   *** (8) Garbage collector code. 
   
   
   
   *** (9) Equality code. 
   
   
   
   *** (10) Serialisation/unserialisation. 

   Serialisation   is   also   called    'marshalling'.    Anubis   1   has   a   built-in
   serialisation/unserialisation mechanism. In the  next version of Anubis, this mechanism
   should be written in Anubis using macros. It will be much simpler (and safer). 
   
   In principle  any abstract datum could be  serialised. In the current  system, only the
   data of the following types are serialisable:
   
   - the primitive types: String, ByteArray, Int32, Float,
   - defined types, provided that all their components are serialisable. 
   
   Functional types  should in principle  be serialisable, but  they are not in  Anubis 1.
   This would require a more sophisticated  mechanism, which has also something to do with
   dynamically loadable libraries.
   
   
   Serialisation is  not performed by a  piece of code,  but by a piece  of 'pseudo-code'.
   The  same   'pseudo-code'  is  used   for  serialising  and  for   unserialising.   The
   'pseudo-instructions' have just a different  execution. Hence the compiler produces (if
   needed) only one such pseudo-code per serialisable type.
   
   Actually the  pseudo-code is essentially a  description of the  type of the data  to be
   serialised/unserialised.
   
   Note (2006/08/16): I just noticed, looking at the '.sc' file of the web site, that the
   compiler generates pseudo code paragraphs like this one:
   
---------|--------------------------------------------------------
         |
         |         * * * type implementation pseudo code * * *
         |
   42901 |  label 590 
         |       ;--- stack ------------------------
         |       ;   0                            Var(List(Int32))
         |       ;   1                            (return address)
         |       ;----------------------------------
   42901 |    type_large_switch 1336 
   42907 |  label 1336 
   42907 |    large_alt_begin 9
   42909 |    indirect_type_mixed 2 591
   42915 |    pop1
   42916 |    large_alt_end 0
   42918 |    jmp 1335
   42923 |  label 1335 
   42923 |    swap
   42924 |    ret 1
---------|--------------------------------------------------------

   Strange, isn't it?  Because 'Var(...)' is not serialisable!  However, I also checked
   that such pieces of code are never called (by 'serialize/unserialize' instructions). I
   also checked, with the following example:
   
global define One
   truc
     (
       List(String) args
     ) =
   forget(serialize(var(args))). 
   
   that the compiler doesn't see the problem.  This is probably due to the very bad
   implementation of 'Var(...)', which is implemented as a large type, not as a 'Var'
   type. This must be fixed as soon as possible. The best way is to implement 'Var(...)'
   correctly.
 
   
   

      *** (10.1) How it works. 
   
   Serialising means 'transforming into a sequence of bytes'. This sequence of bytes (a
   byte array) may be saved on disk, sent over a network connection etc...  Of course,
   serialisation is interesting because it doesn't lose information. Serialised data may
   be unserialised.
   
   Now, when we  are unserialising a byte array,  expecting a datum of a  certain type, we
   are not always sure  that the byte array is the result of  the serialisation of a datum
   of this  type. Hence,  unserialisation may  produce errors. This  is why  the primitive
   'unserialize' returns a datum of type 'Maybe($T)'. 
   
   Notice that when a datum is serialised, its type is not written into the serialisation.
   Hence, when  unserialising a  datum we need  to know  the type of  the datum.  In other
   words, the type is the 'key' (or the 'grammar') for unserialising. 
   
   Serialisation is fairly simple. It works as follows:
   
     - String: the characters of the string, followed by a zero byte are written. 
   
     - ByteArray: a 4 bytes word is written in little endian order, representing the
     number of bytes in the byte array. Then the bytes of the byte array are written. 
   
     - Int32: The integer is written as a little endian 4 bytes number. 
   
     - Float: The floating point number is written in the IEEE754 8 bytes format. 
   
     - small type:  The datum is written either  as a 0 byte,  1 byte, 2 bytes  or 4 bytes
     word. 
   
     - mixed type: The index of the alternative is first written in the form of a
     byte. Then the components are serialised. But in the case of a small alternative, the
     components are serialized as a 'small datum', i.e. a 32 bits word containing all the
     datum of the alternative (index included). As a consequence, in this case the index
     is redundantly serialized. See the pseudo-instruction 'indirect_type_mixed' for more
     information.
   
     - large type: Same as for mixed types. 
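   The primitive encodings above can be sketched as follows. This is an illustrative
   model only, not the actual VM code: the function names are invented, and the byte
   order of the Float encoding is assumed to be little endian (the text only fixes the
   IEEE754 8 byte format).

```python
import struct

# Hypothetical models of the primitive encodings (not the VM code).

def ser_string(s):
    # String: the characters followed by a terminating zero byte.
    return s.encode("utf-8") + b"\x00"

def ser_bytearray(b):
    # ByteArray: a little-endian 4 byte length word, then the bytes.
    return struct.pack("<I", len(b)) + bytes(b)

def ser_int32(n):
    # Int32: a little-endian 4 byte word.
    return struct.pack("<i", n)

def ser_float(x):
    # Float: the IEEE754 8 byte format (byte order assumed little endian).
    return struct.pack("<d", x)

def ser_mixed(index, components):
    # Mixed type: the alternative index as one byte, then the already
    # serialized components.
    return bytes([index]) + b"".join(components)
```

   For instance, a 'cons' alternative holding a String component would be written as its
   index byte followed by the zero-terminated characters of the string.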
   
   The  result of  serialisation  is  a byte  array  which needs  to  be allocated  before
   serialisation begins. The virtual machine has special registers for serialisation:
   
     U8 *serial_buf;              buffer for serialisation
     U32 serial_size;             size of this buffer
     U32 serial_next;             next free position in the buffer
     Bool unserial_failed;        flag for unserialisation
   
   When serialisation begins (i.e. when the instruction 'serialize' is executed), the
   datum to be serialised is on top of the stack. The instruction 'serialize' has an
   address operand pointing to the pseudo code for the type of this datum. This
   instruction works like a call (it calls the pseudo code), hence must push a return
   address. However, the return address is inserted just below the datum to be
   serialized. The instruction also changes the 'work sort' of the virtual machine from
   'computing' to 'serializing', and allocates a data segment of some default size whose
   address is put in 'serial_buf'. 'serial_size' is initialized to the size of this
   segment, and 'serial_next' is initialized to 8 = 4+4, so as to accommodate the
   garbage collector counter and the word for the size of the resulting byte array. Of
   course, if the segment cannot be allocated, the instruction gives up, waiting for the
   next time.
   
   From now on, we are serializing, executing some pseudo-code. This pseudo-code ends
   with the instructions 'revert_to_computing' and 'ret'.
   
   'revert_to_computing'  changes  the  work   sort  from  'serializing'  to  'computing',
   reallocates the  resulting byte array  at the  right size, puts  the size at  the right
   place  in the  byte  array, puts  the  new  byte array  in  R, and  a  null pointer  in
   'serial_buf'.
   
   'ret' returns from the pseudo-code to the normal code. 
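   The register discipline above can be modelled roughly as follows. This is a sketch
   under assumptions: the default segment size and the exact layout of the 8 byte header
   are guesses based on the text, and the real VM of course works on raw memory, not on
   Python objects.

```python
DEFAULT_SIZE = 256  # assumed default segment size

class Serializer:
    def __init__(self):
        # 'serialize': allocate a segment and reserve 8 bytes = 4 (GC
        # counter) + 4 (size word) at the front of the buffer.
        self.serial_buf = bytearray(DEFAULT_SIZE)
        self.serial_size = DEFAULT_SIZE
        self.serial_next = 8

    def write(self, data):
        # Pseudo-code instructions append bytes at 'serial_next'.
        end = self.serial_next + len(data)
        assert end <= self.serial_size  # the real VM would give up here
        self.serial_buf[self.serial_next:end] = data
        self.serial_next = end

    def revert_to_computing(self):
        # Trim the buffer to its used size, store the payload size in the
        # header (assumed not to count the 8 header bytes), and release
        # the serialisation registers.
        size = self.serial_next - 8
        result = (bytes(self.serial_buf[:4])
                  + size.to_bytes(4, "little")
                  + bytes(self.serial_buf[8:self.serial_next]))
        self.serial_buf = None   # a null pointer goes back in serial_buf
        return result
```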
   
   
   
   Unserializing works as follows. 
          
   The byte array 'b' to be unserialized is in R. The 'unserialize' instruction has an
   address as its single operand:

   unserialize <addr>

   It is the address of a piece of pseudo-code, namely the pseudo-code for the type 'T' of
   the datum to be extracted from the byte array.

   What 'unserialize' does is just:

        1. put (the address of) 'b' into the 'serial_buf' register, and
           initialize related registers ('serial_size' and 'serial_next'),
        2. put 0 in the flag 'unserial_failed'
        3. push a return address 'r',
        4. put the machine in the state 'unserializing', 
        5. jump to <addr>

   Hence, just after 'unserialize' has executed, we have:

        R            stack          serial_buf    unserial_failed
     (empty)         r ...              b               0

   The execution of the pseudo code yields one of the following two situations:

   case 1: unserialization succeeded

        R            stack          serial_buf    unserial_failed
        d              ...              b               0

   case 2: unserialization did not succeed
            
        R            stack          serial_buf    unserial_failed
        d              ...              b               1
             
   In both cases, we get a datum 'd' of type 'T'. However, 

     - if 'unserial_failed'  is 0,  d is  the datum successfully  extracted from  the byte
       array,
   
     - if 'unserial_failed' is 1, d is a valid datum of type T, at least from the point of
       view of the garbage collector. Nevertheless, it is not meaningful.

   The return address 'r' has been removed from the stack.  The byte array 'b' is still in
   the 'serial_buf' register.   At that time, the machine is  still in the 'unserializing'
   state. The next instruction:

        revert_to_computing <width>

   has the responsibility of testing 'unserial_failed', virtually deleting 'b', putting
   the machine in the state 'computing', and:

   case 1 (unserial_failed == 0):
   
     - put 'd' into a 'success', so as to construct a datum of type 'Maybe(T)', and push a
       zero on the stack. Hence, the situation becomes:

        R                stack           serial_buf        unserial_failed
        success(d)       0 ...           (empty)           (empty)

   case 2 (unserial_failed == 1):
   
     - push 'd' on the stack, and put  a zero ('failure' of type 'Maybe(T)') in R.  Hence,
       the situation becomes:

        R               stack           serial_buf        unserial_failed
        failure         d ...           (empty)           (empty)

   The next instruction is:

        del_stack_? 0

   which will correctly delete the top of the stack, hence yielding in any case the
   following situation:

        R            stack          serial_buf    unserial_failed
        result           ...           (empty)           (empty)

   where result is of type 'Maybe(T)'. 

   So, if the initial situation is:

        R         stack
        (empty)       ...

        after execution of:

        [term]
        unserialize <addr>
        revert_to_computing <width>
        del_stack_? 0

   it becomes:

        R         stack
        d           ...

   where 'd' is the result of the unserialization (of type 'Maybe(T)'). 
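   To see why the result must be of type 'Maybe(T)', consider a toy unserializer for the
   String encoding (a hypothetical Python model, not the VM code; 'None' plays the role
   of 'failure'):

```python
# Unserialize a String from buf starting at offset 'next'.
# Returns (value, new_next) on success, or None ('failure' of
# Maybe(String)) when the bytes cannot be a serialized String.
def unser_string(buf, next):
    end = buf.find(b"\x00", next)
    if end < 0:
        return None      # no terminating zero byte: failure
    return buf[next:end].decode("utf-8", errors="replace"), end + 1
```

   An arbitrary byte array may simply lack the terminating zero byte, so a plain result
   of type 'String' could not be returned honestly.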

   Notice that the <width> operand is used by 'revert_to_computing' in order to construct
   'success(d)'. Indeed,

     - if <width> <= 31,   success(d) is just     (d<<1)|1
     - if <width> >= 32,   success(d) requires a new data segment of two words, with:
 
        - at offset 0:   counter (= 1)
        - at offset 4:   d 
             
   and  the pointer  to this  data  segment must  be ORed  with  1 (mixed  index for  type
   'Maybe').
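   This tagging rule can be checked with a small model (a sketch only: the heap is a
   dictionary, pointers are modelled as even integers, and 'fresh_addr' stands for
   whatever address the allocator would return):

```python
# Build 'success(d)' for a datum of the given bit width.
# 'heap' is a dict modelling data segments: address -> list of words.
def make_success(d, width, heap, fresh_addr):
    if width <= 31:
        # Small datum: shift left and set the mixed index bit.
        return (d << 1) | 1
    # Wide datum: allocate a two-word segment (GC counter, then d) and
    # OR the (4 byte aligned, hence even) pointer with the mixed index 1.
    heap[fresh_addr] = [1, d]    # offset 0: counter = 1, offset 4: d
    return fresh_addr | 1

FAILURE = 0  # 'failure' of type Maybe(T) is the null word
```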

   To summarize, the code for (unserialize <lc> <type> . <term>) is:

        [<term>]
        unserialize <addr>
        revert_to_computing <width>
        <end code>

        except if the declared type is ByteArray. In that case, we have:

        [<term>]
        success 32            needed to put the byte array into a 'success'
        <end code>
   
   
   
   
      *** (10.2) Descriptions of small types. 
   
   They are lists of integers defined by the following grammar: 
   
   <na> :=   8 bits integer (number of alternatives)
   <iw> :=   8 bits integer (index width)
   <nc> :=   8 bits integer (number of components)
   
   <small type descr> :=    <iw> <na> <alt descr> ...   (as many <alt descr> as <na>)
   
   <alt descr> := <nc> <small type descr> ...   (as many as <nc>)

   Hence a small type description is a list of 8 bit integers. These integers become part
   of serialization/unserialization instructions like 'type_8', 'type_16', etc... (see
   'vminstr.c'). One question is: `how long may this description be?'. The answer is not
   so obvious! Remember that the number of alternatives is limited to 256, and likewise
   for the number of components in an alternative. Now, if 'iw' is the number of bits
   required for the index, the total width of the components is at most 'bw - iw', where
   'bw' is the bit width of our type. So, let 'dl(iw,bw)' be the maximal length for a
   description of a type of index width 'iw' and bit width 'bw'. We have:
   
                       dl(iw1,bw1) <= 2 + (2^iw1 * (1 + dl(iw2,bw1 - iw1)))
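   The shape of the grammar can be checked with a small recursive parser (a sketch; it
   only walks a description and returns the index just past it, i.e. its length when
   started at index 0):

```python
# Parse a small type description starting at buf[i]; return the index
# just past it.  <small type descr> := <iw> <na> <alt descr> ... (na times)
def parse_descr(buf, i):
    iw, na = buf[i], buf[i + 1]
    i += 2
    for _ in range(na):
        i = parse_alt(buf, i)
    return i

# <alt descr> := <nc> <small type descr> ... (nc times)
def parse_alt(buf, i):
    nc = buf[i]
    i += 1
    for _ in range(nc):
        i = parse_descr(buf, i)
    return i
```

   For example, a Bool-like type (index width 1, two alternatives with no components) is
   described by the four bytes [1, 2, 0, 0].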
   
   
   
   
   
   
      *** (10.3) Pseudo-instructions set. 
   
   The official list of pseudo-instructions  is defined in 'bytecode.h'. Recall that these
   instructions have two meanings: one for serialisation and one for unserialisation. 
   
   
                             action while                      action while
                             serialising                       unserialising
   ---------------------------------------------------------------------------------------
   swap                      swaps the top of stack            nothing
   
   type_0                    nothing                           nothing
   type_8                    
   type_16
   type_32
   type_String
   type_ByteArray
   type_Int32
   type_Float
   
   indirect_type_0
   indirect_type_8
   indirect_type_16
   indirect_type_32
   indirect_type_String
   indirect_type_ByteArray
   indirect_type_Int32
   indirect_type_Float
   indirect_type_mixed
   indirect_type_large
   
   type_mixed_switch
   type_large_switch
   
   
   mixed_alt_begin
   large_alt_begin
   mixed_alt_end
   large_alt_end
   
   type_small_alt
   
   revert_to_computing
   
   type_mixed (obsolete; never used)
   type_large (obsolete; never used)
   small_mixed_alt (obsolete; never used)
   
   
      *** (10.4) Serialising. 
   
   
   
      *** (10.5) Unserialising. 
   
   
   
      The machine is now doing unserialization. When unserialization begins (just after
      the instruction 'unserialize'), the machine is in the following state: 

                R            stack          serial_buf    unserial_failed
             (empty)         r ...              b               0

      where 'r' is a return address (to be removed by the 'ret' at the end of the pseudo
      code), and where 'b' is a byte array.

      During unserialization, the registers serial_buf and serial_size remain constant. On
      the contrary,  serial_next is incremented while  the byte array is  read.  It always
      gives the offset (relative to serial_buf) of the next byte to be read.

      Below, we simulate  a successful unserialization for various  types. The instruction
      'swap' does nothing during unserialization.

      Below, 'p' is a pointer to a newly allocated segment, whose size (in bytes) is given
      by the instruction ?_alt_begin. 'e' is the result of unserialization when
      instructions call another pseudo code. 'pop1' does not pop anything, but puts the
      content of R at the address pointed to by the top of stack, and increments that
      pointer. 'indirect' instructions which do not call another pseudo code put their
      result at the address pointed to by the top of stack.

small type: Bool
                                                   R            stack 
      50 |  label 20                            (empty)         r ... 
      50 |    type_8                               d            r ... 
      51 |    swap
      52 |    (ret . 1)                            d              ... 


primitive type: String

                                                   R            stack    
      50 |  label 4                             (empty)         r ...    
      50 |    type_String                          d            r ...    
      51 |    swap
      52 |    (ret . 1)                            d              ...    


mixed type: List(String)

                                                   R            stack    
      50 |  label 15                            (empty)         r ...    
      50 |    (type_mixed_switch 2 30 29)
      61 |  label 30                            (empty)         r ...    
      61 |    type_32                              d            r ...    
      62 |    (jmp . 28)
      67 |  label 29 
      67 |    (mixed_alt_begin . 12)            (empty)  p+4  p r ...    
      69 |    indirect_type_String              (empty)  p+8  p r ...    
      70 |    (indirect_type_mixed 2 . 15)         e     p+8  p r ...    
      76 |    pop1                              (empty)  p+12 p r ...    
      77 |    (mixed_alt_end . 1)               d = p|1         r ...    
      79 |    (jmp . 28)
      84 |  label 28 
      84 |    swap                                 d            r ...    
      85 |    (ret . 1)                            d              ...    


large type: Printable_tree          (defined in basis.anubis)

                                                   R            stack    
     130 |  label 20                            (empty)         r ...    
     130 |    (type_large_switch 5 42 41 40 39 38)
     152 |  label 42 
     152 |    (large_alt_begin . 5)             (empty)  p+5  p r ...    
     154 |    (large_alt_end . 0)               d = p|0         r ...    
     156 |    (jmp . 37)
     161 |  label 41 
     161 |    (large_alt_begin . 13)            (empty)  p+5  p r ...    
     163 |    indirect_type_String              (empty)  p+9  p r ...    
     164 |    (indirect_type_large . 20)           e     p+9  p r ...    
     169 |    pop1                              (empty)  p+13 p r ...    
     170 |    (large_alt_end . 1)               d = p|1         r ...    
     172 |    (jmp . 37)
     177 |  label 40 
     177 |    (large_alt_begin . 10)            (empty)  p+5  p r ...
     179 |    indirect_type_8                   (empty)  p+6  p r ...
     180 |    (indirect_type_large . 20)           e     p+6  p r ...
     185 |    pop1                              (empty)  p+10 p r ...
     186 |    (large_alt_end . 2)               d = p|2         r ...
     188 |    (jmp . 37)
     193 |  label 39 
     193 |    (large_alt_begin . 13)            (empty)  p+5  p r ...
     195 |    indirect_type_Int32               (empty)  p+9  p r ...
     196 |    (indirect_type_large . 20)           e     p+9  p r ...
     201 |    pop1                              (empty)  p+13 p r ...
     202 |    (large_alt_end . 3)               d = p|3         r ...
     204 |    (jmp . 37)
     209 |  label 38 
     209 |    (large_alt_begin . 13)            (empty)  p+5  p r ...
     211 |    (indirect_type_large . 20)           e     p+5  p r ...
     216 |    pop1                              (empty)  p+9  p r ...
     217 |    (indirect_type_large . 20)           e     p+9  p r ...
     222 |    pop1                              (empty)  p+13 p r ...
     223 |    (large_alt_end . 4)               d = p|4         r ...
     225 |    (jmp . 37)
     230 |  label 37 
     230 |    swap                                 d            r ...
     231 |    (ret . 1)                            d              ...
      

    Hence, instructions work as follows during unserialization (assuming no error occurs). 

      'direct type' instructions: 

        type_0
        type_8
        type_16
        type_32
        type_Int32
        type_String
        type_ByteArray
        type_Float
        --> unserialize a datum (by reading the byte array) and put it in R        

      'switch' instructions:
        type_mixed_switch
        type_large_switch
        --> unserialize an index (always a byte), check it using the second operand (number 
          of alternatives), and jump to the right label. 

      'alt_begin' instructions:
        mixed_alt_begin
        large_alt_begin
        --> allocate a segment, push the pointer, duplicate it, and add either 4 or 5 to the 
          copy. 

      'indirect non calling type' instructions:
        indirect_type_0
        indirect_type_8
        indirect_type_16
        indirect_type_32
        indirect_type_Int32
        indirect_type_String
        indirect_type_ByteArray
        indirect_type_Float
        --> unserialize a datum and put it at the address pointed to by pointer on top of 
          stack. Increment this pointer by the size of the datum. 

      'indirect calling type' instructions:
        indirect_type_mixed 
        indirect_type_large
        --> Call a pseudo code. This has the effect of putting an unserialized datum in R 
         upon return. 

      'pop1'
        --> put the content of R at the address pointed to by the top of stack, and increment
         that pointer (always by 4). 

      'alt_end' instructions:
        mixed_alt_end
        large_alt_end
        --> remove the pointer on top of stack, put the next one in R, and OR it with the 
         index (given by the instruction itself). 


    Now, unserialization may fail, because the bytes in the byte array are arbitrary. As
    soon as such a failure is detected, the flag 'unserial_failed' is set to 1. In that
    case, we must continue a 'fake' unserialization, because we must complete partially
    constructed data. This is the reason why all instructions must first test the flag.

    If the flag is 0 they operate as described above. Nevertheless, since they may detect
    a problem, they may also put 1 in the flag.

    'Fake execution' of the above instructions is as follows: 

      'direct type' instructions:    put a value of 0 in R as the result of the unserialization
      'switch instructions':         jump to the first alternative, which always exists
      'alt_begin' instructions:      do not allocate a segment, but do as if, by pushing a NULL
                                       pointer and duplicating it (hence we have two NULL
                                       pointers on top of stack)
      'indirect non calling type' instructions: there are two cases:
         - the two pointers on top of stack are NULL:
             do not do anything
         - the two pointers on top of stack are non NULL:
             put a value of zero as the result of unserialization, and increment the topmost
             pointer by the right value. 
      'indirect calling type' instructions: again there are two cases:
         - the two pointers on top of stack are NULL:
             do not do anything
         - the two pointers on top of stack are non NULL:
             do not call any pseudo code, and put 0 in R. 
      'pop1': two cases:
         - the two pointers on top of stack are NULL:
             do not do anything
         - the two pointers on top of stack are non NULL:
             put 0 at the address pointed to by the top of stack, increment the pointer.
      'alt_end' instructions: two cases:
         - the two pointers on top of stack are NULL:
             remove the two NULL pointers and put a NULL pointer in R,
         - the two pointers on top of stack are non NULL:
             remove the topmost pointer, and put the other one in R, and OR it with the index.

      Notice that any instruction which detects a failure, and puts 1 in the flag, should
      then behave as if the flag was already set. 
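      The flag discipline can be sketched for a 'direct type' instruction (a hypothetical
      Python model; the real instructions live in 'vminstr.c'):

```python
class Unserializer:
    def __init__(self, buf):
        self.buf = buf
        self.next = 0        # the 'serial_next' register
        self.failed = False  # the 'unserial_failed' flag

    # Model of 'type_32' during unserialization: every instruction first
    # tests the flag; on failure it yields a dummy zero so that partially
    # built data stays well formed for the garbage collector.
    def type_32(self):
        if self.failed:
            return 0
        if self.next + 4 > len(self.buf):
            self.failed = True   # not enough bytes: failure detected
            return 0
        v = int.from_bytes(self.buf[self.next:self.next + 4], "little")
        self.next += 4
        return v
```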

     
      Finally, we must also discuss 'revert_to_computing'. After the unserialization, the state of
      the machine is: 

                R            stack          serial_buf    unserial_failed
                d              ...              b               0/1

      where 'd' is a datum, which is significant only if unserial_failed is 0. 

      In any case, the byte array in serial_buf must be virtually deleted, and a zero must
      be put in serial_buf (this will be checked by 'unserialize'/'serialize').

      If unserial_failed is: 
 
            0      then  construct 'success(d)' in R (use the <width> operand to know the 
                     bit width of type of d), and push a zero in the stack, so that
                     the next instruction (del_stack_?) may operate. 
            1      then  push R in the stack (it will be deleted by the next instruction), 
                     and put 0 (failure of type 'Maybe(T)') in R. 
        
      Note: the sequence of instructions:

          unserialize <addr>
          revert_to_computing <width>
           
      is always followed by:

      del_stack_? 0               (for the type of the unserialized datum)
   
   
   
   
   *** (11) Outputting the code (making an *.adm file). 
   
   
   
   
   
   *** (12) How to make a system of macros? 
   
   
   
      *** (12.1) Overview.
   
   Anubis has no system of macros, but should have one. This would (as already remarked)
   induce many simplifications in the system, and at the same time provide powerful
   programming tools for the user. The best system of macros ever created is that of the
   Lisp language. This system was natural in Lisp because of the particularly simple form
   of its syntax. Of course, the syntax of Anubis is much more complex, but as we shall
   see, constructing a system 'à la Lisp' is still possible.
   
   What is a language good for? It is good for representing 'things' (belonging to some
   'target' world) by 'expressions'. For example, the natural language represents things
   of the real world by words, sentences, etc... but also abstract things like
   statements, etc... Of course, each language has a syntax (grammar) but also a
   semantics. The 'semantics' is the correspondence between the expressions (the
   'signifiers') and the things they represent (the 'signified').
   
                                       the semantics
                                             |
            symbolic side (language)         |        semantic side (target world)
        -------------------------------------+----------------------------------------
                                             |
                                             |
                 expression   |----------------------------> thing 
                (signifier)                  |               (signified)
                                             | 
   
   
   Remark that in most cases (programming languages, mathematics, but not the natural
   language), the symbolic side is more concrete (less abstract) than the semantic
   side. Indeed, for example in mathematics, we have many expressions for representing
   the number (say) 2, like 2, 1+1, 5-3, etc... and these expressions may be manipulated
   using computation rules. So, this is in some sense very concrete. On the contrary, the
   signified, i.e. the mathematical object 2, is a complete abstraction by its very
   nature.
   
   So, we use the language for manipulating the things (or at least we give ourselves the
   illusion that we manipulate the things, because what we actually manipulate are the
   expressions, for example when we are computing). This leads us in some circumstances
   to write down the same expressions, or at least similar expressions, several times.
   So, we may want to make this writing automatic. In order to do that, we need to
   consider the expressions themselves as a new sort of things (the 'meta-things') and to
   create a new language (the 'meta-language') for manipulating the meta-things:
   
                         meta-semantics                semantics
          meta-language         |     language              |   target world
       -------------------------+---------------------------+--------------------------   
                                |                           |
         meta-expression |-----------> expression |--------------> thing
                                |                           |
                                |                           |

   In most cases, the meta-language is different from the language and simpler. This is
   the case for the C language, where the expressions of the meta-language are all those
   which begin with: #define, #if, etc... (a very poor language actually). 
   
   However, in Lisp the meta-language is Lisp itself. The creators of Lisp were truly
   inspired when they did that. This has several consequences:
   
      1. The meta-language is as powerful as the language itself.
      2. The language is also the meta-meta-language, the meta-meta-meta-language, etc...
   
   Fans of Lisp (like me) know how powerful this feature is. 
   
   
   
   
      *** (12.2) The trick. 
   
   In order  to have  the language be  its own  meta-language, it is  enough to  embed the
   language  into its  own target  universe.  In  other words,  it should  be  possible to
   consider all expressions of the language as data of the target universe: 
   
   
                      +------------------------------------------------+
                      |                                                |
                      |  target universe (all sorts of data)           |
                      |                                                |
                      |                                                |
                      |                                                |
                      |                                                |
                      |                                                |
                      |       +--------------------------+             |
                      |       |                          |             |
                      |       |   the language           |             |
                      |       |                          |             |
                      |       |  (a subset of the        |             |
                      |       |         universe)        |             |
                      |       |                          |             |
                      |       |                          |             |
                      |       +--------------------------+             |
                      |                                                |
                      |                                                |
                      |                                                |
                      +------------------------------------------------+
   
   Hence the language, being able to represent all data, may also represent the
   expressions of the language itself. It may also be able to represent expressions which
   themselves represent expressions, and so on, so that the language is at the same time
   the meta-language, the meta-meta-language, and so on... This is the big trick. 
   
   
   
      *** (12.3) How it works. 
   
   Basically, we need only a set of types (called 'syntactical types') for representing
   expressions, and an operator (named the 'expansion' operator) for forcing the compiler
   to evaluate data of these types. We may also have a 'quoting' operator for simplifying
   the writing of data of these types.
   
   One  of the  main types  for representing  expressions in  Anubis should  be a  type of
   'terms':
   
   public type Term:
      symbol(String name), 
      applicative(Term function, List(Term) arguments), 
      function(List(Declaration) arguments, Term body), 
      conditional(Term test, List(Case) cases), 
      ...
   
   The quoting operator (denoted: ' ) should simplify the writing of data of type
   'Term'. For example, instead of writing:
   
      conditional(symbol("x"), 
                  [
                    case([("failure",[])],symbol("false")),
                    case([("success",[symbol("u")])],
                             applicative(symbol("f"),[symbol("u")]))
                  ]
                 )
   
   we would have just to write this:
   
       'if x is 
          {
            failure    then false, 
            success(u) then f(u)
          }
   
   (Remark the quoting mark  in front of this term.) The above term  is not of type 'Bool'
   but just  of type  'Term'.  Thus, the  quoting mark  just means: 'don't  interpret this
   term, just consider it as a term'.
   
   Now, we may write down functions for producing terms (of type 'Term'), type
   definitions and all sorts of 'syntactical concepts', considered as data of our
   semantic universe. From now on, the expressions of the language may be manipulated
   like any other kind of data.
   
   Manipulating expressions as data is called 'meta-programming'. But this is not the
   whole story, because the destiny of a datum of type 'Term', or of some similar type,
   is at the very end to be interpreted (evaluated).
   
   When we want to evaluate a datum (say 't') of type 'Term', we just have to write:
   
      @t
   
   '@' is the 'expansion' (or 'evaluation') operator. In some sense it is the converse of
   quoting. It transforms a datum of type (say) 'Term' into what this datum represents if
   we consider it as an expression.
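   The quoting/expansion pair can be mimicked in any language that can represent its own
   terms as data. A minimal Python sketch (all names here are invented for illustration;
   this is not Anubis code):

```python
from dataclasses import dataclass

# A tiny fragment of the 'Term' type: symbols and applications.
@dataclass
class Symbol:
    name: str

@dataclass
class Applicative:
    function: Symbol
    arguments: list

# '@t': evaluate a quoted term in an environment mapping names to values
# and functions.  This plays the role of the expansion operator.
def expand(term, env):
    if isinstance(term, Symbol):
        return env[term.name]
    if isinstance(term, Applicative):
        f = env[term.function.name]
        return f(*[expand(a, env) for a in term.arguments])
    raise TypeError("unknown term")

# The quoted term 'f(x) is just data of type Term:
quoted = Applicative(Symbol("f"), [Symbol("x")])
```

   Quoting builds the data structure; expansion interprets it, exactly as '@' is meant to
   do for Anubis terms.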
   
   
   
      *** (12.4) An example. 
   
   The most difficult part is to design the types: 'TypeDefinition', 'Definition', 'Term',
   'Case', etc... They should be the perfect image of the whole language. As an example,
   we consider only a small fragment of this family.
   
public type Type:
  scheme     (String name, List(Type) operands),
  functional (List(Type) source_types, Type target_type),
  ...
 
public type Component:
  anonymous  (Type type), 
  named      (Type type, String name). 
   
public type Alternative:
  singleton  (String name), 
  usual      (String name, List(Component) components). 
     
public type TypeDefinition:
  public     (String name,
              List(String) parameters, 
              List(Alternative) alternatives),
  private    (String name,
              List(String) parameters, 
              List(Alternative) alternatives). 
   
public type Paragraph:
   type_definition(TypeDefinition), 
   ...
   
   
   Now, we may want to define this
   
define TypeDefinition
  enumerated_type
    (
      String       type_name
      List(String) alternative_names
    ) =
  private(type_name,
          [],
          map((String n) |-> singleton(n), alternative_names)). 
   
   
define List(String) my_names = ["ga","bu","zo", "meu"].
   
@enumerated_type("Shadok",my_names). 
   
   In this last paragraph (a 'macro-paragraph' beginning with '@'), the term:
   
                                enumerated_type("Shadok",my_names)
   
   is of type  'TypeDefinition'. Because of the presence of '@',  the compiler will expand
   it into an actual paragraph, namely this one:
   
type Shadok:
   ga,
   bu,
   zo,
   meu.
   
   
   The syntax of 'macro-paragraphs' is:
   
@t
   
   (with  '@'  in  the  leftmost  column)  where  t is  a  term  of  type  'Paragraph'  or
   'List(Paragraph)'. In  the case of a list  of paragraphs the compiler  expands the term
   into a sequence of paragraphs.
   
   
   Of course, the expansion operator should also be usable for the expansion of terms (and
   any other  sort of syntactical entity), not  only for paragraphs.  For  example, we may
   write something like this:
   
define Term
   my_cond
     (
       Term          test
       List(String)  alts
     ) =
   conditional(test,
               map((String n) |-> singleton_case(n,length(n)),
                   alts)). 
   
   Now, we may write (within a paragraph):
   
                                    @my_cond(_x,my_names)   
   
   which will be expanded into the term:
   
        if x is 
          {
            ga   then    2, 
            bu   then    2,
            zo   then    2,
            meu  then    3
          }
   
   where 'x' is the expansion of '_x'. 

   The expansion may also be implicit. For example, if a term has an interpretation whose
   type is a syntactical type, the compiler may also try to interpret its expansion. And
   conversely the quoting may also be implicit. However, this may possibly lead to
   problems, and this should be experimented with.