$begin
$define(article)(0)()
$define(documentation)(0)()
$input($anubisdir/library/MAML4/basis.maml)
$input($anubisdir/library/anubis_doc.maml)
$input($anubisdir/library/names.maml)
$htmloptions(justify:true)
$define(term)(1)($code(240,240,240)($att($1)))

$title(The Anubis HTTP server (version 2))
$subtitle(User's documentation)
$center($italic(Alain Prouté))

Before you start reading this documentation, you should compile the program $fname(example_web_site.anubis) in the directory $fname(library/http). This will produce an $em(example web site) and launch the HTTP server on port 2000 (or on 2001 if 2000 is already in use, and so on). Then open a browser and enter the URI $tt(http://127.0.0.1:2000) in the address field to get an idea of some of the features of the HTTP server. Later on, after you have read this documentation, you can use a copy of this example program as a template for constructing your own web site.

$tableofcontents

$section(Introduction)
This document describes how to use the $Anubis HTTP/HTTPS server version 2, written in 2020. The previous $Anubis HTTP/HTTPS server was written in 2003, and it was high time to write a new one from scratch. This new server closely follows the specifications given in several more recent RFCs, namely RFC 7230, RFC 7231, RFC 7232, RFC 7233, RFC 2234 and RFC 3986. Not all the features defined in these RFCs are implemented, but we have implemented what is useful for a well-behaved, general-purpose modern web server, including video streaming, together with some capabilities of our own, some of which were already present in version 1.

Special attention has been paid to security. Web servers can be subject to attacks, and we have implemented passive defenses together with active ones, and we give recommendations to web site designers. Of course, the server can use TLS encryption, which is now recommended in all situations.
The $Anubis HTTP/HTTPS server, although it runs within a single thread of the underlying operating system, can serve several web sites simultaneously and, of course, several clients for each web site. The server can start in HTTP mode, in HTTPS mode, or in both modes.

We have implemented a $em(session) mechanism that makes it possible to follow clients from page to page. For example, if a client chooses a language on the first page, the other pages will be served in this language thanks to the presence of $em(session tickets) within the web pages. This system is automatic. You don't have to worry about it, but you must follow the instructions given below for using it. Tightly linked to session tickets is a login/logout system that is also implemented in this version of the server, so that you have nothing to do in this respect except use the tools described below.

Each web site has a so-called $em(web site directory) on the server. A subdirectory $fname(public) is created within this web site directory, where $em(only public resources) should go. Indeed, the content of this directory and its subdirectories is freely accessible to any client. Confidential resources must be stored outside the directory tree whose root is this $fname(public) subdirectory of the web site directory. A mechanism for $em(private downloading) of files allows only an authorized client to download a given file (which is of course not located in the $fname(public) subtree on the server). Authorizations are produced by strong cryptographic mechanisms.

Because web sites are realized as secondary $Anubis modules, they can be modified and recompiled at any moment, and the server will automatically reload them without stopping. Furthermore, the server monitors its $em(configuration file) and reloads it, again without stopping, whenever it is modified.
Among the projects we have for our web server is the ability to obtain server certificates automatically via the ACME (Automatic Certificate Management Environment) protocol.

$section(Configuring and starting the server)
The $Anubis HTTP/HTTPS server version 2 has several configuration parameters that must be given within a $em(configuration file). The server is started as follows:
$term( anbexec http_server $nt(path of configuration file) )
Of course, you can start several HTTP/HTTPS servers provided that you create several configuration files, but they will need to listen on different ports.

Since listening on ports in the range 0-1023 requires a privilege, $att(anbexec) must be set up correctly if you want to use these ports. The commands for this setup are (under Linux, for example for version 1.19.3):
$term(
sudo chown root ~/bin/anbexec-1-19-3
sudo chmod ug+s ~/bin/anbexec-1-19-3
)
(recall that $att(anbexec) is just a shell script calling $att(anbexec-1-19-3)).

$subsection(Format of a configuration file)

$subsection(What can be dynamically changed in the configuration)

$subsection(How the server interprets standard HTTP headers)
HTTP headers have semantics defined in RFC 7231 and in other RFCs referred to from within RFC 7231.
Cache-Control
Expect: 100-continue
Host:
Max-Forwards (TRACE and OPTIONS only)
Pragma
Range
TE
If-Match
If-None-Match
If-Modified-Since
If-Unmodified-Since
If-Range
Accept
Accept-Charset
Accept-Encoding
Accept-Language
Authorization
Proxy-Authorization
From
Referer
User-Agent

$section(Creating a web site)
The $Anubis HTTP server is a primary module (to be executed by $att(anbexec)), and each web site is loaded as a secondary module. Hence, creating a web site requires the construction of such a module.

$subsection(The type of a web site secondary module)
The type of the secondary modules that define web sites already conveys, by itself, much information on how to construct a web site for the $Anubis web server.
Indeed, from the point of view of this server, a web site is nothing other than a datum of this type. Here it is:
$acode(
public type WebSiteV2:
  web_site(
    List(String) names,
    String       web_site_directory
  ).
)

$subsubsection($att(names))
These are the names that are acceptable as the value of the $tt(Host) HTTP header. For example, this can be $att($["127.0.0.1", "www.my_site.com", "my_site.com"$]). In this example, $att("127.0.0.1") is meant to be used during the development of the web site.

$subsubsection($att(web_site_directory))
This indicates the path of the directory on the server's disk that the web site can use as its $em(dedicated directory) (the so-called $em(web site directory)). Prefer an absolute path, which makes this information independent of where $att(anbexec) is started from. The server creates (if they don't already exist) the following subdirectories in the web site directory:
$list(
$item $fname(public/) This is where only public resources should go. Indeed, the content of this directory and its subdirectories is freely accessible to any client of the web site.
$item $fname(states/) This is where the server stores session states.
$item $fname(members/) This is where the server keeps the database of registered clients (members).
$item $fname(private_download/) This is where the server puts files that are ready for private downloading.
$item $fname(upload_temporary/) This is where the server stores uploaded files before they are moved to another place.
$item $fname(journal/) This is where the server puts information on the history of server events.
)
Of course, you can create other subdirectories in the web site directory, for example for installing a database.

$subsection(Layout and styling)

$subsection(Home page)
Each web site is supposed to have a $em(home page) that can be obtained from the name of the web site alone. For example, the name $fname(google.com), without any further information, yields the Google home page.
This home page is the default entry point for your web site.

$subsection(When the client clicks)
Within a web page of your web site, a client can click on various buttons and links. Such a click can have one of several kinds of effects:
$list(
$item trigger a purely local action (JavaScript program) that does not require a connection to a server (local action),
$item leave the web site and visit another one (foreign link),
$item download a file or other resource from your web site that does not need to be computed on the fly (server link),
$item trigger an action on your server that will compute another web page on the fly (server action).
)

$subsection(HTML elements and CSS)

$subsection(Web sockets)

$subsection(Login/logout)
The server has a ready-to-use mechanism for login/logout. You have to use the HTML element provided below, which appears as a login form if the client is not logged in and as a logout form otherwise. The appearance of this gadget is determined by some customizable CSS.

From now on, we call a client a $em(visitor) if he/she is not logged in, and a $em(member) if he/she is. Session tickets are provided for both visitors and members.

$subsection(Session tickets and session states)
The server remembers $em(session information) attached to each client. Of course, this session information is different for visitors and for members. Session information is never transmitted over the network. It is stored within the $fname(states/) subdirectory of the web site directory. A ticket (a cryptographic hash) is associated with each such state and is placed within the web page sent to the client. When the client triggers an action, the session ticket is sent as $em(form data) to the server. This allows the server to recover the state of the client and to compute a new state for this client.

$subsection(Streaming)
As explained in RFC 7233, HTTP/1.1 is capable of serving a range of bytes from a resource instead of the whole resource.
This can be used for streaming, but also for graceful recovery after a connection is cut in the middle of a download. This version of the $Anubis HTTP server implements this feature, and you have essentially nothing to do with regard to it. You only need to design your web sites with this possibility in mind, for example for the streaming of videos.

$section(Security considerations)

$subsection(Reviewing RFC 7231 recommendations)
A (non-exhaustive) list of possible attacks is given in section 9 of RFC 7231. They are discussed below.

$subsubsection(Attacks Based on File and Path Names)
The question here is that of controlling access to the file system on the server. For example, if the server accepts $att(..) or $att(~) within a URI without any check, a client could possibly download a file located anywhere on the server. The $Anubis server allows access only to the $fname(public/) directory associated with the web site under consideration and to its subdirectories.

It is not necessarily a good idea to completely disallow double dots, because an HTML page located somewhere in the $fname(public/) subtree may refer to an image that is not in one of the subdirectories of the location of this HTML file. For example, the HTML file can refer to $fname(theimage.png) through a relative path that contains $att(..). This is acceptable if $fname(theimage.png) is still within the $fname(public/) subtree. To this end, the server checks that the resource is indeed within the $fname(public/) subtree. If this check fails, an error message $tt(404. Not$~found.) is sent to the client, and the server records the IP address and browser fingerprint in its dictionary of $em(dubious clients). See below how to manage dubious clients.

Isolated dots are normal in URIs, for example in $fname(theimage.png). Even a URI containing $att(.) as one of the directory names does not create a problem.
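
The containment check described above can be sketched as follows. This is only a minimal illustration written in Python, not the server's actual Anubis code: the function name $att(resolve_public_path) and its parameters are hypothetical, and a POSIX file system is assumed. The idea is to normalize the requested path first, and only then to verify that the result is still inside the $fname(public/) subtree, so that harmless uses of $att(.) and $att(..) are accepted while escapes are refused.

```python
import os.path


def resolve_public_path(public_root, uri_path):
    """Return the absolute path of the requested resource,
    or None if it lies outside public_root (hypothetical helper)."""
    root = os.path.realpath(public_root)
    # Join the request path (without its leading slashes) to the root,
    # then normalize: "." and ".." components are collapsed and
    # symbolic links are followed.
    candidate = os.path.realpath(os.path.join(root, uri_path.lstrip("/")))
    # Accept the resource only if the normalized path is still
    # located inside the public subtree.
    if os.path.commonpath((root, candidate)) != root:
        return None
    return candidate
```

With a public root such as $fname(/srv/site/public) (a made-up path), the request path $att(/pages/../images/theimage.png) resolves to a file inside the subtree and is accepted, whereas $att(/../../etc/passwd) resolves outside it and yields $att(None). In the latter case the server would answer $tt(404) and record the client as dubious, as described above.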
$subsubsection(Attacks Based on Command, Code, or Query Injection)
The $Anubis server is largely unaffected by this problem because it never considers anything present in the request line (or in header values) as an executable command. For example, if an attacker inserts SQL commands into the request line, they cannot be executed by the server even if you use an SQL database.

Also, recall that the request line should never contain sensitive information. For example, session tickets are never inserted into the request line. They are transmitted as part of form data or within cookies.

$subsubsection(Disclosure of Personal Information)
This question is to be addressed by the designer of the web site. Of course, encryption should be used for transferring any personal information. The $Anubis server also provides a mechanism for $em(private download) so that only those clients that are allowed can access certain resources (including files), and these resources are of course not located within the $fname(public/) subtree of the server.

$subsubsection(Disclosure of Sensitive Information in URIs)
As explained in RFC 7231, URIs are intended to be shared, not secured, even if they identify secured resources. Anyway, we have already said several times in this documentation that a request line should never contain any sensitive or private information. This is mainly the responsibility of the web site designer.

This has to do with the distinction between the two HTTP methods $att(GET) and $att(POST). In the case of $att(GET), there is normally no body in the request (RFC 7231 section 4.3.1). Sensitive information should preferably be in the body of a $att(POST) request.

Another question is the use of the $att(Referer) HTTP header. This header gives the URI of the resource from which the request originates. The $Anubis server checks that any request that contains a session ticket has a $att(Referer) header pointing to the right resource.
If not, the request is rejected and the client is marked as dubious.

$subsubsection(Disclosure of Fragment after Redirects)
A $em(fragment) is the part of a URI that comes after the $att(#), indicating a precise position within a web page or any other resource, with semantics that depend on this resource. Under some circumstances, this fragment can be forwarded to another web site, which is why it should not contain sensitive information. Again, since the fragment is part of the URI, it should not contain any sensitive information anyway.

$subsubsection(Disclosure of Product Information)
Here the question is that some HTTP headers, namely $att(User-Agent), $att(Via) and $att(Server), reveal which particular software is used by the sender of the message. This information may help attackers, but the $Anubis server is not sensitive to such things.

$subsubsection(Browser Fingerprinting)
According to RFC 7231 section 9.7, browser fingerprinting is a set of techniques for identifying a specific user agent over time through its unique set of characteristics. Here are some HTTP headers that can provide information on the client: $att(From Cookie User-Agent Accept Accept-Charset Accept-Encoding Accept-Language). Because such information should be considered confidential, the web site designer should use it only for a good, honest reason.

Now, this information is also useful for preventing attacks on the server. This is why it is part of the information recorded about dubious clients. It allows the server to reidentify a client previously marked as dubious and possibly to deny access. Together with other methods, this can be an efficient tool against denial of service attacks.

$subsection(Handling dubious clients)
Each time a $em(dubious client) is detected, information about this client is recorded in a dictionary. The server uses an algorithm for handling such clients. You can set parameters in the configuration file that affect how this algorithm works.
The algorithm works as follows.
$list(
$item When a problem arises, administrators are warned by email. The email provides a link to a web page where the administrators can follow the evolution of the situation in real time, and from where they can trigger actions.
$item It associates a $em(level of dubiousness) with each dubious client, and a $em(level of trust) with each registered client.
$item It refuses access to clients depending on their level of dubiousness (configurable).
$item It deletes the information on a dubious client after some time (or never), depending on the level of dubiousness (configurable).
$item It rejects connections whose behavior is suspect (configurable).
$item If it begins to be overwhelmed by connections, it restricts access through a special web page that only allows clients to log in. Clients that are already logged in can continue to operate, unless suspect behavior is detected, also depending on their trust level (configurable).
)
$end