
   $begin
   $define(article)(0)()
   $define(documentation)(0)()
   $input($anubisdir/library/MAML4/basis.maml)
   $input($anubisdir/library/anubis_doc.maml)
   $input($anubisdir/library/names.maml)
   $htmloptions(justify:true) 
   $define(term)(1)($code(240,240,240)($att($1)))

   $title(The Anubis HTTP server (version 2))
   
   $subtitle(User's documentation)
   
   
      $center($italic(Alain Prouté))
   
   
   Before you start reading this documentation, you should compile the program $fname(example_web_site.anubis) in the
   directory $fname(library/http). This will produce an $em(example web site) and launch the HTTP server on port 2000
   (or 2001 if 2000 is already in use, and so on). Then open a browser, enter the URI $tt(http://127.0.0.1:2000) in the
   address field, and you will get an idea of some of the features of the HTTP server. 
   
   Later on, after you have read this documentation, you can use a copy of this example program as a template for
   constructing your own web site.
   

   $tableofcontents

   
   $section(Introduction)
   This document describes how to use the $Anubis HTTP/HTTPS server version 2, written in 2020. The previous $Anubis
   HTTP/HTTPS server was written in 2003, and it was high time to write a new one from scratch. This new server closely
   follows the specifications given in several RFCs, namely RFC 7230, RFC 7231, RFC 7232, RFC 7233, RFC 2234
   and RFC 3986. Not all the features defined in these RFCs are implemented, but we have implemented what is useful for
   a well-behaved, modern, general-purpose web server, including the possibility of streaming videos, together with
   some capabilities of our own, some of which were already present in version 1. 
   
   Special attention has been paid to security. Web servers can be subject to attacks, so we have implemented both
   passive and active defenses, and we give recommendations to web site designers. Of course, the server can use TLS
   encryption, which is now recommended in all situations. 
   
   Although the $Anubis HTTP/HTTPS server runs within a single thread of the underlying operating system, it is capable
   of serving several web sites simultaneously, and of course several clients for each web site. The server can start
   in HTTP mode, in HTTPS mode, or in both modes.
  
   We have implemented a $em(session) mechanism that makes it possible to follow clients from page to page. For example,
   if a client chooses a language on the first page, the other pages will be served in this language thanks to the
   presence of $em(session tickets) within the web pages. This system is automatic: you don't have to worry about it,
   but you must follow the instructions given below to use it. 
   
   Tightly linked to session tickets is a login/logout system, also implemented in this version of the server, so that
   you have nothing to do in this respect except use the tools described below. 
   
   Each web site has a so-called $em(web site directory) on the server. A subdirectory $fname(public) is created within
   this web site directory, where $em(only public resources) should go. Indeed, the content of this directory and its
   subdirectories will be freely accessible by any client. Confidential resources must be stored outside the directory
   tree whose root is this $fname(public) subdirectory of the web site directory. 
   
   A mechanism for $em(private downloading) of files is implemented, which allows only an authorized client to download
   a given file (which is of course not located in the $fname(public) subtree on the server). Authorizations are
   produced by strong cryptographic mechanisms.
   
   Because web sites are realized as secondary $Anubis modules, they can be modified and recompiled at any moment, and
   the server will automatically reload them without stopping. Furthermore, the server monitors its $em(configuration
   file) and reloads it, without stopping, whenever it is modified.  
   
   Among the projects we have for our web server is the ability to obtain server certificates automatically via the ACME
   (Automatic Certificate Management Environment) protocol. 
   
   
   
   $section(Configuring and starting the server)
   The $Anubis HTTP/HTTPS server version 2 has several configuration parameters that must be given within a
   $em(configuration file). The server is started as follows:
   $term( 
anbexec http_server $nt(path of configuration file)   
   )
   Of course, you can start several HTTP/HTTPS servers provided that you create several configuration files, but they
   will need to listen on different ports. Since listening on ports in the range 0-1023 requires a privilege,
   $att(anbexec) must be set up correctly if you want to use these ports. The commands for this setup are (under Linux,
   for example for version 1.19.3):  
   $term( 
sudo chown root ~/bin/anbexec-1-19-3   
sudo chmod ug+s ~/bin/anbexec-1-19-3   
   )
   (recall that $att(anbexec) is just a shell script calling $att(anbexec-1-19-3)). 
   
   
   $subsection(Format of a configuration file)
   
   
   $subsection(What can be dynamically changed in the configuration)
   
   
   
   $subsection(How the server interprets standard HTTP headers)
   HTTP headers have semantics defined in RFC 7231 and in other RFCs referred to from within RFC 7231. 
   
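   Following section 5 of RFC 7231, the request headers considered here can be grouped as follows:
   $list(
     $item Controls: $att(Cache-Control), $att(Expect: 100-continue), $att(Host), $att(Max-Forwards) ($att(TRACE) and
           $att(OPTIONS) only), $att(Pragma), $att(Range), $att(TE).
     $item Conditionals: $att(If-Match), $att(If-None-Match), $att(If-Modified-Since), $att(If-Unmodified-Since),
           $att(If-Range).
     $item Content negotiation: $att(Accept), $att(Accept-Charset), $att(Accept-Encoding), $att(Accept-Language).
     $item Authentication credentials: $att(Authorization), $att(Proxy-Authorization).
     $item Request context: $att(From), $att(Referer), $att(User-Agent).
   )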
   
   
   
   
   
   
   
   
   
   $section(Creating a web site)
   The $Anubis HTTP server is a primary module (to be executed by $att(anbexec)), and each web site is loaded as a
   secondary module. Hence, creating a web site requires the construction of such a module.
   
   
   
   $subsection(The type of a web site secondary module)
   The type of secondary modules that define web sites already contains by itself much information on how to construct
   a web site for the $Anubis web server. Indeed, from the point of view of this server, a web site is nothing other
   than a datum of this type. Here it is:
   $acode( 
public type WebSiteV2:
   web_site(
     List(String)        names, 
     String              web_site_directory
           ). 
   )
   
   
   $subsubsection($att(names))
   These are the names that are acceptable as the value of the $tt(Host) HTTP header. For example, this can be
   $att($["127.0.0.1", "www.my_site.com", "my_site.com"$]). In this example, $att("127.0.0.1") is meant to be used during
   the development of the web site.   
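   To make the role of $att(names) more concrete, here is a minimal Python sketch of how a server could select a web site from the value of the $tt(Host) header. This is an illustration only, not the $Anubis implementation. It assumes that an optional port suffix such as $tt(:2000) is stripped and that names are compared case-insensitively, since the host part of a URI is case-insensitive (RFC 3986). The functions $att(host_name) and $att(select_web_site), as well as the representation of a web site as a pair, are hypothetical.

```python
def host_name(host_header):
    """Return the host part of a Host header value, without any ':port'."""
    value = host_header.strip().lower()
    if value.startswith("["):
        # IPv6 literal such as "[::1]:2000": keep the bracketed address.
        end = value.find("]")
        return value[: end + 1] if end != -1 else value
    return value.split(":", 1)[0]


def select_web_site(host_header, web_sites):
    """Return the directory of the first web site accepting this Host, or None.

    web_sites is a list of (names, web_site_directory) pairs that mirrors
    the two fields of WebSiteV2 (a hypothetical representation).
    """
    name = host_name(host_header)
    for names, directory in web_sites:
        if name in (n.lower() for n in names):
            return directory
    return None
```

   With the names of the example above, $tt(127.0.0.1:2000) and $tt(WWW.My_Site.com) both select the web site, and any other host yields $att(None).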
   
   
   $subsubsection($att(web_site_directory))
   This indicates the path of the directory on the server's disk that the web site can use as its $em(dedicated directory) (the
   so-called $em(web site directory)). Prefer an absolute path that will make this information independent of where
   $att(anbexec) is started from. 
   
   The server creates (if they don't already exist) the following subdirectories in
   the web site directory:
   $list(
     $item $fname(public/) This is where only public resources should go. Indeed, the content of this directory and its
           subdirectories is freely accessible by any client of the web site.
     $item $fname(states/) This is where the server stores session states. 
     $item $fname(members/) This is where the server keeps the database of registered clients (members). 
     $item $fname(private_download/) This is where the server puts files that are ready for private downloading.
     $item $fname(upload_temporary/) This is where the server stores uploaded files before they are moved to
           another place. 
     $item $fname(journal/) This is where the server records information on the history of server events. 
   )
   Of course, you can create other subdirectories in the web site directory, for example for installing a database. 
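   The following Python sketch illustrates what "creates the subdirectories if they don't already exist" means in practice. It is not the $Anubis code. The subdirectory names come from the list above, while the function $att(prepare_web_site_directory) is hypothetical. Because $att(os.makedirs) is called with $att(exist_ok=True), calling the function repeatedly, for instance each time a web site is reloaded, is harmless.

```python
import os

# Subdirectory names taken from the list above.
SUBDIRECTORIES = ["public", "states", "members", "private_download",
                  "upload_temporary", "journal"]


def prepare_web_site_directory(web_site_directory):
    """Create the standard subdirectories if they do not already exist."""
    for name in SUBDIRECTORIES:
        os.makedirs(os.path.join(web_site_directory, name), exist_ok=True)
```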
   
   
   
   $subsection(Layout and styling)
   
   
   
   $subsection(Home page)
   Each web site is supposed to have a $em(home page) that can be obtained from the name of the web site. For example,
   the name $fname(google.com), without any further information, yields the Google home page. 
   
   This home page is the default entry point for your web site. 
   
   
   $subsection(When the client clicks)
   Within a web page of your web site, a client can click on various buttons and links. Such a click can have one of
   several kinds of effect:
   $list(
     $item trigger a purely local action (a JavaScript program) that does not require a connection to a server (local
           action),  
     $item leave the web site and visit another one (foreign link), 
     $item download a file or other resource from your web site that does not need to be computed on the fly (server
           link),  
     $item trigger an action on your server that will compute another web page on the fly (server action). 
   )
   
   
   
   $subsection(HTML elements and CSS)
   
   
   
   
   $subsection(Web sockets)
   
   
   
   
   $subsection(Login/logout)
   The server has a ready-to-use mechanism for login/logout. You have to use the HTML element provided below, which
   appears as a login form if the client is not logged in and as a logout form otherwise. The appearance of this gadget
   is determined by some customizable CSS.
   
   From now on, we call a client a $em(visitor) if he/she is not logged in, and a $em(member) if he/she is. 
   Session tickets are provided for both visitors and members. 
   

   
   $subsection(Session tickets and session states)
   The server remembers $em(session information) attached to each client. Of course, this session information is
   different for visitors and for members. Session information is never transmitted over the network. It is stored
   within the $fname(states/) subdirectory of the web site directory. Each such state is associated with a ticket (a
   cryptographic hash) that is placed within the web page sent to the client.  
   
   When the client triggers an action, the session ticket is sent as $em(form data) to the server. This allows the
   server to recover the state of the client and to compute a new state for this client. 
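   Here is a minimal Python sketch of the idea, under explicit assumptions. The ticket is taken to be the SHA-256 hash of 32 random bytes, and each state is stored as a JSON file named after its ticket in $fname(states/). This documentation does not say how $Anubis actually derives tickets or stores states, and the functions $att(new_ticket), $att(save_state) and $att(load_state) are hypothetical. Only the ticket leaves the server. The state itself stays on disk.

```python
import hashlib
import json
import os
import secrets


def new_ticket():
    """Return a fresh ticket: the SHA-256 hash of 32 random bytes, in hex."""
    return hashlib.sha256(secrets.token_bytes(32)).hexdigest()


def save_state(states_dir, state):
    """Store a session state on disk and return the ticket that names it."""
    ticket = new_ticket()
    with open(os.path.join(states_dir, ticket + ".json"), "w") as f:
        json.dump(state, f)
    return ticket


def load_state(states_dir, ticket):
    """Recover the state for a ticket received as form data, or None."""
    # Accept only 64 lowercase hex digits, so that a forged ticket
    # cannot designate a file outside states/.
    if len(ticket) != 64 or any(c not in "0123456789abcdef" for c in ticket):
        return None
    path = os.path.join(states_dir, ticket + ".json")
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)
```

   Computing a new state after an action then amounts to loading the old state, updating it, and saving it under a new ticket that is placed in the next page.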
   
   
   $subsection(Streaming)
   HTTP/1.1 has, as explained in RFC 7233, the capability of serving a range of bytes from a resource instead of the
   whole resource. This can be used for streaming, but also for graceful recovery after a connection is cut in the
   middle of a download. This version of the $Anubis HTTP server implements this feature, and you have essentially
   nothing to do in this regard. You only need to design your web sites with this possibility in mind, for example
   for the streaming of videos.
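   To show what the server does with a $att(Range) header, here is a simplified Python sketch of single-range parsing as described in RFC 7233. It is not the $Anubis code, and for simplicity it ignores multi-range requests. A syntactically invalid range makes the header ignored, so the whole resource is sent with status 200. A range starting beyond the end of the resource cannot be satisfied, which calls for status 416. The function names $att(parse_byte_range) and $att(content_range) are hypothetical.

```python
def _is_digits(text):
    """True for a non-empty string of ASCII digits."""
    return text != "" and all(c in "0123456789" for c in text)


def parse_byte_range(header, length):
    """Interpret a single-range Range header for a resource of `length` bytes.

    Returns (first, last), inclusive byte positions for a 206 response,
    or None if the header should be ignored (send the whole resource, 200).
    Raises ValueError if the range cannot be satisfied (send 416).
    """
    unit, _, spec = header.partition("=")
    if unit.strip().lower() != "bytes" or "," in spec:
        return None  # unknown unit, or multiple ranges (not handled here)
    first, sep, last = spec.strip().partition("-")
    if not sep or not (first or last):
        return None
    if (first and not _is_digits(first)) or (last and not _is_digits(last)):
        return None
    if not first:
        # Suffix range "bytes=-N": the last N bytes of the resource.
        suffix = int(last)
        if suffix == 0 or length == 0:
            raise ValueError("unsatisfiable range")
        return (max(0, length - suffix), length - 1)
    first_pos = int(first)
    if last and int(last) < first_pos:
        return None  # syntactically invalid range: ignore the header
    if first_pos >= length:
        raise ValueError("unsatisfiable range")
    last_pos = int(last) if last else length - 1
    return (first_pos, min(last_pos, length - 1))


def content_range(first, last, length):
    """Value of the Content-Range header accompanying a 206 response."""
    return f"bytes {first}-{last}/{length}"
```

   A 206 response then carries the bytes $att(data[first:last + 1]) of the resource, together with the corresponding $att(Content-Range) header.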
   
   
   
   $section(Security considerations)
   $subsection(Reviewing RFC 7231 recommendations)
   A (non-exhaustive) list of possible attacks is given in section 9 of RFC 7231. They are discussed below. 
   
   $subsubsection(Attacks Based on File and Path Names)
   The question is how to control access to the file system on the server. For example, if the server accepted
   $att(..) or $att(~) within a URI, a client could possibly download a file located anywhere on the server. 
   
   The $Anubis server allows access to the $fname(public/) directory associated with the web site under consideration
   and to its subdirectories. It is not necessarily a good idea to completely disallow double dots, because an HTML page
   located somewhere in the $fname(public/) subtree may refer to an image that is not in one of the subdirectories of
   the location of this HTML file. For example, the HTML file can contain something like
   $att(<img src="../images/theimage.png">). This is acceptable if $fname(theimage.png) is still within the
   $fname(public/) subtree. 
   
   Instead, the server checks that the requested resource is indeed within the $fname(public/) subtree. If this check
   fails, an error message $tt(404. Not$~found.) is sent to the client, and the server records the IP address and
   browser fingerprint into its dictionary of $em(dubious clients). See below how to manage dubious clients.
   
   Isolated dots are normal in URIs, for example in $fname(theimage.png). Even a URI containing $att(.) as one of the
   directory names does not create a problem.
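   The containment check described above can be sketched in Python as follows. This is an illustration, not the $Anubis implementation. It assumes that the URI path has already been percent-decoded, and $att(resolve_public_path) is a hypothetical name. The path is normalized first, and the result is accepted only if it still lies under $fname(public/). In this sketch, $att(~) is just an ordinary directory name because no home-directory expansion is performed. One design choice is that $att(os.path.realpath) also follows symbolic links, so a link inside $fname(public/) that points outside of it is rejected too.

```python
import os


def resolve_public_path(public_dir, uri_path):
    """Map an already percent-decoded URI path to a file under public_dir.

    '.' and '..' segments are accepted as long as the normalized result
    stays inside the public/ subtree. Otherwise None is returned, which
    corresponds to answering 404 and recording a dubious client.
    """
    root = os.path.realpath(public_dir)
    # realpath collapses '.' and '..' segments and follows symbolic links.
    candidate = os.path.realpath(os.path.join(root, uri_path.lstrip("/")))
    if os.path.commonpath([root, candidate]) != root:
        return None
    return candidate
```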
   
   
   
   
  
   
   $subsubsection(Attacks Based on Command, Code, or Query Injection)
   The $Anubis server is largely unaffected by this problem because it never treats anything present in the request
   line (or in header values) as an executable command. For example, if an attacker inserts SQL commands into the
   request line, they cannot be executed by the server, even if you use an SQL database. 
   
   Also, recall that the request line should never contain sensitive information. For example, session tickets are
   never inserted into the request line. They are transmitted as part of form data or within cookies.  
   
   
   
   
   $subsubsection(Disclosure of Personal Information)
   This question is to be addressed by the designer of the web site. Of course, encryption should be used for
   transferring any personal information. The $Anubis server also provides a mechanism for $em(private download) so
   that only authorized clients can access certain resources (including files), and these resources are of course not
   located within the $fname(public/) subtree of the server.  
   
   
   $subsubsection(Disclosure of Sensitive Information in URIs)
   As explained in RFC 7231, URIs are intended to be shared, not secured, even when they identify secure resources.
   In any case, as already stated several times in this documentation, a request line should never contain any
   sensitive or private information. This is mainly the responsibility of the web site designer.  

   This has to do with the distinction between the two HTTP methods $att(GET) and $att(POST). In the case of $att(GET),
   there is normally no body in the request (RFC 7231 section 4.3.1). Sensitive information should preferably be in the
   body of a $att(POST) request.
   
   Another question is the use of the $att(Referer) HTTP header. This header gives the URI of the resource from which
   the request originates. The $Anubis server checks that any request that contains a session ticket has a $att(Referer)
   header pointing to the right resource. If not, the request is rejected and the client is marked as dubious. 
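   The following Python sketch shows one plausible form of the $att(Referer) check. It assumes that the server remembers, together with each session state, the URI of the page in which the ticket was placed. This documentation does not say exactly what "the right resource" is, so that assumption, the parameter $att(expected_page) and the function $att(referer_matches) are all hypothetical. In the sketch, scheme and host (with port) are compared case-insensitively and the path exactly, while the query and fragment are ignored.

```python
from urllib.parse import urlsplit


def referer_matches(referer, expected_page):
    """True if the Referer header designates the page that carried the ticket.

    expected_page is assumed to be stored by the server with the session state.
    """
    if not referer:
        return False  # a request carrying a ticket must have a Referer
    got, want = urlsplit(referer), urlsplit(expected_page)
    return (got.scheme.lower() == want.scheme.lower()
            and got.netloc.lower() == want.netloc.lower()
            and (got.path or "/") == (want.path or "/"))
```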
   
   
  
   $subsubsection(Disclosure of Fragment after Redirects)
   A $em(fragment) is the part of a URI that comes after the $att(#), indicating a precise position within a web page or
   any other resource, with semantics that depend on this resource. The fragment is not sent in the request line, but it
   is visible to the user agent and, when a redirect occurs, it can be carried over to the new location, possibly on
   another web site. This is one more reason why, like the rest of the URI, it should never contain sensitive information.
   
  
   
   
   $subsubsection(Disclosure of Product Information)
   Here the question is that some HTTP headers, namely $att(User-Agent), $att(Via) and $att(Server), contain
   information about the particular software used by the sender of the message (client, intermediary or server). This
   information may help attackers, but the $Anubis server is not concerned by this issue. 
   
   
   
   $subsubsection(Browser Fingerprinting)
   According to RFC 7231 section 9.7, browser fingerprinting is a set of techniques for identifying a specific user
   agent over time through its unique set of characteristics. Here are some HTTP headers that can provide information
   about the client: $att(From), $att(Cookie), $att(User-Agent), $att(Accept), $att(Accept-Charset),
   $att(Accept-Encoding) and $att(Accept-Language). 
   
   Because such information should be considered confidential, the web site designer should use it only for a
   legitimate reason. However, this information is also useful for preventing attacks on the server. This is why it is
   part of the information recorded about dubious clients: it allows the server to reidentify a client previously
   marked as dubious and possibly to deny access. Together with other methods, this can be an efficient tool against
   denial-of-service attacks.  
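   As an illustration, here is a Python sketch that condenses the IP address and the headers listed above into a single identifier, which can serve as a key in a dictionary of dubious clients. It is not the $Anubis code. Hashing with SHA-256 is an assumed design choice: it gives a compact key and avoids keeping the raw header values. The function $att(client_fingerprint) is hypothetical. Header names are matched case-insensitively, as HTTP requires, and a missing header counts as empty.

```python
import hashlib

# Headers listed above as revealing information about the client.
FINGERPRINT_HEADERS = ["From", "Cookie", "User-Agent", "Accept",
                       "Accept-Charset", "Accept-Encoding", "Accept-Language"]


def client_fingerprint(ip_address, headers):
    """Return a hex digest identifying an (IP address, header values) pair."""
    lowered = {name.lower(): value for name, value in headers.items()}
    parts = [ip_address] + [lowered.get(h.lower(), "") for h in FINGERPRINT_HEADERS]
    # Header values cannot contain a newline, so it safely separates fields.
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()
```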
   
   
   $subsection(Handling dubious clients)
   Each time a $em(dubious client) is detected, information about this client is recorded into a dictionary. The
   server uses an algorithm for handling such clients. You can set up parameters in the configuration file that affect
   how this algorithm works.  
   
   This algorithm works as follows.
   $list(
     $item When a problem arises, administrators are warned by email. The email provides a link to a web page where the
           administrators can follow the evolution of the situation in real time, and from where they can trigger
           actions. 
     $item It associates a $em(level of dubiousness) to each dubious client, and a $em(level of trust) to each
           registered client.
     $item It refuses access to clients depending on the level of dubiousness of the client (configurable).
     $item It deletes the dubious client informations after some time (or never) depending on the level of dubiousness
           (configurable).  
     $item It rejects connections whose behavior is suspect (configurable).
     $item If it begins to be overwhelmed by connections, it restricts access through a special web page that only
           allows clients to log in. Already logged-in clients can continue to operate, unless suspect behavior is
           detected, also depending on their trust level (configurable).  
   )
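   Three items of the list above can be sketched in Python: levels of dubiousness, refusal above a threshold, and deletion after a delay that depends on the level. This sketch is hypothetical and is not the $Anubis algorithm. The parameters $att(refuse_level) and $att(retention) stand in for configuration parameters whose real names are not given in this documentation, and their default values are arbitrary. Email alerts, trust levels and the overload mode are not modelled.

```python
import time


class DubiousClients:
    """Hypothetical dictionary of dubious clients, keyed by fingerprint.

    refuse_level: level at or above which access is refused.
    retention[i]: seconds an entry of level i + 1 is kept after its last
    incident (the last value applies to all higher levels); None = forever.
    """

    def __init__(self, refuse_level=3, retention=(3600, 86400, 604800, None)):
        self.refuse_level = refuse_level
        self.retention = retention
        self.entries = {}  # fingerprint -> (level, time of last incident)

    def report(self, fingerprint, now=None):
        """Record an incident, raising the client's level of dubiousness."""
        now = time.time() if now is None else now
        level, _ = self.entries.get(fingerprint, (0, now))
        self.entries[fingerprint] = (level + 1, now)

    def forget_expired(self, now=None):
        """Delete entries whose retention delay has elapsed."""
        now = time.time() if now is None else now
        for fp, (level, last) in list(self.entries.items()):
            keep = self.retention[min(level, len(self.retention)) - 1]
            if keep is not None and now - last > keep:
                del self.entries[fp]

    def is_refused(self, fingerprint, now=None):
        """True if the client's current level reaches refuse_level."""
        self.forget_expired(now)
        level, _ = self.entries.get(fingerprint, (0, 0))
        return level >= self.refuse_level
```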
   
   
   
   
   
   
   
   
   
   
   $end