devdoc/Cache.txt

 June 2000, --Jcid
 Last update: Jul 09

                              -------
                               CACHE
                              -------

   The  cache  module  is  the  main  abstraction  layer  between
rendering and networking.

   The  capi module acts as a discriminating wrapper which either
calls  the  cache  or  the  dpi routines depending on the type of
request.

   Every  URL  must be requested using a_Capi_open_url, which
sends the request to the cache if the data is cached, to dillo's
http module for http: URLs, and through dillo's DPI system for
other URLs.
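
   The dispatch rule above can be pictured with a small sketch. It
is not capi's real code: the Route enum, pick_route() and the
is_cached flag are made up for illustration, and the scheme test is
a plain case-sensitive prefix match.

```c
#include <string.h>

/* Hypothetical route labels; not part of dillo's capi. */
typedef enum { ROUTE_CACHE, ROUTE_HTTP, ROUTE_DPI } Route;

/*
 * Decide where a request goes, following the rule in the text:
 * cached data is served by the cache, http: URLs go to the http
 * module, and everything else goes through DPI.
 * 'is_cached' stands in for a real cache lookup.
 */
static Route pick_route(const char *url, int is_cached)
{
   if (is_cached)
      return ROUTE_CACHE;
   if (strncmp(url, "http:", 5) == 0)
      return ROUTE_HTTP;
   return ROUTE_DPI;
}
```

   With this sketch, a file: URL that is not cached would be routed
to DPI, while any URL whose data is already cached is answered by
the cache regardless of its scheme.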

   This document covers non-dpi requests only.


                         ----------------
                         CACHE PHILOSOPHY
                         ----------------

   Dillo's cache is very simple: every resource (URL) that is
retrieved is kept in memory. NOTHING is saved to disk.
This is mainly for three reasons:

   - Dillo encourages personal privacy, and keeping nothing on disk
ensures there is no recorded trace of the sites you visited.

   - The Network is full of intermediate transparent proxies that
serve as caches.

   -  If  you still want to have cached stuff, you can install an
external cache server (such as WWWOFFLE), and benefit from it.


                         ---------------
                         CACHE STRUCTURE
                         ---------------

   Currently, dillo's cache code is spread across several source
files: mainly cache.[ch] and dicache.[ch], with some helper
functions from mime.c and web.cc.

   cache.c is the principal source; it is also responsible for
processing cache clients (held in a queue). dicache.c is the
interface to the decompressed RGB representations of
currently-displayed images held in DW's imgbuf.

   mime.c  and  web.cc  are  used  for secondary tasks such as
assigning the right "viewer" or "decoder" for a given URL.


----------------
A bit of history
----------------

   Some  time  ago,  the  cache  functions,  URL  retrieval  and
external  protocols  were  a whole mess of mixed code, and it was
getting  REALLY hard to fix, improve or extend the functionality.
The  main  idea  of  this  "layering" is to make code-portions as
independent  as  possible  so  they  can  be  understood,  fixed,
improved or replaced without affecting the rest of the browser.

   An  interesting  part of the process is that, as resources are
retrieved,  the  client  (dillo  in  this  case) doesn't know the
Content-Type  of the resource at request-time. It only becomes known
when  the  resource  header  is retrieved (think of http). This
happens  when  the  cache  has control, so the cache sets the
proper  viewer for it (unless the Callback function was already
specified with the URL request).

   You'll find a good example in http.c.

   Note: All resources received by the cache have HTTP-style headers.
   The file/data/ftp DPIs generate these headers when sending their
   non-HTTP resources. Most importantly, a Content-Type header is
   generated based on file extension or file contents.
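
   As an illustration of why HTTP-style headers are convenient, here
is a minimal sketch that pulls the Content-Type value out of a
header block. It is not the cache's own parser: get_content_type()
is a made-up helper, and it assumes the header ends at the first
empty line and that strncasecmp (POSIX) is available.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/*
 * Copy the value of the first "Content-Type:" field found in an
 * HTTP-style header block into 'out' (at most outsz-1 chars).
 * Returns 1 if the field was found, 0 otherwise.
 */
static int get_content_type(const char *hdr, char *out, size_t outsz)
{
   static const char field[] = "Content-Type:";
   const size_t flen = sizeof(field) - 1;
   const char *p = hdr;

   if (outsz == 0)
      return 0;

   while (*p) {
      /* An empty line marks the end of the header. */
      if (*p == '\r' || *p == '\n')
         break;
      if (strncasecmp(p, field, flen) == 0) {
         size_t n;
         p += flen;
         while (*p == ' ' || *p == '\t')
            p++;
         n = strcspn(p, "\r\n");
         if (n >= outsz)
            n = outsz - 1;
         memcpy(out, p, n);
         out[n] = '\0';
         return 1;
      }
      /* Skip to the start of the next line. */
      p += strcspn(p, "\n");
      if (*p == '\n')
         p++;
   }
   return 0;
}
```

   Because the DPIs emit the same kind of header, one routine of
this sort can serve both HTTP and non-HTTP resources.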


-------------
Cache clients
-------------

   Cache clients MUST use a_Capi_open_url to request a URL. The
client structure and the callback-function prototype are defined
in cache.h as follows:

struct _CacheClient {
   int Key;                 /* Primary Key for this client */
   const DilloUrl *Url;     /* Pointer to a cache entry Url */
   int Version;             /* Dicache version of this Url (0 if not used) */
   void *Buf;               /* Pointer to cache-data */
   uint_t BufSize;          /* Valid size of cache-data */
   CA_Callback_t Callback;  /* Client function */
   void *CbData;            /* Client function data */
   void *Web;               /* Pointer to the Web structure of our client */
};

typedef void (*CA_Callback_t)(int Op, CacheClient_t *Client);


   Notes:

   * Op is the operation that the cache asks the callback to
   perform. It is one of CA_Send, CA_Close or CA_Abort.

   * Client: The Client structure that originated the request.
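
   A client callback usually switches on Op. The sketch below shows
the shape of such a function. To compile on its own it declares
stand-ins for CacheClient_t (only three of its fields) and for the
Op codes, whose numeric values are assumed here; ExampleState and
example_callback() are hypothetical names.

```c
#include <stddef.h>

/* Stand-ins so the sketch compiles alone; the real definitions
 * live in cache.h and the numeric values here are assumed. */
enum { CA_Send, CA_Close, CA_Abort };

typedef struct {
   void *Buf;              /* Pointer to cache-data */
   unsigned int BufSize;   /* Valid size of cache-data */
   void *CbData;           /* Client function data */
} CacheClient_t;

/* Hypothetical per-client state reached through CbData. */
typedef struct {
   unsigned int seen;   /* last BufSize reported by the cache */
   int done;            /* set on CA_Close */
   int failed;          /* set on CA_Abort */
} ExampleState;

/* Has the same shape as the CA_Callback_t prototype. */
static void example_callback(int Op, CacheClient_t *Client)
{
   ExampleState *st = Client->CbData;

   switch (Op) {
   case CA_Send:
      /* New data is available; remember how much is valid. */
      st->seen = Client->BufSize;
      break;
   case CA_Close:
      st->done = 1;
      break;
   case CA_Abort:
      st->failed = 1;
      break;
   }
}
```

   Keeping the client's own state behind CbData means the callback
needs no globals, so the same function can serve several requests.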


-------------------------
Key function descriptions
-------------------------

································································
int a_Cache_open_url(void *Web, CA_Callback_t Call, void *CbData)

   if Web->url is not cached
      Create a cache-entry for that URL
      Send client to cache queue
   else
      Feed our client with cached data

································································
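
   The pseudo-code above can be tried out with a toy model. This is
only a sketch under simplifying assumptions: ToyCache, Entry,
toy_lookup() and toy_open_url() are invented names, the tables have
a fixed size, and an existing entry is always treated as having
data to feed, exactly as in the pseudo-code.

```c
#include <string.h>

#define MAX_ENTRIES 8
#define MAX_QUEUE   8

/* Toy cache entry: a URL and whatever data has arrived for it. */
typedef struct {
   const char *url;
   const char *data;   /* NULL until something is received */
} Entry;

typedef struct {
   Entry entries[MAX_ENTRIES];
   int n_entries;
   const char *queue[MAX_QUEUE];   /* URLs with a waiting client */
   int n_queue;
} ToyCache;

/* Linear search for an entry with the given URL. */
static Entry *toy_lookup(ToyCache *c, const char *url)
{
   int i;

   for (i = 0; i < c->n_entries; i++)
      if (strcmp(c->entries[i].url, url) == 0)
         return &c->entries[i];
   return NULL;
}

/*
 * Returns 1 if the client was fed from the cache, 0 if a new entry
 * was created and the client queued, -1 if the toy tables are full.
 */
static int toy_open_url(ToyCache *c, const char *url, const char **out)
{
   Entry *e = toy_lookup(c, url);

   if (e == NULL) {
      if (c->n_entries == MAX_ENTRIES || c->n_queue == MAX_QUEUE)
         return -1;
      c->entries[c->n_entries].url = url;
      c->entries[c->n_entries].data = NULL;
      c->n_entries++;
      c->queue[c->n_queue++] = url;
      return 0;
   }
   *out = e->data;
   return 1;
}
```

   The first request for a URL creates the entry and queues the
client; a later request for the same URL is answered from the
entry without touching the queue.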

---------------------
Redirection mechanism
 (HTTP 30x answers)
---------------------

  This is by no means complete. It's a work in progress.

  Whenever a URL is served under an HTTP 30x header, its cache
entry is flagged with 'CA_Redirect'. If it's a 301 answer, the
additional 'CA_ForceRedirect' flag is also set; if it's a 302
answer, 'CA_TempRedirect' is also set (this happens inside the
Cache_parse_header() function).

  Later on, in Cache_process_queue(), when the entry is flagged
with 'CA_Redirect', Cache_redirect() is called.
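
  The flag rule can be summarized in a few lines of C. This is a
sketch only: the flag names come from the text, but the bit values
below are assumed, redirect_flags() is a made-up helper, and "30x"
is read as the range 300 to 309.

```c
/* Flag names are from the text; the bit values are assumed. */
#define CA_Redirect       0x1
#define CA_ForceRedirect  0x2
#define CA_TempRedirect   0x4

/* Return the redirect flags for an HTTP status code. */
static int redirect_flags(int status)
{
   int flags = 0;

   if (status >= 300 && status <= 309) {
      flags |= CA_Redirect;
      if (status == 301)
         flags |= CA_ForceRedirect;
      else if (status == 302)
         flags |= CA_TempRedirect;
   }
   return flags;
}
```

  A caller would then test the CA_Redirect bit to decide whether
the redirection step applies to the entry.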


-----
Notes
-----

   The whole process is asynchronous and very complex. I'll try
to document it in more detail later (the source is commented).
   Currently I have a drawing that helps to understand it; I hope
the ASCII translation serves as well as the original.
   If you're planning to understand the cache process thoroughly,
write me a note and I will give higher priority to further
improvement of this doc.
   Hope this helps!