Documentation/rpc-cache.txt - third_party/linux - Git at Google

 This document gives a brief introduction to the caching
 mechanisms in the sunrpc layer that is used, in particular,
 for NFS authentication.

 CACHES
 ======
 The caching replaces the old exports table and allows for
 a wide variety of values to be caches.

 There are a number of caches that are similar in structure though
 quite possibly very different in content and use.  There is a corpus
 of common code for managing these caches.

 Examples of caches that are likely to be needed are:
   - mapping from IP address to client name
   - mapping from client name and filesystem to export options
   - mapping from UID to list of GIDs, to work around NFS's limitation
     of 16 gids.
   - mappings between local UID/GID and remote UID/GID for sites that
     do not have uniform uid assignment
   - mapping from network identify to public key for crypto authentication.

 The common code handles such things as:
    - general cache lookup with correct locking
    - supporting 'NEGATIVE' as well as positive entries
    - allowing an EXPIRED time on cache items, and removing
      items after they expire, and are no longe in-use.

    Future code extensions are expect to handle
    - making requests to user-space to fill in cache entries
    - allowing user-space to directly set entries in the cache
    - delaying RPC requests that depend on as-yet incomplete
      cache entries, and replaying those requests when the cache entry
      is complete.
    - maintaining last-access times on cache entries
    - clean out old entries when the caches become full

 The code for performing a cache lookup is also common, but in the form
 of a template.  i.e. a #define.
 Each cache defines a lookup function by using the DefineCacheLookup
 macro, or the simpler DefineSimpleCacheLookup macro

 Creating a Cache
 ----------------

 1/ A cache needs a datum to cache.  This is in the form of a
    structure definition that must contain a
      struct cache_head
    as an element, usually the first.
    It will also contain a key and some content.
    Each cache element is reference counted and contains
    expiry and update times for use in cache management.
 2/ A cache needs a "cache_detail" structure that
    describes the cache.  This stores the hash table, and some
    parameters for cache management.
 3/ A cache needs a lookup function.  This is created using
    the DefineCacheLookup macro.  This lookup function is used both
    to find entries and to update entries.  The normal mode for
    updating an entry is to replace the old entry with a new
    entry.  However it is possible to allow update-in-place
    for those caches where it makes sense (no atomicity issues
    or indirect reference counting issue)
 4/ A cache needs to be registered using cache_register().  This
    includes in on a list of caches that will be regularly
    cleaned to discard old data.  For this to work, some
    thread must periodically call cache_clean

 Using a cache
 -------------

 To find a value in a cache, call the lookup function passing it a the
 datum which contains key, and possibly content, and a flag saying
 whether to update the cache with new data from the datum.   Depending
 on how the cache lookup function was defined, it may take an extra
 argument to identify the particular cache in question.

 Except in cases of kmalloc failure, the lookup function
 will return a new datum which will store the key and
 may contain valid content, or may not.
 This datum is typically passed to cache_check which determines the
 validity of the datum and may later initiate an upcall to fill
 in the data.

 cache_check can be passed a "struct cache_req *".  This structure is
 typically embedded in the actual request and can be used to create a
 deferred copy of the request (struct cache_deferred_req).  This is
 done when the found cache item is not uptodate, but the is reason to
 believe that userspace might provide information soon.  When the cache
 item does become valid, the deferred copy of the request will be
 revisited (->revisit).  It is expected that this method will
 reschedule the request for processing.


 Populating a cache
 ------------------

 Each cache has a name, and when the cache is registered, a directory
 with that name is created in /proc/net/rpc

 This directory contains a file called 'channel' which is a channel
 for communicating between kernel and user for populating the cache.
 This directory may later contain other files of interacting
 with the cache.

 The 'channel' works a bit like a datagram socket. Each 'write' is
 passed as a whole to the cache for parsing and interpretation.
 Each cache can treat the write requests differently, but it is
 expected that a message written will contain:
   - a key
   - an expiry time
   - a content.
 with the intention that an item in the cache with the give key
 should be create or updated to have the given content, and the
 expiry time should be set on that item.

 Reading from a channel is a bit more interesting.  When a cache
 lookup fail, or when it suceeds but finds an entry that may soon
 expiry, a request is lodged for that cache item to be updated by
 user-space.  These requests appear in the channel file.

 Successive reads will return successive requests.
 If there are no more requests to return, read will return EOF, but a
 select or poll for read will block waiting for another request to be
 added.

 Thus a user-space helper is likely to:
   open the channel.
     select for readable
     read a request
     write a response
   loop.

 If it dies and needs to be restarted, any requests that have not be
 answered will still appear in the file and will be read by the new
 instance of the helper.

 Each cache should define a "cache_parse" method which takes a message
 written from user-space and processes it.  It should return an error
 (which propagates back to the write syscall) or 0.

 Each cache should also define a "cache_request" method which
 takes a cache item and encodes a request into the buffer
 provided.


 Note: If a cache has no active readers on the channel, and has had not
 active readers for more than 60 seconds, further requests will not be
 added to the channel but instead all looks that do not find a valid
 entry will fail.  This is partly for backward compatibility: The
 previous nfs exports table was deemed to be authoritative and a
 failed lookup meant a definite 'no'.

 request/response format
 -----------------------

 While each cache is free to use it's own format for requests
 and responses over channel, the following is recommended are
 appropriate and support routines are available to help:
 Each request or response record should be printable ASCII
 with precisely one newline character which should be at the end.
 Fields within the record should be separated by spaces, normally one.
 If spaces, newlines, or nul characters are needed in a field they
 much be quotes.  two mechanisms are available:
 1/ If a field begins '\x' then it must contain an even number of
    hex digits, and pairs of these digits provide the bytes in the
    field.
 2/ otherwise a \ in the field must be followed by 3 octal digits
    which give the code for a byte.  Other characters are treated
    as them selves.  At the very least, space, newlines nul, and
    '\' must be quoted in this way.
	This document gives a brief introduction to the caching
	mechanisms in the sunrpc layer that is used, in particular,
	for NFS authentication.

	CACHES
	======
	The caching replaces the old exports table and allows for
	a wide variety of values to be caches.

	There are a number of caches that are similar in structure though
	quite possibly very different in content and use. There is a corpus
	of common code for managing these caches.

	Examples of caches that are likely to be needed are:
	- mapping from IP address to client name
	- mapping from client name and filesystem to export options
	- mapping from UID to list of GIDs, to work around NFS's limitation
	of 16 gids.
	- mappings between local UID/GID and remote UID/GID for sites that
	do not have uniform uid assignment
	- mapping from network identify to public key for crypto authentication.

	The common code handles such things as:
	- general cache lookup with correct locking
	- supporting 'NEGATIVE' as well as positive entries
	- allowing an EXPIRED time on cache items, and removing
	items after they expire, and are no longe in-use.

	Future code extensions are expect to handle
	- making requests to user-space to fill in cache entries
	- allowing user-space to directly set entries in the cache
	- delaying RPC requests that depend on as-yet incomplete
	cache entries, and replaying those requests when the cache entry
	is complete.
	- maintaining last-access times on cache entries
	- clean out old entries when the caches become full

	The code for performing a cache lookup is also common, but in the form
	of a template. i.e. a #define.
	Each cache defines a lookup function by using the DefineCacheLookup
	macro, or the simpler DefineSimpleCacheLookup macro

	Creating a Cache
	----------------

	1/ A cache needs a datum to cache. This is in the form of a
	structure definition that must contain a
	struct cache_head
	as an element, usually the first.
	It will also contain a key and some content.
	Each cache element is reference counted and contains
	expiry and update times for use in cache management.
	2/ A cache needs a "cache_detail" structure that
	describes the cache. This stores the hash table, and some
	parameters for cache management.
	3/ A cache needs a lookup function. This is created using
	the DefineCacheLookup macro. This lookup function is used both
	to find entries and to update entries. The normal mode for
	updating an entry is to replace the old entry with a new
	entry. However it is possible to allow update-in-place
	for those caches where it makes sense (no atomicity issues
	or indirect reference counting issue)
	4/ A cache needs to be registered using cache_register(). This
	includes in on a list of caches that will be regularly
	cleaned to discard old data. For this to work, some
	thread must periodically call cache_clean

	Using a cache
	-------------

	To find a value in a cache, call the lookup function passing it a the
	datum which contains key, and possibly content, and a flag saying
	whether to update the cache with new data from the datum. Depending
	on how the cache lookup function was defined, it may take an extra
	argument to identify the particular cache in question.

	Except in cases of kmalloc failure, the lookup function
	will return a new datum which will store the key and
	may contain valid content, or may not.
	This datum is typically passed to cache_check which determines the
	validity of the datum and may later initiate an upcall to fill
	in the data.

	cache_check can be passed a "struct cache_req *". This structure is
	typically embedded in the actual request and can be used to create a
	deferred copy of the request (struct cache_deferred_req). This is
	done when the found cache item is not uptodate, but the is reason to
	believe that userspace might provide information soon. When the cache
	item does become valid, the deferred copy of the request will be
	revisited (->revisit). It is expected that this method will
	reschedule the request for processing.


	Populating a cache
	------------------

	Each cache has a name, and when the cache is registered, a directory
	with that name is created in /proc/net/rpc

	This directory contains a file called 'channel' which is a channel
	for communicating between kernel and user for populating the cache.
	This directory may later contain other files of interacting
	with the cache.

	The 'channel' works a bit like a datagram socket. Each 'write' is
	passed as a whole to the cache for parsing and interpretation.
	Each cache can treat the write requests differently, but it is
	expected that a message written will contain:
	- a key
	- an expiry time
	- a content.
	with the intention that an item in the cache with the give key
	should be create or updated to have the given content, and the
	expiry time should be set on that item.

	Reading from a channel is a bit more interesting. When a cache
	lookup fail, or when it suceeds but finds an entry that may soon
	expiry, a request is lodged for that cache item to be updated by
	user-space. These requests appear in the channel file.

	Successive reads will return successive requests.
	If there are no more requests to return, read will return EOF, but a
	select or poll for read will block waiting for another request to be
	added.

	Thus a user-space helper is likely to:
	open the channel.
	select for readable
	read a request
	write a response
	loop.

	If it dies and needs to be restarted, any requests that have not be
	answered will still appear in the file and will be read by the new
	instance of the helper.

	Each cache should define a "cache_parse" method which takes a message
	written from user-space and processes it. It should return an error
	(which propagates back to the write syscall) or 0.

	Each cache should also define a "cache_request" method which
	takes a cache item and encodes a request into the buffer
	provided.


	Note: If a cache has no active readers on the channel, and has had not
	active readers for more than 60 seconds, further requests will not be
	added to the channel but instead all looks that do not find a valid
	entry will fail. This is partly for backward compatibility: The
	previous nfs exports table was deemed to be authoritative and a
	failed lookup meant a definite 'no'.

	request/response format
	-----------------------

	While each cache is free to use it's own format for requests
	and responses over channel, the following is recommended are
	appropriate and support routines are available to help:
	Each request or response record should be printable ASCII
	with precisely one newline character which should be at the end.
	Fields within the record should be separated by spaces, normally one.
	If spaces, newlines, or nul characters are needed in a field they
	much be quotes. two mechanisms are available:
	1/ If a field begins '\x' then it must contain an even number of
	hex digits, and pairs of these digits provide the bytes in the
	field.
	2/ otherwise a \ in the field must be followed by 3 octal digits
	which give the code for a byte. Other characters are treated
	as them selves. At the very least, space, newlines nul, and
	'\' must be quoted in this way.