#include <HTTPCache.h>
Public Member Functions | |
bool | cache_response (const string &url, time_t request_time, const vector< string > &headers, const FILE *body) |
FILE * | get_cached_response (const string &url, vector< string > &headers) |
FILE * | get_cached_response_body (const string &url) |
vector< string > | get_conditional_request_headers (const string &url) |
bool | is_url_in_cache (const string &url) |
bool | is_url_valid (const string &url) |
void | purge_cache () |
void | release_cached_response (FILE *response) |
void | update_response (const string &url, time_t request_time, const vector< string > &headers) |
virtual | ~HTTPCache () |
Accessors and Mutators for various properties. | |
bool | get_always_validate () const |
vector< string > | get_cache_control () |
CacheDisconnectedMode | get_cache_disconnected () const |
string | get_cache_root () const |
int | get_default_expiration () const |
unsigned long | get_max_entry_size () const |
unsigned long | get_max_size () const |
bool | is_cache_enabled () const |
bool | is_cache_protected () const |
bool | is_expire_ignored () const |
void | set_always_validate (bool validate) |
void | set_cache_control (const vector< string > &cc) |
void | set_cache_disconnected (CacheDisconnectedMode mode) |
void | set_cache_enabled (bool mode) |
void | set_cache_protected (bool mode) |
void | set_default_expiration (int exp_time) |
void | set_expire_ignored (bool mode) |
void | set_max_entry_size (unsigned long size) |
void | set_max_size (unsigned long size) |
Static Public Member Functions | |
HTTPCache * | instance (const string &cache_root, bool force=false) |
Friends | |
class | DeleteByHits |
class | DeleteCacheEntry |
class | DeleteExpired |
class | DeleteUnlockedCacheEntry |
class | HTTPCacheInterruptHandler |
class | HTTPCacheTest |
class | WriteOneCacheEntry |
Clients that run as users lacking a writable HOME directory MUST disable this cache. Use Connect::set_cache_enable(false).
The design of this class was taken from the W3C libwww software. That code was originally written by Henrik Frystyk Nielsen, Copyright MIT 1995. See the file MIT_COPYRIGHT. This software is a complete rewrite in C++ with additional features useful to the DODS and OPeNDAP projects.
This cache does not implement range checking. Partial responses should not be cached (HFN's version did, but it doesn't mesh well with the DAP for which this is being written).
The cache uses the local file system to store responses. If it is being used in a MT application, care should be taken to ensure that the number of available file descriptors is not exceeded.
In addition, when used in a MT program, only one thread should use the mutators to set property values. Even though the methods are robust with respect to MT software, having several threads change the values of the cache's properties will lead to odd behavior on the part of the cache. Many of the public methods lock access to the class' interface. This is noted in the documentation for those methods.
Even though the public interface to the cache is typically locked when accessed, an extra locking mechanism is in place for the entries that are accessed. If a thread accesses an entry, that response must be locked to prevent it from being updated until the thread tells the cache that it is no longer using it. The methods get_cached_response() and get_cached_response_body() both lock an entry; use release_cached_response() to release the lock. Entries are locked using a combination of a counter and a mutex. The following methods block when called on a locked entry: is_url_valid(), get_conditional_request_headers(), update_response(). (The locking scheme could be modified so that a distinction is made between reading from and writing to an entry. In that case is_url_valid() and get_conditional_request_headers() would only lock when an entry is in use for writing. But I haven't done that.)
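The counter-plus-mutex scheme described above can be sketched roughly as follows. This is only an illustration, not the class's actual code: the name EntryLock and its methods are hypothetical, and to stay short the sketch reports failure where the real methods are documented to block.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for the lock attached to one cache entry.
class EntryLock {
public:
    // A reader takes the entry (compare get_cached_response()).
    void acquire() {
        std::lock_guard<std::mutex> guard(d_mutex);
        ++d_users;
    }

    // A reader is done with the entry (compare release_cached_response()).
    void release() {
        std::lock_guard<std::mutex> guard(d_mutex);
        assert(d_users > 0 && "release() without a matching acquire()");
        --d_users;
    }

    // Runs update() only if no reader holds the entry. The mutex is held
    // for the duration, so no reader can take the entry mid-update.
    // Returns false, without running update(), when the entry is in use.
    template <typename Update>
    bool try_update(Update update) {
        std::lock_guard<std::mutex> guard(d_mutex);
        if (d_users > 0)
            return false;
        update();
        return true;
    }

    // Non-blocking probe of the counter.
    bool in_use() {
        std::lock_guard<std::mutex> guard(d_mutex);
        return d_users > 0;
    }

private:
    std::mutex d_mutex;  // protects d_users
    int d_users = 0;     // number of readers currently holding the entry
};
```

A blocking variant would wait on a condition variable until the counter drops to zero instead of returning false.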
Todo: Change the entry locking scheme to distinguish between entries accessed for reading and for writing.
Todo: Test in MT software. Is the entry locking scheme good enough? The current software throws an exception if there's an attempt to modify an entry that is locked by another thread. Maybe it should block instead? Maybe we should provide tests to see if an update would block (one that returns right away and one that blocks). Note: Rob Morris added tests for MT-safety. 02/06/03 jhrg
Definition at line 131 of file HTTPCache.h.
|
Destroy an instance of HTTPCache. This writes the cache index and frees the in-memory cache table structure. The persistent cache (the response headers and bodies and the index file) are not removed. To remove those, either erase the directory that contains the cache using a file system command or use the purge_cache() method (which leaves the cache directory structure in place but removes all the cached information).
This class uses the singleton pattern. Clients should never call this method. The HTTPCache::instance() method arranges for HTTPCache::delete_instance() to be called.
Definition at line 346 of file HTTPCache.cc. |
|
Add a new response to the cache, or replace an existing cached response with new data. This method returns true if the information for the URL was added to the cache. Note that the FILE *body is rewound so that the caller can re-read it without using fseek or rewind.
If a response for the URL is already in the cache, it is replaced with the new data. This method locks the class' interface.
Definition at line 1816 of file HTTPCache.cc. |
|
Should every cache entry be validated before each use?
Definition at line 1325 of file HTTPCache.cc. References parse_time(). |
|
Get the Cache-Control headers. This method locks the class' interface.
Definition at line 1400 of file HTTPCache.cc. |
|
Get the cache's disconnected mode property. Definition at line 1141 of file HTTPCache.cc. |
|
Get the current cache root directory.
Definition at line 1049 of file HTTPCache.cc. |
|
Get information from the cache. For a given URL, get the headers and body stored in the cache. Note that this method increments the hit counter for the URL's entry and locks that entry; call release_cached_response() to release the lock. This method locks the class' interface. This method does not check to see that the response is valid, just that it is in the cache. To see if a cached response is valid, use is_url_valid(). The FILE* returned can be used for both reading and writing. The latter allows a client to update the body of a cached response without having to first dump it all to a separate file and then copy it into the cache (using cache_response()).
Definition at line 2174 of file HTTPCache.cc. |
|
Get a pointer to a cached response body. For a given URL, find the cached response body and return a FILE * to it. This updates the hit counter and it locks the entry. To release the lock, call release_cached_response(). Methods that block on a locked entry are: get_conditional_request_headers(), update_response() and is_url_valid(). In addition, purge_cache() throws Error if it's called and any entries are locked. The garbage collection system will not reclaim locked entries (but works fine when some entries are locked). NB: This method does not check to see that the response is valid, just that it is in the cache. To see if a cached response is valid, use is_url_valid(). This method locks the class' interface.
Definition at line 2237 of file HTTPCache.cc. |
|
Build the headers to send along with a GET request to make that request conditional. This method examines the headers for a given response in the cache and formulates the correct headers for a valid HTTP 1.1 conditional GET request. See RFC 2616, Section 13.3.4. Rules: If an ETag is present, it must be used. Use If-None-Match. If a Last-Modified header is present, use it. Use If-Modified-Since. If both are present, use both (this means that HTTP 1.0 daemons are more likely to work). If a Last-Modified header is not present, use the value of the Cache-Control max-age or Expires header(s). Note that a 'Cache-Control: max-age' header overrides an Expires header (Sec 14.9.3). This method locks the cache interface and the cache entry.
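The rules above might be sketched like this. The struct and function names are hypothetical, and sending the max-age/Expires-derived value as If-Modified-Since is an assumption about how that fallback is used; the text only says the value is used when Last-Modified is absent.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical view of the validators stored with one cached response.
struct CachedValidators {
    std::string etag;           // empty if the response carried no ETag
    std::string last_modified;  // HTTP-date text, empty if absent
    std::string fallback_date;  // HTTP-date derived from max-age or Expires
                                // (max-age taking precedence), empty if none
};

// Build the request headers that make a GET conditional.
std::vector<std::string> conditional_headers(const CachedValidators &v) {
    std::vector<std::string> headers;

    // An ETag, when present, must be used.
    if (!v.etag.empty())
        headers.push_back("If-None-Match: " + v.etag);

    // Last-Modified is sent as well when present; otherwise fall back to
    // the date derived from max-age or Expires (assumed behavior).
    if (!v.last_modified.empty())
        headers.push_back("If-Modified-Since: " + v.last_modified);
    else if (!v.fallback_date.empty())
        headers.push_back("If-Modified-Since: " + v.fallback_date);

    // At most one header per validator kind.
    assert(headers.size() <= 2);
    return headers;
}
```

With both validators stored, the result holds If-None-Match followed by If-Modified-Since; with neither, the vector is empty and the request is an ordinary GET.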
Definition at line 1918 of file HTTPCache.cc. |
|
Get the default expiration time used by the cache. Definition at line 1306 of file HTTPCache.cc. |
|
Get the maximum size of an individual entry in the cache.
Definition at line 1276 of file HTTPCache.cc. |
|
How big is the cache? The value returned is the size in megabytes. Definition at line 1228 of file HTTPCache.cc. |
|
Get a pointer to the HTTP 1.1 compliant cache. If not already instantiated, this creates an instance of the HTTP cache object and initializes it to use cache_root as its root directory. Default values: is_cache_enabled(): true, is_cache_protected(): false, is_expire_ignored(): false, the total size of the cache is 20M, 2M of that is reserved for response headers, during GC the cache is reduced to at least 18M (total size - 10% of the total size), and the max size for an individual entry is 3M. It is possible to change the size of the cache, but not to make it smaller than 5M. If expiration information is not sent with a response, it is assumed to expire in 24 hours.
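The arithmetic behind the 18M garbage-collection figure can be checked with a short sketch. The constant and function names below are illustrative only, and treating one megabyte as 2^20 bytes is an assumption.

```cpp
#include <cassert>

// Assumed: one megabyte is 2^20 bytes.
const unsigned long MEGA = 1024UL * 1024UL;

// Defaults quoted in the text, expressed in bytes.
const unsigned long DEFAULT_TOTAL_SIZE = 20 * MEGA;
const unsigned long DEFAULT_MAX_ENTRY_SIZE = 3 * MEGA;

// Share of the total size that garbage collection frees up, in percent.
const unsigned long GC_PERCENT = 10;

// Size the cache is trimmed toward during garbage collection:
// the total size minus GC_PERCENT of the total size.
unsigned long gc_target_size(unsigned long total_size) {
    assert(total_size > 0 && "an empty cache has nothing to collect");
    return total_size - total_size * GC_PERCENT / 100;
}
```

For the default 20M cache this gives 18M, matching the figure in the description.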
Definition at line 245 of file HTTPCache.cc. Referenced by HTTPConnect::HTTPConnect(). |
|
Is the cache currently enabled? Definition at line 1080 of file HTTPCache.cc. Referenced by HTTPConnect::fetch_url(), and HTTPConnect::is_cache_enabled(). |
|
Should we cache protected responses? Definition at line 1111 of file HTTPCache.cc. |
|
Definition at line 1170 of file HTTPCache.cc. |
|
Look in the cache for the given URL. This method locks the class' interface.
Definition at line 1600 of file HTTPCache.cc. References DBG. |
|
Look in the cache and return the status (validity) of the cached response. This method should be used to determine if a cached response requires validation. This method locks the class' interface and the cache entry.
Definition at line 2067 of file HTTPCache.cc. |
|
Purge both the in-memory cache table and the contents of the cache on disk. This method deletes every entry in the persistent store but leaves the structure intact. The client of HTTPCache is responsible for making sure that all threads have released any responses they pulled from the cache. If this method is called when a response is still in use, it will throw an Error object and not purge the cache. This method locks the class' interface.
Definition at line 2355 of file HTTPCache.cc. |
|
Call this method to inform the cache that a particular response is no longer in use. When a response is accessed using get_cached_response(), it is locked so that updates and removal (e.g., by the garbage collector) are not possible. Calling this method frees that lock. This method locks the class' interface.
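Because every successful lookup must be paired with a release, a small scope guard is one way to make the pairing hard to forget. The sketch below is hypothetical: ScopedCachedBody is not part of the library, FakeCache exists only to exercise the guard without the real class, and treating a null FILE * as "not in the cache" is an assumption.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Scope guard: looks up a cached body on construction and releases the
// entry lock on destruction. Cache is any type offering the two methods.
template <typename Cache>
class ScopedCachedBody {
public:
    ScopedCachedBody(Cache &cache, const std::string &url)
        : d_cache(cache), d_body(cache.get_cached_response_body(url)) {}

    ~ScopedCachedBody() {
        if (d_body)  // assumed: null means nothing was locked
            d_cache.release_cached_response(d_body);
    }

    ScopedCachedBody(const ScopedCachedBody &) = delete;
    ScopedCachedBody &operator=(const ScopedCachedBody &) = delete;

    FILE *get() const { return d_body; }

private:
    Cache &d_cache;
    FILE *d_body;
};

// Hypothetical stand-in that only counts outstanding locks.
struct FakeCache {
    int locks = 0;
    FILE *get_cached_response_body(const std::string &) {
        ++locks;
        return stdin;  // any non-null FILE* will do for the demonstration
    }
    void release_cached_response(FILE *) {
        assert(locks > 0);
        --locks;
    }
};
```

The reference to HTTPCacheResponse::~HTTPCacheResponse() below suggests the library already releases responses from a destructor in a similar way.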
Definition at line 2291 of file HTTPCache.cc. Referenced by HTTPCacheResponse::~HTTPCacheResponse(). |
|
Should every cache entry be validated?
Definition at line 1316 of file HTTPCache.cc. Referenced by HTTPConnect::HTTPConnect(). |
|
Set the request Cache-Control headers. If a request must be satisfied using HTTP, these headers should be included in the request since they might be pertinent to a proxy cache. Ignored headers: no-transform, only-if-cached. These headers are not used by HTTPCache and are not recorded. However, if present in the vector passed to this method, they will be present in the vector returned by get_cache_control(). This method locks the class' interface.
Definition at line 1347 of file HTTPCache.cc. |
|
Set the cache's disconnected property. The cache can operate either disconnected from the network or using a proxy cache (but tell that proxy not to use the network). This method locks the class' interface.
Definition at line 1127 of file HTTPCache.cc. |
|
Enable or disable the cache. The cache can be temporarily suspended using the enable/disable property. This does not prevent the cache from being enabled or disabled at a later point in time. Default: yes. This method locks the class' interface.
Definition at line 1066 of file HTTPCache.cc. Referenced by HTTPConnect::HTTPConnect(), and HTTPConnect::set_cache_enabled(). |
|
Should we cache protected responses? A protected response is one that comes from a server/site that requires authorization. Default: no. This method locks the class' interface.
Definition at line 1097 of file HTTPCache.cc. |
|
Set the default expiration time. Use the default expiration property to determine when a cached response becomes stale if the response lacks the information necessary to compute a specific value. Default: 24 hours (86,400 seconds). This method locks the class' interface.
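How the default fits in with the response's own expiration information might look like the following. This is a simplified sketch under stated assumptions: the function name and the NO_VALUE marker are hypothetical, all times are plain seconds, and any heuristics the cache may apply beyond max-age and Expires are left out.

```cpp
#include <cassert>

// Hypothetical marker for "header not present in the response".
const long NO_VALUE = -1;

// Number of seconds a response stays fresh after it was generated.
//   max_age: value of Cache-Control max-age, or NO_VALUE
//   expires: Expires header as seconds since the epoch, or NO_VALUE
//   date:    Date header as seconds since the epoch, or NO_VALUE
long freshness_lifetime(long max_age, long expires, long date,
                        long default_expiration) {
    assert(default_expiration >= 0);

    // max-age overrides Expires when both are present.
    if (max_age != NO_VALUE)
        return max_age;

    // Expires is measured relative to the response's Date.
    if (expires != NO_VALUE && date != NO_VALUE)
        return expires > date ? expires - date : 0;

    // No usable expiration information: fall back to the default.
    return default_expiration;
}
```

With the documented default, a response carrying neither header would be treated as fresh for 86,400 seconds.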
Definition at line 1292 of file HTTPCache.cc. Referenced by HTTPConnect::HTTPConnect(). |
|
How should the cache handle the Expires header? Default: no. This method locks the class' interface.
Definition at line 1155 of file HTTPCache.cc. References DBG, LOCK, MEGA, and MIN_CACHE_TOTAL_SIZE. Referenced by HTTPConnect::HTTPConnect(). |
|
Set the maximum size for a single entry in the cache. Default: 3M. This method locks the class' interface.
Definition at line 1242 of file HTTPCache.cc. Referenced by HTTPConnect::HTTPConnect(). |
|
Cache size management. The default cache size is 20M. The minimum size is 5M, to avoid problems while writing the cache. The size is given in megabytes. Note that reducing the size of the cache may trigger a garbage collection operation.
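A sketch of the size handling described above follows. It assumes that a request below the minimum is raised to the minimum rather than rejected, and that one megabyte is 2^20 bytes; the function names are hypothetical. MIN_CACHE_TOTAL_SIZE is named in the references on this page, but its unit here (megabytes) is inferred from the text.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>

const unsigned long MEGA = 1024UL * 1024UL;    // assumed: 2^20 bytes
const unsigned long MIN_CACHE_TOTAL_SIZE = 5;  // megabytes, from the text

// Convert a requested size in megabytes to bytes, raising requests below
// the minimum to the minimum (the clamping itself is an assumption).
unsigned long effective_total_size(unsigned long requested_mb) {
    assert(requested_mb <= ULONG_MAX / MEGA && "size would overflow");
    return std::max(requested_mb, MIN_CACHE_TOTAL_SIZE) * MEGA;
}

// True when the data already on disk exceeds the new limit, which is the
// case where shrinking the cache would give garbage collection work to do.
bool shrink_triggers_gc(unsigned long bytes_in_use,
                        unsigned long new_total_size) {
    return bytes_in_use > new_total_size;
}
```

For example, a request for 2M would yield a 5M cache under this assumption, and shrinking a cache holding 12M of data to 10M would call for garbage collection.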
Definition at line 1191 of file HTTPCache.cc. Referenced by HTTPConnect::HTTPConnect(). |
|
Update the metadata for a response already in the cache. This method provides a way to merge response headers returned from a conditional GET request, for the given URL, with those already present. This method locks the class' interface and the cache entry.
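One plausible reading of "merge" is that a header from the conditional GET replaces any cached header with the same name, and everything else is kept. The sketch below implements that reading only; the function names are hypothetical and the actual merge rules may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Lower-cased field name of a "Name: value" header line.
std::string header_name(const std::string &header) {
    std::string name = header.substr(0, header.find(':'));
    std::transform(name.begin(), name.end(), name.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(std::tolower(c));
                   });
    return name;
}

// Keep every cached header that the new response does not restate, then
// append all headers from the new response.
std::vector<std::string> merge_headers(const std::vector<std::string> &cached,
                                       const std::vector<std::string> &fresh) {
    std::vector<std::string> merged;
    for (const std::string &old_header : cached) {
        const std::string name = header_name(old_header);
        const bool replaced = std::any_of(
            fresh.begin(), fresh.end(),
            [&name](const std::string &h) { return header_name(h) == name; });
        if (!replaced)
            merged.push_back(old_header);
    }
    merged.insert(merged.end(), fresh.begin(), fresh.end());

    // Every header from the new response survives the merge.
    assert(merged.size() >= fresh.size());
    return merged;
}
```

Field names are compared case-insensitively because HTTP header names are not case-sensitive.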
Definition at line 1990 of file HTTPCache.cc. |
|
Definition at line 260 of file HTTPCache.h. |
|
Definition at line 261 of file HTTPCache.h. |
|
Definition at line 259 of file HTTPCache.h. |
|
Definition at line 262 of file HTTPCache.h. |
|
Definition at line 255 of file HTTPCache.h. |
|
Definition at line 254 of file HTTPCache.h. |
|
Definition at line 263 of file HTTPCache.h. |