The basic concept is that a feed publisher may provide a special HTTP header, called an ETag, when it publishes a feed. You should send this ETag back to the server on subsequent requests. If the feed has not changed since the last time you requested it, the server will return a special HTTP status code (304) and no feed data.
Example: Using ETags to reduce bandwidth
>>> import feedparser
>>> d = feedparser.parse('http://feedparser.org/docs/examples/atom.xml')
>>> d.etag
'"6c132-941-ad7e3080"'
>>> d2 = feedparser.parse('http://feedparser.org/docs/examples/atom.xml', etag=d.etag)
>>> d2.status
304
>>> d2.feed
{}
>>> d2.entries
[]
>>> d2.debug_message
'The feed has not changed since you last checked, so the server sent no data. This is a feature, not a bug!'
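At the HTTP level, the exchange above comes down to the client echoing the ETag back in an If-None-Match request header. The following is a minimal sketch of that handshake under stated assumptions, not feedparser's implementation: the `respond` function and the in-memory "server" are hypothetical and exist only for illustration.

```python
def respond(request_headers, current_etag, body):
    """Hypothetical server: return (status, headers, body) for one GET."""
    # The client echoes the ETag it saw last time in If-None-Match.
    if request_headers.get("If-None-Match") == current_etag:
        # Unchanged: 304 Not Modified, and no feed data is sent.
        return 304, {"ETag": current_etag}, b""
    # First request, or the feed changed: send the full feed with its ETag.
    return 200, {"ETag": current_etag}, body


feed = b"<feed>...</feed>"  # placeholder feed body
etag = '"6c132-941-ad7e3080"'

# First request: no ETag is known yet, so the full feed comes back.
status1, headers1, body1 = respond({}, etag, feed)

# Second request: send the stored ETag back to the server.
status2, headers2, body2 = respond({"If-None-Match": headers1["ETag"]}, etag, feed)

print(status1, len(body1))
print(status2, len(body2))
```

The second response carries no body, which is where the bandwidth saving comes from.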
A related mechanism accomplishes the same thing in a slightly different way. Here the server publishes the feed's last-modified date in the Last-Modified HTTP header. You can send this date back to the server on subsequent requests, and if the feed has not changed since then, the server will return HTTP status code 304 and no feed data.
Example: Using Last-Modified headers to reduce bandwidth
>>> import feedparser
>>> d = feedparser.parse('http://feedparser.org/docs/examples/atom.xml')
>>> d.modified
(2004, 6, 11, 23, 0, 34, 4, 163, 0)
>>> d2 = feedparser.parse('http://feedparser.org/docs/examples/atom.xml', modified=d.modified)
>>> d2.status
304
>>> d2.feed
{}
>>> d2.entries
[]
>>> d2.debug_message
'The feed has not changed since you last checked, so the server sent no data. This is a feature, not a bug!'
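On the wire, the date travels back to the server in an If-Modified-Since request header, formatted as an HTTP date. As a rough illustration of what that header looks like for the time tuple shown above, here is a standard-library sketch. The `http_date` helper is a hypothetical name, and this is not necessarily how feedparser builds the header internally.

```python
import calendar
from email.utils import formatdate


def http_date(modified):
    """Format a 9-item UTC time tuple as an HTTP date string."""
    # calendar.timegm interprets the tuple as UTC, unlike time.mktime,
    # which would apply the local timezone.
    return formatdate(calendar.timegm(modified), usegmt=True)


modified = (2004, 6, 11, 23, 0, 34, 4, 163, 0)
headers = {"If-Modified-Since": http_date(modified)}
print(headers)  # {'If-Modified-Since': 'Fri, 11 Jun 2004 23:00:34 GMT'}
```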
Clients should support both ETag and Last-Modified headers, as some servers support one but not the other.
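One way to support both headers is to store whichever values the server provided and pass both back on the next request. The sketch below assumes a simple in-memory dict as the cache. The `fetch_if_changed` helper, the `cache` layout, and the injectable `parse` parameter are all hypothetical; only `feedparser.parse` and its `etag` and `modified` arguments come from the library.

```python
def fetch_if_changed(url, cache, parse=None):
    """Return the parsed feed, or None if the server answered 304.

    `cache` is a hypothetical dict: url -> {"etag": ..., "modified": ...}.
    """
    if parse is None:
        import feedparser  # imported lazily so a stub can be swapped in
        parse = feedparser.parse

    saved = cache.get(url, {})
    # Send both values; the server uses whichever one it supports.
    result = parse(url, etag=saved.get("etag"), modified=saved.get("modified"))

    if result.get("status") == 304:
        return None  # unchanged: keep the cached values as they are

    # Remember whichever values this server provided, for next time.
    cache[url] = {"etag": result.get("etag"), "modified": result.get("modified")}
    return result
```

Called as `fetch_if_changed(url, cache)` with the same `cache` dict on every poll, this downloads the feed in full only when the server reports a change. To get the benefit across program runs, the dict would need to be persisted somewhere.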
If you do not support ETag and Last-Modified headers, you will repeatedly download feeds that have not changed. This wastes your bandwidth and the publisher's bandwidth, and the publisher may ban you from accessing their server.