Password-Protected Feeds

Universal Feed Parser can download and parse feeds that are protected by HTTP authentication. Both basic and digest authentication are supported.

For basic authentication, the easiest way is to embed the username and password in the feed URL itself.

Example: Downloading a feed protected by HTTP basic authentication (the easy way)

In this example, the username is test and the password is basic.

>>> import feedparser
>>> d = feedparser.parse('http://test:basic@feedparser.org/docs/examples/basic_auth.xml')
>>> d.feed.title
u'Sample Feed'

You can also construct an HTTPBasicAuthHandler that contains the password information, then pass it as a handler to the parse function. HTTPBasicAuthHandler is part of the standard urllib2 module.

Example: Downloading a feed protected by HTTP basic authentication (the hard way)

import urllib2, feedparser

# Construct the authentication handler
auth = urllib2.HTTPBasicAuthHandler()

# Add password information: realm, host, user, password.
# A single handler can contain passwords for multiple sites;
# urllib2 will sort out which passwords get sent to which sites
# based on the realm and host of the URL you're retrieving
auth.add_password('BasicTest', 'feedparser.org', 'test', 'basic')

# Pass the authentication handler to the feed parser.
# handlers is a list because there might be more than one
# type of handler (urllib2 defines lots of different ones,
# and you can build your own)
d = feedparser.parse('http://feedparser.org/docs/examples/basic_auth.xml',
                     handlers=[auth])
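
The comments above note that a single handler can hold passwords for several sites. The following is a minimal sketch of that behavior, assuming a second, purely hypothetical site (example.org, realm OtherRealm, user alice). It makes no network request; it only asks the handler's password manager which credentials it would choose. The passwd attribute is how the standard library's handler stores its password manager, so treat it as an inspection aid rather than something you normally need to touch. The try/except import is there only so the snippet also runs on Python 3, where the same classes live in urllib.request.

```python
try:
    import urllib2                      # Python 2, as used on this page
except ImportError:
    import urllib.request as urllib2    # Python 3 home of the same classes

auth = urllib2.HTTPBasicAuthHandler()

# Credentials from the example above
auth.add_password('BasicTest', 'feedparser.org', 'test', 'basic')
# Hypothetical second site, for illustration only
auth.add_password('OtherRealm', 'example.org', 'alice', 'secret')

# The password manager selects credentials by realm and URL
feed_creds = auth.passwd.find_user_password(
    'BasicTest', 'http://feedparser.org/docs/examples/basic_auth.xml')
other_creds = auth.passwd.find_user_password(
    'OtherRealm', 'http://example.org/feed.xml')

# A realm/host pair that was never registered yields no credentials
no_creds = auth.passwd.find_user_password(
    'BasicTest', 'http://example.org/feed.xml')
```

The same auth object can then be passed as handlers=[auth] when parsing a feed from either site.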

Digest authentication is handled in much the same way, by constructing an HTTPDigestAuthHandler and populating it with the necessary realm, host, user, and password information. Note that there is no shortcut like stuffing the information into the URL. You need to do it this way:

Example: Downloading a feed protected by HTTP digest authentication (the only way)

import urllib2, feedparser

auth = urllib2.HTTPDigestAuthHandler()
auth.add_password('DigestTest', 'feedparser.org', 'test', 'digest')
d = feedparser.parse('http://feedparser.org/docs/examples/digest_auth.xml',
                     handlers=[auth])

Caution
Prior to Python 2.3.3, urllib2 did not properly support digest authentication. These examples have been verified on Python 2.3.3 on Windows and Python 2.3.4 on Debian GNU/Linux. Mac OS X 10.3 ships with Python 2.3, so OS X users will need to upgrade to a later version of Python in order to use digest authentication.

The examples so far have assumed that you know in advance that the feed is password-protected. But what if you don't know?

If you try to download a password-protected feed without sending the proper password information, the server will return HTTP status code 401. Universal Feed Parser makes this status code available in d.status.

Details on the authentication scheme are in d.headers['www-authenticate']. Universal Feed Parser does not do any further parsing on this field; you will need to parse it yourself. Everything before the first space is the type of authentication (probably Basic or Digest), which determines which type of handler you'll need to construct. The realm name is given as realm="foo", so foo would be your first argument to auth.add_password. Other information in the www-authenticate header is probably safe to ignore; the urllib2 module will handle it for you.
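
Since this header is left unparsed, a small helper can pull out the two pieces described above. The following is a minimal sketch, assuming the header carries a single challenge with a quoted realm; parse_www_authenticate is a hypothetical name and is not part of Universal Feed Parser or urllib2.

```python
import re

def parse_www_authenticate(header):
    """Split a WWW-Authenticate value into (scheme, realm).

    Hypothetical helper: handles one challenge only and ignores
    every parameter except realm.
    """
    parts = header.strip().split(None, 1)
    if not parts:
        return None, None
    # Everything before the first space is the authentication type
    scheme = parts[0]
    params = parts[1] if len(parts) > 1 else ''
    # The realm is given as realm="foo"
    match = re.search(r'realm="([^"]*)"', params, re.IGNORECASE)
    realm = match.group(1) if match else None
    return scheme, realm

basic = parse_www_authenticate('Basic realm="Use test/basic"')
digest = parse_www_authenticate(
    'Digest realm="DigestTest", algorithm=MD5, qop="auth"')
```

The first element tells you which handler class to construct, and the second is the value to pass as the first argument to auth.add_password.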

Example: Determining that a feed is password-protected

>>> import feedparser
>>> d = feedparser.parse('http://feedparser.org/docs/examples/basic_auth.xml')
>>> d.status
401
>>> d.headers['www-authenticate']
'Basic realm="Use test/basic"'
>>> d = feedparser.parse('http://feedparser.org/docs/examples/digest_auth.xml')
>>> d.status
401
>>> d.headers['www-authenticate']
'Digest realm="DigestTest",
 nonce="+LV/uLLdAwA=5d77397291261b9ef256b034e19bcb94f5b7992a",
 algorithm=MD5,
 qop="auth"'
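
Once the authentication type and realm are known (for instance, read from a header like the ones in the example above), the matching handler can be built in one place. The following is a rough sketch under that assumption; build_auth_handler is a hypothetical helper, and the try/except import is there only so the snippet also runs on Python 3, where urllib2 became urllib.request.

```python
try:
    import urllib2                      # Python 2, as used on this page
except ImportError:
    import urllib.request as urllib2    # Python 3 home of the same classes

def build_auth_handler(scheme, realm, host, user, password):
    """Return a handler matching the advertised scheme (hypothetical helper)."""
    scheme = scheme.lower()
    if scheme == 'basic':
        auth = urllib2.HTTPBasicAuthHandler()
    elif scheme == 'digest':
        auth = urllib2.HTTPDigestAuthHandler()
    else:
        # Only the two schemes covered on this page are handled
        raise ValueError('unsupported authentication scheme: %r' % scheme)
    auth.add_password(realm, host, user, password)
    return auth

# Values taken from the digest example above
auth = build_auth_handler('Digest', 'DigestTest', 'feedparser.org',
                          'test', 'digest')
```

The returned object is passed to the parse function as handlers=[auth], exactly as in the earlier digest example.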