Accessing a value in a defaultdict and stripping out the URL portion of it

I have a very large defaultdict that has a dict within a dict, with the inner dict containing HTML from an email body. I only want to extract the http URL from within the inner dict. What's the best way to go about extracting that?

Do I need to convert the dict to another data structure before using regex? Is there a better way? I'm still fairly new to Python and appreciate any pointers.

For example, what I'm working with:

defaultdict(<type 'dict'>, {16: {u'SEQ': 16, u'RFC822': u'Delivered-To: somebody@email.com      LOTS MORE HTML until http://the_url_I_want_to_extract.com' }})

One thing I've tried is using re.findall on the defaultdict, which didn't work:

confirmation_link = re.findall('Click this link to confirm your registration:<br />"(.*?)"', body)

for conf in confirmation_link:
    print conf

Error:

line 177, in findall
return _compile(pattern, flags).findall(string)
TypeError: expected string or buffer

Solution

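The TypeError occurs because re.findall expects a string, but body is the dictionary itself. You can only apply the regular expression once you've iterated over the dictionary and pulled out the corresponding string value: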

import re
from collections import defaultdict

# Rebuild the example data; dict is the default factory shown as <type 'dict'> in the repr.
d = defaultdict(dict, {16: {u'SEQ': 16, u'RFC822': u'Delivered-To: somebody@email.com      LOTS MORE HTML until http://the_url_I_want_to_extract.com'}})

# items() works on both Python 2 and 3 (iteritems() is Python 2 only).
for k, v in d.items():
    # v is the inner dictionary that contains your HTML string.
    str_with_html = v['RFC822']

    # Match "http" and then continue until a whitespace character is hit.
    match = re.search(r"http\S+", str_with_html)
    if match:
        print(match.group(0))

Output:

http://the_url_I_want_to_extract.com
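
If the real message bodies are HTML, the link will usually sit inside markup such as an href attribute, so stopping only at whitespace may pull in trailing quotes or tags. Below is a minimal sketch of one way to handle that, assuming every message keeps its body under the 'RFC822' key. The names extract_urls and URL_PATTERN are hypothetical helpers introduced here for illustration, and the anchor tag in the sample data is made up:

```python
import re
from collections import defaultdict

# Matches "http" or "https", then "://", then any run of characters that are
# not whitespace, quotes, or angle brackets (so the match stops at HTML markup).
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def extract_urls(messages, field="RFC822"):
    """Return {message_id: [urls, ...]} for every message that has `field`."""
    found = {}
    for msg_id, parts in messages.items():
        body = parts.get(field)
        if body is None:
            continue  # this message has no body stored under that key
        found[msg_id] = URL_PATTERN.findall(body)
    return found


# Illustrative data only; the HTML around the URL is invented for this sketch.
messages = defaultdict(dict)
messages[16] = {
    u"SEQ": 16,
    u"RFC822": u'Delivered-To: somebody@email.com <a href="http://the_url_I_want_to_extract.com">confirm</a>',
}
messages[17] = {u"SEQ": 17}  # no body, so it is skipped

urls = extract_urls(messages)
print(urls)
```

Using parts.get(field) rather than parts[field] avoids a KeyError on messages without a body, and findall returns every link in a message instead of only the first one.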