python, web-scraping, beautifulsoup, urllib2

Can I bind the read() function to type 'instance' in Python?


I'm learning BeautifulSoup and I encountered this:

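from bs4 import BeautifulSoup    # imported, but not used yet in this snippet
import urllib2                   # Python 2 only; Python 3 uses urllib.request

url = "https://en.wikipedia.org/wiki/Katy_Perry"
open_url = urllib2.urlopen(url)  # response object returned by urlopen
read = open_url.read()           # the whole response body as a str
print(read)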

This prints the HTML of the page. But how can read() be used here? It's a file I/O method and should be called on a file object, but the variable "open_url" here isn't a file object.

print(type(open_url))

Output:

<type 'instance'>

Obviously "open_url" isn't a file object, so what made it possible to bind read() to "open_url"?


Solution

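  • <type 'instance'> only tells you that open_url is an instance of an old-style (classic) Python 2 class. If you print open_url itself, you will see that it is an addinfourl object whose fp attribute is a socket._fileobject: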

    <addinfourl at 139707791457312 whose fp = <socket._fileobject object at 0x7f104303bd50>>
    

    So the actual file object is a socket._fileobject, which you can access with open_url.fp:

    <socket._fileobject object at 0x7f104303bd50>
    

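    If you remove the first read() call (otherwise the response body has already been consumed and a second read returns an empty string), you will see that you can access the socket file object and call .read() on it directly. That is what happens when you call open_url.read(), open_url.readline() and so on, because those attributes are simply the fp object's own methods, bound onto open_url: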

    open_url.fp.read()
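A minimal Python 3 sketch of that delegation, assuming it mirrors what Python 2's urllib.addbase (the base class of addinfourl) does in its __init__: store the wrapped object as self.fp and copy its read/readline methods onto the instance. The class name AddBaseSketch is hypothetical, and io.BytesIO stands in for socket._fileobject so the example needs no network access.

```python
import io


class AddBaseSketch:
    """Hypothetical stand-in for Python 2's urllib.addbase.

    It is not a file itself; it stores a file-like object in self.fp
    and copies that object's bound methods onto the instance.
    """

    def __init__(self, fp):
        self.fp = fp
        # Copy the wrapped object's bound methods onto the wrapper, so
        # calling wrapper.read(...) is literally calling fp.read(...).
        self.read = self.fp.read
        self.readline = self.fp.readline
        self.readlines = self.fp.readlines

    def __repr__(self):
        # Same shape as the repr shown in the answer above.
        return "<%s at %r whose fp = %r>" % (type(self).__name__, id(self), self.fp)


# io.BytesIO plays the role of socket._fileobject (assumption for the demo).
raw = io.BytesIO(b"<html>line 1\nline 2\n</html>")
open_url = AddBaseSketch(raw)

print(open_url)
print(isinstance(open_url, io.IOBase))   # False: the wrapper is not a file object
print(open_url.read.__self__ is raw)     # True: read is bound to the wrapped object

first = open_url.read()    # reads the whole body from the wrapped object
second = open_url.read()   # the data has already been consumed
print(first)
print(second)              # b''
```

Because open_url.read is just fp.read stored as an instance attribute, the wrapper's own type never has to be file-like; this is ordinary duck typing.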