I'm learning BeautifulSoup and I encountered this:
from bs4 import BeautifulSoup
import urllib2
url = "https://en.wikipedia.org/wiki/Katy_Perry"
open_url = urllib2.urlopen(url)
read = open_url.read()
print(read)
This prints the HTML code of the page. But how can we use read() here? It's a file I/O method and should be called on a file object, but the variable "open_url" here isn't a file object.
print(type(open_url))
output:
<type 'instance'>
Obviously "open_url" isn't a file object, so what makes it possible to call read() on "open_url"?
If you print open_url itself, you will see that its fp attribute is a socket._fileobject:
<addinfourl at 139707791457312 whose fp = <socket._fileobject object at 0x7f104303bd50>>
So the underlying file object is actually a socket._fileobject, which you can access with open_url.fp:
<socket._fileobject object at 0x7f104303bd50>
If you remove the first read() call (it already consumes the response, so a second read would return nothing), you can access the socket file object and call .read() on it directly. That is what happens when you call open_url.read():
open_url.fp.read()
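To illustrate why read() works on an object that isn't a file, here is a minimal sketch of the delegation idea. It is not urllib2's actual source: the class name ResponseWrapper is hypothetical, and io.BytesIO stands in for the socket file object so the example runs without a network connection.

```python
import io


class ResponseWrapper(object):
    """Hypothetical stand-in for the object urlopen() returns."""

    def __init__(self, fp):
        # Keep a reference to the wrapped file-like object.
        self.fp = fp
        # Expose the wrapped object's read() as our own attribute,
        # so wrapper.read() and wrapper.fp.read() hit the same stream.
        self.read = fp.read


# io.BytesIO is an assumption standing in for socket._fileobject.
fake_socket_file = io.BytesIO(b"<html>hello</html>")
response = ResponseWrapper(fake_socket_file)

first = response.read()      # forwarded to fake_socket_file.read()
second = response.fp.read()  # same stream, already consumed by the call above

print(first)
print(second)
```

The wrapper never inherits from a file type. Python only needs the object to have a callable read attribute, which is why the second call, made directly on fp, finds the stream already exhausted.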