Tags: twitter, python-3.5, url-parsing

Using Python to parse a Twitter URL


I am using the following code, but I am not able to extract any information from the URL.

from urllib.parse import urlparse

if __name__ == "__main__":
    url = 'https://twitter.com/isro/status/1170331318132957184'
    df = urlparse(url)
    print(df)

ParseResult(scheme='https', netloc='twitter.com', path='/isro/status/1170331318132957184', params='', query='', fragment='')

I was hoping to extract the tweet message, the time of the tweet, and other information available from the link, but the code above clearly doesn't achieve that. How do I go about it from here?

Solution

  • I think you may be misunderstanding the purpose of the urllib.parse.urlparse function. From the Python documentation:

    urllib.parse.urlparse(urlstring, scheme='', allow_fragments=True)

    Parse a URL into six components, returning a 6-item named tuple. This corresponds to the general structure of a URL: scheme://netloc/path;parameters?query#fragment

    From the ParseResult you are seeing, your code is working as intended: it is breaking your URL string up into its component parts. It never contacts twitter.com, so it cannot return the tweet text or timestamp.

    It sounds as though you actually want to fetch the web content at that URL. In that case, I might take a look at urllib.request.urlopen instead.
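
As a rough illustration of the difference, here is a minimal sketch that first pulls the account name and tweet ID out of the parsed path, and then defines a separate helper that downloads the page with urllib.request.urlopen. The names split_tweet_url and fetch_page are hypothetical, and the User-Agent header and timeout value are assumptions rather than anything the site documents.

```python
from urllib.parse import urlparse
from urllib.request import Request, urlopen


def split_tweet_url(url):
    """Return (screen_name, tweet_id) from a URL shaped like /<user>/status/<id>."""
    # urlparse only splits the string; the path is then split on "/" by hand.
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) < 3 or parts[1] != "status":
        raise ValueError("not a tweet URL: " + url)
    return parts[0], parts[2]


def fetch_page(url, timeout=10):
    """Download the raw bytes served at url (hypothetical helper, needs network access)."""
    # Assumption: an explicit User-Agent is sent because some sites reject urllib's default.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    url = "https://twitter.com/isro/status/1170331318132957184"
    print(split_tweet_url(url))  # ('isro', '1170331318132957184')
```

Only split_tweet_url runs without a network connection. Whether the bytes returned by fetch_page actually contain the tweet text and time depends on how the site serves the page, so the result may still need an HTML parser, or the site's official API may turn out to be the more reliable route.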