I'm struggling to get the following csv
from this site into Pandas
.
I've tried a few things, but so far I can't make a workable csv
. The ultimate aim is to be able to make it into a Pandas dataframe
.
Can anyone help point me in the right direction and explain why the below doesn't work?
Using Python 3.7, Windows 10
import requests
import urllib
import csv
csv_url = 'https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/773656/HMRC_spending_over_25000_for_December_2018.csv'
response = urllib.request.urlopen(csv_url)
cr = csv.reader(response)
for row in cr:
print(row)
# csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
response = urllib.request.urlopen(csv_url)
response = response.read().decode()
cr = csv.reader(response)
for row in cr:
print(row)
# UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 28452: invalid start byte
response = requests.get(csv_url).text
cr = csv.reader(response)
for row in cr:
print(row)
# malformed, prints individual characters
If you're using pandas >= 0.19.2
, you can input the csv
url directly.:
import pandas as pd
url="https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/773656/HMRC_spending_over_25000_for_December_2018.csv"
c=pd.read_csv(url, encoding ='latin1') # otherwise you get a UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 12: invalid start byte
Otherwise use String.IO, i.e.:
import pandas as pd
import requests
from io import StringIO
url="https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/773656/HMRC_spending_over_25000_for_December_2018.csv"
s=requests.get(url).content
c=pd.read_csv(StringIO(s.decode("latin1")))