Good morning guys,
I was writing a small script to manage data in R, but for a reason I don't understand, importing a huge CSV file (3.5 GB) into R doesn't work.
To solve this problem quickly, I decided to use pandas with reticulate.
library(reticulate)
# Import the pandas package from Python
pd <- import("pandas", as = "pd")
# Read the CSV file with pandas
pd$read_csv("C:\\Users\\Befrancesco\\Desktop\\X_dataset\\x_file_name.csv", error_bad_lines = FALSE, encoding = "utf-8")
R returns this error:
Error in py_call_impl(callable, dots$args, dots$keywords) :
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 105: invalid start byte
Where am I going wrong?
Thank you in advance for your answer.
Francesco
It could be that your file isn't encoded as UTF-8. Try one of the other encodings, such as ISO-8859-1, in your read_csv call, e.g.:
pd$read_csv("C:\\Users\\Befrancesco\\Desktop\\X_dataset\\x_file_name.csv", error_bad_lines = FALSE, encoding = "ISO-8859-1")
See this answer for more on different encodings: https://stackoverflow.com/a/18172249/5269252
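To see why the error mentions byte 0xf6, here is a small pure-Python sketch (no pandas needed). In ISO-8859-1 the byte 0xf6 is the letter "ö", but on its own it is not a valid start byte in UTF-8. The sample bytes and the helper name first_decodable are made up for illustration and are not taken from your file; the candidate list is only an assumption about which encodings are worth trying.

```python
# Illustrative bytes only: "Köln" stored as ISO-8859-1, where "ö" is the single byte 0xf6.
raw = b"K\xf6ln"

# Decoding as UTF-8 fails in the same way as the error in the question.
try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as err:
    utf8_error = str(err)

# Decoding as ISO-8859-1 succeeds and recovers the text.
latin1_text = raw.decode("iso-8859-1")


def first_decodable(sample, candidates=("utf-8", "cp1252", "iso-8859-1")):
    """Hypothetical helper: return the first candidate encoding that decodes
    the sample without raising, or None if every candidate fails."""
    for enc in candidates:
        try:
            sample.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    return None


print(utf8_error)
print(latin1_text)
print(first_decodable(raw))
```

Keep in mind that ISO-8859-1 maps every possible byte to a character, so it never raises an error. A successful read only means the bytes were accepted, not that the guess is right, so check a few values with accented characters after loading.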