Tags: pandas, kubernetes, hadoop, hdfs, pandas.excelwriter

Error while trying to save data into hdfs


I'm trying to move data from my local machine to HDFS from Jupyter after data cleaning, but I ran into an error and the data is not written to HDFS (HDFS and Jupyter are both deployed in minikube on Kubernetes).

This is the code in Jupyter:

writer = pd.ExcelWriter("data.xlsx")
data.to_excel( excel_writer=writer)
writer.save("hdfs://hdfs-namenode-0.hdfs-namenode.default.svc.cluster.local/data")

The error is:

save() takes 1 positional argument but 2 were given

Solution

  • This is how I solved my problem:

    from hdfs import InsecureClient
    import pandas as pd

    client = InsecureClient('http://hdfs-namenode.default.svc.cluster.local:50070', user='hdfs')
    data = pd.read_csv('name_of_file.csv')
    # client.write() yields a file-like writer on HDFS; upload() only copies an existing local file
    with client.write('path/name_of_file.csv', encoding='utf-8') as writer:
        data.to_csv(writer)
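
For anyone who still needs an Excel file rather than a CSV: the `save()` call in the question fails because `ExcelWriter.save()` accepts no path argument. The target is given to the `ExcelWriter` constructor, and pandas itself does not write to `hdfs://` URLs this way. A minimal sketch of one possible workaround is to build the workbook in memory and hand the bytes to the HDFS client. The helper names `dataframe_to_excel_bytes` and `save_excel_to_hdfs` are hypothetical, the `openpyxl` engine is an assumption (it must be installed), and `client` is assumed to be an `hdfs.InsecureClient` like the one above.

```python
import io

import pandas as pd


def dataframe_to_excel_bytes(df: pd.DataFrame) -> bytes:
    """Serialize a DataFrame to .xlsx bytes in memory (hypothetical helper)."""
    buffer = io.BytesIO()
    # The target goes to the ExcelWriter constructor, not to save().
    # Leaving the with-block finalizes the workbook in the buffer.
    # Assumption: the openpyxl engine is installed.
    with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
        df.to_excel(writer, index=False)
    return buffer.getvalue()


def save_excel_to_hdfs(client, df: pd.DataFrame, hdfs_path: str) -> None:
    """Write the workbook to HDFS (hypothetical helper).

    Assumption: `client` behaves like hdfs.InsecureClient, whose write()
    accepts the file contents through the `data` argument.
    """
    client.write(hdfs_path, data=dataframe_to_excel_bytes(df), overwrite=True)
```

With the `client` and `data` objects from the solution, the call would look like `save_excel_to_hdfs(client, data, 'path/data.xlsx')`. Note that `overwrite=True` replaces any existing file at that path, so drop it if that is not what you want.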