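I have a piece of code that manipulates the data in a .txt file and writes the result to a new CSV file. The original file has no headers, and column 1 contains unwanted text around the store number.
The code does three things: it keeps only the first two columns, adds the headers `store` and `sku`, and strips the unwanted characters before and after the store number: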
```python
import pandas as pd

# The source file has no header row, so read it with header=None
# (otherwise its first data row is used as the header and lost)
file = pd.read_csv("example.txt", header=None, usecols=[0, 1])  # only the first 2 columns
headerList = ['store', 'sku']  # column names
file.to_csv("test.csv", header=headerList, index=False)  # write a new CSV with headers
file = pd.read_csv("test.csv")  # read the new file back, including headers
file['store'] = file['store'].str.split('R ').str[-1]  # remove chars before the store number
file['store'] = file['store'].str.split(' -').str[0]  # remove chars after the store number
file.to_csv("test.csv", index=False)  # overwrite test.csv with the cleaned data
```
This is easy to do one file at a time, but I would like to apply this code to every file in a zip archive. The files are all formatted the same way but have different names and data. Is there a way to loop through each file in the zip, run this code on it, and write the modified data to a new zip file?
According to the `read_csv` docs, you can pass in a filename or a buffer (that is, a file-like object). `zipfile.ZipFile.open` opens a file contained in a zip archive and returns a file-like object. Put those together and you can iterate over the archive, processing each file. You can also apply your own column names as you read the data, so there is no need for an intermediate file.
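A minimal sketch of the `read_csv` plus `ZipFile.open` combination, using an archive built in memory so it runs without an `example.zip` on disk. The member name `stores_a.txt` and its contents are hypothetical.

```python
import io
import zipfile

import pandas as pd

# Build a small zip archive in memory (hypothetical member name and contents)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("stores_a.txt", "STORE 101 - Downtown,SKU-1,x\n")

buf.seek(0)
with zipfile.ZipFile(buf) as zf:
    # ZipFile.open returns a binary file-like object; read_csv accepts it as-is
    with zf.open("stores_a.txt") as member:
        df = pd.read_csv(member, header=None, usecols=[0, 1],
                         names=["store", "sku"])

print(df)
```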
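```python
import pandas as pd
import zipfile

with zipfile.ZipFile("example.zip") as zippy:
    for info in zippy.infolist():
        if info.is_dir():
            continue  # skip folder entries inside the archive
        # ZipFile.open returns a file-like object that read_csv reads directly;
        # header=None because the files have no header row, names supplies one
        with zippy.open(info) as member:
            df = pd.read_csv(member, usecols=[0, 1],
                             header=None, names=['store', 'sku'])
        print(info.filename)
        print(df)
```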