I am new to programming and I have a bunch of CSV files that are about 50 to 60 rows each. After an unspecified amount of rows, there is a string called 'NAME' in the second column. I want to take everything in the second column after 'NAME' and print it to a text file. I initially thought to use pandas
and skiprows
to do this however the problem is that each csv I run through will have 'NAME' in a different row. Also, if it helps, there is a blank line three rows before 'NAME' in every CSV.
header, header
header, header
NUMBER,ITEM
N1,Shoe
N2,Heel
N3,Tee, random stuff
N4,Polo, random stuff
N5,Sneaker
N10,Heel
N11,Tee
...
...
...
How
Count 17
SORT,NAME
H1,Thing, random stuff
H2,WTANK
H3,TEE2
H4,TEE
I would also like to not have repeats in the text file because I will be running through around 1000 CSVs in a directory. Here is a bit of code that I started out with and is where i got stuck.
import pandas as pd
import csv
import glob
fns = glob.glob('*.csv') #goes through every CSV file in director
for csv in fns:
prod_df = pd.read_csv(csv, skiprows=???)
with open (os.path.join('out', fn), 'wb') as f:
w = csv.writer(f)
test_alias = prod_df['NAME'].unique()
w.writerow(row)
I know it doesn't work, and is probably not a very good bit of code. Any help would be greatly appreciated. Thank You!
Assuming that NAME
only occurs one time you could use get_loc
import pandas as pd
import glob
fn = 'out.csv'
fns = glob.glob('*.csv') #goes through every CSV file in director
colname = 'ITEM'
rowname = 'NAME'
for csv in fns:
s = pd.read_csv(csv, usecols=[colname], squeeze=True, skiprows=7, header=0)
row_idx = pd.Index(s).get_loc(rowname)
sub_items = s[row_idx + 1:]
pd.Series(sub_items.unique()).to_csv(fn, mode='a')
EDIT: Explanation
usecols
only reads in the column you need, i.e., ITEM
squeeze=True
returns a Series
get_loc
returns the index of its argument
You said you wanted everything after 'NAME'
, so the index is row_idx + 1
mode='a'
appends to a single csv file (called out.csv
), if you want to write to individual csv files then you should adjust your code accordingly