This question have been asked multiple times in this community but I couldn't find the correct answers since I am beginner in Python. I got 2 questions actually:
import os import pandas as pd
directory = 'C:/Path' ext = ('.csv')
for filename in os.listdir(directory): f = os.path.join(directory, filename)
if f.endswith(ext):
head_tail = os.path.split(f)
head_tail1 = 'C:/Output'
k =head_tail[1]
r=k.split(".")[0]
p=head_tail1 + "/" + r + " - Revised.csv"
mydata = pd.read_csv(f)
new =mydata[["A","B","C","D"]]
new = new.rename(columns={'D': 'Total'})
new['Total'] = 1
new.to_csv(p ,index=False)
On the link page, you can get the total numbers of uniqueid by Serial ID
I have no idea how to do this, but I would really appreciate if someone can help me on this project and would learn a lot from this.
Thank you very much. God Bless
Searched in Google, Youtube and Stackoverflow, couldn't find the correct answer.
I'm not sure that I understand your question correctly. However, if you know exactly the column names (e.g., A
, B
, and C
) that you want to concatenate you can do something like code below.
''.join(merge_columns)
is to concatenate column names.
new[merge_columns].apply(lambda x: ''.join(x), axis=1)
is to concatenate their values.
Then, you can count unique values of the new column using groupby().count()
.
new = mydata[["A","B","C","D"]]
new = new.rename(columns={'D': 'Total'})
new['Total'] = 1
# added lines
merge_columns = ['A', 'B', 'C']
merged_col = ''.join(merge_columns)
new[merged_col] = new[merge_columns].apply(lambda x: ''.join(x), axis=1)
new.drop(merge_columns, axis=1, inplace=True)
new = new.groupby(merged_col).count().reset_index()
new.to_csv(p ,index=False)
example:
# before
> new
A B C Total
0 a b c 1
1 x y z 1
2 a b c 1
# after execute added lines
> new
ABC Total
0 abc 2
1 xyz 1