Search code examples
pythoncsvhyperlinkconcatenation

How to concatenate columns in CSV file using Python and Count the Total per UniqueID?


This question have been asked multiple times in this community but I couldn't find the correct answers since I am beginner in Python. I got 2 questions actually:

  1. I want to concatenate 3 columns (A,B,C) with its value into 1 Column. Header would be ABC.

import os import pandas as pd

directory = 'C:/Path' ext = ('.csv')

for filename in os.listdir(directory): f = os.path.join(directory, filename)

if f.endswith(ext):

    head_tail = os.path.split(f)
    head_tail1 = 'C:/Output'
    k =head_tail[1]
    r=k.split(".")[0]

    p=head_tail1 + "/" + r + " - Revised.csv"

    mydata = pd.read_csv(f)

    new =mydata[["A","B","C","D"]]
    new = new.rename(columns={'D': 'Total'})
    new['Total'] = 1
    new.to_csv(p ,index=False)
  1. Once concatenated, is it possible to count the uniqueid and put the total in Column D? Basically, to get the total count per uniqueid (Column ABC),the data can be found on a link when you click that UniqueID. For ex: Column ABC - uniqueid1, -> click -> go to the next page, total of that uniqueid.

On the link page, you can get the total numbers of uniqueid by Serial ID

I have no idea how to do this, but I would really appreciate if someone can help me on this project and would learn a lot from this.

Thank you very much. God Bless

Searched in Google, Youtube and Stackoverflow, couldn't find the correct answer.


Solution

  • I'm not sure that I understand your question correctly. However, if you know exactly the column names (e.g., A, B, and C) that you want to concatenate you can do something like code below.

    ''.join(merge_columns) is to concatenate column names.

    new[merge_columns].apply(lambda x: ''.join(x), axis=1) is to concatenate their values.

    Then, you can count unique values of the new column using groupby().count().

    new = mydata[["A","B","C","D"]]
    new = new.rename(columns={'D': 'Total'})
    new['Total'] = 1
    
    # added lines
    merge_columns = ['A', 'B', 'C']
    merged_col = ''.join(merge_columns)
    new[merged_col] = new[merge_columns].apply(lambda x: ''.join(x), axis=1)
    new.drop(merge_columns, axis=1, inplace=True)
    new = new.groupby(merged_col).count().reset_index()
    
    new.to_csv(p ,index=False)
    

    example:

    # before
    
    >     new
    
       A  B  C  Total
    0  a  b  c      1
    1  x  y  z      1
    2  a  b  c      1
    
    # after execute added lines
    
    >     new
    
       ABC  Total
    0  abc      2
    1  xyz      1