Search code examples
pythonpandascsvdataframeconcatenation

Importing multiple csv files into pandas and merge them into one DataFrame


I have multiple csv files (Each file contains N number of Rows (e.g., 1000 rows) and 43 Columns).

I would like to read several csv files from a folder into pandas and merge them into one DataFrame.

I have not been able to figure it out though.

The problem is that, the final output of the DataFrame (i.e., frame = pd.concat(li, axis=0, ignore_index=True) ) merge all columns (i.e., 43 columns) into one column (see the attached image) Screenshot of the code

an example of selected rows and columns (file one)

               Client_ID    Client_Name  Pointer_of_Bins   Date        Weight
                C0000001       POLYGONE      TI006093     12/03/2019   0.5
                C0000001       POLYGONE      TI006093     12/03/2019   0.6
                C0000001       POLYGONE      TI006093     12/03/2019   1.4
                C0000001       POLYGONE      TI006897     14/03/2019   2.9

an example of selected rows and columns (file two) Client_ID Client_Name Pointer_of_Bins Date Weight C0000001 POLYGONE TI006093 22/04/2019 1.5 C0000001 ALDI TI006098 22/04/2019 0.7 C0000001 ALDI TI006098 22/04/2019 2.4 C0000001 ALDI TI006898 24/04/2019 1.9

The expected outputs would look like this (merge of multiple files that might contains thousands of rows and several columns, as the attached data is just an example, while the actual csv files might contain thousands of rows and more than 45 columns in each file)

               Client_ID    Client_Name  Pointer_of_Bins   Date        Weight
                C0000001       POLYGONE      TI006093     12/03/2019   0.5
                C0000001       POLYGONE      TI006093     12/03/2019   0.6
                C0000001       POLYGONE      TI006093     12/03/2019   1.4
                C0000001       POLYGONE      TI006897     14/03/2019   2.9   
                C0000001       POLYGONE      TI006093     22/04/2019   1.5
                C0000001       ALDI          TI006098     22/04/2019   0.7
                C0000001       ALDI          TI006098     22/04/2019   2.4
                C0000001       ALDI          TI006898     24/04/2019   1.9                                                             

TO Download the two CSV files, click here (dummy data

Here is what I have done so far:

import pandas as pd
import glob
path = r'C:\Users\alnaffakh\Desktop\doc\Data\data2\Test'
all_files = glob.glob(path + "/*.csv")
li = []
for filename in all_files:
    df = pd.read_csv(filename, sep='delimiter', index_col=None, header=0)
  # df = pd.read_csv(filename, sep='\t', index_col=None, header=0)
    li.append(df)
frame = pd.concat(li, axis=0, ignore_index=True)


Solution

  • Solution

    You could use pandas.concat to recursively concatenate the .csv file contents.
    In fact, I see that you used it and your application of concat seems fine to me. Try investigating the individual dataframes that you read. The only way your columns could merge into a single column is if you did not mention the correct delimiter.

    import pandas as pd
    
    dfs = list()
    for filename in filesnames:    
        df = pd.read_csv(filename)    
        dfs.append(df)
    frame = pd.concat(dfs, axis=0, ignore_index=True)
    df.head()
    

    Example with Dummy Data

    Since the dummy data available is not in text format yet, I am using just some dummy data I made.

    import pandas as pd
    from io import StringIO # needed for string to dataframe conversion
    
    file1 = """
    Col1    Col2    Col3    Col4    Col5
    1   ABCDE   AE10    CD11    BC101F
    2   GHJKL   GL20    JK22    HJ202M
    3   MNPKU   MU30    PK33    NP303V
    4   OPGHD   OD40    GH44    PG404E
    5   BHZKL   BL50    ZK55    HZ505M
    """
    
    file2 = """
    Col1    Col2    Col3    Col4    Col5
    1   AZYDE   AE10    CD11    BC100F
    2   GUFKL   GL24    JK22    HJ207M
    3   MHPRU   MU77    PK39    NP309V
    4   OPGBB   OE90    GH41    PG405N
    5   BHTGK   BL70    ZK53    HZ508Z
    """
    

    Load data as individual dataframes and then concatenate them.

    df1 = pd.read_csv(StringIO(file1), sep='\t')
    df2 = pd.read_csv(StringIO(file2), sep='\t')
    print(pd.concat([df1, df2], ignore_index=True))
    

    Output:

       Col1   Col2  Col3  Col4    Col5
    0     1  ABCDE  AE10  CD11  BC101F
    1     2  GHJKL  GL20  JK22  HJ202M
    2     3  MNPKU  MU30  PK33  NP303V
    3     4  OPGHD  OD40  GH44  PG404E
    4     5  BHZKL  BL50  ZK55  HZ505M
    5     1  AZYDE  AE10  CD11  BC100F
    6     2  GUFKL  GL24  JK22  HJ207M
    7     3  MHPRU  MU77  PK39  NP309V
    8     4  OPGBB  OE90  GH41  PG405N
    9     5  BHTGK  BL70  ZK53  HZ508Z