I have multiple csv files (Each file contains N number of Rows (e.g., 1000 rows) and 43 Columns).
I would like to read several csv files from a folder into pandas and merge them into one DataFrame.
I have not been able to figure it out though.
The problem is that, the final output of the DataFrame (i.e., frame = pd.concat(li, axis=0, ignore_index=True)
) merge all columns (i.e., 43 columns) into one column (see the attached image)
Screenshot of the code
an example of selected rows and columns (file one)
Client_ID Client_Name Pointer_of_Bins Date Weight
C0000001 POLYGONE TI006093 12/03/2019 0.5
C0000001 POLYGONE TI006093 12/03/2019 0.6
C0000001 POLYGONE TI006093 12/03/2019 1.4
C0000001 POLYGONE TI006897 14/03/2019 2.9
an example of selected rows and columns (file two) Client_ID Client_Name Pointer_of_Bins Date Weight C0000001 POLYGONE TI006093 22/04/2019 1.5 C0000001 ALDI TI006098 22/04/2019 0.7 C0000001 ALDI TI006098 22/04/2019 2.4 C0000001 ALDI TI006898 24/04/2019 1.9
The expected outputs would look like this (merge of multiple files that might contains thousands of rows and several columns, as the attached data is just an example, while the actual csv files might contain thousands of rows and more than 45 columns in each file)
Client_ID Client_Name Pointer_of_Bins Date Weight
C0000001 POLYGONE TI006093 12/03/2019 0.5
C0000001 POLYGONE TI006093 12/03/2019 0.6
C0000001 POLYGONE TI006093 12/03/2019 1.4
C0000001 POLYGONE TI006897 14/03/2019 2.9
C0000001 POLYGONE TI006093 22/04/2019 1.5
C0000001 ALDI TI006098 22/04/2019 0.7
C0000001 ALDI TI006098 22/04/2019 2.4
C0000001 ALDI TI006898 24/04/2019 1.9
TO Download the two CSV files, click here (dummy data
Here is what I have done so far:
import pandas as pd
import glob
path = r'C:\Users\alnaffakh\Desktop\doc\Data\data2\Test'
all_files = glob.glob(path + "/*.csv")
li = []
for filename in all_files:
df = pd.read_csv(filename, sep='delimiter', index_col=None, header=0)
# df = pd.read_csv(filename, sep='\t', index_col=None, header=0)
li.append(df)
frame = pd.concat(li, axis=0, ignore_index=True)
You could use pandas.concat
to recursively concatenate the .csv
file contents.
In fact, I see that you used it and your application of concat
seems fine to me. Try investigating the individual dataframes that you read. The only way your columns could merge into a single column is if you did not mention the correct delimiter.
import pandas as pd
dfs = list()
for filename in filesnames:
df = pd.read_csv(filename)
dfs.append(df)
frame = pd.concat(dfs, axis=0, ignore_index=True)
df.head()
Since the dummy data available is not in text format yet, I am using just some dummy data I made.
import pandas as pd
from io import StringIO # needed for string to dataframe conversion
file1 = """
Col1 Col2 Col3 Col4 Col5
1 ABCDE AE10 CD11 BC101F
2 GHJKL GL20 JK22 HJ202M
3 MNPKU MU30 PK33 NP303V
4 OPGHD OD40 GH44 PG404E
5 BHZKL BL50 ZK55 HZ505M
"""
file2 = """
Col1 Col2 Col3 Col4 Col5
1 AZYDE AE10 CD11 BC100F
2 GUFKL GL24 JK22 HJ207M
3 MHPRU MU77 PK39 NP309V
4 OPGBB OE90 GH41 PG405N
5 BHTGK BL70 ZK53 HZ508Z
"""
Load data as individual dataframes and then concatenate them.
df1 = pd.read_csv(StringIO(file1), sep='\t')
df2 = pd.read_csv(StringIO(file2), sep='\t')
print(pd.concat([df1, df2], ignore_index=True))
Output:
Col1 Col2 Col3 Col4 Col5
0 1 ABCDE AE10 CD11 BC101F
1 2 GHJKL GL20 JK22 HJ202M
2 3 MNPKU MU30 PK33 NP303V
3 4 OPGHD OD40 GH44 PG404E
4 5 BHZKL BL50 ZK55 HZ505M
5 1 AZYDE AE10 CD11 BC100F
6 2 GUFKL GL24 JK22 HJ207M
7 3 MHPRU MU77 PK39 NP309V
8 4 OPGBB OE90 GH41 PG405N
9 5 BHTGK BL70 ZK53 HZ508Z