I am need to assign a new list value in a column each time through a for loop. My problem is that I'm getting the last list value assigned to all the rows. In other words, the last list item appears in every row of the column "Plant_Name". The first 5 rows below should display in 'Plant_Name' as Caetite I as you can see from the 1st list item below.
My dataframe "dfn" looks like this -
dfn.head()
Out[191]:
YYMMDD HHMM BestGuess(kWh) Plant_Name
0 20180101 100 23405.0 SANTANA II
1 20180101 200 23302.0 SANTANA II
2 20180101 300 22885.0 SANTANA II
3 20180101 400 21315.0 SANTANA II
4 20180101 500 19476.0 SANTANA II
And, the last rows look like this -
dfn.tail()
Out[190]:
YYMMDD HHMM BestGuess(kWh) Plant_Name
26299 20201231 2000 9933.38 SANTANA II
26300 20201231 2100 19419.90 SANTANA II
26301 20201231 2200 22468.79 SANTANA II
26302 20201231 2300 22244.09 SANTANA II
26303 20210101 0 18975.02 SANTANA II
The list values of 'li' looks like this:
li
Out[192]:
['CAETITE I',
'CAETITE II',
'CALANGO I',
'CALANGO II',
'LAGOA I',
'LAGOA II',
'SANTANA I',
'ARIZONA I',
'MEL II',
'CAETITE III',
'CALANGO III',
'CALANGO IV',
'CALANGO V',
'CALANGO VI',
'CANOAS',
'RIO DO FOGO',
'SANTANA II']
My code looks like this to create the appended list and the dataframe 'dfn':
import pandas as pd
import glob
from sys import exit
import re
pd.set_option('display.max_columns', None)
path = r'\\porfiler03\\gtdshare\\VORTEX\\OBS\\ALL\\' # use your path
all_files = glob.glob(path + "/*BRASIL*.csv")
li = []
temp = []
count = 0
for filename in all_files:
count = count + 1
df = pd.read_csv(filename, index_col=None, header=0)
temp = re.sub(r'.*\.(.*)\.Power.*', r'\1', filename)
li.append(temp) #a list of sites in order of reading files from path
dfn = df.assign(Plant_Name=li[count-1])
exit()
IIUC:
import pandas as pd
import glob
from sys import exit
import re
pd.set_option('display.max_columns', None)
path = r'\\porfiler03\\gtdshare\\VORTEX\\OBS\\ALL\\' # use your path
all_files = glob.glob(path + "/*BRASIL*.csv")
dfn = pd.concat({
re.sub(r'.*\.(.*)\.Power.*', r'\1', filename): pd.read_csv(filename, index_col=None, header=0)
for filename in all_files
}, names=['Plant_Name']).reset_index('Plant_Name')
exit()