Search code examples
pandaslistloopsassign

Pandas Assign New List Value to Rows in For Loop


I am need to assign a new list value in a column each time through a for loop. My problem is that I'm getting the last list value assigned to all the rows. In other words, the last list item appears in every row of the column "Plant_Name". The first 5 rows below should display in 'Plant_Name' as Caetite I as you can see from the 1st list item below.

My dataframe "dfn" looks like this -

dfn.head()
Out[191]: 
      YYMMDD  HHMM  BestGuess(kWh)  Plant_Name
0  20180101   100         23405.0  SANTANA II
1  20180101   200         23302.0  SANTANA II
2  20180101   300         22885.0  SANTANA II
3  20180101   400         21315.0  SANTANA II
4  20180101   500         19476.0  SANTANA II

And, the last rows look like this -

dfn.tail()
Out[190]: 
         YYMMDD  HHMM  BestGuess(kWh)  Plant_Name
26299  20201231  2000         9933.38  SANTANA II
26300  20201231  2100        19419.90  SANTANA II
26301  20201231  2200        22468.79  SANTANA II
26302  20201231  2300        22244.09  SANTANA II
26303  20210101     0        18975.02  SANTANA II

The list values of 'li' looks like this:

li
Out[192]: 
['CAETITE I',
'CAETITE II',
'CALANGO I',
'CALANGO II',
'LAGOA I',
'LAGOA II',
'SANTANA I',
'ARIZONA I',
'MEL II',
'CAETITE III',
'CALANGO III',
'CALANGO IV',
'CALANGO V',
'CALANGO VI',
'CANOAS',
'RIO DO FOGO',
'SANTANA II']

My code looks like this to create the appended list and the dataframe 'dfn':

import pandas as pd
import glob
from sys import exit
import re

pd.set_option('display.max_columns', None)

path = r'\\porfiler03\\gtdshare\\VORTEX\\OBS\\ALL\\' # use your path

all_files = glob.glob(path + "/*BRASIL*.csv")

li = []
temp = []

count = 0
for filename in all_files:
    count = count + 1
    df = pd.read_csv(filename, index_col=None, header=0) 
    temp = re.sub(r'.*\.(.*)\.Power.*', r'\1', filename)
    li.append(temp) #a list of sites in order of reading files from path
    dfn = df.assign(Plant_Name=li[count-1])

exit()

Solution

  • IIUC:

    import pandas as pd
    import glob
    from sys import exit
    import re
    
    pd.set_option('display.max_columns', None)
    
    path = r'\\porfiler03\\gtdshare\\VORTEX\\OBS\\ALL\\' # use your path
    
    all_files = glob.glob(path + "/*BRASIL*.csv")
    
    dfn = pd.concat({
        re.sub(r'.*\.(.*)\.Power.*', r'\1', filename): pd.read_csv(filename, index_col=None, header=0)
        for filename in all_files
    }, names=['Plant_Name']).reset_index('Plant_Name')
    
    
    exit()