
Python: building variable names from a list with string formatting


Hi everyone, I'm trying to define a set of variables whose names are built from the entries of a list.

The setup is:

features=['Gender','Age','Rank'] + other11columns #selected columns of my data

    In [1]:data['Gender'].unique()
    Out[1]: array([0, 1], dtype=int64)

    In [2]:data['Age'].unique()
    Out[2]: array([10, 20, 30, 40, 50], dtype=int64)

    In [3]:data['Rank'].unique()
    Out[3]: array([0, 1, 2, 3, 4, 5, 6], dtype=int64)

    .....

First, I want to set up an empty DataFrame for each feature, indexed by that feature's unique values. I want something like this:

report_Gender
Out[3]: 
  Prediction Actual
0        NaN    NaN
1        NaN    NaN

report_Age
Out[5]: 
  Prediction Actual
10        NaN    NaN
20        NaN    NaN
30        NaN    NaN
40        NaN    NaN
50        NaN    NaN

report_Rank
Out[6]: 
  Prediction Actual
0        NaN    NaN
1        NaN    NaN
2        NaN    NaN
3        NaN    NaN
4        NaN    NaN
5        NaN    NaN
6        NaN    NaN

....... 

The following code doesn't work, but it indicates what I want to do:

for i in range(len(features)-1):
    report_features[i]=pd.DataFrame(index=data[feature[i]].unique(),columns=['Prediction','Actual'])

I tried string formatting with the %s operator, but couldn't figure out how to use the formatted string as a variable name. Any help is appreciated :)


Solution

  • Dynamically creating global variables can get hairy. It is much easier to keep them in a smaller scope, i.e., inside some object such as a dictionary. You can achieve what you want like this:

    my_dictionary = dict()
    for f in features:
        my_dictionary['report_{}'.format(f)] = pd.DataFrame(index=data[f].unique(),columns=['Prediction','Actual'])
    

    You can then access each DataFrame by its key, for example my_dictionary['report_Gender'].

    Another way would be to create a class:

    class Reports:
        pass
    
    for f in features:
        setattr(Reports, 'report_{}'.format(f), pd.DataFrame(index=data[f].unique(),columns=['Prediction','Actual']))
    

    Then access them as Reports.report_Gender, etc.
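
To try the dictionary approach end to end, here is a minimal, self-contained sketch. The toy `data` frame and its values are invented stand-ins for the asker's real data, and the names `reports` and `ns` are illustrative only. The last step uses the standard library's `types.SimpleNamespace` as one possible lighter-weight variation on the class idea; it is not part of the original answer.

```python
import pandas as pd
from types import SimpleNamespace

# Toy stand-in for the asker's `data`; these values are made up for illustration.
data = pd.DataFrame({
    'Gender': [0, 1, 1, 0],
    'Age': [10, 20, 10, 30],
    'Rank': [0, 1, 2, 1],
})
features = ['Gender', 'Age', 'Rank']

# Build one empty (all-NaN) report per feature, keyed by a formatted name.
reports = {
    'report_{}'.format(f): pd.DataFrame(
        index=data[f].unique(), columns=['Prediction', 'Actual']
    )
    for f in features
}

print(reports['report_Age'])

# Optional: expose the same frames as attributes instead of dictionary keys.
ns = SimpleNamespace(**reports)
print(ns.report_Gender)
```

Keeping the frames in one container also makes it easy to loop over all reports later (for example with `reports.items()`), which separate global variables would not allow without further name lookups.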