Tags: python, for-loop, pandas, dataframe, naivebayes

Preventing pandas data frame header row from repeating in for statement


I am iterating through a pipeline to print the 20 most informative features for a class called safety.

classnum_saf = 3
inds = np.argsort(clf_3.named_steps['clf'].coef_[classnum_saf, :])[-20:]
for i in inds: 
   f = feature_names[i]
   c = clf_3.named_steps['clf'].coef_[classnum_saf, [i]]
   print(f,c)
   output = {'features':f, 'coefficients':c}
   df = pd.DataFrame(output, columns = ['features', 'coefficients'])
   print(df)

I want a single data frame with only one header, but instead I get the output below, where the header repeats on every iteration because a new DataFrame is created and printed for each i.

   1800 [-8.73800344]
   features  coefficients
   0     1800     -8.738003
   hr [-8.73656027]
   features  coefficients
   0       hr      -8.73656
   wa [-8.7336777]
   features  coefficients
   0       wa     -8.733678
   1400 [-8.72197545]
   features  coefficients
   0     1400     -8.721975
   hrwa [-8.71952656]
   features  coefficients
   0     hrwa     -8.719527
   perimeter [-8.71173264]
   features  coefficients
   0  perimeter     -8.711733
   response [-8.67388885]
   features  coefficients
   0  response     -8.673889
   analysis [-8.65460329]
   features  coefficients
   0  analysis     -8.654603
   00 [-8.58386785]
   features  coefficients
   0       00     -8.583868
   raw [-8.56148006]
   features  coefficients
   0      raw      -8.56148
   run [-8.51374794]
   features  coefficients
   0      run     -8.513748
   factor [-8.50725691]
   features  coefficients
   0   factor     -8.507257
   200 [-8.50334896]
   features  coefficients
   0      200     -8.503349
   file [-8.39990841]
   features  coefficients
   0     file     -8.399908
   pb [-8.38173753]
   features  coefficients
   0       pb     -8.381738
   mar [-8.21304343]
   features  coefficients
   0      mar     -8.213043
   1998 [-8.21239836]
   features  coefficients
   0     1998     -8.212398
   signal [-8.02426499]
   features  coefficients
   0   signal     -8.024265
   area [-8.01782987]
   features  coefficients
   0     area      -8.01783
   98 [-7.3166918]
   features  coefficients
   0       98     -7.316692
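The header repeats because the loop builds (and prints) a brand-new one-row DataFrame on every pass; nothing accumulates between iterations, which is also why every row is labelled 0. A minimal sketch of that pattern, with three hard-coded names and 1-element coefficient lists standing in (hypothetically) for feature_names[i] and coef_[classnum_saf, [i]]:

```python
import pandas as pd

# hypothetical stand-ins for a few feature_names[i] and coef_[classnum_saf, [i]] values
names = ['1800', 'hr', 'wa']
values = [[-8.738003], [-8.73656], [-8.733678]]

frames = []
for f, c in zip(names, values):
    # same pattern as the question: a fresh one-row DataFrame on every iteration
    df = pd.DataFrame({'features': f, 'coefficients': c}, columns=['features', 'coefficients'])
    frames.append(df)

for df in frames:
    print(df)  # each print shows its own header and a single row labelled 0
```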

How do I return a data frame like:

          features     coefficients
   0      1800          -8.738003
   ..     ...           ...
   18     area          -8.01783
   19     98            -7.316692

Right now, when I print(f, c), it shows the following top values:

   1800 [-8.73800344]
   hr [-8.73656027]
   wa [-8.7336777]
   1400 [-8.72197545]
   hrwa [-8.71952656]
   perimeter [-8.71173264]
   response [-8.67388885]
   analysis [-8.65460329]
   00 [-8.58386785]
   raw [-8.56148006]
   run [-8.51374794]
   factor [-8.50725691]
   200 [-8.50334896]
   file [-8.39990841]
   pb [-8.38173753]
   mar [-8.21304343]
   1998 [-8.21239836]
   signal [-8.02426499]
   area [-8.01782987]
   98 [-7.3166918]

I researched a few similar questions here, here, and here, but they don't seem to directly address my question.

Thank you in advance, still learning here.


Solution

  • Append one list (one row) to L at each step of the loop, then create the DataFrame from L once, after the loop. I simulated some data to test it (sample below):

    import numpy as np
    import pandas as pd

    L = []
    classnum_saf = 3
    inds = np.argsort(clf_3.named_steps['clf'].coef_[classnum_saf, :])[-20:]
    for i in inds:
        f = feature_names[i]
        c = clf_3.named_steps['clf'].coef_[classnum_saf, [i]]
        print(f, c)
        # c is a 1-element array because of [i], so [0] extracts the number
        L.append([f, c[0]])

    # build the DataFrame once, after the loop, so the header is printed once
    df = pd.DataFrame(L, columns=['features', 'coefficients'])
    print(df)
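Since clf_3 and feature_names only exist in the asker's session, here is a self-contained sketch of the same loop. The random coefficient matrix and the generated feature_0, feature_1, ... names are hypothetical stand-ins for clf_3.named_steps['clf'].coef_ and the vectorizer's feature names; only the collect-rows-then-build pattern is the point:

```python
import numpy as np
import pandas as pd

# hypothetical stand-ins: 5 classes x 100 features, with generated feature names
rng = np.random.default_rng(0)
coef = rng.normal(loc=-8.5, scale=0.5, size=(5, 100))
feature_names = [f'feature_{j}' for j in range(100)]

classnum_saf = 3
# indices of the 20 largest coefficients for this class, in ascending order
inds = np.argsort(coef[classnum_saf, :])[-20:]

L = []
for i in inds:
    # one [feature, coefficient] row per iteration; no DataFrame is built inside the loop
    L.append([feature_names[i], coef[classnum_saf, i]])

df = pd.DataFrame(L, columns=['features', 'coefficients'])
print(df)  # a single header, rows labelled 0..19
```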
    

    Sample:

    import pandas as pd

    # simulated data: f holds feature names, c holds 1-element lists of coefficients
    f = ['a', 'b', 'c']
    c = [[1], [2], [3]]

    L = []
    for i in range(3):
        # [0] unwraps each 1-element list so every row is [name, value]
        L.append([f[i], c[i][0]])

    print(L)
    [['a', 1], ['b', 2], ['c', 3]]

    df = pd.DataFrame(L, columns=['features', 'coefficients'])
    print(df)
    
      features  coefficients
    0        a             1
    1        b             2
    2        c             3
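If the loop is only there to collect rows, it can be dropped entirely: NumPy fancy indexing pulls all the selected names and coefficients at once. A minimal sketch, with a small hypothetical coefficient matrix and feature-name list standing in for the classifier's coef_ and feature_names (np.asarray is used because feature_names may be a plain list, which cannot be indexed with an array):

```python
import numpy as np
import pandas as pd

# hypothetical stand-ins for clf_3.named_steps['clf'].coef_ and feature_names
coef = np.array([
    [-8.1, -7.9, -8.6, -8.3],
    [-8.7, -7.3, -8.0, -8.5],
])
feature_names = ['area', '98', 'signal', 'file']

classnum = 1
top_n = 3
row = coef[classnum, :]
inds = np.argsort(row)[-top_n:]  # indices of the top_n largest values, ascending

# index both arrays with inds at once instead of looping
df = pd.DataFrame({'features': np.asarray(feature_names)[inds], 'coefficients': row[inds]})
print(df)
```

With the real model, classnum would be classnum_saf and top_n would be 20; the resulting frame has the same layout as the one built in the loop.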