
How to transform a survey pandas DataFrame into a format usable with BI tools in Python?


I need to convert survey results into something that is usable in a BI tool like Tableau.

The survey is in the format of the following dataframe:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Respondent': ['Sally', 'Tony', 'Fred'],
               'What project did you work on with - Chris?': ['Project A','Project B', np.nan], 
               'What score would you give - Chris': [9,7,np.nan], 
               'Any other feedback for - Chris': ['Random Comment','Okay performance',np.nan],
               'What project did you work on with - Matt?': [np.nan,'Project C', 'Project X'], 
               'What score would you give - Matt': [np.nan,9,8], 
               'Any other feedback for - Matt': [np.nan, 'Great to work with Matt', 'Work was just okay'],
               'What project did you work on with - Luke?': ['Project B','Project D', 'Project Y'], 
               'What score would you give - Luke': [10,8,7], 
               'Any other feedback for - Luke': ['Work was excellent', 'Was a bit technical', 'Another Random Comment'],
              })

I need this to be transformed into a format like below:

expected = pd.DataFrame({'Name': ['Chris','Chris','Matt','Matt','Luke','Luke','Luke'],
               'Assessor': ['Sally','Tony','Tony','Fred','Sally','Tony','Fred'], 
               'Project Name': ['Project A', 'Project B', 'Project C', 'Project X', 'Project B', 'Project D', 'Project Y'], 
               'NPS Score': [9,7,9,8,10,8,7],
               'Feedback': ['Random Comment','Okay performance','Great to work with Matt','Work was just okay','Work was excellent','Was a bit technical','Another Random Comment']
              })

As you can see, the names have to be pulled from the column headers. The real data is much larger, so the code needs to work for any number of people and respondents, not just this example.


Solution

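# Column 0 is the respondent; each following group of three columns
# (project, score, feedback) belongs to one person.
frames = []
for i in range(1, len(df.columns), 3):
    # .copy() avoids SettingWithCopyWarning when adding the "Name" column
    data = df.iloc[:, [0, i, i + 1, i + 2]].copy()
    # The person's name is whatever follows " - " in the feedback column header
    name = data.columns[-1].split(" - ")[-1]
    data.columns = ["Assessor", "Project Name", "NPS Score", "Feedback"]
    data["Name"] = name
    # Skip respondents who left all three answers blank for this person
    frames.append(data.dropna(how="all", subset=["Project Name", "NPS Score", "Feedback"]))

# DataFrame.append was removed in pandas 2.0, so concatenate once at the end
new_data = pd.concat(frames, ignore_index=True)
new_data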
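
The loop above relies on the columns always arriving in groups of three, in the same order. If that can't be guaranteed, an alternative is to unpivot with `melt`, parse each header, and `pivot` back. This is only a sketch: the `QUESTION_TO_COLUMN` mapping and the `unpivot_survey` helper are names made up for illustration, and it assumes every header has the form "question - Name" with an optional trailing "?", as in the question. The small `sample` frame is a cut-down version of the question's data.

```python
import numpy as np
import pandas as pd

# Assumed mapping from the question text (the part before " - ") to the
# output column name; adjust it to match the real survey wording.
QUESTION_TO_COLUMN = {
    "What project did you work on with": "Project Name",
    "What score would you give": "NPS Score",
    "Any other feedback for": "Feedback",
}

def unpivot_survey(df, id_col="Respondent"):
    # One row per (respondent, original column header)
    long = df.melt(id_vars=id_col, var_name="header", value_name="value")
    # "What score would you give - Chris" -> ("What score would you give", "Chris")
    parts = long["header"].str.rstrip("?").str.rsplit(" - ", n=1, expand=True)
    long["question"] = parts[0].map(QUESTION_TO_COLUMN)
    long["Name"] = parts[1]
    # Back to one row per (Name, respondent) with one column per question
    tidy = long.pivot(index=["Name", id_col], columns="question", values="value")
    # Drop pairs where the respondent answered nothing about that person
    tidy = tidy.dropna(how="all").reset_index().rename(columns={id_col: "Assessor"})
    tidy.columns.name = None
    tidy["NPS Score"] = pd.to_numeric(tidy["NPS Score"])
    return tidy[["Name", "Assessor", "Project Name", "NPS Score", "Feedback"]]

sample = pd.DataFrame({
    "Respondent": ["Sally", "Tony"],
    "What project did you work on with - Chris?": ["Project A", "Project B"],
    "What score would you give - Chris": [9, 7],
    "Any other feedback for - Chris": ["Random Comment", "Okay performance"],
    "What project did you work on with - Matt?": [np.nan, "Project C"],
    "What score would you give - Matt": [np.nan, 9],
    "Any other feedback for - Matt": [np.nan, "Great to work with Matt"],
})
result = unpivot_survey(sample)
print(result)
```

Note that `pivot` sorts the rows by name and then by assessor, so the row order will differ from the loop version. Passing a list as `index` to `pivot` needs pandas 1.1 or later.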