I am trying to create a spare matrix assigning 0 and 1 to different roles in order to match two different dataframes. However, I am struggling with the format of the data:
data = {'Location': {0: 'Madrid',
1: 'Barcelona',
2: 'Paris',
3: 'London ',
4: 'New York',
5: 'Berlin',
6: 'Birminham',
7: 'Tanzania'},
'Description': {0: 'M3',
1: 'P5',
2: 'M3P5',
3: 'M3',
4: 'M3P5T8',
5: 'P5T8',
6: '',
7: 'FT7 M3'},
'Branch_A': {0: 'Auditor or Auditor(S), Accountant or Accountant(S), PayRoll_Manager, 2 Brand_Manager, 3 IT_Support, Business_analyst, Developer, Cyber_security',
1: 'Accountant or Accountant(S), PayRoll_Manager, Brand_Manager, 2 Developer, Cyber_security',
2: 'Auditor or Auditor(S), 2 Accountant, Business_analyst, Developer, Cyber_security',
3: "Auditor or Auditor(S), Accountant, PayRoll_Manager, 3 Brand_Manager, 2 IT_Support, Business_analyst, Developer, Cyber_security'",
4: 'Auditor or Auditor(S), Accountant or Accountant(S), PayRoll_Manager, Brand_Manager, IT_Support, Business_analyst, Developer, Cyber_security',
5: 'Auditor or Auditor(S), 2 PayRoll_Manager, Brand_Manager, 2 Business_analyst, Developer, Cyber_security',
6: '----',
7: 'Auditor or Auditor(S), IT_Support, Business_analyst, Developer, Cyber_security'},
'Branch_B': {0: 'Accountant or Accountant(S), PayRoll_Manager, Brand_Manager',
1: 'Accountant or Accountant(S), PayRoll_Manager, Brand_Manager',
2: 'Accountant or Accountant(S), PayRoll_Manager, Brand_Manager',
3: 'Accountant or Accountant(S), PayRoll_Manager, Brand_Manager',
4: '',
5: 'Accountant or Accountant(S), PayRoll_Manager, Brand_Manager, Developer',
6: '',
7: ''},
'Branch_C': {0: 'IT_Support, Business_analyst, Developer, Cyber_security',
1: 'IT_Support, Business_analyst, Developer, Cyber_security',
2: 'IT_Support, Business_analyst, Developer, Cyber_security',
3: '',
4: 'IT_Support, Business_analyst, Developer, Cyber_security',
5: 'IT_Support, Business_analyst, Developer, Cyber_security',
6: '----',
7: ''}}
I managed to create a list of dictionaries so I can split every single role:
def extract_data(row):
positions = row['Branch_A'].split(',')
result = []
for pos in positions:
result.append({pos})
return result
df['Branch_A'] = df.apply(extract_data, axis=1)
Obtaining:
df['Branch_A']
0 [{Auditor or Auditor(S)}, { Accountant or Acco...
1 [{Accountant or Accountant(S)}, { PayRoll_Mana...
2 [{Auditor or Auditor(S)}, { 2 Accountant}, { B...
3 [{Auditor or Auditor(S)}, { Accountant}, { Pay...
4 [{Auditor or Auditor(S)}, { Accountant or Acco...
5 [{Auditor or Auditor(S)}, { 2 PayRoll_Manager}...
6 [{----}]
7 [{Auditor or Auditor(S)}, { IT_Support}, { Bus...
What I am trying to do is, where I have a number in front of the role, repeat that role as many time as the number says. The idea is to fill those roles with workers that I have in another dataframe, however I do not think that I can find a code that would understand that I want 2 PayRoll Managers for example. Also, in the workers dataframe I have some workers that are Auditor and others who are Auditor(S). Is there any way where I can write a code that will use an "or" operator?
Thank youuu
What you expect it's not really clear but you can use the following code as starting point to extract number of roles per location
pat = r'(?P<Number>\d*)?\s*(?P<Role>[^,]+),?\s*'
out = df2.melt(['Location', 'Description'], var_name='Branch', value_name='Positions')
out = out.join(out.pop('Positions').str.extractall(pat).fillna(1).droplevel(1))
Output:
>>> out
Location Description Branch Number Role
0 Madrid M3 Branch_A 1 Auditor or Auditor(S)
0 Madrid M3 Branch_A 1 Accountant or Accountant(S)
0 Madrid M3 Branch_A 1 PayRoll_Manager
0 Madrid M3 Branch_A 2 Brand_Manager
0 Madrid M3 Branch_A 3 IT_Support
.. ... ... ... ... ...
21 Berlin P5T8 Branch_C 1 Business_analyst
21 Berlin P5T8 Branch_C 1 Developer
21 Berlin P5T8 Branch_C 1 Cyber_security
22 Birminham Branch_C 1 ----
23 Tanzania FT7 M3 Branch_C NaN NaN
[88 rows x 5 columns]