Using Python 3.2 I was hoping to solve the below issue. My data consist of hundreds of rows (signifying a project) and 21 columns. The first of which is a unique project ID and the other 20 columns is the group of people, or person, that led the project. person_1 is always filled and if there is a name in person_3 that means 3 people are working together. If there is a name in person_18 that means 18 people are working together.
I have an excel spreadsheet that is setup the following way:
unique ID person_1 person _2 person_3 person_4 ... person_20
12 Tom Sally Mike
16 Joe Mike
5 Joe Sally
1 Sally Mike Tom
6 Sally Tom Mike
2 Jared Joe Mike John ... Carl
I want to do a few things:
1) Make a column that will give me a unique 'Group Name' which will be, using unique ID 1 as my example, Sally/Mike/Tom. So it will be the names separated by '/'.
2) How can I treat, from my example, Sally/Mike/Tom the same as Sally/Tom/Mike. Meaning, I would like another column that makes the group name in alphabetical order (no matter the actual permutation), still separated by '/'.
3) This question is similar to (2). However, I want the person listed in person_1 to matter. Meaning Joe/Tom/Mike is different from Tom/Joe/Mike but not different than Joe/Mike/Tom. So there will be another column that keeps person_1 at the start of the group name but alphabetizes person_2 through person_20 if applicable (i.e., if the project has more than 1 person on it).
Thanks for the help and suggestions
The previous answer gave a clear statement of method, but perhaps you are stuck on either the string processing or the csv processing. Both are demonstrated in the following code. The relevant string methods are sorted
and join
. '/'.join
tells join
to use /
as separator between joined items. The +
operator between lists in tname
and writerow
statements concatenates the lists. A csv.reader
is an iterator that delivers one list per row, and a csv.writer
converts a list to a row and writes it out. You will want to add error testing to the file opens, etc. The data file used to test this code is shown after the code.
import csv
fi = open('xgroup.csv')
fo = open('xgroup3.csv', 'w')
w = csv.writer(fo)
r = csv.reader(fi)
li = 0
print "Opened reader and writer"
for row in r:
gname = '/'.join(row[1:])
sname = '/'.join(sorted(row[1:]))
tname = '/'.join([row[1]]+sorted(row[2:]))
w.writerow([row[0], gname, sname, tname]+row[1:])
li += 1
fi.close()
fo.close()
print "Closed reader and writer after",li,"lines"
File xgroup.csv
is shown next.
unique-ID,person_1,person,_2,person_3,person_4,...,person_20
12,Tom,Sally,Mike
16,Joe,Mike
5,Joe,Sally
1,Sally,Mike,Tom
6,Sally,Tom,Mike
2,Jared,Joe,Mike,John,...,Carl
Upon reading data as above, the program prints Opened reader and writer
and Closed reader and writer after 7 lines
and produces output in file xgroup3.csv
as shown next.
unique-ID,person_1/person/_2/person_3/person_4/.../person_20,.../_2/person/person_1/person_20/person_3/person_4,person_1/.../_2/person/person_20/person_3/person_4,person_1,person,_2,person_3,person_4,...,person_20
12,Tom/Sally/Mike,Mike/Sally/Tom,Tom/Mike/Sally,Tom,Sally,Mike
16,Joe/Mike,Joe/Mike,Joe/Mike,Joe,Mike
5,Joe/Sally,Joe/Sally,Joe/Sally,Joe,Sally
1,Sally/Mike/Tom,Mike/Sally/Tom,Sally/Mike/Tom,Sally,Mike,Tom
6,Sally/Tom/Mike,Mike/Sally/Tom,Sally/Mike/Tom,Sally,Tom,Mike
2,Jared/Joe/Mike/John/.../Carl,.../Carl/Jared/Joe/John/Mike,Jared/.../Carl/Joe/John/Mike,Jared,Joe,Mike,John,...,Carl
Note, given a data line like
5,Joe,Sally,,,,,
instead of
5,Joe,Sally
the program as above produces
5,Joe/Sally/////,/////Joe/Sally,Joe//////Sally,Joe,Sally,,,,,
instead of
5,Joe/Sally,Joe/Sally,Joe/Sally,Joe,Sally
If that's a problem, filter out empty entries. For example, if
row=['5', 'Joe', 'Sally', '', '', '', '', '']
, then
'/'.join(row[1:])
produces
'Joe/Sally/////'
, while
'/'.join(filter(lambda x: x, row[1:]))
and
'/'.join(x for x in row[1:] if x)
and
'/'.join(filter(len, row[1:]))
produce
'Joe/Sally'
.