How can I read and group this CSV data?

The CSV looks like this ('|' marks the boundary between columns):

2014-09-01 | I love chicken

2014-09-01 | I eat chicken

2014-09-02 | She loves chicken

2014-09-02 | Ha ha ha I love chicken

2014-09-03 | Blah Blah Blah

I want to transform the data so that it looks like this:

2014-09-01 | 'i', 2 | 'love', 1 | 'chicken', 2 | 'eat', 1 |

2014-09-02 | 'she', 1 | 'love', 2 | 'chicken', 2 | 'ha', 3 | 'I', 1 |

2014-09-03 | 'blah', 3 |

DATE | WORD, WORDCOUNTS | WORD2, WORDCOUNTS2 | ...

What approach should I use here? Ultimately I want to plot a graph with the date on the x-axis and the word counts (frequency) on the y-axis.

Below is my best attempt so far.

import csv
import MeCab

TestStartDate = "2013-11-11"
TestEndDate = "2014-06-10"

# create the tagger once instead of once per row
tagger = MeCab.Tagger()

with open('Simplified.csv') as f:
    reader = csv.reader(f)
    for row in reader:
        # row[0] is the date column; compare only its first 10 characters (YYYY-MM-DD)
        if len(row) >= 2 and row[0][:10] == TestStartDate:
            # row[1] is the second column (the text)
            rose = tagger.parse(row[1])
            # print(rose)
            wordCount = {}
            # assuming MeCab's default output format: keep every other token, drop the trailing EOS
            wordList = rose.split()[:-1:2]
            for word in wordList:
                wordCount.setdefault(word, 0)
                wordCount[word] += 1
            for word, count in wordCount.items():
                print('"%s, %i"' % (word, count))

I plan to add word and count into Data.

Solution

This works for me. Do you really need the trailing '|', though? If you later split each row on '|' again (for example before passing the data to matplotlib), you will get an empty string '' as the last element of the result.
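A small illustration of that point, using one row in the output format from the question (the two variable names are only for this example):

```python
# One output row with and without the trailing '|'.
row_with_trailing = "2014-09-03 | 'blah',3|"
row_without_trailing = "2014-09-03 | 'blah',3"

# Splitting on '|' leaves an empty string after the trailing delimiter.
print(row_with_trailing.split('|'))
print(row_without_trailing.split('|'))
```

The first split has one more element than the second, and that extra element is the empty string.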

The function d further down does not append a '|' to each row of the result. If you think it is necessary, change the return statement of d to add one, like this:

return '%s| %s|'%(tokens[0],'|'.join(["'%s',%s"%(word,words.count(word)) for word in set(words)]))

===========

def d(s):
    # s has the form "DATE | text"; count each lowercased word in the text
    tokens = s.split('|')
    words = tokens[-1].strip().lower().split()
    return '%s| %s' % (tokens[0], '|'.join(["'%s',%s" % (word, words.count(word)) for word in set(words)]))

def wordcount():
    lines = [
        '2014-09-01 | I love chicken',
        '2014-09-01 | I eat chicken',
        '2014-09-02 | She loves chicken',
        '2014-09-02 | Ha ha ha I love chicken',
        '2014-09-03 | Blah Blah Blah'
    ]
    # concatenate the text of all lines that share the same date
    rows = {}
    for line in lines:
        t_line = line.split(' | ')
        if t_line[0] not in rows:
            rows[t_line[0]] = ''
        rows[t_line[0]] += (' ' + t_line[-1])
    # build one formatted output row per date
    newrows = []
    for k, v in rows.items():
        newrows.append(d('%s | %s' % (k, v)))
    print('\n'.join(newrows))

wordcount()


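Output (the order of rows and words may differ between runs and Python versions, since the code iterates over a dict and a set):

>>2014-09-02 | 'love',1|'i',1|'she',1|'loves',1|'chicken',2|'ha',3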
>>2014-09-03 | 'blah',3
>>2014-09-01 | 'i',2|'chicken',2|'love',1|'eat',1
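
Since the end goal is a plot, it may be easier to keep the counts in a dictionary rather than in a formatted string. Below is a minimal sketch of that idea, assuming Python 3 and a file that really uses '|' as its delimiter. The names count_words_by_date and series_for are made up for this example, and io.StringIO only stands in for the real file.

```python
import csv
import io
from collections import Counter, defaultdict

# Sample data from the question, in the same "DATE | text" layout.
SAMPLE = """\
2014-09-01 | I love chicken
2014-09-01 | I eat chicken
2014-09-02 | She loves chicken
2014-09-02 | Ha ha ha I love chicken
2014-09-03 | Blah Blah Blah
"""

def count_words_by_date(f):
    """Return {date: Counter of lowercased words} for a '|'-delimited file."""
    counts = defaultdict(Counter)
    for row in csv.reader(f, delimiter='|'):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        date, text = row[0].strip(), row[1]
        # Plain whitespace tokenization; no stemming is applied.
        counts[date].update(text.lower().split())
    return counts

def series_for(counts, word):
    """Return (dates, frequencies) for one word, with dates sorted."""
    dates = sorted(counts)
    return dates, [counts[d][word] for d in dates]

counts = count_words_by_date(io.StringIO(SAMPLE))
dates, freqs = series_for(counts, 'chicken')
print(dates)
print(freqs)
```

The two lists can then be handed to a plotting library, for instance plt.plot(dates, freqs) if matplotlib is installed. Like the function d above, this does not stem words, so 'love' and 'loves' are counted separately. For Japanese text, the text.lower().split() step would have to be replaced by a tokenizer such as MeCab.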