python, csv, twitter

Python: how to find tweets containing specific words in a CSV file and write them to a new CSV file


I have Twitter data in a CSV file (which I collected with a Python API), around 1000 lines in total. Now I want to filter the tweets for the specific Indonesian words “macet” or “kecelakaan” (in English, “traffic” or “accident”) and put the matching rows into a new, separate CSV file, just like using Find All in Excel.

The sample Twitter data is in example1.csv, and the new file to be created after searching for the word "macet" or "kecelakaan" is example2.csv. But my code produces no result.

import re
import csv

with open('example1.csv', 'r') as csvFile:
    reader = csv.reader(csvFile)

if re.search(r'macet', reader):
    for row in reader:
        myData = list(row)
        print(row)

newFile = open('example2.csv', 'w')
with newFile:
    writer = csv.writer(newFile)
    writer.writerows(myData)

print("Writing complete")

I use Spyder as my environment, with Python 3.6.

The CSV file is already in the same folder as Spyder. Here is a screen capture of my Twitter CSV data:

[Screenshot: myCSVtwitterData]

Update: sample of the CSV file. OS: Windows.


Solution

  • There are a couple of problems with your code.

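    Before the reading loop you pass a csv.reader object to re.search, which raises a TypeError because re.search can only search text (str) or bytes objects. You need to pass it the text of an individual field instead. Also, the for loop runs after the with block has ended, so the file is already closed by the time the reader tries to read from it.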
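    To make the first point concrete, here is a minimal sketch. The sample rows and column names are invented for illustration (they are not taken from your file), and io.StringIO stands in for the real CSV file so the snippet runs on its own.

```python
import csv
import io
import re

# Hypothetical sample data standing in for example1.csv (columns are invented).
sample = io.StringIO(
    "id,user,text\n"
    "1,andi,Jalan Sudirman macet total\n"
    "2,budi,Cuaca cerah hari ini\n"
)
reader = csv.reader(sample)

# Passing the reader object itself fails: re.search expects str or bytes.
try:
    re.search(r'macet', reader)
    reader_error = None
except TypeError as exc:
    reader_error = exc

# Searching one field (a str) of each row works.
matches = [row for row in reader if re.search(r'macet', row[2])]
print(matches)
```

    The failed re.search call raises before it reads anything, so the reader still yields every row afterwards.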

    The line

    myData = list(row)
    

    converts row into a new list and saves it to myData, but row is already a list, so no conversion is necessary. That line also replaces the previous contents of myData on every iteration, whereas you actually want to keep all the matching rows. However, there's no need to save the rows at all; you can just write them to the new file as you go.
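    As a rough illustration of why overwriting myData goes wrong, the sketch below uses two made-up rows and writes to in-memory buffers instead of real files. It also shows a side effect of handing a single row to writerows: each string in the row is treated as a row of its own and gets split into characters.

```python
import csv
import io

rows = [["1", "macet"], ["2", "lancar"]]  # hypothetical rows, for illustration only

# Overwriting: after the loop, only the last row is left.
for row in rows:
    my_data = list(row)

out_wrong = io.StringIO()
# writerows expects a sequence of rows, so each string is iterated as a row.
csv.writer(out_wrong).writerows(my_data)

# Accumulating: append each row to a list, then write the list of rows.
kept = []
for row in rows:
    kept.append(row)

out_right = io.StringIO()
csv.writer(out_right).writerows(kept)

print(repr(out_wrong.getvalue()))
print(repr(out_right.getvalue()))
```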

    Anyway, here's a repaired version of your code. From the screenshot it looks like you only want to search the text at column index 2 of the input data (counting from 0, which corresponds to column C in your spreadsheet). I've created a regex that searches for the whole words "macet" and "kecelakaan". The "\b" matches at word boundaries, so we don't get a match if "macet" or "kecelakaan" is part of a larger word.

    import re
    import csv
    
    # Make a case-insensitive regex to match the words "macet" or "kecelakaan"
    pattern = re.compile(r'\bmacet\b|\bkecelakaan\b', re.I)
    
    with open('example1.csv', 'r', newline='') as csvFile, open('example2.csv', 'w', newline='') as newFile:
        reader = csv.reader(csvFile)
        writer = csv.writer(newFile)
    
        for row in reader:
            # Skip empty rows
            if not row:
                continue
            if pattern.search(row[2]):
                print(row)
                writer.writerow(row)
    
    print("Writing complete")
    

    I've just made a couple of improvements to that code. It now passes the newline='' argument to open() for both CSV files, as the csv module documentation recommends, and it skips any empty rows in the input CSV. The regex now also ignores case when looking for matching words.
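    If you want to check how the word boundaries and the case-insensitive flag behave, a small test like the following may help. The sample sentences are invented for illustration; only the pattern is the one used above.

```python
import re

# Same pattern as in the repaired code above.
pattern = re.compile(r'\bmacet\b|\bkecelakaan\b', re.I)

# Hypothetical tweet texts, for illustration only.
samples = [
    "Jalan Sudirman macet total",   # whole word
    "MACET parah di tol",           # upper case, matched thanks to re.I
    "Ada kecelakaan, hati-hati!",   # word followed by punctuation
    "Kemacetan di mana-mana",       # "macet" inside a longer word
    "Cuaca cerah",                  # no keyword at all
]

results = [bool(pattern.search(s)) for s in samples]
for text, hit in zip(samples, results):
    print(hit, text)
```

    The fourth sample is not matched because "macet" sits inside "Kemacetan", so there is no word boundary on either side of it. If you also want to catch such derived forms, you would need to drop the "\b" anchors or list those words explicitly.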