python dataframe web-scraping beautifulsoup export-to-csv

After writing data to a CSV file from a list, some column cells are empty

I have code that scrapes the Rotten Tomatoes website for the top 100 movies. After parsing, the data is put into a list. Here is the code:

# create and write headers to a list 
rows = []
rows.append(['Rank', 'Rating', 'Title', 'No. of Reviews'])
print(rows)

# loop over results
for result in results:
    # find all columns per result
    data = result.find_all('td')
    # check that columns have data 
    if len(data) == 0: 
        continue
        
    # write columns to variables
    rank = data[0].getText()
    rating = data[1].getText()
    title = data[2].getText()
    reviews = data[3].getText()
    
    # write each result to rows
    rows.append([rank, rating, title, reviews])
    
print(rows)

And the output looks like this:

[['Rank', 'Rating', 'Title', 'No. of Reviews'], ['1.', '\n\n\n\xa096%\n\n', '\n\n            Black Panther (2018)\n', '503'], ['2.', '\n\n\n\xa094%\n\n', '\n\n            Avengers: Endgame (2019)\n', '514'], ['3.', '\n\n\n\xa093%\n\n', '\n\n            Us (2019)\n', '520'], ['4.', '\n\n\n\xa097%\n\n', '\n\n            Toy Story 4 (2019)\n', '433'], ['5.', '\n\n\n\xa098%\n\n', '\n\n           The Wizard of Oz (1939)\n', '117'], ['6.', '\n\n\n\xa099%\n\n', '\n\n  Lady Bird (2017)\n', '388']...

Then I wrote the data to a csv file.

# Create csv and write rows to output file
with open('rottentomato.csv','w', newline='') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerows(rows)

But only the 'Rank' and 'No. of Reviews' columns have data. The 'Rating' and 'Title' columns are empty.

Solution

I tried to reproduce your problem, and the only issue I found was that the special characters around the values (the newlines and the non-breaking space \xa0) were creating empty space in those cells. You can clean them up with strip():

import csv
rows = [['Rank', 'Rating', 'Title', 'No. of Reviews'], ['1.', '\n\n\n\xa096%\n\n', '\n\nBlack Panther (2018)\n', '503'], ['2.', '\n\n\n\xa094%\n\n', '\n\nAvengers: Endgame (2019)\n', '514'], ['3.', '\n\n\n\xa093%\n\n', '\n\nUs (2019)\n', '520'], ['4.', '\n\n\n\xa097%\n\n', '\n\nToy Story 4 (2019)\n', '433'], ['5.', '\n\n\n\xa098%\n\n', '\n\nThe Wizard of Oz (1939)\n', '117'], ['6.', '\n\n\n\xa099%\n\n', '\n\nLady Bird (2017)\n', '388']]
# strip leading/trailing whitespace (newlines, spaces, \xa0) from every cell in place
for i, row in enumerate(rows):
    for j, data in enumerate(row):
        rows[i][j] = data.strip()

with open('rottentomato.csv','w', newline='') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerows(rows)

This was the output I got:

Rank,Rating,Title,No. of Reviews
1.,96%,Black Panther (2018),503
2.,94%,Avengers: Endgame (2019),514
3.,93%,Us (2019),520
4.,97%,Toy Story 4 (2019),433
5.,98%,The Wizard of Oz (1939),117
6.,99%,Lady Bird (2017),388
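
If you want to see why the cells looked empty in the first place, it can help to look at the raw CSV text instead of a spreadsheet view. The following is only a minimal sketch of that idea, not part of the original code: it writes one unstripped row and one stripped row to an in-memory buffer (io.StringIO stands in for the file here), and the helper names to_csv_text and clean_row are hypothetical. The unstripped values are still in the output, but the csv module quotes them and keeps the embedded newlines, so the visible text likely ends up a few lines down inside the cell.

```python
import csv
import io


def to_csv_text(rows):
    """Serialize rows to CSV text in memory (a stand-in for writing a file)."""
    buffer = io.StringIO(newline='')
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


def clean_row(row):
    """Strip leading/trailing whitespace (newlines, spaces, non-breaking spaces) from each cell."""
    return [cell.strip() for cell in row]


# One row exactly as it came out of the scraper in the question
raw_row = ['1.', '\n\n\n\xa096%\n\n', '\n\n            Black Panther (2018)\n', '503']

raw_text = to_csv_text([raw_row])
clean_text = to_csv_text([clean_row(raw_row)])

print(repr(raw_text))    # quoted fields that span several lines
print(repr(clean_text))  # a single line for the record
```

If you control the scraping step, another option is to strip at extraction time: BeautifulSoup's get_text() accepts strip=True, so data[1].get_text(strip=True) should give you the cleaned value directly and the extra loop is not needed.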