python csv pandas strip

Remove trailing whitespace from CSV


I am having trouble writing the stripped stop_headsign values back to stop_times before exporting to CSV. Alternatively, is there a way to apply .rstrip() to the entire stop_headsign column?

Here is the stop_times.txt Gist.

Here is the pandas rstrip reference.

Below is my code:

import pandas as pd

stop_times = pd.read_csv('csv/stop_times.txt')

for x in stop_times['stop_headsign']:
    if type(x) == str:
        x = x.rstrip()
        # figure out how to store the new value
    if type(x) == float:
        pass

stop_times['distance'] = 0

stop_times.to_csv('csv/stop_times.csv', index=False)

Below is what the CSV output shows:

trip_id,arrival_time,departure_time,stop_id,stop_sequence,pickup_type,drop_off_type,stop_headsign,distance
568036,,,00382,26,0,0,78 UO                                             ,0
568036,,,00396,7,0,0,78 UO <> 78 via 18th AVE                          ,0
568036,,,00398,8,0,0,78 UO <> 78 via 18th AVE                          ,0
568036,,,00400,9,0,0,78 UO <> 78 via 18th AVE                          ,0
568036,,,00404,10,0,0,78 UO <> 78 via 18th AVE                          ,0
568036,,,00407,11,0,0,78 UO <> 78 via 18th AVE                          ,0
568036,,,00412,13,0,0,78 UO <> 78 via 18th AVE                          ,0
568036,,,00413,14,0,0,78 UO <> 78 via 18th AVE                          ,0
568036,,,00416,15,0,0,78 UO <> 78 via 18th AVE                          ,0
568036,,,00418,16,0,0,78 UO <> 78 via 18th AVE                          ,0
568036,,,00419,17,0,0,78 UO <> 78 via 18th AVE                          ,0
568036,,,00422,18,0,0,78 UO <> 78 via 18th AVE                          ,0
568036,,,00423,19,0,0,78 UO <> 78 via 18th AVE                          ,0
568036,,,00425,20,0,0,78 UO <> 78 via 18th AVE                          ,0
568036,,,00427,21,0,0,78 UO <> 78 via 18th AVE                          ,0
568036,,,01006,2,0,0,78 UO <> 78 via 18th AVE                          ,0

Solution

  • pandas provides the .str accessor on Series objects for exactly this:

    stop_times["stop_headsign"] = stop_times["stop_headsign"].str.rstrip()
    

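    In fact, your link already points to this method: .str is an accessor of type StringMethods, and rstrip is one of its methods. Missing values (NaN) in the column are left as they are, so no type check is needed.

    The basics documentation has a section, Vectorized String Methods, that links to Working with Text Data for the full list of these methods.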
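
To see why the loop in the question has no effect while the vectorized call does, here is a minimal sketch. The inline sample data is hypothetical (modeled on the output shown in the question, with one missing headsign added), and reading stop_id as a string is an assumption made here only to keep the leading zeros:

```python
import io

import pandas as pd

# Hypothetical sample modeled on the question's output:
# two padded headsigns and one missing value.
raw = (
    "trip_id,stop_id,stop_headsign\n"
    "568036,00382,78 UO      \n"
    "568036,00396,78 UO <> 78 via 18th AVE     \n"
    "568036,00398,\n"
)

# dtype for stop_id is an assumption, used to preserve leading zeros.
stop_times = pd.read_csv(io.StringIO(raw), dtype={"stop_id": str})

# The loop from the question: rebinding x only changes the local name,
# so the values stored in the column are never modified.
for x in stop_times["stop_headsign"]:
    if isinstance(x, str):
        x = x.rstrip()
unchanged = stop_times["stop_headsign"].tolist()

# Vectorized strip, assigned back to the column. NaN entries stay NaN.
stop_times["stop_headsign"] = stop_times["stop_headsign"].str.rstrip()
stripped = stop_times["stop_headsign"].tolist()

# With no path argument, to_csv returns the CSV text as a string.
out = stop_times.to_csv(index=False)
print(out)
```

The key step is the assignment back to stop_times["stop_headsign"]: .str.rstrip() returns a new Series rather than changing the column in place.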