Search code examples
pythonexcelcsvdata-cleaning

How can I get ride of the brackets and apostrophe (') in the language column in csv


Excel Sheet I need help with data cleaning. How do I make the language column(D) into form in column(F) Basically just get rid of the brackets and apostrophe, and save the comma between each language. It can be done either using python or excel itself Thanks!

I tried to google but it didn't work


Solution

  • You could use pandas to read the csv into a dataframe then "apply" a function to the column that did something like this:

    def clean(value: str) -> str:
        for c in "[']":
            value = value.replace(c, "")
        return value
    

    It's worth noting if you remove the brackets you'll have an "embedded" delimiter meaning you wont be able to save this as a "csv" without a few headaches.

    If you want to clean it up in Excel you could, similarly to my formula above, add a formula that replaces all of the unwanted characters with an empty string with something like this:

    =SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(D2,"'",""),"[",""),"]","")
    

    *where D2 is the first language cell

    FWIW, I'd take a few minutes to play with this in pandas too -- it's always good to pick up a new skill