python, replace, string-conversion

How can I remove all extra characters from a list of strings to convert them to ints?


Hi I'm pretty new to programming and Python, and this is my first post, so I apologize for any poor form.

I am scraping a website's download counts, and I get the following error when I try to convert the list of number strings to integers so I can sum them: ValueError: invalid literal for int() with base 10: '1,015'

I have tried .replace() but it does not seem to be doing anything.

I also tried to build an if statement to take the commas out of any string that contains them, based on the question "Does Python have a string contains substring method?"

Here's my code:

    downloadCount = pageHTML.xpath('//li[@class="download"]/text()')
    downloadCount_clean = []

    for download in downloadCount:
        downloadCount_clean.append(str.strip(download))

    for item in downloadCount_clean:
        if "," in item:
            item.replace(",", "")
    print(downloadCount_clean)

    downloadCount_clean = map(int, downloadCount_clean)
    total = sum(downloadCount_clean)

Solution

  • Strings are immutable in Python. When you call item.replace(",", ""), the method returns a new string with the commas removed, but your code never stores that result, so neither item nor the list changes.

    EDIT:

    I suggest this:

    for i in range(len(downloadCount_clean)):
        if "," in downloadCount_clean[i]:
            downloadCount_clean[i] = downloadCount_clean[i].replace(",", "")
    

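    SECOND EDIT:

    A simpler version, which also converts each value to int in the same pass (the "," check is unnecessary, because replace() leaves strings without commas unchanged):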

    for index,value in enumerate(downloadCount_clean):
        downloadCount_clean[index] = int(value.replace(",", ""))
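
For completeness, here is a minimal sketch of the same idea written as a list comprehension. The helper name parse_counts and the sample list are made up for illustration; the sample stands in for whatever the xpath() call returns on the real page.

```python
def parse_counts(raw_counts):
    """Convert strings such as ' 1,015 ' to ints (hypothetical helper)."""
    # strip() and replace() each return a NEW string, so the results are
    # chained and passed straight to int() instead of being discarded.
    return [int(text.strip().replace(",", "")) for text in raw_counts]


# Strings are immutable: calling replace() without using the result
# leaves the original value untouched.
item = "1,015"
item.replace(",", "")
unchanged = item  # still "1,015"

# Hypothetical sample standing in for the scraped xpath() result.
sample = [" 1,015 ", "42", "\n2,300,001\n"]
counts = parse_counts(sample)
total = sum(counts)
print(counts, total)
```

Building a new list this way avoids modifying downloadCount_clean while looping over it, and the result can be passed to sum() directly without a separate map(int, ...) step.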