Python Pytagcloud osx ValueError invalid literal for int() with base 10: '3)

I always get this error:

  ValueError: invalid literal for int() with base 10: '3),'

The text file I'm reading from looks like this:

[('cloud', 3), 
('words', 2), 
('code', 1), 
('word', 1), 
('appear', 1)]

As you can see, I tried to remove some of the extra characters with word.replace():

from pytagcloud import create_tag_image, make_tags
from pytagcloud.lang.counter import get_tag_counts


counts = []
with open("terms.txt") as FIN:
   for line in FIN:
  
       # Assume lines look like: word, number
       word,n = line.strip().split()
       word = word.replace(',', '')
       word = word.replace("'", "")
       word = word.replace("(", "")
       word = word.replace("[", "")
       word = word.replace(")", "")
       word = word.replace(" ", "")
       n = n.replace("'", "")
       n = n.replace(" ", "")

       counts.append([word,int(n.strip())])

       tags = make_tags(counts, maxsize=120)
create_tag_image(tags, 'cloud_large.png', size=(1200, 800), fontname='Crimson Text')

Solution

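This happens because you're not removing all the non-numeric characters from n. split() cuts on whitespace only, so for the first line n is '3),' and your replace() calls remove only quotes and spaces from it. The simplest solution (minimum changes), starting from your existing code, is to replace this line: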

counts.append([word,int(n.strip())])

by:

counts.append([word, int(n.strip(",)]"))])
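To see where the stray characters come from, here is a small sketch that runs the same split on the first and last lines of the sample data. The names sample_lines, raw_counts and parsed are made up for this illustration; they are not part of your script.

```python
# First and last lines of the sample file shown in the question.
sample_lines = ["[('cloud', 3), \n", "('appear', 1)]\n"]

raw_counts = []  # n exactly as split() returns it
parsed = []      # n after stripping the trailing punctuation

for line in sample_lines:
    word, n = line.strip().split()
    # split() cuts on whitespace only, so ")," or ")]" stays attached to n.
    raw_counts.append(n)
    # strip(",)]") removes any of those three characters from both ends.
    parsed.append(int(n.strip(",)]")))

print(raw_counts)
print(parsed)
```

The first list shows the punctuation that int() chokes on; the second shows the values once it is stripped.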

Of course, the code can be improved and simplified, but that requires more changes. Here's an example (replace this chunk of the snippet you provided):

with open("terms.txt") as FIN:
    for line in FIN:

        # Assume lines look like: word, number
        word,n = line.strip().split()
        word = word.replace(',', '')
        word = word.replace("'", "")
        word = word.replace("(", "")
        word = word.replace("[", "")
        word = word.replace(")", "")
        word = word.replace(" ", "")
        n = n.replace("'", "")
        n = n.replace(" ", "")

        counts.append([word,int(n.strip())])

by:

with open("terms.txt") as FIN:
    for line in FIN:
        word, n = line.strip("[](), \r\n").split()
        counts.append([word.strip("',"), int(n.strip())])
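If you want to try the shorter loop without touching terms.txt, a minimal sketch follows. It wraps the same two lines in a hypothetical helper, parse_counts, and feeds it the sample data through io.StringIO instead of a real file. It assumes every word is a single token without spaces; a word such as 'tag cloud' would break the two-way unpacking.

```python
import io


def parse_counts(lines):
    """Turn lines like "('cloud', 3), " into [['cloud', 3], ...]."""
    counts = []
    for line in lines:
        # Drop brackets, parentheses, commas and whitespace from both ends.
        stripped = line.strip("[](), \r\n")
        if not stripped:
            continue  # skip blank lines
        word, n = stripped.split()
        # word still carries its quotes and the separating comma.
        counts.append([word.strip("',"), int(n)])
    return counts


# Stand-in for open("terms.txt"), using the data from the question.
sample = io.StringIO(
    "[('cloud', 3), \n('words', 2), \n('code', 1), \n"
    "('word', 1), \n('appear', 1)]\n"
)
counts = parse_counts(sample)
print(counts)
```

With a real file you would pass the open file object instead of the StringIO stand-in.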

There's a third form, but it uses eval (which is highly discouraged, because it will execute whatever code the file happens to contain); this is how you could get the contents of counts (note that here it will be a list of tuples, not a list of lists):

counts = []
with open("terms.txt") as FIN:
    counts = eval(FIN.read())
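Since the file contents are already a valid Python literal, the standard library's ast.literal_eval is a safer stand-in for eval: it accepts only literals (strings, numbers, tuples, lists, dicts and so on) and raises an error on anything else instead of executing it. This is a sketch of that swap, not something from your original code; the text variable below stands in for FIN.read().

```python
import ast

# Stand-in for FIN.read(), using the data from the question.
text = (
    "[('cloud', 3), \n('words', 2), \n('code', 1), \n"
    "('word', 1), \n('appear', 1)]\n"
)

# Parses the literal without executing arbitrary code.
counts = ast.literal_eval(text)
print(counts)
```

As with eval, the result is a list of tuples rather than a list of lists.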