Tags: python, list, while-loop, count, index-error

Index running too far in a Python while loop


I have a function that is supposed to count how many words of each length occur in a given text, for every length from zero up to and including the length of the longest word. I'm stuck in my loop. PyCharm says:

    sana = sanat[i].strip(",.")
    IndexError: list index out of range

I have no idea why the variable i runs too far (if that is what's happening here). The code is Python, but this kind of problem isn't really specific to the language. I'd appreciate any help.

The text is arbitrary and only for testing, and the print calls are only there for debugging.

    teksti = "Har du någon tanken. Om inriktningsmöjligheten i matematik."

    def sanamaarat(merkkijono):
        sanat = merkkijono.split()
        sanat.sort(key=len)
        lista = []
        lista.append(0)
        apulista = []
        apulista2 = []

        for sana in sanat:
            sana = sana.strip(",.")
            pituus = len(sana)
            apulista.append(pituus)

        joukko = list(set(apulista))
        for numero in joukko:
            apulista2.append(apulista.count(numero))
        print(sanat)
        print(apulista2)
        print(apulista)
        print(int(apulista[-1])+1)

        k = 1
        i = 0
        j = 0
        while k < int(apulista[-1]) + 1:
            sana = sanat[i].strip(",.")
            pituus = len(sana)
            if pituus == k:
                j += 1
                i += 1
            else:
                if j != 0:
                    lista.append(j)
                lista.append(0)
                k += 1

        return lista

And here is the output:

    (venv) C:\python>testailua.py
    ['i', 'du', 'Om', 'Har', 'någon', 'tanken.', 'matematik.', 'inriktningsmöjligheten']
    [1, 2, 1, 1, 1, 1, 1]
    [1, 2, 2, 3, 5, 6, 9, 22]
    23
    Traceback (most recent call last):
      File "C:\python\testailua.py", line 54, in <module>
        print(sanamaarat(teksti))
      File "C:\python\testailua.py", line 28, in sanamaarat
        sana = sanat[i].strip(",.")
    IndexError: list index out of range

I'm trying to put zeros at the right indexes of the returned list, but there's a logic error in the while loop that I can't spot.


The expected result is [0, 1, 2, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1] (23 entries, for lengths 0 through 22).

The first 0 (at index 0) is there because no words have length zero. The first 1 (at index 1) is there because one word has length one. The 2 (at index 2) is there because two words have length two, and so on. In other words, each index should hold the number of words whose length equals that index.


@kederrac solved this by importing Counter from collections. That's a good answer, but I'd like to know how to do it the original way, with loops, because I still don't understand what is wrong with my loop.


Solution

If you add prints to your while loop that show the index i and the length of the list sanat:

    print('sanat length: ', len(sanat))
    while k < int(apulista[-1]) + 1:
        print('i = ', i)
        sana = sanat[i].strip(",.")
        pituus = len(sana)
        if pituus == k:
            j += 1
            i += 1
        else:
            if j != 0:
                lista.append(j)
            lista.append(0)
            k += 1

the output is:

    sanat length:  8
    i =  0
    i =  1
    i =  1
    i =  2
    i =  3
    i =  3
    i =  4
    i =  4
    i =  4
    i =  5
    i =  5
    i =  6
    i =  6
    i =  6
    i =  6
    i =  7
    i =  7
    i =  7
    i =  7
    i =  7
    i =  7
    i =  7
    i =  7
    i =  7
    i =  7
    i =  7
    i =  7
    i =  7
    i =  7
    i =  8
    
    ---------------------------------------------------------------------------
    IndexError                                Traceback (most recent call last)
    <ipython-input-61-873709d80e77> in <module>
         41     return lista
         42 
    ---> 43 sanamaarat(teksti )
    
    <ipython-input-61-873709d80e77> in sanamaarat(merkkijono)
         28     while k < int(apulista[-1]) + 1:
         29         print('i = ', i)
    ---> 30         sana = sanat[i].strip(",.")
         31         pituus = len(sana)
         32         if pituus == k:
    
    IndexError: list index out of range
    

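This shows that you end up indexing sanat with a value equal to its length, which is out of range, so you get an IndexError.

The list sanat has 8 elements, so the valid indexes are 0 to 7, but just before the IndexError the value of i is 8. That happens because i and k are advanced in different branches: when the longest word (length 22) finally matches k = 22, i is incremented to 8, but k stays at 22, so the condition k < 23 is still true, the loop runs once more and reads sanat[8]. The loop condition only checks k, never whether i is still a valid index. Separately, j is never reset to 0 and an extra 0 is appended on every length change, so lista would not match the expected result even without the IndexError.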
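Since the question asks for a loop-based fix, here is a minimal sketch that keeps the original idea (sort the words by length and walk through them with an index) but bounds the loop by i instead of k and resets the counter for each length. It is an illustrative rewrite, not the answerer's code; like the question's code, it only strips commas and periods.

```python
teksti = "Har du någon tanken. Om inriktningsmöjligheten i matematik."


def sanamaarat(merkkijono):
    # Strip commas and periods first, then sort by length, as in the question
    sanat = sorted((sana.strip(",.") for sana in merkkijono.split()), key=len)

    lista = []  # lista[k] = number of words of length k
    k = 0       # the length currently being counted (starts at 0)
    i = 0       # index of the next unconsumed word in sanat

    # Stop as soon as every word has been consumed, so sanat[i] is always valid
    while i < len(sanat):
        j = 0  # reset the count for each new length
        # Consume all words whose length is exactly k
        while i < len(sanat) and len(sanat[i]) == k:
            j += 1
            i += 1
        lista.append(j)  # appends 0 when no word has length k
        k += 1

    return lista


print(sanamaarat(teksti))
```

Because k starts at 0 and the loop ends right after the longest word is consumed, the result has exactly one entry per length from 0 up to the length of the longest word.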


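To solve the problem you could use collections.Counter to count the words by length. The words need the same strip(",.") as in your code; otherwise "tanken." and "matematik." are counted as lengths 7 and 10 instead of 6 and 9:

    from collections import Counter
    
    teksti = "Har du någon tanken. Om inriktningsmöjligheten i matematik."
    
    def sanamaarat(merkkijono):
        # Count word lengths after stripping commas and periods
        count = Counter(len(sana.strip(",.")) for sana in merkkijono.split())
        max_length = max(count)
        # Missing lengths get 0, so index n holds the number of words of length n
        return [count.get(n, 0) for n in range(max_length + 1)]
    
    print(sanamaarat(teksti))

output:

    [0, 1, 2, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]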
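For comparison, the rule from the question (index n holds the number of words of length n) can also be implemented with a plain list and a for loop, without Counter and without sorting: size a list of zeros by the longest word, then increment the slot for each word's length. This is a sketch that, like the question's code, only strips commas and periods, and it assumes the text contains at least one word (max() raises ValueError on an empty sequence).

```python
teksti = "Har du någon tanken. Om inriktningsmöjligheten i matematik."


def sanamaarat(merkkijono):
    # Word lengths after stripping commas and periods
    pituudet = [len(sana.strip(",.")) for sana in merkkijono.split()]

    # One slot per length 0..longest, all starting at zero
    lista = [0] * (max(pituudet) + 1)

    # The length itself is the index to increment
    for pituus in pituudet:
        lista[pituus] += 1

    return lista


print(sanamaarat(teksti))
```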