I have a function that is supposed to count the number of words of each length up to and including the longest word in any given text. I'm stuck in my loop. PyCharm says:
sana = sanat[i].strip(",.")
IndexError: list index out of range
I have no idea why the variable i runs too far (if that's what's happening here). This is in Python, but this kind of problem doesn't really depend on the language. I'd appreciate any help.
The text is arbitrary for testing. Also, the prints are for testing.
teksti = "Har du någon tanken. Om inriktningsmöjligheten i matematik."

def sanamaarat(merkkijono):
    sanat = merkkijono.split()
    sanat.sort(key=len)
    lista = []
    lista.append(0)
    apulista = []
    apulista2 = []
    for sana in sanat:
        sana = sana.strip(",.")
        pituus = len(sana)
        apulista.append(pituus)
    joukko = list(set(apulista))
    for numero in joukko:
        apulista2.append(apulista.count(numero))
    print(sanat)
    print(apulista2)
    print(apulista)
    print(int(apulista[-1])+1)
    k = 1
    i = 0
    j = 0
    while k < int(apulista[-1]) + 1:
        sana = sanat[i].strip(",.")
        pituus = len(sana)
        if pituus == k:
            j += 1
            i += 1
        else:
            if j != 0:
                lista.append(j)
            lista.append(0)
            k += 1
    return lista
And the output is here:
(venv) C:\python>testailua.py
['i', 'du', 'Om', 'Har', 'någon', 'tanken.', 'matematik.', 'inriktningsmöjligheten']
[1, 2, 1, 1, 1, 1, 1]
[1, 2, 2, 3, 5, 6, 9, 22]
23
Traceback (most recent call last):
  File "C:\python\testailua.py", line 54, in <module>
    print(sanamaarat(teksti))
  File "C:\python\testailua.py", line 28, in sanamaarat
    sana = sanat[i].strip(",.")
IndexError: list index out of range
So I'm trying to put the needed zeros at the right indexes of the returned list, but there's a logical error in the while loop that I cannot see.
Expected result is [0, 1, 2, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
The first 0 (at index 0) is there because no words have length zero. The first 1 (at index 1) is there because one word has length one. The 2 (at index 2) is there because two words have length two, and so on: each index should hold the number of words whose length equals that index.
@kederrac solved this by importing Counter from collections. That's a good answer, but I'd like to know how to do it the original way, with loops, because I still don't know what is wrong with my loop.
If you modify your while loop to print the value of the index i, and print the length of the list sanat before it:

print('sanat length:', len(sanat))
while k < int(apulista[-1]) + 1:
    print('i =', i)
    sana = sanat[i].strip(",.")
    pituus = len(sana)
    if pituus == k:
        j += 1
        i += 1
    else:
        if j != 0:
            lista.append(j)
        lista.append(0)
        k += 1

the output is:

sanat length: 8
i = 0
i = 1
i = 1
i = 2
i = 3
i = 3
i = 4
i = 4
i = 4
i = 5
i = 5
i = 6
i = 6
i = 6
i = 6
i = 7
i = 7
i = 7
i = 7
i = 7
i = 7
i = 7
i = 7
i = 7
i = 7
i = 7
i = 7
i = 7
i = 7
i = 8
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-61-873709d80e77> in <module>
     41     return lista
     42
---> 43 sanamaarat(teksti )

<ipython-input-61-873709d80e77> in sanamaarat(merkkijono)
     28     while k < int(apulista[-1]) + 1:
     29         print('i = ', i)
---> 30         sana = sanat[i].strip(",.")
     31         pituus = len(sana)
     32         if pituus == k:

IndexError: list index out of range
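You will find that you are trying to access an index equal to the length of your list sanat, which is not possible, so you get an IndexError.
Your list sanat has length 8, so you can only access elements up to index 7, but the last value printed for i before the IndexError is 8.
This happens because the last word matches when k is 22: i is incremented to 8, but k is only incremented in the else branch, so k is still 22, the condition k < 23 is still true, and the loop runs once more with i = 8.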
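As for the loop itself, here is a minimal sketch of one way to repair it while keeping the original idea (sort the words, then walk through them once per length). It assumes, as in the question, that only commas and periods need stripping. The `for k in range(...)` restructuring, the early return for empty input, and the name `tulos` are my own choices, not part of the original code.

```python
teksti = "Har du någon tanken. Om inriktningsmöjligheten i matematik."


def sanamaarat(merkkijono):
    # Strip punctuation first, then sort by the cleaned length,
    # so the last word in the list really is the longest one.
    sanat = sorted((sana.strip(",.") for sana in merkkijono.split()), key=len)
    if not sanat:
        return [0]  # assumption: an empty text yields just the zero-length slot

    lista = []
    i = 0
    # k takes every length from 0 up to and including the longest word.
    for k in range(len(sanat[-1]) + 1):
        j = 0  # reset the counter for each length
        # The check i < len(sanat) keeps the index from running past the list.
        while i < len(sanat) and len(sanat[i]) == k:
            j += 1
            i += 1
        lista.append(j)
    return lista


tulos = sanamaarat(teksti)
print(tulos[:10])             # [0, 1, 2, 1, 0, 1, 1, 0, 0, 1]
print(len(tulos), tulos[-1])  # 23 1
```

Resetting j for every k also takes care of a second problem in the original loop, where j was never set back to zero after being appended.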
To solve your problem you could use collections.Counter to find the frequency of words by length:

from collections import Counter

teksti = "Har du någon tanken. Om inriktningsmöjligheten i matematik."

def sanamaarat(merkkijono):
    # Strip commas and periods so that "tanken." counts as 6 letters, not 7.
    count = Counter(len(sana.strip(",.")) for sana in merkkijono.split())
    max_length = max(count)
    # Lengths that never occur get a count of 0.
    return [count.get(n, 0) for n in range(max_length + 1)]

print(sanamaarat(teksti))

Output:
[0, 1, 2, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
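Since the question asks for a version with plain loops, the Counter can also be replaced by a pre-sized list that is indexed directly by word length. This is only a sketch: the function name `sanamaarat_silmukalla`, the variable `pituudet`, and the handling of empty input are hypothetical choices of mine, and it assumes commas and periods are the only punctuation to strip.

```python
def sanamaarat_silmukalla(merkkijono):
    # Length of every word after removing surrounding commas and periods.
    pituudet = []
    for sana in merkkijono.split():
        pituudet.append(len(sana.strip(",.")))

    if not pituudet:
        return [0]  # assumption: an empty text yields just the zero-length slot

    # One slot per possible length, 0 through the longest word.
    lista = [0] * (max(pituudet) + 1)
    for pituus in pituudet:
        lista[pituus] += 1  # the length itself is the index
    return lista


print(sanamaarat_silmukalla("a bb cc. dddd,"))  # [0, 1, 2, 0, 1]
```

Because each length is used directly as an index, there is no second index to keep in step with the words, so this kind of out-of-range error cannot occur.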