
Want to remove elements based on first character - Python


This is a program that lists all the substrings of a string, except the ones that start with a vowel.

However, I don't understand why the startswith() function doesn't work as I expected. It is not removing all of the substrings that start with the letter 'A'.

Here is my code:

ban = 'BANANA'

cur_pos=0
sub = []

#Finding the substrings
for i in range(len(ban)):
    limit=1
    for j in range(len(ban)):
        a = ban[cur_pos:limit]
        sub.append(a)
        limit+=1
    cur_pos+=1 
 

#removing the substrings that start with vowels
for i in sub:
    if (i.startswith(('A','E','I','O','U'))):
        sub.remove(i)

print(sub)  

Solution

  • Why this doesn't work...

    To answer your question, the mantra for this issue is: delete array elements in reverse order. I occasionally forget it myself, and then wonder what has gone wrong.

    Explanation

    The problem isn't with startswith() but with calling remove() on the array inside a for loop that is iterating over that same array, like this one:

    for i in sub:
    

    This fails in this code for the following reason.

    ban = 'BANANA'
    
    cur_pos=0
    sub = []
    
    #Finding the substrings
    for i in range(len(ban)):
        limit=1
        for j in range(len(ban)):
            a = ban[cur_pos:limit]
            sub.append(a)
            limit+=1
        cur_pos+=1 
    
    print(sub)
    
    #removing the substrings that start with vowels
    for i in sub:
        if (i.startswith(('A','E','I','O','U'))):
            sub.remove(i)
        print(sub)
    print(sub)
    

    I've added some print statements to assist debugging.

    Initially the array is:

    ['B', 'BA', 'BAN', 'BANA', 'BANAN', 'BANANA', '', 'A', 'AN', 'ANA', 'ANAN', 'ANANA', '', '', 'N', 'NA', 'NAN', 'NANA', '', '', '', 'A', 'AN', 'ANA', '', '', '', '', 'N', 'NA', '', '', '', '', '', 'A']
    

    ...then we eventually get to remove the first 'A', which seems to be removed fine...

    ['B', 'BA', 'BAN', 'BANA', 'BANAN', 'BANANA', '', 'AN', 'ANA', 'ANAN', ...etc...
    

    ...but there is some nastiness happening behind the scenes that shows up when we reach the next vowel...

    ['B', 'BA', 'BAN', 'BANA', 'BANAN', 'BANANA', '', 'AN', 'ANAN', 
    

    Notice that 'ANA' was removed, while 'AN', which should have been checked next, is still there!

    Why?

    Because remove() modified the array and shifted every later element one position to the left, but the for loop's hidden index does not know about this. After processing index 7 (the first 'A'), the loop simply moves on to index 8, expecting 'AN'. Since 'AN' has shifted down into index 7, index 8 now holds 'ANA'. So 'AN' is never checked at all, and 'ANA' is removed instead.
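The same skipping can be reproduced on a much smaller list. In this sketch the names words, vowels and visited are made up for illustration; visited records which elements the loop actually looks at, and the last line confirms that startswith() itself gives the right answer for an element the loop skipped.

```python
# A small list where every element but the last starts with a vowel.
words = ['A', 'AN', 'ANA', 'B']
vowels = ('A', 'E', 'I', 'O', 'U')

visited = []  # records which elements the loop actually looks at
for w in words:
    visited.append(w)
    if w.startswith(vowels):
        words.remove(w)  # shifts every later element one place to the left

print(visited)                  # ['A', 'ANA']  ('AN' and 'B' are never visited)
print(words)                    # ['AN', 'B']
print('AN'.startswith(vowels))  # True: startswith() was never the problem
```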

    Fixing the problem

    One way is to append the substrings that don't start with a vowel to a new, empty array:

    ban = 'BANANA'
    
    cur_pos=0
    sub = []
    add = []
    
    #Finding the substrings
    for i in range(len(ban)):
        limit=1
        for j in range(len(ban)):
            a = ban[cur_pos:limit]
            sub.append(a)
            limit+=1
        cur_pos+=1 
    
    #adding the substrings that don't start with vowels
    for i in sub:
        if (not i.startswith(('A','E','I','O','U'))):
            add.append(i)
    print(add)
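The filtering step of this fix can also be written as a list comprehension. This is a sketch of an alternative, not the code above: it rebuilds the same substring list, then assigns the filtered result back through the slice sub[:], which replaces the contents of the existing list instead of creating a second list such as add. The name vowels is introduced here only for readability.

```python
ban = 'BANANA'
vowels = ('A', 'E', 'I', 'O', 'U')

# Build the same substring list as above (empty strings included).
sub = []
for cur_pos in range(len(ban)):
    for limit in range(1, len(ban) + 1):
        sub.append(ban[cur_pos:limit])

# Keep only the substrings that do not start with a vowel.
# Assigning to sub[:] replaces the contents of the existing list object,
# so any other reference to the same list sees the filtered result.
sub[:] = [s for s in sub if not s.startswith(vowels)]

print(sub)
```

With 'BANANA' the result holds the 27 substrings that don't start with a vowel, including the empty strings '' that ban[cur_pos:limit] produces whenever limit is not greater than cur_pos.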
    

    Another way

    There is, however, a simple way to modify the original array in place, as you wanted: iterate through the array in reverse order using an index-based for loop.

    The important part here is that deleting an element only shifts the elements after it, and when you walk backwards those are elements you have already finished with. The elements still waiting to be checked (at lower indices) keep their positions, so the index never ends up pointing to the wrong element. This is common and acceptable practice, so long as you understand and make clear what you're doing.

    ban = 'BANANA'
    
    cur_pos=0
    sub = []
    
    #Finding the substrings
    for i in range(len(ban)):
        limit=1
        for j in range(len(ban)):
            a = ban[cur_pos:limit]
            sub.append(a)
            limit+=1
        cur_pos+=1 
    
    #removing the substrings that start with vowels, in reverse index order
    start = len(sub)-1  # index of the last element (length less one, because indexing is zero-based)
    stopAt = -1 # range() stops before this value, so index 0 is the last one processed
    step = -1 # step backwards
    for index in range(start,stopAt,step):  # count backwards from last element to the first
        i = sub[index]
        if (i.startswith(('A','E','I','O','U'))):
            print('#'+str(index)+' = '+i)
            del sub[index]
    print(sub)
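If working out start, stopAt and step by hand feels error-prone, Python's built-in reversed() can walk the same indices backwards: reversed(range(len(sub))) produces len(sub)-1 down to 0, exactly like range(len(sub)-1, -1, -1). A minimal sketch of the same reverse deletion using it (vowels is a name introduced here for readability):

```python
ban = 'BANANA'
vowels = ('A', 'E', 'I', 'O', 'U')

# Same substrings as the nested loops above, built with a comprehension.
sub = [ban[cur_pos:limit]
       for cur_pos in range(len(ban))
       for limit in range(1, len(ban) + 1)]

# reversed(range(len(sub))) yields len(sub)-1, ..., 1, 0,
# the same indices as range(len(sub) - 1, -1, -1).
for index in reversed(range(len(sub))):
    if sub[index].startswith(vowels):
        # Deleting here only shifts elements at higher indices,
        # which have already been checked.
        del sub[index]

print(sub)
```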
    

    For more details, see the official documentation on the for statement: https://docs.python.org/3/reference/compound_stmts.html#index-6
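A further option, which the Python tutorial's section on for statements also recommends, is to loop over a copy of the array while removing from the original. A minimal sketch using a slice copy, sub[:] (vowels is a name introduced here for readability):

```python
ban = 'BANANA'
vowels = ('A', 'E', 'I', 'O', 'U')

sub = [ban[cur_pos:limit]
       for cur_pos in range(len(ban))
       for limit in range(1, len(ban) + 1)]

# Iterate over a shallow copy (sub[:]) so that removing from sub
# does not disturb the loop's position in the sequence being iterated.
for s in sub[:]:
    if s.startswith(vowels):
        sub.remove(s)  # removes the first element equal to s

print(sub)
```

Each remove() call searches the list from the start, so this approach is quadratic in the worst case; for large lists, building a new list is usually the better choice.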

    Aside: This is my favourite array problem.

    Edit: I just got bitten by this in JavaScript, while removing DOM nodes.