
Using a list of indexes to add underscores to string tokens


My current approach does not produce the expected output:

# Explanation:
# Walk through the tokens of line; when a token's position appears in indexes, prefix it with an underscore.
# If the next index is consecutive, join the following token to it with an underscore instead of a space.
# The expected output is therefore: '7 Waitohu Road _York_Bay Co Manager _York_Bay Asst Co Dir _Central_Lower_Hutt General Hand _Wainuiomata School Caretaker'

# A list of suburb words and their index positions in line
uniqueList = ['York', 3, 'Bay', 4, 'York', 7, 'Bay', 8, 'Central', 12, 'Lower', 13, 'Hutt', 14, 'Wainuiomata', 17]

# indexes = uniqueList[1::2] keeps only the index positions (every second item, starting at position 1)
indexes = [3, 4, 7, 8, 12, 13, 14, 17]

# The line example
line = '7 Waitohu Road York Bay Co Manager York Bay Asst Co Dir Central Lower Hutt General Hand Wainuiomata School Caretaker'

# Split the line into tokens for counting indexes
splits = line.split(' ')

# For each index, scan the tokens for the matching position
for i in range(len(indexes)):
    check = indexes[i]
    for j in range(len(splits)):
        if j == check and (i + 1 < len(indexes)):
            # Check whether the next index is consecutive (e.g. 3 followed by 4)
            next_index = indexes[i + 1]
            if next_index - check == 1:
                splits[j] = '_' + splits[j] + '_' + splits[j + 1]            
        else:
            if j == check:
                splits[j] = '_' + splits[j]

# Results here                
newLine = ' '.join(splits)
print(newLine)

Output:

7 Waitohu Road _York_Bay Bay Co Manager _York_Bay Bay Asst Co Dir _Central_Lower _Lower_Hutt Hutt General Hand _Wainuiomata School Caretaker
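The doubled words come from the merge step: the loop writes the joined token into `splits[j]` but leaves `splits[j + 1]` in the list, so that word is printed a second time. With three consecutive indexes (12, 13, 14), each adjacent pair is merged separately, which is why both `_Central_Lower` and `_Lower_Hutt` appear. A minimal reproduction of just that step, on a three-token slice of the example line:

```python
# Reproduce only the merge step from the loop above on a small slice.
splits = ['York', 'Bay', 'Co']
j = 0
# The merged token is written into position j ...
splits[j] = '_' + splits[j] + '_' + splits[j + 1]
# ... but the token at j + 1 is never removed, so it is printed again.
print(' '.join(splits))  # _York_Bay Bay Co
```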

How to:

  • Avoid outputting the duplicated words Bay and Hutt
  • Handle runs of more than two consecutive indexes, so that Central Lower Hutt becomes _Central_Lower_Hutt

Solution

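  • There are three cases:

    • A word in the list whose previous word was also in the list: join it to the previous word with '_'.
    • A word in the list whose previous word was NOT in the list: start a new group, separated by a space and prefixed with '_'.
    • A word not in the list: separate it from the previous word with a single space.

    Building the output word by word and handling each of these cases gives the expected line.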

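    # Index positions of the suburb words in line
    indexes = {3, 4, 7, 8, 12, 13, 14, 17}  # a set makes the membership tests O(1)
    
    # The line example
    line = '7 Waitohu Road York Bay Co Manager York Bay Asst Co Dir Central Lower Hutt General Hand Wainuiomata School Caretaker'
    
    # Split the line into tokens for counting indexes
    splits = line.split(' ')
    
    # Build the output piece by piece
    outs = []
    for i, word in enumerate(splits):
        if i in indexes:
            # Suburb word: add a space before it unless it continues a run of
            # suburb words (or is the very first token), then the '_' prefix
            if i - 1 not in indexes and outs:
                outs.append(' ')
            outs.append('_')
        elif outs:
            # Ordinary word: separate it from the previous token with a space
            outs.append(' ')
        outs.append(word)
    
    # Join the pieces back into a single line
    newLine = ''.join(outs)
    print(newLine)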
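An alternative sketch, not the approach above: `itertools.groupby` can split the tokens into runs of indexed and non-indexed words, so each run of suburb words is joined with '_' in a single step. The function name `underscore_runs` is hypothetical, and, like the loop above, it assumes tokens are separated by single spaces.

```python
from itertools import groupby


def underscore_runs(line, indexes):
    """Hypothetical helper: prefix each run of consecutive indexed tokens
    with '_' and join the tokens within a run with '_' instead of ' '."""
    words = line.split(' ')
    marked = set(indexes)  # set for O(1) membership tests
    out = []
    # groupby yields maximal runs of adjacent tokens that share the same key,
    # so every run of consecutive indexed positions becomes one group.
    for is_suburb, group in groupby(enumerate(words), key=lambda pair: pair[0] in marked):
        tokens = [word for _, word in group]
        if is_suburb:
            out.append('_' + '_'.join(tokens))
        else:
            out.extend(tokens)
    return ' '.join(out)


line = '7 Waitohu Road York Bay Co Manager York Bay Asst Co Dir Central Lower Hutt General Hand Wainuiomata School Caretaker'
indexes = [3, 4, 7, 8, 12, 13, 14, 17]
print(underscore_runs(line, indexes))
# 7 Waitohu Road _York_Bay Co Manager _York_Bay Asst Co Dir _Central_Lower_Hutt General Hand _Wainuiomata School Caretaker
```

Like the loop above, this treats any consecutive indexes as one suburb, so two different suburbs that happen to sit next to each other would be merged into a single underscored token.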