
Concatenating unicode strings based on length


I have a Unicode text that is split into segments. I want to concatenate the segments based on their lengths, making sure no segment is repeated. Here is my pseudo-code:

text = 'whole text'
spaceString = ' '
avgLength = len(text) / len(segments)
newtext = []
for each index i in segments:
    previousLength = len(segments[i-1])
    nextLength = len(segments[i+1])
    if len(segments[i]) < avgLength:
        compare previousLength and nextLength and take the smaller one
        concatenate segments[i] to segments[i-1] or segments[i+1], whichever is shorter, and append the result to newtext
    else:  # len(segments[i]) >= avgLength
        append segments[i] to newtext and move to the next segment

The aim is to join segments that are shorter than the average length. If a segment is shorter than the average, compare previousLength and nextLength and join segment i to whichever neighbour is shorter. (The first segment may be shorter or longer than the average.) If a segment is at least as long as the average, just append it. newtext should contain the same text as text, but every segment in it should be at least as long as the average segment. Thanks
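A minimal runnable sketch of the neighbour merge described above, under a few assumptions that are not in the question: the average is `len(sep.join(segments)) / len(segments)` (the same as `len(text) / len(segments)` when `text` is the segments joined by single spaces), a tie between the two neighbours goes to the previous one, and a merged segment that is still shorter than the average keeps merging. Replacing the pair with one joined string is what guarantees that no segment is repeated. `merge_short_segments` is a hypothetical name.

```python
from typing import List


def merge_short_segments(segments: List[str], sep: str = " ") -> List[str]:
    """Merge every segment shorter than the average into its shorter neighbour.

    Hypothetical helper: the average mirrors avgLength = len(text) / len(segments),
    assuming text is the segments joined with `sep`.
    """
    if not segments:
        return []
    avg_length = len(sep.join(segments)) / len(segments)
    merged = list(segments)  # work on a copy so the caller's list is not modified

    i = 0
    while len(merged) > 1 and i < len(merged):
        if len(merged[i]) >= avg_length:
            i += 1  # long enough: keep it and move on
            continue
        # Choose the shorter neighbour; the first and last segments have only one.
        if i == 0:
            target = 1
        elif i == len(merged) - 1:
            target = i - 1
        else:
            # assumption: on a tie, merge with the previous segment
            target = i - 1 if len(merged[i - 1]) <= len(merged[i + 1]) else i + 1
        lo, hi = min(i, target), max(i, target)
        # Replace the pair with a single joined segment so neither appears twice.
        merged[lo:hi + 1] = [merged[lo] + sep + merged[hi]]
        i = lo  # re-check the merged segment, it may still be too short
    return merged
```

For example, `merge_short_segments(["Hello", "a", "world", "is", "big"])` returns `["Hello a", "world", "is big"]`. In Python 3 a `str` is Unicode, so `len` counts code points rather than bytes, which is what a length comparison on Unicode text usually wants.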


Solution

  • Based on what I understood from your question, I have written the following code.

    If it is not what you are looking for, could you please clarify your requirements (perhaps as pseudo-code) and I will edit the code accordingly.

    Code:

    # `text` and `segments` come from the question
    spaceString = ' '
    avgLength = len(text) / len(segments)
    newtext = []
    temp_string = ''
    for segment in segments:
        if len(segment) < avgLength:
            # if the segment is already in temp_string, skip it to avoid repetition;
            # otherwise add it, with a separating space only if temp_string is non-empty
            if segment in temp_string:
                continue
            temp_string = (temp_string + spaceString + segment) if temp_string else segment

            # once temp_string reaches avgLength, add it to newtext and reset it
            if len(temp_string) >= avgLength:
                newtext.append(temp_string)
                temp_string = ''
        else:
            # here len(segment) >= avgLength
            if segment in temp_string:
                # the segment is already in temp_string: append temp_string and reset it
                newtext.append(temp_string)
                temp_string = ''
            else:
                # attach the segment to any pending short segments,
                # add the result to newtext and reset
                temp_string = (temp_string + spaceString + segment) if temp_string else segment
                newtext.append(temp_string)
                temp_string = ''

    # keep any short segments still pending after the last segment
    if temp_string:
        newtext.append(temp_string)
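The same greedy buffering idea can be packaged as a function so it is easier to test. This is a sketch with a hypothetical name, `join_by_average`, assuming the average is computed from the segments joined with `sep`. One deliberate difference: pending segments are kept in a list and a segment is treated as repeated only if it exactly equals one of them, whereas `segment in temp_string` is a substring test that also matches a short segment such as `"a"` inside a longer one. It also keeps short segments left over at the end instead of dropping them.

```python
from typing import List


def join_by_average(segments: List[str], sep: str = " ") -> List[str]:
    """Greedy buffering of short segments (hypothetical helper name).

    Short segments are buffered until the buffer reaches the average length;
    a segment at least as long as the average flushes the buffer together
    with itself. A segment already in the current buffer is not added again.
    """
    if not segments:
        return []
    avg_length = len(sep.join(segments)) / len(segments)
    newtext: List[str] = []
    buffer: List[str] = []  # pending segments, joined only when flushed

    for segment in segments:
        if segment in buffer:
            # exact duplicate of a pending segment: skip it if short,
            # or just flush what is pending if it is long
            if len(segment) >= avg_length:
                newtext.append(sep.join(buffer))
                buffer = []
            continue
        buffer.append(segment)
        if len(segment) >= avg_length or len(sep.join(buffer)) >= avg_length:
            newtext.append(sep.join(buffer))
            buffer = []

    if buffer:  # keep short segments left over after the last segment
        newtext.append(sep.join(buffer))
    return newtext
```

With `["Hello", "a", "world", "is", "big"]` this returns `["Hello", "a world", "is big"]`: unlike a neighbour-based merge, a short segment is always attached forward to what follows it, never back to the previous segment.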