I have a Unicode text that is broken into segments. I want to concatenate them based on their lengths, while making sure no segment is repeated. Here is the pseudo-code:
i = 0
text = 'whole text'
spaceString = ' '
avgLength = len(text) / (total number of segments)
newtext = []
previousLength = len(segments[i-1])
nextLength = len(segments[i+1])
for each segment [i]:
    if len(segments[i]) < avgLength:
        compare previousLength and nextLength and get the lower result
        concatenate segments[i] to either segments[i+1] or segments[i-1] (and attach to newtext), depending on which is lower
    else if len(segments[i]) >= avgLength:
        attach segments[i] to newtext and move to the next segment
    elif . . .
The aim is to join segments that are shorter than the average length. If a segment is shorter than the average, compare previousLength and nextLength, and join segment i to whichever neighbour is shorter. (The first segment might be shorter or longer than the average.) However, if a segment is at least the average length, just append it. newtext should be similar to text but should contain only segments whose length is greater than or equal to the average segment length.
Thanks
Based on what I understood from your question, I have written the following code.
If it is not what you are looking for, could you please clarify your requirements so I can edit the code appropriately? Maybe try to write it in pseudo-code?
Code:
# segments, avgLength, spaceString and newtext are defined as in the question
temp_string = ''
for i in range(len(segments)):
    if len(segments[i]) < avgLength:
        # if it is already in temp_string, skip it to avoid repetition;
        # otherwise add it to temp_string
        if segments[i] in temp_string:
            continue
        else:
            temp_string += spaceString + segments[i]
        # once temp_string is >= avgLength, add it to newtext and reset temp_string
        # (strip() drops the leading space added by the first concatenation)
        if len(temp_string) >= avgLength:
            newtext.append(temp_string.strip())
            temp_string = ''
    else:
        # here len(segments[i]) >= avgLength
        # if the segment is in temp_string, append temp_string and reset it
        if segments[i] in temp_string:
            newtext.append(temp_string.strip())
            temp_string = ''
        else:
            # if the segment is not in temp_string, add a space and the segment
            temp_string += spaceString + segments[i]
            # add to newtext and reset
            newtext.append(temp_string.strip())
            temp_string = ''
# append any short segments still buffered when the loop ends
if temp_string:
    newtext.append(temp_string.strip())
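If the goal is specifically the neighbour rule from the question (join a short segment to whichever adjacent piece is shorter), here is a minimal sketch of one way to read it. It rests on a few assumptions that are mine, not the question's: the function name merge_short_segments and the sep parameter are hypothetical, the average is taken as the mean segment length rather than len(text) divided by the number of segments, and "previous" means the piece already placed in the output, which may itself be the result of an earlier join. Each segment is consumed exactly once, so nothing is repeated.

```python
def merge_short_segments(segments, sep=" "):
    """Join every below-average segment to its shorter neighbour.

    Hypothetical helper: the name, the sep parameter and the tie-breaking
    rule (ties go to the previous piece) are assumptions for illustration.
    """
    if not segments:
        return []

    # Assumption: average = mean segment length (in code points for str).
    avg_length = sum(len(s) for s in segments) / len(segments)

    pending = list(segments)  # working copy, consumed left to right
    merged = []               # output pieces, in original order

    for i in range(len(pending)):
        current = pending[i]

        if len(current) >= avg_length:
            # Long enough: keep it as its own piece.
            merged.append(current)
            continue

        prev_len = len(merged[-1]) if merged else None
        next_len = len(pending[i + 1]) if i + 1 < len(pending) else None

        if prev_len is None and next_len is None:
            # Defensive: no neighbour at all, keep the segment as is.
            merged.append(current)
        elif next_len is None or (prev_len is not None and prev_len <= next_len):
            # Previous piece is the shorter (or only) neighbour: join backwards.
            merged[-1] = merged[-1] + sep + current
        else:
            # Next segment is shorter (or there is no previous piece):
            # join forwards; the combined piece is re-checked on the next pass.
            pending[i + 1] = current + sep + pending[i + 1]

    return merged


segments = ["aa", "bbbbbb", "c", "dddd", "eeeeeeee", "f"]
print(merge_short_segments(segments))
# ['aa bbbbbb', 'c dddd', 'eeeeeeee f']
```

Because joins only ever happen between adjacent pieces, joining the result with sep gives back the same text as joining the original segments, and with this reading every output piece ends up at least as long as the average.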