Truncate a long string in Python, but only after a specific character


I use textwrap to split a long string into chunks of at most 280 characters each. I don't want the split to occur at an arbitrary position, though; it should only occur after a specific character sequence. In my case, that is after the € sign followed by a single line break \n.

This is my code:

import math
import textwrap

query = 'Lorem ipsum dolor\n\n Lorem ipsum 0.5€\n Lorem ipsum 0.2€\n (...)'

for item in [query]:
    # obtain length of string
    item_length = len(item)

    # check length
    if item_length <= 280:
        pass  # do something here

    else:
        # number of 280-character chunks needed (as a fraction, rounded up below)
        item_length_limit = item_length / 280

        # target length of each chunk
        item_chunk_length = item_length / math.ceil(item_length_limit)

        # chunk the item into individual pieces
        item_chunks = textwrap.wrap(item,  math.ceil(
            item_chunk_length), break_long_words=False, replace_whitespace=False)

        # iterate over the chunks, numbering them from 1
        for x, chunk in enumerate(item_chunks, start=1):
            print(f'{chunk} {x}/{len(item_chunks)}')

Current output (split at 60 characters for convenience):

Lorem ipsum dolor\n\n Lorem ipsum 0.5€\n Lorem ipsum 1/3
dolor 0.2€\n Lorem ipsum 0.4€\n Lorem ipsum 0.4€\n Lorem 2/3
Ipsum 0.4€ 3/3

Desired output:

Lorem ipsum dolor\n\n Lorem ipsum 0.5€\n 1/4
Lorem ipsum dolor 0.2€\n 2/4
Lorem ipsum 0.4€\n Lorem ipsum 0.4€\n 3/4
Lorem Ipsum 0.4€ 4/4

Solution

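  • This isn't the best algorithm out there, but it gets the job done. It splits the string on the separator, then repeatedly merges neighbouring chunks for as long as a merged pair still fits within 280 characters.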

    import re

    # replace this with your own string; '&' is used as the separator here
    query = f"{'a'*100}&{'b'*150}&{'c'*210}&{'d'*200}&{'e'*70}&"

    # note: re.split drops the '&' separators from the resulting chunks
    chunks = re.split('&', query)

    def joiner(chunks):
        i = 0
        newchunks = []
        while i < len(chunks):
            try:
                # merge two neighbouring chunks if together they fit in 280 characters
                if len(chunks[i]) + len(chunks[i+1]) <= 280:
                    newchunks.append(chunks[i] + chunks[i+1])
                    i += 1
                else:
                    newchunks.append(chunks[i])
                i += 1
            except IndexError:
                # the last chunk has no neighbour, so keep it as is
                newchunks.append(chunks[i])
                i += 1
        if chunks == newchunks:  # nothing left to merge
            return chunks
        else:
            return joiner(newchunks)
    

    To print the values, print the return value of this function, e.g. print(joiner(chunks)).
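
For the separator from the question (€ followed by \n), the same idea can be sketched as a single greedy pass that keeps the separator at the end of each piece. This is only a minimal sketch, not a definitive implementation: the helper names split_after, pack and label are made up for illustration, and it assumes that a piece longer than the limit may stay as one oversized chunk and that the " 1/4"-style suffix is not counted towards the limit.

```python
def split_after(text, separator="€\n"):
    """Split text after every separator, keeping the separator on each piece."""
    parts = text.split(separator)
    # every part except the last one was followed by a separator
    pieces = [part + separator for part in parts[:-1]]
    if parts[-1]:  # trailing text without a separator
        pieces.append(parts[-1])
    return pieces


def pack(pieces, limit=280):
    """Greedily join consecutive pieces into chunks of at most `limit` characters.

    Assumption: a single piece longer than `limit` is kept as its own chunk.
    """
    chunks = []
    current = ""
    for piece in pieces:
        if current and len(current) + len(piece) > limit:
            chunks.append(current)
            current = piece
        else:
            current += piece
    if current:
        chunks.append(current)
    return chunks


def label(chunks):
    """Append an 'x/n' counter to each chunk (not counted towards the limit)."""
    return [f"{chunk} {x}/{len(chunks)}" for x, chunk in enumerate(chunks, start=1)]


query = 'Lorem ipsum dolor\n\n Lorem ipsum 0.5€\n Lorem ipsum 0.2€\n Lorem 0.4€'
# limit of 40 instead of 280 so the short sample text is actually split
for line in label(pack(split_after(query), limit=40)):
    print(repr(line))
```

Because chunks are only ever closed at the end of a piece, a split can never land in the middle of a line. To leave room for the counter, pass a slightly smaller limit.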