I need to calculate the length of each string in the list:
list_strings=["I'm selfish, impatient and a little insecure. I make mistakes, I am out of control and at times hard to handle. But if you can't handle me at my worst, then you sure as hell don't deserve me at my best","So many books, so little time.","In three words I can sum up everything I've learned about life: it goes on.","if you tell the truth, you don't have to remember anything.","Always forgive your enemies; nothing annoys them so much."]
to split each of them into three parts:
I can calculate the length of each string in the list, but I do not know how to split each string into three parts and save them. E.g.:
the first sentence "I'm selfish, impatient and a little insecure. I make mistakes, I am out of control and at times hard to handle. But if you can't handle me at my worst, then you sure as hell don't deserve me at my best"
has length 201 (tokenisation) so I'd need to take
I read about chunking, but I have no idea how to apply it. I also need a condition that ensures I only take whole words (a word cannot be split in half) and that I do not go beyond the length of the string.
Splitting text according to percents on punctuation marks
def split_text(s):
    """Partition text into three parts
    in proportion 30%, 40%, 30%."""
    i1 = int(0.3 * len(s))  # first part runs from 0 to i1
    i2 = int(0.7 * len(s))  # second from i1 to i2, third from i2 onward

    # Use isalpha() to detect a non-letter character,
    # i.e. a space or punctuation such as . ; , ? " '
    # Back up as long as we are inside a run of letters
    # (the bounds check comes first so an empty string does not raise IndexError)
    while i1 > 0 and s[i1].isalpha():
        i1 -= 1

    # Same for the boundary that ends the second part
    while i2 > i1 and s[i2].isalpha():
        i2 -= 1

    # Return the three parts
    return s[:i1], s[i1:i2], s[i2:]
for s in list_strings:
    # Loop over list reporting lengths of parts
    # Three parts are a, b, c
    a, b, c = split_text(s)
    print(f'{s}\nLengths: {len(a)}, {len(b)}, {len(c)}')
    print()
Output
I'm selfish, impatient and a little insecure. I make mistakes, I am out of control and at times hard to handle. But if you can't handle me at my worst, then you sure as hell don't deserve me at my best
Lengths: 52, 86, 63
So many books, so little time.
Lengths: 7, 10, 13
In three words I can sum up everything I've learned about life: it goes on.
Lengths: 20, 31, 24
if you tell the truth, you don't have to remember anything.
Lengths: 15, 25, 19
Always forgive your enemies; nothing annoys them so much.
Lengths: 14, 22, 21
Output of split_text
Code
for s in list_strings:
    a, b, c = split_text(s)
    print(a)
    print(b)
    print(c)
    print()
Result
I'm selfish, impatient and a little insecure. I make
mistakes, I am out of control and at times hard to handle. But if you can't handle me
at my worst, then you sure as hell don't deserve me at my best
So many
books, so
little time.
In three words I can
sum up everything I've learned
about life: it goes on.
if you tell the
truth, you don't have to
remember anything.
Always forgive
your enemies; nothing
annoys them so much.
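One detail worth checking: because the slices meet at the boundary character, the second and third parts start with that character (usually a space), and the three parts concatenate back to the original string. A minimal sketch, repeating `split_text` from above so it runs on its own, and using `strip()` as one possible way to drop the leading space:

```python
def split_text(s):
    """Same 30% / 40% / 30% split as in the answer above."""
    i1 = int(0.3 * len(s))
    i2 = int(0.7 * len(s))
    while i1 > 0 and s[i1].isalpha():
        i1 -= 1
    while i2 > i1 and s[i2].isalpha():
        i2 -= 1
    return s[:i1], s[i1:i2], s[i2:]

s = "So many books, so little time."
a, b, c = split_text(s)

# Nothing is lost or duplicated by the split
rejoined = a + b + c
print(rejoined == s)   # True

# The middle part keeps the space it was cut at
print(repr(b))         # ' books, so'

# Optional clean-up if the leading whitespace is not wanted
stripped = [part.strip() for part in (a, b, c)]
print(stripped)        # ['So many', 'books, so', 'little time.']
```

Note that after stripping, the lengths no longer add up to `len(s)`.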
To Capture the Partitions
result_a, result_b, result_c = [], [], []
for s in list_strings:
    # Loop over list collecting the parts
    # Three parts are a, b, c
    a, b, c = split_text(s)
    result_a.append(a)
    result_b.append(b)
    result_c.append(c)
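If the parts should be measured in whole words rather than characters, as the question asks, one possible variant is to split on whitespace first and then slice the list of words. This is only a sketch: `split_words` and its `p1`/`p2` parameters are hypothetical names, not part of the answer above, and rounding the cut points with `round()` is an assumption about how fractional word counts should be handled.

```python
def split_words(s, p1=0.3, p2=0.7):
    """Split s into three parts by word count (roughly 30% / 40% / 30%).

    p1 and p2 are cumulative fractions: the first part ends after
    about p1 of the words, the second after about p2.
    """
    words = s.split()          # whitespace-separated tokens, punctuation stays attached
    n = len(words)
    i1 = round(p1 * n)         # cut points are integers, so no word is ever halved
    i2 = max(i1, round(p2 * n))
    return " ".join(words[:i1]), " ".join(words[i1:i2]), " ".join(words[i2:])

parts = split_words("So many books, so little time.")
print(parts)   # ('So many', 'books, so', 'little time.')
```

Slicing never reads past the end of the list, so the cut points cannot go beyond the length of the sentence. The original spacing is normalised to single spaces by `join`.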