
How to print lines in splitlines by indentation?


I have a string:

text = '''TextTextTextTextTextTextTextTextText1
        TextTextTextTextTextTextTextTextText1
    TextTextTextTextTextTextTextTextText2
        TextTextTextTextTextTextTextTextText2
        TextTextTextTextTextTextTextTextText2
        TextTextTextTextTextTextTextTextText2
        TextTextTextTextTextTextTextTextText2
    TextTextTextTextTextTextTextTextText3
        TextTextTextTextTextTextTextTextText3
        TextTextTextTextTextTextTextTextText3
    TextTextTextTextTextTextTextTextText4
        TextTextTextTextTextTextTextTextText4
        TextTextTextTextTextTextTextTextText4'''

I want to split this string by indentation and add the resulting parts to a list. Here is my current code:

nr_lines = 0
indent_dict = {}
for line in text.splitlines(True):
    print(line)
    print("------------------------------")
    nr_lines+=1
    whitespaces_count = len(line) - len(line.lstrip())
    indent_dict[nr_lines] = whitespaces_count
print(indent_dict)

list_of_values = []

# Remove the first key, whose value (indent) is 0
indent_dict_without = dict(indent_dict)
key = 1
del indent_dict_without[key]

# Adding values from dict to list
for key, value in indent_dict_without.items():
    list_of_values.append(value)
print(list_of_values)

# Finding minimum value
x = min(list_of_values)

list_of_small = []

for nr in list_of_values:
    if nr == x:
        list_of_small.append(nr)

print(list_of_small)

# Finding which lines have the smallest indent
n = 0
key_1 = []
for key, value in indent_dict.items():
    if value == list_of_small[n]:
        key_1.append(key)
print(key_1)

Output is:

{1: 0, 2: 8, 3: 4, 4: 8, 5: 8, 6: 8, 7: 8, 8: 4, 9: 8, 10: 8, 11: 4, 12: 8, 13: 8} # dict with line and value (indent)
[8, 4, 8, 8, 8, 8, 4, 8, 8, 4, 8, 8] # list with indents
[4, 4, 4] # the smallest indents
[3, 8, 11] # lines for smallest indents

Now, I don't know how to split the text and add those 4 parts as elements of a list:

list = ['TextTextTextTextTextTextTextTextText1
            TextTextTextTextTextTextTextTextText1',
        'TextTextTextTextTextTextTextTextText2
            TextTextTextTextTextTextTextTextText2
            TextTextTextTextTextTextTextTextText2
            TextTextTextTextTextTextTextTextText2
            TextTextTextTextTextTextTextTextText2',
        'TextTextTextTextTextTextTextTextText3
            TextTextTextTextTextTextTextTextText3
            TextTextTextTextTextTextTextTextText3',
        'TextTextTextTextTextTextTextTextText4
            TextTextTextTextTextTextTextTextText4
            TextTextTextTextTextTextTextTextText4']

Should I create a new variable and add lines one by one until a new indent?


Solution

  • If I understand you correctly, you want to split the text into paragraphs based on the lines with the smallest indentation.

    Here is how I would approach this. I would create a defaultdict whose keys are the number of spaces that make up the indentation and whose values are lists of the indexes of all lines with that indentation count:

    from collections import defaultdict
    
    text = '''TextTextTextTextTextTextTextTextText1
            TextTextTextTextTextTextTextTextText1
        TextTextTextTextTextTextTextTextText2
            TextTextTextTextTextTextTextTextText2
            TextTextTextTextTextTextTextTextText2
            TextTextTextTextTextTextTextTextText2
            TextTextTextTextTextTextTextTextText2
        TextTextTextTextTextTextTextTextText3
            TextTextTextTextTextTextTextTextText3
            TextTextTextTextTextTextTextTextText3
        TextTextTextTextTextTextTextTextText4
            TextTextTextTextTextTextTextTextText4
            TextTextTextTextTextTextTextTextText4'''
    
    def count_indentation(line):
        return len(line) - len(line.lstrip())
    
    lines = text.splitlines(keepends=False)
    indent_dict = defaultdict(list)
    for idx, line in enumerate(lines):
        if count_indentation(line) > 0:
            indent_dict[count_indentation(line)].append(idx)
    

    Now indent_dict looks like:

    defaultdict(list, {8: [1, 3, 4, 5, 6, 8, 9, 11, 12], 4: [2, 7, 10]})
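One caveat with this step: a line that contains only whitespace has its full length counted as indentation, so it could land in indent_dict and even become the smallest key. A minimal sketch of a guard, assuming such lines should simply be ignored (the helper name build_indent_index and the sample list are hypothetical, not part of the answer above):

```python
from collections import defaultdict


def build_indent_index(lines):
    """Map each non-zero indentation width to the indexes of lines using it.

    Assumption: blank and whitespace-only lines carry no structure and are skipped.
    """
    index = defaultdict(list)
    for idx, line in enumerate(lines):
        if not line.strip():
            continue  # blank or whitespace-only line
        indent = len(line) - len(line.lstrip())
        if indent > 0:
            index[indent].append(idx)
    return index


# Index 2 is a whitespace-only line and is left out of the result.
sample = ["a", "        a1", "      ", "    b", "        b1"]
indent_index = build_indent_index(sample)
print(dict(indent_index))
```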
    

    Next, we take the smallest key to find the indexes of the relevant lines:

    smallest_indent = min(indent_dict)
    line_idexes_smallest_indents = indent_dict[smallest_indent]
    

    The result of line_idexes_smallest_indents is [2, 7, 10]. Indexing is zero-based, which is why my indexes are all one less than yours. Now we need to partition the original lines according to these indexes.

    def partition(lines, indices):
        return [''.join(lines[i:j]) for i, j in zip([0]+indices, indices+[None])]
    
    partition(lines, line_idexes_smallest_indents)
    

    Result:

    ['TextTextTextTextTextTextTextTextText1        TextTextTextTextTextTextTextTextText1',
     '    TextTextTextTextTextTextTextTextText2        TextTextTextTextTextTextTextTextText2        TextTextTextTextTextTextTextTextText2        TextTextTextTextTextTextTextTextText2        TextTextTextTextTextTextTextTextText2',
     '    TextTextTextTextTextTextTextTextText3        TextTextTextTextTextTextTextTextText3        TextTextTextTextTextTextTextTextText3',
     '    TextTextTextTextTextTextTextTextText4        TextTextTextTextTextTextTextTextText4        TextTextTextTextTextTextTextTextText4']
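Note that the strings above have lost their line breaks, because the lines were split with keepends=False and joined with an empty string. If the line breaks should survive, as in the list shown in the question, a single pass that collects lines until the next least-indented line may be enough. This is only a sketch under a few assumptions: the function name split_by_indent is hypothetical, blank lines never start a block, and the smallest non-zero indentation marks where each new block begins.

```python
def split_by_indent(text):
    """Split text into blocks, starting a new block at each least-indented line.

    Assumption: the smallest non-zero indentation among non-blank lines
    marks the start of every block after the first.
    """
    lines = text.splitlines()
    indents = [len(line) - len(line.lstrip()) for line in lines]
    candidates = [n for n, line in zip(indents, lines) if n > 0 and line.strip()]
    if not candidates:
        return [text]  # nothing is indented, so there is nothing to split on
    smallest = min(candidates)

    blocks = []
    current = []
    for line, indent in zip(lines, indents):
        # A least-indented, non-blank line closes the block collected so far.
        if indent == smallest and line.strip() and current:
            blocks.append("\n".join(current))
            current = []
        current.append(line)
    blocks.append("\n".join(current))
    return blocks


sample = "A\n        A1\n    B\n        B1\n        B2\n    C\n        C1"
blocks = split_by_indent(sample)
for block in blocks:
    print(repr(block))
```

Each block keeps its original leading whitespace; call .lstrip() on a block if the indentation of its first line is unwanted.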