
How can I count the paragraphs of a text file using Python?


I'm trying to write a book cipher decoder, and the following is what I have so far.

code = open("code.txt", "r").read()
my_book = open("book.txt", "r").read()
code_line = 0
while code_line < 6:
    # each line of code.txt holds "paragraph line word"
    sl = code.split('\n')[code_line]
    paragraph_num = sl.split(' ')[0]
    line_num = sl.split(' ')[1]
    word_num = sl.split(' ')[2]
    code_line = code_line + 1  # move on to the next code line

The loop changes the paragraph, line and word variables, and everything is working just fine.

But what I need now is a way to select the paragraph, then the line, then the word; a for loop inside the while loop would work perfectly.

So I want to get word number "word_num" from line number "line_num" of paragraph number "paragraph_num".

This is my code file, which I'm trying to convert into words:

"paragraph number","line number","word number"

70 1 3
50 2 2
21 2 9
28 1 6
71 2 2
27 1 4

And then I want my output to look something like this:

word 
word  
word 
word 
word 
word

My book (the file I need to get the words from) looks something like this:

word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word

word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word

word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word


Solution

  • Theory

    If you want to get paragraphs out of your text, you can split on "\n\n":

    >>> "word\n\nword\nword\n\nword".split("\n\n")
    ['word', 'word\nword', 'word']
    

    You now have a list of paragraphs. For each paragraph, you can split by "\n" and get a list of lines.

    For each line, you can call split() without arguments, which splits on any run of whitespace, and get a list of words.
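Since the title asks how to count paragraphs, here is a minimal sketch that builds on the split above. It assumes paragraphs are separated by one or more blank lines; the regex (an addition beyond the plain "\n\n" split) also treats a "blank" line that contains only spaces or tabs as a separator, which split("\n\n") would miss. The helper names split_paragraphs and count_paragraphs are hypothetical.

```python
import re


def split_paragraphs(text):
    """Split text into paragraphs separated by one or more blank lines.

    Assumption: a line containing only whitespace also counts as blank.
    """
    # a newline, any whitespace (including further newlines), then a newline
    parts = re.split(r"\n\s*\n", text.strip())
    # drop empty pieces, e.g. when the whole text is empty
    return [p for p in parts if p.strip()]


def count_paragraphs(text):
    return len(split_paragraphs(text))


sample = "word word\nword\n\nword\n  \n\n\nword word"
print(count_paragraphs(sample))  # 3
```

To count the paragraphs of a file, read it in text mode first (which turns Windows "\r\n" line endings into "\n"), e.g. with open("book.txt") as f: print(count_paragraphs(f.read())).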

    Nested loops

    text = """word word word word word word word word word
    word word word word word word word
    word word word word word word word word word word word word word word word word word word word word word
    word word word word word word word word word word word word word word word word word word
    
    word word word word boat word word word word word
    word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word
    
    word word word word word word word word word word word
    word word word word word word word word word word word word word word
    word word word word word word word word word word word word word word word word word word word word word word"""
    
    for i, paragraph in enumerate(text.split("\n\n")):
        for j, line in enumerate(paragraph.split("\n")):
            for k, word in enumerate(line.split()):
                print("%d, %d, %d : %s" % (i,j,k,word))
    

    It outputs (truncated):

    0, 0, 0 : word
    0, 0, 1 : word
    0, 0, 2 : word
    0, 0, 3 : word
    0, 0, 4 : word
    0, 0, 5 : word
    0, 0, 6 : word
    0, 0, 7 : word
    0, 0, 8 : word
    0, 1, 0 : word
    0, 1, 1 : word
    0, 1, 2 : word
    0, 1, 3 : word
    0, 1, 4 : word
    0, 1, 5 : word
    0, 1, 6 : word
    0, 2, 0 : word
    0, 2, 1 : word
    0, 2, 2 : word
    0, 2, 3 : word
    0, 2, 4 : word
    0, 2, 5 : word
    0, 2, 6 : word
    0, 2, 7 : word
    0, 2, 8 : word
    0, 2, 9 : word
    0, 2, 10 : word
    0, 2, 11 : word
    0, 2, 12 : word
    0, 2, 13 : word
    0, 2, 14 : word
    0, 2, 15 : word
    0, 2, 16 : word
    0, 2, 17 : word
    0, 2, 18 : word
    0, 2, 19 : word
    0, 2, 20 : word
    0, 3, 0 : word
    0, 3, 1 : word
    0, 3, 2 : word
    0, 3, 3 : word
    0, 3, 4 : word
    0, 3, 5 : word
    0, 3, 6 : word
    0, 3, 7 : word
    0, 3, 8 : word
    0, 3, 9 : word
    0, 3, 10 : word
    0, 3, 11 : word
    0, 3, 12 : word
    0, 3, 13 : word
    0, 3, 14 : word
    0, 3, 15 : word
    0, 3, 16 : word
    0, 3, 17 : word
    1, 0, 0 : word
    1, 0, 1 : word
    1, 0, 2 : word
    1, 0, 3 : word
    1, 0, 4 : boat
    1, 0, 5 : word
    1, 0, 6 : word
    

    The loops are useful for seeing which indices lead to which word.
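The same nested loops also work in reverse, which helps when checking a code or encoding a message. A minimal sketch of a hypothetical helper, find_word, that collects every (paragraph, line, word) triple where a given word appears, using the same 0-based indices and the same splitting as the loops above:

```python
def find_word(text, target):
    """Return all (paragraph, line, word) index triples where target occurs.

    Indices are 0-based, matching the nested loops above.
    """
    hits = []
    for i, paragraph in enumerate(text.split("\n\n")):
        for j, line in enumerate(paragraph.split("\n")):
            for k, word in enumerate(line.split()):
                if word == target:
                    hits.append((i, j, k))
    return hits


text = "word word\nword\n\nword word word word boat"
print(find_word(text, "boat"))  # [(1, 0, 4)]
```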

    Nested list comprehensions

    If you want to look words up directly by index, you can use a nested list comprehension to create a "3D list":

    table = [[line.split() for line in paragraph.split("\n")] for paragraph in text.split("\n\n")]
    

    The resulting table is:

    [[['word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word'], ['word', 'word', 'word', 'word', 'word', 'word', 'word'], ['word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word'], ['word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word']], [['word', 'word', 'word', 'word', 'boat', 'word', 'word', 'word', 'word', 'word'], ['word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word']], [['word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word'], ['word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word'], ['word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word']]]
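Because table is just nested lists, the counts the title asks about come straight from len(): len(table) is the number of paragraphs, and the inner lengths give the lines per paragraph and the words per line. A small sketch on a shortened sample text (the longer sample above works the same way):

```python
text = "word word word\nword\n\nword word word word boat\n\nword word"

table = [[line.split() for line in paragraph.split("\n")]
         for paragraph in text.split("\n\n")]

num_paragraphs = len(table)                                  # paragraphs
lines_per_paragraph = [len(p) for p in table]                # lines in each paragraph
words_per_line = [[len(line) for line in p] for p in table]  # words in each line

print(num_paragraphs)       # 3
print(lines_per_paragraph)  # [2, 1, 1]
print(words_per_line)       # [[3, 1], [5], [2]]
```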
    

    You can then get the desired word by indexing:

    table[1][0][4]
    # "boat"
    

    If you have a list of (paragraph, line, word) tuples, you can look each word up in a loop:

    codes = [
            (1, 0, 4),
            (2, 1, 3)
            ]
    
    for i,j,k in codes:
        print(table[i][j][k])
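To tie this back to the question's code file, here is a minimal sketch of a decoder that reads each non-blank line of the code text as three whitespace-separated integers ("paragraph line word", as in 70 1 3) and looks the word up in the book. The names build_table, decode and decode_files are hypothetical. The one_based flag is an assumption: the question does not say whether its codes count from 1 or from 0 as Python indexing does. Codes outside the book yield None instead of raising IndexError, and negative indices are rejected so Python does not silently count from the end.

```python
def build_table(book_text):
    """Nest the book as paragraphs -> lines -> words (0-based indices)."""
    return [[line.split() for line in paragraph.split("\n")]
            for paragraph in book_text.split("\n\n")]


def decode(code_text, book_text, one_based=False):
    """Look up one word per non-blank 'paragraph line word' line of code_text.

    one_based is an assumption: pass True if the code file counts
    paragraphs, lines and words from 1 instead of 0.
    Codes that fall outside the book produce None.
    """
    table = build_table(book_text)
    offset = 1 if one_based else 0
    words = []
    for code_line in code_text.splitlines():
        if not code_line.strip():
            continue  # ignore blank lines in the code file
        i, j, k = (int(n) - offset for n in code_line.split())
        if min(i, j, k) < 0:
            # a negative index would silently count from the end in Python
            words.append(None)
            continue
        try:
            words.append(table[i][j][k])
        except IndexError:
            words.append(None)  # the code points outside the book
    return words


def decode_files(code_path="code.txt", book_path="book.txt", one_based=False):
    """Read the two files named in the question and decode them."""
    with open(code_path) as code_file, open(book_path) as book_file:
        return decode(code_file.read(), book_file.read(), one_based)


book = "word word\nword\n\nword word word word boat"
print(decode("1 0 4\n0 1 0\n5 0 0", book))  # ['boat', 'word', None]
```

Printing each result on its own line, e.g. for word in decode_files(): print(word), gives the one-word-per-line output the question asks for. Checking a code whose word you already know is a quick way to tell which setting of one_based matches your code file.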