
Reading a text file column-wise and recording which column each word is written in (Python)


A text file contains:

//     H HGD
//     i eoa
//       loy
//       ld
//       o
//       _
//       W
//       o
//       r
//       l
//       d

The words are written vertically, one per column. How can I put them in a list together with the column each one was written in, so that the output is: [[<1st_word>, <column of the 1st word>], [<2nd_word>, <column of the 2nd word>], ...]

So far I have written the code below, but I am stuck:

file = open(text_file, "r")
wrd = ""
for line in file:  # Each line from the file
    for c in line:  # Each character in the line
        # Ignore slashes (/), white spaces, and newlines
        if c != ' ' and c != '\n' and c != '/':
            # list = put word on a list and count at which column it was found
            # print(list)
            pass
file.close()

For the file above, the expected output is: [['Hi', 8], ['Hello_World', 10], ['Good', 11], ['Day', 12]]


Solution

  • I did it this way:

    1. Create a dictionary that maps each character index to the characters observed at that index
    2. Reformat the dictionary into the list format specified in the question
    results = {}
    with open(text_file, "r") as fp:
        for line in fp:
            # idx is the 0-based position of the character within the line
            for idx, c in enumerate(line):
                if c not in [' ', '\n', '/', '\t']:
                    if idx not in results:
                        results[idx] = []
                    results[idx].append(c)
    
    print(results)
    # {7: ['H', 'i'],
    #  9: ['H', 'e', 'l', 'l', 'o', '_', 'W', 'o', 'r', 'l', 'd'],
    #  10: ['G', 'o', 'o', 'd'],
    #  11: ['D', 'a', 'y']}
    
    # Join each column's characters into a word; k + 1 turns the 0-based index into a 1-based column
    final_results = [[''.join(v), k + 1] for k, v in results.items()]
    print(final_results)
    # [['Hi', 8], ['Hello_World', 10], ['Good', 11], ['Day', 12]]
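
One thing to keep in mind with the dictionary approach: iterating over the dictionary yields the columns in the order they were first seen. For the sample file that is left to right, because its first line touches every column, but it need not be in general. A minimal sketch that sorts by column index instead is given here; the function name `words_by_column`, its `skip` parameter, and the in-memory `sample` lines are illustrative assumptions, not part of the original answer.

```python
from collections import defaultdict


def words_by_column(lines, skip=" \n/\t"):
    """Return [[word, 1-based column], ...] for vertically written words."""
    columns = defaultdict(list)
    for line in lines:
        for idx, c in enumerate(line):
            if c not in skip:
                columns[idx].append(c)
    # Sort by index so the order does not depend on which line
    # first used a given column.
    return [["".join(chars), idx + 1] for idx, chars in sorted(columns.items())]


# The sample file from the question, rebuilt in memory (hypothetical data
# layout: "//" followed by 5 or 7 spaces, as in the listing above).
sample = [
    "//" + " " * 5 + "H HGD\n",
    "//" + " " * 5 + "i eoa\n",
    "//" + " " * 7 + "loy\n",
    "//" + " " * 7 + "ld\n",
] + ["//" + " " * 7 + ch + "\n" for ch in "o_World"]

print(words_by_column(sample))
# [['Hi', 8], ['Hello_World', 10], ['Good', 11], ['Day', 12]]
```

To use it on a real file, pass the open file object instead of `sample`, since iterating over a file also yields one line at a time.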
    

    If the text contains tabs as well, you can use the textwrap module to expand them into spaces. Note that the indices in this version are used as-is, without adding 1:

    import textwrap
    wrapper = textwrap.TextWrapper()
    
    results = {}
    with open(text_file, "r") as fp:    
        for line in fp:
            for idx, c in enumerate(wrapper.wrap(line)[0]):
                if c not in [' ', '\n', '/']:
                    if idx not in results:
                        results[idx] = []
                    results[idx].append(c)
    
    #{8: ['H', 'i'],
    # 10: ['H', 'e', 'l', 'l', 'o', '_', 'W', 'o', 'r', 'l', 'd'],
    # 11: ['G', 'o', 'o', 'd'],
    # 12: ['D', 'a', 'y']}
    
    final_results = [[''.join(v), k] for k,v in results.items()]
    print(final_results)
    #[['Hi', 8], ['Hello_World', 10], ['Good', 11], ['Day', 12]]
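
As a possible alternative for the tab case, the built-in `str.expandtabs()` does the tab expansion directly. This may be more forgiving than `wrapper.wrap(line)[0]`, which only keeps the first wrapped segment (70 characters wide by default) and raises an IndexError on a blank line, because `wrap()` returns an empty list for whitespace-only input. The sketch below is an assumption-laden variant, not the original answer's code: the function name `group_expanded` and its `tabsize` and `skip` parameters are made up for illustration.

```python
def group_expanded(lines, tabsize=8, skip=" \n/"):
    """Map each 0-based character index to the characters found there,
    after expanding tabs to spaces."""
    results = {}
    for line in lines:
        # expandtabs() pads each tab up to the next multiple of tabsize
        for idx, c in enumerate(line.expandtabs(tabsize)):
            if c not in skip:
                results.setdefault(idx, []).append(c)
    return results


# Hypothetical tab-indented input, including a blank line
tabbed = ["//\tH HGD\n", "//\ti eoa\n", "\n"]
print(group_expanded(tabbed))
# {8: ['H', 'i'], 10: ['H', 'e'], 11: ['G', 'o'], 12: ['D', 'a']}
```

The returned dictionary has the same shape as `results` above, so the same reformatting step can be applied to it.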
    
    

    You can also build the list without a list comprehension (this uses the results dictionary from the tab-expanding version, so idx is used as-is):

    final_results = []
    for idx, char_vals in results.items():
        word = ''.join(char_vals)
        final_results.append([word, idx])
    print(final_results)
    #[['Hi', 8], ['Hello_World', 10], ['Good', 11], ['Day', 12]]