A text file contains:
//     H HGD
//     i eoa
//       loy
//       ld
//       o
//       _
//       W
//       o
//       r
//       l
//       d
The words are written vertically, one per column. How can I collect them into a list, together with the column each word was written in, so that the output has this form:
[[<1st_word>,<location of the 1st word>],[<2nd_word>,<location of the 2nd word>],...]
So far I have written the code shown below, but I'm stuck:
file = open(text_file, "r")
wrd = ""
for line in file:  # each line from the file
    for c in line:  # each character in the line
        if c not in (' ', '\n', '/'):  # ignore slashes (/), spaces, and newlines
            # list = put word on a list and count at which column it was found
            # print(list)
            pass
For the file above, the expected output is:
[[Hi,8],[Hello_World,10],[Good,11],[Day,12]]
I did it this way:
results = {}
with open(text_file, "r") as fp:
    for line in fp:
        for idx, c in enumerate(line):
            # idx is the zero-based position of the character in the line
            if c not in [' ', '\n', '/', '\t']:
                if idx not in results:
                    results[idx] = []
                results[idx].append(c)
print(results)
#Out[6]:
#{7: ['H', 'i'],
# 9: ['H', 'e', 'l', 'l', 'o', '_', 'W', 'o', 'r', 'l', 'd'],
# 10: ['G', 'o', 'o', 'd'],
# 11: ['D', 'a', 'y']}
final_results = [[''.join(v), k+1] for k,v in results.items()]
print(final_results)
# Out[10]: [['Hi', 8], ['Hello_World', 10], ['Good', 11], ['Day', 12]]
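The same idea can be wrapped in a small self-contained function, which makes it easier to try without a file on disk. This is only a sketch: the name `words_by_column`, the `skip` parameter, and the in-memory `sample` list are illustrative assumptions, and the spacing in `sample` is reconstructed from the expected output in the question.

```python
from collections import defaultdict


def words_by_column(lines, skip=" \t\n/"):
    """Collect vertically written words, keyed by 1-based column (hypothetical helper)."""
    columns = defaultdict(list)
    for line in lines:
        # start=1 makes enumerate yield 1-based column numbers directly
        for col, ch in enumerate(line, start=1):
            if ch not in skip:
                columns[col].append(ch)
    # Sort by column so the result does not rely on dict insertion order
    return [["".join(chars), col] for col, chars in sorted(columns.items())]


# Assumed file contents, one string per line
sample = [
    "//     H HGD\n",
    "//     i eoa\n",
    "//       loy\n",
    "//       ld\n",
    "//       o\n",
    "//       _\n",
    "//       W\n",
    "//       o\n",
    "//       r\n",
    "//       l\n",
    "//       d\n",
]

print(words_by_column(sample))
# [['Hi', 8], ['Hello_World', 10], ['Good', 11], ['Day', 12]]
```

Because an open file object is also an iterable of lines, `words_by_column(fp)` should work the same way inside a `with open(...)` block.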
If the text also contains tabs, you can use the textwrap module to expand them into spaces:
import textwrap

wrapper = textwrap.TextWrapper()
results = {}
with open(text_file, "r") as fp:
    for line in fp:
        wrapped = wrapper.wrap(line)  # expands tabs to spaces by default
        if not wrapped:  # wrap() returns [] for blank lines
            continue
        for idx, c in enumerate(wrapped[0]):
            if c not in [' ', '\n', '/']:
                if idx not in results:
                    results[idx] = []
                results[idx].append(c)
#{8: ['H', 'i'],
# 10: ['H', 'e', 'l', 'l', 'o', '_', 'W', 'o', 'r', 'l', 'd'],
# 11: ['G', 'o', 'o', 'd'],
# 12: ['D', 'a', 'y']}
final_results = [[''.join(v), k] for k,v in results.items()]
print(final_results)
#[['Hi', 8], ['Hello_World', 10], ['Good', 11], ['Day', 12]]
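If tab expansion is the only thing needed from textwrap, the built-in `str.expandtabs()` may be a simpler option, since it does not wrap long lines and returns a plain string even for blank lines. The sketch below assumes a tab-indented version of the file (the `tabbed` list is made up for illustration) and reports zero-based offsets, as the textwrap variant above does; `words_by_offset` is a hypothetical name.

```python
def words_by_offset(lines, tabsize=8, skip=" \n/"):
    """Group characters by zero-based offset after expanding tabs (hypothetical helper)."""
    results = {}
    for line in lines:
        # expandtabs() replaces each tab with spaces up to the next tab stop
        expanded = line.expandtabs(tabsize)
        for idx, ch in enumerate(expanded):
            if ch not in skip:
                results.setdefault(idx, []).append(ch)
    return [["".join(chars), idx] for idx, chars in sorted(results.items())]


# Assumed tab-indented input; the trailing blank line is handled without special-casing
tabbed = [
    "//\tH HGD\n",
    "//\ti eoa\n",
    "//\t  loy\n",
    "//\t  ld\n",
    "//\t  o\n",
    "//\t  _\n",
    "//\t  W\n",
    "//\t  o\n",
    "//\t  r\n",
    "//\t  l\n",
    "//\t  d\n",
    "\n",
]

print(words_by_offset(tabbed))
# [['Hi', 8], ['Hello_World', 10], ['Good', 11], ['Day', 12]]
```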
You can also build the final list without a list comprehension:
final_results = []
for idx, char_vals in results.items():
    word = ''.join(char_vals)
    final_results.append([word, idx])
print(final_results)
#[['Hi', 8], ['Hello_World', 10], ['Good', 11], ['Day', 12]]
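A different technique, not used in the code above, is to transpose the file so that each column becomes one sequence of characters. A minimal sketch using `itertools.zip_longest`, assuming the file contains spaces rather than tabs; the function name `words_by_transpose` and the `rows` sample are hypothetical.

```python
from itertools import zip_longest


def words_by_transpose(lines, skip=" \t/"):
    """Transpose rows into columns and join what remains in each column (hypothetical helper)."""
    rows = [line.rstrip("\n") for line in lines]
    found = []
    # zip_longest pads short rows with spaces so every column has full height
    for col, chars in enumerate(zip_longest(*rows, fillvalue=" "), start=1):
        word = "".join(ch for ch in chars if ch not in skip)
        if word:  # columns made only of separators yield an empty string
            found.append([word, col])
    return found


# Assumed file contents, spacing reconstructed from the expected output
rows = [
    "//     H HGD",
    "//     i eoa",
    "//       loy",
    "//       ld",
    "//       o",
    "//       _",
    "//       W",
    "//       o",
    "//       r",
    "//       l",
    "//       d",
]

print(words_by_transpose(rows))
# [['Hi', 8], ['Hello_World', 10], ['Good', 11], ['Day', 12]]
```

This version reads the whole file into memory first, so the dictionary-based loop is probably the better fit for very large files.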