I have written some code to enumerate the character "a" in a text file (a simple text document copied from a PDF):
input_f = open('/home/zebrafish/Desktop/stackq/doc.txt','r')
# text I used in the "doc.txt" file:
#
#unctional similarities between the ATP binding pockets of
#kinases or between chemotypes of inhibitors that cannot
#be predicted from the sequence of the kinase or the
#chemical structure of the inhibitor.
#We first compared PI3-K family members according to
output_f = open('/home/zebrafish/Desktop/stackq/svm_in.txt','w')
for line in input_f:
    a = line
    print "\n",
    for y in enumerate([x[0] for x in enumerate(line) if x[1]=='a']):
        a = ("%d:%d" % (y[0]+1,y[1]+1))
        #print a,
        output_f.write(a+" ")
input_f.close()
output_f.close()
When I run the script with the print statement instead of writing to the output file, the output looks the way I want. For each line it reports the position of every "a" together with a running count: in the first line "a" appears twice, first at the 8th position and then at the 16th, so it is enumerated as "1:8 2:16", and so on for every line:
1:8 2:16
1:4 2:47 3:51
1:42
1:7
1:14 2:26 3:40
But when I write the output to the text file "svm_in.txt" with "output_f.write()", the result is very weird. It looks something like this:
1:8 2:16 1:4 2:47 3:51 1:42 1:7 1:14 2:26 3:40
How can I get one row in the output file for each input line, with a "+" sign at the beginning of the row, like this:
+ 1:8 2:16
+ 1:4 2:47 3:51
+ 1:42
+ 1:7
+ 1:14 2:26 3:40
I would do it like this:
for line in input_f:
    # find the positions of As in the line
    positions = [n for n, letter in enumerate(line, 1) if letter == 'a']
    # Create list of strings of the form "x:y"
    pairs = [("%d:%d" % (i, n)) for i, n in enumerate(positions, 1)]
    # Join all those strings into a single space-separated string
    all_pairs = ' '.join(pairs)
    # Write the string to the file, with a + sign at the beginning
    # and a newline at the end
    output_f.write("+ %s\n" % all_pairs)
You can change the format string in the last line ("+ %s\n") to control how each line is written to the output file.
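The original output ended up on a single line because `print "\n",` sends the newline to the terminal, while `output_f.write()` writes exactly the string it is given and never appends a newline on its own. Below is a minimal, self-contained sketch of the same approach, assuming Python 3. The names `format_positions` and `convert` are made up for illustration, and `io.StringIO` objects stand in for the real files so the example runs anywhere.

```python
import io


def format_positions(line, target="a"):
    """Return '+ 1:p1 2:p2 ...' for the 1-based positions of target in line."""
    # Drop the trailing newline; the caller adds exactly one per output row.
    line = line.rstrip("\n")
    # 1-based positions of every occurrence of the target character
    positions = [n for n, letter in enumerate(line, 1) if letter == target]
    # Pair each position with its 1-based occurrence count: "count:position"
    pairs = ["%d:%d" % (i, n) for i, n in enumerate(positions, 1)]
    return "+ %s" % " ".join(pairs)


def convert(input_f, output_f):
    """Write one formatted row to output_f for every line of input_f."""
    for line in input_f:
        output_f.write(format_positions(line) + "\n")


# In-memory stand-ins for doc.txt and svm_in.txt (first two lines of the sample text)
sample = io.StringIO(
    "unctional similarities between the ATP binding pockets of\n"
    "kinases or between chemotypes of inhibitors that cannot\n"
)
result = io.StringIO()
convert(sample, result)
print(result.getvalue(), end="")
# + 1:8 2:16
# + 1:4 2:47 3:51
```

With real files, the same `convert` function can be called inside a `with open(...) as input_f, open(..., 'w') as output_f:` block, which closes both files automatically even if an error occurs.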