I'm a beginner who has just started learning Python from YouTube. I am trying to write a program that replaces old binary numbers with new binary numbers, and I am running into a problem with the replacement. I want to replace index-wise. The data in my file (1x.txt) looks like this:
(01010110110111011110111101111011110101101101101011011011010101010101010101011101110101110111101)
It's random data, but it is made up of 01, 011, 0111 and 01111. I want to replace "010" with "0", "0110" with "00", "01110" with "000" and "011110" with "0000". So for the numbers given above, the data splits up as
(0101 011011 011101111 0111101111 0111101 011011 01101 011011 01101 0101 0101 0101 0101 01110111 010111 0111101)
and my result should be
(01 0011 0001111 00001111 00001 0011 001 0011 001 01 01 01 01 000111 0111 00001)
So far I have written a program that does the task, but it takes far too much time: an 8 MB file took more than 2 hours. Can anyone suggest a better way to do this? My code is below:
def bytes_from_file(filename):
    newstring = ''
    old_list = ['010', '0110', '01110', '011110']
    new_list = ['0', '00', '000', '0000']
    with open(filename, "rb", buffering=200000) as f:
        while True:
            try:
                chunk = f.read()
            except:
                print('Error while file opening')
            if chunk:
                chunk2 = chunk.decode('utf-8')
                n = len(chunk2)
                i = 0
                while i < n:
                    flag = False
                    for j in range(6, 2, -1):
                        if chunk2[i:i + j] in old_list:
                            flag = True
                            index = old_list.index(chunk2[i:i + j])
                            newstring = newstring + new_list[index]
                            i = i + j
                            break
                    if flag == False:
                        newstring = newstring + chunk2[i]
                        i = i + 1
                        newstring=''.join((newstring))
            else:
                try:
                    f = open('2x.txt', "a")
                    f.write(newstring)
                    f.close()
                except:
                    print('Error While writing into file')
                break

bytes_from_file('1x.txt')
You are greatly overcomplicating this in general, but the most important problem is here:
newstring = newstring + chunk2[i]
i = i + 1
newstring=''.join((newstring))
newstring is already a string, which you build by repeatedly concatenating substrings (like newstring + chunk2[i]). This means that ''.join((newstring)) treats the string as an iterable, and joins it up by taking it apart into individual characters and doing the join operation on those. And it does this every time nothing in old_list matches, getting slower and slower as the string gets longer. The newstring=''.join((newstring)) step actually has no effect, but Python can't optimize it out. On the flip side, using a technique like newstring + chunk2[i] to build the string defeats any purpose that ''.join could have.
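To see why that line is a no-op, here is a minimal sketch using a short made-up value for newstring (the value '0011' is only an illustration, not data from the question):

```python
newstring = '0011'  # hypothetical sample value

# The extra parentheses do not create a tuple; (newstring) is just newstring.
assert (newstring) is newstring

# Iterating over a string yields its characters one at a time...
assert list(newstring) == ['0', '0', '1', '1']

# ...so joining them with an empty separator rebuilds an equal string.
rejoined = ''.join((newstring))
assert rejoined == newstring
```

The work done by that join grows with the length of the string, so repeating it for every unmatched character is what makes the whole loop roughly quadratic.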
If your plan is to build a single string, you do still want to use ''.join. But you want to use it once, and you want to use it on a list of the substrings:
# initially, set
newstring = []
# any time you find something else to append to the output:
newstring.append(whatever)
# one time, right before opening the output file:
newstring = ''.join(newstring)
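Applied to the replacement loop from the question, that pattern might look like the sketch below. The names replace_patterns and OLD_TO_NEW are hypothetical, and a dictionary stands in for the two parallel lists; the matching logic itself is meant to mirror the original loop.

```python
# Hypothetical lookup table replacing the parallel old_list / new_list.
OLD_TO_NEW = {'010': '0', '0110': '00', '01110': '000', '011110': '0000'}

def replace_patterns(text):
    """Return text with each pattern replaced, scanning left to right."""
    pieces = []  # output substrings, joined once at the end
    i = 0
    n = len(text)
    while i < n:
        # Try the longest pattern first, as the original code does.
        for length in range(6, 2, -1):
            replacement = OLD_TO_NEW.get(text[i:i + length])
            if replacement is not None:
                pieces.append(replacement)
                i += length
                break
        else:
            # No pattern starts at position i: copy one character through.
            pieces.append(text[i])
            i += 1
    return ''.join(pieces)

print(replace_patterns('0101'))  # first group from the question
```

Appending to a list is cheap, and the single join at the end copies each character once, so the total work stays roughly proportional to the input size.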
That said, there are other approaches. Rather than building up a list, one useful technique is to use a generator to yield each piece that needs to be written. Then you can either iterate to write those, or build the joined-up string before writing (like ''.join(my_generator_function())). Or you can have both files open, and just .write each output chunk as you determine it from the input.
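A rough sketch of the generator variant follows. The names replaced_pieces, convert_file and OLD_TO_NEW are hypothetical. It assumes the whole input fits in memory (as the original f.read() call does), and it opens the output in 'w' mode rather than appending, which is a change from the question's code.

```python
# Hypothetical lookup table replacing the parallel old_list / new_list.
OLD_TO_NEW = {'010': '0', '0110': '00', '01110': '000', '011110': '0000'}

def replaced_pieces(text):
    """Yield the output one piece at a time instead of building a list."""
    i = 0
    n = len(text)
    while i < n:
        for length in range(6, 2, -1):
            replacement = OLD_TO_NEW.get(text[i:i + length])
            if replacement is not None:
                yield replacement
                i += length
                break
        else:
            # No pattern starts at position i: pass one character through.
            yield text[i]
            i += 1

def convert_file(src_path, dst_path):
    """Read src_path, write the converted data to dst_path."""
    with open(src_path, encoding='utf-8') as src, \
         open(dst_path, 'w', encoding='utf-8') as dst:
        text = src.read()  # assumption: the file fits in memory
        # writelines accepts any iterable of strings, including a generator.
        dst.writelines(replaced_pieces(text))
```

Calling convert_file('1x.txt', '2x.txt') would then do the whole job, and ''.join(replaced_pieces(text)) gives the joined-up string if that is needed instead.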