I recently started learning Python (my first programming language since using GW BASIC as a kid). I’ve noticed that when adding bytes to a bytes object, each byte takes more time to add than the last; and by contrast, when adding integers to a list object, each integer takes the same amount of time to add as the last. The following program illustrates.
import time
import struct
time.clock() # for Windows
def time_list():
print("adding 9,999,999 0s to one list 9 times:")
a = []
for i in range(9):
start_time = time.clock()
for j in range(9999999):
a += [0]
end_time = time.clock()
print("loop %d took %f seconds" %(i, end_time - start_time))
print()
def time_bytes_object():
print("adding 99,999 pad bytes to a bytes object 9 times:")
a = bytes()
for i in range(9):
start_time = time.clock()
for j in range(99999):
a += struct.pack('<B', 0)
end_time = time.clock()
print("loop %d took %f seconds" %(i, end_time - start_time))
print()
time_list()
time_bytes_object()
What is it about the bytes object (or the struct.pack function) that makes adding bytes take increasing amounts of time? Or is there a faster way to collect a bunch of bytes than the way used in my example?
Thanks for any help,
Victor
Byte strings (and Unicode strings) in Python are immutable, whereas lists are mutable.
What this means is that every append (+=
) done on a byte string must make a copy of that string; the original is not modified (though it will be garbage-collected later). In contrast, the append
method of list
(also used by +=
) will actually modify the list.
What you want is the bytearray
type, which is a mutable type functioning much like a list of bytes. Appending to a bytearray
takes (amortized) constant time, and it is easily converted to and from a byte string.