I'm running Python 2.7 on a Linux machine, and by far the slowest part of my script is loading a large JSON file from disk (an SSD) using the ujson library. When I check top during this loading process, my CPU usage is basically at 100%, which leads me to believe that I'm bottlenecked by parsing the JSON rather than by transferring the bytes from disk into memory. Is this a valid assumption, or could ujson be burning empty loops or something similar while waiting for the disk? I'd like to know because I'm not sure whether dedicating another CPU core to a second script that does a lot of disk I/O will significantly slow down the first script.
Without seeing your code, I'm going to assume you are doing something like this:
import json  # or: import ujson as json

with open('data.json') as datafile:
    data = json.loads(datafile.read())
Instead, you could split the steps of reading the file and parsing it:
with open('data.json') as datafile:
    raw_data = datafile.read()
    data = json.loads(raw_data)
If you add some timing calls, you can determine how long each step is taking:
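# Timing decorator adapted from https://www.andreas-jung.com/contents/a-python-decorator-for-measuring-the-execution-time-of-methods
import json
import time

def timeit(method):
    def timed(*args, **kw):
        ts = time.time()
        result = method(*args, **kw)
        te = time.time()
        # Print only the function name; the arguments here would be the
        # file object and the entire file contents.
        print('%r %2.2f sec' % (method.__name__, te - ts))
        return result
    return timed

# Decorators can only be applied to functions, so wrap each step in one.
@timeit
def read_file(datafile):
    return datafile.read()

@timeit
def parse_json(raw_data):
    return json.loads(raw_data)

with open('data.json') as datafile:
    raw_data = read_file(datafile)
data = parse_json(raw_data)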
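
To get closer to the original question (CPU-bound or waiting on the disk?), one option is to compare wall-clock time with CPU time for each step. The following is only a rough sketch under some assumptions: measure, read_file and profile_load are hypothetical helper names, not part of any library, and it uses the standard json module, so swap in ujson if that is what your script uses. If a step's CPU time is close to its wall time, the process was busy computing; if the wall time is much larger, it spent most of that step waiting, typically on I/O.

```python
import json
import os
import time


def measure(func, *args):
    """Run func(*args) and return (result, wall_seconds, cpu_seconds)."""
    # os.times()[:2] is (user, system) CPU time consumed by this process.
    cpu_start = sum(os.times()[:2])
    wall_start = time.time()
    result = func(*args)
    wall = time.time() - wall_start
    cpu = sum(os.times()[:2]) - cpu_start
    return result, wall, cpu


def read_file(path):
    """Read the whole file into memory without parsing it."""
    with open(path) as datafile:
        return datafile.read()


def profile_load(path):
    """Time the read step and the parse step separately (hypothetical helper)."""
    raw_data, read_wall, read_cpu = measure(read_file, path)
    data, parse_wall, parse_cpu = measure(json.loads, raw_data)
    print('read:  wall %.2f s, cpu %.2f s' % (read_wall, read_cpu))
    print('parse: wall %.2f s, cpu %.2f s' % (parse_wall, parse_cpu))
    return data
```

Calling profile_load('data.json') prints one line per step. Keep in mind that a second run will usually read the file from the OS page cache rather than from the SSD, so the first run is the one that reflects actual disk speed.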