
Import vs File Read speed in Python


I'm working with a large list that I need to load into a Python script. For my purposes, the simplest and fastest ways I can think of are either using Python's import statement or reading directly from a text file. Which is faster, and why?

How I go about importing the list:

Via importing:

from list_file import the_list

Via file read:

with open("list_file.txt") as f:
    the_list = []
    for item in f:
        the_list.append(item)

I know that importing comes with the risk of executing code you don't want to execute, but I was curious as to which was faster and what was going on "under the hood" so to speak to make one faster than the other.


Solution

  • First, let's look directly at the timeit code I used:

    import timeit
    
    setup = """
    import pickle
    
    def test_import():
        from foo import data
        return data
    
    def test_read():
        with open("foo.txt", "r") as file_in:
            return list(file_in)
    
    def test_pickle():
        with open("foo.pkl", "rb") as file_in:
            return pickle.load(file_in)
    """
    
    print(timeit.timeit("data = test_import()", setup=setup, number=1))
    print(timeit.timeit("data = test_read()", setup=setup, number=1))
    print(timeit.timeit("data = test_pickle()", setup=setup, number=1))
    

    With an initial list of 10k names stored as strings and a single iteration (number=1), I get timings like these, in seconds, in the same order as the print calls above:

    0.00120
    0.00103
    0.00031
    

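    Sometimes the import is faster than the read and sometimes it is slower, but the pickle always wins hands down. Of course, if you increase the number parameter, the import will win: after the first import the module is cached in sys.modules, so every later import is just a lookup rather than a fresh load.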

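    If you are not familiar with pickle, you can think of it as working a lot like the json package: it offers dump() and load() for files and dumps() and loads() for in-memory data, but it writes a Python-specific binary format. As with importing, only unpickle data you trust, since loading a pickle can execute code.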

    If it helps, here is the code I used to create "foo.pkl" from "foo.py":

    import pickle
    
    from foo import data
    with open("foo.pkl", "wb") as file_out:
        pickle.dump(data, file_out)
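
The remark above that the import wins once number grows can be checked with a rough sketch like the one below. It is not part of the original benchmark: the module name demo_list_module and the helper timed_import are made up for illustration, and the module is written to a temporary directory so the script runs on its own. It imports the same module twice and shows that the second import returns the already-loaded module object.

```python
import importlib
import os
import sys
import tempfile
import time

# Hypothetical module name, chosen only to avoid clashing with a real module.
MODULE_NAME = "demo_list_module"


def timed_import(name):
    """Import `name` and return (module, elapsed seconds)."""
    start = time.perf_counter()
    module = importlib.import_module(name)
    return module, time.perf_counter() - start


with tempfile.TemporaryDirectory() as tmp_dir:
    # Write a module that defines a 10k-item list literal, like foo.py.
    path = os.path.join(tmp_dir, MODULE_NAME + ".py")
    with open(path, "w") as file_out:
        names = [f"name{i}" for i in range(10_000)]
        file_out.write("data = " + repr(names) + "\n")

    sys.path.insert(0, tmp_dir)
    importlib.invalidate_caches()  # make sure the new file is visible
    try:
        # First import: the file is located, compiled, and executed.
        first_module, first_time = timed_import(MODULE_NAME)
        # Second import: the module is found in sys.modules and reused.
        second_module, second_time = timed_import(MODULE_NAME)
    finally:
        sys.path.remove(tmp_dir)

print(f"first import:  {first_time:.6f}s")
print(f"second import: {second_time:.6f}s")
print("same object:", first_module is second_module)
```

The exact timings will vary by machine, so none are shown here. The point is only that repeated imports do not reload the list, whereas every call to the read and pickle functions goes back to the file.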