Tags: python, arrays, performance, memory, stability

Python: appending to a list vs. filling a preallocated NumPy array (efficiency)


I have written two versions of the same loader and want to check which one is more efficient and uses less memory. The first one preallocates a NumPy array and fills it in place. The second one creates empty Python lists, appends each parsed row to them, and converts them to NumPy arrays at the end. Which one is better? First program:

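    import numpy as np

    # Read the whole file into memory as a list of strings
    with open('/Users/marcortiz/Documents/vLex/pylearn2/mlearning/classify/files/models/model_training.txt') as f:
        lines = f.readlines()

    # Preallocate the full matrix (60343 rows x 4917 float64 columns)
    zeros = np.zeros((60343, 4917))

    # enumerate() supplies the row and column positions directly,
    # instead of looking them up with lines.index(l) / row.index(element)
    for i, l in enumerate(lines):
        row = l.split(",")
        for j, element in enumerate(row):
            zeros[i, j] = float(element)

    X = zeros[:, 1:]  # features: every column after the label
    Y = zeros[:, 0]   # label: first column
    one_hot = np.ones((len(lines), 2))  # assumes `counter` was the number of rows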
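
Calling `list.index` inside a loop to recover a position rescans the list from the start on every call, and it returns the first matching element, so repeated values (or repeated lines) all map to the same index. A minimal sketch on a hypothetical sample row, contrasting it with `enumerate`:

```python
# Hypothetical sample row; repeated values such as "0" are common in numeric data
row = "0,1.5,0,2".split(",")

# list.index returns the position of the FIRST equal element,
# scanning the list from the beginning each time it is called
index_positions = [row.index(element) for element in row]

# enumerate yields each element's actual position in a single pass
enum_positions = [j for j, element in enumerate(row)]

print(index_positions)  # [0, 1, 0, 3]
print(enum_positions)   # [0, 1, 2, 3]
```

With `index`, the third value would be written into column 0 again and column 2 would stay zero; `enumerate` avoids both the rescan and the collision.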

The second one:

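    import numpy as np

    # Read the whole file into memory as a list of strings
    with open('/Users/marcortiz/Documents/vLex/pylearn2/mlearning/classify/files/models/model_training.txt') as f:
        lines = f.readlines()

    X = []
    Y = []

    for l in lines:
        row = l.split(",")
        X.append([float(elem) for elem in row[1:]])  # features: every column after the label
        Y.append(float(row[0]))                       # label: first column

    # Convert the nested Python lists into NumPy arrays
    X = np.array(X)
    Y = np.array(Y)
    one_hot = np.ones((len(lines), 2))  # assumes `counter` was the number of rows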
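
The append-then-convert pattern can be tried on a tiny in-memory sample. This is only a sketch: `io.StringIO` stands in for the real training file, and the two-row contents are hypothetical. Note that `np.array(X)` builds a new array from the nested lists, so for a moment both the lists and the finished array are in memory.

```python
import io

import numpy as np

# Hypothetical stand-in for the training file: label first, then features
data = io.StringIO("1,0.5,0.25\n0,1.0,0.75\n")

X = []
Y = []
for line in data:
    row = line.split(",")
    Y.append(float(row[0]))                # label column
    X.append([float(v) for v in row[1:]])  # feature columns; float() strips the trailing "\n"

# np.array copies the nested lists into one contiguous float64 array
X = np.array(X)
Y = np.array(Y)
print(X.shape, Y.shape)  # (2, 2) (2,)
```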

My theory is that the first one is slower but uses less memory and is more "stable" when working with large files, while the second one is faster but uses a lot of memory and is less stable with large files (543 MB, 70,000 lines).
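
The memory side of this theory can be checked with a little arithmetic. A float64 array costs 8 bytes per value, so the full 60343 × 4917 matrix is about 2.37 GB however it is filled, which is larger than the 543 MB text file. A Python list instead stores a pointer per element plus a separate float object for each value (8 + 24 bytes on a typical 64-bit CPython build), so the lists in the second program likely need roughly four times as much memory per value before `np.array` converts them. A minimal sketch measuring this on a hypothetical 1,000-value row:

```python
import sys

import numpy as np

# Size of the preallocated matrix from the question: rows x columns x 8 bytes
full_matrix_bytes = 60343 * 4917 * 8
print(full_matrix_bytes)  # 2373652248, about 2.37 GB

n = 1000  # hypothetical row width used only for this measurement
values = [float(i) for i in range(n)]

# A list holds one pointer per element; each float is a separate object
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# A float64 array stores the raw 8-byte values contiguously
array_bytes = np.array(values).nbytes

print(list_bytes, array_bytes)
```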

Thanks!


Solution

  • Based on the answers, I made some changes. Here are my two revised programs:

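    import timeit

    import numpy as np

    # Preallocate the full matrix once (60343 rows x 4917 float64 columns)
    zeros = np.zeros((60343, 4917))

    with open('/Users/marcortiz/Documents/vLex/pylearn2/mlearning/classify/files/models/model_training.txt') as f:
        start = timeit.default_timer()
        # Iterate over the file object directly instead of loading
        # every line into a list with readlines()
        for counter, l in enumerate(f):
            row = l.split(",")
            # One assignment per cell
            for counter2, element in enumerate(row):
                zeros[counter, counter2] = float(element)
        stop = timeit.default_timer()
    print(stop - start)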
    

    Time of the first program: 122.243036032 seconds

    Second program:

    import timeit

    import numpy as np

    # Same preallocated matrix as in the first program
    zeros = np.zeros((60343, 4917))

    with open('/Users/marcortiz/Documents/vLex/pylearn2/mlearning/classify/files/models/model_training.txt') as f:
        start = timeit.default_timer()
        for counter, l in enumerate(f):
            row = l.split(",")
            # One slice assignment per row instead of one assignment per cell
            zeros[counter, :] = [float(i) for i in row]
        stop = timeit.default_timer()
    print(stop - start)
    

    Time of the second program: 102.208696127 seconds! Thanks.
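
The speed-up in the second revised program most likely comes from replacing one Python-level assignment per cell with one slice assignment per row, so NumPy copies a whole row of converted values in a single call. A minimal sketch comparing the two fill strategies with `timeit` on small synthetic data; the 200 × 500 size, the generated lines, and the helper names `fill_per_cell` / `fill_per_row` are hypothetical, the sketch converts with `float()` explicitly, and absolute timings will vary by machine:

```python
import timeit

import numpy as np

ROWS, COLS = 200, 500  # hypothetical size, much smaller than the real file

# Synthetic CSV lines standing in for the training file
lines = [",".join(str(float(r + c)) for c in range(COLS)) + "\n" for r in range(ROWS)]


def fill_per_cell(lines):
    """Assign every value individually, as in the first revised program."""
    out = np.zeros((len(lines), COLS))
    for i, line in enumerate(lines):
        for j, element in enumerate(line.split(",")):
            out[i, j] = float(element)
    return out


def fill_per_row(lines):
    """Assign a whole row with one slice assignment, as in the second program."""
    out = np.zeros((len(lines), COLS))
    for i, line in enumerate(lines):
        out[i, :] = [float(v) for v in line.split(",")]
    return out


# Both strategies must produce identical matrices
assert np.array_equal(fill_per_cell(lines), fill_per_row(lines))

t_cell = timeit.timeit(lambda: fill_per_cell(lines), number=3)
t_row = timeit.timeit(lambda: fill_per_row(lines), number=3)
print(f"per cell: {t_cell:.3f} s, per row: {t_row:.3f} s")
```

Both versions still parse every value with Python-level string splitting, which is why the gain in the timings above is modest rather than dramatic.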