Tags: python, performance, jupyter-notebook, curve-fitting

Jupyter Notebook runs my code abnormally slowly


Edit: this is my second ever post on this website. I'm trying my best to follow the guidelines and etiquette, but I've made some mistakes. I've almost entirely rewritten the original post to make it clearer and easier to understand.

Disclaimer: I'm only familiar with Matlab, as an amateur programmer and physics student. I should be learning Python in a lab course I'm attending during these months, but due to apparent misunderstandings we have been told to learn the Python basics on our own. Moreover, English is not my first language. Please be patient, and thank you in advance.

I created a new Jupyter notebook. Some lines of code run abnormally slowly; in some cases, they take minutes to execute. No errors or warnings appear when the cells run. I've tried executing the same script on Colaboratory, and the problem stays pretty much the same.

The following is the first "preliminary" cell. It raises no errors and has never caused any similar problem when I've used it in other scripts.

%matplotlib inline
%matplotlib widget

import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize as opt
from scipy.optimize import curve_fit

This line of code (when run alone in a separate cell) takes 10-20 seconds to execute:

f = c*[1/472e-9, 1/505e-9, 1/525e-9, 1/588e-9, 1/611e-9]

where c = 299792458 is the speed of light in m/s.

The following takes anywhere from 30 seconds up to several minutes to execute. If I run each line in a separate cell, they execute instantly without errors; put together, they suddenly become abnormally slow. I should also point out that the CSVs contain only 20 numbers each, correctly formatted to give a 20-row column vector when imported: quite literally a text file with 20 three-digit numbers, each on its own consecutive row, without spaces, empty lines, commas or anything else.

intens = np.arange(5,101,5)

A = np.loadtxt('A.csv',delimiter=',',unpack=True)
A = np.flip(A)*1e-3

B = np.loadtxt('B.csv',delimiter=',',unpack=True)
B = np.flip(B)*1e-3

C = np.loadtxt('C.csv',delimiter=',',unpack=True)
C = np.flip(C)*1e-3

D = np.loadtxt('D.csv',delimiter=',',unpack=True)
D = np.flip(D)*1e-3

E = np.loadtxt('E.csv',delimiter=',',unpack=True)
E = np.flip(E)*1e-3
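For reference, the five identical load/flip/scale steps could be collapsed into a loop (a sketch; the filenames `A.csv` through `E.csv` are taken from the snippet above, and tiny stand-in files are generated here so the example runs on its own):

```python
import numpy as np

# Create tiny stand-in CSVs (20 three-digit numbers, one per row),
# mimicking the files described in the question.
for name in "ABCDE":
    np.savetxt(f"{name}.csv", np.arange(20) + 100, fmt="%d")

# Load each file, reverse it, and convert from milli-units, as above.
data = {}
for name in "ABCDE":
    col = np.loadtxt(f"{name}.csv", delimiter=",", unpack=True)
    data[name] = np.flip(col) * 1e-3

A, B, C, D, E = (data[k] for k in "ABCDE")
print(A.shape)  # (20,)
```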

I'm running Jupyter through Anaconda on a MacBook Air M1, which I use only for PDF reading and editing, Matlab coding, LaTeX writing and Python.

I've checked storage and memory: there are more than 400 GB of free disk space on my Mac, and RAM usage increases dramatically when I run the slowest parts of the code. The system monitoring app doesn't report anything anomalous, as far as I can tell, and nothing is running in the background.

This question has been suggested in the comments; while that issue seems related to mine, there's still no satisfying answer there. Someone suggested it could be a problem related to the libraries I import, which are implemented in C under the hood. However, I've already used them in this exact fashion and I can't see what's different here.


Solution

  • You have at least one problem. Multiplying a list by an integer repeats its elements:

    5 * [1, 2, 3]
    # [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]
    

    Therefore the following expression:

    299792458 * [1/472e-9, 1/505e-9, 1/525e-9, 1/588e-9, 1/611e-9]
    

    creates a huge list in which the five elements are repeated 299,792,458 times. That is about 1.5 billion entries, which needs roughly 12 GB of RAM for the pointer array alone (8 bytes per entry). This explains why it is slow everywhere: the time goes into allocating that much space in RAM.
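    The scale of the allocation can be checked directly (a small sketch; `sys.getsizeof` reports only the list's own pointer array, not the element objects):

    ```python
    import sys

    small = 3 * [1.0, 2.0]      # int * list repeats: 6 elements
    print(small)                # [1.0, 2.0, 1.0, 2.0, 1.0, 2.0]
    print(sys.getsizeof(small)) # a few dozen bytes: ~8 per pointer plus overhead

    # Scaling up to the expression in the question:
    n = 5 * 299_792_458         # number of elements in c * [five floats]
    print(n * 8 / 1e9)          # ≈ 12.0 GB just for the pointers
    ```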

    If you are looking for element-wise multiplication, use an ndarray instead:

    5 * np.array([1, 2, 3])
    # array([ 5, 10, 15])
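
    Applied to the frequency calculation from the question, the fix looks like this (a sketch; the variable names follow the original post):

    ```python
    import numpy as np

    c = 299_792_458  # speed of light, m/s
    wavelengths = np.array([472e-9, 505e-9, 525e-9, 588e-9, 611e-9])

    # Element-wise division on an ndarray: five frequencies, computed instantly.
    f = c / wavelengths
    print(f.shape)  # (5,)
    ```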