Tags: python, pandas, tensorflow

Importing pandas before tensorflow makes the script freeze


I recently switched from a Windows machine to a MacBook Pro with an Apple M3 Pro (36 GB) running macOS Sonoma 14.5 due to a work requirement, and I ran into something very strange. I managed to reduce the issue to a small sample script.

When I import pandas before tensorflow / keras, the script freezes. With the imports the other way around, it works.

The script:

import numpy as np
import os
import pandas as pd
from tensorflow.keras import layers, models

print("Creating simple model...")
try:
    model = models.Sequential([
        layers.Input(shape=(10,)),
        layers.Dense(64, activation='relu'),
        layers.Dense(1, activation='linear')
    ])
    print("Model created successfully.")
except Exception as e:
    print(f"Error creating model: {e}")

x_train = np.random.rand(100, 10)
y_train = np.random.rand(100, 1)

# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')

# Train the model
try:
    model.fit(x_train, y_train, epochs=5, batch_size=32)
    print("Model training completed successfully.")
except Exception as e:
    print(f"Error during training: {e}")

This, when run, gives me the following output:

Creating simple model...
2024-05-31 18:04:07.639131: I metal_plugin/src/device/metal_device.cc:1154] Metal device set to: Apple M3 Pro
2024-05-31 18:04:07.639149: I metal_plugin/src/device/metal_device.cc:296] systemMemory: 36.00 GB
2024-05-31 18:04:07.639154: I metal_plugin/src/device/metal_device.cc:313] maxCacheSize: 13.50 GB
2024-05-31 18:04:07.639170: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2024-05-31 18:04:07.639186: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)

The script freezes at this point and has to be terminated. When I swap the order of the imports:

from tensorflow.keras import layers, models
import pandas as pd

I get the following:

Creating simple model...
2024-05-31 18:07:18.879661: I metal_plugin/src/device/metal_device.cc:1154] Metal device set to: Apple M3 Pro
2024-05-31 18:07:18.879680: I metal_plugin/src/device/metal_device.cc:296] systemMemory: 36.00 GB
2024-05-31 18:07:18.879685: I metal_plugin/src/device/metal_device.cc:313] maxCacheSize: 13.50 GB
2024-05-31 18:07:18.879705: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2024-05-31 18:07:18.879717: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
Model created successfully.
Epoch 1/5
2024-05-31 18:07:19.269585: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:117] Plugin optimizer for device_type GPU is enabled.
4/4 ━━━━━━━━━━━━━━━━━━━━ 1s 16ms/step - loss: 0.1177 
Epoch 2/5
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1078 
Epoch 3/5
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.0932 
Epoch 4/5
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1008 
Epoch 5/5
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.0865 
Model training completed successfully.

Note that I don't even use pandas in the script. For comparison, I also imported os without using it anywhere, and that import has no effect.

Here is the pip list output for my environment:

Package                      Version
---------------------------- -----------
absl-py                      2.1.0
astunparse                   1.6.3
Bottleneck                   1.3.7
cachetools                   5.3.3
certifi                      2024.2.2
charset-normalizer           3.3.2
db-dtypes                    1.2.0
flatbuffers                  24.3.25
gast                         0.5.4
google-api-core              2.19.0
google-auth                  2.29.0
google-cloud-bigquery        3.23.1
google-cloud-core            2.4.1
google-crc32c                1.5.0
google-pasta                 0.2.0
google-resumable-media       2.7.0
googleapis-common-protos     1.63.0
grpcio                       1.64.0
grpcio-status                1.62.2
h5py                         3.11.0
idna                         3.7
importlib_metadata           7.1.0
joblib                       1.4.2
keras                        3.3.3
libclang                     18.1.1
Markdown                     3.6
markdown-it-py               3.0.0
MarkupSafe                   2.1.5
mdurl                        0.1.2
ml-dtypes                    0.3.2
namex                        0.0.8
numexpr                      2.8.7
numpy                        1.26.4
opt-einsum                   3.3.0
optree                       0.11.0
packaging                    24.0
pandas                       2.2.1
pip                          24.0
proto-plus                   1.23.0
protobuf                     4.25.3
pyarrow                      16.1.0
pyasn1                       0.6.0
pyasn1_modules               0.4.0
Pygments                     2.18.0
python-dateutil              2.9.0.post0
pytz                         2024.1
requests                     2.32.3
rich                         13.7.1
rsa                          4.9
scikit-learn                 1.4.2
scipy                        1.11.4
setuptools                   69.5.1
six                          1.16.0
tensorboard                  2.16.2
tensorboard-data-server      0.7.2
tensorflow                   2.16.1
tensorflow-io-gcs-filesystem 0.37.0
tensorflow-macos             2.16.1
tensorflow-metal             1.1.0
termcolor                    2.4.0
threadpoolctl                3.5.0
tqdm                         4.66.4
typing_extensions            4.12.0
tzdata                       2024.1
urllib3                      2.2.1
Werkzeug                     3.0.3
wheel                        0.43.0
wrapt                        1.16.0
zipp                         3.19.0

Suggestion from comments (@Ze'ev Ben-Tsvi)

import numpy as np
import os
import pandas as pd
from tensorflow.keras import layers, models

print("Creating simple model...")

try:
    print("Initializing Sequential model...")
    model = models.Sequential()
    print("Adding input layer...")
    model.add(layers.Input(shape=(10,)))
    print("Adding first Dense layer...")
    model.add(layers.Dense(64, activation='relu'))
    print("Adding output Dense layer...")
    model.add(layers.Dense(1, activation='linear'))
    print("Model created successfully.")
except Exception as e:
    print(f"Error creating model: {e}")

x_train = np.random.rand(100, 10)
y_train = np.random.rand(100, 1)

# Compile the model
try:
    print("Compiling model...")
    model.compile(optimizer='adam', loss='mean_squared_error')
    print("Model compiled successfully.")
except Exception as e:
    print(f"Error during compilation: {e}")

# Train the model
try:
    print("Training model...")
    model.fit(x_train, y_train, epochs=5, batch_size=32)
    print("Model training completed successfully.")
except Exception as e:
    print(f"Error during training: {e}")

The output of this script is:

Initializing Sequential model...
Adding input layer...
Adding first Dense layer...
Adding output Dense layer...
Model created successfully.
Compiling model...
Model compiled successfully.
Training model...
Epoch 1/5

Written like this, the script gets a little further: it no longer gets stuck at models.Sequential but at model.fit.

Swapping the import order again (tensorflow, then pandas), I get:

Creating simple model...
Initializing Sequential model...
Adding input layer...
Adding first Dense layer...
Adding output Dense layer...
Model created successfully.
Compiling model...
Model compiled successfully.
Training model...
Epoch 1/5
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - loss: 0.4620  
Epoch 2/5
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 636us/step - loss: 0.3263
Epoch 3/5
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - loss: 0.2322 
Epoch 4/5
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 629us/step - loss: 0.1395
Epoch 5/5
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 690us/step - loss: 0.1251
Model training completed successfully.

The main issue is that at no point do I get an exception, not even when wrapping each import individually in a try/except block. Either something swallows the errors or none are raised.
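
A hang never raises, so try/except cannot catch it. One way to at least see where Python is waiting is the standard-library faulthandler module, which can dump every thread's stack after a timeout. The sketch below is only an illustration: the helper names watch_for_hang and hang_resolved and the 30-second default are assumptions, not part of the original script.

```python
import faulthandler
import sys


def watch_for_hang(timeout_s=30.0, stream=sys.stderr):
    """Arm a watchdog that dumps all Python stacks to `stream` after timeout_s.

    `stream` must be backed by a real file descriptor. exit=False keeps the
    process alive after the dump so it can still be inspected or killed.
    """
    faulthandler.dump_traceback_later(
        timeout_s, repeat=False, file=stream, exit=False
    )


def hang_resolved():
    """Disarm the watchdog once the suspicious call has returned."""
    faulthandler.cancel_dump_traceback_later()


# Hypothetical usage around the call that freezes:
#     watch_for_hang(30)
#     model.fit(x_train, y_train, epochs=5, batch_size=32)
#     hang_resolved()
```

The dump contains only Python frames, so it shows the last Python line reached before control passed into native code, not what the native code is doing.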


Solution

  • I spent hours debugging this and found a reasonably satisfying solution that works without the import-order swap. I am posting it in case someone else runs into this issue.

    Downgrading TensorFlow to version 2.15.0 resolved the issue, allowing the script to run regardless of the import order of pandas and tensorflow.

    pip install tensorflow==2.15.0
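
After the downgrade it may be worth confirming what is actually installed, since the pip list in the question shows tensorflow, tensorflow-macos, tensorflow-metal and keras side by side, and the post does not say whether the other three changed too. The following is a minimal sketch using importlib.metadata. The package selection and the helper names are assumptions, and the only data points it encodes are the two versions reported here (2.16.1 froze, 2.15.0 ran).

```python
from importlib import metadata

# The only two data points from the post; nothing else was tested there.
REPORTED = {"2.16.1": "reported to freeze", "2.15.0": "reported to work"}

# Distributions from the pip list that look relevant (an assumption).
PACKAGES = ["tensorflow", "tensorflow-macos", "tensorflow-metal", "keras", "pandas"]


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def classify(tf_version):
    """Map a tensorflow version to what the post says about it, if anything."""
    return REPORTED.get(tf_version, "not covered by the post")


def report():
    """Build one line per package, annotating only the tensorflow entry."""
    lines = []
    for package in PACKAGES:
        version = installed_version(package)
        note = classify(version) if package == "tensorflow" and version else ""
        lines.append(f"{package:<18}{version or 'not installed'}  {note}".rstrip())
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```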
    

    Context: the freeze occurs in the quick_execute function in TensorFlow's execute.py, and only when pandas is imported before tensorflow:

    def quick_execute(op_name, num_outputs, inputs, attrs, ctx, name=None):
        device_name = ctx.device_name
        try:
            ctx.ensure_initialized()
            tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name, inputs, attrs, num_outputs)
        except core._NotOkStatusException as e:
            if name is not None:
                e.message += " name: " + name
            raise core._status_to_exception(e) from None
        except TypeError as e:
            keras_symbolic_tensors = [x for x in inputs if _is_keras_symbolic_tensor(x)]
            if keras_symbolic_tensors:
                raise core._SymbolicException(
                    "Inputs to eager execution function cannot be Keras symbolic "
                    "tensors, but found {}".format(keras_symbolic_tensors))
            raise e
        return tensors
    

    The function call that froze was:

    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name, inputs, attrs, num_outputs)
    

    I couldn't determine why this call freezes, because the debugger would not step into it any further (presumably because it is implemented in native code rather than Python), but downgrading to TensorFlow 2.15.0 avoids the issue.
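
Because the failure is a hang rather than an exception, one way to check whether a given environment is affected is to run the reproduction in a child interpreter with a timeout, once per import order. This is a rough sketch under stated assumptions: run_with_timeout, compare_import_orders, the shortened training body and the 120-second limit are all illustrative choices, not something from the original post.

```python
import subprocess
import sys

# Shortened version of the question's script; the import lines are prepended.
BODY = """
import numpy as np
model = models.Sequential([layers.Input(shape=(10,)), layers.Dense(1)])
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(np.random.rand(100, 10), np.random.rand(100, 1), epochs=1, verbose=0)
"""

ORDERS = {
    "pandas first": "import pandas as pd\nfrom tensorflow.keras import layers, models\n",
    "tensorflow first": "from tensorflow.keras import layers, models\nimport pandas as pd\n",
}


def run_with_timeout(source, timeout_s=120.0):
    """Run `source` in a fresh interpreter; return 'ok', 'error' or 'hang'."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True,
            text=True,
            timeout=timeout_s,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return "hang"
    return "ok" if result.returncode == 0 else "error"


def compare_import_orders(timeout_s=120.0):
    """Run the reproduction once per import order and collect the outcomes."""
    return {
        label: run_with_timeout(imports + BODY, timeout_s)
        for label, imports in ORDERS.items()
    }
```

Calling compare_import_orders() returns one outcome per import order. A timeout that is too short would misreport a slow first run as a hang, so the limit should be generous.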