python, bayesian-networks, pgmpy

BayesianModelSampling (pgmpy) - IndexError: too many indices for array: array is 1-dimensional, but 3 were indexed


I need to learn a Bayesian network and sample a bunch of synthetic data from it. I simulated a DataFrame and learned a network from it.

However, why is this snippet of code raising an error?

Here is the error:

File "c:\Users\a-rotalintiy\PhD\correlations\import pandas as pd.py", line 38, in <module>
    synthetic_data = sampler.forward_sample(size=synthetic_data_size)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\a-rotalintiy\.virtualenvs\correlations-k4Gxn6_V\Lib\site-packages\pgmpy\sampling\Sampling.py", line 125, in forward_sample
    sampled[node] = sample_discrete_maps(
                    ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\a-rotalintiy\.virtualenvs\correlations-k4Gxn6_V\Lib\site-packages\pgmpy\utils\mathext.py", line 194, in sample_discrete_maps
    samples[(weight_indices == weight_index)] = np.random.choice(
    ~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: too many indices for array: array is 1-dimensional, but 3 were indexed

Here is the code:

import pandas as pd
from pgmpy.models import BayesianNetwork
from pgmpy.estimators import HillClimbSearch, BicScore, BayesianEstimator
from pgmpy.sampling import BayesianModelSampling

# Sample DataFrame
data = {
    'A': [1, 0, 1, 0, 1],
    'B': [0, 1, 0, 1, 0],
    'C': [1, 1, 0, 0, 1]
}
df = pd.DataFrame(data)

# Ensure the DataFrame is correctly structured
print("DataFrame shape:", df.shape)
print("DataFrame head:\n", df.head())

# Estimate the model structure
hc = HillClimbSearch(df)
scoring_method = BicScore(df)
best_model = hc.estimate(scoring_method=scoring_method)

# Print learned edges
print("Learned edges:", best_model.edges())

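# Create and fit the Bayesian model
bn_model = BayesianNetwork(best_model.edges())
# Add every column as a node so variables without any learned edge are not dropped
bn_model.add_nodes_from(df.columns)
bn_model.fit(df, estimator=BayesianEstimator, prior_type="BDeu")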

# Ensure the model is fitted correctly
for cpd in bn_model.get_cpds():
    print(f"CPD of {cpd.variable}:")
    print(cpd)

# Sample synthetic data
sampler = BayesianModelSampling(bn_model)
synthetic_data_size = len(df)  # You can adjust this size as needed
synthetic_data = sampler.forward_sample(size=synthetic_data_size)

print("Synthetic Data Sample:\n", synthetic_data)

Solution

  • I think the error is caused by numpy 2.0: pgmpy doesn't support numpy 2.0 yet (https://github.com/pgmpy/pgmpy/pull/1780). If you downgrade numpy to 1.x, the code should work fine:

    pip install numpy==1.26
    

    Also, a simpler way to do forward sampling is to use the BayesianNetwork.simulate method, for example:

    synthetic_data = bn_model.simulate(n_samples=len(df))
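Before downgrading, it can help to confirm which numpy release is actually installed in the active virtualenv. The sketch below uses only the standard library (importlib.metadata); the helper names numpy_major_version and check_numpy_for_pgmpy are hypothetical, and it assumes, as stated above, that the incompatibility is specific to numpy 2.x.

```python
from importlib.metadata import PackageNotFoundError, version


def numpy_major_version(version_string: str) -> int:
    """Return the major component of a version string such as '1.26.4'."""
    return int(version_string.split(".")[0])


def check_numpy_for_pgmpy() -> str:
    """Hypothetical helper: report whether the installed numpy is a 2.x release.

    Assumes (per the answer above) that the pgmpy sampling error only
    appears with numpy 2.x.
    """
    try:
        installed = version("numpy")
    except PackageNotFoundError:
        return "numpy is not installed"
    if numpy_major_version(installed) >= 2:
        return f"numpy {installed} is installed; consider: pip install numpy==1.26"
    return f"numpy {installed} is a 1.x release"


print(check_numpy_for_pgmpy())
```

Run it with the same interpreter that runs the failing script, so it reports the numpy version pgmpy actually imports.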
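For intuition about what forward sampling does (whether through forward_sample or simulate), here is a from-scratch, pure-Python sketch of ancestral sampling: nodes are visited in topological order, and each one is drawn from its conditional distribution given the values already sampled for its parents. This is not pgmpy's implementation. The structure (A -> B, C with no parents) and all probabilities are made-up placeholders rather than values learned from df, and PARENTS, P_ONE and ancestral_sample are hypothetical names.

```python
import random
from graphlib import TopologicalSorter

# Hypothetical structure over the question's variables: A -> B, C has no parents.
PARENTS = {"A": [], "B": ["A"], "C": []}

# Hypothetical CPDs for binary variables: P(node = 1 | parent values),
# keyed by the tuple of sampled parent values (empty tuple for root nodes).
P_ONE = {
    "A": {(): 0.6},
    "B": {(0,): 0.8, (1,): 0.1},
    "C": {(): 0.5},
}


def ancestral_sample(n_samples, seed=None):
    """Draw n_samples rows, sampling every node after all of its parents."""
    rng = random.Random(seed)
    # TopologicalSorter takes node -> predecessors, so parents come out first.
    order = list(TopologicalSorter(PARENTS).static_order())
    rows = []
    for _ in range(n_samples):
        row = {}
        for node in order:
            parent_values = tuple(row[p] for p in PARENTS[node])
            row[node] = 1 if rng.random() < P_ONE[node][parent_values] else 0
        rows.append(row)
    return rows


print(ancestral_sample(5, seed=0))
```

With a fitted pgmpy model, the CPDs come from bn_model.get_cpds() instead of a hand-written table, and the sampling loop is handled by the library; the sketch only illustrates the ordering idea.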