I need to learn a Bayesian network and then sample a bunch of synthetic data from it. I created a small sample DataFrame and learned a network from it.
However, why does this snippet of code raise an error?
Here is the error:
  File "c:\Users\a-rotalintiy\PhD\correlations\import pandas as pd.py", line 38, in <module>
    synthetic_data = sampler.forward_sample(size=synthetic_data_size)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\a-rotalintiy\.virtualenvs\correlations-k4Gxn6_V\Lib\site-packages\pgmpy\sampling\Sampling.py", line 125, in forward_sample
    sampled[node] = sample_discrete_maps(
                    ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\a-rotalintiy\.virtualenvs\correlations-k4Gxn6_V\Lib\site-packages\pgmpy\utils\mathext.py", line 194, in sample_discrete_maps
    samples[(weight_indices == weight_index)] = np.random.choice(
    ~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: too many indices for array: array is 1-dimensional, but 3 were indexed
Here is the code:
import pandas as pd
from pgmpy.models import BayesianNetwork
from pgmpy.estimators import HillClimbSearch, BicScore, BayesianEstimator
from pgmpy.sampling import BayesianModelSampling
# Sample DataFrame
data = {
    'A': [1, 0, 1, 0, 1],
    'B': [0, 1, 0, 1, 0],
    'C': [1, 1, 0, 0, 1]
}
df = pd.DataFrame(data)
# Ensure the DataFrame is correctly structured
print("DataFrame shape:", df.shape)
print("DataFrame head:\n", df.head())
# Estimate the model structure
hc = HillClimbSearch(df)
scoring_method = BicScore(df)
best_model = hc.estimate(scoring_method=scoring_method)
# Print learned edges
print("Learned edges:", best_model.edges())
# Create and fit the Bayesian model
bn_model = BayesianNetwork(best_model.edges())
bn_model.fit(df, estimator=BayesianEstimator, prior_type="BDeu")
# Ensure the model is fitted correctly
for cpd in bn_model.get_cpds():
    print(f"CPD of {cpd.variable}:")
    print(cpd)
# Sample synthetic data
sampler = BayesianModelSampling(bn_model)
synthetic_data_size = len(df) # You can adjust this size as needed
synthetic_data = sampler.forward_sample(size=synthetic_data_size)
print("Synthetic Data Sample:\n", synthetic_data)
I think the error is caused by NumPy 2.0, which pgmpy doesn't support yet (https://github.com/pgmpy/pgmpy/pull/1780). If you downgrade NumPy to a 1.x release, the code should work fine:
pip install numpy==1.26
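To confirm which NumPy release is active in the virtualenv before and after the downgrade, a small check along these lines might help. This is only a sketch: the helper names `major_version` and `numpy_is_2_or_later` are made up for illustration, and the idea that a 2.x release triggers the failure is the assumption stated above, not something the check itself proves.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string: str) -> int:
    """Return the leading (major) component of a version string such as '1.26.4'."""
    return int(version_string.split(".")[0])


def numpy_is_2_or_later(version_string: str) -> bool:
    """Hypothetical helper: True when the given NumPy version is 2.0 or later."""
    return major_version(version_string) >= 2


try:
    installed = version("numpy")  # version of the NumPy installed in this environment
except PackageNotFoundError:
    print("NumPy is not installed in this environment.")
else:
    if numpy_is_2_or_later(installed):
        print(f"NumPy {installed} found; forward_sample may fail with this pgmpy release.")
    else:
        print(f"NumPy {installed} found; this is a 1.x release.")
```

Running it inside the same virtualenv that runs the script makes sure the version reported is the one pgmpy actually imports.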
Also, a simpler way to do forward sampling is to use the BayesianNetwork.simulate method. Something like:
bn_model.simulate(n_samples=len(df))
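For intuition, forward sampling draws each variable only after its parents have been drawn, using the child's conditional distribution given the sampled parent values. The following is a minimal plain-Python sketch of that idea, not pgmpy's implementation; the two-node network A -> B and all probabilities in it are invented for illustration.

```python
import random

# Hypothetical two-node network A -> B with binary states.
# P(A) and P(B | A) are made-up numbers for illustration only.
p_a = {0: 0.4, 1: 0.6}
p_b_given_a = {
    0: {0: 0.2, 1: 0.8},  # distribution of B when A == 0
    1: {0: 0.7, 1: 0.3},  # distribution of B when A == 1
}


def draw(distribution, rng):
    """Draw one state from a {state: probability} mapping."""
    states = list(distribution)
    weights = [distribution[s] for s in states]
    return rng.choices(states, weights=weights, k=1)[0]


def forward_sample(n_samples, seed=0):
    """Sample the parent first, then the child conditioned on the parent's value."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_samples):
        a = draw(p_a, rng)
        b = draw(p_b_given_a[a], rng)
        rows.append({"A": a, "B": b})
    return rows


samples = forward_sample(5)
print(samples)
```

With a learned model, the conditional tables would come from the fitted CPDs and the sampling order from the network's edges.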