Tags: python, numpy, matplotlib, dataset, gaussian

How to create a 3D dataset that follows a Gaussian distribution


I'm trying to create a dataset that follows a Gaussian distribution. The x and y values will be the same, and the values on the z-axis should follow the Gaussian distribution. Using this tutorial as a reference: https://towardsdatascience.com/a-python-tutorial-on-generating-and-plotting-a-3d-guassian-distribution-8c6ec6c41d03 I wrote the following code. Unfortunately, the output I got does not look like the one in the tutorial. I think there is an error in the mathematical formulas, and I would be very happy if you could help me fix it.

Thank you in advance.

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
from mpl_toolkits.mplot3d import Axes3D

def np_bivariate_normal_pdf(domain, mean, variance):
  X = np.arange(-domain+mean, domain+mean, variance)
  Y = np.arange(-domain+mean, domain+mean, variance)
  X, Y = np.meshgrid(X, Y)
  R = np.sqrt(X**2 + Y**2)
  Z = ((1. / np.sqrt(2 * np.pi)) * np.exp(-.5*R**2))

  return X, Y, Z

    
def plt_plot_bivariate_normal_pdf(x, y, z):
  fig = plt.figure(figsize=(10, 10))
  ax = fig.gca(projection='3d')
  ax.plot_surface(x, y, z, 
                  cmap=cm.coolwarm,
                  linewidth=0, 
                  antialiased=True)
  ax.set_xlabel('x')
  ax.set_ylabel('y')
  ax.set_zlabel('z');
  ax.set_xlim(-5, +20)
  ax.set_ylim(-5, +20)
  plt.show()
  
a = np_bivariate_normal_pdf(0.75, 5, 0.01)
b = np_bivariate_normal_pdf(1.875, 3, 0.025)
c = np_bivariate_normal_pdf(1.5, 7.5, 0.02)
d = np_bivariate_normal_pdf(2.25, 12, 0.03)

plt_plot_bivariate_normal_pdf(*a)
plt_plot_bivariate_normal_pdf(*b)
plt_plot_bivariate_normal_pdf(*c)
plt_plot_bivariate_normal_pdf(*d)

Solution

  • The function np_bivariate_normal_pdf() uses the formula for the one-dimensional normal distribution, while you intend to compute the multivariate normal distribution. The multivariate distribution depends on a mean, which is a vector (it determines where the peak of the graph is located), and a covariance matrix (it controls how steep the graph is as you approach the peak from different sides). In your function both mean and variance are plain numbers, and the formula for Z does not involve them at all: mean only shifts the grid, and variance is used as the grid step in np.arange. You can change your code to fix this, or you can use one of several Python libraries (e.g. scipy.stats) that implement the multivariate normal distribution.
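
As a rough illustration of the first option, here is a minimal NumPy-only sketch. It assumes the simplest case, where x and y share a single variance and are uncorrelated, so the covariance matrix is variance times the identity. The function name bivariate_normal_pdf and the parameters half_width and num_points are hypothetical names chosen for this example; they are not part of the question's code or of any library.

```python
import numpy as np

def bivariate_normal_pdf(mean, variance, half_width, num_points=201):
    """Isotropic bivariate normal density on a square grid centred on the mean.

    mean:       (mu_x, mu_y), the location of the peak
    variance:   sigma**2, shared by x and y (assumed covariance = variance * I)
    half_width: the grid spans mean +/- half_width on each axis (hypothetical name)
    num_points: grid points per axis, independent of the variance (hypothetical name)
    """
    mu_x, mu_y = mean
    x = np.linspace(mu_x - half_width, mu_x + half_width, num_points)
    y = np.linspace(mu_y - half_width, mu_y + half_width, num_points)
    X, Y = np.meshgrid(x, y)

    # Squared distance measured from the mean, not from the origin.
    r2 = (X - mu_x) ** 2 + (Y - mu_y) ** 2

    # In two dimensions the normalising constant is 1 / (2 * pi * sigma**2).
    Z = np.exp(-r2 / (2.0 * variance)) / (2.0 * np.pi * variance)
    return X, Y, Z

# Example: peak at (5, 5), standard deviation 0.5, grid covering +/- 4 sigma.
X, Y, Z = bivariate_normal_pdf(mean=(5.0, 5.0), variance=0.25, half_width=2.0)
```

The returned arrays can be passed to the plotting function from the question, e.g. plt_plot_bivariate_normal_pdf(X, Y, Z). On recent Matplotlib releases, fig.gca(projection='3d') may need to be replaced with fig.add_subplot(projection='3d'). For a full covariance matrix, scipy.stats.multivariate_normal(mean, cov).pdf(np.dstack((X, Y))) should give the density on the same grid without writing the formula by hand.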