python, pandas, numpy, probability

Determine probabilities based on an array in Python


Let's say I have an array with array.shape = (296, 3).

The first column contains 0 or 1, and the second and third columns each contain 0, 1, or 2.

I want to know how to calculate the probability of each of the 18 possible combinations (2 x 3 x 3) of the three columns. Possible sequences are [0,0,0], [0,0,1], and so on.


Solution

  • You can use numpy.unique to collect the unique sequences, along with their counts.

    >>> import numpy
    >>> unique, counts = numpy.unique(data, return_counts=True, axis=0)
    >>> unique
    array([[0, 1, 1],
           [0, 1, 2],
           [1, 1, 1],
           ...])
    >>> counts
    array([2, 2, 1, ...])
    

    The probability of the i-th sequence in unique is therefore its corresponding value in counts divided by the number of rows in your original data. For example, a count of 28 over 296 rows gives a probability of about 9.46%. Combinations that never occur in the data do not appear in unique at all; their probability is 0.
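
As a rough sketch of the approach above, the counts can be turned into a probability for each of the 18 combinations, including any that never occur. The random `data` array is only a hypothetical stand-in for the real (296, 3) array, and the names `observed` and `probabilities` are illustrative, not part of the original answer.

```python
import itertools

import numpy as np

# Hypothetical stand-in for the real array: 296 rows, first column in {0, 1},
# second and third columns in {0, 1, 2}.
rng = np.random.default_rng(0)
data = np.column_stack([
    rng.integers(0, 2, size=296),
    rng.integers(0, 3, size=296),
    rng.integers(0, 3, size=296),
])

# Unique rows and how often each one occurs.
unique, counts = np.unique(data, axis=0, return_counts=True)

# Relative frequency of each observed row, keyed by a tuple of plain ints.
observed = {
    tuple(int(v) for v in row): int(count) / len(data)
    for row, count in zip(unique, counts)
}

# Enumerate all 2 x 3 x 3 combinations; unobserved ones get probability 0.
probabilities = {
    combo: observed.get(combo, 0.0)
    for combo in itertools.product(range(2), range(3), range(3))
}

for combo, p in probabilities.items():
    print(combo, f"{p:.4f}")
```

Enumerating the combinations with `itertools.product` rather than relying on `unique` alone means the result always has 18 entries, even if some sequence never appears in the data.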