python python-3.x numpy unique frequency

The axis argument to unique is not supported for dtype object


I am trying to get unique counts column-wise, but my array holds categorical variables (dtype object):

val, count = np.unique(x, axis=1, return_counts=True)

However, I get this error:

TypeError: The axis argument to unique is not supported for dtype object

How do I solve this problem?

Sample x:

array([[' Private', ' HS-grad', ' Divorced'],
       [' Private', ' 11th', ' Married-civ-spouse'],
       [' Private', ' Bachelors', ' Married-civ-spouse'],
       [' Private', ' Masters', ' Married-civ-spouse'],
       [' Private', ' 9th', ' Married-spouse-absent'],
       [' Self-emp-not-inc', ' HS-grad', ' Married-civ-spouse'],
       [' Private', ' Masters', ' Never-married'],
       [' Private', ' Bachelors', ' Married-civ-spouse'],
       [' Private', ' Some-college', ' Married-civ-spouse']], dtype=object)

Need the following counts:

for x_T in x.T:
    val, count = np.unique(x_T, return_counts=True)
    print(val, count)


[' Private' ' Self-emp-not-inc'] [8 1]
[' 11th' ' 9th' ' Bachelors' ' HS-grad' ' Masters' ' Some-college'] [1 1 2 2 2 1]
[' Divorced' ' Married-civ-spouse' ' Married-spouse-absent'
 ' Never-married'] [1 6 1 1]

Solution

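  • You could use itemfreq from scipy.stats (deprecated since SciPy 1.0 and removed in later releases). It counts over the flattened array rather than per column, so the output does not look like yours, but since no value appears in more than one column it still delivers the desired counts: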

    import numpy as np
    from scipy.stats import itemfreq
    
    x = np.array([[' Private', ' HS-grad', ' Divorced'],
           [' Private', ' 11th', ' Married-civ-spouse'],
           [' Private', ' Bachelors', ' Married-civ-spouse'],
           [' Private', ' Masters', ' Married-civ-spouse'],
           [' Private', ' 9th', ' Married-spouse-absent'],
           [' Self-emp-not-inc', ' HS-grad', ' Married-civ-spouse'],
           [' Private', ' Masters', ' Never-married'],
           [' Private', ' Bachelors', ' Married-civ-spouse'],
           [' Private', ' Some-college', ' Married-civ-spouse']], dtype=object)
    
    itemfreq(x)
    

    Output:

    array([[' 11th', 1],
           [' 9th', 1],
           [' Bachelors', 2],
           [' Divorced', 1],
           [' HS-grad', 2],
           [' Married-civ-spouse', 6],
           [' Married-spouse-absent', 1],
           [' Masters', 2],
           [' Never-married', 1],
           [' Private', 8],
           [' Self-emp-not-inc', 1],
           [' Some-college', 1]], dtype=object)
    

    Otherwise, you could try to specify another dtype, such as:

    val, count = np.unique(x.astype("<U22"), axis=1, return_counts=True)
    

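    Note, however, that with axis=1 np.unique compares whole columns and counts duplicate columns. It therefore only helps if your array is laid out so that this is what you need; it does not produce the per-column value counts shown in the question.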
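
To make the difference between the three approaches concrete, here is a minimal sketch on a shortened version of the sample array. The helper name column_value_counts is made up for illustration and is not part of NumPy; everything else uses only np.unique. It shows per-column counts (what the question asks for), counts over the flattened array (roughly what itemfreq returned), and what axis=1 does once the dtype is converted to a string type.

```python
import numpy as np

# Shortened sample, same layout as the array in the question.
x = np.array([[' Private', ' HS-grad', ' Divorced'],
              [' Private', ' 11th', ' Married-civ-spouse'],
              [' Self-emp-not-inc', ' HS-grad', ' Married-civ-spouse']],
             dtype=object)


def column_value_counts(arr):
    """Hypothetical helper: one {value: count} dict per column of a 2-D array."""
    result = []
    for column in arr.T:
        # Without axis, np.unique works on 1-D object arrays.
        values, counts = np.unique(column, return_counts=True)
        result.append({v: int(c) for v, c in zip(values, counts)})
    return result


# 1) Per-column value counts.
per_column = column_value_counts(x)

# 2) Counts over the whole flattened array (similar to the old itemfreq).
values, counts = np.unique(x.ravel(), return_counts=True)
flat_counts = {v: int(c) for v, c in zip(values, counts)}

# 3) With a string dtype, axis=1 is accepted, but it returns the
#    unique *columns* and how often each whole column occurs.
cols, col_counts = np.unique(x.astype(str), axis=1, return_counts=True)

print(per_column)
print(flat_counts)
print(cols.shape, col_counts)
```

If pandas is already a dependency, calling value_counts on each column of a DataFrame would be another option, but the loop above needs nothing beyond NumPy.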