
How to understand pandas .apply(axis='columns')?


Below is the answer code I received from the Kaggle Pandas course:

def stars(row):
    if row.country == 'Canada':
        return 3
    elif row.points >= 95:
        return 3
    elif row.points >= 85:
        return 2
    else:
        return 1

# axis='columns' (same as axis=1) calls stars() once for each row of reviews
star_ratings_2 = reviews.apply(stars, axis='columns')
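To run the answer end to end without the Kaggle dataset, here is a minimal sketch that builds a small hypothetical reviews DataFrame. The rows are made up for illustration and only include the two columns stars() reads; the real dataset has many more. It applies the same stars() function with axis='columns'.

```python
import pandas as pd

# Hypothetical stand-in for the Kaggle wine reviews data (made-up rows):
# only the two columns that stars() reads are included.
reviews = pd.DataFrame(
    {
        "country": ["Italy", "Canada", "France", "US"],
        "points": [96, 82, 88, 80],
    }
)


def stars(row):
    # `row` is one row of the DataFrame, passed in as a Series whose
    # index is the column labels, so row.country and row.points work.
    if row.country == "Canada":
        return 3
    elif row.points >= 95:
        return 3
    elif row.points >= 85:
        return 2
    else:
        return 1


# axis="columns" (equivalently axis=1) calls stars() once per row.
star_ratings_2 = reviews.apply(stars, axis="columns")
print(star_ratings_2.tolist())  # [3, 3, 2, 1]
```

The Canadian wine gets 3 stars despite only 82 points, because the country check comes first.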

The exercise reads:

We'd like to host these wine reviews on our website, but a rating system ranging from 80 to 100 points is too hard to understand; we'd like to translate them into simple star ratings. A score of 95 or higher counts as 3 stars, a score of at least 85 but less than 95 is 2 stars. Any other score is 1 star.

Also, the Canadian Vintners Association bought a lot of ads on the site, so any wines from Canada should automatically get 3 stars, regardless of points.

Create a series star_ratings with the number of stars corresponding to each review in the dataset.

The dataset looks like this: [table image not reproduced; each row is one wine review, with columns including country and points]

My question is about star_ratings_2 = reviews.apply(stars, axis='columns'). Why axis='columns' instead of axis='rows'? Since the stars() function has to process the country and points columns of a row, shouldn't we pass a row to stars()?
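A quick way to check which object apply hands to the function is to have the function return its .name. The sketch below uses a tiny hypothetical DataFrame (not the Kaggle data). With axis='columns', each call gets a row, so .name is a row label. With axis='index', or axis='rows', which pandas accepts as another name for axis 0, each call gets a whole column instead. A function written against row.country and row.points therefore needs axis='columns'.

```python
import pandas as pd

# Hypothetical two-row frame; string index labels make rows easy to tell apart.
df = pd.DataFrame(
    {"country": ["Canada", "Italy"], "points": [82, 96]},
    index=["a", "b"],
)

# axis="columns" (axis=1): each call receives one ROW as a Series,
# so s.name is the row label and s.index holds the column labels.
per_row = df.apply(lambda s: s.name, axis="columns")

# axis="index" (axis=0): each call receives one COLUMN as a Series,
# so s.name is the column label and s.index holds the row labels.
per_column = df.apply(lambda s: s.name, axis="index")

# "rows" is accepted as another name for axis 0, so it also passes columns.
per_column_rows = df.apply(lambda s: s.name, axis="rows")

print(per_row.tolist())          # ['a', 'b']
print(per_column.tolist())       # ['country', 'points']
print(per_column_rows.tolist())  # ['country', 'points']
```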

I did not expect the correct answer to be axis='columns'. I've asked around, including ChatGPT, but haven't found a satisfying explanation; ChatGPT even agreed with me that axis='rows' should be correct.


Solution

  • The terminology may be misleading, but the apply documentation is clear:

    axis: {0 or ‘index’, 1 or ‘columns’}, default 0

    Axis along which the function is applied:

    0 or ‘index’: apply function to each column.

    1 or ‘columns’: apply function to each row.

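    You can draw a parallel with aggregation functions: df.sum(axis=1) takes each row and reduces it to a single value. Calling apply with axis=1 (axis='columns') works the same way: it takes each row, passes it to the function as a Series indexed by the column labels, and collects the results. The axis name describes the direction the function moves across (along the columns), not the kind of object it receives, which is why axis='rows' (an alias for axis=0) would hand stars() whole columns instead of rows.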
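The parallel with df.sum can be checked directly. This sketch uses a made-up two-column frame and compares the built-in reduction with an apply call that sums whatever Series it receives, once for each axis. In both cases axis=1/'columns' yields one value per row and axis=0/'index' one value per column.

```python
import pandas as pd

# Hypothetical numeric frame, only for comparing the two axes.
df = pd.DataFrame({"x": [1, 2], "y": [10, 20]})

# axis=1 / axis="columns": one value per ROW (x + y for each row).
row_totals = df.apply(lambda s: s.sum(), axis="columns")

# axis=0 / axis="index": one value per COLUMN (sum of each column).
col_totals = df.apply(lambda s: s.sum(), axis="index")

print(df.sum(axis=1).tolist(), row_totals.tolist())  # [11, 22] [11, 22]
print(df.sum(axis=0).tolist(), col_totals.tolist())  # [3, 30] [3, 30]
```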