I have a data frame with a column named 'age'. The ages range from 6 to 90. Is there a way to group the ages into intervals such as '5-9', '10-14', etc., so that a graph can show the age ranges instead of individual ages?
I am not an expert, but this is what I put together:
```python
import pandas as pd
import matplotlib.pyplot as plt

# Sample data (to have something to work with in this example)
data = {'age': [6, 10, 12, 15, 20, 22, 25, 30, 35, 52, 53, 54, 55, 60, 65, 70, 75, 84, 85, 90]}
df = pd.DataFrame(data)

# Define the age ranges
## List of tuples, each holding the min and max age of one range.
## The stop value 91 makes the last range (90, 94), so the maximum age of 90 is not dropped.
age_ranges = [(start, start + 4) for start in range(5, 91, 5)]
## [(5, 9), (10, 14), (15, 19), ..., (90, 94)]

# Widen each range slightly so its border values fall inside it (5 > 4.9, 9 < 9.1, etc.)
# This only works for integer ages: a float such as 9.5 falls between (4.9, 9.1] and (9.9, 14.1] and ends up as NaN
ranges_adjusted = [(low - 0.1, high + 0.1) for low, high in age_ranges]
# [(4.9, 9.1), (9.9, 14.1), (14.9, 19.1), ...]

# Turn the adjusted ranges into bins
bins = pd.IntervalIndex.from_tuples(ranges_adjusted)

# Define "nice-looking" labels (otherwise the x axis would read "(4.9, 9.1] (9.9, 14.1] ...")
labels = [f"{start}-{end}" for start, end in age_ranges]

# Use cut to assign each age to a bin, then replace each bin with its label
df['age_range'] = pd.cut(df['age'], bins=bins).map(dict(zip(bins, labels)))

# Count the occurrences of each age range (sort_index keeps the ranges in order)
age_counts = df['age_range'].value_counts().sort_index()

# Plot the data
age_counts.plot(kind='bar', rot=0)
plt.xlabel('Age Range')
plt.ylabel('Count')
plt.title('Age Distribution')
plt.show()
```
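A shorter variant, offered as a sketch rather than a tested drop-in: `pd.cut` also accepts plain bin edges together with `right=False` and a `labels` list, which removes the need for the ±0.1 adjustment and also copes with float ages. The edges 5, 10, ..., 95 and `rot=45` are my own choices for this sample data, so adjust them to your real range.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Same sample ages as above (floats such as 9.5 would also be binned correctly)
df = pd.DataFrame({'age': [6, 10, 12, 15, 20, 22, 25, 30, 35, 52, 53, 54,
                           55, 60, 65, 70, 75, 84, 85, 90]})

# Bin edges 5, 10, ..., 95 give the intervals [5, 10), [10, 15), ..., [90, 95)
# (assumed range; widen it if your data goes beyond 94)
edges = list(range(5, 100, 5))

# One label per interval: '5-9', '10-14', ..., '90-94'
labels = [f"{start}-{start + 4}" for start in edges[:-1]]

# right=False closes each interval on the left, so 10 goes to '10-14', not '5-9'
df['age_range'] = pd.cut(df['age'], bins=edges, right=False, labels=labels)

# value_counts on a categorical keeps empty ranges; sort_index keeps the label order
age_counts = df['age_range'].value_counts().sort_index()

age_counts.plot(kind='bar', rot=45)
plt.xlabel('Age Range')
plt.ylabel('Count')
plt.title('Age Distribution')
plt.tight_layout()
plt.show()
```

Because each interval is closed on the left, e.g. [10, 15), an age of 10 and an age of 14.9 both land in '10-14', and empty ranges such as '40-44' still appear on the graph with a count of 0.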
Again, not an expert here. I will be more careful next time I post an answer! See you around :)