Data aggregation and group operations are essential techniques in data analysis. They allow you to analyze trends, summarize data, and generate insights by grouping values and applying aggregation functions such as sum, mean, median, count, and more. Pandas, a powerful data analysis library in Python, provides robust and intuitive APIs for grouping and aggregating data using the groupby() method.
This document provides a comprehensive guide to aggregation and group operations in Pandas, covering:
The GroupBy operation in Pandas involves three steps:
import pandas as pd
data = {
'department': ['HR', 'IT', 'IT', 'HR', 'Finance', 'Finance', 'IT'],
'employee': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank', 'Grace'],
'salary': [50000, 60000, 55000, 52000, 70000, 72000, 61000],
'bonus': [5000, 7000, 6000, 4500, 8000, 8500, 6500]
}
df = pd.DataFrame(data)
print(df)
grouped = df.groupby('department')
print(grouped['salary'].mean()) # Average salary per department
print(grouped['salary'].sum()) # Total salary per department
print(grouped['bonus'].max()) # Maximum bonus per department
print(grouped['employee'].count()) # Number of employees per department
result = grouped.agg({
'salary': 'mean',
'bonus': 'sum'
})
print(result)
result = grouped['salary'].agg(['mean', 'max', 'min'])
print(result)
result = grouped.agg(
avg_salary=('salary', 'mean'),
total_bonus=('bonus', 'sum')
)
print(result)
result = grouped['salary'].agg(lambda x: x.max() - x.min())
print(result)
def range_func(x):
return x.max() - x.min()
result = grouped.agg(salary_range=('salary', range_func))
print(result)
The transform() method applies a function to each group but returns an object that has the same shape as the original DataFrame.
df['normalized_salary'] = grouped['salary'].transform(lambda x: x / x.mean())
print(df)
# Keep only departments with total bonus over 20000
filtered = grouped.filter(lambda x: x['bonus'].sum() > 20000)
print(filtered)
group_sizes = grouped.size()
large_groups = group_sizes[group_sizes > 2]
print(df[df['department'].isin(large_groups.index)])
grouped_multi = df.groupby(['department', 'employee'])
print(grouped_multi['salary'].sum())
multi_agg = grouped_multi.agg(
salary_mean=('salary', 'mean'),
bonus_total=('bonus', 'sum')
)
print(multi_agg)
print(multi_agg.loc[('Finance', 'Eve')])
df['dept_avg_salary'] = df.groupby('department')['salary'].transform('mean')
print(df)
df['above_avg'] = df['salary'] > df['dept_avg_salary']
print(df)
desc = grouped['salary'].describe()
print(desc)
print(desc['mean'])
pivot = pd.pivot_table(df, values='salary', index='department', aggfunc='mean')
print(pivot)
pivot = pd.pivot_table(df, values=['salary', 'bonus'], index='department', aggfunc={'salary': 'mean', 'bonus': 'sum'})
print(pivot)
df['department'] = df['department'].astype('category')
grouped = df.groupby('department')
print(grouped['salary'].mean())
sales_data = pd.DataFrame({
'region': ['East', 'East', 'West', 'West', 'East', 'West'],
'salesperson': ['Alice', 'Bob', 'Charlie', 'David', 'Alice', 'Charlie'],
'sales': [2500, 3000, 2200, 2800, 2700, 2600]
})
region_sales = sales_data.groupby('region')['sales'].sum()
print(region_sales)
performance_data = pd.DataFrame({
'employee': ['Alice', 'Bob', 'Charlie', 'David', 'Alice'],
'month': ['Jan', 'Jan', 'Jan', 'Jan', 'Feb'],
'score': [80, 85, 90, 88, 87]
})
monthly_avg = performance_data.groupby('month')['score'].mean()
print(monthly_avg)
orders = pd.DataFrame({
'customer': ['Tom', 'Jerry', 'Tom', 'Jerry', 'Spike'],
'product': ['A', 'B', 'A', 'C', 'B'],
'amount': [100, 150, 120, 130, 140]
})
customer_total = orders.groupby('customer')['amount'].sum()
print(customer_total)
Using named aggregation provides clear and readable column names after aggregation.
Transform is useful when you want to perform operations and keep the shape of the original DataFrame.
agg = df.groupby('department').agg({'salary': 'mean'}).reset_index().sort_values(by='salary', ascending=False)
print(agg)
grouped = df.groupby('department', dropna=False)
print(grouped['salary'].mean())
print(grouped.size())
print(grouped.describe())
Pandas' groupby and aggregation capabilities are vital for analyzing structured data. Whether you're working with sales data, HR records, financial statements, or customer transactions, grouping and summarizing information is key to extracting insights.
In this tutorial, we covered:
By mastering data aggregation and group operations in Pandas, you can enhance your data analysis workflows and produce meaningful summaries, reports, and dashboards.
Python is commonly used for developing websites and software, task automation, data analysis, and data visualisation. Since it's relatively easy to learn, Python has been adopted by many non-programmers, such as accountants and scientists, for a variety of everyday tasks, like organising finances.
Learning Curve: Python is generally considered easier to learn for beginners due to its simplicity, while Java is more complex but provides a deeper understanding of how programming works.
The point is that Java is more complicated to learn than Python. It doesn't matter the order. You will have to do some things in Java that you don't in Python. The general programming skills you learn from using either language will transfer to another.
Read on for tips on how to maximize your learning. In general, it takes around two to six months to learn the fundamentals of Python. But you can learn enough to write your first short program in a matter of minutes. Developing mastery of Python's vast array of libraries can take months or years.
6 Top Tips for Learning Python
The following is a step-by-step guide for beginners interested in learning Python using Windows.
Best YouTube Channels to Learn Python
Write your first Python programStart by writing a simple Python program, such as a classic "Hello, World!" script. This process will help you understand the syntax and structure of Python code.
The average salary for Python Developer is βΉ5,55,000 per year in the India. The average additional cash compensation for a Python Developer is within a range from βΉ3,000 - βΉ1,20,000.
Copyrights © 2024 letsupdateskills All rights reserved