Pandas groupby

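In Pandas, the groupby operation lets us group data by the values in one or more columns. In other words, a DataFrame is split into smaller groups, one for each distinct value (or combination of values) in those columns.

After grouping, we can apply a function to each group separately. These functions summarize or aggregate the data within each group.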
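The split-then-apply idea described above can be sketched by iterating over a grouped object. This is only a minimal illustration: the DataFrame, its column names (Team, Points), and its values are made up for this example.

```python
import pandas as pd

# hypothetical data, for illustration only
df = pd.DataFrame({'Team': ['Red', 'Blue', 'Red', 'Blue'],
                   'Points': [3, 1, 4, 2]})

# split: iterating over a grouped object yields (key, sub-DataFrame) pairs
group_sizes = {}
for team, group in df.groupby('Team'):
    group_sizes[team] = len(group)

# apply: run an aggregation on each group separately
totals = df.groupby('Team')['Points'].sum()

print(group_sizes)
print(totals)
```

Each sub-DataFrame holds only the rows for one team, which is why an aggregation such as sum() produces one value per team.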

Group by a Single Column in Pandas

In Pandas, we use the groupby() function to group by a single column and then compute aggregate values. For example:

import pandas as pd

# create a dictionary containing the data
data = {'Category': ['Electronics', 'Clothing', 'Electronics', 'Clothing'],
        'Sales': [1000, 500, 800, 300]}

# create a DataFrame using the data dictionary
df = pd.DataFrame(data)

# group the DataFrame by the Category column and
# calculate the sum of Sales for each category
grouped = df.groupby('Category')['Sales'].sum()

# print the grouped data
print(grouped)

Output

Category
Clothing        800
Electronics    1800
Name: Sales, dtype: int64

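In the example above, df.groupby('Category')['Sales'].sum() groups by a single column and computes the sum.

This line does the following:

df.groupby('Category') - groups the df DataFrame by the unique values in the Category column.
['Sales'] - selects the Sales column within each group.
.sum() - computes the sum of the Sales values in each group.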
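The three steps listed above can also be written out one at a time, which makes the intermediate objects visible. This is a sketch that reuses the data from the example; the variable names (by_category, sales, totals, totals_df) are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Category': ['Electronics', 'Clothing', 'Electronics', 'Clothing'],
                   'Sales': [1000, 500, 800, 300]})

# step 1: group the rows by Category (no aggregation has run yet)
by_category = df.groupby('Category')

# step 2: restrict each group to its Sales column
sales = by_category['Sales']

# step 3: aggregate; the result is a Series indexed by Category
totals = sales.sum()

print(type(by_category).__name__)
print(type(sales).__name__)
print(totals)

# optional: turn the Category index back into a regular column
totals_df = totals.reset_index()
print(totals_df)
```

Calling reset_index() is one way to get a plain two-column DataFrame when the result needs to be merged or exported later.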

Group by Multiple Columns in Pandas

We can also group by multiple columns in Pandas and compute several aggregate values at once.

Let's look at an example.

import pandas as pd

# create a DataFrame with student data
data = {
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
    'Grade': ['A', 'B', 'A', 'A', 'B'],
    'Score': [90, 85, 92, 88, 78]
}

df = pd.DataFrame(data)

# define the aggregate functions to be applied to the Score column
agg_functions = {
    # calculate both mean and maximum of the Score column
    'Score': ['mean', 'max'] 
}

# group the DataFrame by Gender and Grade, then apply the aggregate functions
grouped = df.groupby(['Gender', 'Grade']).aggregate(agg_functions)

# print the resulting grouped DataFrame
print(grouped)

Output

             Score    
              mean max
Gender Grade          
Female A      88.0  88
       B      85.0  85
Male   A      91.0  92
       B      78.0  78

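Here, the output shows that the data has been grouped by Gender and Grade. The resulting DataFrame grouped contains the mean and the maximum Score for each group, with one row per (Gender, Grade) combination that occurs in the data.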
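The result above has two-level column labels (Score, then mean or max). If flat column names are preferred, named aggregation is one alternative. The sketch below reuses the student data; the output column names mean_score and max_score are chosen here for illustration and do not appear in the original example.

```python
import pandas as pd

df = pd.DataFrame({
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
    'Grade': ['A', 'B', 'A', 'A', 'B'],
    'Score': [90, 85, 92, 88, 78]
})

# named aggregation: new_column_name=(source_column, function)
summary = df.groupby(['Gender', 'Grade']).agg(
    mean_score=('Score', 'mean'),
    max_score=('Score', 'max'),
)
print(summary)

# look up a single group with a (Gender, Grade) tuple
male_a_mean = summary.loc[('Male', 'A'), 'mean_score']
print(male_a_mean)
```

The rows are still indexed by the (Gender, Grade) pairs, so an individual group can be selected with a tuple as shown in the last step.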

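Group Categorical Data

We group by categorical data to analyze the data by specific categories.

Pandas can handle categorical data efficiently with the groupby() function.

Let's look at an example.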

import pandas as pd

# sample data
data = {'Category': ['A', 'B', 'A', 'B', 'A', 'B'],
        'Sales': [100, 150, 200, 50, 300, 120]}

df = pd.DataFrame(data)

# convert Category column to categorical type
df['Category'] = pd.Categorical(df['Category'])

# group by Category and calculate the total sales
grouped = df.groupby('Category')['Sales'].sum()

print(grouped)

Output

Category
A    600
B    320
Name: Sales, dtype: int64

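Here, the Category column is first converted to the categorical data type with pd.Categorical().

Then groupby() groups the data by the Category column, and the sum() aggregation function computes the total sales for each category.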
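One detail specific to categorical columns is how groupby() treats categories that are declared but never appear in the data. The sketch below adds a hypothetical category 'C' that is not in the original example and passes the observed parameter explicitly, since its default differs across pandas versions.

```python
import pandas as pd

df = pd.DataFrame({'Category': ['A', 'B', 'A', 'B', 'A', 'B'],
                   'Sales': [100, 150, 200, 50, 300, 120]})

# declare an extra category 'C' that has no rows (hypothetical)
df['Category'] = pd.Categorical(df['Category'], categories=['A', 'B', 'C'])

# observed=False keeps every declared category, including unused ones
all_categories = df.groupby('Category', observed=False)['Sales'].sum()

# observed=True keeps only the categories that occur in the data
seen_only = df.groupby('Category', observed=True)['Sales'].sum()

print(all_categories)
print(seen_only)
```

With observed=False the unused category shows up with a sum of 0, which can be useful when a report must list every category; observed=True gives the same result as the example above.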
