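I'm learning data analysis in Python with Pandas.
I have a DataFrame of game sales that looks like this:
(This data is made up and is only for the sake of the question.)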
Name                Year  Publisher  Total Sales
GTA V               2013  Rockstar   133000
Super Mario Bros    1985  Nintendo   430500
GTA VI              2025  Rockstar   86000
RDR 3               2025  Rockstar   129030
Super Mario Sister  1985  Nintendo   308900
Super Mario End     2000  Nintendo   112100
Then I drop the Name column and group the rows by publisher with this code:
df.drop(columns='Name', inplace=True)
df.groupby(['Publisher','Year','Total Sales']).sum().reset_index()
The DataFrame now looks like this:
Publisher  Year  Total Sales
Nintendo   1985  308900
Nintendo   1985  430500
Nintendo   2000  112100
Rockstar   2013  133000
Rockstar   2025  129030
Rockstar   2025  86000
That's a start, but I want to add up the total sales for the same publisher in the same year.
I want the DataFrame to look like this:
Publisher  Year  Total Sales
Nintendo   1985  739400
Nintendo   2000  112100
Rockstar   2013  133000
Rockstar   2025  215030
Is there a way to do this?
Here is the code that builds my df:
import pandas as pd

data = {'Name': ['GTA V', 'Super Mario Bros', 'GTA VI', 'RDR 3', 'Super Mario Sister', 'Super Mario End'],
        'Year': ['2013', '1985', '2025', '2025', '1985', '2000'],
        'Publisher': ['Rockstar', 'Nintendo', 'Rockstar', 'Rockstar', 'Nintendo', 'Nintendo'],
        'Total Sales': ['133000', '430500', '86000', '129030', '308900', '112100']}
df = pd.DataFrame(data)
df
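A small sketch (not from the original thread) of why the groupby in the question still returned six rows: every column passed to groupby becomes part of the group key, so including 'Total Sales' as a key puts each distinct sales value in its own group and nothing is left to add up. GroupBy.ngroups counts how many groups a given key list produces; the variable names here are illustrative.

```python
import pandas as pd

# Same toy data as in the question (every value is stored as a string)
data = {'Name': ['GTA V', 'Super Mario Bros', 'GTA VI', 'RDR 3', 'Super Mario Sister', 'Super Mario End'],
        'Year': ['2013', '1985', '2025', '2025', '1985', '2000'],
        'Publisher': ['Rockstar', 'Nintendo', 'Rockstar', 'Rockstar', 'Nintendo', 'Nintendo'],
        'Total Sales': ['133000', '430500', '86000', '129030', '308900', '112100']}
df = pd.DataFrame(data)

# Each distinct combination of the key columns forms exactly one group (one output row)
with_sales_key = df.groupby(['Publisher', 'Year', 'Total Sales']).ngroups
without_sales_key = df.groupby(['Publisher', 'Year']).ngroups

print(with_sales_key)     # 6: every sales value is its own group, so no rows are combined
print(without_sales_key)  # 4: one group per (Publisher, Year) pair
```

The fix used by the answers below follows from this: keep only Publisher and Year as keys and aggregate 'Total Sales'.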
Answer:
Use pivot_table:
>>> df.pivot_table('Total Sales', ['Year', 'Publisher'], aggfunc='sum').reset_index()
   Year Publisher  Total Sales
0  1985  Nintendo       739400
1  2000  Nintendo       112100
2  2013  Rockstar       133000
3  2025  Rockstar       215030
Note: if the Total Sales column contains strings, convert it to int (or float) first:
>>> df.astype({'Total Sales': int}).pivot_table(...)
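To illustrate the note above, here is a minimal sketch that checks whether the column is numeric and then runs the conversion followed by the same pivot_table arguments as in this answer (written as keyword arguments for readability). pd.api.types.is_numeric_dtype is a real pandas helper; the names converted and result are just for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Year': ['2013', '1985', '2025', '2025', '1985', '2000'],
    'Publisher': ['Rockstar', 'Nintendo', 'Rockstar', 'Rockstar', 'Nintendo', 'Nintendo'],
    'Total Sales': ['133000', '430500', '86000', '129030', '308900', '112100'],
})

# The values were created from Python strings, so the column is text, not numbers
print(pd.api.types.is_numeric_dtype(df['Total Sales']))  # False

# Convert to int first, then sum per (Year, Publisher)
converted = df.astype({'Total Sales': int})
result = converted.pivot_table(values='Total Sales', index=['Year', 'Publisher'],
                               aggfunc='sum').reset_index()
print(result)
```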
Answer:
import pandas as pd

data = {'Name': ['GTA V', 'Super Mario Bros', 'GTA VI', 'RDR 3', 'Super Mario Sister', 'Super Mario End'],
        'Year': ['2013', '1985', '2025', '2025', '1985', '2000'],
        'Publisher': ['Rockstar', 'Nintendo', 'Rockstar', 'Rockstar', 'Nintendo', 'Nintendo'],
        'Total Sales': ['133000', '430500', '86000', '129030', '308900', '112100']}
df = pd.DataFrame(data)
# The sales figures are strings; convert them so they can be added numerically
df['Total Sales'] = df['Total Sales'].astype(int)
# Group only by Year and Publisher, then sum the sales within each group
df.groupby(['Year', 'Publisher'])['Total Sales'].agg('sum').reset_index()
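As a quick cross-check (a sketch, not part of either answer), the groupby version and the pivot_table version from the previous answer should give the same rows, since both sort by the group keys by default. The comparison below goes through plain Python lists so it does not depend on index or column-name metadata.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['GTA V', 'Super Mario Bros', 'GTA VI', 'RDR 3', 'Super Mario Sister', 'Super Mario End'],
    'Year': ['2013', '1985', '2025', '2025', '1985', '2000'],
    'Publisher': ['Rockstar', 'Nintendo', 'Rockstar', 'Rockstar', 'Nintendo', 'Nintendo'],
    'Total Sales': ['133000', '430500', '86000', '129030', '308900', '112100'],
})
df['Total Sales'] = df['Total Sales'].astype(int)

# Selecting 'Total Sales' (groupby) or passing values= (pivot_table) means Name is simply ignored
via_groupby = df.groupby(['Year', 'Publisher'])['Total Sales'].agg('sum').reset_index()
via_pivot = df.pivot_table(values='Total Sales', index=['Year', 'Publisher'],
                           aggfunc='sum').reset_index()

# Compare row by row: columns are Year, Publisher, Total Sales in both results
same_rows = via_groupby.values.tolist() == via_pivot.values.tolist()
print(same_rows)
```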
Answer:
Here is one way:
df.drop(columns='Name', inplace=True)
df['Total Sales'] = pd.to_numeric(df['Total Sales'])
df2 = df.groupby(['Publisher','Year']).sum().reset_index()
df2
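A variant of the approach above (a sketch, assuming a reasonably recent pandas): passing as_index=False to groupby keeps Publisher and Year as ordinary columns, so reset_index() is not needed, and selecting only 'Total Sales' means the Name column does not have to be dropped first. Grouping by Publisher before Year gives the same row order as the desired output in the question.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['GTA V', 'Super Mario Bros', 'GTA VI', 'RDR 3', 'Super Mario Sister', 'Super Mario End'],
    'Year': ['2013', '1985', '2025', '2025', '1985', '2000'],
    'Publisher': ['Rockstar', 'Nintendo', 'Rockstar', 'Rockstar', 'Nintendo', 'Nintendo'],
    'Total Sales': ['133000', '430500', '86000', '129030', '308900', '112100'],
})

# Turn the text values into numbers before summing
df['Total Sales'] = pd.to_numeric(df['Total Sales'])

# as_index=False returns the group keys as regular columns instead of an index
df2 = df.groupby(['Publisher', 'Year'], as_index=False)['Total Sales'].sum()
print(df2)
```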