Pandas - Groupby with conditional formula
Problem description
   Survived  SibSp  Parch
0         0      1      0
1         1      1      0
2         1      0      0
3         1      1      0
4         0      0      1
Given the above dataframe, is there an elegant way to groupby with a condition?
I want to split the data into two groups based on the following conditions:
(df['SibSp'] > 0) | (df['Parch'] > 0) = New Group - "Has Family"
(df['SibSp'] == 0) & (df['Parch'] == 0) = New Group - "No Family"
then take the means of both of these groups and end up with an output like this:
            SurvivedMean
Has Family  Mean
No Family   Mean
Can it be done using groupby or would I have to append a new column using the above conditional statement?
Recommended answer
An easy way to form these groups is to use the sum of the two columns. Both columns hold non-negative counts, so if either of them is positive the sum will be at least 1, and if both are zero the sum will be 0. groupby also accepts an arbitrary array as the grouping key, as long as its length matches the DataFrame's length, so you don't need to add a new column.
import numpy as np
# Label each row: the summed counts are >= 1 whenever either column is positive.
family = np.where((df['SibSp'] + df['Parch']) >= 1, 'Has Family', 'No Family')
df.groupby(family)['Survived'].mean()
Out:
Has Family 0.5
No Family 1.0
Name: Survived, dtype: float64
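For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It rebuilds the five sample rows from the question and, as a variation on the answer, builds the labels from the boolean condition exactly as the question states it instead of from the column sum. The variable names has_family and result are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0],
    "SibSp": [1, 1, 0, 1, 0],
    "Parch": [0, 0, 0, 0, 1],
})

# Boolean mask written the same way the question states the condition.
has_family = (df["SibSp"] > 0) | (df["Parch"] > 0)

# Turn the mask into labels; this is a plain array with one entry per row.
family = np.where(has_family, "Has Family", "No Family")

# Group by the array directly, so no column is added to df.
result = df.groupby(family)["Survived"].mean()
print(result)
```

For this sample data the mask and the sum-based test select the same rows, so either form should give the same two groups. The mask version may be the safer choice if a column could ever contain negative values, since a sum could then cancel out.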