Identifying closest value in a column for each filter using Pandas
Problem description
I have a data frame with categories and values. For each category, I need to find the value closest to a given target. I think I'm close, but I can't get the right output when I apply the results of argsort back to the original dataframe.
For example, with the input defined in the code below, only (a, 1, True), (b, 2, True), and (c, 2, True) should be marked; every other isClosest value should be False.
If several values are equally close, only the first one listed should be marked.
Here is the code I have so far. The argsort step works, but I can't get its result applied back to the dataframe correctly. I would love some pointers.
import pandas as pd

df = pd.DataFrame()
df['category'] = ['a', 'b', 'b', 'b', 'c', 'a', 'b', 'c', 'c', 'a']
df['values'] = [1, 2, 3, 4, 5, 4, 3, 2, 1, 0]
df['isClosest'] = False
uniqueCategories = df['category'].unique()
for c in uniqueCategories:
    filteredCategories = df[df['category'] == c]
    # the column is named 'values', not 'value'
    sortargs = (filteredCategories['values'] - 2.0).abs().argsort()
    # how to use sortargs so that we set column in df isClosest=True if it's the closest value in each category to 2.0?
Recommended answer
You can create a column of absolute differences:
df['dif'] = (df['values'] - 2).abs()
df
Out:
  category  values  dif
0        a       1    1
1        b       2    0
2        b       3    1
3        b       4    2
4        c       5    3
5        a       4    2
6        b       3    1
7        c       2    0
8        c       1    1
9        a       0    2
Then use groupby.transform to check whether each row's difference equals the minimum difference of its group:
df['is_closest'] = df.groupby('category')['dif'].transform('min') == df['dif']
df
Out:
  category  values  dif  is_closest
0        a       1    1        True
1        b       2    0        True
2        b       3    1       False
3        b       4    2       False
4        c       5    3       False
5        a       4    2       False
6        b       3    1       False
7        c       2    0        True
8        c       1    1       False
9        a       0    2       False
df.groupby('category')['dif'].idxmin() would also give you the index label of the closest value in each category. You can use that for mapping too.
For selection:
df.loc[df.groupby('category')['dif'].idxmin()]
Out:
  category  values  dif
0        a       1    1
1        b       2    0
7        c       2    0
For assignment:
df['is_closest'] = False
df.loc[df.groupby('category')['dif'].idxmin(), 'is_closest'] = True
df
Out:
  category  values  dif  is_closest
0        a       1    1        True
1        b       2    0        True
2        b       3    1       False
3        b       4    2       False
4        c       5    3       False
5        a       4    2       False
6        b       3    1       False
7        c       2    0        True
8        c       1    1       False
9        a       0    2       False
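The idxmin steps above can also be put together without adding a helper column to the frame. The following is only a sketch: the function name mark_closest and its parameters are hypothetical, and it assumes the dataframe index has no duplicate labels.

```python
import pandas as pd


def mark_closest(df, target, by="category", col="values"):
    """Return a boolean Series, True for the first closest row in each group.

    Hypothetical helper; assumes df.index contains no duplicate labels.
    """
    # Absolute distance of every row from the target
    dif = (df[col] - target).abs()
    # idxmin gives the index label of the first minimum within each group
    closest_labels = dif.groupby(df[by]).idxmin()
    flags = pd.Series(False, index=df.index)
    flags.loc[closest_labels.to_numpy()] = True
    return flags


df = pd.DataFrame({
    "category": ["a", "b", "b", "b", "c", "a", "b", "c", "c", "a"],
    "values": [1, 2, 3, 4, 5, 4, 3, 2, 1, 0],
})
df["is_closest"] = mark_closest(df, 2)
print(df[df["is_closest"]])
```

With the sample data this flags the rows labelled 0, 1 and 7, the same rows as the assignment shown above.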
The difference between these approaches shows up when there are ties. If you compare each difference against the group minimum, every tied row gets True. With idxmin, only the first occurrence gets True (exactly one row per group).
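To see the tie behaviour concretely, here is a small sketch on made-up data (the frame named tied is not part of the original question). Both 1 and 3 are at distance 1 from the target 2.

```python
import pandas as pd

# Hypothetical data: 1 and 3 are equally close to the target 2
tied = pd.DataFrame({"category": ["a", "a", "a"], "values": [1, 3, 5]})
tied["dif"] = (tied["values"] - 2).abs()

# Equality against the group minimum: every tied row is flagged
by_equality = tied.groupby("category")["dif"].transform("min") == tied["dif"]

# idxmin: only the first tied row is flagged
by_idxmin = pd.Series(False, index=tied.index)
by_idxmin.loc[tied.groupby("category")["dif"].idxmin().to_numpy()] = True

print(by_equality.tolist())  # [True, True, False]
print(by_idxmin.tolist())    # [True, False, False]
```

Since the question asks for only the first of several tied values to be marked, the idxmin variant is the one that matches that requirement.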