Reference DataFrame value corresponding to column header (Python question)

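Problem description

I am trying to append a column to my DataFrame whose values are looked up from the column that each row names.

I have the following DataFrame: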

     Area      1         2         3         4      Select     
-----------------------------------------------------------
0      22     54        33        46        23           4       
1      45     36        54        32        14           1        
2      67     34        29        11        14           3       
3      54     35        19        22        45           2        
4      21     27        39        43        22           3

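Each value under "Select" names the column whose value should be picked for that row. For example, in row 0 "Select" is 4, which refers to the value under column "4" in row 0, i.e. 23. In row 1 "Select" is 1, which refers to the value under column "1" in row 1, i.e. 36.

I want to append a new column to my DataFrame that holds the values "Select" is referring to.

So, starting from my DataFrame, I need to produce the following DataFrame: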

     Area      1         2         3         4      Select      Value
----------------------------------------------------------------------
0      22     54        33        46        23           4         23
1      45     36        54        32        14           1         36 
2      67     34        29        11        14           3         11
3      54     35        19        22        45           2         19
4      21     27        39        43        22           3         43

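I am not sure how to extract the value under the numbered column that "Select" refers to, because the column headers are just headings rather than actual values to index on. How can I do this in Python?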

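Recommended answer

We can use NumPy indexing, as recommended by the pandas documentation section "Looking up values by index/column labels", as the replacement for the deprecated DataFrame.lookup (removed in pandas 2.0).

With factorize on Select and reindex: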

idx, cols = pd.factorize(df['Select'])
df['value'] = (
    df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]
)

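Note 1: If the factorized column contains a value that does not correspond to any column header, the resulting value will be NaN (which indicates missing data).
Note 2: Both indexers need to be 0-based positional indexers (compatible with NumPy indexing). np.arange(len(df)) builds such a range from the length of the DataFrame, so it works in all cases.

However, if the DataFrame already has a compatible index (as in this example), df.index can be used directly.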

idx, cols = pd.factorize(df['Select'])
df['value'] = (
    df.reindex(cols, axis=1).to_numpy()[df.index, idx]
)

Resulting df:

   Area   1   2   3   4  Select  value
0    22  54  33  46  23       4     23
1    45  36  54  32  14       1     36
2    67  34  29  11  14       3     11
3    54  35  19  22  45       2     19
4    21  27  39  43  22       3     43
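
Note 2 is easy to overlook, so here is a minimal sketch of what can go wrong. The three-row DataFrame and its shuffled index are hypothetical data made up for illustration, not taken from the question. With such an index, np.arange(len(df)) still picks the right rows, while using the index labels as positions reads from the wrong rows without raising any error.

```python
import numpy as np
import pandas as pd

# Hypothetical data: same layout as the question, but with a shuffled integer index.
df = pd.DataFrame(
    {
        1: [54, 36, 34],
        2: [33, 54, 29],
        "Select": [2, 1, 2],
    },
    index=[2, 0, 1],
)

idx, cols = pd.factorize(df["Select"])
values = df.reindex(cols, axis=1).to_numpy()

# Positional row indexer: always 0..len(df)-1, whatever the index labels are.
by_position = values[np.arange(len(df)), idx]

# Index labels used as positions: valid NumPy indexing, but it reads rows 2, 0, 1.
by_label = values[df.index.to_numpy(), idx]

print(by_position)  # the values each row's Select actually points to
print(by_label)     # silently taken from the wrong rows
```

When in doubt about the index, np.arange(len(df)) seems the safer default; df.index is only a shortcut for a default RangeIndex.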

Another option is Index.get_indexer:

df['value'] = df.to_numpy()[
    df.index.get_indexer(df.index),
    df.columns.get_indexer(df['Select'])
]

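Note: The same applies here. If df.index is already a contiguous 0-based index (compatible with NumPy indexing), we can use df.index directly instead of passing it through Index.get_indexer: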

df['value'] = df.to_numpy()[
    df.index,
    df.columns.get_indexer(df['Select'])
]

Resulting df:

   Area   1   2   3   4  Select  value
0    22  54  33  46  23       4     23
1    45  36  54  32  14       1     36
2    67  34  29  11  14       3     11
3    54  35  19  22  45       2     19
4    21  27  39  43  22       3     43

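Warning for get_indexer: if Select contains a value that does not correspond to any column header, get_indexer returns -1 for it, and NumPy then reads the value from the last column of the DataFrame (because Python supports negative indexing relative to the end). This is less safe than NaN: the lookup silently returns a real value (in the question's DataFrame the last column is Select itself, which is numeric), so it can be hard to tell immediately that the data is invalid.

Sample program: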

import pandas as pd

df = pd.DataFrame({
    'Select': ['B', 'A', 'C', 'D'],
    'A': [47, 2, 51, 95],
    'B': [56, 88, 10, 56],
    'C': [70, 73, 59, 56]
})

df['value'] = df.to_numpy()[
    df.index,
    df.columns.get_indexer(df['Select'])
]

print(df)

Notice that in the last row Select is D, yet the value is pulled from C, the last column of the DataFrame (position -1). Nothing in the output makes it obvious that the lookup failed or is incorrect.

  Select   A   B   C value
0      B  47  56  70    56
1      A   2  88  73     2
2      C  51  10  59    59
3      D  95  56  56    56  # <- Value from C

Compare with factorize:

idx, cols = pd.factorize(df['Select'])
df['value'] = (
    df.reindex(cols, axis=1).to_numpy()[df.index, idx]
)

Notice that in the last row Select is D and the corresponding value is NaN, which pandas uses to indicate missing data.

  Select   A   B   C  value
0      B  47  56  70   56.0
1      A   2  88  73    2.0
2      C  51  10  59   59.0
3      D  95  56  56    NaN  # <- Missing Data
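
If get_indexer is still preferred, one possible safeguard is to check for the -1 sentinel explicitly and leave those rows as NaN. The following is only a sketch under stated assumptions: it assumes the looked-up values are numeric (so they can be stored as floats), and the names col_pos, found and result are illustrative, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Select": ["B", "A", "C", "D"],
    "A": [47, 2, 51, 95],
    "B": [56, 88, 10, 56],
    "C": [70, 73, 59, 56],
})

# Column position for each Select label; -1 marks a label with no matching column.
col_pos = df.columns.get_indexer(df["Select"])
found = col_pos != -1

# Start with NaN everywhere, then fill only the rows whose label was found.
rows = np.arange(len(df))
result = np.full(len(df), np.nan)
# Assumption: the selected cells are numeric, so converting to float is safe.
result[found] = df.to_numpy()[rows[found], col_pos[found]].astype(float)

df["value"] = result
print(df)
```

With this guard the row whose Select is D ends up as NaN instead of borrowing a value from column C, which matches the behaviour of the factorize approach.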

Setup and imports:

import numpy as np  # (Only needed if using np.arange)
import pandas as pd

df = pd.DataFrame({
    'Area': [22, 45, 67, 54, 21],
    1: [54, 36, 34, 35, 27],
    2: [33, 54, 29, 19, 39],
    3: [46, 32, 11, 22, 43],
    4: [23, 14, 14, 45, 22],
    'Select': [4, 1, 3, 2, 3]
})
