Constructing a co-occurrence matrix in python pandas



Question


I know how to do this in R. But is there any function in pandas that transforms a DataFrame into an n×n co-occurrence matrix containing the counts of how often each pair of aspects co-occurs?

For example, given the DataFrame df:

import pandas as pd

# Columns are listed in the order the output shows them (pandas keeps dict insertion order).
# The values are strings, so they must be converted to int before any arithmetic.
df = pd.DataFrame({'TFD'   : ['AA', 'SL', 'BB', 'D0', 'Dk', 'FF'],
                   'Dop'   : ['1', '0', '1', '0', '1', '1'],
                   'Snack' : ['1', '0', '1', '1', '0', '0'],
                   'Trans' : ['1', '1', '1', '0', '0', '1']}).set_index('TFD')

print(df)

    Dop Snack Trans
TFD                
AA    1     1     1
SL    0     0     1
BB    1     1     1
D0    0     1     0
Dk    1     0     0
FF    1     0     1

the desired result is:

      Dop Snack Trans
Dop     0     2     3
Snack   2     0     2
Trans   3     2     0

Since the matrix is symmetric about the diagonal, I guess there is a way to optimize the code.
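A minimal sketch of that symmetry idea, assuming the columns hold 0/1 values (possibly as strings, as in df above): count each column pair only once with itertools.combinations and write the count into both mirrored cells. The function name cooccurrence_upper is hypothetical. With only a few columns the saving is small, and a single vectorized matrix product is likely faster than this Python-level loop.

```python
from itertools import combinations

import pandas as pd


def cooccurrence_upper(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: count co-occurrences per column pair once, then mirror.

    Assumes df holds 0/1 values (ints or the strings '0'/'1').
    """
    x = df.astype(int)
    cols = list(x.columns)
    # All-zero square frame; the diagonal is never written, so it stays 0
    out = pd.DataFrame(0, index=cols, columns=cols)
    for a, b in combinations(cols, 2):
        # A row contributes 1 only when both items are 1 in that row
        count = int((x[a] * x[b]).sum())
        out.loc[a, b] = count
        out.loc[b, a] = count  # mirror across the diagonal
    return out


df = pd.DataFrame({'TFD'   : ['AA', 'SL', 'BB', 'D0', 'Dk', 'FF'],
                   'Dop'   : ['1', '0', '1', '0', '1', '1'],
                   'Snack' : ['1', '0', '1', '1', '0', '0'],
                   'Trans' : ['1', '1', '1', '0', '0', '1']}).set_index('TFD')

result = cooccurrence_upper(df)
print(result)
```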

Solution

It's simple linear algebra: you multiply the transpose of the matrix by the matrix itself (your example contains strings, so don't forget to convert them to integers first):

>>> df_asint = df.astype(int)
>>> coocc = df_asint.T.dot(df_asint)
>>> coocc
       Dop  Snack  Trans
Dop      4      2      3
Snack    2      3      2
Trans    3      2      4
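Why the product counts co-occurrences: entry (i, j) of XᵀX is the sum over rows r of X[r, i]·X[r, j], and each term is 1 only when both items appear in row r. A small NumPy sketch, with the example's values typed in by hand (columns in the order Dop, Snack, Trans), checks this against a brute-force count. The diagonal entries are simply each column's own total, which is why they are 4, 3 and 4 above.

```python
import numpy as np

# Rows are observations (AA, SL, BB, D0, Dk, FF); columns are Dop, Snack, Trans
X = np.array([[1, 1, 1],
              [0, 0, 1],
              [1, 1, 1],
              [0, 1, 0],
              [1, 0, 0],
              [1, 0, 1]])

# C[i, j] = sum over rows r of X[r, i] * X[r, j]
C = X.T @ X

# Brute-force check: count the rows in which both column i and column j are 1
n = X.shape[1]
brute = np.array([[np.sum((X[:, i] == 1) & (X[:, j] == 1)) for j in range(n)]
                  for i in range(n)])

assert (C == brute).all()
print(C)
```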

If, as in the R answer, you want to reset the diagonal to zero, you can use NumPy's fill_diagonal:

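>>> import numpy as np
>>> arr = coocc.to_numpy(copy=True)  # writable copy: under pandas Copy-on-Write, .values may be read-only
>>> np.fill_diagonal(arr, 0)
>>> coocc = pd.DataFrame(arr, index=coocc.index, columns=coocc.columns)
>>> coocc
       Dop  Snack  Trans
Dop      0      2      3
Snack    2      0      2
Trans    3      2      0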
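If you would rather not modify an array in place, one non-mutating alternative (not part of the original answer) is DataFrame.mask with a boolean identity matrix from np.eye: entries where the condition is True are replaced by 0 and a new frame is returned, leaving coocc untouched. This sketch builds the example frame with integers directly, so no astype step is needed.

```python
import numpy as np
import pandas as pd

# Same example data, already as integers
df = pd.DataFrame({'Dop'   : [1, 0, 1, 0, 1, 1],
                   'Snack' : [1, 0, 1, 1, 0, 0],
                   'Trans' : [1, 1, 1, 0, 0, 1]},
                  index=pd.Index(['AA', 'SL', 'BB', 'D0', 'Dk', 'FF'], name='TFD'))

coocc = df.T.dot(df)

# Boolean matrix that is True on the diagonal and False elsewhere
on_diag = np.eye(len(coocc), dtype=bool)

# mask() replaces entries where the condition is True and returns a new DataFrame
coocc_zero = coocc.mask(on_diag, other=0)
print(coocc_zero)
```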
