df.unique() on whole DataFrame based on a column


Problem description

I have a DataFrame df whose rows contain duplicate values in the Id column:

Index   Id   Type
0       a1   A
1       a2   A
2       b1   B
3       b3   B
4       a1   A
...

When I use:

uniqueId = df["Id"].unique() 

I get a list of unique IDs.

However, how can I apply this filtering to the whole DataFrame so that it keeps its structure but the duplicates (based on "Id") are removed?

Recommended answer

It seems you need DataFrame.drop_duplicates with the subset parameter, which specifies the columns to check for duplicates:

# keep the first occurrence of each Id (the default, keep='first')
df = df.drop_duplicates(subset=['Id'])
print(df)
       Id Type
Index         
0      a1    A
1      a2    A
2      b1    B
3      b3    B
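
To make the difference from unique() concrete, here is a minimal sketch. It assumes the frame contains only the five rows shown in the question (the rows elided by "..." are unknown), and the names unique_ids and deduped are illustrative only.

```python
import pandas as pd

# Assumption: only the five rows visible in the question; elided rows are unknown.
df = pd.DataFrame(
    {"Id": ["a1", "a2", "b1", "b3", "a1"], "Type": ["A", "A", "B", "B", "A"]}
)
df.index.name = "Index"

# Series.unique() returns only the distinct values of one column, as an array.
unique_ids = df["Id"].unique()

# drop_duplicates() returns a DataFrame, so the other columns and the index survive.
deduped = df.drop_duplicates(subset=["Id"])

print(list(unique_ids))
print(deduped)
```

The first call loses the Type column and the index, while the second keeps whole rows and only drops the repeated Id.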


# keep the last occurrence of each Id (applied to the original df)
df = df.drop_duplicates(subset=['Id'], keep='last')
print(df)
       Id Type
Index         
1      a2    A
2      b1    B
3      b3    B
4      a1    A


# drop every row whose Id occurs more than once (applied to the original df)
df = df.drop_duplicates(subset=['Id'], keep=False)
print(df)
       Id Type
Index         
1      a2    A
2      b1    B
3      b3    B
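
The three keep options are alternatives, each meant to run on the original frame. The sketch below compares them side by side without overwriting df. It assumes only the five rows shown in the question, and the variable names first, last, unique_only and masked are illustrative.

```python
import pandas as pd

# Assumption: only the five rows visible in the question; elided rows are unknown.
df = pd.DataFrame(
    {"Id": ["a1", "a2", "b1", "b3", "a1"], "Type": ["A", "A", "B", "B", "A"]}
)
df.index.name = "Index"

# Each call starts from the same original df and returns a new DataFrame.
first = df.drop_duplicates(subset=["Id"])                     # keep first occurrence
last = df.drop_duplicates(subset=["Id"], keep="last")         # keep last occurrence
unique_only = df.drop_duplicates(subset=["Id"], keep=False)   # drop all repeated Ids

# The same selection as `first`, written as a boolean mask with duplicated().
masked = df[~df.duplicated(subset=["Id"])]

print(first.index.tolist())
print(last.index.tolist())
print(unique_only.index.tolist())
print(masked.equals(first))
```

The duplicated() form can be handy when the mask is needed for something else as well, for example to inspect the rows that would be dropped.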
