Deep copy of Pandas dataframes and dictionaries(Pandas 数据框和字典的深拷贝)
问题描述
I'm creating a small Pandas dataframe:
df = pd.DataFrame(data={'colA': [["a", "b", "c"]]})
I take a deepcopy of that df. I'm not using the Pandas method but general Python, right?
import copy
df_copy = copy.deepcopy(df)
A df_copy.head() gives the following:
Then I put these values into a dictionary:
mydict = df_copy.to_dict()
That dictionary looks like this:
Finally, I remove one item of the list:
mydict['colA'][0].remove("b")
I'm surprized that the values in df_copy are updated. I'm very confused that the values in the original dataframe are updated too! Both dataframes look like this now:
I understand Pandas doesn't really do deepcopy, but this wasn't a Pandas method. My questions are:
1) how can I build a dictionary from a dataframe that doesn't update the dataframe?
2) how can I take a copy of a dataframe which would be completely independent?
thanks for your help!
Cheers, Nicolas
Disclaimer
Notice that putting mutable objects inside a DataFrame can be an antipattern so make sure that you really need it and you understand what you are doing.
Why doesn't your copy independent
When applied on an object, copy.deepcopy is looked up for a _deepcopy_ method of that object, that is called in turn. It's added to avoid copying too much for objects. In the case of a DataFrame instance in version 0.20.0 and above - _deepcopy_ doesn`t work recursively.
Similarly, if you will use DataFrame.copy(deep=True)
deep copy will copy the data, but will not do so recursively. .
How to solve the problem
To take a truly deep copy of a DataFrame containing a list(or other python objects), so that it will be independent - you can use one of the methods below.
df_copy = pd.DataFrame(columns = df.columns, data = copy.deepcopy(df.values))
For a dictionary, you may use same trick:
mydict = pd.DataFrame(columns = df.columns, data = copy.deepcopy(df_copy.values)).to_dict()
mydict['colA'][0].remove("b")
There's also a standard hacky way of deep-copying python objects:
import pickle
df_copy = pickle.loads(pickle.dumps(df))
Feel free to ask for any clarifications, if needed.
这篇关于Pandas 数据框和字典的深拷贝的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持编程学习网!
本文标题为:Pandas 数据框和字典的深拷贝
基础教程推荐
- 使用PyInstaller后在Windows中打开可执行文件时出错 2022-01-01
- Python kivy 入口点 inflateRest2 无法定位 libpng16-16.dll 2022-01-01
- 线程时出现 msgbox 错误,GUI 块 2022-01-01
- 在 Python 中,如果我在一个“with"中返回.块,文件还会关闭吗? 2022-01-01
- Dask.array.套用_沿_轴:由于额外的元素([1]),使用dask.array的每一行作为另一个函数的输入失败 2022-01-01
- 用于分类数据的跳跃记号标签 2022-01-01
- 如何在海运重新绘制中自定义标题和y标签 2022-01-01
- 何时使用 os.name、sys.platform 或 platform.system? 2022-01-01
- 如何让 python 脚本监听来自另一个脚本的输入 2022-01-01
- 筛选NumPy数组 2022-01-01