Extract Business Days in Time Series using Python/Pandas
Problem description
I am working with high-frequency time series data and I would like to extract all the business days from it. My observations are one second apart, so there are 86,400 per day, and my data set is spread over 31 days (so there are 2,678,400 observations!).
Here is part of my data:
In[1]: ts
Out[1]:
2013-01-01 00:00:00 0.480928
2013-01-01 00:00:01 0.480928
2013-01-01 00:00:02 0.483977
2013-01-01 00:00:03 0.486725
2013-01-01 00:00:04 0.486725
...
2013-01-31 23:59:56 0.451630
2013-01-31 23:59:57 0.451630
2013-01-31 23:59:58 0.451630
2013-01-31 23:59:59 0.454683
Freq: S, Length: 2678400
What I would like to do is create a new time series that consists of the business days from this month, keeping all of the per-second observations that belong to them. For example, if 2013-01-02 (Wed) through 2013-01-04 (Fri) are the first business days of the first week in January, then:
2013-01-02 00:00:00 0.507477
2013-01-02 00:00:01 0.501373
...
2013-01-03 00:00:00 0.489778
2013-01-03 00:00:01 0.489778
...
2013-01-04 23:59:58 0.598115
2013-01-04 23:59:59 0.598115
Freq: S, Length: 259200
So it will of course exclude all the data on Sat 2013-01-05 and Sun 2013-01-06, since these are weekend days, and so on.
I tried some pandas built-in commands, but couldn't find the right one, since they aggregate by day without taking into account that each day contains many sub-observations. That is, there is a value for each second, and these values should not be averaged, just grouped together into a new series.
For example, I tried:
ts.asfreq(BDay())
--> finds the business days but averages over each day
ts.resample()
--> you must define 'how' (mean, max, min...)
ts.groupby(lambda x : x.weekday)
--> not that either!
ts = pd.Series(df, index = pd.bdate_range(start = '2013/01/01 00:00:00', end = '2013/01/31 23:59:59', freq = 'S'))
--> df because the original data is a DataFrame. Using pd.bdate_range didn't help, since df and the index must have the same dimension.
I searched the pandas documentation and googled, but could not find a clue...
Does anybody have an idea?
I would really appreciate your help!
Thanks!
P.S. I would rather not use loops for this, since my data set is very large (I also have other months to analyse).
Recommended answer
Unfortunately this is a little slow, but it should at least give the answer you are looking for.
import pandas as pd
# Create an index of just the date portion of your index (this is the slow step)
ts_days = pd.to_datetime(ts.index.date)
# Create a range of business days over that period
bdays = pd.bdate_range(start=ts.index[0].date(), end=ts.index[-1].date())
# Filter the series to just those days contained in the business day range
ts = ts[ts_days.isin(bdays)]
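As a possible alternative to the slow date-extraction step, here is a minimal sketch that builds the same filter from the index's dayofweek attribute. It is not part of the original answer. It assumes that "business day" simply means Monday to Friday, which is also what pd.bdate_range produces by default. The sample series is hypothetical and uses one observation per minute rather than per second, only to keep the example small.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one observation per minute over January 2013
# (coarser than the per-second data in the question, to keep it small).
idx = pd.date_range("2013-01-01", "2013-01-31 23:59:00", freq="min")
ts = pd.Series(np.arange(len(idx), dtype=float), index=idx)

# dayofweek is 0 for Monday ... 6 for Sunday, so "< 5" keeps Monday to Friday.
weekday_mask = ts.index.dayofweek < 5
ts_business = ts[weekday_mask]

# Cross-check against the bdate_range approach, using normalize()
# (timestamps floored to midnight) instead of the .date step.
bdays = pd.bdate_range(start=ts.index[0].normalize(), end=ts.index[-1].normalize())
ts_check = ts[ts.index.normalize().isin(bdays)]

# Both filters select exactly the same observations.
assert ts_business.equals(ts_check)
```

Neither filter knows about public holidays, so 2013-01-01 is kept here even though the question treats 2013-01-02 as the first business day. If holidays matter, they would have to be excluded separately.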