TSQL Time Series Pattern Data Mining
Problem description
Take a SQL table with the following fields:
Id, TimeStamp, Item, UserId
I would like to determine the most common sequences of Item for a UserId in a session. A session would simply be defined by a time threshold (i.e. if there are no entries for X minutes, any future entries would be grouped into a new session).
Ideally, the sequence of Items could have a sort of fuzzy grouping, where sequences that differ in only one or two places would still be counted as the same and grouped together.
Does anyone know how I might tackle this problem in SQL?
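Update:
To clarify, let's assume the Items are grocery store aisles. I have a month of people going to the grocery store. The basic question is which aisles people use and in what order. Do they most often go 1,2,3 or 1,2,1,3,4?
(Right now I am curious about the paths of users on our sites, but you know, a grocery store is more visual.)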
Update 2:
Here is a simple case:
CREATE Table #StoreActivity
(
id int,
CreationDate datetime ,
Isle int,
UserId int
)
Insert INTO #StoreActivity
Values
(1, CAST('12-1-2011 03:10:01' AS Datetime), 1, 2222),
(2, CAST('12-1-2011 03:10:07' AS Datetime), 1, 1111),
(3, CAST('12-1-2011 03:10:12' AS Datetime), 2, 2222),
(4, CAST('12-1-2011 04:10:01' AS Datetime), 1, 2222),
(5, CAST('12-1-2011 04:10:23' AS Datetime), 2, 2222)
Select * from #StoreActivity
DROP Table #StoreActivity
/* So with the above data, we have 2 sequences if we declare a session or visit dead if there is no activity for a minute : `1,2` (With a count of 2), and `1` (with a count of 1)*/
Recommended answer
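WITH q AS
(
SELECT *,
-- rn: position of the row within the user's whole history
ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY TimeStamp, Id) AS rn,
-- rnd: position of the row among the user's rows for the same Item
ROW_NUMBER() OVER (PARTITION BY UserId, Item ORDER BY TimeStamp, Id) AS rnd
FROM mytable
)
SELECT *,
-- constant across consecutive rows with the same UserId and Item;
-- only meaningful together with UserId and Item
rnd - rn AS sequence
FROM q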
The sequence column will be shared among all records in a sequence for a given UserId. You can group on it or do whatever you like.
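The sequence column above does not yet apply the time threshold from the question. One way to add it, shown here only as a sketch and not as part of the original answer, is to flag each row that starts a new session and take a running total of those flags. The example runs against the #StoreActivity table from Update 2 (before its DROP Table statement). The names @GapSeconds, IsNewSession, SessionNo and IslePath are made up for illustration, and the query assumes SQL Server 2017 or later, since LAG needs 2012+ and STRING_AGG needs 2017+.

```sql
-- Hypothetical threshold: a gap of more than 60 seconds starts a new session.
DECLARE @GapSeconds int = 60;

WITH flagged AS
(
    SELECT  id, UserId, Isle, CreationDate,
            -- 1 when the row opens a new session for the user, else 0.
            -- The user's first row has no previous row, so LAG returns NULL,
            -- the comparison is unknown, and the ELSE branch yields 1.
            CASE
                WHEN DATEDIFF(SECOND,
                              LAG(CreationDate) OVER (PARTITION BY UserId
                                                      ORDER BY CreationDate, id),
                              CreationDate) <= @GapSeconds
                THEN 0
                ELSE 1
            END AS IsNewSession
    FROM    #StoreActivity
),
sessions AS
(
    SELECT  *,
            -- Running total of the flags numbers the sessions per user.
            SUM(IsNewSession) OVER (PARTITION BY UserId
                                    ORDER BY CreationDate, id
                                    ROWS UNBOUNDED PRECEDING) AS SessionNo
    FROM    flagged
),
paths AS
(
    SELECT  UserId, SessionNo,
            -- Collapse each session into an ordered, comma-separated path.
            STRING_AGG(CAST(Isle AS varchar(10)), ',')
                WITHIN GROUP (ORDER BY CreationDate, id) AS IslePath
    FROM    sessions
    GROUP BY UserId, SessionNo
)
SELECT  IslePath, COUNT(*) AS SessionCount
FROM    paths
GROUP BY IslePath
ORDER BY SessionCount DESC;
```

On the sample rows this should give the result the question expects: the path 1,2 with a count of 2 and the path 1 with a count of 1. The sketch only counts exact matches; the fuzzy grouping mentioned in the question is not covered here.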