percentile by COUNT(DISTINCT) with correlated WHERE only works with a view (or without DISTINCT)(COUNT(DISTINCT) 的百分位数与相关 WHERE 仅适用于视图(或没有 DISTINCT))
问题描述
我遇到了一个奇怪的问题,我不知道是我的语法(看起来很简单)还是错误(或者只是不受支持).
I've got a weird one, and I don't know if it's my syntax (which seems straightforward) or a bug (or just unsupported).
这是我的查询有效但不必要的慢:
Here's my query that works but is needlessly slow:
UPDATE table1
SET table1column1 =
(SELECT COUNT(DISTINCT table2column1) FROM table2view WHERE table2column1 <= (SELECT table2column1 FROM table2 WHERE table2.id = table1.id) )
/
(SELECT COUNT(DISTINCT table2column1) FROM table2)
+ (SELECT COUNT(DISTINCT table2column2) FROM table2view WHERE table2column2 <= (SELECT table2column2 FROM table2 WHERE table2.id = table1.id) )
/
(SELECT COUNT(DISTINCT table2column2) FROM table2)
+ (SELECT COUNT(DISTINCT table2column3) FROM table2view WHERE table2column3 <= (SELECT table2column3 FROM table2 WHERE table2.id = table1.id) )
/ (SELECT COUNT(DISTINCT table2column3) FROM table2);
它只是去除重复项后的三个百分位数(table2column1、table2column2 和 table2column3)的总和.
It's just the sum of three percentiles (of table2column1, table2column2, and table2column3) with duplicates removed.
这就是奇怪的地方.我必须使用一个视图来处理带有 WHERE
的子查询,否则它只会 UPDATE
table1
的第一行,并设置其余行的 table1column1
到 0
.table2view
与 table2
完全相同.是的,很奇怪.
Here's where it gets weird. I have to use a view for this to work on the subquery with the WHERE
or it will only UPDATE
the first row of table1
, and set the rest of the rows' table1column1
to 0
. That table2view
is an exact duplicate of table2
. Yeah, weird.
如果我不使用 DISTINCT
,我可以在没有视图的情况下完成.那有意义吗?注意:我必须有 DISTINCT
因为我有很多重复项.
If I don't use DISTINCT
, I can do it without the view. Does that make sense? Note: I have to have DISTINCT
because I have lots of duplicates.
我尝试仅从视图中创建 SELECT
,但这会减慢速度.
I tried making it SELECT
only from the view, but that slowed it down worse.
有谁知道问题是什么以及重新处理这个查询的最佳方法,所以它不会花费这么长时间?它在 TRIGGER
中,并且更新的数据非常随需.
Does anyone know what the problem is and the best way to rework this query so it doesn't take so long? It's in a TRIGGER
, and the updated data is pretty on demand.
非常感谢!
详情
我正在 phpMyAdmin 的命令行中测试速度.
I'm testing the speed in phpMyAdmin's command line.
我很确定降级来自视图,因为视图越多,我使用的实际表越少,它变得越慢.
I'm pretty sure the degradation is coming from the view since the more of the view and the less of the actual table I use, the slower it gets.
当我在没有 DISTINCT
的情况下做一个时,它的速度非常快.
When I do the one without DISTINCT
, it's lightning fast.
仅适用于视图?
好的,所以我只是设置了 table2
的副本.我首先尝试执行原始查询,用副本替换视图.不行.
OK, so I just set up a copy of table2
. I tried first to do the original query substituting the view with the copy. No go.
我尝试使用副本而不是视图执行以下查询.不行.
I tried to do the query below with the copy instead of the view. No go.
希望这些常量的引入能更好地展示我正在尝试做的事情.
Hopefully the introduction of these constants will better show what I'm trying to do.
SET @table2column1_distinct_count = (SELECT COUNT(DISTINCT table2column1) FROM table2);
SET @table2column2_distinct_count = (SELECT COUNT(DISTINCT table2column2) FROM table2);
SET @table2column3_distinct_count = (SELECT COUNT(DISTINCT table2column3) FROM table2);
UPDATE table1, table2
SET table1.table1column1 = (SELECT COUNT(DISTINCT table2column1) FROM table2view WHERE table2column1 <= table2.table2column1) / @table2column1_distinct_count
+ (SELECT COUNT(DISTINCT table2column2) FROM table2view WHERE table2column2 <= table2.table2column2) / @table2column2_distinct_count
+ (SELECT COUNT(DISTINCT table2column3) FROM table2view WHERE table2column3 <= table2.table2column3) / @table2column3_distinct_count
WHERE table1.id = table2.id;
同样,当我使用 table2
而不是 table2view
时,它只会正确更新第一行并设置所有其他行的 table1.table1column1 = 0代码>.
Again, when I use table2
instead of the table2view
, it only updates the first row properly and sets all other rows' table1.table1column1 = 0
.
数学
我正在尝试将 table1.table1column1
= 设置为 table2column1
、table2column2
和 table2column3 的百分位数之和
通过 id
.
I'm trying to set table1.table1column1
= to the sum of the percentiles of table2column1
, table2column2
, and table2column3
by id
.
我通过(计算 table2columnX
的不同值 <= 到当前 table2columnX
)/(不同 table2columnX 的总计数
s).
I do a percentile by (counting the distinct values of a table2columnX
<= to the current table2columnX
) / (the total count of distinct table2columnX
s).
我使用 DISTINCT
来去除过多的重复.
I use DISTINCT
to get rid of the excessive duplicates.
查看
这是视图的 SELECT
.这有帮助吗?
Here's the SELECT
for the view. Does this help?
CREATE VIEW myTable.table2view AS SELECT
table2.table2column1 AS table2column1,
table2.table2column2 AS table2column2,
table2.table2column2 AS table2column3,
FROM table2
GROUP BY table2.id;
视图的 SELECT
中的 GROUP BY
是否有什么特别之处使这项工作(我没有看到)?
Is there something special about the GROUP BY
in the view's SELECT
that makes this work (that I'm not seeing)?
推荐答案
我可能会说查询很慢,因为它在触发器触发时重复访问表.
I would probably say that the query is slow because it is repeatedly accessing the table when the trigger fires.
我不是 SQL 专家,但我曾尝试使用临时表组合查询.您可以查看它是否有助于加快查询速度.我在下面的代码示例中使用了不同但发音相似的列名称.
I am no SQL expert but I have tried to put together a query using temporary tables. You can see if it helps speed up the query. I have used different but similar sounding column names in my code sample below.
在我之前的代码中有一个计算错误.现已更新.
EDIT : There was a calculation error in my earlier code. Updated now.
SELECT COUNT(id) INTO @no_of_attempts from tb2;
-- DROP TABLE IF EXISTS S1Percentiles;
-- DROP TABLE IF EXISTS S2Percentiles;
-- DROP TABLE IF EXISTS S3Percentiles;
CREATE TEMPORARY TABLE S1Percentiles (
s1 FLOAT NOT NULL,
percentile FLOAT NOT NULL DEFAULT 0.00
);
CREATE TEMPORARY TABLE S2Percentiles (
s2 FLOAT NOT NULL,
percentile FLOAT NOT NULL DEFAULT 0.00
);
CREATE TEMPORARY TABLE S3Percentiles (
s3 FLOAT NOT NULL,
percentile FLOAT NOT NULL DEFAULT 0.00
);
INSERT INTO S1Percentiles (s1, percentile)
SELECT A.s1, ((COUNT(B.s1)/@no_of_attempts)*100)
FROM (SELECT DISTINCT s1 from tb2) A
INNER JOIN tb2 B
ON B.s1 <= A.s1
GROUP BY A.s1;
INSERT INTO S2Percentiles (s2, percentile)
SELECT A.s2, ((COUNT(B.s2)/@no_of_attempts)*100)
FROM (SELECT DISTINCT s2 from tb2) A
INNER JOIN tb2 B
ON B.s2 <= A.s2
GROUP BY A.s2;
INSERT INTO S3Percentiles (s3, percentile)
SELECT A.s3, ((COUNT(B.s3)/@no_of_attempts)*100)
FROM (SELECT DISTINCT s3 from tb2) A
INNER JOIN tb2 B
ON B.s3 <= A.s3
GROUP BY A.s3;
-- select * from S1Percentiles;
-- select * from S2Percentiles;
-- select * from S3Percentiles;
UPDATE tb1 A
INNER JOIN
(
SELECT B.tb1_id AS id, (C.percentile + D.percentile + E.percentile) AS sum FROM tb2 B
INNER JOIN S1Percentiles C
ON B.s1 = C.s1
INNER JOIN S2Percentiles D
ON B.s2 = D.s2
INNER JOIN S3Percentiles E
ON B.s3 = E.s3
) F
ON A.id = F.id
SET A.sum = F.sum;
-- SELECT * FROM tb1;
DROP TABLE S1Percentiles;
DROP TABLE S2Percentiles;
DROP TABLE S3Percentiles;
这样做是为了记录每个分数组的百分位数,然后最后只用必要的数据更新 tb1
列,而不是重新计算每个学生行的百分位数.
What this does is that it records the percentile for each score group and then finally just updates the tb1
column with the requisite data instead of recalculating the percentile for each student row.
您还应该索引 s1
、s2
和 s3
列以优化对这些列的查询.
You should also index columns s1
, s2
and s3
for optimizing the queries on these columns.
注意:请根据您的数据库架构更新列名称.另请注意,每个百分位数计算都乘以 100
,因为我相信百分位数通常是这样计算的.
Note: Please update the column names according to your db schema. Also note that each percentile calculation has been multiplied by 100
as I believe that percentile is usually calculated that way.
这篇关于COUNT(DISTINCT) 的百分位数与相关 WHERE 仅适用于视图(或没有 DISTINCT)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持编程学习网!
本文标题为:COUNT(DISTINCT) 的百分位数与相关 WHERE 仅适用于视图(或没有 DISTINCT)
基础教程推荐
- 无法在 ubuntu 中启动 mysql 服务器 2021-01-01
- 将数据从 MS SQL 迁移到 PostgreSQL? 2022-01-01
- ERROR 2006 (HY000): MySQL 服务器已经消失 2021-01-01
- SQL Server 中单行 MERGE/upsert 的语法 2021-01-01
- 使用pyodbc“不安全"的Python多处理和数据库访问? 2022-01-01
- 在 VB.NET 中更新 SQL Server DateTime 列 2021-01-01
- 如何在 SQL Server 的嵌套过程中处理事务? 2021-01-01
- SQL Server 2016更改对象所有者 2022-01-01
- SQL Server:只有 GROUP BY 中的最后一个条目 2021-01-01
- Sql Server 字符串到日期的转换 2021-01-01