Problems with ORDER BY RAND() and big tables

Problem description

Hello, I asked a question this morning and then realized that the problem was not where I had been looking (here is the original question).

I have this query to randomly pick records from an address book.

SELECT * FROM address_book ab 
            WHERE 
            ab.source = "PB" AND 
            ab.city_id = :city_id AND 
            pb_campaign_id = :pb_campaign_id AND 
            ab.id NOT IN (SELECT address_book_id FROM calls WHERE calls.address_book_id = ab.id AND calls.status_id IN ("C","NO") OR (calls.status_id IN ("NR","OC") AND TIMESTAMPDIFF(MINUTE,calls.updated_at,NOW()) < 30))
            ORDER BY RAND()
            LIMIT 1;

But I noticed that ORDER BY RAND() takes more than 50 seconds and uses 25-50% of the CPU on large tables (100k+ rows), so I looked for solutions here but did not find anything that worked. Note: the ids are not auto-incrementing, so there may be gaps.

Any ideas?

Recommended answer

I would suggest writing this as:

SELECT *
FROM address_book ab
WHERE ab.source = 'PB' AND
      ab.city_id = :city_id AND
      pb_campaign_id = :pb_campaign_id AND
      NOT EXISTS (SELECT 1
                  FROM calls c
                  WHERE c.address_book_id = ab.id AND  -- correlation now covers both branches
                        ( c.status_id IN ('C', 'NO') OR
                          -- updated within the last 30 minutes, matching the question's TIMESTAMPDIFF(...) < 30
                          (c.status_id IN ('NR', 'OC') AND c.updated_at > NOW() - INTERVAL 30 MINUTE)
                        )
                 )
ORDER BY RAND()
LIMIT 1;

Note that this changes the logic in the correlated subquery so that c.address_book_id = ab.id always applies. In the original query, AND binds more tightly than OR, so the second branch (status 'NR' or 'OC') was not tied to the outer row at all. I suspect that is the cause of the performance problem.
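The precedence point can be checked in isolation. The following is a minimal sketch in which FALSE stands in for a calls row whose address_book_id does not match the outer row, and TRUE for a status/time condition that happens to hold; the column aliases are made up for illustration.

```sql
-- MySQL evaluates AND before OR, so "a AND b OR c" parses as "(a AND b) OR c".
SELECT (FALSE AND TRUE OR TRUE)   AS as_written,     -- 1: the OR branch wins even though the ids differ
       ((FALSE AND TRUE) OR TRUE) AS how_it_parses,  -- 1: same thing with the implicit grouping shown
       (FALSE AND (TRUE OR TRUE)) AS as_intended;    -- 0: the id match is required for both branches
```

With the original grouping, any recently updated 'NR' or 'OC' call from any address satisfies the subquery predicate, regardless of which address_book row is being tested.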

Then, create indexes on:

  • address_book(source, city_id, pb_campaign_id, id)
  • calls(address_book_id, status_id, updated_at)
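As statements, the two indexes might look like the sketch below. The index names are hypothetical, and the column names pb_campaign_id and updated_at are taken from the query in the question, on the assumption that pb_campaign_id is a column of address_book.

```sql
-- Supports the equality filters on address_book; id is included so the
-- correlated lookup can be driven from the index.
CREATE INDEX idx_ab_source_city_campaign
    ON address_book (source, city_id, pb_campaign_id, id);

-- Supports the NOT EXISTS probe: match on address_book_id first,
-- then narrow by status and update time.
CREATE INDEX idx_calls_ab_status_updated
    ON calls (address_book_id, status_id, updated_at);
```

Running EXPLAIN on the SELECT afterwards should show whether the optimizer actually picks these indexes for your data.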

I am guessing that this will be sufficient to improve performance. If a very large number of rows happen to match the conditions, then ORDER BY RAND() itself might still be an issue, because every matching row has to be given a random value and sorted before one is returned.
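If that case does arise, one common alternative (not part of the answer above) is to count the matching rows and then fetch a single row at a random offset. The sketch below assumes MySQL, uses made-up user variables and statement names, and takes the time condition from the question's TIMESTAMPDIFF(...) < 30. Prepared statements are used because LIMIT and OFFSET accept placeholders there but not arbitrary expressions.

```sql
SET @city_id = 1, @pb_campaign_id = 1;   -- hypothetical parameter values

-- Shared filter, built once so the count and the pick use the same conditions.
SET @filter = CONCAT(
  ' FROM address_book ab',
  ' WHERE ab.source = ''PB'' AND ab.city_id = ? AND pb_campaign_id = ?',
  ' AND NOT EXISTS (SELECT 1 FROM calls c',
  '   WHERE c.address_book_id = ab.id',
  '   AND (c.status_id IN (''C'', ''NO'')',
  '     OR (c.status_id IN (''NR'', ''OC'')',
  '         AND c.updated_at > NOW() - INTERVAL 30 MINUTE)))');

-- Step 1: count the candidate rows.
SET @sql = CONCAT('SELECT COUNT(*) INTO @n', @filter);
PREPARE count_stmt FROM @sql;
EXECUTE count_stmt USING @city_id, @pb_campaign_id;
DEALLOCATE PREPARE count_stmt;

-- Step 2: choose an integer offset in [0, @n - 1] (0 when nothing matches).
SET @off = CAST(FLOOR(RAND() * @n) AS UNSIGNED);

-- Step 3: fetch the single row at that offset.
SET @sql = CONCAT('SELECT ab.*', @filter, ' ORDER BY ab.id LIMIT 1 OFFSET ?');
PREPARE pick_stmt FROM @sql;
EXECUTE pick_stmt USING @city_id, @pb_campaign_id, @off;
DEALLOCATE PREPARE pick_stmt;
```

The server still has to step over the skipped rows, so this is not free, but it avoids generating a random value for every match and sorting them all. Rows that change between the count and the pick can shift the result slightly, which is usually acceptable for this kind of sampling.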
