Querying Large Dataset on Join (15+ million rows)

Problem description

I am trying to join two tables, products and products_markets. While products has under a million records, products_markets has close to 20 million. The data has been changed, so there might be a typo or two in the CREATE TABLE statements:

CREATE TABLE `products_markets` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `product_id` int(10) unsigned NOT NULL,
  `country_code_id` int(10) unsigned NOT NULL,
  `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  UNIQUE KEY `unique_index` (`product_id`,`country_code_id`)
) ENGINE=InnoDB AUTO_INCREMENT=21052102 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

CREATE TABLE `products` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `manufacturer_id` int(10) unsigned NOT NULL,
  `department_id` int(10) unsigned NOT NULL,
  `code` varchar(100) COLLATE utf8mb4_unicode_ci NOT NULL,
  `popularity` int(11) DEFAULT NULL,
  `name` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
  `value` bigint(20) unsigned NOT NULL,
  `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  UNIQUE KEY `products_code_unique` (`code`),
  KEY `products_department_id_foreign` (`department_id`),
  KEY `products_manufacturer_id_foreign` (`manufacturer_id`),
  CONSTRAINT `products_department_id_foreign`
       FOREIGN KEY (`department_id`) REFERENCES `departments` (`id`),
  CONSTRAINT `products_manufacturer_id_foreign`
       FOREIGN KEY (`manufacturer_id`) REFERENCES `manufacturers` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=731563 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

I am trying to return the 50 most popular products available in a specific country, and I am seeing query times of about 50 seconds, which seems higher than expected.

I've tried a few different queries with no success:

select  `products_markets`.`product_id`
    from  products_markets
    left join  
        ( SELECT  products.id, products.popularity
            from  products
        ) p  ON p.id = products_markets.product_id
    where products_markets.country_code_id = 121
    order by  `popularity` desc, `p`.`id` asc
    limit  50

and

select  `products`.*
    from  `products`
    where  products.id in (
        SELECT  product_id
            from  products_markets
            where  products_markets.country_code_id = 121
    )
    group by  `products`.`name`, `products`.`manufacturer_id`
    order by  `popularity` desc, `products`.`id` asc
    limit  50

The EXPLAIN output for this query is:

id  select_type  table             type  possible_keys  key           key_len  ref               rows    Extra
1   PRIMARY      products          ALL   PRIMARY        NULL          NULL     NULL              623848  Using temporary; Using filesort
1   PRIMARY      products_markets  ref   unique_index   unique_index  4        main.products.id  14      Using where; Using index; FirstMatch(products)
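
Reading the plan above: the products row has type ALL with roughly 624,000 rows and "Using temporary; Using filesort", so the whole products table is scanned, grouped, and sorted before the LIMIT applies. What follows is a minimal sketch of one variant to compare against, not a confirmed fix. It assumes MySQL 8.0 or later (earlier versions parse DESC in an index definition but ignore it), it assumes the GROUP BY can be dropped (which changes the result if several products share a name and manufacturer), and the index name is hypothetical.

```sql
-- Hypothetical index matching ORDER BY popularity DESC, id ASC.
-- Assumes MySQL 8.0+; older versions accept but ignore DESC here.
ALTER TABLE products
    ADD INDEX products_popularity_id (popularity DESC, id ASC);

-- Same country filter as the IN (...) query, written with EXISTS and
-- without GROUP BY, so rows can be read in index order until 50 match.
EXPLAIN
SELECT p.*
FROM products AS p
WHERE EXISTS (
    SELECT 1
    FROM products_markets AS pm
    WHERE pm.product_id = p.id
      AND pm.country_code_id = 121
)
ORDER BY p.popularity DESC, p.id ASC
LIMIT 50;
```

If the new plan still shows "Using filesort" on products, the optimizer did not choose the index for ordering, and the sketch would need adjusting for the actual server version and data.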

One option I am entertaining is splitting products_markets into a separate table for each country to reduce the number of rows each query has to touch. I've tried adding more memory to the server without much success. Can anyone identify anything glaringly wrong with the database design or the query?
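
A lighter experiment than per-country tables is a secondary index that leads with country_code_id. The existing unique_index leads with product_id, so it generally cannot be used to find all rows for one country directly. The sketch below is an assumption-laden starting point: the index name is hypothetical, and whether the optimizer uses the index depends on how many rows each country has.

```sql
-- Hypothetical index: groups each country's rows together and covers
-- product_id, so a per-country lookup can be answered from the index.
ALTER TABLE products_markets
    ADD INDEX products_markets_country_product (country_code_id, product_id);

-- Check how large one country's slice is before deciding on a split.
SELECT COUNT(*) AS rows_for_country
FROM products_markets
WHERE country_code_id = 121;

-- Inspect whether the per-country lookup now uses the new index.
EXPLAIN
SELECT product_id
FROM products_markets
WHERE country_code_id = 121;
```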

What other options are available to make this query a fraction of its current ~50 seconds?