Large primary key: 1+ billion rows MySQL + InnoDB?
Question
I was wondering whether InnoDB would be the best storage engine for this table. The table contains a single column, the primary key, and will gain an estimated 816k rows a day, so it will get very large very quickly. I'm also working on a file-based storage approach (would that be faster?). The table will store the Twitter IDs that have already been processed.
Also, is there any estimate of the memory usage of a SELECT MIN(id) statement? Any other ideas are greatly appreciated!
Recommended answer
The only definitive answer is to try both, test, and see what happens.
Generally, MyISAM is faster for writes and reads, but not both at the same time. When you write to a MyISAM table, the entire table is locked until the insert completes. InnoDB has more overhead but uses row-level locking, so reads and writes can happen concurrently without the problems that MyISAM's table locking incurs.
However, your problem, if I understand it correctly, is a little different. With only one column, and that column being the primary key, the important consideration is the different ways in which MyISAM and InnoDB handle primary key indexes.
In MyISAM, the primary key index is just like any other secondary index. Internally each row has a row id, and the index nodes just point to the row ids in the data pages. A primary key index is not handled any differently from any other index.
In InnoDB, however, primary keys are clustered: the index stays attached to the data pages and keeps the row contents in physically sorted order on disk according to the primary key (but only within a single data page; the pages themselves can be scattered in any order).
This being the case, I would expect that InnoDB might have an advantage, in that MyISAM would essentially have to do double work: write the integer once in the data pages, and then write it again in the index pages. InnoDB wouldn't do this; the primary key index would be identical to the data pages, so it would only have to write once. It would only have to manage the data in one place, where MyISAM would needlessly have to manage two copies.
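As a toy illustration of the two layouts described above, the following Python sketch models a heap of rows with a separate sorted index (MyISAM-style) next to a single sorted structure (clustered, InnoDB-style). It is not how either engine is actually implemented; the class names HeapWithIndex and ClusteredIndex and the stored_key_copies method are hypothetical, chosen only to make the "two copies versus one" point concrete.

```python
import bisect


class HeapWithIndex:
    """Toy MyISAM-style layout: rows live in a heap, and a separate
    sorted index maps each key to its position in that heap."""

    def __init__(self):
        self.heap = []   # row data, in insertion order
        self.index = []  # sorted list of (key, heap position) pairs

    def insert(self, key):
        # (key,) sorts just before any (key, position) pair with the same key.
        pos = bisect.bisect_left(self.index, (key,))
        if pos < len(self.index) and self.index[pos][0] == key:
            return False  # duplicate primary key, nothing written
        self.heap.append(key)                                # first write
        self.index.insert(pos, (key, len(self.heap) - 1))    # second write
        return True

    def stored_key_copies(self):
        return len(self.heap) + len(self.index)


class ClusteredIndex:
    """Toy InnoDB-style layout: the sorted index *is* the row data."""

    def __init__(self):
        self.keys = []  # sorted keys; no separate heap

    def insert(self, key):
        pos = bisect.bisect_left(self.keys, key)
        if pos < len(self.keys) and self.keys[pos] == key:
            return False  # duplicate primary key, nothing written
        self.keys.insert(pos, key)  # the only write
        return True

    def stored_key_copies(self):
        return len(self.keys)


if __name__ == "__main__":
    heap_table, clustered_table = HeapWithIndex(), ClusteredIndex()
    for tweet_id in (5, 3, 9, 3):  # the second 3 is rejected as a duplicate
        heap_table.insert(tweet_id)
        clustered_table.insert(tweet_id)
    print(heap_table.stored_key_copies(), clustered_table.stored_key_copies())
```

For a one-column table, the heap-plus-index model stores every key twice, while the clustered model stores it once. Whether that turns into a measurable speed difference in practice is something only a benchmark can settle.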
For either storage engine, doing something like min() or max() on an indexed column should be trivial, as should checking whether a number exists in the index. Since the table has only one column, no bookmark lookups would even be necessary, as the data would be represented entirely within the index itself. This should be a very efficient index.
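To make those queries concrete, here is a minimal runnable sketch. It uses Python's built-in sqlite3 module as a stand-in, purely so the example runs without a MySQL server; it says nothing about MySQL's performance. The table name processed_ids and the helper already_processed are hypothetical. Note the unquoted column name: MIN('id') with single quotes takes the minimum of the string literal 'id', not of the column.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for
# runnability). A MySQL equivalent of the table might look like:
#   CREATE TABLE processed_ids (id BIGINT UNSIGNED NOT NULL PRIMARY KEY)
#   ENGINE=InnoDB;
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_ids (id INTEGER PRIMARY KEY)")

# Made-up 64-bit IDs for illustration. INSERT OR IGNORE is SQLite syntax;
# MySQL spells it INSERT IGNORE.
sample_ids = [1500000000000000003, 1500000000000000001, 1500000000000000002]
conn.executemany(
    "INSERT OR IGNORE INTO processed_ids (id) VALUES (?)",
    [(i,) for i in sample_ids],
)

# Smallest stored ID: the column name is unquoted.
smallest = conn.execute("SELECT MIN(id) FROM processed_ids").fetchone()[0]

# With single quotes the argument is a string literal, so the result is
# the text 'id' rather than a number.
quoted = conn.execute("SELECT MIN('id') FROM processed_ids").fetchone()[0]


def already_processed(connection, tweet_id):
    """Return True if tweet_id is present; a primary-key lookup."""
    row = connection.execute(
        "SELECT 1 FROM processed_ids WHERE id = ?", (tweet_id,)
    ).fetchone()
    return row is not None


if __name__ == "__main__":
    print(smallest, quoted, already_processed(conn, 1500000000000000002))
```

On MySQL, running EXPLAIN on the MIN(id) query is a reasonable way to check that it is resolved from the index instead of a scan.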
I also wouldn't be all that worried about the size of the table. When a row is only one integer wide, you can fit a huge number of rows per index/data page.
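A rough back-of-the-envelope estimate, as a sketch only: the 816k rows/day figure comes from the question and 16 KB is InnoDB's default page size, but the 8-byte ID width (BIGINT), the per-row overhead, and the page fill fraction below are assumptions picked for illustration, not measured values.

```python
import math

ROWS_PER_DAY = 816_000        # estimate given in the question
TARGET_ROWS = 1_000_000_000   # "1+ billion rows" from the title
ID_BYTES = 8                  # assumption: IDs stored as BIGINT
PAGE_BYTES = 16 * 1024        # InnoDB's default page size (16 KB)
ASSUMED_ROW_OVERHEAD_BYTES = 18  # assumption: record header + hidden fields
ASSUMED_PAGE_FILL = 0.75         # assumption: pages are about 3/4 full


def days_to_reach(target_rows, rows_per_day):
    """Whole days needed to accumulate target_rows."""
    return math.ceil(target_rows / rows_per_day)


def rows_per_page(page_bytes, row_bytes, fill):
    """Rows that fit in the usable part of one page."""
    return int(page_bytes * fill) // row_bytes


def estimated_size_gib(total_rows, per_page, page_bytes):
    """Total size in GiB if every page holds per_page rows."""
    pages = math.ceil(total_rows / per_page)
    return pages * page_bytes / 2**30


if __name__ == "__main__":
    row_bytes = ID_BYTES + ASSUMED_ROW_OVERHEAD_BYTES
    per_page = rows_per_page(PAGE_BYTES, row_bytes, ASSUMED_PAGE_FILL)
    print("days to 1 billion rows:", days_to_reach(TARGET_ROWS, ROWS_PER_DAY))
    print("rows per page:", per_page)
    print("approx. size in GiB:",
          round(estimated_size_gib(TARGET_ROWS, per_page, PAGE_BYTES), 1))
```

Under these assumptions a billion rows takes over three years to accumulate and lands in the tens of gigabytes, which is consistent with the point above that each page holds hundreds of rows. Changing the assumed constants changes the result, so measure on real data before relying on it.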