Batch Dumping Statistics Delta

Background

Recently, we have been tackling the challenge of supporting 3 million tables within a single TiDB cluster. One of the most significant hurdles we’ve faced is optimizing the performance of statistics collection. In its current implementation, TiDB gathers basic table information from all servers and consolidates it into a single system table. While functional, this approach becomes highly inefficient when managing millions of tables, consuming excessive CPU and taking a considerable amount of time....

December 14, 2024 · 8 min · Rustin liu

Simplifying TiDB Statistics Collection: Unifying Concurrency Controls

Background

TiDB provides the ANALYZE TABLE table_name command to generate statistics for tables. When analyzing partitioned tables, TiDB processes each partition independently and in parallel. Once analysis completes, TiDB aggregates the individual partition statistics into a single global statistics object for the entire table.

The ANALYZE TABLE command has two concurrency-related parameters:

- tidb_build_stats_concurrency: Determines how many partitions can be processed simultaneously during statistics collection
- tidb_analyze_partition_concurrency: Controls parallel workers for saving partition statistics

However, these parameters have several drawbacks:...

November 18, 2024 · 6 min · Rustin liu
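
As a rough illustration of the flow described in the second post's background (analyze each partition independently and in parallel, then aggregate into one global statistics object), here is a minimal Python sketch. It is not TiDB's implementation: `PartitionStats`, `analyze_partition`, `analyze_table`, and the `concurrency` parameter are hypothetical names, and real statistics involve far more than the two counters used here.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class PartitionStats:
    # Hypothetical, heavily simplified stand-in for per-partition statistics.
    row_count: int
    null_count: int


def analyze_partition(rows):
    """Collect toy statistics for one partition (a list of values, None = NULL)."""
    return PartitionStats(
        row_count=len(rows),
        null_count=sum(1 for v in rows if v is None),
    )


def analyze_table(partitions, concurrency):
    """Analyze partitions in parallel, then merge them into one global result.

    `concurrency` is an illustrative worker limit, not a TiDB system variable.
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        per_partition = list(pool.map(analyze_partition, partitions))
    # Merge step: global statistics are derived from the partition results.
    return PartitionStats(
        row_count=sum(s.row_count for s in per_partition),
        null_count=sum(s.null_count for s in per_partition),
    )


partitions = [[1, 2, None], [4, None], [6, 7, 8, 9]]
global_stats = analyze_table(partitions, concurrency=2)
```

In this sketch a single limit bounds the parallel work, and the merge runs only after every partition has finished.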