Performance of Inserts on Partitions – MySQL 5.6 v/s MySQL 5.7

Recently, I was discussing with one of my colleagues about how insert statement performs for MySQL partitioned tables. General prediction is that it should be slower than for non-partitioned tables, but how much that we didn’t know. So, I thought let’s test with different types of partitions (i.e range, list and hash) and also with different number of partitions and check how’s performance. As people says, MySQL 5.7 is must faster than old one, so I also tested partitions with it.

So, I took simple table with 3.2M records on Centos 6.7 VM (4 core with 2GB RAM) with default my.cnf settings and then created tables for range, list and hash partitioning with 5,25,50 and 100 partitions. i.e with 5 partition (range and list), the table structures were like

CREATE TABLE emp_range_5(
id int,
fname varchar (30),
lname varchar (30),
hired_date date not null,
separated_date date not null,
job_code int,
store_id int
)
PARTITION BY RANGE (store_id) (
PARTITION p0 VALUES LESS THAN (5),
PARTITION p1 VALUES LESS THAN (25),
PARTITION p2 VALUES LESS THAN (50),
PARTITION p3 VALUES LESS THAN (75),
PARTITION p4 VALUES LESS THAN MAXVALUE
);
CREATE TABLE emp_list_5(
id int, 
fname varchar(30),
lname varchar(30),
hired_date date not null,
separated_date date not null,
job_code int,
store_id int
)
PARTITION BY LIST(store_id) (
PARTITION pNorth VALUES IN (3,5,6,9,17,21,22,23,24,25,26,27,28,29,30,31),
PARTITION pEast VALUES IN (1,2,10,11,19,20,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50),
PARTITION pWest VALUES IN (4,12,13,14,18,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70),
PARTITION pSouth VALUES IN (86,87,88,89,90,91,92,93,94,95,96,97,98,99,100),
PARTITION pCentral VALUES IN (7,8,15,16,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85)
);

One by one, I have inserted 3.2M records to each table with different number of partitions and types like,

mysql> INSERT INTO emp_range_5 SELECT * FROM emp_new;
Query OK, 3276800 rows affected (23.15 sec)
Records: 3276800 Duplicates: 0 Warnings: 0

and get the below result.

p4

p2

p3

If we check MySQL 5.6 test result, range type is better than list and hash partitions, but when we add more and more partitions it’s taking more time to insert records than into non-partitioned table. So at least this test proves that “Partitions are slow”. Here, partitioned tables are slower than non-partitioned tables for inserts because different types of partitioning and number of partitions add different overhead. With huge tables, opposite trend may appear where partitions are fast because insert into one huge b-tree can be slower than search and insert into 100 times smaller b-tree. Need to do more research on this.

If we compare MySQL 5.6 test result with 5.7, surprisingly it’s more slow but non-partition table is faster than with MySQL 5.6. So we can consider that 5.7 is faster than 5.6 for non-partition tables but not the ones with partitions.

For MySQL 5.6, I was bit curious about why Inserts are slow with partitions so I’ve tried to figure out with different ways. First I’ve checked profile data from Performance_Schema (P_S), i.e. for below query

mysql> INSERT INTO emp_range_5 SELECT * FROM emp_new;
Query OK, 3276800 rows affected (23.12 sec)
Records: 3276800 Duplicates: 0 Warnings: 0
mysql> SELECT EVENT_ID, TRUNCATE(TIMER_WAIT/1000000000000,6) as Duration, SQL_TEXT FROM performance_schema.events_statements_history_long WHERE SQL_TEXT like '%emp%';
+----------+-----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EVENT_ID | Duration | SQL_TEXT |
+----------+-----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 256 | 0.000328 | SELECT EVENT_ID, TRUNCATE(TIMER_WAIT/1000000000000,6) as Duration, SQL_TEXT FROM performance_schema.events_statements_history_long WHERE SQL_TEXT like '%emp%' |
| 333 | 22.837071 | INSERT INTO emp_range_5 SELECT * FROM emp_new |
+----------+-----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set (0.02 sec)
mysql> SELECT event_name AS Stage, TRUNCATE(TIMER_WAIT/1000000000000,6) AS Duration FROM performance_schema.events_stages_history_long WHERE NESTING_EVENT_ID=333;
+--------------------------------+-----------+
| Stage | Duration |
+--------------------------------+-----------+
| stage/sql/init | 0.000052 |
| stage/sql/checking permissions | 0.000003 |
| stage/sql/checking permissions | 0.000001 |
| stage/sql/Opening tables | 0.001760 |
| stage/sql/init | 0.001701 |
| stage/sql/System lock | 0.000017 |
| stage/sql/optimizing | 0.000004 |
| stage/sql/statistics | 0.000010 |
| stage/sql/preparing | 0.000008 |
| stage/sql/executing | 0.000000 |
| stage/sql/Sending data | 23.064697 |
| stage/sql/end | 0.000027 |
| stage/sql/query end | 0.057433 |
| stage/sql/closing tables | 0.000024 |
| stage/sql/freeing items | 0.001319 |
| stage/sql/logging slow query | 0.000001 |
| stage/sql/cleaning up | 0.000002 |
+--------------------------------+-----------+
17 rows in set (0.01 sec)

So “Sending data” is main task which is taking more time and that is expected as we are doing “insert into …select * from”. But why? Then I have tried to check in P_S  if we really have any waits instrumented in detailed enough level so it can apply to P_S for partitioning and we can check. I ran below query.

mysql> select event_name, count_star, sum_timer_wait
-> from performance_schema.events_waits_summary_global_by_event_name
-> where count_star > 0
-> order by sum_timer_wait desc limit 10;
+--------------------------------------+------------+-------------------+
| event_name | count_star | sum_timer_wait |
+--------------------------------------+------------+-------------------+
| idle | 550 | 39826303901000000 |
| wait/io/table/sql/handler | 78522954 | 330357979232652 |
| wait/io/file/innodb/innodb_data_file | 555776 | 251974879365594 |
| wait/io/file/innodb/innodb_log_file | 18619 | 61417239253993 |
| wait/io/file/myisam/dfile | 3125 | 407945499647 |
| wait/io/file/sql/FRM | 2788 | 238943294597 |
| wait/io/file/sql/partition | 643 | 46937210061 |
| wait/io/file/myisam/kfile | 336 | 33820372614 |
| wait/io/file/sql/pid | 3 | 9358099177 |
| wait/io/file/sql/casetest | 10 | 2381744687 |
+--------------------------------------+------------+-------------------+
10 rows in set (1.68 sec)

I got the details but I didn’t truncated that table before test so it might be the old details. So I truncate that table, run the insert test again and check but I didn’t get event “wait/io/file/sql/partition” in the list. When we drop or truncate, it shows otherwise not. After some more research I found that probably it is instrumented only for DDL commands, but not for DML 🙁

We clearly see that inserts are getting slow from the above statistics, but I was not able to see where time is spent more from P_S. So finally I moved to perf utility where we can monitor particular PID and get the details. In one session, I ran this command “perf record -p 3306” and started insert test from another session in mysql, and captured the perf.data for each statements. I compare few partition related entries along with IO and found what’s presented below.

With 5 partitions:
6.26% mysqld [kernel.kallsyms] [k] finish_task_switch
5.55% mysqld [kernel.kallsyms] [k] ioread32
1.77% mysqld [kernel.kallsyms] [k] iowrite32
0.30% mysqld mysqld [.] ha_partition::write_row(unsigned char*)
 0.24% mysqld mysqld [.] get_partition_id_range(partition_info*, unsigned int*, long long*)
 0.22% mysqld mysqld [.] select_insert::send_data(List<Item>&)
With 25 partitions:
7.14% mysqld [kernel.kallsyms] [k] finish_task_switch
6.56% mysqld [kernel.kallsyms] [k] ioread32
1.78% mysqld [kernel.kallsyms] [k] iowrite32
0.59% mysqld mysqld [.] get_partition_id_range(partition_info*, unsigned int*, long long*)
 0.26% mysqld mysqld [.] ha_partition::write_row(unsigned char*)
0.26% mysqld mysqld [.] select_insert::send_data(List<Item>&)
With 50 partitions:
6.75% mysqld [kernel.kallsyms] [k] finish_task_switch
6.23% mysqld [kernel.kallsyms] [k] ioread32
1.71% mysqld [kernel.kallsyms] [k] iowrite32
0.62% mysqld mysqld [.] get_partition_id_range(partition_info*, unsigned int*, long long*)
 0.30% mysqld mysqld [.] ha_partition::write_row(unsigned char*)
0.29% mysqld mysqld [.] select_insert::send_data(List<Item>&)
With 100 partitions:
6.45% mysqld [kernel.kallsyms] [k] ioread32
6.10% mysqld [kernel.kallsyms] [k] finish_task_switch
1.69% mysqld [kernel.kallsyms] [k] iowrite32
0.70% mysqld mysqld [.] get_partition_id_range(partition_info*, unsigned int*, long long*)
 0.43% mysqld mysqld [.] ha_partition::write_row(unsigned char*)
0.25% mysqld mysqld [.] select_insert::send_data(List<Item>&)

So it seems as and when number of partitions increases, there are more ranges to compare with for each record and so more CPU is spent on that.  While without partitions this cost will be saved. As in above result get_partition_id_range function was related to partition, I tried to check the code,

int get_partition_id_range(partition_info *part_info,
uint *part_id,
longlong *func_value)
...
/* Search for the partition containing part_func_value */
while (max_part_id > min_part_id)
{
loc_part_id= (max_part_id + min_part_id) / 2;
if (range_array[loc_part_id] <= part_func_value)
min_part_id= loc_part_id + 1;
else
max_part_id= loc_part_id;
}
loc_part_id= max_part_id;
*part_id= (uint)loc_part_id;
if (loc_part_id == max_partition &&
part_func_value >= range_array[loc_part_id] &&
!part_info->defined_max_value)
...
}

There is a loop. the more partitions, the more time might be spent there. It is non-linear search and midpoint so log iterations where n is no. of partitions. Now, if we check same code for the list function, It’s linear, no midpoint and searching from both the side.

int get_partition_id_list(partition_info *part_info,
uint32 *part_id,
longlong *func_value)
...
while (max_list_index >= min_list_index)
{
list_index= (max_list_index + min_list_index) >> 1;
list_value= list_array[list_index].list_value;
if (list_value < part_func_value)
min_list_index= list_index + 1;
else if (list_value > part_func_value)
{
if (!list_index)
goto notfound;
max_list_index= list_index - 1;
}
else
{
*part_id= (uint32)list_array[list_index].partition_id;
DBUG_RETURN(0);
}
}
...

This is probably the reason why list partitions are slower than range.

I’ve tried to check the same perf report for MySQL 5.7, but it seems it uses different functions. I wasn’t able to find get_partition_id_range and ha_partitions. While checking doc, found that:

“As of MySQL 5.7.6, InnoDB supports native partitioning. Previously, InnoDB relied on the ha_partition handler, which creates a handler object for each partition. With native partitioning, a partitioned InnoDB table uses a single partition-aware handler object. This enhancement reduces the amount of memory required for partitioned InnoDB tables “

http://dev.mysql.com/doc/refman/5.7/en/mysql-nutshell.html

Instead I got this function,

 1.14% mysqld mysqld [.] row_insert_for_mysql_using_ins_graph(unsigned char const*, row_prebuilt_t*)

This function must have some explanation but I already covered many things on this post. I don’t want to make it more lengthy. Seems, I’ll have to make part II to understand why the partitions are slow in MySQL 5.7 compared to 5.6.

Conclusion: during this research I found that partitioned tables are slower comparing to non-partitioned tables for INSERTs, and list and hash are slower partition types (with the same number of partitions) comparing to range partitioning. But few questions are still open like:

  1. Are partitioned tables really always slow for INSERTs? Might be my test was done within limited environment and result can be different with huge tables and large environment.
  2. What changes has been done in MySQL 5.7 that made it noticeably more slow with partitions comparing to MySQL 5.6?

4 thoughts on “Performance of Inserts on Partitions – MySQL 5.6 v/s MySQL 5.7”

  1. Interesting analysis, what you have discovered is pretty normal.
    Hash partitions is O(1) in search for partition,
    range partition is O(log N) in search for partition
    and list partitions are O(N) in search for partition.

    One of the main reasons to use partitioning is for extremely
    large tables. In this case you can decrease the amount of
    disk lookups when inserting data by splitting table into
    multiple partitions. For small tables partitioning has no
    special benefits other than that it is very easy drop partitions
    and it can improve some of the scanning queries.

    1. Agree with you Mikael. I was more surprise to see that Partitions are even slow in MySQL 5.7 where I was expected it to be improve than MySQL 5.6

  2. Some nice info here. Just wanted to help out a little with the P_S info above.

    The wait/io/file/sql/partition event is for file IO around the .par file that gets created with partitions.

    As you’re comfortable around the code, you can check for the keys here for the file IO types:

    https://github.com/mysql/mysql-server/blob/mysql-5.6.31/sql/mysqld.cc#L9708

    The key_file_partition key is found around where we create the par files (DDL CREATE TABLE):

    https://github.com/mysql/mysql-server/blob/mysql-5.6.31/sql/ha_partition.cc#L2617

    When we open and read the contents of the file (probably DML, like reading frm files for instance, when the table is opened, though does happen during DDL too):

    https://github.com/mysql/mysql-server/blob/mysql-5.6.31/sql/ha_partition.cc#L2795

    And when we delete or rename it (DDL with DROP/RENAME/ALTER):

    https://github.com/mysql/mysql-server/blob/mysql-5.6.31/sql/ha_partition.cc#L658

    In any case, it is 49ms vs the 330 seconds spent doing table IO (wait/io/table/sql/handler – which is essentially what the “Sending data” stage tracks, as well), so not really the thing to focus on from the stats anyway.

    Turning on all instruments (UPDATE performance_schema.setup_instruments SET enabled = ‘YES’, timed = ‘YES’) may show further down in InnoDB where it’s spending some other time on internal sync points etc., but it still may not find the level you do when looking at perf (simply because not every function call is instrumented itself).

    A note on the test.

    Whilst the raw numbers of a single thread, large import, is interesting (what you are essentially testing above), I think another benefit also comes with concurrent access to those partitions from many sessions, all writing small bits of data, not large chunks, and as my colleague Mikael notes above, with large amounts of data.

    sysbench would let you test the time do these inserts across multiple threads pretty easily, and would be a really interesting benchmark to see on top of this, if you’re interested. 🙂

Leave a Reply

Your email address will not be published. Required fields are marked *