본문 바로가기

MySQL/MySQL Admin

pt-online-schema-change(pt-OSC)의 chunk_size_limit 함정

pt-osc를 많은 케이스에 쓰리라 생각이 됩니다. 저도 한때 많이 썼구요. 다만, pt-osc의 옵션에서 잘 못 알고 있는 케이스가 있어서 여기서 말하고자 합니다.

 

pt-osc의 큰 이점 중 하나는 alter를 실행시켰을때 progress가 나오면서 진행도를 알 수 있다는 것입니다. 제가 이 글에서 언급하려는 옵션을 잘 못 사용하게 되면 progress를 볼 수 없습니다. where 조건을 제대로 사용하지 못하기 때문이죠. where 조건을 사용하지 않으면 select를 한번에 해서 insert할때 문제가 되며 마지막 시점에 commit이 밀리는 등 application에 문제를 일으킬 수 있습니다. 실제로 대량의 트랜잭션이 있을 때, 15초까지 commit이 밀렸습니다. 

 

예제 pt-osc command

/backup/Eddie/order/pt-online-schema-change-eddie \
--alter "DROP COLUMN is_deleted, MODIFY order_typee VARCHAR(30) NOT NULL COMMENT '주문 유형'" D=choong,t=choong_order \
--no-drop-old-table \
--no-drop-new-table \
--recursion-method="none" \
--nocheck-replication-filters \
--chunk-size=10000 \
--chunk-size-limit=20000 \
--host="rds-aurora-choong-cluster.cluster-cxm9prevocey.ap-northeast-2.rds.amazonaws.com" \
--port=3306 \
--user=choong \
--password=###PASSWORD### \
--progress=time,30 \
--max-load="Threads_running=20" \
--chunk-index=PRIMARY \
--charset=UTF8 \
--execute

 

예제 pt-osc의 insert query

INSERT LOW_PRIORITY IGNORE INTO `choong`.`_choong_order_new` (`seq`, `order_id`, `created_at`) SELECT `seq`, `order_id`, `created_at` FROM `choong`.`choong_order` LOCK IN SHARE MODE /*pt-online-schema-change 19573 copy table*/

위 내용을 검증하기 위해 테스트를 진행했습니다.


원인분석을 위한 준비

- pt-osc version : pt-online-schema-change 3.0.10
- DB : MySQL 5.6
- target table : choong_task
- where 조건 없이 사용했을때 소요 시간 : 33 mins.

테스트 시나리오 준비

- 시나리오 1 : chunk-size : 10,000, chunk-size-limit : 20,000 (기존 option)
- 시나리오 2 : chunk-size : 500, chunk-size-limit : 600
- 시나리오 3 : chunk-size : 5,000, chunk-size-limit : 6,000
- 시나리오 4 : chunk-size : 1,000, chunk-size-limit : 2,000
- 시나리오 5 : chunk-time : 5, chunk-size : default(1,000), chunk-size-limit : default(4.0)


테스트 결과

시나리오 1: chunk-size : 10,000, chunk-size-limit : 20,000 (기존 option)

pt-osc command

pt-online-schema-change --alter "CHANGE COLUMN secondaryInfo3 secondaryInfo3 VARCHAR(5000) DEFAULT NULL COMMENT 'choong item data'" D=choong,t=choong_tasks \
--chunk-size=10000 \
--chunk-size-limit=20000 \
--host="localhost" \
--port=3306 \
--user=choong_admin \
--ask-pass --progress=time,30 \
--max-load="Threads_running=20" \
--chunk-index=PRIMARY \
--charset=UTF8 \
--execute
Enter MySQL password:
No slaves found.  See --recursion-method if host ip-172-23-0-37 has slaves.
Not checking slave lag because no slaves were found and --check-slave-lag was not specified.
# A software update is available:
Operation, tries, wait:
  analyze_table, 10, 1
  copy_rows, 10, 0.25
  create_triggers, 10, 1
  drop_triggers, 10, 1
  swap_tables, 10, 1
  update_foreign_keys, 10, 1
Altering `choong`.`choong_tasks`...
Creating new table...
Created new table choong._choong_tasks_new OK.
Altering new table...
Altered `choong`.`_choong_new` OK.
2018-06-11T14:40:08 Creating triggers...
2018-06-11T14:40:08 Created triggers OK.
2018-06-11T14:40:08 Copying approximately 19976976 rows...
 
# Exiting on SIGINT.
Not dropping triggers because the tool was interrupted.  To drop the triggers, execute:
DROP TRIGGER IF EXISTS `choong`.`pt_osc_choong_tasks_del`
DROP TRIGGER IF EXISTS `choong`.`pt_osc_choong_tasks_upd`
DROP TRIGGER IF EXISTS `choong`.`pt_osc_choong_tasks_ins`
`choong`.`choong_tasks` was not altered.

 

Test Result
- 여전히 progress는 나오지 않으며 where절없이 query가 실행됩니다.

시나리오 2 : chunk-size : 500, chunk-size-limit : 600

pt-osc command

pt-online-schema-change \
> --alter "CHANGE COLUMN secondaryInfo3 secondaryInfo3 VARCHAR(5000) DEFAULT NULL COMMENT 'Ondemand choong item data'" D=choong,t=choong_tasks \
> --chunk-size=500 \
> --chunk-size-limit=600 \
> --host="localhost" \
> --port=3306 \
> --user=astrolabe \
> --ask-pass \
> --progress=time,1 \
> --max-load="Threads_running=20" \
>  --chunk-index=PRIMARY \
>  --charset=UTF8 \
>  --execute
Enter MySQL password:
No slaves found.  See --recursion-method if host ip-172-23-0-37 has slaves.
Not checking slave lag because no slaves were found and --check-slave-lag was not specified.
Operation, tries, wait:
  analyze_table, 10, 1
  copy_rows, 10, 0.25
  create_triggers, 10, 1
  drop_triggers, 10, 1
  swap_tables, 10, 1
  update_foreign_keys, 10, 1
Altering `choong`.`choong_tasks`...
Creating new table...
Created new table choong.__choong_tasks_new OK.
Altering new table...
Altered `choong`.`__choong_tasks_new` OK.
2018-06-11T14:52:27 Creating triggers...
2018-06-11T14:52:28 Created triggers OK.
2018-06-11T14:52:28 Copying approximately 19976976 rows...
Copying `choong`.`choong_tasks`:   0% 41:36 remain
Copying `choong`.`choong_tasks`:   0% 41:35 remain
...
Copying `choong`.`choong_tasks`:   0% 43:02 remain
Copying `choong`.`choong_tasks`:   0% 43:04 remain
^C# Exiting on SIGINT.
Not dropping triggers because the tool was interrupted.  To drop the triggers, execute:
DROP TRIGGER IF EXISTS `choong`.`pt_osc_choong_tasks_del`
DROP TRIGGER IF EXISTS `choong`.`pt_osc_choong_tasks_upd`
DROP TRIGGER IF EXISTS `choong`.`pt_osc_choong_tasks_ins`
Not dropping the new table `choong`.`__choong_tasks_new` because the tool was interrupted.  To drop the new table, execute:
DROP TABLE IF EXISTS `choong`.`__choong_tasks_new`;
`choong`.`choong_tasks` was not altered.

 

Test Result
- where 조건을 사용하는 insert query가 생성되며 progress가 노출됐습니다.
- 단, 기존에 33분 걸리던 쿼리가 40분이 넘어가는 것을 볼 수 있습니다.

시나리오 3 : chunk-size : 5,000, chunk-size-limit : 6,000

pt-osc command

pt-online-schema-change \
> --alter "CHANGE COLUMN secondaryInfo3 secondaryInfo3 VARCHAR(5000) DEFAULT NULL COMMENT 'Ondemand choong item data'" D=choong,t=choong_tasks \
> --chunk-size=5000 \
> --chunk-size-limit=6000 \
> --host="localhost" \
> --port=3306 \
> --user=astrolabe \
> --ask-pass \
> --progress=time,1 \
> --max-load="Threads_running=20" \
> --chunk-index=PRIMARY \
> --charset=UTF8 \
> --execute
Enter MySQL password:
No slaves found. See --recursion-method if host ip-172-23-0-37 has slaves.
Not checking slave lag because no slaves were found and --check-slave-lag was not specified.
Operation, tries, wait:
 analyze_table, 10, 1
 copy_rows, 10, 0.25
 create_triggers, 10, 1
 drop_triggers, 10, 1
 swap_tables, 10, 1
 update_foreign_keys, 10, 1
Altering `choong`.`choong_tasks`...
Creating new table...
Created new table choong.____choong_tasks_new OK.
Altering new table...
Altered `choong`.`____choong_tasks_new` OK.
2018-06-11T14:58:06 Creating triggers...
2018-06-11T14:58:06 Created triggers OK.
2018-06-11T14:58:06 Copying approximately 19976976 rows...
^C^C# Exiting on SIGINT.
Not dropping triggers because the tool was interrupted. To drop the triggers, execute:
DROP TRIGGER IF EXISTS `choong`.`pt_osc_choong_tasks_del`
DROP TRIGGER IF EXISTS `choong`.`pt_osc_choong_tasks_upd`
DROP TRIGGER IF EXISTS `choong`.`pt_osc_choong_tasks_ins`
`choong`.`choong_tasks` was not altered.

 

Test Result
- progress가 나오지 않으며 where 조건을 사용하지 않은 쿼리가 실행됩니다.

시나리오 4 : chunk-size : 1,000, chunk-size-limit : 2,000

pt-osc command

pt-online-schema-change \
> --alter "CHANGE COLUMN secondaryInfo3 secondaryInfo3 VARCHAR(5000) DEFAULT NULL COMMENT 'Ondemand choong item data'" D=choong,t=choong_tasks \
> --chunk-size=1000 \
> --chunk-size-limit=2000 \
> --host="localhost" \
> --port=3306 \
> --user=astrolabe \
> --ask-pass \
> --progress=time,1 \
> --max-load="Threads_running=20" \
> --chunk-index=PRIMARY \
> --charset=UTF8 \
> --execute
Enter MySQL password:
No slaves found. See --recursion-method if host ip-172-23-0-37 has slaves.
Not checking slave lag because no slaves were found and --check-slave-lag was not specified.
Operation, tries, wait:
 analyze_table, 10, 1
 copy_rows, 10, 0.25
 create_triggers, 10, 1
 drop_triggers, 10, 1
 swap_tables, 10, 1
 update_foreign_keys, 10, 1
Altering `choong`.`choong_tasks`...
Creating new table...
Created new table choong._____choong_tasks_new OK.
Altering new table...
Altered `choong`.`_____choong_tasks_new` OK.
2018-06-11T14:59:45 Creating triggers...
2018-06-11T14:59:45 Created triggers OK.
2018-06-11T14:59:45 Copying approximately 19976976 rows...
Copying `choong`.`choong_tasks`: 0% 02:46:27 remain
Copying `choong`.`choong_tasks`: 0% 55:27 remain
 
Copying `choong`.`choong_tasks`: 4% 53:21 remain
^C# Exiting on SIGINT.
Not dropping triggers because the tool was interrupted. To drop the triggers, execute:
DROP TRIGGER IF EXISTS `choong`.`pt_osc_choong_tasks_del`
DROP TRIGGER IF EXISTS `choong`.`pt_osc_choong_tasks_upd`
DROP TRIGGER IF EXISTS `choong`.`pt_osc_choong_tasks_ins`
Not dropping the new table `choong`.`_____choong_tasks_new` because the tool was interrupted. To drop the new table, execute:
DROP TABLE IF EXISTS `choong`.`_____choong_tasks_new`;

 

Test Result
- where 조건을 사용하는 insert query가 processlist에 찍혔습니다.
- progress도 노출이 됐지만 완료까지 500,600보다 더 많은 시간이 걸립니다.

시나리오 5 : chunk-time : 5, chunk-size : default(1,000), chunk-size-limit : default(4.0)

pt-osc command

pt-online-schema-change \
> --alter "CHANGE COLUMN secondaryInfo3 secondaryInfo3 VARCHAR(5000) DEFAULT NULL COMMENT 'Ondemand choong item data'" D=choong,t=choong_tasks \
> --chunk-time=5 \
> --host="localhost" \
> --port=3306 \
> --user=astrolabe \
> --ask-pass \
> --progress=time,1 \
> --max-load="Threads_running=20" \
> --chunk-index=PRIMARY \
> --charset=UTF8 \
> --execute
Enter MySQL password:
No slaves found. See --recursion-method if host ip-172-23-0-37 has slaves.
Not checking slave lag because no slaves were found and --check-slave-lag was not specified.
Operation, tries, wait:
 analyze_table, 10, 1
 copy_rows, 10, 0.25
 create_triggers, 10, 1
 drop_triggers, 10, 1
 swap_tables, 10, 1
 update_foreign_keys, 10, 1
Altering `choong`.`choong_tasks`...
Creating new table...
Created new table choong.______choong_tasks_new OK.
Altering new table...
Altered `choong`.`______choong_tasks_new` OK.
2018-06-11T15:03:28 Creating triggers...
2018-06-11T15:03:28 Created triggers OK.
2018-06-11T15:03:28 Copying approximately 19976976 rows...
Copying `choong`.`choong_tasks`: 0% 34:44 remain
Copying `choong`.`choong_tasks`: 0% 32:18 remain
Copying `choong`.`choong_tasks`: 0% 31:49 remain
...
Copying `choong`.`choong_tasks`: 99% 00:02 remain
2018-06-11T15:37:27 Copied rows OK.
2018-06-11T15:37:27 Analyzing new table...
2018-06-11T15:37:27 Swapping tables...
2018-06-11T15:37:27 Swapped original and new tables OK.
2018-06-11T15:37:27 Dropping old table...
2018-06-11T15:37:27 Dropped old table `choong`.`_choong_tasks_old` OK.
2018-06-11T15:37:27 Dropping triggers...
2018-06-11T15:37:27 Dropped triggers OK.
Successfully altered `choong`.`choong_tasks`.

 

Test Result
- 이번에는 chunk-time만가지고 실행을 시킨 결과이며 progress 노출도 잘되고 where 조건이 없을때와 비슷한 시간에 종료됐습니다.(34분)


테스트 총정리

- chunk-size 및 chunk-size-limit을 가지고 progress 노출 및 where 조건의 상관관계를 증명할 수 있었습니다.
- chunk-time을 이용해서도 chunk를 나눌 수 있으며, default는 0.5 sec입니다. 

--chunk-time

type: float; default: 0.5
Adjust the chunk size dynamically so each data-copy query takes this long to execute. The tool tracks the copy rate (rows per second) and adjusts the chunk size after each data-copy query, so that the next query takes this amount of time (in seconds) to execute. It keeps an exponentially decaying moving average of queries per second, so that if the server’s performance changes due to changes in server load, the tool adapts quickly.

If this option is set to zero, the chunk size doesn’t auto-adjust, so query times will vary, but query chunk sizes will not. Another way to do the same thing is to specify a value for --chunk-size explicitly, instead of leaving it at the default.

https://www.percona.com/doc/percona-toolkit/2.2/pt-online-schema-change.html#cmdoption-pt-online-schema-change--chunk-time

 

여태까지 일반적으로 많은 사람들이 사용한 chunk-size-limit은 잘못된 사용입니다. chunk-size-limit은 배수(times)값이며 단순한 max값이 아닙니다. chunk-size : 500, chunk-size-limit : 600이면 최대 조회 row 수는 500~600이 아닌 500 * 600입니다.

 

--chunk-size-limit

type: float; default: 4.0
Do not copy chunks this much larger than the desired chunk size.When a table has no unique indexes, chunk sizes can be inaccurate. This option specifies a maximum tolerable limit to the inaccuracy. The tool uses <EXPLAIN> to estimate how many rows are in the chunk. If that estimate exceeds the desired chunk size times the limit, then the tool skips the chunk.

The minimum value for this option is 1, which means that no chunk can be larger than --chunk-size. You probably don’t want to specify 1, because rows reported by EXPLAIN are estimates, which can be different from the real number of rows in the chunk. You can disable oversized chunk checking by specifying a value of 0.

The tool also uses this option to determine how to handle foreign keys that reference the table to be altered. See --alter-foreign-keys-method for details.

https://www.percona.com/doc/percona-toolkit/2.2/pt-online-schema-change.html#cmdoption-pt-online-schema-change--chunk-size-limit

결론적으로 chunk-size : 500, chunk-size-limit : 2 이런식으로 사용하는 것이 올바릅니다.


chunk-size와 chunk-size-limt 값에 따라 progress 및 where 절이 노출되지 않는 이유

자세한 내용은 debugging을 통해 알 수 있었습니다.

 

시나리오 1일때(chunk-size : 10,000, chunk-size-limit : 20,000)

1313 # NibbleIterator:5971 31586 EXPLAIN SELECT * FROM `choong`.`choong_tasks` WHERE 1=1
1314 # NibbleIterator:5973 31586 $VAR1 = {
1315 #   extra => undef,
1316 #   id => '1',
1317 #   key => undef,
1318 #   key_len => undef,
1319 #   possible_keys => undef,
1320 #   ref => undef,
1321 #   rows => '21708691',
1322 #   select_type => 'SIMPLE',
1323 #   table => 'tasks',
1324 #   type => 'ALL'
1325 # };
1326 #
1327 # NibbleIterator:5841 31586 One nibble: yes
1328 # NibbleIterator:5875 31586 User wants to use index PRIMARY
1329 # NibbleIterator:5936 31586 Best index: PRIMARY
1330 # NibbleIterator:5489 31586 One nibble statement: INSERT LOW_PRIORITY IGNORE INTO `choong`.`_______choong_tasks_new` (`id`, `objectkey` FROM `choong`.`choong_tasks` LOCK IN SHARE MODE /*pt-online-schema-change 31586 copy table*/
1331 # NibbleIterator:5499 31586 Explain one nibble statement: EXPLAIN SELECT `id`, `objectkey` FROM `choong`.`choong_tasks` LOCK IN SHARE MODE /*explain pt-online-schema-change 31586 copy table*/
1332 # NibbleIterator:5983 31586 Preparing statement handles
1333 2018-06-11T16:24:58 Copying approximately 21708691 rows...

 

시나리오 5일때(chunk-time : 5 sec, chunk-size : default, chunk-size-limit : default)

1258 # NibbleIterator:5971 31560 EXPLAIN SELECT * FROM `choong`.`choong_tasks` WHERE 1=1
1259 # NibbleIterator:5973 31560 $VAR1 = {
1260 #   extra => undef,
1261 #   id => '1',
1262 #   key => undef,
1263 #   key_len => undef,
1264 #   possible_keys => undef,
1265 #   ref => undef,
1266 #   rows => '21708691',
1267 #   select_type => 'SIMPLE',
1268 #   table => 'tasks',
1269 #   type => 'ALL'
1270 # };
1271 #
1272 # NibbleIterator:5841 31560 One nibble: no
1273 # NibbleIterator:5875 31560 User wants to use index PRIMARY
1274 # NibbleIterator:5936 31560 Best index: PRIMARY
1275 # TableNibbler:2980 31560 Will ascend index PRIMARY
1276 # TableNibbler:2992 31560 Will ascend columns id
1277 # TableNibbler:3003 31560 Will ascend, in ordinal position: 0
...
1338 # NibbleIterator:5538 31560 First lower boundary statement: SELECT /*!40001 SQL_NO_CACHE */ `id` FROM `choong`.`choong_tasks` FORCE INDEX(`PRIMARY`) ORDER BY `id` LIMIT 1 /*first lower boundary*/
1339 # NibbleIterator:5563 31560 Last upper boundary statement: SELECT /*!40001 SQL_NO_CACHE */ `id` FROM `choong`.`choong_tasks` FORCE INDEX(`PRIMARY`) ORDER BY `id` DESC LIMIT 1 /*last upper boundary*/
1340 # NibbleIterator:5574 31560 Upper boundary statement: SELECT /*!40001 SQL_NO_CACHE */ `id` FROM `choong`.`choong_tasks` FORCE INDEX(`PRIMARY`) WHERE ((`id` >= ?)) ORDER BY `id` LIMIT ?, 2 /*next chunk boundary*/
1341 # NibbleIterator:5587 31560 Nibble statement: INSERT LOW_PRIORITY IGNORE INTO `choong`.`______choong_tasks_new` (`id`, `objectkey` FROM `choong`.`choong_tasks` FORCE INDEX(`PRIMARY`) WHERE ((`id` >= ?)) AND ((`id` <= ?)) LOCK IN SHARE MODE /*pt-online-schema-change 31560 copy nibble*/
1342 # NibbleIterator:5600 31560 Explain nibble statement: EXPLAIN SELECT `id`, `objectkey` FROM `choong`.`choong_tasks` FORCE INDEX(`PRIMARY`) WHERE ((`id` >= ?)) AND ((`id` <= ?)) LOCK IN SHARE MODE /*explain pt-online-schema-change 31560 copy nibble*/
1343 # NibbleIterator:5603 31560 Initial chunk size (LIMIT): 999

 

자세히보면 One nibble : no와 yes에 따라서 insert가 분할 되는 것을 알 수 있습니다.

 

debugging에서 나오는 내용을 바탕으로 pt-online-schema-change의 소스를 봤습니다. 

PTDEBUG에서 보이는 5841번째 줄을 보면 one_nibble의 변수를 받습니다. 또한 one_nibble은 row_est의 값과  chunk_size * chunk_size_limit의 비교값을 가지고 yes or no를 결정하게 됩니다.

 

5827    my ($row_est, $mysql_index) = get_row_estimate(
5828       Cxn   => $cxn,
5829       tbl   => $tbl,
5830       where => $where,
5831    );
5832
5833    if ( !$where ) {
5834       $mysql_index = undef;
5835    }
5836
5837    my $chunk_size_limit = $o->get('chunk-size-limit') || 1;
5838    my $one_nibble = !defined $args{one_nibble} || $args{one_nibble}
5839                   ? $row_est <= $chunk_size * $chunk_size_limit
5840                   : 0;
5841    PTDEBUG && _d('One nibble:', $one_nibble ? 'yes' : 'no');

 

그리고 row_est는 get_row_estimate 함수를 호출해서 리턴값을 받게 되어 있습니다. 여기에서 sql문을 실행하게 되어 있습니다. sql문은 explan을 사용했을 때 나오는 row 수입니다. 디버깅에서도 나온 "EXPLAIN SELECT * FROM `choong`.`choong_tasks` WHERE 1=1" 이런 쿼리를 사용하게 되고 그 결과값인 rows 21708691를 실제 비교값으로 사용하게 됩니다.

 

5961 sub get_row_estimate {
5962    my (%args) = @_;
5963    my @required_args = qw(Cxn tbl);
5964    foreach my $arg ( @required_args ) {
5965       die "I need a $arg argument" unless $args{$arg};
5966    }
5967    my ($cxn, $tbl) = @args{@required_args};
5968
5969    my $sql = "EXPLAIN SELECT * FROM $tbl->{name} "
5970            . "WHERE " . ($args{where} || '1=1');
5971    PTDEBUG && _d($sql);
5972    my $expl = $cxn->dbh()->selectrow_hashref($sql);
5973    PTDEBUG && _d(Dumper($expl));
5974    my $mysql_index = $expl->{key} || '';
5975    if ( $mysql_index ne 'PRIMARY' ) {
5976       $mysql_index = lc($mysql_index);
5977    }
5978    return ($expl->{rows} || 0), $mysql_index;
5979 }

결론적으로 21,708,691 >= 10,000 * 20,000(200,000,000)은 false이기 때문에 기존 옵션인 시나리오 1에서는 progress가 노출되지 않았으며 where 절이 사용되지도 않았습니다.


해당 이슈의 원인 및 결론

- chunk-size와 chunk-size-limit의 정의가 올바르게 되어 있지 않았으며 잘못된 사용으로 인해 chunk 단위로 insert 되지 않았습니다.
- 각 table의 row에 맞게 chunk-size 및 chunk-size-limit을 정의할 필요가 있으며 chunk-time을 쓰는 것도 좋은 선택지입니다.

'MySQL > MySQL Admin' 카테고리의 다른 글

mysql_config_editor 설정하기  (0) 2016.06.11
mysql parameter innodb_locks_unsafe_for_binlog  (0) 2015.12.09
Mysql isolation Level 설정  (0) 2015.12.09
Mysql prompt 변경하기!  (0) 2015.12.04
Transaction이란?  (0) 2015.12.04