Tuesday, October 1, 2019

Quick fix for Replication Error "but the master has purged binary logs containing GTIDs"

I ran into this error recently, and wanted to share a quick fix.

This is what the problem looks like -

mysql> show slave status \G
..
              Master_Log_File: mysql_binary_log.000382
          Read_Master_Log_Pos: 341467211
               Relay_Log_File: mysql_relay_log.000001
                Relay_Log_Pos: 4
        Relay_Master_Log_File: mysql_binary_log.000382
             Slave_IO_Running: No
            Slave_SQL_Running: Yes

        Seconds_Behind_Master: NULL
..
                Last_IO_Errno: 1236
                Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.'
               Last_SQL_Errno: 0

..

Checking the binary on the master

mysql> show binary logs;
+-------------------------+------------+
| Log_name                | File_size  |
+-------------------------+------------+
| mysql_binary_log.000388 | 1073741987 |
| mysql_binary_log.000389 | 1073742774 |
| mysql_binary_log.000390 | 1074106024 |
| mysql_binary_log.000391 | 1073748407 |
| mysql_binary_log.000392 |  120633755 |
+-------------------------+------------+
5 rows in set (0.00 sec)

As we can see that the binary log mysql_binary_log.000382 has indeed been purged.
However I have a script that backups binary logs before it purges them. Since the database is in order of TBs, I will attempt to recover from this using those, rather than rebuild the database using a backup. 

Copy the binary logs from the backup location to MySQL's binary log directory.

cp mysql_binary_log.00038* /mysql_binlog_dir

Be sure to change the ownership to mysql user

chown -R mysql:mysql /mysql_binlog_dir

Edit mysql_binary.index and add these files to it.

mysql_binary_log.000382
mysql_binary_log.000383
..
mysql_binary_log.000392

The logs still won't show up on MySQL. Lets try FLUSH LOGS;

mysql> flush logs;
Query OK, 0 rows affected (0.02 sec)

mysql> show binary logs;
+-------------------------+------------+
| Log_name                | File_size  |
+-------------------------+------------+
| mysql_binary_log.000382 | 1073897047 |
| mysql_binary_log.000383 | 1073860223 |
| mysql_binary_log.000384 | 1073742372 |
| mysql_binary_log.000385 | 1075249248 |
| mysql_binary_log.000386 | 1074009237 |
| mysql_binary_log.000387 |  115207752 |
| mysql_binary_log.000388 | 1073741987 |
| mysql_binary_log.000389 | 1073742774 |
| mysql_binary_log.000390 | 1074106024 |
| mysql_binary_log.000391 | 1073748407 |
| mysql_binary_log.000392 |  120924130 |
| mysql_binary_log.000393 |       6322 |
+-------------------------+------------+

12 rows in set (0.00 sec)

The logs now show up in MySQL.

Let us log in to the slave and try to resume replication now.

mysql> change master to master_auto_position=0;
Query OK, 0 rows affected (0.02 sec)

mysql> change master to master_log_file='mysql_binary_log.000382';
Query OK, 0 rows affected, 1 warning (0.01 sec)

mysql> set global slave_parallel_workers=0;

There are warnings, if you use start slave until with parallel replications, you could run into issues. 


start slave until master_log_file='mysql_binary_log.000393', master_log_pos=38204485;

MySQL replication now resumes

mysql> show slave status \G
..
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
  ..
        Seconds_Behind_Master: 45844
..

Monitor the replication lag, until the slave catches up to previous position.

Then switch back to master_auto_position and parallel replication

mysql> change master to master_auto_position=1;
mysql> set global slave_parallel_workers=20;
mysql> start slave;

mysql> show slave status \G
..
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
  ..
        Seconds_Behind_Master: 600
..

Eventually replication will catch up and things will be back to normal

mysql> show slave status \G
..
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
  ..
        Seconds_Behind_Master: 0
..

It is a good practice to validate that the data remains consistent after this fix. One way to do this is to check whether the Executed_Gtid_Set that shows up in show slave status \G, contains all transactions from master.  However this only validates that the transaction ID is present and not whether the actual data is consistent. If you want to validate if the actual data is consistent, this can be done using a tool like pt-table-checksum to check data consistency.

2 comments:

  1. How did you come up with this number master_log_pos=38204485 in 'start slave until' command??

    ReplyDelete
  2. I stopped the slave before I did the change master to master_auto_position=0; That was the position in show master status at that point.

    ReplyDelete

MySQL storing time

Storing time in databases seems like a daunting task to many. The general principle with storing time in databases is - ALWAYS STORE TIME ...