Anyone who has used Oracle knows that it has a Flash Recovery Area, to which modified blocks can be written. When data is changed by mistake and needs to be recovered, the DBA can write the stored blocks back over the data, or reconstruct the rollback segments, to restore the database to the desired consistent point.
MySQL/InnoDB does not provide this feature yet. However, much of InnoDB's design is modeled on Oracle, so I believe InnoDB can implement Flashback as well.
My first idea was to imitate Oracle and flash back using the undo log: mark COMMITTED transactions as UNCOMMITTED during startup, so that InnoDB treats transactions that have already committed as uncommitted and rolls them back.
Here are the details:
1. Add an option named InnoDB_Flashback_Trx_ID to my.cnf. It identifies the consistent state, by trx_id, that InnoDB should roll back to.
2. When InnoDB starts up and reads the rollback segments to rebuild the transactions to roll back, mark every transaction whose trx_id is greater than InnoDB_Flashback_Trx_ID as UNCOMMITTED.
3. InnoDB then treats these committed transactions as uncommitted, constructs them as uncommitted transactions, and uses its own mechanism to roll them back before the database is opened.
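The marking rule in step 2 can be sketched in a few lines. This is only a toy model, not InnoDB code: the Trx class, its state strings, and the mark_for_flashback() helper are hypothetical names introduced here for illustration, and the real change lives in InnoDB's startup path in C.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trx:
    """Hypothetical stand-in for a transaction rebuilt from the undo log."""
    trx_id: int
    state: str  # "COMMITTED" or "UNCOMMITTED"


def mark_for_flashback(trxs: List[Trx], flashback_trx_id: int) -> List[Trx]:
    """Flip every transaction newer than flashback_trx_id to UNCOMMITTED.

    Returns the transactions that the normal rollback of uncommitted
    transactions would then undo.
    """
    for trx in trxs:
        if trx.trx_id > flashback_trx_id:
            trx.state = "UNCOMMITTED"
    return [trx for trx in trxs if trx.state == "UNCOMMITTED"]


# Transactions 101..104 have all committed; flash back to the state at 102.
recovered = [Trx(i, "COMMITTED") for i in (101, 102, 103, 104)]
to_roll_back = mark_for_flashback(recovered, flashback_trx_id=102)
```

Everything with a trx_id above the configured value ends up in the rollback set, while older transactions keep their committed state.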
But this approach has obvious drawbacks. First, it only works for InnoDB. Second, a flashback requires restarting MySQL. And when I actually coded and tested it, I found that if a DDL statement had been executed and I then flashed back to a TRX_ID from before that DDL, InnoDB would crash and could not be started again. I think the data files were corrupted, because InnoDB's undo records are logical, not physical.
So I came up with a second approach: use the binlog. A ROW-format binlog records complete information for every modified row. An INSERT contains the value of every column, a DELETE also contains the value of every column, and an UPDATE contains all column values in both its SET and WHERE parts. The binlog is therefore a complete logical redo log, and reversing its operations yields exactly the "undo" I need.
Here are the details:
1. Modify the result of Row_log_event's print so that the Event_type is reversed: WRITE_ROWS_EVENT becomes DELETE_ROWS_EVENT, and DELETE_ROWS_EVENT becomes WRITE_ROWS_EVENT. This only requires changing a single byte, ptr[4].
2. For UPDATE_ROWS_EVENT, the SET and WHERE parts must be swapped. This is the only part that takes a little work, and I added an exchange_update_rows() function to do it. It uses print_verbose_one_row() to parse the lengths of the SET and WHERE parts, derives the split point between them, and then swaps the two parts with memcpy().
3. Once the events are inverted, their order must be reversed as well. To do that I intercept the output in memory: I modified the Write_on_release_cache class and added a buff member to Log_event, so that an event's print result can be written into that buffer. This lets mysqlbinlog capture the output of every event and hold it in memory.
4. In mysqlbinlog I store every event's output in a DYNAMIC_ARRAY and then print the events from the last one back to the first. The result is a flashback file of inverse operations, and importing this file into the target MySQL database completes the flashback.
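The four steps can be sketched on a simplified in-memory model. This is not the patch itself, which changes mysqlbinlog in C++. RowEvent, invert(), flashback(), apply() and flip_type_byte() are hypothetical names used only for this illustration. The numeric codes 23 and 25 are assumed to be the row-event type codes of the MySQL version in use, so check log_event.h before relying on them.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

Row = Tuple  # one row image: the value of every column

# Assumed type codes and header offset; verify against your log_event.h.
WRITE_ROWS_EVENT = 23
DELETE_ROWS_EVENT = 25
EVENT_TYPE_OFFSET = 4  # the ptr[4] byte from step 1


def flip_type_byte(raw: bytearray) -> None:
    """Step 1 on raw bytes: swap WRITE and DELETE by changing one byte."""
    code = raw[EVENT_TYPE_OFFSET]
    if code == WRITE_ROWS_EVENT:
        raw[EVENT_TYPE_OFFSET] = DELETE_ROWS_EVENT
    elif code == DELETE_ROWS_EVENT:
        raw[EVENT_TYPE_OFFSET] = WRITE_ROWS_EVENT


@dataclass(frozen=True)
class RowEvent:
    kind: str                # "WRITE", "DELETE" or "UPDATE"
    images: Tuple[Row, ...]  # WRITE/DELETE: (row,); UPDATE: (before, after)


def invert(ev: RowEvent) -> RowEvent:
    """Steps 1 and 2: flip the event type, or swap the UPDATE images."""
    if ev.kind == "WRITE":
        return RowEvent("DELETE", ev.images)
    if ev.kind == "DELETE":
        return RowEvent("WRITE", ev.images)
    if ev.kind == "UPDATE":
        before, after = ev.images
        return RowEvent("UPDATE", (after, before))
    raise ValueError("unknown event kind: " + ev.kind)


def flashback(events: List[RowEvent]) -> List[RowEvent]:
    """Steps 3 and 4: buffer every inverted event, then emit last to first."""
    return [invert(ev) for ev in reversed(events)]


def apply(table: Set[Row], events: List[RowEvent]) -> Set[Row]:
    """Replay events against a copy of a toy table (a set of rows)."""
    rows = set(table)
    for ev in events:
        if ev.kind == "WRITE":
            rows.add(ev.images[0])
        elif ev.kind == "DELETE":
            rows.remove(ev.images[0])
        else:
            before, after = ev.images
            rows.remove(before)
            rows.add(after)
    return rows


original = {(1, "a")}
log = [
    RowEvent("WRITE", ((2, "b"),)),
    RowEvent("UPDATE", ((1, "a"), (1, "z"))),
    RowEvent("DELETE", ((2, "b"),)),
]
changed = apply(original, log)
restored = apply(changed, flashback(log))
```

Replaying the flashback list on the changed table brings it back to the original rows. Note that WRITE and DELETE carry the same single row image in this model, which is why flipping the type alone is enough for them, while UPDATE needs its two images exchanged.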
The advantage of this approach is clear: it works for every storage engine, because the binlog belongs to the Server layer. In addition, mysqlbinlog's existing filters, such as start-position and start-datetime, can be used to select just part of the log as the rollback log. That makes it possible to flash back a particular range of operations, the operations on one database, the operations within a time window, and so on.
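As a rough picture of how filtering and flashback combine, the sketch below first narrows the log by time window and database and only then reverses the order. It is a toy model: LoggedEvent and select_for_flashback() are hypothetical names, the text field stands for an event's already-inverted printed form, and the half-open time window is a choice made for this sketch.

```python
from datetime import datetime
from typing import List, NamedTuple, Optional


class LoggedEvent(NamedTuple):
    when: datetime
    database: str
    text: str  # the event's already-inverted printed form


def select_for_flashback(
    events: List[LoggedEvent],
    start: Optional[datetime] = None,
    stop: Optional[datetime] = None,
    database: Optional[str] = None,
) -> List[str]:
    """Keep events in [start, stop) for one database, newest first."""
    picked = [
        ev for ev in events
        if (start is None or ev.when >= start)
        and (stop is None or ev.when < stop)
        and (database is None or ev.database == database)
    ]
    return [ev.text for ev in reversed(picked)]


binlog = [
    LoggedEvent(datetime(2012, 1, 1, 10, 0), "shop", "undo-1"),
    LoggedEvent(datetime(2012, 1, 1, 11, 0), "blog", "undo-2"),
    LoggedEvent(datetime(2012, 1, 1, 12, 0), "shop", "undo-3"),
    LoggedEvent(datetime(2012, 1, 1, 13, 0), "shop", "undo-4"),
]
script = select_for_flashback(
    binlog, start=datetime(2012, 1, 1, 11, 0), database="shop"
)
```

Only the matching events make it into the rollback script, and they come out in reverse order, so the most recent change is undone first.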
The patch is available here: http://mysql.taobao.org/index.php/Patch_source_code#Add_flashback_feature_for_mysqlbinlog