Thursday, January 29, 2009

XA and Persistent InnoDB Locks

Recently a customer mentioned that they were occasionally seeing corruption when they copied InnoDB files using an LVM snapshot to set up a new slave. Investigating further, it turned out that replication would start, but would then hit a lock wait timeout. This locking issue persisted across restarts, causing replication to always fail. They would work around it by taking a new LVM snapshot and setting the slave up again.

This is a classic case of an XA transaction in a prepared state persisting across restarts and the LVM sometimes taking a snapshot at the exact instant when this can occur.

Most people don't actually use XA transactions, so they may not consider this possibility. However, it can occur even if you never issue XA commands yourself, because XA is used internally to keep the binary log and InnoDB in sync. We will see how to identify that case later.

You can tell this is occurring due to a mention of prepared transactions in the error log:
090128 11:09:09 [Note] Starting crash recovery...
090128 11:09:09 InnoDB: Starting recovery for XA transactions...
090128 11:09:09 InnoDB: Transaction 0 18707 in prepared state after recovery
090128 11:09:09 InnoDB: Transaction contains changes to 1 rows
090128 11:09:09 InnoDB: 1 transactions in prepared state after recovery
090128 11:09:09 [Note] Found 1 prepared transaction(s) in InnoDB
090128 11:09:09 [Warning] Found 1 prepared XA transactions

In addition, you will see a transaction listed as PREPARED, with no MySQL thread id and an OS thread id of 0, in the output of SHOW INNODB STATUS:
---TRANSACTION 0 18707, ACTIVE (PREPARED) 11 sec, OS thread id 0, undo log entries 1

So how do you solve this when it occurs? The key is using the XA commands which allow you to manipulate XA transactions in the prepared state.

The first command is XA RECOVER, which shows the list of prepared transactions that can be handled. The data column contains the XID of each transaction. If the XID starts with MySQLXid, then it is an internally generated XID. Keep in mind that the XID can contain characters that are unprintable in your terminal, so it may be slightly different from how it first appears.
mysql> xa recover;
+----------+--------------+--------------+------+
| formatID | gtrid_length | bqual_length | data |
+----------+--------------+--------------+------+
|        1 |            3 |            0 | foo  |
+----------+--------------+--------------+------+
1 row in set (0.00 sec)

After that, you can use XA ROLLBACK to discard the transaction or XA COMMIT to save the change. Both require the XID you got from XA RECOVER, for example: XA COMMIT 'foo'; or XA ROLLBACK 'foo';

I wrote a one-line shell script which might work when you have a single XID to commit. I haven't tested it very thoroughly, so don't get mad at me if it doesn't work in all cases.
echo xa commit \'`mysql --skip-column-names -e 'xa recover' | awk '{ print $4 }'`\' | mysql


  1. I don't think this can happen with the use of XA internally to keep the binlog and InnoDB in sync. InnoDB will rollback or commit a prepared transaction during crash recovery in that case.

    I am not fond of the XA support -- the worst part is that when a transaction is prepared before a crash and committed after a crash, it will not be written to the binlog.

    I avoid problems with XA by changing the parser to raise an error for XA commands.

    Also, what is the deal with rolling back prepared transactions on a clean shutdown? Prepare should imply durability.

  2. I have only ever seen the internal XA issue occur with snapshots being taken. I've heard of it affecting two customers so far, both of them using LVM. It is easy to tell it is internal XA since the XID has the MySQLXid prefix in the name.

    So I suspect there is some very short time period where a snapshot race condition exists. I haven't gotten around to checking into the internals to see where, but I know it can and does occur.

    I agree that the XA implementation isn't good for a lot of uses. I've seen it used successfully very rarely (mostly with Java and Connector/J).

  3. It is still a mystery to me. ha_recover() runs at startup and is provided a list of XIDs from the last open binlog. It then gets a list of PREPAREd transactions from InnoDB. Each PREPAREd transaction is committed if it is in the list of XIDs found in the binlog. Otherwise, rollback is done. So there is something else that allows an internal XA transaction to linger.

  4. I wonder if they were not copying the binary logs or if something was happening with the binary logs not being sync'd (though I think LVM should still be fine on that one).

    I'll try to do some tests and see what I can find out.

  5. That saved me on a ZFS snap restore. I couldn't see your full command line so I just did it manually.

    mysql --skip-column-names -e 'xa recover'
    then I cut and paste only the data column.

    xa commit 'MySQLXid\365\351;\0\362\302V\0\0\0\0\0';

    Why do the special characters show up when I run it from the command line but not from within mysql?

  6. I saw it once when setting up a slave using InnoDB Hot Backup. Does server_id have anything to do with it?

  7. OK, I have seen the problem you describe. Only the internal XID isn't friendly to my terminal. Using mysql -e 'xa recover' | od -c is a little better.

  8. We used this to get the xid:
    mysql -e 'xa recover' | awk '{print $4}'

  9. I took a look at this code today finally.

    The internal XID is generated by combining the prefix MySQLXid + server_id + query_id.

    During startup, ha_recover() is called, which loops through the prepared XIDs and uses xid_t::get_my_xid() to verify that they were created by internal MySQL processing. get_my_xid() uses the prefix and the server_id to see if they do indeed belong to the server or not.

    So if a snapshot is taken and the server_id is changed on restart, then it will not think it is the owner of the XID and will leave it in the prepared state.

    I think it should just be enough to use the special prefix and to not use the server id, but will need to verify with the developers. I don't think the internal XIDs will ever make it to other servers.

  10. I have opened a bug for this at:

  11. The bug is fixed in 5.1.41, 5.5.0, and 6.0.14.