Thursday, January 29, 2009

XA and Persistent InnoDB Locks

Recently a customer mentioned that they were seeing corruption rarely when they copied InnoDB files using LVM to setup a new slave. Investigating further, it turns out that replication would start, but would then hit a lock wait timeout. This locking issue occurred across restarts causing replication to always fail. They would solve this by taking a new LVM snapshot and resetting it up.

This is a classic case of an XA transaction in a prepared state persisting across restarts and the LVM sometimes taking a snapshot at the exact instant when this can occur.

Most people don't use actually XA transactions, so they may not consider this possibility. However, it can still occur even if you do not use them due to XA being used internally for the binary log and InnoDB. We will be able to identify that case later.

You can tell this is occurring due to a mention of prepared transactions in the error log:
090128 11:09:09 [Note] Starting crash recovery...
090128 11:09:09 InnoDB: Starting recovery for XA transactions...
090128 11:09:09 InnoDB: Transaction 0 18707 in prepared state after recovery
090128 11:09:09 InnoDB: Transaction contains changes to 1 rows
090128 11:09:09 InnoDB: 1 transactions in prepared state after recovery
090128 11:09:09 [Note] Found 1 prepared transaction(s) in InnoDB
090128 11:09:09 [Warning] Found 1 prepared XA transactions

In addition, you will see a transaction listed in PREPARE without any MySQL or OS thread id in the output of SHOW INNODB STATUS:
---TRANSACTION 0 18707, ACTIVE (PREPARED) 11 sec, OS thread id 0
, undo log entries 1

So how do you solve this when it occurs? The key is using the XA commands which allow you to manipulate XA transactions in the prepared state.

The first command is XA RECOVER which will show you a list of the possible transactions to handle. The data column has the XID of the transaction to rollback. If the XID starts with MySQLXid then it is an internal generated XID. The XID can contain unprintable characters for your terminal, so keep that in mind it might be a bit different than it initially looks.
mysql> xa recover;
| formatID | gtrid_length | bqual_length | data |
| 1 | 3 | 0 | foo |
1 row in set (0.00 sec)

After that, you can then use XA ROLLBACK to remove the transaction or XA COMMIT to save the change. Both of these require that you specify the XID that you got from the previous one.

I wrote a one-line shell script which might work when you have a single XID to commit. I haven't tested it very thoroughly, so don't get mad at me if it doesn't work in all cases.
echo xa commit \'`mysql --skip-column-names -e 'xa recover' | awk '{ print $4 }'`\' | mysql

Friday, January 16, 2009

Enabling InnoDB Large Pages on Linux

In MySQL 5.0, InnoDB gained the ability to use Linux Large Page support for allocating memory for the buffer pool and additional memory pool.

A few customers have asked about using it and there is virtually no documentation on what is required on Linux to enable it. I actually ended up having to read some of the Linux kernel source code to figure out some of this.

This uses the API as documented at:

To set this up and use it, you first need a kernel that supports it. All of the recent RHEL kernels do by default from what I can tell. On my Ubuntu systems, I'm not seeing it enabled normally.

Then on the OS level you will need to do the following procedures:

# Set the number of pages to be used
# Each page is normally 2MB, so this would be 40 MB
# This actually allocates memory, so it requires that much memory to be available
echo 20 > /proc/sys/vm/nr_hugepages

# Set the group number that is allowed to access this memory
# The mysql user must be a member of this group
echo 102 > /proc/sys/vm/hugetlb_shm_group

# Increase the amount of shmem allowed per segment
# 256MB in this case
echo 268435456 > /proc/sys/kernel/shmmax

# Increase total amount of shared memory
# This is 4KB pages, ie. 16GB below
echo 4194304 > /proc/sys/kernel/shmall

For MySQL usage, you would normally want the shmmax to be close to the shmall.

You would normally want to put these into an rc file or similar to do it at every boot sequence (early in the boot sequence normally, prior to MySQL starting).

To verify it works with:
cat /proc/meminfo | grep -i huge
The final step is in order to make use of the hugetlb_shm_group, you need to give the mysql user 'unlimited' value for the memlock limit. This can either by done by editing /etc/security/limits.conf or by adding the following to your mysqld_safe:
ulimit -l unlimited
This will cause the root user to set it to unlimited before switching to the mysql user.

Finally, you will want to add the large-pages option to your my.cnf:
With this option, InnoDB will use it automatically for the two memory pools. If it can not, it will fail back and use traditional memory and output a warning to the error log.

You can verify it is being used by looking at:
cat /proc/meminfo | grep -i huge

Solaris also has the ability to use large pages (of different sizes as well), but MySQL doesn't support that yet.