MySQL Thoughts: Ext4 with MySQL binary logs oddity

I was recently working with a customer who kept seeing 10-12 second hangs in MySQL. Everything would be working well, and then suddenly all data-changing statements would stop.

Once the 10-12 seconds passed, the system would recover and everything would be fine for a while. This normally repeated every 30-45 minutes, and more frequently when the system was doing heavy ETL-type activity (data loads, big updates, and so on); the hangs during ETL were generally longer and worse.

One of the steps taken while investigating was to check vmstat and iostat. They showed very intense disk activity during each hang. I suspected something wrong with InnoDB, such as the famous purge hiccup or log file checkpointing. However, we then found that the busy disk system was the one holding the binary logs, not the InnoDB files.

As we continued to investigate, we noticed that the hangs lined up exactly with the binary log rotations. Using strace, we saw that the fdatasync() call that accompanies the rotation was taking a very long time and accounted for the majority of the delay.
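For readers who want to repeat the strace step, here is a minimal sketch of how slow sync calls could be picked out of strace output. It assumes strace was run with -T (which appends the time spent in each call as `<seconds>`), for example `strace -f -T -e trace=fsync,fdatasync -p <mysqld pid>`. The exact invocation used in this investigation is not recorded here, and the function name and threshold are hypothetical.

```python
import re
import sys

# Matches strace -T lines such as:  fdatasync(33)   = 0 <10.532113>
# Groups: syscall name, file descriptor, return value, elapsed seconds.
SYNC_CALL = re.compile(
    r"\b(fsync|fdatasync)\((\d+)\)\s+=\s+(-?\d+).*<(\d+\.\d+)>\s*$"
)


def slow_syncs(lines, threshold_s=1.0):
    """Return (syscall, fd, seconds) for sync calls that took >= threshold_s."""
    hits = []
    for line in lines:
        match = SYNC_CALL.search(line)
        if match and float(match.group(4)) >= threshold_s:
            hits.append((match.group(1), int(match.group(2)), float(match.group(4))))
    return hits


if __name__ == "__main__":
    # Usage: strace ... 2>&1 | python slow_syncs.py
    for name, fd, secs in slow_syncs(sys.stdin):
        print(f"{name}(fd={fd}) took {secs:.3f}s")
```

A multi-second fdatasync() on the binary log's file descriptor, appearing at the same moment as a rotation, is the pattern described above.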

Finally, we were able to pin it down to ext4 and the way it delays data writes for a very long time (30 minutes). Compare this to ext3, which flushes things every 5 seconds or so. I am told that ext4 recently changed this behavior (this was in 2.6.30), so hopefully this won't hit more people.

What was happening was as follows:

1. Binary log data gets written over time.

2. ext4, in an attempt to increase performance, does not write the data to disk.

3. ext4 continues to not write the data to disk, even as hundreds of MB of binary log sit in memory and the disk is mostly idle.

4. The binary log gets full (1024MB) and rotates while holding the log mutex.

5. Rotation calls fdatasync() prior to closing the file.

6. ext4 now has to write the data out, and takes 10-12 seconds to do so.

7. The log mutex prevents any commits while it writes.

8. The write finishes, the log rotates, and things repeat.
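The sequence above can be approximated outside MySQL. The following is a rough sketch, not a reproduction of the customer's setup: it appends data to a file with no intermediate sync and then times a single fdatasync(), which is where all the data still sitting in the page cache has to reach the disk. The function name, file name, and sizes are made up for illustration, and the result depends heavily on the filesystem, kernel version, and hardware.

```python
import os
import tempfile
import time

# os.fdatasync is not available on every platform; fall back to fsync there.
_sync = getattr(os, "fdatasync", os.fsync)


def time_final_sync(path, total_bytes, chunk_bytes=64 * 1024):
    """Write total_bytes with no intermediate sync, then time one sync call.

    Returns the seconds spent in the final sync.
    """
    chunk = b"\0" * chunk_bytes
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        written = 0
        while written < total_bytes:
            n = min(chunk_bytes, total_bytes - written)
            # os.write may write fewer bytes than asked; count what it reports.
            written += os.write(fd, chunk[:n])
        start = time.monotonic()
        _sync(fd)  # the step that stalled during binary log rotation
        return time.monotonic() - start
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "fake-binlog.000001")  # hypothetical name
        secs = time_final_sync(target, 256 * 1024 * 1024)
        print(f"final sync took {secs:.3f}s")
```

To test a particular filesystem, point the path at a directory on that filesystem rather than the default temporary directory.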

To alleviate this, we ended up setting sync_binlog=100. This forces the binary log to be synced to disk after every 100 writes to it, so ext4 can no longer hold back a whole log file's worth of data until rotation. There is also the commit mount option for ext4, which should give similar benefits by forcing it to write more often, but I didn't test it.
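To illustrate why a periodic sync helps, here is a small sketch of a writer that syncs after every N writes. It is only loosely modeled on the idea behind sync_binlog=N and is not MySQL's implementation; the class and attribute names are hypothetical.

```python
import os

# os.fdatasync is not available on every platform; fall back to fsync there.
_sync = getattr(os, "fdatasync", os.fsync)


class PeriodicSyncWriter:
    """Append-only writer that syncs after every `sync_every` writes.

    sync_every=0 means never sync explicitly and leave it to the OS,
    in the spirit of sync_binlog=0.
    """

    def __init__(self, path, sync_every=100):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        self.sync_every = sync_every
        self.unsynced_writes = 0
        self.sync_count = 0

    def write(self, data):
        view = memoryview(data)
        while view:
            # os.write may be partial; keep going until everything is written.
            view = view[os.write(self.fd, view):]
        self.unsynced_writes += 1
        if self.sync_every and self.unsynced_writes >= self.sync_every:
            self.sync()

    def sync(self):
        _sync(self.fd)
        self.sync_count += 1
        self.unsynced_writes = 0

    def close(self):
        # Stand-in for the sync at rotation: with periodic syncing, fewer
        # than sync_every writes are still pending at this point.
        self.sync()
        os.close(self.fd)
```

With sync_every=100, the final sync in close() has at most 99 writes left to flush instead of an entire log file, which is why the stall at rotation shrinks.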

6 comments:

Gerry Narvaja, December 2, 2009 at 3:52 PM
It sounds to me that this could also spell disaster if the system crashes right before ext4 writes the data out.

My $.02
G

Stewart Smith, December 3, 2009 at 12:58 AM
the answer is XFS :)

Anonymous, December 3, 2009 at 2:41 AM
I saw the same behavior with XFS on SAS write-cache hardware RAID. Sync_binlog was only partial solution as heavy i/o operation (such as backups) caused stall too.

Guru Prasad G.V., December 10, 2009 at 9:32 AM
is shifting to ext3 a good option ?
-Guru

Harrison, December 10, 2009 at 9:37 AM
This is ext4 specific. ext3 shouldn't suffer from it, or even ext4 assuming it is configured properly.

Mark Callaghan, January 17, 2010 at 11:13 AM
Don't use ext-{2,3,4} for databases. ext-3 has its own problems. XFS is much better for databases.

Wednesday, December 2, 2009
