Monday, September 24, 2012

When too much RAM hurts: CentOS/Red Hat 5 and writes

Had an odd troubleshooting session on Linux.  A DB server with over 200G of RAM was having problems with DB queries every 30 seconds.

The symptom looked like blocking on I/O, but not much actual I/O was happening.  The DB was on NFS without an async mount, so every write had to be acknowledged.  The problem was not really apparent on local disk, which had write cache on the RAID.  A 1G NFS mount caused more problems than a 10G one.  So it kind of looked like write acknowledgement was the problem.  The DB was on a beefy NetApp, so performance should have been awesome.  We checked all the best-practice stuff (NFS window size, NetApp options, MySQL .conf options, etc.).  We had not gone to async on the mount yet.

Finally, looking at what MySQL was waiting for and how much bandwidth was going to the NFS mount on the NetApp (nmon, iptraf, vmstat, iostat, top, etc.), we found it waiting on a write, but with hardly any iowait showing.

Then found this doc:

http://westnet.com/~gsmith/content/linux-pdflush.htm

CentOS 5/Red Hat 5 sets aside cache for pending writes as a percentage of memory.  Good.  It flushes that cache once it reaches a certain size or after 30 seconds.  Also good.  Unless you have something like 28G of cache because it is taking 10% of physical memory; then flushing 28G can be a problem.
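
To put rough numbers on that, here is a back-of-the-envelope sketch in Python.  The 280G RAM figure is an assumption, picked only so that 10% comes out to the 28G mentioned above (all we know for sure is "over 200G").  The link speeds are raw line rates, so real NFS throughput would be lower and the drain times longer.

```python
GIB = 2 ** 30

def dirty_threshold_bytes(ram_bytes, ratio_percent):
    """Bytes of unwritten cache allowed when the limit is a percentage of RAM."""
    return ram_bytes * ratio_percent // 100

def drain_seconds(dirty_bytes, link_gbit_per_s):
    """Best-case seconds to push dirty_bytes over a link at raw line rate."""
    bytes_per_second = link_gbit_per_s * 1e9 / 8
    return dirty_bytes / bytes_per_second

ram = 280 * GIB  # assumed RAM size, chosen so that 10% equals 28 GiB

for ratio in (10, 1):
    dirty = dirty_threshold_bytes(ram, ratio)
    for gbit in (1, 10):
        print(f"{ratio:>2}% of RAM = {dirty / GIB:.1f} GiB, "
              f"{gbit:>2} Gbit/s link: ~{drain_seconds(dirty, gbit):.0f} s to drain")
```

Even at line rate, 28G is about four minutes of solid writing on a 1G link, which fits with the 1G mount hurting more than the 10G one.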

So with a write-intensive load, turning that down to something the interface can keep up with makes a huge difference.  We went down to 1% of RAM and a flush basically every 3 seconds, although I'm considering tighter tuning on the flush interval.  Problem gone.  Writes stream to disk with no problem.
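
For reference, here is a sketch of what that tuning might look like as sysctl settings.  The exact knobs are not listed above, so the mapping is an assumption: vm.dirty_background_ratio for the "1% of RAM" part and vm.dirty_expire_centisecs for the "3 seconds" part (that value is in hundredths of a second).  Both are standard Linux vm tunables, but check them against your own kernel before copying anything.

```python
import os

# Assumed mapping of the tuning described above onto standard vm sysctls.
PROPOSED = {
    "vm.dirty_background_ratio": 1,    # start background writeback at 1% of RAM
    "vm.dirty_expire_centisecs": 300,  # data dirty for 3 s becomes eligible for flush
}

def proc_path(name, root="/proc/sys"):
    """Translate a dotted sysctl name into its /proc/sys file path."""
    return os.path.join(root, *name.split("."))

def read_current(name, root="/proc/sys"):
    """Return the current integer value, or None if it cannot be read."""
    try:
        with open(proc_path(name, root)) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None

def render_sysctl_conf(settings):
    """Render settings as 'name = value' lines in sysctl.conf style."""
    return "\n".join(f"{name} = {value}" for name, value in sorted(settings.items()))

# Show current values next to the proposed ones, then the config lines.
for name in sorted(PROPOSED):
    print(f"{name}: current={read_current(name)} proposed={PROPOSED[name]}")
print(render_sysctl_conf(PROPOSED))
```

The rendered lines could go into /etc/sysctl.conf and be loaded with sysctl -p.  Start from values the interface can actually drain and adjust from there.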


