Monday, September 24, 2012

When too much RAM hurts: CentOS/Red Hat 5 and writes

Had an odd troubleshooting session on Linux.  DB server, over 200G of RAM, having problems every 30 seconds with DB queries.

The symptom looked like blocking on I/O, but not much actual I/O was happening.  The DB was on NFS without an async mount, so every write had to be acknowledged.  The problem was not really apparent on local disk that had write cache on the RAID.  A 1G NFS mount caused more problems than a 10G one.  So it kinda looked like write acknowledgement was the problem.  The DB was on a beefy NetApp, so performance should have been awesome.  Checked all the best practice stuff (NFS window size, NetApp options, MySQL .conf options, etc.).  Had not gone to async on the mount yet.

Finally, looking at what MySQL was waiting for and how much bandwidth was going to the NFS mount on the NetApp (nmon, iptraf, vmstat, iostat, top, etc.), we found it waiting on a write but with hardly any iowait showing.

Then found this doc:

http://westnet.com/~gsmith/content/linux-pdflush.htm

CentOS 5/Red Hat 5 sets aside write cache as a % of memory, which is good.  It flushes that cache once it reaches a certain size or after 30 seconds.  Also good.  Unless you have something like 28G of cache because it is taking 10% of physical memory; then flushing 28G can be a problem.
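
To put rough numbers on that, here is a small back-of-the-envelope sketch.  The 280 GB of RAM is an assumption, picked only because 10% of it matches the 28G above, and the link rates are ideal line rates with no NFS or protocol overhead, so real flushes would be slower.  The function names are made up for this illustration.

```python
def dirty_cache_bytes(ram_bytes, ratio_percent):
    """Bytes of write cache allowed when it is sized as a percentage of RAM."""
    return ram_bytes * ratio_percent / 100


def flush_seconds(cache_bytes, link_bits_per_second):
    """Best-case time to push cache_bytes over a link running at full line rate."""
    return cache_bytes * 8 / link_bits_per_second


GB = 10**9
ram = 280 * GB  # assumption: chosen so that 10% gives the 28G mentioned in the post
cache = dirty_cache_bytes(ram, 10)

for name, rate in (("1G", 1 * 10**9), ("10G", 10 * 10**9)):
    seconds = flush_seconds(cache, rate)
    print(f"{name} link: {seconds:.1f} s to flush {cache / GB:.0f} GB")
```

Even in the best case that is minutes of backlog on a 1G link and tens of seconds on 10G, which fits the 1G mount hurting more than the 10G one.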

So with a write-intensive load, turning that down to something the interface can keep up with makes a huge difference.  We went down to 1% of RAM and a flush basically every 3 seconds, although I'm considering tighter tuning on the flush interval.  Problem gone.  Writes stream to disk with no problem.
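
For reference, a hedged sketch of what that tuning might look like as sysctl settings.  The post does not name the exact knobs, so mapping "1% of RAM" to vm.dirty_background_ratio and "3 seconds" to vm.dirty_expire_centisecs is an assumption; the kernel also has vm.dirty_ratio and vm.dirty_writeback_centisecs, which may have been involved instead or as well.  The helper name is hypothetical.

```python
def sysctl_lines(background_ratio_percent, expire_seconds):
    """Format the two assumed settings the way they would appear in /etc/sysctl.conf."""
    return [
        # Assumption: the "% of RAM" in the post is the background flush threshold.
        f"vm.dirty_background_ratio = {background_ratio_percent}",
        # Assumption: the "seconds" in the post is the dirty page expiry age.
        # The kernel takes this value in hundredths of a second.
        f"vm.dirty_expire_centisecs = {expire_seconds * 100}",
    ]


print("\n".join(sysctl_lines(1, 3)))
```

The printed lines could go into /etc/sysctl.conf and be loaded with sysctl -p; try it on a test box first, since the right values depend on how fast the storage path really is.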



pSCP and getting Cisco IOS onto Cisco routers

I had a great time with pSCP failing to copy an image to an ASR1000 Cisco router recently.

The router was set up to allow SCP: basically set up AAA so the local or remote user has exec and enable etc., and turn on the SCP server with the command "ip scp server enable".  Full document here:  http://www.cisco.com/en/US/docs/ios/12_2t/12_2t2/feature/guide/ftscp.html.

I was trapped on a Windows jump box with pSCP instead of the scp from OpenSSH that you get on Linux or OS X.  Which is fine, but maybe the logging would have been easier there.  Anyway, my pscp kept failing.  Turned on logging on the router and the client and only got this obscure message:

SSH-4-SSH2_UNEXPECTED_MSG: Unexpected message type has arrived. Terminating the connection

Hmmm... so the secret is that pSCP did not default to actual SCP.  Here is the command line that got me past it:

pscp -scp -2 c2951-universalk9-mz.SPA.152-4.M1.bin username@192.168.1.1:flash0:/c2951-universalk9-mz.SPA.152-4.M1.bin


Seriously, I had to force scp and ssh v2 with the two switches at the beginning.  The rest is as you'd expect: source, then target as username@host:location.  Crazy.  Apparently it was trying to use sftp.  Good to know that the utility can do that as well, but it is called pscp, so wouldn't the default be scp?  Maybe something in the negotiation.  I don't know.  That was an extra hour of my day that I hope I'm saving you next time.
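
If you end up scripting this from the jump box, a minimal sketch of building that same command line could look like the following.  The helper name is made up; the only pscp switches used are the -scp and -2 from the command above, and the username, host and file names are the same placeholders.

```python
def build_pscp_command(source, user, host, destination):
    """Return the pscp argument list with SCP and SSH v2 forced up front."""
    return ["pscp", "-scp", "-2", source, f"{user}@{host}:{destination}"]


image = "c2951-universalk9-mz.SPA.152-4.M1.bin"
cmd = build_pscp_command(image, "username", "192.168.1.1", f"flash0:/{image}")
print(" ".join(cmd))

# To actually run it (assumes pscp is on PATH on the machine running this):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Keeping the two switches inside the helper means nobody has to remember them next time.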