Wednesday, February 17, 2010

Chef + lvm2 = File descriptor xx left open

Here's one that cost me a day... 
I developed an LVM cookbook for Chef on Ubuntu 8.04, but once I ported it to CentOS 5.4 and Ubuntu 9.10 I started to see a bunch of "File descriptor left open" errors in the Chef logs.  Very concerning.

Turns out it is due to the new lvm2 package and (in my case) is benign.

To Reproduce

Once you have an lvm setup you can reproduce what I was seeing using the chef gem and irb:
irb(main):002:0> require 'rubygems'
=> true
irb(main):003:0> require 'chef'
=> true
irb(main):005:0> Chef::Mixin::Command.run_command(:command => "vgs")
[Wed, 17 Feb 2010 23:57:05 +0000] DEBUG: Executing vgs
[Wed, 17 Feb 2010 23:57:05 +0000] DEBUG: ---- Begin output of vgs ----
[Wed, 17 Feb 2010 23:57:05 +0000] DEBUG: STDOUT:   VG      #PV #LV #SN Attr   VSize   VFree
  vg-data   1   1   0 wz--n- 149.11G 59.64G
[Wed, 17 Feb 2010 23:57:05 +0000] DEBUG: STDERR: File descriptor 9 left open
[Wed, 17 Feb 2010 23:57:05 +0000] DEBUG: ---- End output of vgs ----
[Wed, 17 Feb 2010 23:57:05 +0000] DEBUG: Ran vgs returned 0
=> #


Turns out the lvm2 project added these warnings "during a bug investigation to prove that LVM was not involved" and left them in there.  There is also a super-secret environment variable, LVM_SUPPRESS_FD_WARNINGS, that suppresses the warnings, but I suggest you first figure out why your fds are open (read the bug report and understand why you are seeing these warnings).
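If you do decide the warnings are benign, here is a rough sketch of setting LVM_SUPPRESS_FD_WARNINGS for a single command from Ruby, so the variable never leaks into the rest of your environment. The helper name run_quiet_lvm is made up for illustration, and the value "1" is an assumption: as far as I can tell lvm only checks that the variable is set.

```ruby
require 'open3'

# Hypothetical helper: run a command with LVM_SUPPRESS_FD_WARNINGS set
# in the child's environment only. The parent's ENV is left untouched.
def run_quiet_lvm(*cmd)
  # Assumption: any value works, lvm appears to only test for presence.
  env = { 'LVM_SUPPRESS_FD_WARNINGS' => '1' }
  stdout, stderr, status = Open3.capture3(env, *cmd)
  [stdout, stderr, status]
end

# Example (needs lvm2 installed and enough privileges):
#   out, err, status = run_quiet_lvm('vgs')
```

This only hides the message; it does nothing about the descriptors themselves.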

As for my cookbook, I haven't dug into it fully, but the culprit is definitely the modified version of popen4 it's using.  The same thing happens with right_popen (used to return Chef logs back to the RightScale platform).  In these versions of popen, a thread is spawned to run the given command.  To return the stdout and stderr streams to the parent thread, Ruby IO objects are opened.

Whether LVM is complaining about these streams being open or about a delay in the Ruby VM garbage collecting the streams -- I'm not sure.  Regardless, it doesn't seem like a critical problem in my case.
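If the warnings do come from descriptors that the child inherits from the parent, one way to test that theory is to run the command so that only stdin, stdout and stderr are handed to it. The following is a minimal sketch using Process.spawn with close_others; it is not what Chef's popen4 or right_popen actually do, and run_with_clean_fds is a name I made up. I have not confirmed that it silences the warning on an lvm2 box.

```ruby
# Hypothetical helper: run a command with only fds 0, 1 and 2 open in the
# child, capturing stdout and stderr through pipes.
def run_with_clean_fds(*cmd)
  out_r, out_w = IO.pipe
  err_r, err_w = IO.pipe

  # :close_others => true closes every non-redirected fd above 2 in the child.
  pid = Process.spawn(*cmd, out: out_w, err: err_w, close_others: true)

  # Close the parent's write ends so the readers see EOF when the child exits.
  out_w.close
  err_w.close

  # Read both streams concurrently to avoid blocking on a full pipe.
  out_reader = Thread.new { out_r.read }
  err_reader = Thread.new { err_r.read }

  _, status = Process.wait2(pid)
  [out_reader.value, err_reader.value, status]
ensure
  [out_r, out_w, err_r, err_w].each { |io| io.close if io && !io.closed? }
end

# Example (needs lvm2 installed and enough privileges):
#   out, err, status = run_with_clean_fds('vgs')
#   puts err   # check whether "File descriptor N left open" is still there
```

If the message disappears when the command is run this way, that would point at inherited descriptors rather than at anything lvm itself is doing.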

Any feedback, ideas, or results from further investigation are much appreciated!

1 comment:

cpenniman said...

This issue sparked the interest of a RightScale developer, who ran some stress tests on both Chef::Mixin::Command.run_command() and the right_popen version. The Chef version passed, and some bug fixes were applied to right_popen. Be sure to upgrade to right_popen v1.0.4 or later.