v2.6.8 problem (Ubuntu): not enough file descriptors?
Hi, I ran into a nasty problem. This is what I did:
john@crusoe:~$ xscreensaver-demo #opened the preview for electricsheep
curl: (6) Couldn't resolve host 'sheep.arces.net.nyud.net'
subprocess error: anim download, 1536=6<<8+0
download failed of sheep 24572
/home/john/.sheep/generation202/00202=80062=80053=78385.mpg: Too many open files
/home/john/.sheep/3680/00202=24770=24765=24751.mpg: Too many open files
please be patient while the first sheep is downloaded...
subprocess error: writing none to id file, 32512=127<<8+0
/home/john/.sheep/: Too many open files
My HD became rather busy and I finally killed the program. To my dismay, my sheep cache had shrunk in size from ~15GB to ~6GB, although I have been running the screensaver with the unlimited cache setting, and there was still enough HD space.
Some of my torrent directories have been emptied, others show another curiosity: ls shows the files at full size, but according to du they are only a few KB?!
john@crusoe:~/.sheep$ du -h .
0 ./2170
.
.
.
0 ./2438
0 ./2439
188K ./3626
188K ./3627
188K ./3628
196K ./3629
193K ./3630
197K ./3631
196K ./3632
188K ./3633
192K ./3634
188K ./3635
188K ./3636
188K ./3637
.
.
.
197K ./3679
6,8G ./generation202
6,8G .
john@crusoe:~/.sheep$ ls -l 3637
insgesamt 187
-rw-r--r-- 1 john doe 4793316 2008-09-14 11:40 00202=23202=23202=23202.mpg
.
.
.
-rw-r--r-- 1 john doe 4694923 2008-09-14 11:40 00202=23361=23209=23161.mpg
-rw-r--r-- 1 john doe 0 2008-09-12 03:02 finished
john@crusoe:~/.sheep$ du -h 3637
188K 3637
Anyone know what happened? I would hate this to happen again...
Thanks in advance
OS issue but fixable.
The basic problem is exactly what the error message says: you have too many files (file descriptors, actually) open.
From the log, it looks like it went something like this:
-) file descriptor allocation fails
-) the program can't open new files/sockets
-) the process keeps retrying and fights with itself for resources, eating CPU cycles
-) you kill it, and all the open files are left in an unknown state.
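To see the first two steps in isolation, here is a minimal Python sketch that deliberately lowers the soft 'nofile' limit for its own process and then opens files until the OS refuses. The function name `open_until_limit` and the limit of 64 are made up for illustration; this is not what electricsheep does internally, only a way to reproduce the same "Too many open files" error safely.

```python
import errno
import os
import resource


def open_until_limit(soft_limit=64):
    """Lower the soft nofile limit, then open files until the OS refuses.

    Returns (number_of_files_opened, errno_of_the_failure).
    """
    old_soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        soft_limit = min(soft_limit, hard)  # soft may never exceed hard
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard))

    handles = []
    error_code = None
    try:
        # At most soft_limit descriptors can exist, so this loop must fail
        # before it has opened soft_limit + 1 files.
        while len(handles) <= soft_limit:
            handles.append(open(os.devnull, "rb"))
    except OSError as exc:
        error_code = exc.errno
    finally:
        for handle in handles:
            handle.close()  # release every descriptor we took
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, hard))
    return len(handles), error_code


if __name__ == "__main__":
    count, code = open_until_limit()
    print("opened", count, "files, then failed with:", os.strerror(code))
```

The failure carries errno EMFILE, which is the condition the C library describes as "Too many open files". Note that the sketch closes everything in the `finally` block; a program that gets killed in the middle has no such chance.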
If you were hitting the file descriptor limit, then there were likely close to that many file descriptors open that were not closed properly, and the files themselves possibly became corrupt. BitTorrent clients usually set the space aside up front, so there are really 2 possibilities (?)
1) the files existed and then were corrupted because they weren't closed properly.
2) the files were allocated but the blocks were never filled with data, so you really didn't lose anything.
Either way, the differing file sizes between 'du' and 'ls' speak to this, I believe. 'ls -l' prints the length recorded in the file's inode, while 'du' adds up the disk blocks that are actually allocated to the file. A file whose length was set but whose blocks were never written (or were lost) shows up big in 'ls' and tiny in 'du'. Pretty sure 'du' is the one you should believe here :(
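The two numbers can be compared file by file. Below is a rough Python sketch; the helper names `size_report` and `looks_unwritten` are invented here, and the factor of 512 assumes `st_blocks` is counted in 512-byte units, as it is on Linux.

```python
import os
import tempfile


def size_report(path):
    """Return (apparent_bytes, allocated_bytes) for one file."""
    info = os.stat(path)
    apparent = info.st_size           # the length `ls -l` prints
    allocated = info.st_blocks * 512  # roughly what `du` adds up (assumes 512-byte units)
    return apparent, allocated


def looks_unwritten(path):
    """True if the file occupies fewer bytes on disk than its length claims.

    This is only a hint: a compressing filesystem can produce the same pattern.
    """
    apparent, allocated = size_report(path)
    return allocated < apparent


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo = os.path.join(tmp, "demo.mpg")
        with open(demo, "wb") as f:
            f.truncate(4 * 1024 * 1024)  # set a 4 MiB length without writing any data
        print(size_report(demo), looks_unwritten(demo))
```

Running `looks_unwritten` over the .mpg files in a directory like 3637 would show which ones only have their length set and hold no real data.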
SOLUTION:
Well, you probably can't get the data back (if it existed) without considerable work. Copy the good sheep to a backup so you don't lose them again. Do a little research on the following:
1) the 'ulimit' shell command
2) the /etc/security/limits.conf file
There are limits on how much of several classes of resources a single user can consume, and they exist for good reasons, but they can also be raised to reasonably high values without consequence. The safe way is to just bump them a couple of times for your user until the problem goes away. I think doubling them each time is reasonable.
The easy way is probably just to peg 'nofile' at something like 65535 and be done with it. It WILL amplify the consequences of a runaway process running under that user (since it will be able to consume MORE resources), but given processes that behave you should have no problems.
Before you change anything, run 'ulimit -n' and note the number for a starting point.
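The "note the number, then double it" approach can be sketched in Python as well. The helper names `next_nofile_limit` and `limits_conf_line` are hypothetical, and the user name `john` is simply taken from the prompt in the original post. The output line follows the `<domain> <type> <item> <value>` layout that limits.conf uses. The script only prints a suggestion and changes nothing on the system.

```python
import resource


def next_nofile_limit(current_soft, hard):
    """Double the soft limit, but never go past the hard limit."""
    if current_soft == resource.RLIM_INFINITY:
        return current_soft  # already unlimited, nothing to raise
    doubled = current_soft * 2
    if hard != resource.RLIM_INFINITY:
        doubled = min(doubled, hard)
    return doubled


def limits_conf_line(user, kind, value):
    """Format one entry as '<domain> <type> <item> <value>' for the nofile item."""
    return f"{user} {kind} nofile {value}"


if __name__ == "__main__":
    # The soft value here is the same number `ulimit -n` reports.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("current soft/hard nofile limit:", soft, hard)
    print("suggested entry:", limits_conf_line("john", "soft", next_nofile_limit(soft, hard)))
```

Going beyond the current hard limit needs a matching 'hard' entry as well, and entries in limits.conf generally take effect only on the next login.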
Wow, that taught me some more about my OS :)
I am unlikely to encounter the problem again anytime soon because I already deleted the affected sheep, but I quadrupled the 'nofile' entry anyway. This should suffice for holding file handles to all of the sheep I am likely to hold at any given time.
Thank you very much for taking the time to write this very helpful reply!
