04 Oct, 2013, Skol wrote in the 1st comment:
Votes: 0
Hey guys, I've seen this before via Google (from an old thread) but wanted to bring it up here so I could talk about it in a more modern setting, i.e. ask questions, discuss, etc., and not in a 10-year-old thread on a backup of TMC.

Preface: As many of you might recall, I wasn't around for a couple of years, on the forums or on the MUD I had run. Someone else helped out, did the coding, etc.

Now: I have sockets that seem to keep climbing in number; if they hit 1024 I get an I/O issue where the game can't write anything.
Logs read the usual: "New_descriptor: accept: Too many open files"

So I went through all of my code looking for files opened without an fclose, etc. No luck. Question is, what's the Linux grep command to see open files? I remember it had something to do with *.c*, but again I'm a hack at Linux still (hack in the original sense of course, not the hacker of today :P).

Any insight/discussion would be great.
Thanks guys,
- Dave.
04 Oct, 2013, quixadhal wrote in the 2nd comment:
Votes: 0
What OS are you running?

Very OLD versions of Unix used to have a limit of 1024 file descriptors per process, and remember that every socket also counts as a file descriptor. I haven't seen that issue since Slackware 3 back in 1995, though; nowadays, any limits are more likely to be 65535.
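
You can check what your box actually allows from inside the process with getrlimit(), the programmatic equivalent of ulimit -n at the shell. A minimal sketch (and keep in mind the soft limit is often much lower than the hard limit):

#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;

    /* RLIMIT_NOFILE is the per-process open-descriptor limit */
    if (getrlimit(RLIMIT_NOFILE, &rl) == 0)
        printf("fd limit: soft=%lu hard=%lu\n",
               (unsigned long) rl.rlim_cur,
               (unsigned long) rl.rlim_max);
    else
        perror("getrlimit");
    return 0;
}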

If you're using cygwhine under Windows… that might be your issue.
04 Oct, 2013, Omega wrote in the 3rd comment:
Votes: 0
I had a similar issue many moons ago, so I created a system to figure out what was going on.

Downloadable here

This helped me track down the issue; I would also recommend using a File Handler.

Both of these eliminated that issue for me. They may require some tweaking as they were written quite some time ago, but they did the trick for me.
04 Oct, 2013, Skol wrote in the 4th comment:
Votes: 0
Hey Quix, not sure; I'm still looking for how to do that at the bash prompt, heh. But I believe the last coder set up a LAMP stack on a virtual machine. I think Fedora, but not certain (and he's pretty hard to get anything out of lately). The odd thing is, the 'socket' numbers keep rising; to me that says there's a file or socket being left open, right?

Here's the output from my active processes:
[root@ansy zivilyn]# lsof | grep "*.c*" -i
dhclient 1496 root 5u IPv4 726 0t0 UDP *:bootpc
dhclient 1496 root 20u IPv4 704 0t0 UDP *:33898
dhclient 1496 root 21u IPv6 705 0t0 UDP *:15731
sshd 1615 root 3u IPv4 2796 0t0 TCP *:ssh (LISTEN)
sshd 1615 root 4u IPv6 2798 0t0 TCP *:ssh (LISTEN)
ntpd 1626 ntp 16u IPv4 839 0t0 UDP *:ntp
ntpd 1626 ntp 17u IPv6 840 0t0 UDP *:ntp
mysqld 1793 mysql 10u IPv4 930 0t0 TCP *:mysql (LISTEN)
dovecot 1825 root 17u IPv4 959 0t0 TCP *:pop3 (LISTEN)
dovecot 1825 root 18u IPv6 960 0t0 TCP *:pop3 (LISTEN)
dovecot 1825 root 19u IPv4 961 0t0 TCP *:pop3s (LISTEN)
dovecot 1825 root 20u IPv6 962 0t0 TCP *:pop3s (LISTEN)
dovecot 1825 root 25u IPv4 967 0t0 TCP *:imap (LISTEN)
dovecot 1825 root 26u IPv6 968 0t0 TCP *:imap (LISTEN)
dovecot 1825 root 27u IPv4 969 0t0 TCP *:imaps (LISTEN)
dovecot 1825 root 28u IPv6 970 0t0 TCP *:imaps (LISTEN)
master 1925 root 12u IPv4 5207 0t0 TCP *:smtp (LISTEN)
master 1925 root 13u IPv6 5209 0t0 TCP *:smtp (LISTEN)
lighttpd 1943 lighttpd 4u IPv4 5307 0t0 TCP *:http (LISTEN)
murmurd 2050 mumble-server 20u IPv6 5530 0t0 TCP *:64738 (LISTEN)
murmurd 2050 mumble-server 21u IPv6 5533 0t0 UDP *:64738
rom 7571 root 5u IPv4 18632871 0t0 TCP *:8662 (LISTEN)
rom 8513 root 5u IPv4 10266860 0t0 TCP *:8660 (LISTEN)
rom 10142 root 5u IPv4 19832383 0t0 TCP *:8679 (LISTEN)
rom 15170 root 5u IPv4 19360178 0t0 TCP *:8650 (LISTEN)
[root@ansy zivilyn]# lsof | wc -l
3563
[root@ansy zivilyn]# lsof | wc
3563 32874 360704


So I see the ports: a Mumble server, web server, email server, SSH, a MySQL process, etc.

I've noticed some I/O errors in crashes before, and I'm starting to think this is one issue. Copyover, by the way, doesn't appear to fix it (at least the socket numbers are still high), but a complete crash/reboot of the MUD port does reset them.
I'll take a look at Darien's stuff and see if that helps. Thanks a ton for listening guys.
- Dave.
04 Oct, 2013, Omega wrote in the 5th comment:
Votes: 0
cd /proc/<pid>/fd

then do an ls (this gives you the file descriptor numbers)

then stat <some descriptor number> from that ls (this tells you what it is: file, socket, etc.)

That's a basic run-down of what the first snippet I posted does, but it will give you an idea of what's going on.

If the cd /proc/<pid>/fd fails, then you'll need to find out what the location is for your specific OS. I was using Fedora Core 6 when I wrote the snippets, and this is what I had to do to view the data.
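
If you'd rather do it from inside the game, the same inspection looks roughly like this on Linux: walk /proc/self/fd and readlink() each entry. This is a sketch of the idea, not the actual snippet, and the function name is just illustrative:

#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <unistd.h>

/* print what every open descriptor in this process points at:
 * a path for files, "socket:[inode]" for sockets, etc. */
void dump_open_fds(void)
{
    DIR *dir = opendir("/proc/self/fd");
    struct dirent *ent;
    char path[PATH_MAX], target[PATH_MAX];
    ssize_t len;

    if (dir == NULL)
        return;

    while ((ent = readdir(dir)) != NULL)
    {
        if (ent->d_name[0] == '.')
            continue;

        snprintf(path, sizeof(path), "/proc/self/fd/%s", ent->d_name);
        len = readlink(path, target, sizeof(target) - 1);
        if (len < 0)
            continue;
        target[len] = '\0';
        printf("fd %s -> %s\n", ent->d_name, target);
    }
    closedir(dir);
}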
05 Oct, 2013, Idealiad wrote in the 6th comment:
Votes: 0
I think this might have to do with actual sockets and not FDs in general.

http://stackoverflow.com/questions/53574...
05 Oct, 2013, quixadhal wrote in the 7th comment:
Votes: 0
It's possible, but remember that in UNIX, *everything* is a file; every socket uses a file descriptor as well. Unless you have 1000 players on your MUD, sockets shouldn't be an issue unless you are somehow not closing them when people disconnect. I suppose it's possible to have them close yet still leave the descriptors in the select set so they can't be reused.
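
If that's what's happening, the fix is to make sure both halves of the cleanup occur whenever a player drops. A sketch, assuming a select() loop with a long-lived master fd_set (the names here are illustrative, not from any particular codebase):

#include <sys/select.h>
#include <unistd.h>

void close_connection(int fd, fd_set *master_set)
{
    FD_CLR(fd, master_set);  /* stop select()ing on it */
    close(fd);               /* actually release the descriptor */
}

Skip either line and you leak: close() without the FD_CLR leaves a stale entry in the set, and FD_CLR without the close() leaves the descriptor counting against your limit.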
07 Oct, 2013, Skol wrote in the 8th comment:
Votes: 0
Darien said:
cd /proc/<pid>/fd

then do an ls (this gives you the file descriptor numbers)

then stat <some descriptor number> from that ls (this tells you what it is: file, socket, etc.)

That's a basic run-down of what the first snippet I posted does, but it will give you an idea of what's going on.

If the cd /proc/<pid>/fd fails, then you'll need to find out what the location is for your specific OS. I was using Fedora Core 6 when I wrote the snippets, and this is what I had to do to view the data.


Thanks Darien, works perfectly!
OK, so I have about 1000 in there, gaaah. Yeah, there are the usual 8-10 that should be open, but… it looks like:
File: `820' -> `/mud/playport/player/Sylphshade (deleted)'
Size: 64 Blocks: 0 IO Block: 1024 symbolic link
Device: 3h/3d Inode: 20501757 Links: 1
Access: (0500/lr-x------) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2013-10-07 11:37:15.790868756 -0400
Modify: 2013-10-07 11:37:15.782869084 -0400
Change: 2013-10-07 11:37:15.782869084 -0400


File: `666' -> `/mud/playport/player/Lazuli (deleted)'
Size: 64 Blocks: 0 IO Block: 1024 symbolic link
Device: 3h/3d Inode: 20501603 Links: 1
Access: (0500/lr-x------) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2013-10-07 11:37:15.788868850 -0400
Modify: 2013-10-07 11:37:15.782869084 -0400
Change: 2013-10-07 11:37:15.782869084 -0400


These pfiles are still there on disk, so it looks like the game isn't letting go of the file descriptor when players leave; is that the consensus? (The '(deleted)' means the descriptor is still open even though the directory entry it pointed at is gone.)
Unfortunately this was something introduced while I was away, so I'm kind of in needle/haystack territory here.

- Dave.
08 Oct, 2013, Omega wrote in the 9th comment:
Votes: 0
That could mean that when you're opening the file (or saving it), the code isn't closing the file descriptor properly. This is where I would suggest you use a file handler like the snippets I posted above. It can help you find all the spots where files haven't been closed properly, and it helps future-proof your source: as long as you use the file handler, it will tell you the file, function, and line where each file was opened. If files are closed properly, you won't see anything in the fileio command, but when something is left open, it will show up there.
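
The core of it looks roughly like this (a sketch of the idea from memory, not the actual snippet; all the names here are illustrative): wrap fopen/fclose so every open is recorded with its origin, and a command dumps whatever was never closed.

#include <stdio.h>

#define MAX_TRACKED 256

struct tracked_file
{
    FILE *fp;
    char  where[128];   /* file:function:line that opened it */
};

static struct tracked_file tracked[MAX_TRACKED];

/* wrapper around the real fopen: remember where the file was opened */
FILE *tracked_fopen(const char *name, const char *mode,
                    const char *src, const char *func, int line)
{
    FILE *fp = fopen(name, mode);
    int i;

    if (fp != NULL)
        for (i = 0; i < MAX_TRACKED; i++)
            if (tracked[i].fp == NULL)
            {
                snprintf(tracked[i].where, sizeof(tracked[i].where),
                         "%s:%s():%d (%s)", src, func, line, name);
                tracked[i].fp = fp;
                break;
            }
    return fp;
}

/* wrapper around the real fclose: forget the entry again */
int tracked_fclose(FILE *fp)
{
    int i;

    for (i = 0; i < MAX_TRACKED; i++)
        if (tracked[i].fp == fp)
            tracked[i].fp = NULL;
    return fclose(fp);
}

/* the guts of a "fileio"-style command: anything listed was never closed */
void show_open_files(void)
{
    int i;

    for (i = 0; i < MAX_TRACKED; i++)
        if (tracked[i].fp != NULL)
            printf("still open: %s\n", tracked[i].where);
}

/* these two macros go in a shared header included by the rest of the
 * driver (after the prototypes above), so every plain fopen/fclose
 * call in the code is routed through the trackers: */
#define fopen(name, mode) tracked_fopen((name), (mode), __FILE__, __func__, __LINE__)
#define fclose(fp)        tracked_fclose((fp))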

Just a thought; again, it's up to you, but I had great success with it, found my errors, and was able to correct them quickly.
08 Oct, 2013, Skol wrote in the 10th comment:
Votes: 0
Yeah, I totally hear you, Darien; I'm looking at doing that for sure, since hunting through opens/closes in code isn't fun, heh.
There seems to be one with LD (link-dead) pfiles/reconnect, at least from what I can see of the descriptors under the PID. I'm hunting there first, heh.
Thanks for all of the help, that file handler code will be hugely appreciated, thanks again!

- Dave.
09 Oct, 2013, Nathan wrote in the 11th comment:
Votes: 0
Perhaps the code explicitly checks itself to make sure it doesn't go over 1000, even though the OS may not have that constraint anymore? If so, push it up a bit and see what happens; if it still runs over, well…

A copyover is a hotboot, isn't it? So the process would still be holding the file descriptors in that case, whereas if you kill it and restart it, the OS will automatically free them for you, I think.
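
One thing that follows from that: a copyover is an exec() of the same binary, so any descriptor not marked close-on-exec survives the hotboot, which would explain why leaked descriptors ride straight through every copyover. A defensive sketch (fopen_cloexec is just an illustrative name, not from any stock codebase): set FD_CLOEXEC on every file you open, so only the sockets you deliberately pass along outlive the exec.

#include <fcntl.h>
#include <stdio.h>

FILE *fopen_cloexec(const char *name, const char *mode)
{
    FILE *fp = fopen(name, mode);

    /* FD_CLOEXEC: the kernel closes this descriptor automatically
     * when the process exec()s, i.e. on copyover */
    if (fp != NULL)
        fcntl(fileno(fp), F_SETFD, FD_CLOEXEC);
    return fp;
}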
10 Oct, 2013, Skol wrote in the 12th comment:
Votes: 0
Nathan said:
Perhaps the code explicitly checks itself to make sure it doesn't go over 1000, even though the OS may not have that constraint anymore? If so, push it up a bit and see what happens; if it still runs over, well…

A copyover is a hotboot, isn't it? So the process would still be holding the file descriptors in that case, whereas if you kill it and restart it, the OS will automatically free them for you, I think.


Yeah, not sure about the code checking for that; it's strange that it would fail to write, though, rather than logging some kind of 'over limit' error. I think it's more the 1024 open descriptors… Thing is, what if a MUD were to have 800 or 1000 players on? I mean, I don't have to worry about that myself, but how do other games handle numbers like that?

You're right on copyover/hotboot; the PID doesn't change, so the file descriptors are still there as well.
The strange part I find is that it seems to come almost purely from sockets left open after someone loses link: if they reconnect from a dynamic IP or another IP it's a new descriptor, yet if it's from a static IP it re-uses the same one.

I haven't had a chance to put in Darien's file handler code; I've read through it, but not tried it out. Reading it, though, it seems like a solution to the symptom, not to the actual issue of the file descriptors not closing properly. I.e., it's the mom coming along behind going 'Seriously, shut your damned drawers' and then closing them for the kids. I want to make the kids do it, heh, but it wouldn't be bad to have mom reporting in also.
10 Oct, 2013, quixadhal wrote in the 13th comment:
Votes: 0
Sounds like your socket code is not handling things properly. Make sure you are setting SO_LINGER as well as SO_REUSEADDR and SO_REUSEPORT. The former is to ensure buffers get flushed if the other end of the connection closes, and the latter two are to make sure you don't end up with ports blocked for the duration of the process after being closed.
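
On the listening socket that looks roughly like this (a sketch with error handling trimmed; SO_REUSEPORT doesn't exist on older kernels, hence the guard, and the two-second linger value is just an example):

#include <netinet/in.h>
#include <sys/socket.h>

void set_listen_opts(int fd)
{
    int on = 1;
    struct linger ld;

    ld.l_onoff  = 1;   /* linger on close so queued output is flushed */
    ld.l_linger = 2;   /* ...for at most 2 seconds */

    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
#ifdef SO_REUSEPORT
    setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &on, sizeof(on));
#endif
    setsockopt(fd, SOL_SOCKET, SO_LINGER, &ld, sizeof(ld));
}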
10 Oct, 2013, Skol wrote in the 14th comment:
Votes: 0
Thanks, Quix, I'll look there too. Just odd to see a 'new' problem pop out of old code :(.
10 Oct, 2013, Omega wrote in the 15th comment:
Votes: 0
It's most likely an issue that has existed for a while, but the circumstances that cause it hadn't been triggered, and even if they were in the past, who is to say they were ever triggered to this extent. Again, the file handler will point out where a hanging file descriptor is, and the other piece of code identifies whether it is a socket or a file that's left hanging. I personally wouldn't install the cleaning portion of it, just the command that identifies what is open. But that's just me.

Of course, with that said, the file handler may be all that you need; at the least it's a step in the right direction toward identifying the cause. Even if it isn't the cause, it eliminates one place you were looking. And honestly, that's what debugging is: a series of eliminations until you reach the cause (if you cannot directly find the cause, that is).
10 Oct, 2013, Skol wrote in the 16th comment:
Votes: 0
My thoughts exactly, thanks for all the help Darien!

- Dave.
11 Oct, 2013, Omega wrote in the 17th comment:
Votes: 0
No problem, I just hope I was able to help.
04 Oct, 2016, Hades_Kane wrote in the 18th comment:
Votes: 0
I started having a "too many open files" problem myself.

I decided to hit up Google while I'm at work just to see what popped up, and the file handler thing mentioned in this thread is likely where I'm going to go with it.

I'm replying mostly to bump this so I can find it easier when I get home to mess with it. This is probably a 'duh' question, but I do need to make sure every fopen has a corresponding fclose, correct? (There are a couple of files where the fcloses outnumber the fopens.)

I haven't personally modified or written any of the code that uses fopen, but other people have, here and there over the years.
05 Oct, 2016, Pymeus wrote in the 19th comment:
Votes: 0
As a rule of thumb, every fopen() needs an fclose() when you're done with it, and every open() needs a close().

You might make exceptions for log files or other files that you use frequently throughout the run of the server, but the fewer the better.

creat(), dup/dup2(), and accept() also need a close(). socketpair() and pipe/pipe2() need two close()s, one for each fd. No doubt there are more. Bottom line is, when you open a file (or socket), there should be a corresponding close.
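
The two-descriptor cases are the easiest to get wrong; a tiny illustration with pipe():

#include <unistd.h>

void pipe_example(void)
{
    int fds[2];

    if (pipe(fds) == 0)
    {
        /* ... use fds[0] (read end) and fds[1] (write end) ... */
        close(fds[0]);   /* each end is its own descriptor, */
        close(fds[1]);   /* so each needs its own close()   */
    }
}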
05 Oct, 2016, Hades_Kane wrote in the 20th comment:
Votes: 0
Pymeus said:
As a rule of thumb, every fopen() needs an fclose() when you're done with it, and every open() needs a close().

You might make exceptions for log files or other files that you use frequently throughout the run of the server, but the fewer the better.

creat(), dup/dup2(), and accept() also need a close(). socketpair() and pipe/pipe2() need two close()s, one for each fd. No doubt there are more. Bottom line is, when you open a file (or socket), there should be a corresponding close.


I appreciate the info, I'll double check all those things, as well :)