06 Oct, 2016, Tijer wrote in the 21st comment:
Votes: 0
I had an issue like this once. I used the snippet on here that uses File_Open and File_Close to replace fopen/fclose, to show which files weren't being closed properly and where they were located. Once I had fixed the issues, I reverted the code back to the original.
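
For readers who have not seen that snippet, the idea can be sketched roughly as below. This is not the snippet's actual code: the names tracked_fopen, tracked_fclose, tracked_report and TRACKED_FOPEN, and the fixed-size table, are all made up for illustration. Each open records its call site, each close forgets it, and a report lists whatever is still open.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical tracking table; all names here are illustrative only. */
#define MAX_TRACKED 64

struct tracked_file {
    FILE *fp;          /* NULL means the slot is free */
    const char *file;  /* source file of the open call */
    int line;          /* source line of the open call */
};

static struct tracked_file tracked[MAX_TRACKED];

/* Open a file and remember where in the source it was opened. */
FILE *tracked_fopen(const char *path, const char *mode,
                    const char *file, int line)
{
    FILE *fp;
    int i;

    assert(path != NULL && mode != NULL);
    fp = fopen(path, mode);
    if (fp == NULL)
        return NULL;
    for (i = 0; i < MAX_TRACKED; i++) {
        if (tracked[i].fp == NULL) {
            tracked[i].fp = fp;
            tracked[i].file = file;
            tracked[i].line = line;
            break;
        }
    }
    return fp;  /* if the table is full, the file is simply untracked */
}

/* Close a file and drop it from the table. */
int tracked_fclose(FILE *fp)
{
    int i;

    assert(fp != NULL);
    for (i = 0; i < MAX_TRACKED; i++) {
        if (tracked[i].fp == fp) {
            tracked[i].fp = NULL;
            break;
        }
    }
    return fclose(fp);
}

/* Print every file that is still open; returns how many there are. */
int tracked_report(FILE *out)
{
    int i, n = 0;

    for (i = 0; i < MAX_TRACKED; i++) {
        if (tracked[i].fp != NULL) {
            fprintf(out, "still open: opened at %s:%d\n",
                    tracked[i].file, tracked[i].line);
            n++;
        }
    }
    return n;
}

/* Convenience macro that records the call site automatically. */
#define TRACKED_FOPEN(path, mode) \
    tracked_fopen((path), (mode), __FILE__, __LINE__)
```

Calling tracked_report( stderr ) from an immortal command or at shutdown would then list the call sites that never reached a matching close.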
06 Oct, 2016, Pymeus wrote in the 22nd comment:
Votes: 0
As far as tracing leaked fds with a tool, it looks like Valgrind can be helpful:
http://stackoverflow.com/a/1235287
22 Oct, 2016, Hades_Kane wrote in the 23rd comment:
Votes: 0
I tried both snippets. The first one linked in the thread just had WAAAY too many things that weren't compatible with my code, so I moved on to the next one.

Once I got to changing all of the FILE to FileData and all that, I keep getting syntax errors on compile (that make no sense) and so after hammering at this for a few hours, I'm throwing in the towel.

I don't understand why/how a perfectly legitimate declaration is kicking up errors, and nothing I've done has fixed it.

gcc -c -Wall -O -ggdb  accounts.c -o obj/accounts.o
In file included from merc.h:2638,
from accounts.c:18:
synthesize.h:97: error: syntax error before "FileData"
In file included from merc.h:2639,
from accounts.c:18:
cooking.h:52: error: syntax error before "FileData"
In file included from merc.h:2948,
from accounts.c:18:
chocobo.h:291: error: syntax error before "FileData"
chocobo.h:292: error: syntax error before "FileData"
In file included from accounts.c:18:
merc.h:4438: error: syntax error before '*' token
merc.h:4735: error: syntax error before '*' token
merc.h:4736: error: syntax error before '*' token
merc.h:4737: error: syntax error before '*' token
merc.h:4738: error: syntax error before '*' token


Where the places giving errors are:

long 	fread_flag	args( ( FileData *fp ) );
char * fread_string args( ( FileData *fp ) );
char * fread_string_eol args(( FileData *fp ) );
void fread_to_eol args( ( FileData *fp ) );

void fwrite_stable (CHAR_DATA * ch, FileData * fp);
void fread_stable (CHAR_DATA * ch, FileData * fp);
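
A note on the errors above: with older gcc versions, "syntax error before" a type name usually means the compiler has not yet seen a declaration of that type at that point. Since synthesize.h is pulled in from merc.h line 2638, one possibility (an assumption, since the snippet's header is not shown here) is that the FileData typedef only becomes visible later, or not at all. A minimal sketch of the ordering that compiles, using a made-up struct layout and a made-up function fread_flag_demo:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the snippet's wrapper type. The typedef has to
 * appear before the first prototype that mentions FileData. */
typedef struct file_data FileData;

struct file_data {
    FILE *stream;           /* the underlying stdio stream */
    const char *opened_at;  /* illustrative bookkeeping field */
};

/* A prototype like those in synthesize.h or merc.h can now name the type. */
long fread_flag_demo(FileData *fp);

/* Trivial body so the example links: reports whether a stream is attached. */
long fread_flag_demo(FileData *fp)
{
    assert(fp != NULL);
    return fp->stream != NULL ? 1L : 0L;
}
```

In merc.h terms that would mean placing the typedef (or the #include that provides it) above the line that includes synthesize.h.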


Using the shell command pid thing mentioned earlier in the thread, I had hundreds of descriptor numbers pop up, most of which had the 'file' name socket and a bunch of numbers, so I suspect that my sockets aren't being let go, closed, freed, or something correctly. I think I'm going to poke around in that code and see if anything pops out at me, but honestly, this is all above my head so I'm not really sure what to do.
22 Oct, 2016, Hades_Kane wrote in the 24th comment:
Votes: 0
A few things that have appeared in my log file that I suspect may be connected:

Read_from_descriptor: Connection reset by peer
Read_from_descriptor: Connection timed out
New_descriptor: getpeername: Transport endpoint is not connected


In particular, the 'Connection reset by peer' appears quite frequently, with the 'timed out' once in a while. I've only seen the New_descriptor error a couple of times, but it was recent so I thought I would mention it.

Does this ring any bells to anyone?
22 Oct, 2016, Hades_Kane wrote in the 25th comment:
Votes: 0
Also, I can't seem to use valgrind either, I keep getting this error:

valgrind: padding mmap((nil), 134512640) failed during startup.
valgrind: is there a hard virtual memory limit set?


edited to add:

Ok, I got valgrind working… had to manually download, upload, and install the newest version. Trying to figure out the fds tool now.
23 Oct, 2016, Hades_Kane wrote in the 26th comment:
Votes: 0
Other things I've noticed… when another player connected to the game, their "num" in the "sockets" command lined up very closely with the number of descriptors when I did the 'ls' in the pid/fd directory, which at this point is up to 838 (the player that connected was listed as 836).

On the valgrind fds tool, everything looked normal except for an instance of this:

==28820==
==28820== Open AF_INET socket 4: 0.0.0.0:4001 <-> unbound
==28820== at 0x4154F42: socket (in /lib/libc-2.7.so)
==28820== by 0x80E9AA7: init_socket (comm.c:491)
==28820== by 0x80E9A0D: main (comm.c:460)


Which I think was when I took one of my test characters linkdead.

I have yet to run the tool on the main port (I've been using my dev port which has no players normally connecting).

If this doesn't seem to be enough info at this juncture, I'll run it on my live port for a few hours and see what info it gives me.

The lines referenced in comm.c are as follows (I have put comments where the specific lines are):

int main( int argc, char **argv )
{
struct timeval now_time;
bool fCopyOver = FALSE;
/*
* Memory debugging if needed.
*/
#if defined(MALLOC_DEBUG)
malloc_debug( 2 );
#endif

/*
* Init time.
*/
gettimeofday( &now_time, NULL );
current_time = (time_t) now_time.tv_sec;
strcpy( str_boot_time, ctime( &current_time ) );

/*
* Macintosh console initialization.
*/
#if defined(macintosh)
console_options.nrows = 31;
cshow( stdout );
csetmode( C_RAW, stdin );
cecho2file( "log file", 1, stderr );
#endif

/*
* Reserve one channel for our use.
*/
if ( ( fpReserve = fopen( NULL_FILE, "r" ) ) == NULL )
{
perror( NULL_FILE );
exit( 1 );
}

/*
* Get the port number.
*/
port = 4000;
if ( argc > 1 )
{
if ( !is_number( argv[1] ) )
{
fprintf( stderr, "Usage: %s [port #]\n", argv[0] );
exit( 1 );
}
else if ( ( port = atoi( argv[1] ) ) <= 1024 )
{
fprintf( stderr, "Port number must be above 1024.\n" );
exit( 1 );
}

/* Are we recovering from a copyover? */
if (argv[2] && argv[2][0])
{
fCopyOver = TRUE;
control = atoi(argv[3]);
}
else
fCopyOver = FALSE;

}

/*
* Run the game.
*/
#if defined(macintosh) || defined(MSDOS)
boot_db();
log_string( "Merc is ready to rock." );
game_loop_mac_msdos( );
#endif
pulsenum = 1;
#if defined(unix)

if (!fCopyOver)
control = init_socket( port ); // <----- comm.c:460

boot_db();
sprintf( log_buf, "End of Time is now running on port %d.", port );
log_string( log_buf );

if (fCopyOver)
copyover_recover();

game_loop_unix( control );
close (control);
#endif

/*
* That's all, folks.
*/
log_string( "Normal termination of game." );
exit( 0 );
return 0;
}


#if defined(unix)
int init_socket( int port )
{
static struct sockaddr_in sa_zero;
struct sockaddr_in sa;
int x = 1;
int fd;

if ( ( fd = socket( AF_INET, SOCK_STREAM, 0 ) ) < 0 ) // <----- comm.c:491
{
perror( "Init_socket: socket" );
exit( 1 );
}

if ( setsockopt( fd, SOL_SOCKET, SO_REUSEADDR,
(char *) &x, sizeof(x) ) < 0 )
{
perror( "Init_socket: SO_REUSEADDR" );
close(fd);
exit( 1 );
}

#if defined(SO_DONTLINGER) && !defined(SYSV)
{
struct linger ld;

ld.l_onoff = 1;
ld.l_linger = 1000;

if ( setsockopt( fd, SOL_SOCKET, SO_DONTLINGER,
(char *) &ld, sizeof(ld) ) < 0 )
{
perror( "Init_socket: SO_DONTLINGER" );
close(fd);
exit( 1 );
}
}
#endif

sa = sa_zero;
sa.sin_family = AF_INET;
sa.sin_port = htons( port );

if ( bind( fd, (struct sockaddr *) &sa, sizeof(sa) ) < 0 )
{
perror("Init socket: bind" );
close(fd);
exit(1);
}


if ( listen( fd, 3 ) < 0 )
{
perror("Init socket: listen");
close(fd);
exit(1);
}

return fd;
}
#endif
23 Oct, 2016, Hades_Kane wrote in the 27th comment:
Votes: 0
Aaaaand because too much info is more helpful than not enough, here's the close_socket function which handles people going linkdead…


void close_socket( DESCRIPTOR_DATA *dclose )
{
CHAR_DATA *ch;
char buf[MAX_STRING_LENGTH];

if ( dclose->outtop > 0 )
process_output( dclose, FALSE );

if ( dclose->snoop_by != NULL )
{
write_to_buffer( dclose->snoop_by,
"Your victim has left the game.\r\n", 0 );
}

{
DESCRIPTOR_DATA *d;

for ( d = descriptor_list; d != NULL; d = d->next )
{
if ( d->snoop_by == dclose )
d->snoop_by = NULL;
}
}

if ( ( ch = dclose->character ) != NULL )
{
sprintf( log_buf, "Closing link to %s.", ch->name );
log_string( log_buf );

if ( ch->pet && ch->pet->in_room == NULL )
{
char_to_room( ch->pet, get_room_index(ROOM_VNUM_LIMBO) );
extract_char( ch->pet, TRUE );
}

/* cut down on wiznet spam when rebooting */
if ( dclose->connected == CON_PLAYING && !merc_down)
{
char log_buf[MSL];

sprintf( log_buf, "Net death has claimed %s. (%ld)", ch->name,ch->in_room->vnum );
log_string( log_buf );
wiznet(log_buf,NULL,NULL,WIZ_LINKS,0,get_trust(ch));

act( "$n has lost $s link.", ch, NULL, NULL, TO_ROOM );

if (ch->actions != NULL)
{
DELAY * list, * next;

for (list = ch->actions; list != NULL; list = next)
{
next = list->next;
free_delay (list);
}
}
ch->actions = NULL;
ch->desc = NULL;
SET_BIT( ch->comm, COMM_LINKDEAD );

if (!IS_SET(ch->act,PLR_NOANNOUNCE) && !ch->incog_level && !ch->invis_level && !IS_SET( ch->comm, COMM_NOWHO))
{
sprintf(buf, "%s {wis now linkdead.", ch->name);
do_info(NULL,buf);
}


}
else
{
free_char(dclose->original ? dclose->original :
dclose->character );
}
}

if ( d_next == dclose )
d_next = d_next->next;

if ( dclose == descriptor_list )
{
descriptor_list = descriptor_list->next;
}
else
{
DESCRIPTOR_DATA *d;

for ( d = descriptor_list; d && d->next != dclose; d = d->next )
;
if ( d != NULL )
d->next = dclose->next;
else
bug( "Close_socket: dclose not found.", 0 );
}
ProtocolDestroy( dclose->pProtocol );

close( dclose->descriptor );
free_descriptor(dclose);
#if defined(MSDOS) || defined(macintosh)
exit(1);
#endif
return;
}
23 Oct, 2016, Hades_Kane wrote in the 28th comment:
Votes: 0
I caught it just before it crashed but while it was giving the too many file open errors.

The PID had 1023 descriptors listed, here's a sample of what most of them look like:

Quote
[XXXXX@EndofTime /proc/31738/fd]$ stat 1023

File: `1023' -> `socket:[1546546581]'
Size: 64 Blocks: 0 IO Block: 1024 symbolic link
Device: ach/172d Inode: 1548322861 Links: 1
Access: (0700/lrwx------) Uid: ( 1000/ XXXXX) Gid: ( 1000/ XXXXX)
Access: 2016-10-23 09:57:01.879954408 -0500
Modify: 2016-10-23 09:57:01.879954408 -0500
Change: 2016-10-23 09:57:01.879954408 -0500

[XXXXX@EndofTime /proc/31738/fd]$ stat 1022

File: `1022' -> `socket:[1546544111]'
Size: 64 Blocks: 0 IO Block: 1024 symbolic link
Device: ach/172d Inode: 1548322860 Links: 1
Access: (0700/lrwx------) Uid: ( 1000/ XXXXX) Gid: ( 1000/ XXXXX)
Access: 2016-10-23 09:57:01.879954408 -0500
Modify: 2016-10-23 09:57:01.879954408 -0500
Change: 2016-10-23 09:57:01.879954408 -0500

[XXXXX@EndofTime /proc/31738/fd]$ stat 1021

File: `1021' -> `socket:[1546542676]'
Size: 64 Blocks: 0 IO Block: 1024 symbolic link
Device: ach/172d Inode: 1548322859 Links: 1
Access: (0700/lrwx------) Uid: ( 1000/ XXXXX) Gid: ( 1000/ XXXXX)
Access: 2016-10-23 09:57:01.879954408 -0500
Modify: 2016-10-23 09:57:01.879954408 -0500
Change: 2016-10-23 09:57:01.879954408 -0500

[XXXXX@EndofTime /proc/31738/fd]$ stat 1020

File: `1020' -> `socket:[1546541956]'
Size: 64 Blocks: 0 IO Block: 1024 symbolic link
Device: ach/172d Inode: 1548322858 Links: 1
Access: (0700/lrwx------) Uid: ( 1000/ XXXXX) Gid: ( 1000/ XXXXX)
Access: 2016-10-23 09:57:01.879954408 -0500
Modify: 2016-10-23 09:57:01.879954408 -0500
Change: 2016-10-23 09:57:01.879954408 -0500

[XXXXX@EndofTime /proc/31738/fd]$


As far as the log file, this is a sample of what was there before the thing happened:

Quote
Read_from_descriptor: Connection reset by peer
Read_from_descriptor: Connection reset by peer
Read_from_descriptor: Connection reset by peer
Read_from_descriptor: Connection reset by peer
Read_from_descriptor: Connection reset by peer
Read_from_descriptor: Connection reset by peer
Read_from_descriptor: Connection reset by peer
Read_from_descriptor: Connection reset by peer
Read_from_descriptor: Connection reset by peer
Read_from_descriptor: Connection reset by peer
Read_from_descriptor: Connection reset by peer
Read_from_descriptor: Connection reset by peer
Read_from_descriptor: Connection reset by peer
Read_from_descriptor: Connection reset by peer
Sat Oct 22 21:30:56 2016 :: Cheren gained level 26
Read_from_descriptor: Connection reset by peer
Read_from_descriptor: Connection reset by peer
Read_from_descriptor: Connection reset by peer
Read_from_descriptor: Connection reset by peer
Sat Oct 22 21:52:59 2016 :: Closing link to Cheren.
Sat Oct 22 21:52:59 2016 :: Net death has claimed Cheren. (9974)
Sat Oct 22 21:53:22 2016 :: Loading Cheren.
Sat Oct 22 21:53:26 2016 :: Cheren@XXXXXXXXX.XXX reconnected.
Read_from_descriptor: Connection reset by peer
Read_from_descriptor: Connection reset by peer
Sat Oct 22 22:47:59 2016 :: Cheren gained level 27
Read_from_descriptor: Connection reset by peer
Read_from_descriptor: Connection reset by peer
Read_from_descriptor: Connection reset by peer
New_descriptor: accept: Too many open files
New_descriptor: accept: Too many open files
New_descriptor: accept: Too many open files
New_descriptor: accept: Too many open files
New_descriptor: accept: Too many open files


There was a whole lot of other Connection reset by peer in the logs, too.
23 Oct, 2016, Hades_Kane wrote in the 29th comment:
Votes: 0
Several more descriptors that showed as a "socket" appeared with no discernible connection or reading from the logs.

However, a little bit later, more "Connection reset by peer" did show up along with even MORE socket descriptors in the PID.
23 Oct, 2016, Pymeus wrote in the 30th comment:
Votes: 0
I'll ramble on for a bit. You may find the explanations useful, but see the question near the end.

Hades_Kane said:
On the valgrind fds tool, everything looked normal except for an instance of this:
==28820==
==28820== Open AF_INET socket 4: 0.0.0.0:4001 <-> unbound
==28820== at 0x4154F42: socket (in /lib/libc-2.7.so)
==28820== by 0x80E9AA7: init_socket (comm.c:491)
==28820== by 0x80E9A0D: main (comm.c:460)

I think that's harmless. It looks like it's the fd that the mud uses to listen for new connections.

Hades_Kane said:
Other things I've noticed… when another player connected to the game, their "num" in the "sockets" command lined up very closely with the number of descriptors when I did the 'ls' in the pid/fd directory, which at this point is up to 838 (the player that connected was listed as 836).

Obviously, you already know you have a problem, but I'll just comment that on a typical C mud the total number of open fds in that directory should closely match the number of active (but not necessarily logged in) connections. In addition, there are 3 fds numbered 0-2 for stdin/stdout/stderr, and usually another extra pointing to a log file. There's 1 fd for each port on which the mud listens for new connections, and sometimes 1 "reserved" fd which 99% of the time points to /dev/null. There may be a few more for any subprocesses, like a resolver, that your mud runs.

I'll also mention that instead of interrogating each number in /proc/(pid)/fd/ separately with "stat", you can get much of the same info nicely summarized via "ls -l /proc/(pid)/fd/" (or the common shell alias "ll" instead of ls). This is good for getting a broad overview quickly.

Hades_Kane said:
Read_from_descriptor: Connection reset by peer
Read_from_descriptor: Connection timed out
New_descriptor: getpeername: Transport endpoint is not connected

Those network events are fairly normal on a public server, but it's certainly possible that the code isn't handling them quite right.

Your log has lots of "Read_from_descriptor: Connection reset by peer", meaning that it's trying to use an fd where the other end has dropped the connection. Can we see your read_from_descriptor function?

Since there are no new logins appearing in your log file between most of these messages, my early suspicion would be that the game is having trouble with early disconnects – someone connects, but doesn't log in or create a new character, and then manually disconnects before the game can time them out. Possibly they disconnect during char creation. More likely it's random port scans that only connect for a brief moment, or bots trying to probe a different protocol.
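
To make the early-disconnect case concrete, here is a small sketch of the rule a server has to follow: when read() reports end-of-file or a hard error such as ECONNRESET, the descriptor must be closed, otherwise it stays in the process's fd table. The names read_once and service_connection are hypothetical and this is not the mud's code, just the pattern in isolation.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

enum read_result { READ_OK, READ_NOTHING_YET, READ_PEER_GONE };

/* Classify the outcome of a single read() on a connection. */
enum read_result read_once(int fd, char *buf, size_t cap, ssize_t *got)
{
    ssize_t n;

    assert(buf != NULL && got != NULL && cap > 0);
    n = read(fd, buf, cap);
    *got = n;
    if (n > 0)
        return READ_OK;
    if (n == 0)
        return READ_PEER_GONE;      /* orderly close by the other end */
    if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
        return READ_NOTHING_YET;    /* no data right now, fd is still fine */
    return READ_PEER_GONE;          /* ECONNRESET, ETIMEDOUT, and so on */
}

/* Returns 1 if fd is still usable, 0 if the peer is gone and fd was closed. */
int service_connection(int fd)
{
    char buf[256];
    ssize_t got;

    if (read_once(fd, buf, sizeof buf, &got) == READ_PEER_GONE) {
        close(fd);                  /* without this the fd leaks */
        return 0;
    }
    return 1;
}
```

If the mud's own read path already does the equivalent, the leak would have to come from a path that never reaches the read at all.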
23 Oct, 2016, Hades_Kane wrote in the 31st comment:
Votes: 0
Ok, I ran valgrind for about an hour on the live port. On exit, this is a sample of what was popping up:

==3283== Open AF_INET socket 33: 66.221.0.71:4000 <-> 151.73.111.216:48936
==3283== at 0x4154A0C: accept (in /lib/libc-2.7.so)
==3283== by 0x80E9D58: init_descriptor (comm.c:1071)
==3283== by 0x80E9251: game_loop_unix (comm.c:798)
==3283== by 0x80E8F68: main (comm.c:469)
==3283==
==3283== Open AF_INET socket 32: 66.221.0.71:4000 <-> 151.73.111.216:48691
==3283== at 0x4154A0C: accept (in /lib/libc-2.7.so)
==3283== by 0x80E9D58: init_descriptor (comm.c:1071)
==3283== by 0x80E9251: game_loop_unix (comm.c:798)
==3283== by 0x80E8F68: main (comm.c:469)
==3283==
==3283== Open AF_INET socket 14: 66.221.0.71:4000 <-> 151.73.111.216:48442
==3283== at 0x4154A0C: accept (in /lib/libc-2.7.so)
==3283== by 0x80E9D58: init_descriptor (comm.c:1071)
==3283== by 0x80E9251: game_loop_unix (comm.c:798)
==3283== by 0x80E8F68: main (comm.c:469)
==3283==
==3283== Open AF_INET socket 29: 66.221.0.71:4000 <-> 80.179.17.18:37342
==3283== at 0x4154A0C: accept (in /lib/libc-2.7.so)
==3283== by 0x80E9D58: init_descriptor (comm.c:1071)
==3283== by 0x80E9251: game_loop_unix (comm.c:798)
==3283== by 0x80E8F68: main (comm.c:469)
==3283==
==3283== Open AF_INET socket 34: 66.221.0.71:4000 <-> 80.179.17.18:37216
==3283== at 0x4154A0C: accept (in /lib/libc-2.7.so)
==3283== by 0x80E9D58: init_descriptor (comm.c:1071)
==3283== by 0x80E9251: game_loop_unix (comm.c:798)
==3283== by 0x80E8F68: main (comm.c:469)
==3283==
==3283== Open AF_INET socket 30: 66.221.0.71:4000 <-> 80.179.17.18:37057
==3283== at 0x4154A0C: accept (in /lib/libc-2.7.so)
==3283== by 0x80E9D58: init_descriptor (comm.c:1071)
==3283== by 0x80E9251: game_loop_unix (comm.c:798)
==3283== by 0x80E8F68: main (comm.c:469)
==3283==
==3283== Open AF_INET socket 31: 66.221.0.71:4000 <-> 80.179.17.18:36943
==3283== at 0x4154A0C: accept (in /lib/libc-2.7.so)
==3283== by 0x80E9D58: init_descriptor (comm.c:1071)
==3283== by 0x80E9251: game_loop_unix (comm.c:798)
==3283== by 0x80E8F68: main (comm.c:469)
==3283==
==3283== Open AF_INET socket 21: 66.221.0.71:4000 <-> 80.179.17.18:36807
==3283== at 0x4154A0C: accept (in /lib/libc-2.7.so)
==3283== by 0x80E9D58: init_descriptor (comm.c:1071)
==3283== by 0x80E9251: game_loop_unix (comm.c:798)
==3283== by 0x80E8F68: main (comm.c:469)
==3283==
==3283== Open AF_INET socket 27: 66.221.0.71:4000 <-> 80.179.17.18:36668
==3283== at 0x4154A0C: accept (in /lib/libc-2.7.so)
==3283== by 0x80E9D58: init_descriptor (comm.c:1071)
==3283== by 0x80E9251: game_loop_unix (comm.c:798)
==3283== by 0x80E8F68: main (comm.c:469)
==3283==
==3283== Open AF_INET socket 26: 66.221.0.71:4000 <-> 80.179.17.18:36487
==3283== at 0x4154A0C: accept (in /lib/libc-2.7.so)
==3283== by 0x80E9D58: init_descriptor (comm.c:1071)
==3283== by 0x80E9251: game_loop_unix (comm.c:798)
==3283== by 0x80E8F68: main (comm.c:469)
==3283==
==3283== Open AF_INET socket 24: 66.221.0.71:4000 <-> 80.179.17.18:36367
==3283== at 0x4154A0C: accept (in /lib/libc-2.7.so)
==3283== by 0x80E9D58: init_descriptor (comm.c:1071)
==3283== by 0x80E9251: game_loop_unix (comm.c:798)
==3283== by 0x80E8F68: main (comm.c:469)
==3283==
==3283== Open AF_INET socket 18: 66.221.0.71:4000 <-> 80.179.17.18:36236
==3283== at 0x4154A0C: accept (in /lib/libc-2.7.so)
==3283== by 0x80E9D58: init_descriptor (comm.c:1071)
==3283== by 0x80E9251: game_loop_unix (comm.c:798)
==3283== by 0x80E8F68: main (comm.c:469)
==3283==
==3283== Open AF_INET socket 23: 66.221.0.71:4000 <-> 80.179.17.18:36061
==3283== at 0x4154A0C: accept (in /lib/libc-2.7.so)
==3283== by 0x80E9D58: init_descriptor (comm.c:1071)
==3283== by 0x80E9251: game_loop_unix (comm.c:798)
==3283== by 0x80E8F68: main (comm.c:469)
==3283==
==3283== Open AF_INET socket 20: 66.221.0.71:4000 <-> 86.35.243.176:36983
==3283== at 0x4154A0C: accept (in /lib/libc-2.7.so)
==3283== by 0x80E9D58: init_descriptor (comm.c:1071)
==3283== by 0x80E9251: game_loop_unix (comm.c:798)
==3283== by 0x80E8F68: main (comm.c:469)
==3283==
==3283== Open AF_INET socket 17: 66.221.0.71:4000 <-> 14.148.192.99:55641
==3283== at 0x4154A0C: accept (in /lib/libc-2.7.so)
==3283== by 0x80E9D58: init_descriptor (comm.c:1071)
==3283== by 0x80E9251: game_loop_unix (comm.c:798)
==3283== by 0x80E8F68: main (comm.c:469)
==3283==
==3283== Open AF_INET socket 19: 66.221.0.71:4000 <-> 179.220.240.163:39208
==3283== at 0x4154A0C: accept (in /lib/libc-2.7.so)
==3283== by 0x80E9D58: init_descriptor (comm.c:1071)
==3283== by 0x80E9251: game_loop_unix (comm.c:798)
==3283== by 0x80E8F68: main (comm.c:469)
==3283==
==3283== Open AF_INET socket 22: 66.221.0.71:4000 <-> 78.189.100.242:34470
==3283== at 0x4154A0C: accept (in /lib/libc-2.7.so)
==3283== by 0x80E9D58: init_descriptor (comm.c:1071)
==3283== by 0x80E9251: game_loop_unix (comm.c:798)
==3283== by 0x80E8F68: main (comm.c:469)
==3283==
==3283== Open AF_INET socket 25: 66.221.0.71:4000 <-> 78.189.100.242:34275
==3283== at 0x4154A0C: accept (in /lib/libc-2.7.so)
==3283== by 0x80E9D58: init_descriptor (comm.c:1071)
==3283== by 0x80E9251: game_loop_unix (comm.c:798)
==3283== by 0x80E8F68: main (comm.c:469)
==3283==
==3283== Open AF_INET socket 13: 66.221.0.71:4000 <-> 27.75.145.5:36917
==3283== at 0x4154A0C: accept (in /lib/libc-2.7.so)
==3283== by 0x80E9D58: init_descriptor (comm.c:1071)
==3283== by 0x80E9251: game_loop_unix (comm.c:798)
==3283== by 0x80E8F68: main (comm.c:469)
==3283==
==3283== Open AF_INET socket 12: 66.221.0.71:4000 <-> 179.220.240.163:35779
==3283== at 0x4154A0C: accept (in /lib/libc-2.7.so)
==3283== by 0x80E9D58: init_descriptor (comm.c:1071)
==3283== by 0x80E9251: game_loop_unix (comm.c:798)
==3283== by 0x80E8F68: main (comm.c:469)
==3283==
==3283== Open AF_INET socket 16: 66.221.0.71:4000 <-> 27.75.145.5:36601
==3283== at 0x4154A0C: accept (in /lib/libc-2.7.so)
==3283== by 0x80E9D58: init_descriptor (comm.c:1071)
==3283== by 0x80E9251: game_loop_unix (comm.c:798)
==3283== by 0x80E8F68: main (comm.c:469)
==3283==
==3283== Open AF_INET socket 15: 66.221.0.71:4000 <-> 27.75.145.5:36401
==3283== at 0x4154A0C: accept (in /lib/libc-2.7.so)
==3283== by 0x80E9D58: init_descriptor (comm.c:1071)
==3283== by 0x80E9251: game_loop_unix (comm.c:798)
==3283== by 0x80E8F68: main (comm.c:469)
==3283==
==3283== Open AF_INET socket 11: 66.221.0.71:4000 <-> 27.75.145.5:36291
==3283== at 0x4154A0C: accept (in /lib/libc-2.7.so)
==3283== by 0x80E9D58: init_descriptor (comm.c:1071)
==3283== by 0x80E9251: game_loop_unix (comm.c:798)
==3283== by 0x80E8F68: main (comm.c:469)
==3283==
==3283== Open AF_INET socket 10: 66.221.0.71:4000 <-> 27.75.145.5:36018
==3283== at 0x4154A0C: accept (in /lib/libc-2.7.so)
==3283== by 0x80E9D58: init_descriptor (comm.c:1071)
==3283== by 0x80E9251: game_loop_unix (comm.c:798)
==3283== by 0x80E8F68: main (comm.c:469)
==3283==
==3283== Open AF_INET socket 9: 66.221.0.71:4000 <-> 173.217.35.18:59644
==3283== at 0x4154A0C: accept (in /lib/libc-2.7.so)
==3283== by 0x80E9D58: init_descriptor (comm.c:1071)
==3283== by 0x80E9251: game_loop_unix (comm.c:798)
==3283== by 0x80E8F68: main (comm.c:469)


The referenced code is as follows, again with comments to the side to indicate the proper line:

This is at the end of the int main() function:
/*
* Run the game.
*/
#if defined(macintosh) || defined(MSDOS)
boot_db();
log_string( "Merc is ready to rock." );
game_loop_mac_msdos( );
#endif
pulsenum = 1;
#if defined(unix)

if (!fCopyOver)
control = init_socket( port );

boot_db();
sprintf( log_buf, "End of Time is now running on port %d.", port );
log_string( log_buf );

if (fCopyOver)
copyover_recover();

game_loop_unix( control ); // <----- comm.c:469
close (control);
#endif

/*
* That's all, folks.
*/
log_string( "Normal termination of game." );
exit( 0 );
return 0;
}



In function game_loop_unix()

#if defined(unix)
void game_loop_unix( int control )
{
static struct timeval null_time;
struct timeval last_time;

signal( SIGPIPE, SIG_IGN );
gettimeofday( &last_time, NULL );
current_time = (time_t) last_time.tv_sec;

// init_signals();

/* Main loop */
while ( !merc_down )
{
fd_set in_set;
fd_set out_set;
fd_set exc_set;
DESCRIPTOR_DATA *d;
int maxdesc;

#if defined(MALLOC_DEBUG)
if ( malloc_verify( ) != 1 )
abort( );
#endif

/*
* Poll all active descriptors.
*/
FD_ZERO( &in_set );
FD_ZERO( &out_set );
FD_ZERO( &exc_set );
FD_SET( control, &in_set );
maxdesc = control;
for ( d = descriptor_list; d; d = d->next )
{
maxdesc = UMAX( maxdesc, d->descriptor );
FD_SET( d->descriptor, &in_set );
FD_SET( d->descriptor, &out_set );
FD_SET( d->descriptor, &exc_set );
}

if ( select( maxdesc+1, &in_set, &out_set, &exc_set, &null_time ) < 0 )
{
perror( "Game_loop: select: poll" );
exit( 1 );
}

/*
* New connection?
*/
if ( FD_ISSET( control, &in_set ) )
init_descriptor( control ); // <----- comm.c:798

/*
* Kick out the freaky folks.
*/
for ( d = descriptor_list; d != NULL; d = d_next )
{
d_next = d->next;
if ( FD_ISSET( d->descriptor, &exc_set ) )
{
FD_CLR( d->descriptor, &in_set );
FD_CLR( d->descriptor, &out_set );
if ( d->character )
save_char_obj( d->character );
d->outtop = 0;
close_socket( d );
}
}

/*
* Process input.
*/
for ( d = descriptor_list; d != NULL; d = d_next )
{
d_next = d->next;
d->fcommand = FALSE;

if ( FD_ISSET( d->descriptor, &in_set ) )
{
if ( d->character != NULL )
d->character->timer = 0;
if ( !read_from_descriptor( d ) )
{
FD_CLR( d->descriptor, &out_set );
if ( d->character != NULL)
save_char_obj( d->character );
d->outtop = 0;
close_socket( d );
continue;
}
}



And in init_descriptor() function:

#if defined(unix)

void init_descriptor( int control )
{
char buf[MAX_STRING_LENGTH];
DESCRIPTOR_DATA *dnew;
struct sockaddr_in sock;
struct hostent *from;
int desc;
// size was unsigned (cygwin fix)
int size;

size = sizeof(sock);
getsockname( control, (struct sockaddr *) &sock, &size );
if ( ( desc = accept( control, (struct sockaddr *) &sock, &size) ) < 0 ) // <----- comm.c:1071
{
perror( "New_descriptor: accept" );
return;
}

#if !defined(FNDELAY)
#define FNDELAY O_NDELAY
#endif

if ( fcntl( desc, F_SETFL, FNDELAY ) == -1 )
{
perror( "New_descriptor: fcntl: FNDELAY" );
return;
}

/*
* Cons a new descriptor.
*/
dnew = new_descriptor();

dnew->descriptor = desc;
dnew->connected = CON_ANSI;
dnew->showstr_head = NULL;
dnew->showstr_point = NULL;
dnew->outsize = 2000;
dnew->pEdit = NULL; /* OLC */
dnew->pString = NULL; /* OLC */
dnew->editor = 0; /* OLC */
dnew->outbuf = alloc_mem( dnew->outsize );
dnew->pProtocol = ProtocolCreate();

size = sizeof(sock);
if ( getpeername( desc, (struct sockaddr *) &sock, &size ) < 0 )
{
perror( "New_descriptor: getpeername" );
dnew->host = str_dup( "(unknown)" );
}
else
{
/*
* Would be nice to use inet_ntoa here but it takes a struct arg,
* which ain't very compatible between gcc and system libraries.
*/
int addr;

addr = ntohl( sock.sin_addr.s_addr );
sprintf( buf, "%d.%d.%d.%d",
( addr >> 24 ) & 0xFF, ( addr >> 16 ) & 0xFF,
( addr >> 8 ) & 0xFF, ( addr ) & 0xFF
);
sprintf( log_buf, "Sock.sinaddr: %s", buf );
//log_string( log_buf );
from = gethostbyaddr( (char *) &sock.sin_addr,
sizeof(sock.sin_addr), AF_INET );
dnew->host = str_dup( from ? from->h_name : buf );
}
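
One detail visible in the listing above: if the fcntl() call fails, the function returns without closing desc, so that descriptor would stay open with nothing tracking it. That alone is unlikely to explain hundreds of sockets, but any early return between accept() and the point where the descriptor joins descriptor_list has the same effect. A sketch of a helper that closes on failure, with the hypothetical name set_nonblocking_or_close:

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Put fd into non-blocking mode. On failure, close it so it cannot leak.
 * Returns 0 on success, -1 if the fd could not be configured. */
int set_nonblocking_or_close(int fd)
{
    int flags;

    assert(fd >= 0);
    flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1) {
        perror("set_nonblocking_or_close: fcntl");
        close(fd);      /* the caller must not use fd after a -1 return */
        return -1;
    }
    return 0;
}
```

The rest of init_descriptor is not shown in the post, so whether later checks (site bans, for example) close desc before returning cannot be judged from here.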
23 Oct, 2016, Hades_Kane wrote in the 32nd comment:
Votes: 0
Here is the entire read_from_descriptor function:

bool read_from_descriptor( DESCRIPTOR_DATA *d )
{
int iStart;

static char read_buf[MAX_PROTOCOL_BUFFER];
read_buf[0] = '\0';

/* Hold horses if pending command already. */
if ( d->incomm[0] != '\0' )
return TRUE;

/* Check for overflow. */
//iStart = strlen(d->inbuf);
//if ( iStart >= sizeof(d->inbuf) - 10 )
iStart = 0;
if ( strlen(d->inbuf) >= sizeof(d->inbuf) - 10 )
{
sprintf( log_buf, "%s input overflow!", d->host );
log_string( log_buf );
write_to_descriptor( d->descriptor,
"\r\n*** PUT A LID ON IT!!! ***\r\n", 0 );
return FALSE;
}

/* Snarf input. */
#if defined(macintosh)
for ( ; ; )
{
int c;
c = getc( stdin );
if ( c == '\0' || c == EOF )
break;
putc( c, stdout );
if ( c == '\r' )
putc( '\n', stdout );
//d->inbuf[iStart++] = c;
read_buf[iStart++] = c;
if ( iStart > sizeof(d->inbuf) - 10 )
break;
}
#endif

#if defined(MSDOS) || defined(unix)
for ( ; ; )
{
int nRead;

/*nRead = read( d->descriptor, d->inbuf + iStart,
sizeof(d->inbuf) - 10 - iStart );
if ( nRead > 0 )
{
iStart += nRead;
if ( d->inbuf[iStart-1] == '\n' || d->inbuf[iStart-1] == '\r' )
break;
}*/
nRead = read( d->descriptor, read_buf + iStart,
sizeof(read_buf) - 10 - iStart );
if ( nRead > 0 )
{
iStart += nRead;
if ( read_buf[iStart-1] == '\n' || read_buf[iStart-1] == '\r' )
break;
}
else if ( nRead == 0 )
{
//log_string( "EOF encountered on read." );
return FALSE;
}
else if ( errno == EWOULDBLOCK )
break;
else
{
perror( "Read_from_descriptor" );
return FALSE;
}
}
#endif

//d->inbuf[iStart] = '\0';
read_buf[iStart] = '\0';
ProtocolInput( d, read_buf, iStart, d->inbuf );
return TRUE;
}


As far as some of the other things you mentioned, there are around 8 or so normal descriptors, many of which are exactly the things you mentioned, and then a few misc ones specific to the game.

I suspect the connections that are piling up are the port scans or bots like you mentioned. With as quick as these things pile up, there's NO WAY it's actual human beings connecting with that frequency.
23 Oct, 2016, Pymeus wrote in the 33rd comment:
Votes: 0
I'm not going to say that I've read through and understood every line of the pasted code, but I did look it all over, and diff'd it against the relevant sections of stock ROM (since I had it handy and this looks like a close relative) and am doubtful that the problem is in what's here.

At this point, I would try to reproduce it in a controlled environment (i.e., your dev machine or dev port). If you can run it in a way that the outside world (bots, scanners, over-curious players) can't even connect to it in the first place, but you can still connect, then you have that many fewer variables to contend with.

Connect a few times and manually disconnect (i.e., without sending "quit" to the mud) at various phases of the initial login screen (at the username prompt, at the password prompt, any custom prompts, menus, motds, etc). Check the number of fds from a terminal as you go. Also try a couple where you fully log in, to confirm that the problem is only happening before login – so far all I have is a theory after all. If you succeed at reproducing the problem, the immediate goal should be to feel out the scope of the problem – when does it happen, when does it not happen. That will, hopefully, tell us where to look next.
24 Oct, 2016, quixadhal wrote in the 34th comment:
Votes: 0
In your game loop:

You have the check for processing input in a loop, inline with checking the descriptors. In my own code (DikuMUD alfa), this is split out… I don't remember if we did that, or if it came that way. :)

What makes it stand out to me is that sometimes things can get confusing when you're looping over a list which gets modified inside the loop. If close_socket() frees the descriptor on the way out, it might be confusing the loop logic.

Here's my corresponding code:
/*
* New connection?
*/
if (FD_ISSET(s, &input_set))
if (new_descriptor(s) < 0)
log_info("New connection");

/*
* kick out the freaky folks
*/
for (point = descriptor_list; point; point = next_point) {
next_point = point->next;
if (FD_ISSET(point->descriptor, &exc_set)) {
FD_CLR(point->descriptor, &input_set);
FD_CLR(point->descriptor, &output_set);
close_socket(point);
}
}

for (point = descriptor_list; point; point = next_point) {
next_point = point->next;
if (FD_ISSET(point->descriptor, &input_set))
if (process_input(point) < 0)
close_socket(point);
}

/*
* process_commands;
*/
for (point = descriptor_list; point; point = next_to_process) {
next_to_process = point->next;
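
The concern about a list being modified while it is looped over can be sidestepped with a deferred-removal pattern: the loops only mark a connection as dead, and one pass at the end unlinks and frees the marked nodes. This is a different technique from the saved next pointer used in both listings in this thread, and the struct and function names below are made up for the sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, stripped-down connection record. */
struct conn {
    int fd;
    int dead;            /* set by the loops instead of freeing in place */
    struct conn *next;
};

/* Push a new node on the front of the list; returns the new head,
 * or the old head unchanged if allocation fails. */
struct conn *push_conn(struct conn *head, int fd, int dead)
{
    struct conn *c = malloc(sizeof *c);

    if (c == NULL)
        return head;
    c->fd = fd;
    c->dead = dead;
    c->next = head;
    return c;
}

/* Unlink and free every node marked dead; returns how many were removed.
 * A real server would also close(cur->fd) before freeing. */
int reap_dead(struct conn **head)
{
    struct conn **link = head;
    int removed = 0;

    assert(head != NULL);
    while (*link != NULL) {
        struct conn *cur = *link;

        if (cur->dead) {
            *link = cur->next;   /* unlink first, then free */
            free(cur);
            removed++;
        } else {
            link = &cur->next;
        }
    }
    return removed;
}
```

Because nothing is freed while the input and output loops run, none of those loops can be left holding a pointer to a freed descriptor.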
24 Oct, 2016, Hades_Kane wrote in the 35th comment:
Votes: 0
Thanks for the responses and help!

Pymeus -

I've tried to reproduce it on the code port, and the only "luck" I've had is reconnecting a linkdead player doesn't "reuse" or free the previous socket/descriptor it occupied. But, we are hitting that 1024 limit about every 4-5 days, and while I'm sure that's sloppy socket code in there somewhere with it not freeing or reusing the socket, we don't have that amount of traffic for that to be the cause of it. Maybe a very, very small contribution.

I have noticed that when I just straight up connect to the MUD and let that connection sit, it occupies a socket/descriptor… I wonder if maybe I'm getting hit with bots/crawlers or whatever that are reaching the connection without actually disconnecting from it? I don't really know how that works, so I could be way off, just throwing out ideas here.

But best I can tell, I can't seem to reproduce the issue to any meaningful capacity, so I feel like it has to be coming from somewhere else. None of the other weird things, like the read_from_descriptor messages in the logs, appear on the code port, either.

FWIW, our wiznet has a "connections" option that displays to any connected imms any incoming connections that make it past our color/screen reader prompt that initially displays. If it may seem helpful, I can try moving that to where it displays (or logs) the connection info when a new connection first hits that prompt instead of past it and see if there is a correlation between incoming connections from addresses I don't recognize (and that don't fully connect) and how high this stuff is climbing.



Quix -

When I get a chance today, I'm going to try to get my code there more closely aligned with what you have and see if that makes any difference.

Thanks!
24 Oct, 2016, Hades_Kane wrote in the 36th comment:
Votes: 0
Ok, I enabled the wiznet thing for immediately incoming connections.

I'm getting these pretty much constant… well, in "pulses".

Incoming connection from 176.196.90.21.

Incoming connection from static.vnpt.vn.

Incoming connection from pool-108-21-59-2.nycmny.fios.verizon.net

Incoming connection from 191.111.176.139.

Incoming connection from 86.47.52.255.

Incoming connection from 171.250.27.134.


Sometimes one will hit 2-3 times in a second, and then I won't see it again for like 10 seconds. Some almost seem like they are piggybacking. The longer I let it run, the more of them pop up. At first it was just 2 of them, then another joined the rotation, and a little later another joined in and kept going.

In the 10 or so minutes since I enabled the connection thing and did a copyover, my FDs have gone from 20 to about 60, and are still climbing. Sometimes after a pulse of the connections the count won't budge, other times it'll pop up one, then back down, but it's a constant trend upward.

I feel confident this is where the issue is originating, now to figure out what is broken on my code's end that is causing it to actually be an issue.

Any help/advice/thoughts on why I'm getting hit with these connections would be great, too.

Still going to mess around w/ what Quix posted and see if that makes any difference.

"localhost" has joined the party since I started typing this post.
24 Oct, 2016, Hades_Kane wrote in the 37th comment:
Votes: 0
But looking at the code, I have no reason to think that these connections are getting any further than init_descriptor, because it takes input (answering the initial prompt) to get any further. By default, my wiznet connections message (which I have now moved into init_desc, before the prompt) fires immediately after the prompt is answered, and with it there I wasn't seeing any of this connection information.
25 Oct, 2016, Hades_Kane wrote in the 38th comment:
Votes: 0
Quix's code change suggestion -seems- to have fixed it.

I was up to around 29 descriptors after my last copyover, and while it "spiked" to 35, it's hovering around 32 right now and so far it seems to be holding steady.

————————–

Ok, since I first typed that, I played around more with a test character losing link and coming back in. It notched it up one, but now it seems to be increasing again… but MUCH MUCH slower.

By this point before, I was up to 60, and right now I'm at 40.

So I'm not entirely sure what, if any, effect the code change had.

—————————

Lol, ok, they have fallen slightly since that (now around 37). I think I'm going to leave it as-is for a couple of hours, and I'm going to come back and check and see what they look like… I'll update then.
25 Oct, 2016, quixadhal wrote in the 39th comment:
Votes: 0
It sounds to me like when you drop a socket, you aren't fully clearing things out.

Things I can think of…. SO_LINGER keeps a socket open until pending data has been flushed, allowing you to read/write what's there despite the other end going away. This is useful, but means if you do tiny reads you may not close the socket until you've exhausted the buffer.

Make sure you are not only closing the socket, but using FD_CLR() on the descriptor, so that select isn't still trying to poll it.
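
As a rough sketch of that (drop_fd is a hypothetical helper, not a function from ROM or Diku, and the three sets are assumed to be the ones handed to select()):

```c
#include <sys/select.h>
#include <unistd.h>

/* Clear fd from all three select() sets, then close it.
 * Returns the result of close(), or -1 if fd is out of range for fd_set. */
int drop_fd(int fd, fd_set *in_set, fd_set *out_set, fd_set *exc_set)
{
    if (fd < 0 || fd >= FD_SETSIZE)
        return -1;

    FD_CLR(fd, in_set);
    FD_CLR(fd, out_set);
    FD_CLR(fd, exc_set);

    return close(fd);
}
```

Doing the clear and the close in one place means a later FD_ISSET() check in the same pass can't act on a descriptor that is already gone.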

If you check the file I linked, further up I set some signal handlers, so I can ignore things like SIGPIPE and friends. Not sure if that would help, but it can sidestep some of the issues with trying to write to a socket that just closed.

I assume the file descriptors themselves are actually staying around, and it's not just ghost data from improperly freed (or recycled?) data structures? The ROM folks made the horrible decision to try and reuse things instead of freeing them, because on their specific hardware and OS, malloc() and free() were very slow.

If you really are just getting a flood of attempted (and aborted) connections…. any chance you're running on port 23, or perhaps a port that BitTorrent uses by default? Maybe you're getting hit by rootkits, or buggy torrent software?
25 Oct, 2016, Hades_Kane wrote in the 40th comment:
Votes: 0
It's over 150 now, so it's still stacking up.

I'll take a closer look at the other stuff you mentioned (probably won't be until tomorrow night, unfortunately).

Best I can tell, the FDs are sticking around? I'm not really sure how to check that, exactly.

I'm running it on port 4000. Once upon a time, I had it set up where it would hit port 23 and forward, but somewhere along the line that stopped working and I feel pretty certain I removed all of that, but it would probably be a good idea to double check.