15 Dec, 2008, Zeno wrote in the 1st comment:
Votes: 0
So I get stuff like this somewhat often:
Log: [*****] LAG: large snake: mpmload 1010 (R:1026 S:2.372001)

On my VPS. Now if I send these logs to the support center, they will probably blame my code.

What can I do in the shell to show this "lag"?
15 Dec, 2008, Guest wrote in the 2nd comment:
Votes: 0
Do you have any idea what the CPU and memory load are when those lag spikes happen? That sort of thing typically only happens when the system is bogged down.
15 Dec, 2008, Zeno wrote in the 3rd comment:
Votes: 0
Is there a way to print that data in that log (using C)? I don't know what the mem/CPU was when it happened.

Right now Memory is 203MB out of 512MB.
CPU is: load average: 0.06, 0.01, 0.00
15 Dec, 2008, Guest wrote in the 4th comment:
Votes: 0
I'm sure there must be a proper way to do it, but in my case what I'd end up doing is more hackish. Using the Command Shell snippet for Smaug, you could have it simply send the "w" command to the system and write the results to the log. If a solution like this isn't desirable, you could just wait for the next lag notice, log in, run 'top' and see what's eating cycles.
15 Dec, 2008, Zeno wrote in the 5th comment:
Votes: 0
Log: [*****] LAG: Player: kill snake (R:1026 S:1.509467)

Within 10sec I typed w and then top:

top - 21:26:06 up 136 days,  2:48,  1 user,  load average: 0.07, 0.02, 0.00
Tasks: 42 total, 1 running, 41 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.1%us, 0.0%sy, 0.1%ni, 99.3%id, 0.5%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 524288k total, 222392k used, 301896k free, 0k buffers
Swap: 0k total, 0k used, 0k free, 0k cached

21:26:05 up 136 days,  2:47,  1 user,  load average: 0.07, 0.02, 0.00
root pts/0 cpe-xx-xx-xx-xxx 21:18 0.00s 0.01s 0.00s w

Looks fine..?
15 Dec, 2008, quixadhal wrote in the 6th comment:
Votes: 0
One way you can tell is to keep time info at the top of each trip through the main loop. You know how often a "TICK" should happen, and the game is supposed to sleep for the leftover time to make those happen on a regular basis. If you ever miss a tick, or fall below some threshold that makes you think you're close to not having any sleep time left, that'd be the spot to issue warnings.

That won't tell you what did it, but it'll tell you it's happening. To find the culprit, you might have to compile with profiling enabled and then use gprof to examine a log file where it happens.
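
A minimal sketch of that check in C, for illustration only. The names `TICK_NSEC`, `WARN_NSEC`, `elapsed_nsec` and `tick_sleep_budget` are made up here and are not from any particular codebase, and the 0.25 s tick length is an assumption; substitute whatever your main loop actually uses.

```c
#include <stdio.h>
#include <time.h>

#define TICK_NSEC 250000000LL  /* assumed tick length: 0.25 s */
#define WARN_NSEC  25000000LL  /* warn when under 10% of the tick is left */

/* Nanoseconds elapsed between two readings of the same clock. */
static long long elapsed_nsec(const struct timespec *start,
                              const struct timespec *end)
{
    return (long long)(end->tv_sec - start->tv_sec) * 1000000000LL
         + (long long)(end->tv_nsec - start->tv_nsec);
}

/* Returns how many nanoseconds are left to sleep in this tick (never
 * negative) and logs a warning when the budget is nearly or fully used. */
static long long tick_sleep_budget(const struct timespec *start,
                                   const struct timespec *end)
{
    long long used = elapsed_nsec(start, end);
    long long left = TICK_NSEC - used;

    if (left < 0) {
        fprintf(stderr, "TICK OVERRUN: work took %.6f s\n", used / 1e9);
        return 0;
    }
    if (left < WARN_NSEC)
        fprintf(stderr, "TICK WARNING: only %.6f s of sleep left\n",
                left / 1e9);
    return left;
}
```

The idea would be to call `clock_gettime(CLOCK_MONOTONIC, ...)` at the top of the loop and again after the work is done, pass both readings to `tick_sleep_budget`, and sleep for whatever it returns. A monotonic clock is used so that wall-clock adjustments do not show up as false overruns.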
15 Dec, 2008, Zeno wrote in the 7th comment:
Votes: 0
Well it's the server I know for sure. I can take this code and put it on my own server and it doesn't have this issue.

Trying to prove to the support team it's on their end, not solve something in my code.
15 Dec, 2008, The_Fury wrote in the 8th comment:
Votes: 0
I also get those random lag spikes on my VPS every now and again, on seemingly random commands that really don't load up anything processor-wise. I suspect that it is caused by too much load on the node at the times this happens. I know each VPS is meant to be insulated from resource hogs, but I suspect that it's quite possible for all the VPSes on a single node to hammer the CPU and disk at the same time, which might cause these spikes. Personally I would not worry too much about them unless this is happening more than once in a while.
15 Dec, 2008, Zeno wrote in the 9th comment:
Votes: 0
Happens multiple times a day, I don't want to see it happen ever.
15 Dec, 2008, David Haley wrote in the 10th comment:
Votes: 0
First off this is not "lag", this is CPU or other resource usage. If you try to describe it to them as "lag" you'll probably just confuse them and they might point you to network issues or something.

The best thing I can suggest is to just try talking to them. Tell them that the function call in question usually takes 'x' seconds – get some empirical numbers. Tell them what the warning means (i.e. how long has to pass for the alarm function to get triggered). Ask why there are such sudden CPU spikes, and explain that this is causing trouble for you because your function that should be fast is taking much longer.

I'd only go to more data-collecting effort if they actually blow you off.
15 Dec, 2008, Zeno wrote in the 11th comment:
Votes: 0
Hm, data collection. What about writing a simple C program to do this and produce these same issues? Not that I know what to do offhand, but I assume it would help them understand that it's not an issue in the mass of code running.
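
One possible shape for such a test program, sketched under assumptions: it times a fixed chunk of pure CPU work on both the wall clock and the process CPU clock. If the wall time is far larger than the CPU time, the process was waiting to be scheduled rather than doing work, which points away from the code itself. The function names (`now_sec`, `busy_work`, `measure_stall`) are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime() */
#include <stdio.h>
#include <time.h>

/* Current reading of the given clock, in seconds. */
static double now_sec(clockid_t clock)
{
    struct timespec ts;
    clock_gettime(clock, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* A fixed amount of pure CPU work; volatile keeps the loop from being
 * optimized away. */
static void busy_work(unsigned long iterations)
{
    volatile unsigned long sink = 0;
    for (unsigned long i = 0; i < iterations; i++)
        sink += i;
}

/* Runs one round of work, stores the wall-clock and CPU time it took,
 * and returns the difference (time spent not running). */
static double measure_stall(unsigned long iterations,
                            double *wall_out, double *cpu_out)
{
    double wall0 = now_sec(CLOCK_MONOTONIC);
    double cpu0  = now_sec(CLOCK_PROCESS_CPUTIME_ID);

    busy_work(iterations);

    *cpu_out  = now_sec(CLOCK_PROCESS_CPUTIME_ID) - cpu0;
    *wall_out = now_sec(CLOCK_MONOTONIC) - wall0;
    return *wall_out - *cpu_out;
}
```

Run in a loop with a timestamped `printf` whenever the returned gap exceeds some threshold (say 0.5 s, an arbitrary choice), this would give a log of stalls from a program small enough that the MUD code cannot be blamed.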
15 Dec, 2008, Scandum wrote in the 12th comment:
Votes: 0
That's typical behavior for a VPS and as far as I know there's nothing that can be done about it.
15 Dec, 2008, David Haley wrote in the 13th comment:
Votes: 0
How long do these spikes last? If they last long enough, you can write a perl script that takes the output from, say, 'top' every x seconds and then look at that. There are also plenty of programs that grab CPU usage and write it to a file – you could then graph it and show that there's a problem with the whole system, not just one program. To be really accurate, you'd have to show the spike when your program isn't running…

Remember that if you have a virtualized server, it is entirely possible that there is nothing you can do to see where the usage is, because it could be in the physical machine underneath and therefore completely invisible to you as far as resource consumption goes.

I'd still recommend just asking first.
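
If you would rather sample from C than from a perl script, here is a rough sketch that reads the aggregate `cpu` line of `/proc/stat` on Linux and computes the share of "steal" time between two samples. The struct and function names are made up for illustration. As noted above, some virtualization setups hide host-side usage entirely, so a steal figure of zero does not prove the host is idle.

```c
#include <stdio.h>
#include <string.h>

struct cpu_sample {
    unsigned long long total;  /* sum of the first eight counters */
    unsigned long long steal;  /* eighth counter: time taken by the host */
};

/* Parses the aggregate "cpu ..." line. Returns 0 on success, -1 otherwise. */
static int parse_cpu_line(const char *line, struct cpu_sample *out)
{
    unsigned long long v[8];

    /* Reject per-core lines such as "cpu0 ..." and unrelated lines. */
    if (strncmp(line, "cpu ", 4) != 0)
        return -1;
    if (sscanf(line, "cpu %llu %llu %llu %llu %llu %llu %llu %llu",
               &v[0], &v[1], &v[2], &v[3],
               &v[4], &v[5], &v[6], &v[7]) != 8)
        return -1;

    out->total = 0;
    for (int i = 0; i < 8; i++)
        out->total += v[i];
    out->steal = v[7];
    return 0;
}

/* Reads the first line of /proc/stat into a sample. */
static int read_cpu_sample(struct cpu_sample *out)
{
    char line[256];
    FILE *fp = fopen("/proc/stat", "r");
    if (fp == NULL)
        return -1;
    int rc = fgets(line, sizeof line, fp) ? parse_cpu_line(line, out) : -1;
    fclose(fp);
    return rc;
}

/* Percentage of CPU time stolen between an older and a newer sample. */
static double steal_percent(const struct cpu_sample *older,
                            const struct cpu_sample *newer)
{
    unsigned long long dt = newer->total - older->total;
    if (dt == 0)
        return 0.0;
    return 100.0 * (double)(newer->steal - older->steal) / (double)dt;
}
```

Calling `read_cpu_sample` every few seconds and writing `steal_percent` to a file with a timestamp would give something you can graph and line up against the LAG entries in the MUD log.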
15 Dec, 2008, Zeno wrote in the 14th comment:
Votes: 0
When this happens, the commands freeze for a few seconds. Long enough to cause deaths on a MUD.

I tried asking once, showing this log output. They were a dick about it and tried claiming MUDs violated the ToS (when the ToS really just said CS servers etc).
15 Dec, 2008, David Haley wrote in the 15th comment:
Votes: 0
So the MUD keeps running and processing ticks even though commands are not being processed? What exactly is going on?

Showing log output isn't enough: I think you need to very accurately describe what the problem is and what symptoms you see.
15 Dec, 2008, Zeno wrote in the 16th comment:
Votes: 0
For the player who experiences this (shown by the log), the MUD basically seems like it freezes.

look self
(7 seconds pass)
Log: [*****] LAG: Zeno: look self (R:1026 S:2.372001)
You see nothing special.
15 Dec, 2008, David Haley wrote in the 17th comment:
Votes: 0
What is the rest of the MUD doing during this period?

The 'lag' message usually means that the whole process is halted. But you said that people are dying during this period…
15 Dec, 2008, Zeno wrote in the 18th comment:
Votes: 0
I don't know for sure. When I see the lag message, that means the lag period is already over. So I can't type who or anything to see if it goes through.

When I get home, I'll spam commands until I hit that lag.
15 Dec, 2008, David Haley wrote in the 19th comment:
Votes: 0
You could always edit the 'lag' function (I think it's called alarm something or other) to dump the who list and recent commands to disk to see what was going on, and maybe even dump resource usage output or something like that.
15 Dec, 2008, Zeno wrote in the 20th comment:
Votes: 0
Yeah, that's what I wondered about on page 1.

Should I be using system() for this?
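
One option that avoids `system()` altogether, sketched here as an assumption rather than a recommendation from anyone in the thread: `getloadavg()` and `getrusage()` return the load average and the process's own resource counters directly, so the lag handler can format them into the same log line without spawning a shell. The function name `format_lag_snapshot` and the output layout are made up for illustration.

```c
#define _DEFAULT_SOURCE 1  /* exposes getloadavg() on glibc */
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/time.h>

/* Writes a one-line resource snapshot into buf. Returns the value of
 * snprintf() on success, or -1 if getrusage() fails. */
static int format_lag_snapshot(char *buf, size_t size)
{
    double load[3] = {0.0, 0.0, 0.0};
    struct rusage ru;

    /* On failure the load values simply stay at zero. */
    (void)getloadavg(load, 3);

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;

    /* ru_maxrss is in kilobytes on Linux; other systems may differ. */
    return snprintf(buf, size,
                    "load %.2f %.2f %.2f | user %ld.%06lds sys %ld.%06lds"
                    " | maxrss %ld | ctxsw vol %ld invol %ld",
                    load[0], load[1], load[2],
                    (long)ru.ru_utime.tv_sec, (long)ru.ru_utime.tv_usec,
                    (long)ru.ru_stime.tv_sec, (long)ru.ru_stime.tv_usec,
                    ru.ru_maxrss, ru.ru_nvcsw, ru.ru_nivcsw);
}
```

The resulting string could be appended to the existing LAG log entry. A jump in involuntary context switches between two lag entries, with little added user or system time, would suggest the process was being preempted rather than doing extra work.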