01 Dec, 2008, Silenus wrote in the 1st comment:
Votes: 0
Hi,

After a brief hiatus I have gone back to working on my mud server. One issue that arises is how to prevent various run-time problems related to over-utilization of resources such as the CPU, memory, network ports, etc. by various entities in the system. I am not quite sure how best to do this, so I am asking for advice. The cases I have enumerated (not exhaustive) are as follows:

1) Halting-type situations (infinite loops) need to be trapped by the virtual machine somehow.
2) Stack overflows (due to deep recursion).
3) Heap exhaustion (due to over-allocation of data, or spawning a lot of auxiliary objects).
4) DOS-type attacks on ports.

Now there are various ways of tackling some of these problems. For number 1), the LPMud server MudOS keeps a running tally of an instruction-cost metric and branches out of an infinite loop when the tally hits a certain user-determined threshold. This seems a bit expensive to me. The alternative I can think of is to add traps around existing branch statements (control flow) which update the counters less often but do the same thing, and perhaps to do this only for backward-pointing branches, one per cycle of the control-flow graph.

For 2), this one seems like you would just need some sort of counter on call chains. Maybe there are better ways…

3) Somehow have some limit on resource usage per xxx/yyy, after which a given thread of execution or process cannot allocate further objects/data without clearing some existing data.

4) No idea; looking for advice here.

Are there better methods for tackling these sorts of problems? I will confess I am a bit of a newbie when it comes to system-related issues, and so far I am thinking only in purely computational terms.

Thanks in advance,

Silenus.
01 Dec, 2008, David Haley wrote in the 2nd comment:
Votes: 0
Infinite loops: an instruction count really isn't that bad; certainly not when compared to adding traps around existing branch statements. The count doesn't have to be tested on every single instruction; a granularity in the thousands should suffice, for example. Nor does it have to be very precise; it is enough for it to trigger "every once in a while" as long as it is guaranteed to eventually be triggered.
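To make the granularity idea concrete, here is a rough sketch (a toy two-opcode interpreter invented purely for illustration; no real VM's design is implied):

```python
class BudgetExceeded(Exception):
    pass

def run(program, max_instructions=1_000_000, check_every=4096):
    """Execute a list of (op, arg) pairs, aborting once the budget is spent.

    The budget comparison only fires every `check_every` instructions, so
    most of the time the per-instruction overhead is two integer updates.
    """
    executed = 0
    until_check = check_every
    acc = 0
    pc = 0
    while pc < len(program):
        executed += 1
        until_check -= 1
        if until_check == 0:                 # the coarse-grained trap
            until_check = check_every
            if executed >= max_instructions:
                raise BudgetExceeded("aborted after %d instructions" % executed)
        op, arg = program[pc]
        if op == "add":
            acc += arg
            pc += 1
        elif op == "jmp":
            pc = arg                         # unconditional branch
        else:
            raise ValueError("unknown opcode: %r" % op)
    return acc
```

Note that the check is imprecise by design: an infinite loop is caught within `check_every` instructions of the budget running out, which is good enough.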

Stack overflow: sure, just have a limit on the depth of recursion, like 100, and throw an exception if that limit is passed. You might have to tweak it depending on how deeply you recurse. You might want to implement tail recursion if you like writing recursive algorithms. (Depth of recursion would then be the number of stack frames, not the number of function calls per se.)
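A minimal sketch of such a depth counter, assuming a hypothetical recursive evaluator over a toy expression tree (the node shapes and the limit of 100 are just illustrative):

```python
class RecursionLimit(Exception):
    pass

MAX_DEPTH = 100   # tune to the deepest legitimate call chain you expect

def evaluate(node, depth=0):
    """Evaluate a toy tree of ("lit", n) and ("add", left, right) nodes,
    threading an explicit depth counter through every recursive call."""
    if depth > MAX_DEPTH:
        raise RecursionLimit("call chain exceeded %d frames" % MAX_DEPTH)
    tag = node[0]
    if tag == "lit":
        return node[1]
    if tag == "add":
        return evaluate(node[1], depth + 1) + evaluate(node[2], depth + 1)
    raise ValueError("unknown node type: %r" % tag)
```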

Heap exhaustion: trigger the garbage collection when you run out of resources. Not a whole lot you can do if you need more resources to do something and don't have any available. Trying to allocate memory and failing is a "Very Bad" situation, so if you prevent a thread from allocating memory you're basically killing it entirely.
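One way to sketch the per-context cap (all names here are hypothetical; a real VM would hook this into its allocator, and `collect` would run an actual garbage collector rather than this stub):

```python
class OutOfQuota(Exception):
    pass

class ScriptContext:
    """Tracks how many bytes one script context has allocated against a cap."""

    def __init__(self, limit_bytes):
        self.limit = limit_bytes
        self.used = 0
        self.objects = {}          # live allocations, name -> size in bytes

    def allocate(self, name, nbytes):
        if self.used + nbytes > self.limit:
            self.collect()         # last chance: reclaim garbage first
        if self.used + nbytes > self.limit:
            raise OutOfQuota("script exceeded its %d-byte quota" % self.limit)
        self.objects[name] = nbytes
        self.used += nbytes

    def free(self, name):
        self.used -= self.objects.pop(name)

    def collect(self):
        pass   # a real VM would run its GC here; this stub reclaims nothing
```

The point is that only the offending context hits `OutOfQuota`; the engine around it keeps running.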

DOS attacks: I wouldn't do anything at the language level. First off, make sure that your connection handling is as light-weight as possible until you determine that the person on the other end is a legitimate person; then devote more resources. You could have some kind of automated blocking where an address is denied if more than X (5?) failed attempts are made within some (rather short) period of time (30 seconds?). This is something that many ssh servers do, for example.
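That blocking scheme could be sketched like this (the thresholds are the example values from above; `LoginGuard` and the injectable clock are made up for illustration):

```python
import time
from collections import defaultdict, deque

class LoginGuard:
    """Blocks an address once it accumulates too many recent failures."""

    def __init__(self, max_failures=5, window=30.0, now=time.monotonic):
        self.max_failures = max_failures
        self.window = window               # seconds
        self.now = now                     # injectable clock, eases testing
        self.failures = defaultdict(deque) # address -> failure timestamps

    def record_failure(self, address):
        self.failures[address].append(self.now())

    def is_blocked(self, address):
        q = self.failures[address]
        cutoff = self.now() - self.window
        while q and q[0] < cutoff:         # drop failures outside the window
            q.popleft()
        return len(q) >= self.max_failures
```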
01 Dec, 2008, Silenus wrote in the 3rd comment:
Votes: 0
Thanks for the input. It might have been a bit unclear from my initial post, but my concern is w.r.t. malicious code, which would have reduced privileges under some security scheme. In a sense the first two (infinite loops and stack overflows) could, in a single-threaded situation, perhaps be considered global, because only one thing executes at a time. With the third one I am concerned about one entity spawning a lot of processes or generating a lot of data, making it difficult for the other entities to do their tasks.

I am not quite sure I understand you when it comes to infinite loops, but perhaps there is something I don't quite grasp. My initial thought was that introducing additional branches which halt execution at or around existing branch points, using some sort of counter, would be cheaper and safer than a coarser-granularity test, since it would be difficult to predict where the branches are going (unless there is some method, for certain cases, to prove that certain loops are guaranteed to terminate).
01 Dec, 2008, David Haley wrote in the 4th comment:
Votes: 0
Ah… if it's w.r.t. malicious code, that changes things a bit (well, not tremendously, but it does contextualize the issue).

Infinite loops: what I've seen people do to guard against runaway code is to only allow the script to run for X instructions before interrupting it. Before entering untrusted code, the engine sets up a hook that is caught after X instructions are executed. In this context you don't even think about branches: you just count instructions, on the assumption that if more than X (e.g. 1 million?) instructions are executed, chances are that something went wrong. Loops can happen in all sorts of strange places, and it is very hard (in fact, theoretically impossible) to determine if a loop is guaranteed to halt. So the "dumb" approach of instruction counting is usually a better idea.
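For illustration, CPython's tracing facility can approximate this hook idea. `sys.settrace` fires per line rather than per VM instruction, so the granularity is coarser, but the shape is the same: install the hook, run the untrusted code, abort once the budget is spent. (This is a sketch of the mechanism, not a hardened sandbox.)

```python
import sys

class Interrupted(Exception):
    pass

def run_limited(untrusted_fn, max_events=100_000):
    """Run a callable, aborting it after a fixed number of trace events."""
    count = 0

    def tracer(frame, event, arg):
        nonlocal count
        count += 1
        if count > max_events:
            # raising here propagates out of the traced code
            raise Interrupted("budget of %d trace events exhausted" % max_events)
        return tracer   # keep receiving line events in this frame

    sys.settrace(tracer)
    try:
        return untrusted_fn()
    finally:
        sys.settrace(None)   # always remove the hook
```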

For the entity hogging resources, it sounds like you can just put caps on the memory that these entities can allocate, and kill one off if it tries to do too much. Since these are presumably not core entities, it should be OK to kill the offender and then tell the author to fix their code.

In all of this I am assuming that you can differentiate between the core engine (or at least the trusted part of your engine) and the rest of it, and that you are going to allow the core to set limits on and observe what the untrusted code is doing. Really, what you're trying to do is very similar to a miniature operating system, so it might make sense to model things similarly.
01 Dec, 2008, Silenus wrote in the 5th comment:
Votes: 0
Thanks again for the input.

Infinite loops: the difficulty here for me is that I am thinking of using LLVM 2.4's JIT as my backend. Given that I am using an IR (intermediate representation) which is a bit closer to how the machine operates, and that whole functions or chunks of code are turned into native instructions, I would have to add my own branches and could not rely on a VM interpreter's inner loop to do the work for me. If I emulated per-instruction counting it would also get pretty slow, since every second instruction would effectively be some sort of increment.

My first conservative take on this was to insert an extra branch to some error basic block wherever the system encounters a branch, based on a branch-taken counter… this would work (as long as you passed the branch count down into function calls) but does not seem like much of a gain. So I was wondering whether, by examining the graph of basic blocks, you could stick these extra branches only in places where the graph contains a cycle. Perhaps, improving on this even more, you could insert only one of these branches per cycle thus created. I have been curious whether this scheme is correct, and also whether it is still too conservative, i.e. whether it can be improved upon.
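For what it's worth, finding where those cycles close is a standard depth-first search; on a reducible control-flow graph (the kind structured source compiles to) the DFS back edges are exactly the loop-closing branches, while irreducible graphs need more care. A sketch, with the graph as a plain successor map (block names invented for the example):

```python
def back_edges(cfg, entry):
    """cfg maps block id -> list of successor ids; returns {(src, dst), ...},
    the edges that point back into a block still on the DFS stack."""
    WHITE, GREY, BLACK = 0, 1, 2        # unvisited / on stack / finished
    color = {b: WHITE for b in cfg}
    found = set()

    def dfs(b):
        color[b] = GREY
        for succ in cfg.get(b, []):
            if color[succ] == GREY:     # edge into an ancestor: closes a cycle
                found.add((b, succ))
            elif color[succ] == WHITE:
                dfs(succ)
        color[b] = BLACK

    dfs(entry)
    return found
```

Instrumenting only these edges (plus call sites, to carry the count across functions) suffices, because any execution that loops forever must traverse some back edge infinitely often.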

Yes, LPMud servers are very much like mini operating systems. So, as you note, a lot of the concepts could perhaps be recycled from the solutions devised for the problem of multiple users sharing a fixed set of resources.
01 Dec, 2008, elanthis wrote in the 6th comment:
Votes: 0
DavidHaley said:
Infinite loops: an instruction count really isn't that bad; certainly not when compared to adding traps around existing branch statements.


Quote
Loops can happen in all sorts of strange places, and it is very hard (in fact, theoretically impossible) to determine if a loop is guaranteed to halt. So the "dumb" approach of instruction counting is usually a better idea.


Actually, it is at the backwards jump in loops that almost every high-performance VM does this kind of loop detection. If the counting only happens at these instructions, and you protect against stack overflow, and you don't let a malicious user upload 4 GB of source code, and you have no blocking instructions/functions, then you can get 100% guaranteed infinite loop detection with significantly less performance penalty than checking on every instruction.

Quote
Stack overflow: sure, just have a limit on the depth of recursion, like 100, and throw an exception if that limit is passed. You might have to tweak it depending on how deeply you recurse. You might want to implement tail recursion if you like writing recursive algorithms. (Depth of recursion would then be the number of stack frames, not the number of function calls per se.)


The neat thing is, if tail recursion is implemented properly, it automatically benefits from the backward jump detection mentioned above. Less code, more power. :)

Quote
Heap exhaustion: trigger the garbage collection when you run out of resources. Not a whole lot you can do if you need more resources to do something and don't have any available. Trying to allocate memory and failing is a "Very Bad" situation, so if you prevent a thread from allocating memory you're basically killing it entirely.


Not necessarily. It is very rare for an application to gracefully handle OOM situations, but it certainly is possible. In the case of a scripting language or an application running in a VM, it's a lot more flexible, too. You can set the memory limit for a specific "VM thread" to be something acceptably low, so that the script can only end up killing itself, not the whole application.

Quote
DOS attacks: I wouldn't do anything at the language level. First off, make sure that your connection handling is as light-weight as possible until you determine that the person on the other end is a legitimate person; then devote more resources. You could have some kind of automated blocking where an address is denied if more than X (5?) failed attempts are made within some (rather short) period of time (30 seconds?). This is something that many ssh servers do, for example.


Actually, a lot of systems use external tools for this. My ssh setup uses sshguard, for example (which could easily be modified to work with your MUD). These tools add the blocks directly into the kernel firewall, which is the most efficient way to handle it. You do of course require root privileges (for installing sshguard, NOT for running the MUD!) for this to work.

Implementing it at the application level is several orders of magnitude slower than having a tool that updates the firewall. If someone wants to DOS your server, it is almost guaranteed to be possible if you implement filtering at the application level, if only because the DOS can keep your accept() queue maxed out with relatively little bandwidth resources. You're pretty much always better off doing filtering in a kernel firewall or at the router/switch level. If you're running on a MUD host, you might consider politely asking the host to implement some simple iptables+conntrack rules to filter out excessive connections to non-HTTP ports.
01 Dec, 2008, David Haley wrote in the 7th comment:
Votes: 0
elanthis said:
then you can get 100% guaranteed infinite loop detection with significantly less performance penalty than checking on every instruction.

It's theoretically impossible to guarantee that a program will halt. How then can you get 100% infinite loop detection? I think what you meant is that you can still check if you are hitting your instruction count limit – which does not mean you necessarily hit an infinite loop – while maintaining good performance.

elanthis said:
Not necessarily. It is very rare for an application to gracefully handle OOM situations, but it certainly is possible. In the case of a scripting language or an application running in a VM, it's a lot more flexible, too. You can set the memory limit for a specific "VM thread" to be something acceptably low, so that the script can only end up killing itself, not the whole application.

I think we're saying the same thing. :smile: I wasn't talking about the whole application, just the execution context that was hitting its allocation limit.

elanthis said:
Implementing it at the application level is several orders of magnitude slower than having a tool that updates the firewall. If someone wants to DOS your server, it is almost guaranteed to be possible if you implement filtering at the application level, if only because the DOS can keep your accept() queue maxed out with relatively little bandwidth resources. You're pretty much always better off doing filtering in a kernel firewall or at the router/switch level. If you're running on a MUD host, you might consider politely asking the host to implement some simple iptables+conntrack rules to filter out excessive connections to non-HTTP ports.

Completely agreed that doing it at the OS level is more efficient. I was under the assumption that we were doing something at the application level, and so talking to iptables wasn't an option.

That said, it is still a good idea to keep your connection handling as light-weight as possible until you're sure you're talking to a real person.
01 Dec, 2008, elanthis wrote in the 8th comment:
Votes: 0
Quote
It's theoretically impossible to guarantee that a program will halt. How then can you get 100% infinite loop detection? I think what you meant is that you can still check if you are hitting your instruction count limit – which does not mean you necessarily hit an infinite loop – while maintaining good performance.


Yeah, that's what I was trying to say. :)
02 Dec, 2008, Silenus wrote in the 9th comment:
Votes: 0
I think on a purely theoretical level there are classes of loops which you can determine will terminate in a finite amount of time (though that time could be quite large). In certain trivial cases you can even determine the exact running time (for example, a foreach loop over a constant array whose body consists only of basic opcodes, or of functions with similar properties). Arguably this might be quite useless in practice, since even in the simplest and probably most common case, something like

for (i = 0; i < n; ++i) { some basic opcodes, or functions consisting of such }

you can establish that it terminates, but n might be so large that it causes problems. Obviously for general programs the question is undecidable (the halting problem). It is good to know, though, that the solution I thought up (counting branches in cycles) is equivalent to what elanthis is saying, so I guess that confirms I am on the right track.
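For the record, the trivially decidable case can be computed at loop entry. A toy sketch, assuming the body changes neither i nor n (a real analysis would work on the IR, not on source):

```python
def trip_count(start, bound, step=1):
    """Trip count of `for (i = start; i < bound; i += step)` for step > 0,
    assuming the loop body modifies neither i nor bound."""
    if step <= 0:
        raise ValueError("non-positive step: the loop may never terminate")
    if bound <= start:
        return 0
    return -(-(bound - start) // step)   # ceil((bound - start) / step)
```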
02 Dec, 2008, David Haley wrote in the 10th comment:
Votes: 0
It turns out that as long as you allow simple arithmetic and basic control flow (if, goto), it is formally undecidable whether or not a general program will terminate. Yes, of course you can tell for sure in some cases whether a program will halt. But it's awfully dangerous to go down that route, because, well, it is an impossible question to answer in general. :wink:

BTW, you don't want to be counting branches in cycles: you want to look at a very specific branch, namely the one that creates a cycle in the basic block graph. You don't want to be messing with branches in general, just very specific ones.
02 Dec, 2008, Tyche wrote in the 11th comment:
Votes: 0
Even in userspace, there's code that can fail, code that can't fail, code that can continue, code that requires rollback and code that doesn't.
02 Dec, 2008, Silenus wrote in the 12th comment:
Votes: 0
DavidHaley said:
It turns out that as long as you allow simple arithmetic and basic control flow (if, goto), it is formally undecidable whether or not a general program will terminate. Yes, of course you can tell for sure in some cases whether a program will halt. But it's awfully dangerous to go down that route, because, well, it is an impossible question to answer in general. :wink:


Well, I am well aware of that; I was just curious whether any work had been done exploring this area. My cursory exploration suggests you would gain nothing useful from it.

DavidHaley said:
BTW, you don't want to be counting branches in cycles: you want to look at a very specific branch, namely the one that creates a cycle in the basic block graph. You don't want to be messing with branches in general, just very specific ones.


Sure I think I mentioned this in the previous post.
02 Dec, 2008, David Haley wrote in the 13th comment:
Votes: 0
Well, plenty of work has been done in this area (and more broadly, static analysis in general). It turns out that in many cases you actually can determine when a loop will halt: you just have to constrain your problem area very carefully and occasionally give nudges in the right direction (e.g., asserting that a given function is guaranteed to terminate). It is difficult for example to guarantee analytically that a large linked list will not contain any cycles, but you might have high enough confidence that your linked list code is correct that you feel comfortable asserting to the analysis engine that the list is acyclical. Of course the more complex your program, the more work you have to do yourself, so eventually you start questioning just how automatic the tools are. (Not to say that they're not useful, but they're more debuggers than verifiers IMO.)

Silenus said:
Sure I think I mentioned this in the previous post.

Sorry, I must have missed that, I confess to having read rather quickly last night. :redface:
02 Dec, 2008, quixadhal wrote in the 14th comment:
Votes: 0
Silenus said:
for (i = 0; i < n; ++i) { some basic opcodes, or functions consisting of such }

you can establish that it terminates but n might be so large that it causes problems.


That depends a great deal on what's inside the braces. If I were to put an n++ in there, said termination wouldn't happen, since even when you rolled around the INT_MAX limit, they'd both roll around in sync and still be equal.

Of course, that kind of bug isn't going to show up often… but you did say you were worried about malicious code, and someone who wants to peg your CPU might try something of that nature.
03 Dec, 2008, Silenus wrote in the 15th comment:
Votes: 0
Well, actually, a dependency with respect to i should be quite easy to detect. Perhaps I was not clear enough again: I meant that n could be turned into a constant upon entry into the loop. I speculate that this sort of loop is probably the most common form, and that in many common cases (or at least in the way I use them) termination can be guaranteed, though n might be so large that it would seem to take forever, even though it wouldn't be infinite.

David, in terms of static analysis of loops, do you have any good references on the topic? I speculate that incorporating some such analysis into the branch-count mechanism, if the loop cases covered are common enough, might save quite a few "backwards" branch counting and checking points, which is why I brought it up.
03 Dec, 2008, David Haley wrote in the 16th comment:
Votes: 0
Silenus said:
I perhaps was not clear enough again; here I meant that n could be turned into a constant upon entry into the loop.

Yes, well, unless…
1- you actually do want to move 'n' for legitimate reasons (e.g. you added more things to process)
2- 'n' is not actually a variable/number but the result of a function call, such as getSize(), the return value of which depends on something else – such as the size of a container that you are processing in the loop body
3- 'n' is a variable shared by several threads, that could be legitimately modified by other threads while in the loop
etc.

Silenus said:
David, in terms of static analysis of loops do you have any good references on the topic? I speculate that incorporating perhaps some such fixes into the branch count mechanism if the loop cases considered are common enough might save quite a few "backwards" branch counting and checking points which is why I brought it up.

Unfortunately not really. The only references I know are very academic and/or mathematical (as in, they involve logic proofs, and not a line of code), and so would not be terribly useful.

Some friends of mine have been doing practical work in static analysis, but not related to loop termination. Still, if you'd like to read their work, here are their two homepages.
03 Dec, 2008, Silenus wrote in the 17th comment:
Votes: 0
Thanks anyhow. I guess theoretical results may not be directly applicable. Sometimes they are, sometimes they are not; though I somewhat enjoy the more mathematical stuff.
03 Dec, 2008, David Haley wrote in the 18th comment:
Votes: 0
Yeah, that's the unfortunate bit about research that is still at a fairly theoretical stage. But, frankly, that also shows you just how incredibly difficult this problem is: people have been working on this for many decades and have made relatively little progress that is practically useful for general programs.