14 Jan, 2009, ghasatta wrote in the 21st comment:
Votes: 0
Quote
the more I start wondering if a proper parser might be in order as well.
boost::tokenizer works wonders.

http://www.boost.org/doc/libs/1_37_0/lib...
15 Jan, 2009, quixadhal wrote in the 22nd comment:
Votes: 0
Actually, this is more like what I was thinking of….

http://spirit.sourceforge.net/distrib/sp...
15 Jan, 2009, Tyche wrote in the 23rd comment:
Votes: 0
elanthis said:
Non-arbitrary fixed-size buffer lengths, maybe? Using dynamically allocated buffers where appropriate? Removing the use of buffers entirely where they shouldn't be used or where simpler solutions could be created?

I saw an example in another thread where a user wanted to colorize some output, so he had to create a buffer, call a function on the input string to spit the terminal control codes into the buffer, and then send that to a send_to_char function. Why not just make a send_to_char_color function that does that in one go, without a ton of extra copies and buffers?

I also see a lot of things like sprintf()ing a bunch of times into a buffer which is then passed to send_to_char. Why not a printf_to_char function? Why append to a buffer over and over and then pass that to send_to_char instead of just calling send_to_char directly several times? That would have less overhead with any sane implementation of send_to_char.

quixadhal said:
As far as picking an arbitrary size… I suppose one has to do that if you're going to use stack buffers… but why not pick a nice even power of two? Working under the assumption that computers like powers of two, and that the paging subsystem is likely going to allocate pages in powers of two, it seems to me that choosing something that will be evenly divisible by (or into) the page size is a good idea.

Since 4K seems to be roughly the number they were aiming for, 4096 seems like a good number to try.

However, I agree… the use of varargs functions was either outside the ancestral Diku coder's knowledge, or perhaps gcc didn't support such things back then (although I'm pretty sure it did).

One of the first things I do to a diku-base is make printf-style functions for all the various send_to_foo's and act. The more I work with one_argument(), the more I start wondering if a proper parser might be in order as well.


* The process stack is already allocated by the kernel (on some systems it may grow at runtime), and given that alignment is unpredictable using a power of two is just as arbitrary as using any other number that comes to mind. In fact using 4K+ almost guarantees that every stack frame will cross page boundaries. Even so, crossing stack page boundaries isn't significant since pages occupied by the stack are the least likely pages to be swapped out.

* It seems to me that dynamically allocated buffers are used when appropriate. One would not dynamically allocate temporaries or scratch space, unless the size required by the operations approached stack limits. malloc/calloc/free would seem to be a far more expensive proposition than incrementing the stack pointer.

* The call depth on Dikumuds is quite shallow. Even with MobProgs, the stack size isn't notable because of the coded recursion limit.

* Many of the multiple sprintf's I see have control logic interspersed between them. Some were apparently done for code readability by avoiding string concatenation. And yes, some are accumulated cruft from adding features.

* Variadic functions don't really address the real problem in any meaningful way. Certainly variadic send_to_char(), bug(), and log() functions are useful and will save you a few lines of code, but remember the parameters to those calls are going to have to be assigned to variables and allocated on the stack. act() already has 6 or so parameters.

The real problem has nothing to do with the arbitrary size.
DikuMuds aren't unusual in their use of arbitrary limits in character arrays compared to most C applications.

The problem is that this arbitrary limit isn't checked or enforced.

You'd get a far, far more robust system by removing strcat, strcpy, and sprintf, using something like snprintf consistently instead, and actually checking the return value.
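A minimal sketch of the printf_to_char idea combined with Tyche's point about checking snprintf's return value. The real codebase would pass a CHAR_DATA and call the existing send_to_char; here a stub struct captures the output so the sketch is self-contained, and the buffer size is the typical Merc value (all names besides send_to_char are illustrative):

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define MAX_STRING_LENGTH 4608   /* typical stock Merc value */

/* Stand-in for the real CHAR_DATA / send_to_char so this compiles
 * on its own: it just records what would have been sent. */
struct char_data { char last_sent[MAX_STRING_LENGTH]; };

static void send_to_char(const char *txt, struct char_data *ch)
{
    strncpy(ch->last_sent, txt, sizeof(ch->last_sent) - 1);
    ch->last_sent[sizeof(ch->last_sent) - 1] = '\0';
}

/* Variadic wrapper: format once with vsnprintf, check the return
 * value, send once -- no intermediate strcat/sprintf chains. */
static void printf_to_char(struct char_data *ch, const char *fmt, ...)
{
    char buf[MAX_STRING_LENGTH];
    va_list args;
    int len;

    va_start(args, fmt);
    len = vsnprintf(buf, sizeof(buf), fmt, args);
    va_end(args);

    if (len < 0)
        return;   /* encoding error: drop the message */
    /* len >= sizeof(buf) means vsnprintf truncated; the buffer is
     * still NUL-terminated, so this is the place to log or bail. */
    send_to_char(buf, ch);
}
```

With this in place, the sprintf-then-send_to_char two-step collapses into a single call like `printf_to_char(ch, "You have %d gold coins.\n", gold);`.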
15 Jan, 2009, Lobotomy wrote in the 24th comment:
Votes: 0
quixadhal said:
It also attempts to handle quotes, although you can confuse it with things like [ pick up "Bob's shovel"].

I was under the impression that one_argument (or at least the implementations I've seen) works on a variable delimiter system. I.e., space is the default delimiter, but it can be overridden to become either the apostrophe or quotation mark depending on which one it hits first. Then it merely looks for the next occurrence of the delimiter to know where to stop. In the case of "Bob's shovel", it should be ignoring the apostrophe since the quotation mark appears first. Is there a particular codebase's one_argument where this is not the case, or am I merely misunderstanding what you meant?
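The behavior described above can be sketched in a simplified, standalone form in the spirit of the stock ROM-style one_argument (this is illustrative, not any particular codebase's exact code): if the token opens with a quote, that quote character replaces space as the closing delimiter.

```c
#include <ctype.h>

/* Copy the first token of `argument` into `arg_first` (lower-cased,
 * as the stock versions do) and return a pointer to the rest.
 * A leading ' or " becomes the closing delimiter instead of space,
 * which is how [get 'long sword'] parses as one argument. */
static const char *one_argument(const char *argument, char *arg_first)
{
    char cEnd = ' ';

    while (isspace((unsigned char)*argument))
        argument++;

    if (*argument == '\'' || *argument == '"')
        cEnd = *argument++;   /* quoted token: stop at matching quote */

    while (*argument != '\0') {
        if (*argument == cEnd) {
            argument++;
            break;
        }
        *arg_first++ = (char)tolower((unsigned char)*argument++);
    }
    *arg_first = '\0';

    while (isspace((unsigned char)*argument))
        argument++;

    return argument;
}
```

Note that once the double quote is the delimiter, an interior apostrophe is ordinary text, which matches the reading that "Bob's shovel" should tokenize cleanly.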

In regards to one_argument and argument parsing in general, I think it could be good to lay out here, or in another thread entirely, all the various sorts of things an argument parser should be able to handle or is otherwise expected to do (or ideas for things it could do that it doesn't do already). Creating a new argument parsing system (again) is one thing that I have slated among other changes to make to my codebase, so it's something that I'm especially interested in.
15 Jan, 2009, David Haley wrote in the 25th comment:
Votes: 0
One could argue that one_argument is implementing a parser of some sort. The problem is how funky its API is. In C++, it's fairly easy to wrap it into a function that returns a vector of arguments.

But to be fair, there's a reason why one_argument has the API it does. You sometimes want to tokenize part of the string (the first bit, to find the commands) but then use the rest of the string as-is. If you tokenize the string, you lose quotation marks, spaces, etc.

This is where a proper grammar (not grammer :twitch: :tongue:) can be really helpful: you can specify how a command works, and say that if you see these tokens first, treat the rest of the string as one token, but if those tokens come first, tokenize the rest of the string. Then, you can keep the same nifty API of turning a string into a vector of tokens, but not lose the ability to do things like the above.
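The tokenize-versus-keep-verbatim point doesn't strictly need a full grammar: one can split off only the command word and let a per-command flag decide whether the remainder stays one raw argument. A sketch with a hypothetical table (none of these names come from an actual codebase):

```c
#include <ctype.h>
#include <string.h>

/* Copy the first space-delimited word of `input` into `cmd` and
 * return a pointer to the remainder with leading spaces skipped;
 * interior spacing of the remainder is preserved untouched. */
static const char *first_word(const char *input, char *cmd, size_t cmdsz)
{
    size_t n = 0;
    while (isspace((unsigned char)*input)) input++;
    while (*input && !isspace((unsigned char)*input) && n + 1 < cmdsz)
        cmd[n++] = *input++;
    cmd[n] = '\0';
    while (isspace((unsigned char)*input)) input++;
    return input;
}

/* Hypothetical dispatch table: `verbatim` marks commands whose rest
 * of line is one argument (say, emote) vs. fully tokenized (get, put). */
struct command { const char *name; int verbatim; };
static const struct command commands[] = {
    { "say",   1 },
    { "emote", 1 },
    { "get",   0 },
    { "put",   0 },
};
```

So `say Hello,   world!` keeps its exact spacing and punctuation, while `get sword bag` would go on to the tokenizer.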
17 Jan, 2009, quixadhal wrote in the 26th comment:
Votes: 0
My next entry for Madness, still in the ban code. :)

In check_ban(), the very first lines….

Quote
strcpy( host, capitalize( site ) );
host[0] = LOWER( host[0] );


Ummmm… why capitalize site if the very next thing you're doing is to lowercase it?

*tap tap* Hello? This thing on? *tap tap*
17 Jan, 2009, Pedlar wrote in the 27th comment:
Votes: 0
hahaha
18 Jan, 2009, ghasatta wrote in the 28th comment:
Votes: 0
quixadhal said:
My next entry for Madness, still in the ban code. :)

In check_ban(), the very first lines….
In order to avoid any tears over painful unnecessary integration, just want to point out that this code is all gone in the latest revision on the dev branch.

And yes, it is rather curious. Especially since there isn't any input scrubbing/validation when creating bans.
18 Jan, 2009, Sharmair wrote in the 29th comment:
Votes: 0
The capitalize function does not just upper-case the first char of the string, it also lowers the rest. This was probably their way of lowering the whole string in just a couple lines.
19 Jan, 2009, Lobotomy wrote in the 30th comment:
Votes: 0
Sharmair said:
This was probably their way of lowering the whole string in just a couple lines.

Doesn't/shouldn't that particular codebase have some implementation of a to_lower() function or somesuch, though? :thinking:
19 Jan, 2009, David Haley wrote in the 31st comment:
Votes: 0
When it comes to these codebases, trying to reason about things in the context of how they "should" be is about as effective as trying to solve algebra problems by chewing bubble gum. (With apologies to ol' Baz.)