01 Nov, 2008, quixadhal wrote in the 141st comment:
Votes: 0
Ummmmm….. W. T. F. ????

char *fread_string_eol( FILE *fp )
{
    static bool char_special[256 - EOF];
    char *plast = NULL;
    char c = '\0';

    if ( char_special[EOF - EOF] != true )
    {
        char_special[EOF - EOF] = true;
        char_special['\n' - EOF] = true;
        char_special['\r' - EOF] = true;
    }


Unless I'm very much mistaken, EOF is typically defined in the system header files as -1…. so this code is really declaring an array that's 257 bytes long, and then looking at an UNINITIALIZED array, making a decision based on element 0 of it?

Is that really what's happening here?

I don't think I've had enough to drink to be sure….
01 Nov, 2008, David Haley wrote in the 142nd comment:
Votes: 0
That function is just wrong on so many levels. :cry:

I have no idea what the intention was there.
01 Nov, 2008, Runter wrote in the 143rd comment:
Votes: 0
I can answer part of that. That array is initialized.

Variables with static storage duration – statics and globals – are guaranteed to be zeroed before the program starts, so every element of an array like that really does start out as 0.

However, the code looks bad anyway. I'm not sure what the intent is without the rest of the function. Using variables inside array declarations is also a gcc extension, which is what allows this to work.
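
For what it's worth, the zero-initialization rule is easy to demonstrate (a minimal standalone sketch, nothing to do with the ROM sources):

#include <stdio.h>

/* Objects with static storage duration are zero-initialized
   before main() runs -- no explicit memset() is needed. */
static int table[257];
int global_counter;

int main(void)
{
    /* Both are guaranteed to print 0. */
    printf("%d %d\n", table[0], global_counter);
    return 0;
}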
01 Nov, 2008, Runter wrote in the 144th comment:
Votes: 0
I don't know what all of the EOF arithmetic is for, but after reviewing the full context, the array in general seems legitimate.
01 Nov, 2008, quixadhal wrote in the 145th comment:
Votes: 0
It always seemed odd to me that the Diku people go to such herculean efforts to avoid using sscanf() or fscanf() for things. I mean, there are far simpler ways to parse and load a file format, even if you don't do it the lazy way of reading the whole file and then passing over it in RAM until you get everything you want out of it.

If I'm to believe the function name, all this does is read a string using EOL as the terminator instead of the old ~ character. Why it can't just use fgets() and then do whatever mojo the shared string junk requires is beyond me.

Oh well, I'll investigate that sometime before the meeting Sunday. I'm almost done with my one last cleanup for the code in general, and then we can see what bugs on Ye Olde Bugee Listee (now with more e's!) people want to tackle. Time to head out to lunch before the bank closes though (assuming it's even open today).
01 Nov, 2008, David Haley wrote in the 146th comment:
Votes: 0
I don't think it's a gcc extension to allow array definitions to use constant-folded math. E.g., int i[5 + 6]. If EOF is a macro then it would be constant-folded. It is an extension to have things like int i[x] though. It's been a while since I looked at this in detail so maybe I'm misremembering.
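
To illustrate the distinction (a quick sketch, assuming EOF is -1 as it is on most systems):

#include <stdio.h>

int a[5 + 6];       /* constant expression: legal in any C dialect */
int b[256 - EOF];   /* EOF is a macro for an integer constant, so this folds too */

int main(void)
{
    int x = 10;
    int c[x];       /* variable-length array: standard in C99, a gcc extension before that */

    c[x - 1] = 0;
    printf("%zu %zu\n", sizeof a, sizeof b);
    return 0;
}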
01 Nov, 2008, Runter wrote in the 147th comment:
Votes: 0
Heh. I thought this was kinda cute. fread_string() and fread_to_eol() are similar but implemented differently in ROM.

This is out of fread_string(), at the same point where the other function uses that silly array to check for breakable characters.

Quote
/*
* Back off the char type lookup,
* it was too dirty for portability.
* – Furey
*/


Guess it was good enough for portability to leave it in the next function in the file. ;)
01 Nov, 2008, Runter wrote in the 148th comment:
Votes: 0
quixadhal said:
It always seemed odd to me that the Diku people go to such herculean efforts to avoid using sscanf() or fscanf() for things. I mean, there are far simpler ways to parse and load a file format, even if you don't do it the lazy way of reading the whole file and then passing over it in RAM until you get everything you want out of it.

If I'm to believe the function name, all this does is read a string using EOL as the terminator instead of the old ~ character. Why it can't just use fgets() and then do whatever mojo the shared string junk requires is beyond me.

Oh well, I'll investigate that sometime before the meeting Sunday. I'm almost done with my one last cleanup for the code in general, and then we can see what bugs on Ye Olde Bugee Listee (now with more e's!) people want to tackle. Time to head out to lunch before the bank closes though (assuming it's even open today).


They probably wanted to do it without needing to edit the string they got back from fgets(), considering that function already accounts for most of the boot-up time. They likely didn't want to go back through every string and strip the ~ after it had already been read. Since fgets() does roughly the same thing as their function minus the special-character checks, their way was probably noticeably more efficient.
01 Nov, 2008, David Haley wrote in the 149th comment:
Votes: 0
Removing ~ from the end of strings would be extremely cheap if you did it right (start searching from the end…). Add to that the fact that library routines are typically heavily optimized, sometimes even at the assembly level, and I would be a little surprised if they were actually gaining any performance. Perhaps I am simply less charitable than you, but sometimes I think that somebody just made a mistake and everybody else decided to live with it rather than spend the time to fix it, because "it would be so much effort" and all that jazz.
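
Something like this, say (a minimal sketch; the helper name is made up):

#include <string.h>

/* Strip a trailing '~' in place, scanning from the end of the
   buffer rather than searching the whole string for it. */
static void strip_tilde(char *buf)
{
    size_t len = strlen(buf);   /* one pass, but fgets() already touched every byte anyway */

    while (len > 0 && (buf[len - 1] == '\n' || buf[len - 1] == '\r'))
        buf[--len] = '\0';      /* drop trailing line-ending characters */
    if (len > 0 && buf[len - 1] == '~')
        buf[--len] = '\0';      /* chop the terminator itself */
}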
02 Nov, 2008, Runter wrote in the 150th comment:
Votes: 0
That's not true. You can't assume ~ or any other special character is at the end. It would take parsing the entire string twice: once to read it and once to check the elements, when you could have just done both at the same time.
02 Nov, 2008, Runter wrote in the 151st comment:
Votes: 0
The function in question reads a string up to the EOF marker, which could look like this:

name Whatever the name is~
blah Whatever blah is~
blah2 whatever blah 2 is~

times 1000, to the end of the file, producing one string. If you're talking about reading to the EOF marker with fgets(), then you're talking about tildes that are not only at the end. You're talking about producing a string and then reparsing it to remove elements. If you're going to tell me that's cheaper than just doing it right in the first place, then maybe you don't understand the problem.

Furthermore, the current code also checks for other characters that shouldn't exist (or may not be allowed in the future) in a finished string.

Everyone knows removing ~ from the end of a string is easy. That's not at all what the code is doing.
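
The one-pass approach I'm talking about would look something like this (a rough sketch, not the actual ROM code; assumes max >= 1):

#include <stdio.h>

/* Read up to (and consume) the '~' terminator, storing everything
   before it -- the strip happens during the read, not afterwards. */
static int read_tilde_string(FILE *fp, char *buf, size_t max)
{
    size_t len = 0;
    int c;

    while ((c = getc(fp)) != EOF && c != '~')
    {
        if (len + 1 < max)
            buf[len++] = (char) c;
    }
    buf[len] = '\0';
    return c == '~';    /* nonzero only if the terminator was found */
}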
02 Nov, 2008, David Haley wrote in the 152nd comment:
Votes: 0
What is the point of having ~ if it doesn't serve a purpose as a marker? Either you are reading a string terminated by ~, in which case you know exactly where to remove it, or you aren't reading a string terminated by ~, in which case there is no need to parse it twice. Either you want all the lines, or you don't – easy. I'm not entirely sure why the operating assumption here is that the people who wrote this code never did anything that simply doesn't make sense.

Runter said:
(…) times 1000, to the end of the file, producing one string. If you're talking about reading to the EOF marker with fgets(), then you're talking about tildes that are not only at the end.

fgets() reads until EOF or a newline, so I'm not entirely sure why you claim that you'd be getting every single line. :thinking:

Runter said:
If you're going to tell me that's cheaper than just doing it right in the first place, then maybe you don't understand the problem.

Geez, no need to assume I'm completely stupid, while you're at it. :rolleyes:
02 Nov, 2008, quixadhal wrote in the 153rd comment:
Votes: 0
I'm talking about fread_string_eol(), which does not do ANYTHING with tildes, in any way. Feel free to look at db.c if you disbelieve. :)

Specifically, fread_string_eol() does a while loop using getc(), skipping over leading whitespace, doing some bizarro funky logic to try to match various combinations of end-of-line characters, and finally using its OWN code to hand-insert the result into the shared string buffer system (instead of abstracting that away into a function call).

The EOF symbol is only checked to make sure the file didn't terminate before an endline marker showed up, which does happen. It doesn't read the whole file, unless you were on the last line when you called it.

fgets() would do the end-of-line checking for you, and it would read the whole thing in ONE disk I/O operation instead of (potentially) hundreds of single-character gets (remember, if the OS back then didn't buffer disk I/O, every call to getc() would be a full sector read from the hardware – unless you were on a non-multitasking OS).

The same logic could have been done by calling fgets(), then advancing the start pointer past the whitespace and overwriting the trailing \r or \n's with NUL characters. THEN doing the same funky shared-string whatsits. :)
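
Something along these lines (a rough sketch of the idea, not a drop-in replacement – the hand-off to the shared-string system is left as a comment):

#include <stdio.h>
#include <string.h>

/* fgets()-based version: one buffered read per line, then trim
   in place instead of looping over getc() a character at a time. */
static char *read_line_trimmed(FILE *fp, char *buf, size_t max)
{
    char *start;
    size_t len;

    if (fgets(buf, (int) max, fp) == NULL)
        return NULL;                /* EOF or read error */

    start = buf;
    while (*start == ' ' || *start == '\t')
        start++;                    /* advance past leading whitespace */

    len = strlen(start);
    while (len > 0 && (start[len - 1] == '\n' || start[len - 1] == '\r'))
        start[--len] = '\0';        /* overwrite trailing EOL characters */

    /* …then hand `start` to the shared-string machinery. */
    return start;
}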
02 Nov, 2008, David Haley wrote in the 154th comment:
Votes: 0
quixadhal said:
doing some bizarro funky logic to try to match various combinations of end-of-line characters

In fairness this can actually be somewhat necessary if you're reading from files that are hand-crafted – you don't know if they were saved by Windows or Unix machines, or even maybe Macs… Of course, there are more or less nice ways of solving this – a nicer way being a state machine model – but it is something you have to account for if you want to read hand-edited files.

fgets might not do cross-platform line-ending checking; I don't remember how it works. You might end up with extra \r characters if you're on Unix and reading a Windows file, or you might end up treating the whole file as one line if you're reading a Mac file from Unix…
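
Handling all three conventions by hand isn't much code, though (a hypothetical helper, just to sketch the idea):

#include <stdio.h>

/* Read one line whether it ends in \n (Unix), \r\n (Windows),
   or a bare \r (old Mac); returns 0 only at end of file. */
static int read_any_line(FILE *fp, char *buf, size_t max)
{
    size_t len = 0;
    int c;

    while ((c = getc(fp)) != EOF)
    {
        if (c == '\r')
        {
            int next = getc(fp);
            if (next != '\n' && next != EOF)
                ungetc(next, fp);   /* bare \r: push the lookahead back */
            break;
        }
        if (c == '\n')
            break;
        if (len + 1 < max)
            buf[len++] = (char) c;
    }
    buf[len] = '\0';
    return len > 0 || c != EOF;
}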

quixadhal said:
remember, if the OS back then didn't buffer disk I/O, every call to getc() would be a full sector read from the hardware

I think that OSs have done look-ahead buffering for quite some time now. (Where reading one sector entails reading a chunk from that sector onward.) In fact, the hardware itself has been doing this as well for a while. (Of course, that means you still need to ask the disk for the sector, but, eh. That's cheaper than having the disk actually spin…)
02 Nov, 2008, Runter wrote in the 155th comment:
Votes: 0
I stand corrected. I have been known to be wrong from time to time.
02 Nov, 2008, Runter wrote in the 156th comment:
Votes: 0
Whatever happened to shooting to have this done by yesterday? :)
02 Nov, 2008, quixadhal wrote in the 157th comment:
Votes: 0
I blame people on this site for pointing me at Fallout 3….. :devil:

However, that said, I consider the major changes done enough. Yeah, I could and probably should have refactored things apart a bit more. Yeah, the official bug lists haven't been exhausted. BUT… everything is run through indent, the mass textual changes of the _printf() functions and all that jazz are done. Thus, people can begin making changes with far less chance of stepping on each other, since they'll most likely be far more localized.

When we get to the C++ stuff, there'll be another set of mass changes for std::string and friends, but that's another horse to beat later. :)

I'm going to try and make a list of all those bugs and see if perhaps we can divvy them up and start verifying if (a) they still exist, and (b) we can fix them. Then we can also figure out which new thingies to start adding in.

I'd really like to redo that gsn system… I still don't see why it was done differently from the spell system (i.e., why all the global gsn_foo variables?), and it bugs me.

Now would also be a good time for some of you to take a good look at the stuff I've done and see if I screwed anything up, or if you really really hate it. Once we start adding features, doing anything else along these lines becomes… challenging. :)

Oh yeah, we probably also need to figure out what to do with SVN. Either we need to rearrange the directories so we can make branches… or we need to make a second repository. I don't really care which. I still vote for CRaM as the C version!
02 Nov, 2008, Runter wrote in the 158th comment:
Votes: 0
I don't really care how we handle the SVN business. I tend to say just go with a second repository.
02 Nov, 2008, quixadhal wrote in the 159th comment:
Votes: 0
It looks like it'll be pretty easy to rearrange the directory structure, so I'll do that later this afternoon. That way, we can create both the C and C++ branches, and personal branches as well (for experiments). Davion will still need to set up permissions for everyone, because even if everyone gets write access to the full tree, I still think keeping track of commits by username would make sense.

The traditional layout is trunk, branches, tags… but nothing enforces that. A branches directory is good; everyone can make their own private branch if they want. I think we may end up with two trunks, as the C camp and the C++ camp are likely to both want a "main" release branch.

It's not hard to shuffle them around, so the meeting might be a good place to nail that down too.
02 Nov, 2008, David Haley wrote in the 160th comment:
Votes: 0
What you're talking about with directories, branches, personal experiments etc. is pretty much exactly what distributed version control systems are meant to accomplish. It also means that you don't need to worry as much about commit access because everybody is only working on local copies, and you can have a gatekeeper for the main repository. But, well, eh. :wink:

FWIW the C++ branch will probably want to pull changes from C from time to time, at least early on before the codebases diverge too much. Maybe even the other way around, but that would be challenging for obvious reasons. The point being that making things a separate repository in a non-distributed system makes it harder to share changes.