16 Jul, 2007, Omega wrote in the 1st comment:

Votes: 0

Recently I've been tinkering around with the thought of optimization in mind. And I figured, hey, I should look into looping, and see if I'm doing things WAY wrong or not. And as it turns out, I'm doing things just fine, but, there are ways to optimize loops, such as using a while loop in-place of a for-loop (pending the conditions needed to be met) Different loops do different things.

So my ultimate question to you, and to all is, What loop is your preferance. Would you sooner write a for(;;) loop then to write a do { }while(condition); loop. And why.

And if you've got a method for optimizing loops, nomatter the kind, how do you go about doing so, and why you've chosen that way. Hell, this is just crazy enough to be helpful.

16 Jul, 2007, Conner wrote in the 2nd comment:

Votes: 0

Personally, I would use a for loop without hesitation even if a do-while loop was called for because I don't know enough about do-while loops to know when it's called for and I've seen/used for loops throughout my code but have very few do-while loops to call upon for examples. :shrug:

16 Jul, 2007, Guest wrote in the 3rd comment:

Votes: 0

Count me as an abstention since the poll implies only a choice between one or the other. There are situations where both are appropriate but I forget exactly why you'd want to choose one over the other at the moment.

16 Jul, 2007, Dorian wrote in the 4th comment:

Votes: 0

Darien said:

And if you've got a method for optimizing loops, nomatter the kind, how do you go about doing so, and why you've chosen that way.

Simple, write a better algorithm. Or get a better compiler if you truly believe that it makes a practical difference which type of loop you use.

16 Jul, 2007, Justice wrote in the 5th comment:

Votes: 0

Well, there are really 3 types of loops.

for (<init>;<condition>;<increment>)
{
<body>
}

while(<condition>)
{
<body>
}

do
{
<body>
}
while(<condition>)

The for loop is what I prefer whenever there is an increment. This is because it handles the increment after the body has executed.

A while loop tests the condition prior to executing the body, same as a for loop. A do/while loop tests the condition after executing the body. Depending on what you're doing, either may be preferred. One aspect of the do/while is that the body will always execute at least once.

16 Jul, 2007, Scandum wrote in the 6th comment:

Votes: 0

The data structure is more important. Use an array where possible, and switch to a linked list if the array would only be filled for less than 20%. Not to mention it helps to hash a linked list where possible, and in some cases you can create a linked list that holds the ranges of indexes of an array that are being used.

I don't think changing every for loop with a while loop is going to make a difference.

17 Jul, 2007, Justice wrote in the 7th comment:

Votes: 0

It really depends on what you're doing if an array is faster. For iteration, it isn't any faster. An array is only faster when you need random access or don't need to access each element. An array that isn't full is wasting alot of memory. Using a fixed length array is very limited, and using a dynamic one has a high cost for insertion/deletion.

In any case, I agree you're unlikely to see any benefit from using a while loop over a for loop.

17 Jul, 2007, Omega wrote in the 8th comment:

Votes: 0

according to some dev-sites, while loops process faster then for-loops, however, i have yet to test that theory.

another thing with for-loops, in my recent research to find methods of optimization, it would appear that decrementing through the loop is faster then incrementing, also untested. Other things I've recently seen are steps through, to speed up. so instead of doing for(Whatever = 0; whatever < 50; whatever++)

doing like, for(whatever = 0; whatever < 50; whatever =+4)

things like that to step larger chunks, however, these require that you know that the data exists. and such, but still, as my research furthers, i'd like to find the best methods for optimization.

as for iterating through a linked list, thats not always an option when your not working with structures.

as for me getting a better compiler, nah, i'd just prefer to learn up on my coding practices and find the best instances to use the different styles of loops, that way my code is cleaner, and optimized to a better state. But alas, this is just one of many things I'm researching to better optimize my mud.

And samson, yeah, theres only 2 options i've given in the poll, reason being, to scale which is more commonly used.

17 Jul, 2007, Justice wrote in the 9th comment:

Votes: 0

Darien said:

according to some dev-sites, while loops process faster then for-loops, however, i have yet to test that theory.

While loops tend to be slightly faster, but unless you're dealing with 10's of thousands of values, you're not going to see a benefit. This is more of an issue with older compilers that don't optimize very well. In many cases, using a while loop makes the code more difficult to read and/or maintain.

Darien said:

Other things I've recently seen are steps through, to speed up. so instead of doing for(Whatever = 0; whatever < 50; whatever++)

doing like, for(whatever = 0; whatever < 50; whatever =+4)

things like that to step larger chunks, however, these require that you know that the data exists. and such, but still, as my research furthers, i'd like to find the best methods for optimization.

This is a common search optimization, but it only works with ordered lists. If your list is not ordered, then you have no way to determine if you missed what you were in fact searching for.

Darien said:

as for iterating through a linked list, thats not always an option when your not working with structures.

It doesn't matter if you're using a structure or something simpler like a string. I've written a linked list to store a series of strings to be fed into the command interpreter before. But as always, the ability to do something doesn't make it good. It depends on what you're trying to accomplish.

Quite often in muds, the linkage is maintained by members of a structure, but you can define the linkage external from what is being stored. A major issue with using members for linkage is that you can only handle 1 list with those members. This is why you get things like… next,prev next_in_room, prev_in_room, etc

17 Jul, 2007, Tyche wrote in the 10th comment:

Votes: 0

You are better off learning assembly and examining the code your C compiler generates.

$ cat loops.c 
#include <stdio.h>

fa() {
  int i;
  for(i=0;i<10;i++)
    printf("%d",i);
}

fb() {
  int i;
  i=0;
  while (i<10) {
   printf("%d",i);
   i++;
  }
}

int main() {
  fa();
  fb();
  return 0;
}

(gdb) disass fa
Dump of assembler code for function fa:
0x00401050 <fa+0>:      push   %ebp
0x00401051 <fa+1>:      mov    %esp,%ebp
0x00401053 <fa+3>:      sub    $0x18,%esp
0x00401056 <fa+6>:      movl   $0x0,0xfffffffc(%ebp)
0x0040105d <fa+13>:     cmpl   $0x9,0xfffffffc(%ebp)
0x00401061 <fa+17>:     jg     0x40107d <fa+45>
0x00401063 <fa+19>:     mov    0xfffffffc(%ebp),%eax
0x00401066 <fa+22>:     mov    %eax,0x4(%esp)
0x0040106a <fa+26>:     movl   $0x402000,(%esp)
0x00401071 <fa+33>:     call   0x401190 <printf>
0x00401076 <fa+38>:     lea    0xfffffffc(%ebp),%eax
0x00401079 <fa+41>:     incl   (%eax)
0x0040107b <fa+43>:     jmp    0x40105d <fa+13>
0x0040107d <fa+45>:     leave  
0x0040107e <fa+46>:     ret    
End of assembler dump.
(gdb) disass fb
Dump of assembler code for function fb:
0x0040107f <fb+0>:      push   %ebp
0x00401080 <fb+1>:      mov    %esp,%ebp
0x00401082 <fb+3>:      sub    $0x18,%esp
0x00401085 <fb+6>:      movl   $0x0,0xfffffffc(%ebp)
0x0040108c <fb+13>:     cmpl   $0x9,0xfffffffc(%ebp)
0x00401090 <fb+17>:     jg     0x4010ac <fb+45>
0x00401092 <fb+19>:     mov    0xfffffffc(%ebp),%eax
0x00401095 <fb+22>:     mov    %eax,0x4(%esp)
0x00401099 <fb+26>:     movl   $0x402000,(%esp)
0x004010a0 <fb+33>:     call   0x401190 <printf>
0x004010a5 <fb+38>:     lea    0xfffffffc(%ebp),%eax
0x004010a8 <fb+41>:     incl   (%eax)
0x004010aa <fb+43>:     jmp    0x40108c <fb+13>
0x004010ac <fb+45>:     leave  
0x004010ad <fb+46>:     ret    
End of assembler dump.

Suprise suprise. The compiler is smart enough to recognize and optimize both loops to the same code.

P.S. That was gcc 3.4.4

17 Jul, 2007, Justice wrote in the 11th comment:

Votes: 0

Heh, aren't modern compilers nice?

17 Jul, 2007, Scandum wrote in the 12th comment:

Votes: 0

Justice said:

For iteration, it isn't any faster.

Wrong, an array is much faster when it comes to iteration.

Justice said:

An array that isn't full is wasting alot of memory.

Wrong again, a half full array of pointers with 100K elements uses 400K of memory. A double linked list of 50K elements uses 600K of memory. So you're actually saving a lot of memory.

Justice said:

Using a fixed length array is very limited

In several cases it doesn't matter whether you use a linked list or an array.

17 Jul, 2007, Omega wrote in the 13th comment:

Votes: 0

Tyche, thats the exact stuff i'm looking for, as we try to find the benefit of each loop, thanks :)

now I know of the disass command for gdb, that'll help plenty :)

17 Jul, 2007, Omega wrote in the 14th comment:

Votes: 0

I came across this, thismorning while looking into this in more depth.

Several compilers need this hint from the programmer while others are clever enough to figure it out automatically. When you have nested loops, declare the counter of the most deeply-nested loop as a register variable. For example:

for (int i = 0; i <1000; ++i)
{
for (register int j = 0; j <1000; ++j)
{
// .. this code will be executed a 1,000,000 times
}
}

The variable i will be accessed 1000 times. However, j will be accessed a million times. Therefore, storing j in a CPU register can give a significant performance boost compared to storing i in a register. Of course, storing both i and j in registers is even better. Alas, the number of free registers is usually very limited so when you have to make a choice, declare the counter of the innermost loop as a register variable.

I am un-sure if gcc will do this automaticaly or not, (registering) but when you think about it, its a smart thing to know if it doesn't automaticaly do it. Assuming it works as it says it does. I've seen register used before, but I never touched it because frankly, I didn't understand it, and never really looked into it until now. Hope this is interesting to someone.

17 Jul, 2007, Justice wrote in the 15th comment:

Votes: 0

Scandum said:

Justice said:

For iteration, it isn't any faster.

Wrong, an array is much faster when it comes to iteration.

Obviously you haven't tested them. At best you get equivalent results, however the array degrades rapidly for multiple access if you don't set a pointer to the element. This is because the array calculates a memory position using the index and the sizeof each element.

Scandum said:

Justice said:

An array that isn't full is wasting alot of memory.

Wrong again, a half full array of pointers with 100K elements uses 400K of memory. A double linked list of 50K elements uses 600K of memory. So you're actually saving a lot of memory.

Way to compare apples to oranges. Comparing the total memory of the struct in one side and just the "pointers" on the other.
If we're talking just the pointers, the doubly linked list would use 400k (50x8bytes) of list overhead, not 600k. This is the same as what the half full array is using. If you consider a singly linked list, it's the half the memory overhead.

In any case it's besides the point. I wasn't discussing memory overhead, I was talking about wasted memory. In a half full array, at least half of it's memory overhead is not serving a purpose. In a linked list (double or single), the overhead serves a purpose.

17 Jul, 2007, Scandum wrote in the 16th comment:

Votes: 0

struct justice
{
        struct justice *next;
        struct justice *prev;
        int data;
};

int scandum[1000000];

DO_COMMAND(do_test)
{
        int i, j;
        struct justice *head = NULL;
        struct justice *tail = NULL;
        struct justice *temp;
        long long scandum_start, scandum_end, justice_start, justice_end;

        for (i = 0 ; i < 1000000 ; i++)
        {
                temp = calloc(1, sizeof(temp));

                temp->data = i;
                scandum[i] = i;

                LINK(temp, head, tail);
        }

        scandum_start = utime();

        for (i = 0 ; i < 1000000 ; i++)
        {
                j = scandum[i];
        }

        scandum_end = utime();

        justice_start = utime();

        for (temp = head ; temp ; temp = temp->next)
        {
                j = temp->data;
        }

        justice_end = utime();

        scandum_end -= scandum_start;
        justice_end -= justice_start;

        tintin_printf(ses, "scandum's time: %lld, justice's time: %lld\n", scandum_end, justice_end);

        return ses;
}

#10 #test
scandum's time: 17492, justice's time: 58659
scandum's time: 17063, justice's time: 58564
scandum's time: 17326, justice's time: 58812
scandum's time: 16943, justice's time: 59343
scandum's time: 17209, justice's time: 58782
scandum's time: 16662, justice's time: 58803
scandum's time: 17016, justice's time: 61981
scandum's time: 16785, justice's time: 70633
scandum's time: 16854, justice's time: 59469
scandum's time: 17221, justice's time: 58586

Fatality…………… Scandum wins!!

18 Jul, 2007, Tyche wrote in the 17th comment:

Votes: 0

Darien said:

I am un-sure if gcc will do this automaticaly or not, (registering) but when you think about it, its a smart thing to know if it doesn't automaticaly do it. Assuming it works as it says it does. I've seen register used before, but I never touched it because frankly, I didn't understand it, and never really looked into it until now. Hope this is interesting to someone.

Well… same code with some minor changes.

#include <stdio.h>

fa() {
  int i;
  for(i=0;i<10;i++) {
//    printf("%d",i);
  }
}

fb() {
  register int i;
  i=0;
  while (i<10) {
   printf("%d",i);
   i++;
  }
}

int main() {
  fa();
  fb();
  return 0;
}

$ gcc -O2 -S -c loops.c  
$ cat loops.s
        .file   "loops.c"
        .text
        .p2align 4,,15
.globl _fa
        .def    _fa;    .scl    2;      .type   32;     .endef
_fa:
        pushl   %ebp
        movl    $9, %eax
        movl    %esp, %ebp
        .p2align 4,,15
L5:
        decl    %eax
        jns     L5
        popl    %ebp
        ret
        .section .rdata,"dr"
LC0:
        .ascii "%d\0"
        .text
        .p2align 4,,15
.globl _fb
        .def    _fb;    .scl    2;      .type   32;     .endef
_fb:
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %ebx
        subl    $20, %esp
        xorl    %ebx, %ebx
        .p2align 4,,15
L12:
        movl    %ebx, 4(%esp)
        incl    %ebx
        movl    $LC0, (%esp)
        call    _printf
        cmpl    $9, %ebx
        jle     L12
        addl    $20, %esp
        popl    %ebx
        popl    %ebp
        ret
        .def    ___main;        .scl    2;      .type   32;     .endef
        .p2align 4,,15
.globl _main
        .def    _main;  .scl    2;      .type   32;     .endef
_main:
        pushl   %ebp
        movl    $16, %eax
        movl    %esp, %ebp
        subl    $8, %esp
        andl    $-16, %esp
        call    __alloca
        call    ___main
        call    _fa
        call    _fb
        leave
        xorl    %eax, %eax
        ret
        .def    _printf;        .scl    3;      .type   32;     .endef

Note the special optimization in fa, a constant is loaded into a register and decremented. This optimization will only be taken if i is never referenced in the loop. While in fb, i is not assigned to a register variable despite the use of the keyword. It can't be, because of the call inside the loop would overwrite the registers. Additionally gcc will ignore register if you ever take the address of a variable.

'register' really ought to be considered deprecated. These sorts of optimizations are time wasters unless you really do have performance problems and then you use the 80/20 rule and use gprof to find where.

The compiler can even do optimizations that an assembly programmer would do by hand…

$ gcc -O2 -funroll-all-loops -S -c loops.c  
$ cat loops.s
        .file   "loops.c"
        .text
        .p2align 4,,15
.globl _fa
        .def    _fa;    .scl    2;      .type   32;     .endef
_fa:
        pushl   %ebp
        movl    %esp, %ebp
        popl    %ebp
        ret
        .section .rdata,"dr"
LC0:
        .ascii "%d\0"
        .text
        .p2align 4,,15
.globl _fb
        .def    _fb;    .scl    2;      .type   32;     .endef
_fb:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $8, %esp
        xorl    %edx, %edx
        movl    %edx, 4(%esp)
        movl    $LC0, (%esp)
        call    _printf
        movl    $LC0, (%esp)
        movl    $1, %eax
        movl    %eax, 4(%esp)
        call    _printf
        movl    $LC0, (%esp)
        movl    $2, %eax
        movl    %eax, 4(%esp)
        call    _printf
        movl    $LC0, (%esp)
        movl    $3, %eax
        movl    %eax, 4(%esp)
        call    _printf
        movl    $LC0, (%esp)
        movl    $4, %eax
        movl    %eax, 4(%esp)
        call    _printf
        movl    $LC0, (%esp)
        movl    $5, %eax
        movl    %eax, 4(%esp)
        call    _printf
        movl    $LC0, (%esp)
        movl    $6, %eax
        movl    %eax, 4(%esp)
        call    _printf
        movl    $LC0, (%esp)
        movl    $7, %ecx
        movl    %ecx, 4(%esp)
        call    _printf
        movl    $LC0, (%esp)
        movl    $8, %edx
        movl    %edx, 4(%esp)
        call    _printf
        movl    $LC0, (%esp)
        movl    $9, %eax
        movl    %eax, 4(%esp)
        call    _printf
        leave
        ret
        .def    ___main;        .scl    2;      .type   32;     .endef
        .p2align 4,,15
.globl _main
        .def    _main;  .scl    2;      .type   32;     .endef
_main:
        pushl   %ebp
        movl    $16, %eax
        movl    %esp, %ebp
        subl    $8, %esp
        andl    $-16, %esp
        call    __alloca
        call    ___main
        call    _fa
        call    _fb
        leave
        xorl    %eax, %eax
        ret
        .def    _printf;        .scl    3;      .type   32;     .endef

18 Jul, 2007, Omega wrote in the 18th comment:

Votes: 0

again tyche, thanks :) This is very helpful! So using register is sorta a waste of time todo then since the compiler will/can register what it wants anyways. Gotcha.

Note: I've never used gprof, so mind giving me a brief example of how it works?

19 Jul, 2007, Davion wrote in the 19th comment:

Votes: 0

Least we forget for_each! Though it's C++ it's another loop. We could also use goto :) *duck*.

I'm surprised this never came up with the for loops, and that's ++iter over iter++.

19 Jul, 2007, Conner wrote in the 20th comment:

Votes: 0

Well, if you're going to go there, how about gosub? :tongue:

Random Picks

Wing Commander: Gemini Sector

d20MUD: Star Wars

Bedlam