22 Aug, 2009, Silenus wrote in the 1st comment:
Votes: 0
I ran some timing loops on these and was somewhat surprised by the results: printf is fastest, followed by putchar, and then, distantly, cout. The cout versus printf gap didn't surprise me too much (under g++ 4.4.x), but why is putchar slower than printf even when printing single characters?
22 Aug, 2009, quixadhal wrote in the 2nd comment:
Votes: 0
From the man page:

fputc() writes the character c, cast to an unsigned char, to stream.
putc() is equivalent to fputc() except that it may be implemented as a macro which evaluates stream more than once.
putchar(c); is equivalent to putc(c,stdout).

So, my guess would be that the way C++ expands macros ends up referencing the file stream multiple times. Heck, being C++, it may even internally translate (copy) c into a string as "c\0" and then send THAT to fputs().
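
To make the man-page wording concrete, here is a minimal sketch that writes the same character once with fputc() and once with putc(). The helper name write_with_fputc_and_putc is hypothetical and only for illustration; putchar(c) is left out because it always targets stdout.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper: writes `c` to `stream` twice, once through fputc()
// and once through putc(). Per the man page the two calls are equivalent,
// and putchar(c) would be the same as putc(c, stdout).
// Returns the number of writes that succeeded (2 if both did).
int write_with_fputc_and_putc(std::FILE* stream, int c)
{
    assert(stream != nullptr);
    int ok = 0;
    if (std::fputc(c, stream) != EOF) ++ok;
    if (std::putc(c, stream) != EOF) ++ok;
    return ok;
}
```

Both calls should leave the same byte on the stream, so any timing difference between them comes from the implementation, not from what they write.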
22 Aug, 2009, Silenus wrote in the 3rd comment:
Votes: 0
For some reason fputc is no faster than putchar and still lags behind printf, so perhaps the implementation of fputc is not that well optimized. One would expect printf to need extra work: it takes varargs, and at the very least it has to check one extra condition per character, dereferencing the pointer and testing for the terminating null.
22 Aug, 2009, David Haley wrote in the 4th comment:
Votes: 0
Could you show the timing code used and the output please?

It's also possible that routines used the most often have been optimized extremely heavily, perhaps with platform-tuned assembly implementations for example.
22 Aug, 2009, Silenus wrote in the 5th comment:
Votes: 0
This is basically the code. I didn't write it from scratch but found it on a web forum and modified it a bit. Unfortunately I cannot remember which forum I grabbed it from. I assume the author won't mind, since it is pretty much just a series of timed for loops.

#include <cstdio>
#include <ctime>
#include <fstream>
#include <iostream>
using namespace std;

int foo1()
{
    clock_t t1,t2;
    t1 = clock();
    for(int i = 0; i < 1000000; i++)
        printf("a");
    t2 = clock();
    return t2 - t1;
}

int foo2()
{
    clock_t t1,t2;
    t1 = clock();
    for(int i = 0; i < 1000000; i++)
        cout << "a";
    t2 = clock();
    return t2 - t1;
}

int foo3()
{
    clock_t t1,t2;
    t1 = clock();
    for(int i = 0; i < 1000000; i++)
        fputc('a',stdout);
    t2 = clock();
    return t2 - t1;
}


int main()
{
    int d1,d2,d3;
    d1 = foo1();
    d2 = foo2();
    d3 = foo3();
    cout << endl << endl << "cout " << d2 << endl;
    cout << "printf " << d1 << endl;
    cout << "fputc " << d3 << endl;
    return 0;
}


The output generated looks something like this:

cout 140000
printf 30000
fputc 50000
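
These figures are raw clock() ticks, which measure CPU time rather than wall-clock time. A minimal sketch for reading them as seconds follows; the helper name ticks_to_seconds is hypothetical. Assuming CLOCKS_PER_SEC is 1000000, as on POSIX (XSI) systems, 30000 ticks would be 0.03 seconds.

```cpp
#include <ctime>

// Hypothetical helper: converts a tick count obtained from clock()
// differences into seconds of CPU time.
double ticks_to_seconds(std::clock_t ticks)
{
    return static_cast<double>(ticks) / CLOCKS_PER_SEC;
}
```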
23 Aug, 2009, David Haley wrote in the 6th comment:
Votes: 0
Looking at /usr/include/stdio.h, there is some indication that fputc uses locking. I'm not sure if printf does. There is a version, fputc_unlocked, that the C library headers seem to provide as an extension when __USE_MISC is defined. It appears to be the fastest of them all.
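
If per-call locking is the cost, one way to avoid it while staying thread-safe is to take the stream lock once around the whole loop. A minimal sketch, assuming a POSIX system (flockfile, funlockfile and putc_unlocked are POSIX, not standard C++); the helper name write_locked_once is hypothetical.

```cpp
#include <cassert>
#include <stdio.h>   // POSIX: flockfile(), funlockfile(), putc_unlocked()

// Hypothetical helper: writes `c` to `stream` `count` times while holding
// the stream lock for the whole loop, instead of acquiring and releasing
// it on every character.
// Returns the number of characters written successfully.
int write_locked_once(FILE* stream, int c, int count)
{
    assert(stream != nullptr);
    int written = 0;
    flockfile(stream);                       // acquire the stream lock once
    for (int i = 0; i < count; ++i) {
        if (putc_unlocked(c, stream) == EOF) // no locking inside the loop
            break;
        ++written;
    }
    funlockfile(stream);                     // release the lock
    return written;
}
```

Calling a bare fputc_unlocked outside such a lock is only safe when no other thread touches the same stream.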

Interestingly, I don't get the same relative results as you anyhow.

$ cat time-test.cpp
#define __USE_MISC

#include <cstdio>
#include <ctime>
#include <fstream>
#include <iostream>

using namespace std;

const int TIMES = 1000000000;

int test_printf()
{
    clock_t t1,t2;
    t1 = clock();
    for(int i = 0; i < TIMES; i++) printf("a");
    t2 = clock();
    return t2 - t1;
}

int test_cout()
{
    clock_t t1,t2;
    t1 = clock();
    for(int i = 0; i < TIMES; i++) cout << "a";
    t2 = clock();
    return t2 - t1;
}

int test_fputc()
{
    clock_t t1,t2;
    t1 = clock();
    for(int i = 0; i < TIMES; i++) fputc('a',stdout);
    t2 = clock();
    return t2 - t1;
}

int test_putc()
{
    clock_t t1,t2;
    t1 = clock();
    for(int i = 0; i < TIMES; i++) putc('a',stdout);
    t2 = clock();
    return t2 - t1;
}

int test_fputc_unlocked()
{
    clock_t t1,t2;
    t1 = clock();
    for(int i = 0; i < TIMES; i++) fputc_unlocked('a',stdout);
    t2 = clock();
    return t2 - t1;
}

int main()
{
    int d1,d2,d3,d4,d5;
    d1 = test_printf();
    d2 = test_cout();
    d3 = test_fputc();
    d4 = test_putc();
    d5 = test_fputc_unlocked();

    cerr << "printf " << d1 << endl;
    cerr << "cout " << d2 << endl;
    cerr << "fputc " << d3 << endl;
    cerr << "putc " << d4 << endl;
    cerr << "fputc_unlocked " << d5 << endl;

    return 0;
}

$ g++ time-test.cpp
$ ./a.out > /dev/null
printf 13570000
cout 69060000
fputc 12650000
putc 11910000
fputc_unlocked 5180000


In this instance, the unlocked version is a clear winner, and putc and fputc are only marginally superior to printf; cout is still a clear loser. I would imagine that that is because cout does even more processing, and does things like checking if alignment has been asked for, etc.
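
One possible contributor under g++ (an assumption, not something measured in this thread): by default the C++ standard streams are kept synchronized with C stdio so that printf and cout output can be mixed safely, and that synchronization has a cost. A minimal sketch of switching it off; the helper name disable_stdio_sync is hypothetical.

```cpp
#include <iostream>

// Hypothetical helper: turns off synchronization between the C++ standard
// streams and C stdio. It should be called before any input or output is
// performed. Returns the previous synchronization state.
bool disable_stdio_sync()
{
    return std::ios_base::sync_with_stdio(false);
}
```

Call it at the top of main(), before any output. Afterwards, printf and cout output sent to stdout may no longer appear in program order, so whether this is acceptable depends on the program.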
24 Aug, 2009, quixadhal wrote in the 7th comment:
Votes: 0
Interesting!

It might be worthwhile to try the same tests using stderr, which is unbuffered on most systems.

On my system, as David posted it (I lowered TIMES a notch, as I'm not that patient):

quixadhal@andropov:~$ ./bench >/dev/null
printf 5400000
cout 24360000
fputc 5540000
putc 5550000
fputc_unlocked 2280000


And then flipping stderr and stdout around:

quixadhal@andropov:~$ ./bench 2>/dev/null
printf 94400000
cout 138420000
fputc 95260000
putc 94450000
fputc_unlocked 91950000


WOW! That's a whole lot slower across the board! Notice that the unlocked fputc still wins, but not by nearly as great a margin.
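
The gap is consistent with buffering: an unbuffered stream hands each character to the operating system separately, while a fully buffered one collects them first. A minimal sketch for choosing the mode explicitly with setvbuf(); the helper name write_with_buffering is hypothetical.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper: selects a buffering mode for `stream`, then writes
// `c` to it `count` times. `mode` is _IONBF (unbuffered, as stderr
// typically is), _IOLBF (line buffered) or _IOFBF (fully buffered).
// setvbuf() must be called before any other operation on the stream.
// Returns the number of characters written, or -1 if setvbuf() failed.
int write_with_buffering(std::FILE* stream, int mode, int c, int count)
{
    assert(stream != nullptr);
    if (std::setvbuf(stream, nullptr, mode, BUFSIZ) != 0)
        return -1;
    int written = 0;
    for (int i = 0; i < count; ++i) {
        if (std::fputc(c, stream) == EOF)
            break;
        ++written;
    }
    return written;
}
```

Timing the same loop once with _IONBF and once with _IOFBF on the same kind of stream would separate the effect of buffering from the effect of which descriptor is redirected.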
24 Aug, 2009, quixadhal wrote in the 8th comment:
Votes: 0
And just for amusement, I decided to add a couple other comparisons:

int test_printf_c()
{
    clock_t t1,t2;
    t1 = clock();
    for(int i = 0; i < TIMES; i++) fprintf(stdout, "%c", 'a');
    t2 = clock();
    return t2 - t1;
}

int test_printf_s()
{
    clock_t t1,t2;
    t1 = clock();
    for(int i = 0; i < TIMES; i++) fprintf(stdout, "%s", "a");
    t2 = clock();
    return t2 - t1;
}

int test_putchar()
{
    clock_t t1,t2;
    t1 = clock();
    for(int i = 0; i < TIMES; i++) putchar('a');
    t2 = clock();
    return t2 - t1;
}


quixadhal@andropov:~$ ./bench >/dev/null
cout 24040000
fputc 5540000
putc 5540000
putchar 5410000
fputc_unlocked 2290000
printf 5490000
printf %s 5570000
printf %c 5540000
24 Aug, 2009, David Haley wrote in the 9th comment:
Votes: 0
Wow! Interesting how much the stdout buffering helps, even when output is rerouted into the great void of /dev/null. It's also interesting that our numbers are basically consistent whereas Silenus saw significant differences between fputc and printf. (That said, I'm not sure that the original number of loops, a million, is really quite enough to get rid of 'noise' in the results. That's why I used a billion. :wink:)
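
Besides raising the loop count, a common way to reduce noise is to repeat each measurement and keep the smallest value. A minimal sketch; the helper name best_of is hypothetical, and the body passed in stands for any of the loops above.

```cpp
#include <cassert>
#include <ctime>

// Hypothetical helper: runs `body` `iterations` times per trial, repeats
// the trial `trials` times, and returns the smallest elapsed CPU tick
// count seen across the trials.
template <typename F>
std::clock_t best_of(int trials, int iterations, F body)
{
    assert(trials > 0 && iterations >= 0);
    std::clock_t best = 0;
    for (int t = 0; t < trials; ++t) {
        const std::clock_t start = std::clock();
        for (int i = 0; i < iterations; ++i)
            body();
        const std::clock_t elapsed = std::clock() - start;
        if (t == 0 || elapsed < best)
            best = elapsed;
    }
    return best;
}
```

For example, best_of(5, TIMES, []{ putchar('a'); }) would time the putchar loop five times and report the quickest run.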