14 Mar, 2010, Deimos wrote in the 1st comment:

Votes: 0

[code]
# Basically, the parsing methods will be passed a string in typical ROM format:
#
# !RThis text should be red and !Bthis text should be blue.!X
#
# In ROM, I think the escape character is "{", but I'm using "!" here just
# because it looks better :)

require "benchmark"
require "jcode"
require "singleton"

class ColorBenchmark

  include Singleton

  COLOR_ESC = "!"

  COLORS =
  {
    "x"        => "\e[0m",     # Reset
    "X"        => "\e[0m",     # Reset
    "u"        => "\e[4m",     # Underline
    "U"        => "\e[4m",     # Underline
    "i"        => "\e[7m",     # Inverse
    "I"        => "\e[7m",     # Inverse
    "k"        => "\e[0;30m",  # Normal intensity black
    "r"        => "\e[0;31m",  # Normal intensity red
    "g"        => "\e[0;32m",  # Normal intensity green
    "y"        => "\e[0;33m",  # Normal intensity yellow
    "b"        => "\e[0;34m",  # Normal intensity blue
    "m"        => "\e[0;35m",  # Normal intensity magenta
    "c"        => "\e[0;36m",  # Normal intensity cyan
    "w"        => "\e[0;37m",  # Normal intensity white
    "K"        => "\e[1;30m",  # High intensity black
    "R"        => "\e[1;31m",  # High intensity red
    "G"        => "\e[1;32m",  # High intensity green
    "Y"        => "\e[1;33m",  # High intensity yellow
    "B"        => "\e[1;34m",  # High intensity blue
    "M"        => "\e[1;35m",  # High intensity magenta
    "C"        => "\e[1;36m",  # High intensity cyan
    "W"        => "\e[1;37m",  # High intensity white
    COLOR_ESC  => COLOR_ESC    # Escape character
  }

  TEST_ITER   = 10_000
  TEST_STR    = "#{COLOR_ESC}R@#{COLOR_ESC}r@#{COLOR_ESC}Y@#{COLOR_ESC}y@" +
                "#{COLOR_ESC}G@#{COLOR_ESC}g@#{COLOR_ESC}B@#{COLOR_ESC}b@" +
                "#{COLOR_ESC}C@#{COLOR_ESC}c@#{COLOR_ESC}M@#{COLOR_ESC}m@" +
                "#{COLOR_ESC}W@#{COLOR_ESC}w@#{COLOR_ESC}K@#{COLOR_ESC}k@" +
                "#{COLOR_ESC}X#{COLOR_ESC}Ii#{COLOR_ESC}X#{COLOR_ESC}Uu" +
                "#{COLOR_ESC}x#{COLOR_ESC}#{COLOR_ESC}"
  TEST_REF    = "\e[1;31m@\e[0;31m@\e[1;33m@\e[0;33m@\e[1;32m@\e[0;32m@" +
                "\e[1;34m@\e[0;34m@\e[1;36m@\e[0;36m@\e[1;35m@\e[0;35m@" +
                "\e[1;37m@\e[0;37m@\e[1;30m@\e[0;30m@\e[0m\e[7mi\e[0m\e[4mu\e[0m!"

  def run()
    puts "\nBenchmarking #{TEST_ITER} iterations:\n\n"
    puts "Reference  #{TEST_REF}"
    parsers = []

    # Find all the parse methods
    methods.sort.each do |p|
      if /^parse\d+$/ =~ p
        m      = method( p )
        out    = m.call( TEST_STR )
        valid  = ( out == TEST_REF ? "" : " -- Invalid!" )
        puts sprintf( "%-10s %s%s", p, out, valid )
        parsers  << p if valid == ""
      end
    end

    puts "\n"

    # Start the benchmarking
    Benchmark.bm( 10 ) do |bm|
      parsers.each do |p|
        m = method( p )
        bm.report( p ) { TEST_ITER.times { m.call( TEST_STR ) } }
      end
    end
  end

  # Iterate each character of the String keeping state because I don't know how
  # to advance the pointer :(
  def parse1( str )
    found_esc  = false
    out        = ""
    str.each_char do |c|
      if found_esc
        out << COLORS[c]
        found_esc = false
        next
      end
      if c == COLOR_ESC
        found_esc = true
        next
      else
        out << c
      end
    end
    return out
  end

  # One-pass gsub.  It's sort of cheating, though, since any added codes in the
  # future will have to be added to the Regex, as well.  Shrug.
  def parse2( str )
    out = str.dup
    out.gsub!( /#{COLOR_ESC}[xXuUiIkKrRgGyYbBmMcCwW#{COLOR_ESC}]/m ) do |code|
      COLORS[code[1,1]]
    end
    return out
  end

  # Iterates the hash and gsubs each pair individually.
  def parse3( str )
    out = str.dup
    COLORS.each do |key,val|
      out.gsub!( /#{COLOR_ESC}#{key}/, val )
    end
    return out
  end

  # Same as parse3, only giving gsub a String param instead of a Regex.
  def parse4( str )
    out = str.dup
    COLORS.each do |key,val|
      out.gsub!( "#{COLOR_ESC}#{key}", val )
    end
    return out
  end

  # Your superior algorithm here!?
  def parse5( str )
  end

end

ColorBenchmark.instance.run()
puts "\n"
[/code]
Yeah, so… I got bored.  We had been talking about performance issues in an earlier thread, and someone brought up color code replacement as one potential bottleneck.  So I made this to test different replacement algorithms in Ruby, mostly to help me learn some of the language constructs.  If you want to show me up, write your own algorithm and plop it in a "parseX" method in the above class, where X is some integer.  The benchmark should automatically pick it up and run it against the other algorithms so you can compare.  Note that I wrote this for 1.8, so if you have 1.9 and it breaks, sorry!

Here are the results from my own tests, in case anyone cares:

[code]
                user     system      total        real
parse1      2.280000   0.890000   3.170000 (  3.208685)
parse2      0.800000   0.140000   0.940000 (  0.952998)
parse3      1.950000   0.200000   2.150000 (  2.145340)
parse4      2.010000   0.180000   2.190000 (  2.182371)
[/code]
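
For anyone wondering what a "parseX" entry that actually advances the pointer might look like, here is a rough sketch using StringScanner from the standard library. It is not one of the benchmarked methods: the name parse_scanner, the abbreviated COLORS table, and the choice to silently drop unknown codes are all assumptions made for illustration. Inside ColorBenchmark it would use the class constants and be named parse5 or similar.

```ruby
require "strscan"

COLOR_ESC = "!"

# Abbreviated table for illustration; the full one lives in ColorBenchmark.
COLORS = {
  "R"       => "\e[1;31m",  # High intensity red
  "B"       => "\e[1;34m",  # High intensity blue
  "X"       => "\e[0m",     # Reset
  COLOR_ESC => COLOR_ESC    # Escape character
}

# Matches a run of characters containing no escape character.
PLAIN_RUN = /[^#{Regexp.escape(COLOR_ESC)}]+/

# Hypothetical parser: copy plain runs verbatim, translate each escape pair.
def parse_scanner(str)
  scanner = StringScanner.new(str)
  out     = "".dup
  until scanner.eos?
    if (text = scanner.scan(PLAIN_RUN))
      out << text
    else
      scanner.getch                        # consume the escape character
      code = scanner.getch                 # nil if the string ends on a lone escape
      out << COLORS.fetch(code, "") if code  # assumption: unknown codes are dropped
    end
  end
  out
end
```

No timings are claimed for this one; it would have to be dropped into the benchmark and measured like the others.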

14 Mar, 2010, Tyche wrote in the 2nd comment:

Votes: 0

I commented out jcode.rb and got these marks.

[code]
                user     system      total        real
parse1      1.110000   0.000000   1.110000 (  1.126000)
parse2      1.156000   0.000000   1.156000 (  1.180000)
parse3      2.906000   0.016000   2.922000 (  2.968000)
parse4      2.906000   0.016000   2.922000 (  2.969000)
[/code]

14 Mar, 2010, Runter wrote in the 3rd comment:

Votes: 0

[code]
                user     system      total        real
parse1      0.270000   0.000000   0.270000 (  0.278178)
parse2      0.410000   0.000000   0.410000 (  0.405291)
parse3      1.860000   0.010000   1.870000 (  1.877433)
parse4      1.680000   0.000000   1.680000 (  1.679922)
[/code]

Had to comment out jcode to get it to run under 1.9.1.

14 Mar, 2010, Runter wrote in the 4th comment:

Votes: 0

And in test 2 you could dynamically compile the regular expression. So it's not really cheating. It would take a little more overhead but it would only need to be compiled once.

[code]
expression = Regexp.union *COLORS.keys

def parse2( str )
  out = str.dup
  out.gsub!(expression) do |code|
    COLORS[code[1,1]]
  end
  return out
end
[/code]

Well, not exactly like that because your table doesn't include the "!" but you get the idea. :p
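
To spell out the idea, one way the pattern could be built so that it does account for the escape character is sketched below. The names COLOR_REGEX and parse_union and the abbreviated COLORS table are made up for this example; the pattern is stored in a constant because a top-level local variable would not be visible inside a def.

```ruby
COLOR_ESC = "!"

# Abbreviated table for illustration; the full one lives in ColorBenchmark.
COLORS = {
  "R"       => "\e[1;31m",  # High intensity red
  "B"       => "\e[1;34m",  # High intensity blue
  "X"       => "\e[0m",     # Reset
  COLOR_ESC => COLOR_ESC    # Escape character
}

# Compiled once at load time: the escape character followed by any table key.
# Regexp.union escapes each key, so keys that are regex metacharacters are safe.
COLOR_REGEX = /#{Regexp.escape(COLOR_ESC)}(#{Regexp.union(*COLORS.keys).source})/

# Hypothetical variant of parse2 driven entirely by the table.
def parse_union(str)
  str.gsub(COLOR_REGEX) { COLORS[$1] }
end
```

Adding a new code to COLORS is then enough; the regex follows the table. Sequences that are not in the table are left untouched.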

14 Mar, 2010, Deimos wrote in the 5th comment:

Votes: 0

Wow, I wasn't expecting such different results from my own. I assume that this is because I needed jcode for String#each_char, whereas I guess it's native in 1.9 or something?

Runter said:

And in test 2 you could dynamically compile the regular expression. So it's not really cheating. It would take a little more overhead but it would only need to be compiled once.

expression = Regexp.union *COLORS.keys

Thanks for the tip, I didn't know you could do that! That will definitely come in handy. Incidentally, it was actually slightly slower than the static version (I made sure to compile it outside of the benchmark), and I have a feeling it's because the character classes evaluate slightly faster than alternation.

[code]
                user     system      total        real
parse1      2.310000   0.850000   3.160000 (  3.161557)
parse2      0.790000   0.160000   0.950000 (  0.947645)  <----- old version
parse3      2.010000   0.190000   2.200000 (  2.202817)
parse4      2.060000   0.200000   2.260000 (  2.255647)
parse5      0.830000   0.160000   0.990000 (  0.992019)  <----- new version
[/code]
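
If the character class really is the faster form, it can still be generated from the table instead of typed by hand. A minimal sketch, assuming every key is a single character (true for the table in this thread, but it would break for multi-character codes); COLOR_CLASS and parse_class are names invented here:

```ruby
COLOR_ESC = "!"

# Abbreviated table for illustration; the full one lives in ColorBenchmark.
COLORS = {
  "R"       => "\e[1;31m",  # High intensity red
  "B"       => "\e[1;34m",  # High intensity blue
  "X"       => "\e[0m",     # Reset
  COLOR_ESC => COLOR_ESC    # Escape character
}

# Escape character followed by one character from a class built out of the keys.
# Regexp.escape protects characters such as "-", "^" and "]" inside the class.
COLOR_CLASS = /#{Regexp.escape(COLOR_ESC)}([#{Regexp.escape(COLORS.keys.join)}])/

# Hypothetical variant: same gsub as parse2, but the class follows the table.
def parse_class(str)
  str.gsub(COLOR_CLASS) { COLORS[$1] }
end
```

Whether this matches the hand-written regex in speed is something the benchmark would have to confirm.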

14 Mar, 2010, Runter wrote in the 6th comment:

Votes: 0

Deimos said:

I assume that this is because I needed jcode for String#each_char, whereas I guess it's native in 1.9 or something?

Not sure. It seems to be Japanese character support or something. I'm not sure why you would need it for each_char. It certainly isn't required in 1.9. (In fact, the file doesn't seem to exist there.)

14 Mar, 2010, Deimos wrote in the 7th comment:

Votes: 0

Well, I get an undefined method error if I don't include it (see here). I'm guessing it has to do with Unicode, since Japanese encoded strings couldn't be iterated using String#each_byte because they include multi-byte characters, and so they probably had to come up with that method (which probably later got merged into the core).

I tried a couple of other similar methods, including "str.scan( /./ ).each" and "str.each_byte do |byte| c = byte.chr", which work better than the jcode version of each_char but are still slower than the others.
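
For reference, the each_byte variant described above might look roughly like this. It is a reconstruction from the one-line description, not the code that was actually timed, and it assumes ASCII-only input since multi-byte characters would be split into separate bytes. The name parse_each_byte and the abbreviated table are illustrative.

```ruby
COLOR_ESC = "!"

# Abbreviated table for illustration; the full one lives in ColorBenchmark.
COLORS = {
  "R"       => "\e[1;31m",  # High intensity red
  "B"       => "\e[1;34m",  # High intensity blue
  "X"       => "\e[0m",     # Reset
  COLOR_ESC => COLOR_ESC    # Escape character
}

# Same state machine as parse1, but walking bytes instead of characters.
def parse_each_byte(str)
  found_esc = false
  out       = "".dup
  str.each_byte do |byte|
    c = byte.chr                      # single-byte String for this byte
    if found_esc
      out << COLORS.fetch(c, "")      # assumption: unknown codes are dropped
      found_esc = false
    elsif c == COLOR_ESC
      found_esc = true
    else
      out << c
    end
  end
  out
end
```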

14 Mar, 2010, Runter wrote in the 8th comment:

Votes: 0

Yeah, strings are handled differently in 1.9.

14 Mar, 2010, Tyche wrote in the 9th comment:

Votes: 0

Deimos said:

Well, I get an undefined method error if I don't include it.

String#each_char is in the ruby 1.8.7 branch.
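
Given that, one way to keep the script loading on older 1.8 releases as well as 1.8.7 and 1.9 would be to require jcode only when the method is missing. A small sketch of that guard (the sample string is arbitrary):

```ruby
# Load jcode only on interpreters whose String lacks each_char
# (older 1.8 releases); 1.8.7 and later already define it.
require "jcode" unless "".respond_to?(:each_char)

chars = []
"!Rx".each_char { |c| chars << c }
```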
