Little Ruby Tidbits
Earlier today I stumbled upon a behavior of String#split that was interesting and surprising: split takes either a string or a regexp as its first argument. If not provided, it defaults to /\s+/. (Technically it defaults to the value of $;, but you should immediately forget that fact.)
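For instance, with no argument at all:

```ruby
# With no separator, split breaks on runs of whitespace
# (and ignores leading whitespace):
"  foo bar\tbaz\n".split  #=> ["foo", "bar", "baz"]
```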
What was new to me, however, was that if you split on a regexp that includes matching groups, those groups are included in the resulting array:
    "foobarbaz".split(/bar/)   #=> ["foo", "baz"]
    "foobarbaz".split(/(bar)/) #=> ["foo", "bar", "baz"]
Handy!
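One place this comes in handy is quick-and-dirty tokenizing: splitting on the delimiter alone throws it away, while a capture group keeps it. A small sketch:

```ruby
# Splitting on the operators loses them:
"1+2-3".split(/[+-]/)    #=> ["1", "2", "3"]

# A capture group keeps the operators in the token stream:
"1+2-3".split(/([+-])/)  #=> ["1", "+", "2", "-", "3"]
```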
to_proc
Another quick note: Here's an implementation of Symbol#to_proc
that overcomes both the slowness of creation and slowness of application that to_proc
usually entails:
    class Symbol
      @@memoized_procs = {}

      def to_proc
        @@memoized_procs[self] ||= eval("lambda {|x| x.#{self} }")
      end
    end
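With that patch in place, usage is identical to the built-in version. A quick sketch (the implementation is repeated here so the snippet stands alone):

```ruby
class Symbol
  @@memoized_procs = {}

  def to_proc
    @@memoized_procs[self] ||= eval("lambda {|x| x.#{self} }")
  end
end

%w[foo bar].map(&:upcase)  #=> ["FOO", "BAR"]

# The memoization means you get back the very same Proc each time:
:upcase.to_proc.equal?(:upcase.to_proc)  #=> true
```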
The one minor caveat of this approach is that the procs are effectively singletons: every caller of, say, :to_s.to_proc gets the same Proc object. I don't see how that would ever be an issue, but it is different. For comparison, here are some stupid benchmarks demonstrating the performance characteristics of the different to_proc implementations, along with using a block literal:
loop 10,000,000 times: user system total real
block: 1.460000 0.000000 1.460000 ( 1.460893)
to_proc: 3.440000 0.060000 3.500000 ( 3.505272)
to_proc (send): 2.340000 0.000000 2.340000 ( 2.334712)
to_proc (eval): 1.430000 0.000000 1.430000 ( 1.432513)
to_proc (memo): 1.440000 0.000000 1.440000 ( 1.438504)
gen 10,000,000 times:
block: 21.720000 5.290000 27.010000 ( 27.023781)
to_proc: 23.860000 5.530000 29.390000 ( 29.382054)
to_proc (send): 25.070000 5.540000 30.610000 ( 30.614251)
to_proc (eval): 79.030000 7.600000 86.630000 ( 86.648413)
to_proc (memo): 3.250000 0.010000 3.260000 ( 3.253314)
One thing to note is that using eval to generate the proc is just as fast as using a block literal on 1.8.7. For the most part, using eval for metaprogramming on 1.8.x wherever possible will lead to faster code. Though as shown in the generation benchmark, it's pretty slow to run eval over and over again. Don't do that.
Also, rather embarrassingly, 1.8.7's built-in to_proc is slower than the implementation using send, which is surprising as it's effectively the same thing but in C.
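For reference, the labels in these tables map onto implementations roughly like the following. This is my reconstruction of the idea, not the actual benchmark code, so treat the bodies as assumptions:

```ruby
# A sketch of the strategies the benchmark labels refer to.

# "block": no proc conversion at all, just a block literal.
[1, 2, 3].map { |x| x.to_s }                  #=> ["1", "2", "3"]

# "to_proc (send)": the classic pure-Ruby shim, re-dispatching via #send.
send_to_proc = lambda { |sym| lambda { |x| x.send(sym) } }

# "to_proc (eval)": eval bakes the method call into the proc's body,
# so there is no send overhead when the proc is called.
eval_to_proc = lambda { |sym| eval("lambda {|x| x.#{sym} }") }

# "to_proc (memo)": same as eval, but pay the eval cost once per symbol.
memo = {}
memo_to_proc = lambda { |sym| memo[sym] ||= eval("lambda {|x| x.#{sym} }") }

[send_to_proc, eval_to_proc, memo_to_proc].each do |tp|
  [1, 2, 3].map(&tp.call(:to_s))              #=> ["1", "2", "3"]
end
```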
For comparison, here are the same benchmarks using Rubinius 1.1:
loop 10,000,000 times: user system total real
block: 0.404524 0.000161 0.404685 ( 0.400118)
to_proc: 2.789626 0.002370 2.791996 ( 2.780405)
to_proc (send): 2.737143 0.002231 2.739374 ( 2.739562)
to_proc (eval): 2.740973 0.002626 2.743599 ( 2.743766)
to_proc (memo): 2.736583 0.001882 2.738465 ( 2.738590)
gen 10,000,000 times:
block: 2.608039 0.006805 2.614844 ( 2.589822)
to_proc: 1.929814 0.001922 1.931736 ( 1.925800)
to_proc (send): 1.976845 0.001521 1.978366 ( 1.975148)
to_proc (eval): SLOW!
to_proc (memo): 2.255655 0.001574 2.257229 ( 2.067706)
gen 100,000 times:
to_proc (eval): 18.293764 0.248443 18.542207 ( 15.189934)
Rubinius is pretty much faster all around, except that eval is way slower in Rubinius than in MRI 1.8.7 (which was already fairly slow). The first time I tried, I let that benchmark run for 5 minutes before killing it. Turning down the number of iterations reveals that eval in this simple case is about two orders of magnitude slower. I'm not particularly surprised by this, as Rubinius does more work, compiling everything to bytecode; MRI just parses the string into an AST.
Interesting, to say the least. If you want to play along at home, here's the benchmark.