Posted 28 days ago at Unlimited Novelty

Once upon a time Rails was single-threaded and could only process one request at a time. This meant that for each concurrent request you wanted to process with Rails, you needed to run an entirely separate Ruby VM instance. This was not a good state of affairs, especially in cases where your Rails application was blocking on I/O when talking to a database or other external service. An entire instance of your application sat there, useless, while it waited for I/O to complete.

The lack of multithreading in Rails would lead Ezra Zygmuntowicz to write Merb, a thread-safe web framework for Ruby which certainly borrowed conceptually from Rails and would go on to serve as the core of the upcoming Rails 3.0 release. In the meantime, the earlier Rails 2.x branch would get support for a thread-safe mode as well. This meant that web applications written in Ruby could process multiple requests using a single VM instance: while one thread was blocking on a response from a database or other service, the web application could continue processing other requests in other threads.

Even better, while Ruby originally started out with a "green threads" implementation which executed threads in userspace and could not provide multicore concurrency, newer Ruby implementations emerged which provided true native multithreading. JRuby and IronRuby, implementations of Ruby on the JVM and .NET CLR respectively, provided truly concurrent native multithreading while still maintaining Ruby's original threading API. Rubinius, a clean-room implementation of a Ruby VM based on the Smalltalk-80 architecture, has started to take steps to remove its global lock and allow concurrent multithreading as well.

With a multithreaded web framework like Merb, recent versions of Rails 2.x, or Rails 3.0, in conjunction with a Ruby VM that supports concurrent multithreading, you now need to run only one VM instance with a copy of your web application, and it can utilize all available CPU cores in a server, providing truly concurrent computation of Ruby code. No longer do you need a "pack of Mongrels" to serve your Rails application. Instead, you can just run a single VM and it will utilize all available system resources. This has enormous benefits in terms of ease of deployment, monitoring, and memory usage.

Ruby on Rails has finally grown up and works just like web applications in other more popular languages. You can run just one copy of any Ruby VM that supports native multithreading and utilize all available server resources. Rails deployment is no longer a hack. It Just Works.

But Wait, Threads Are Bad, And Async Is The New Hotness!





Threads have typically had a rather poor reputation in the programming world.  Threads utilize shared state by default and don't exactly provide the greatest mechanisms for synchronizing bits of shared state.  They're a leaky abstraction, and without eternal vigilance on the part of an entire development team and an excellent understanding of what's happening when you use thread synchronization mechanisms, sharing state between threads is error-prone and often difficult to debug.

The "threads are bad" cargo cult has often led people to pursue "interesting" solutions to various concurrency problems in order to avoid using threads.  Event-based concurrent I/O became an incredibly popular solution for writing network servers, an approach seen in libraries like libevent, libev, Python's Twisted, and in the Ruby world EventMachine and my own event library, Rev.  This scheme uses a callback-driven approach, often with a central reactor core, dispatching incoming I/O asynchronously to various handlers.  For strictly I/O-bound applications, things like static file web servers, proxies, and protocol transformers, this approach is pretty much the best game in town.

Node.js, a pretty awesome I/O layer for Google's V8 JavaScript engine, is something of the new hotness.  It's certainly opened up the evented I/O approach to a new audience, and for I/O-bound tasks it provides a way to script in a dynamic language while remaining quite fast.  But as others have noted, Node is a bit overhyped. If you write your server in Node, will it scale? It really depends on the exact nature of the problem.  I'll get into that in a bit.

Ilya Grigorik recently presented at RailsConf and OSCON about em-synchrony, a set of "drivers" for EventMachine which present a synchronous interface to various types of network I/O but use Fibers to perform that I/O asynchronously in the background. He had some rather impressive things to share there, including Rails running on top of EventMachine, dispatching requests concurrently using fibers instead of threads.  This approach won't provide you the computational concurrency of truly multithreaded Rails on JRuby and IronRuby (and Rubinius soon!), but it will provide you wicked fast I/O performance... at a price.

The New Contract



Programmers generally live in a synchronous world. We call functions which return values. That's the status quo. Some languages go so far as to make this the only possible option. Evented frameworks do not work like this. Evented frameworks turn the world upside down.  For example, in Ruby, where you might ordinarily write something like:

response = connection.request params

In async land, you first have to initiate the request:

begin_request params

Then define a callback in order to receive the response:

def on_response(response)
  ...
end

Rather than calling functions, you initiate side effects which will eventually call one of a number of callbacks.  Exceptions no longer work: an error raised inside a callback can't be rescued by the code that initiated the request. The context is lost between callbacks; you always start from just your arguments and have to figure out exactly what you were up to before, which generally necessitates breaking anything complex down into a finite state machine instead of, say, an imperative list of I/O commands to perform. It's a very different approach from the status quo.
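To make the inversion concrete, here is a toy sketch in Ruby. `ToyReactor` and `begin_request` are hypothetical names made up for illustration (this is not EventMachine's or Rev's API): the call returns immediately, the response only shows up later in a callback run by the loop, and an exception raised in that callback surfaces inside the loop rather than where the request was made.

```ruby
# A toy single-threaded "reactor", for illustration only; ToyReactor and
# begin_request are hypothetical, not EventMachine's or Rev's API.
class ToyReactor
  def initialize
    @pending = [] # callbacks waiting for the loop to run them
  end

  # Returns immediately; the "response" is delivered later via the block.
  def begin_request(params, &on_response)
    @pending << -> { on_response.call("response to #{params}") }
    nil
  end

  # The event loop: run queued callbacks one at a time, in order.
  def run
    @pending.shift.call until @pending.empty?
  end
end

reactor = ToyReactor.new
log = []

result = reactor.begin_request("GET /users") { |response| log << response }
log << "begin_request returned #{result.inspect}"
reactor.run # only now does the callback fire

# A callback that raises can't be rescued around begin_request, because
# the exception happens later, inside the loop.
rescued_at_call_site = false
begin
  reactor.begin_request("GET /boom") { |_response| raise "boom" }
rescue RuntimeError
  rescued_at_call_site = true # never runs
end

error_seen_by_loop =
  begin
    reactor.run
    nil
  rescue RuntimeError => e
    e.message
  end
```

After this runs, `log` records the "returned nil" entry before the response, `rescued_at_call_site` is still false, and the "boom" error is only seen by whoever called `reactor.run`.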

The em-synchrony approach promises to save you from this by wrapping up all that ugly callback-driven stuff with Fibers. I've been down that road and I no longer recommend it.  In January 2008 I wrote Revactor, an Erlang-like implementation of the Actor Model for Ruby 1.9, using Fibers as the underlying concurrency primitive. It's the first case known to me of someone using this approach, and significantly more powerful than any of the other available frameworks. Where em-synchrony makes you write Fiber-specific code for each network driver, Revactor exposed an incomplete duck type of Ruby's own TCPSocket, which means that patching drivers becomes significantly easier as you don't need asynchronous drivers to begin with.
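Here is roughly what the Fiber trick looks like, as a minimal sketch assuming a hypothetical callback-style `CallbackClient` (this is not em-synchrony's actual code). `sync_request` starts the async request, parks the current fiber with `Fiber.yield`, and the callback resumes it with the response, so the handler reads top to bottom like ordinary synchronous code.

```ruby
require 'fiber' # Fiber.current and Fiber#alive? need this on older Rubies

# Hypothetical callback-style client standing in for an evented driver.
class CallbackClient
  def initialize
    @pending = []
  end

  def begin_request(params, &callback)
    @pending << -> { callback.call("rows for #{params}") }
  end

  # Plays the role of the event loop delivering responses.
  def run_pending
    @pending.shift.call until @pending.empty?
  end
end

# The thunk: start the async request, park the current fiber, and let the
# callback resume it with the response, which Fiber.yield then returns.
def sync_request(client, params)
  fiber = Fiber.current
  client.begin_request(params) { |response| fiber.resume(response) }
  Fiber.yield
end

client = CallbackClient.new
results = []

handler = Fiber.new do
  # Reads like ordinary synchronous code.
  users = sync_request(client, "SELECT * FROM users")
  posts = sync_request(client, "SELECT * FROM posts")
  results << users << posts
end

handler.resume     # runs until the first Fiber.yield
client.run_pending # each callback resumes the handler fiber
```

Note that every driver needs its own thunk like `sync_request`, which is exactly the per-driver Fiber-specific code mentioned above.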

However, for the most part I stopped maintaining Revactor, largely because I began to think the entire approach is flawed. The problem is frameworks like Revactor and em-synchrony impose a new contract on you: evented I/O only! You aren't allowed to use anything that does any kind of blocking I/O in your system anywhere, or you will hang the entire event loop. This approach works great for something like Node.js, where the entire system was written from the ground-up to be asynchronous, in a language which has a heritage of being asynchronous to begin with.
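A sketch of why one blocking call hurts so much, assuming a hypothetical `TinyLoop` that runs callbacks one after another on a single thread. `sleep` stands in for a synchronous library waiting on its socket; the second handler has almost nothing to do, yet it can't start until the first one returns.

```ruby
# A tiny single-threaded loop (hypothetical, for illustration): it runs
# ready callbacks one after another, so each must return quickly.
class TinyLoop
  def initialize
    @ready = []
  end

  def schedule(&task)
    @ready << task
  end

  def run
    @ready.shift.call until @ready.empty?
  end
end

def monotonic_now
  Process.clock_gettime(Process::CLOCK_MONOTONIC)
end

event_loop = TinyLoop.new
started = monotonic_now
quick_handler_delay = nil

# A handler that calls an ordinary blocking library; sleep stands in for
# a synchronous database driver waiting on its socket.
event_loop.schedule { sleep 0.2 }

# A handler with almost nothing to do still waits behind the blocked one.
event_loop.schedule { quick_handler_delay = monotonic_now - started }

event_loop.run
```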

Not so in Ruby. There are tons and tons of libraries that do synchronous I/O. If you choose to use async Rails, you can't use any library which hasn't specifically been patched with em-synchrony-like async-to-Fiber thunks. Since most libraries haven't been patched with this code, you're cutting yourself off from the overwhelming majority of I/O libraries available. This problem is compounded by the fact that the only type of application which will benefit from the async approach more than the multithreaded approach is one that does a lot of I/O, which is exactly the kind of application that needs those I/O libraries most.

This is a problem: you have to be eternally vigilant about which libraries you use and make absolutely sure nothing ever blocks. Hmm, is this beginning to sound like it may actually be as problematic as threads? And one more thing: exceptions. Dealing with exceptions in an asynchronous environment is very difficult, since control is inverted and exceptions don't work in callback mode. Instead, for exceptions to work properly, all of the "Fibered" em-synchrony-like drivers must catch, pass along, and rethrow exceptions. This is left as an exercise to the driver writer.
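The exception shuffling each driver has to do looks something like this minimal sketch. `CallbackDriver` and `query` are hypothetical: the driver reports failures as an error argument to the callback, and the Fiber-side thunk re-raises it so the caller's ordinary `begin`/`rescue` works again.

```ruby
require 'fiber' # Fiber.current needs this on older Rubies

# Hypothetical callback-style driver: failures are passed to the callback
# as an error argument, since nobody is on the stack to rescue a raise.
class CallbackDriver
  def initialize
    @pending = []
  end

  def begin_query(sql, &callback)
    @pending << lambda do
      if sql.start_with?("DROP")
        callback.call(nil, ArgumentError.new("refusing to run #{sql}"))
      else
        callback.call("ok: #{sql}", nil)
      end
    end
  end

  # Plays the role of the event loop.
  def run_pending
    @pending.shift.call until @pending.empty?
  end
end

# The thunk: carry [result, error] across the fiber boundary, then
# re-raise in the caller's fiber so normal rescue clauses apply.
def query(driver, sql)
  fiber = Fiber.current
  driver.begin_query(sql) { |result, error| fiber.resume([result, error]) }
  result, error = Fiber.yield
  raise error if error

  result
end

driver = CallbackDriver.new
outcomes = []

handler = Fiber.new do
  outcomes << query(driver, "SELECT 1")
  begin
    query(driver, "DROP TABLE users")
  rescue ArgumentError => e
    outcomes << "rescued: #{e.message}"
  end
end

handler.resume
driver.run_pending
```

If a driver author forgets the `raise error if error` step, the failure silently turns into a `nil` result, which is the kind of bug this contract invites.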

Threads are Good


Threads are bad when they have to share data.  But when you have a web server handling multiple requests concurrently with threads, they really don't need to share any data at all.  When threads don't share any data, multithreading is completely transparent to the end user. There are a few gotchas in multithreaded Rails, such as some foibles with the initial code loading, but after you get multithreaded Rails going, you won't even notice the difference from using a single thread.  So in what cases would Async Rails be better than multithreaded Rails?  I/O-bound cases. For many people the idea of an I/O-bound application conjures up the canonical Rails use case: a database-bound app.
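A minimal sketch of the shared-nothing case, assuming a hypothetical `handle_request` method standing in for a controller action: each thread works only on its own local variables, and the only shared object is a `Queue`, which is already thread-safe, so no locks are needed.

```ruby
# Hypothetical stand-in for a controller action: everything it touches is
# a local variable created for this request, so threads share nothing.
def handle_request(id)
  body = "Hello, request #{id}!"
  [id, body]
end

responses = Queue.new # Queue is thread-safe; it is the only shared object

threads = (1..4).map do |id|
  Thread.new(id) { |request_id| responses << handle_request(request_id) }
end
threads.each(&:join)

# Threads finish in any order, so sort by request id before inspecting.
handled = Array.new(responses.size) { responses.pop }.sort_by(&:first)
```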

"Most Rails apps are database bound!" says the Rails cargo cult, but in my experience, useful webapps do things.  That said, Async Rails will have its main benefits over multithreaded apps in scenarios where the application is primarily I/O bound, and a webapp which is little more than a proxy between a user and the database (your typical CRUD app) seems like an ideal use case.

What does the typical breakdown of time spent in various parts of your Rails app look like?  The conventional wisdom would say this:

But even this is deceiving, because models generally do things in addition to querying your database. So really, we need a breakdown of database access time.  Evented applications benefit from being bound on I/O with little computation, so for an Async Rails app this is the ideal use case:

Here our application does negligible computation in the models, views, and controllers, and instead spends all its time making database queries. This time can involve writing out requests, waiting while the database does its business, and consuming the response.

This picture is still a bit vague.  What exactly is going on during all that time spent doing database stuff?  Let's examine my own personal picture of a typical "read" case:



For non-trivial read cases, your app is probably spending a little bit of time doing I/O to make the REQuest, most of its time waiting for the database QueRY to do its magic, and then spending some time reading out the response.

But a key point here: your app is spending quite a bit of time doing nothing but waiting between the request and the response.  Async Rails doesn't benefit you here. It removes some of the overhead of using threads to manage an idle connection, but nowadays most kernels are pretty good at managing lots of sleeping threads waiting to be woken up.
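To illustrate that idle waiting is cheap for threads too, here is a small sketch where `simulated_query` (hypothetical) uses `sleep` to stand in for a thread blocked on a database socket. The ten 0.1-second waits overlap, so the batch finishes in roughly a tenth of a second rather than a full second.

```ruby
# sleep stands in for a thread blocked on a socket: it burns no CPU and,
# even on MRI, releases the interpreter lock so other threads can run.
def simulated_query(id)
  sleep 0.1 # hypothetical time the database spends on the query
  "result #{id}"
end

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
threads = (1..10).map { |id| Thread.new { simulated_query(id) } }
results = threads.map(&:value) # Thread#value joins and returns the block's result
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
```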

So even in this case, things aren't going to be much better than with multithreaded apps, because your Rails app isn't actually spending a lot of time doing I/O; it's spending most of its time waiting for the database to respond. However, let's examine a more typical use case of Rails:

Here our app is actually doing stuff! It's actually spending a significant amount of time computing, with some time spent doing I/O and a decent chunk spent just blocking until an external service responds. For this case, the multithreaded model benefits you best: all your existing Ruby tools will Just Work (provided they don't share state unsafely), and best of all, when running multithreaded on JRuby or IronRuby (or Rubinius soon!) you can run a single VM image, reduce RAM usage by sharing code between threads, and leverage the entire hardware stack in the way the CPU manufacturers intended.

Why You Should Use JRuby

JRuby provides native multithreading and is one of the most compatible alternative Ruby implementations out there. It lets you leverage the power of the JVM, which includes a great ecosystem of tools like VisualVM, a mature underlying implementation, some of the best performance available in the Ruby world, a diverse selection of garbage collectors, a significantly more mature ecosystem of available libraries (provided you want to wrap them via JRuby's pretty nifty Java integration), and the potential to deploy your application without any native dependencies whatsoever. JRuby can also precompile all of your Ruby code into JVM bytecode, allowing you to ship enterprise versions to customers you're worried might steal your source code.  Best of all, when using JRuby you also get to use the incredibly badass database drivers available for JDBC, and get things like master/slave splits and failover handled completely transparently by JDBC. Truly concurrent request handling and awesome database drivers: on JRuby, it Just Works.

Why not use IronRuby? IronRuby also gives you native multithreading, but while JRuby has three full-time developers working on it, IronRuby only has one. I don't want to say that IronRuby is dying, but in my opinion JRuby is a much better bet. Also, the JVM probably does a better job supporting the platforms of interest for running Rails applications, namely Linux.

Is Async Rails Useful? Kinda.

All that said, are there use cases Async Rails is good for? Sure! If your app is truly I/O bound, doing things like request proxying or a relatively minor amount of computation as compared to I/O (regex scraping comes to mind), Async Rails is awesome. So long as you don't "starve" the event loop doing too much computation, it could work out for you.

I'd really be curious about what kinds of Rails apps people are writing that are extremely I/O heavy though.  To me, I/O bound use cases are the sorts of things people look at using Node for. In those cases, I would definitely recommend you check out Rainbows instead of Async Rails or Node.  More on that later...

Why I Don't Like EventMachine, And Why You Should Use Rev (and Revactor) Instead

em-synchrony is built on EventMachine. EventMachine is a project I've been using and have contributed to since 2006. I really can't say I'm a fan. Rather than using Ruby's native I/O primitives, EventMachine reinvents everything. The reason is that its original author, Francis "Garbagecat" Cianfrocca, had his own libev(ent)-like library, called "EventMachine", which was written in C++. It did all of its own I/O internally, and rather than trying to map that onto Ruby I/O primitives, Francis just slapped a very poorly written Ruby API onto it, forgoing any compatibility with how Ruby does I/O. There's been a lot of work and refactoring since, but even so, it's not exactly the greatest codebase to work with.

While this may have been remedied since last I used EventMachine, a key part of the evented I/O contract is missing: a "write completion" callback indicating that EventMachine has emptied the write buffer for a particular connection. This has led to many bugs; for example, when proxying from a fast writer to a slow reader, the entire message to be proxied ends up buffered in memory. There are all sorts of special workarounds for common use cases, but that doesn't excuse this feature being missing from EventMachine's I/O model.
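Why the write-completion event matters can be shown with a small simulation. `SlowConnection`, `on_write_complete` and the one-chunk-per-tick drain rate are all assumptions made for illustration, not EventMachine's or Rev's API: without the event the proxy forwards everything immediately and buffers the whole message, while with it the proxy sends the next chunk only after the previous one has drained.

```ruby
# Simulation only: a connection to a slow reader. Writes land in a buffer;
# the reader drains one chunk per tick, and when the buffer empties the
# connection fires its (hypothetical) write-completion callback.
class SlowConnection
  attr_reader :peak_buffered
  attr_writer :on_write_complete

  def initialize
    @buffer = []
    @peak_buffered = 0
  end

  def write(chunk)
    @buffer << chunk
    @peak_buffered = [@peak_buffered, @buffer.size].max
  end

  def tick
    return if @buffer.empty?

    @buffer.shift # the slow reader consumed one chunk
    @on_write_complete&.call if @buffer.empty?
  end

  def idle?
    @buffer.empty?
  end
end

CHUNKS = Array.new(100) { |i| "chunk #{i}" } # what the fast writer sends

# No completion event: forward everything as soon as it arrives.
def proxy_without_completion_event
  sink = SlowConnection.new
  CHUNKS.each { |chunk| sink.write(chunk) }
  sink.tick until sink.idle?
  sink.peak_buffered
end

# With the event: forward the next chunk only after the last one drained.
def proxy_with_completion_event
  remaining = CHUNKS.dup
  sink = SlowConnection.new
  sink.on_write_complete = -> { sink.write(remaining.shift) unless remaining.empty? }
  sink.write(remaining.shift)
  sink.tick until sink.idle?
  sink.peak_buffered
end

peak_without = proxy_without_completion_event # all 100 chunks buffered at once
peak_with = proxy_with_completion_event       # never more than one chunk
```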

It's for these reasons that I wrote Rev, a Node-like evented I/O binding built on libev. Rev uses all of Ruby's own native I/O primitives, including Ruby's OpenSSL library. Rev sought to minimize the amount of native code in the implementation, with as much written in Ruby as possible. For this reason Rev is slower than EventMachine; however, the only limiting factor is developer motivation to benchmark and rewrite the most important parts of Rev in C instead of Ruby. Rev was written from the ground up to perform well on Ruby 1.9, then subsequently backported to Ruby 1.8.

Rev implements a complete I/O contract including a write completion event which is used by Revactor's Revactor::TCP::Socket class to expose an incomplete duck type of Ruby's TCPSocket.  This should make monkeypatching existing libraries to use Revactor-style concurrency much easier.  Rather than doing all the em-synchrony-style Fiber thunking and exception shuffling yourself, it's solved once by Revactor::TCP::Socket, and you just pretend you're doing normal synchronous I/O.
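Duck typing is what makes this work: library code that only calls socket methods doesn't care what class it's handed. A sketch, where `fetch_status_line` is a hypothetical library method and `CannedSocket` a hypothetical stand-in that simply replays a canned response. The real Revactor::TCP::Socket, per the description above, would instead hide the Fiber switching behind the same ordinary-looking calls.

```ruby
require 'stringio'

# Hypothetical library code written against Ruby's socket interface: it
# only calls #write and #gets, so any object with those methods will do.
def fetch_status_line(socket, path)
  socket.write("GET #{path} HTTP/1.0\r\n\r\n")
  socket.gets&.chomp
end

# Hypothetical stand-in with the same method names as a TCPSocket. It
# records what was written and replays a canned response from a StringIO.
class CannedSocket
  attr_reader :written

  def initialize(response)
    @response = StringIO.new(response)
    @written = +""
  end

  def write(data)
    @written << data
    data.bytesize # like IO#write, return the number of bytes written
  end

  def gets
    @response.gets
  end
end

socket = CannedSocket.new("HTTP/1.0 200 OK\r\nContent-Length: 0\r\n\r\n")
status = fetch_status_line(socket, "/")
```

The same `fetch_status_line` would work unchanged with a real `TCPSocket`; the library is left alone and only the socket object is swapped.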

Revactor comes with all sorts of goodies that people seem to ask for often. Its original application was for a web spider, which in early 2008 was sucking down and scanning regexes on over 30Mbps of data using four processes running on a quad core Xeon 2GHz. I'm sure it was, at the time, the fastest concurrent HTTP fetcher ever written in Ruby. Perhaps a bit poorly documented, this HTTP fetcher is part of the Revactor standard library, and exposes an easy-to-use synchronous API which scatters HTTP requests to a pool of actors and gathers them back to the caller, exposing simple callback-driven response handling. I hear people talking about how awesome that sort of thing is in Node, and I say to them: why not do it in Ruby?
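A rough equivalent of that scatter/gather pattern can be sketched with the standard library. Threads and `Queue`s stand in for Revactor's actors here, and `scatter_gather` and the `fetch:` lambda are hypothetical names (the lambda is a placeholder for a real HTTP GET), so treat this as an outline of the pattern rather than Revactor's API.

```ruby
# Hypothetical scatter/gather helper: a fixed pool of worker threads
# pulls URLs from a job queue, and the caller gathers results one by one.
def scatter_gather(urls, fetch:, pool_size: 4)
  jobs = Queue.new
  urls.each { |url| jobs << url }
  pool_size.times { jobs << :done } # one stop signal per worker

  results = Queue.new
  workers = Array.new(pool_size) do
    Thread.new do
      while (url = jobs.pop) != :done
        begin
          results << [url, fetch.call(url), nil]
        rescue StandardError => e
          results << [url, nil, e] # hand errors back instead of losing them
        end
      end
    end
  end

  # Gather: yield each response to the caller's block as it completes.
  urls.size.times { yield(*results.pop) }
  workers.each(&:join)
end

# Placeholder for a real HTTP GET, so the sketch runs offline.
fake_fetch = ->(url) { "<html>#{url}</html>" }

bodies = {}
scatter_gather(%w[/a /b /c /d /e], fetch: fake_fetch) do |url, body, error|
  bodies[url] = error ? error.message : body
end
```

Results arrive in completion order, not request order, which is why the caller keys them by URL.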

Why Rainbows Is Cooler Than Node

Both Rev and Revactor-style concurrency are provided by Eric Wong's excellent Rainbows HTTP server. Rainbows lets you build apps which handle the same types of use cases as Node, except rather than having to write everything in an upside-down async world in JavaScript, using Revactor you can write normal synchronous Ruby code and have everything be evented underneath. Existing synchronous libraries for Ruby can be patched instead of rewritten or monkeypatched with gobs of Fiber/exception thunking methods.

Why write in asynchronous upside down world when you can write things synchronously? Why write in JavaScript when you can write in Ruby? Props to everyone who has worked on solutions to this problem,  and to Ilya for taking it to the mainstream, but in general, I think Rev and Revactor provide a better model for this sort of problem.

Why I Stopped Development on Rev and Revactor: Reia

A little over two years ago I practically stopped development on Rev and Revactor. Ever since discovering Erlang I thought of it as a language with great semantics but a very ugly face. I started making little text files prototyping a language with Ruby-like syntax that could be translated into Erlang. At the time I had outgrown my roots as an I/O obsessed programmer and got very interested in programming languages, how they work, and had a deep desire to make my own.

The result was Reia, a Ruby-like language which runs on top of the Erlang VM. I've been working on it for over two years and it's close to being ready! It's got blocks! It's got Ruby-like syntax! Everything is an (immutable) object! All of the core types are self-hosted in Reia. It's got a teensy standard library. Exceptions are kind of working. I'd say it's about 75% of the way to its initial release. Soon you'll be able to write CouchDB views with it.

Erlang's model provides the best compromise for writing apps which do a lot of I/O but also do a lot of computation as well. Erlang has an "evented" I/O server which talks to a worker pool, using a novel interpretation of the Actor model. Where the original Actor model was based on continuations and continuation passing, making it vulnerable to the same "stop the world" scenarios if anything blocks anywhere, Erlang chose to make its actors preemptive, more like threads but much faster because they run in userspace and don't need to make a lot of system calls.

Reia pursues Erlang's policy of immutable state systemwide. You cannot mutate state, period. This makes sharing state a lot easier, since you can share a piece of state knowing no other process can corrupt it. Erlang  uses a model very similar to Unix: shared-nothing processes which communicate by sending "messages" (or in the case of Unix, primitive text streams).  For more information on how Erlang is the evolution of the Unix model, check out my other blog post How To Properly Utilize Modern Computers, which spells out a lot of the same concepts I've discussed in this post more abstractly.  Proper utilization of modern computers is exactly what Reia seeks to do well.
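For a feel of the shared-nothing, message-passing style, here is a minimal actor sketch in plain Ruby, not Reia's or Erlang's API. `CounterActor` is hypothetical: a thread owns the count and is reachable only through a mailbox `Queue`, and messages are (shallowly) frozen on send, a rough nod to Erlang's immutable messages.

```ruby
# A minimal actor sketch: the thread owns its state and is reachable only
# through a mailbox. Messages are frozen on send (shallowly) so the sender
# cannot mutate them afterward.
class CounterActor
  def initialize
    @mailbox = Queue.new
    @thread = Thread.new { receive_loop }
  end

  def send_message(message)
    @mailbox << message.freeze
  end

  def join
    @thread.join
  end

  private

  def receive_loop
    count = 0 # private state: no other thread can touch it
    loop do
      tag, payload = @mailbox.pop
      case tag
      when :increment then count += payload
      when :get then payload << count # payload is a reply mailbox
      when :stop then break
      end
    end
  end
end

counter = CounterActor.new
3.times { counter.send_message([:increment, 2]) }
reply = Queue.new
counter.send_message([:get, reply])
total = reply.pop # blocks until the actor answers
counter.send_message([:stop])
counter.join
```

Because the mailbox is processed in order, the `:get` message is answered only after the three increments, so `total` comes back as 6.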

Reia has been my labor of love for over two years. I'm sorry if Rev and Revactor have gone neglected, but it seems I may have just simply been ahead of my time with them, and only now is Ruby community interest in asynchronous programming piqued by things like Node and em-synchrony. I invite you to check out Rev, Revactor, and Reia, as well as fork them on Github and start contributing if you have any interest in doing advanced asynchronous programming on Ruby 1.9.

Posted 29 days ago at The GitHub Blog

Sorry for the last minute notice, but GitHub will be buying beer for geeks at Flat Top Johnny's in Cambridge tonight (Tuesday, August 10th) around 9:30. We're piggybacking on the bostonrb after party, so come mix and mingle. I may even get a couple LinuxCon geeks to come out with us, which is why I'm here in the first place.

Also, even though PJ won't be there, it is his birthday so come and celebrate it as it was meant to be celebrated!



Where and When:

1 Kendall Sq
Cambridge, MA 02139
9:30p, Tues Aug 10th

Posted 29 days ago at Ruby5

Microsoft, IronRuby, and Jimmy Schementi start this episode of Ruby5. Then, we touch on Ruby Zucker, AJAX Exceptions, details of the Ruby splat operator, and learning BDD by playing dice.

Listen to this episode on Ruby5

This episode is sponsored by Acts As Conference
Acts As Conference is a Ruby on Rails and Software Craftsmanship Training conference taking place October 28-30 in Orlando, Florida. Over the course of three days, you'll participate in workshops, sessions, discussions, lightning talks, and more.

Microsoft Tires of IronRuby; Jimmy Schementi Jumps Ship
Probably the biggest Ruby headline this weekend was the news that Jimmy Schementi is leaving Microsoft where he was previously working on IronRuby. This follows the recent IronRuby 1.0 release back in April and appears to be prompted by an internal reorganization of the project.

Ruby Zucker 2 - Syntactic Sugar for Ruby
Last week Jan Lelis released the Ruby Zucker gem which provides a bit of syntactic sugar to the standard Ruby library. It provides shortcuts to Enumerable’s #zip, Infinity and NaN constants, Array #sum, a left-chomp on strings, unions with Regular Expressions, and a lot more.

Rails, Ajax and Exceptions? Bring it on.
Dane Harrigan wrote up an interesting solution to the problem of displaying error messages in your browser when they occur during AJAX requests. His method involves using an around_filter and custom jQuery error handler call to display the full exception message in a lightbox. Pretty nifty.

Learn BDD Playing Dice
On August 3rd, Valerio Farias published a free ebook he created titled, "Learning BDD Playing Dice." It's definitely introductory, but certainly easy to follow and has a decent story line if you're new to BDD and want to flex your RSpec muscles.

Placeholder Images with Placeholder
Have you ever needed placeholder images when developing your site layout? If so, you should check out placeholder.it, which is a web service that generates those images on the fly, just for you. And, recently, Matt Darby released the Placeholder gem which gives a simple interface to generating the request URLs for these resources.

What is splat/unary/asterisk operator useful for?
A few days ago Rafael Magana wrote up a great article walking through all the uses of the splat operator, and provided some great code samples. So, if you've got any questions about what that darn asterisk-thing is used for, this is definitely worth a quick look.

Posted 29 days ago at The Geek Talk

Who is Justin Palmer?

I’m a proud Father, a wonderful husband (my wife might be reluctant to confirm this), Trail Blazers fan, amateur Southern cook, and I do both programming and design.  I helped cofound ActiveReload, the company that went on to produce Lighthouse, Warehouse and Mephisto.  Today you can find me at ENTP where I help design and improve the frontend of our products.  I’ve also been known to do a little bit of iOS development over at labratrevenge.

Where and when did you start programming?

I think it was around 98 when I started design work and 2000 when I began programming.  Those days are mostly a blur given how much my life has changed over the last 10 years.  At the time I was still living with my parents in the tiny town I grew up in: Walnut, MS which had a population of about 750 people.  I worked on a production line building furniture in a nearby town during the day and I would play guitar with my band during the night.  We were always needing promotional material for shows, so I did what I had to at the time and acquired a copy of Photoshop.

Not long after we began pumping out new promotional material it was decided we needed a website.  I was curious to see how it was done so I began to learn how to write HTML and eventually how to write PHP.

Eventually we ran into some trouble with the band and my mom got scared, and said,  “You’re movin’ with your auntie and uncle in Bel-Air.”  I whistled for a cab and when it came near The license plate said fresh and it had dice in the mirror. If anything I could say that this cab was rare, but I thought, “Nah, forget it. Yo, holmes to Bel-Air!” I pulled up to the house about 7 or 8 and I yelled to the cabbie, “Yo homes smell ya later!” Looked at my kingdom I was finally there, to sit on my throne as the prince of Bel-Air.

What does your typical day look like?

I always try to start my day with a cup of Stumptown.  Then I usually catch up on the world of politics before I hop into the ENTP Campfire room and begin working on Tender and Lighthouse.  A few days a week I try to make it into the office to get some face to face time with the rest of the crew and, not to mention, have drinks with them.

What do you do in your free time?

I like to spend time with my Wife and our Son, go hiking, cook, or just walk around the city.  I’m also a pretty big sports junkie.  I love to go watch the Blazers and Timbers play in Portland when I can and catch any college football or basketball game I can on TV.

Current favorite apps?

I don’t think I’d consider these my favorite applications, but they are the applications which I find hard to replace.  TextMate, Photoshop & Safari.

What OS do you prefer?

Mac OS X.

Small picture for your Workplace?

We share an office with Kongregate (they’re downstairs).  It’s a nice little spot in downtown Portland.

Favorite: Language, JS Framework?

JavaScript and Node.js

Name something that has inspired you recently?

The wonderful work designers are sharing on Dribbble is really inspiring.  It’s a great place chock full of talent.  It really encourages me to improve my own work or helps me get unstuck when I feel like I’m in a rut.

What do you prefer (and why)? Freelance work or full time employment?

Both have their ups and downs.  I love the stability and health insurance that comes with working full time. I love the freedom and experimentation that comes with freelance.

What are your personal projects and goals for 2010?

I don’t think I’ll complete it by the end of 2010, but I’d really love to release an iPhone game.

Posted 29 days ago at Engine Yard Ruby on Rails Blog

In computer science, concurrency is a property of systems in which several computations are executing simultaneously, and potentially interacting with each other. The computations may be executing on multiple cores in the same chip, preemptively time-shared threads on the same processor, or executed on physically separated processors.
-- Wikipedia, Concurrency article

Simply put, concurrency is when you have more than one logical thread of execution occurring simultaneously, or at least appearing to occur simultaneously. When you write software that makes use of concurrency, you want your software to do two or more things at once.

The motivations for using concurrency are varied. Sometimes you may have architectural reasons for using concurrency -- your code makes more sense to you or is easier to write if you conceive it in more than one discretely executing unit. In other cases you may want to employ concurrency in order to make better use of the multiple cores that many modern computers have, enabling you to get better total throughput out of your code than you would have from a non-concurrent implementation.

Whatever the motivation for employing concurrency, the reality is that concurrency is a complex subject. There are many different ways to achieve concurrency in software, and they each have their own set of tradeoffs. Furthermore, if your platform is Ruby, your decisions about what kind of concurrency to employ will be influenced by the specific Ruby implementation you are targeting. Each provides a different set of concurrency options for you to consider.

This is the first installment in a new series of articles focusing on introducing and exploring the variety of concurrency options available in the Ruby ecosystem. Advantages and disadvantages will be discussed for each, and I'll leave you with a few examples of how you can leverage these different options in your code. It should be a fun subject to explore!

Concurrency is all about multitasking -- doing more than one thing at once. The building blocks of multitasking are processes, threads, and fibers. Each of these components is complex in itself, both because of the nuances in how they interact and can be combined, and because different platforms have variations in which capabilities they implement and in how they are implemented.
Luckily, their overall description can be summarized in a useful way.

Processes are independent units of execution that generally share nothing with other processes, except for resources which are intended to be shared (such as shared memory segments, shared IO resources, or memory mapped files). Processes carry a lot of state information with them and have their own address spaces. Communication between them has to be through an interprocess communication mechanism provided by the platform that the processes are running in. Processes running on the same machine will be scheduled by the kernel, which will typically use some sort of time slicing algorithm to spread CPU usage of all running processes across the available cores.

Threads come in several different flavors, including kernel, user space, and green threads. On some platforms there are entities called light-weight processes that bring kernel threads into user space so they look somewhat like processes, but are less expensive. For our purposes, threads are contained within a process, and share the memory space and process state of the process with each other. Green threads differ in that they are not controlled or scheduled by the operating system. Rather, they are provided by the process itself. This has a portability advantage because it means that the threads will be available on every platform that the process can run on, and will work the same on each. The main disadvantage is that green threads, being managed by the process itself, are generally confined to sharing a single core, and are limited to the peculiarities of the process's threading implementation (which may vary substantially from the platform's own threading implementation). Regardless of the type of threading, context switching with threads is generally faster than it is with processes.

Fibers are like user space threads, except the operating system doesn't handle scheduling for them. Instead, fibers must be explicitly yielded to allow other fibers to run. This can have performance advantages like the reduction of system scheduling overhead. Since multitasking with fibers is cooperative, the need to use locks on shared resources is reduced or eliminated. Programmers can also leverage fibers to their advantage with IO operations by allowing other things to run while waiting for a slow or blocking IO operation.

Ruby concurrency isn't quite as simple as selecting one of the above and using it, however. In the beginning, there was just Ruby, a single implementation that everyone used. This Ruby implementation, now commonly called the Matz Ruby Implementation (MRI), saw a widespread usage explosion with the 1.8.x version. It's pretty old now. This is from the ftp://ruby-lang.org FTP server:
carbon:/home/ftp/pub/ruby/1.8$ ls -la | grep ruby-1.8.0
-rw-rw-r--  1 root     ftp   1979070 Aug  4  2003 ruby-1.8.0.tar.gz
So, it has been around for a while, and offers a good starting point for discussing concurrency in Ruby. MRI Ruby 1.8.x supports concurrency in a few ways. One of the first things newcomers to Ruby reach for is threads. Depending on the language these newcomers were familiar with before arriving at Ruby, they may be in for a surprise: MRI Ruby 1.8.x provides a green thread implementation. As mentioned above, green threads do not make use of any threading system native to the platform. Instead, 1.8.x's threads are implemented within the interpreter itself. This makes threads behave consistently across every platform the interpreter runs on. Because they are green threads, however, they offer no advantage for CPU bound tasks.

cpu_bound_threads.rb
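require 'benchmark'

# Usage: ruby cpu_bound_threads.rb THREAD_COUNT ITERATIONS
threads = []
thread_count = ARGV[0].to_i
iterations = ARGV[1].to_i
increment = iterations / thread_count.to_f   # size of each thread's slice of 1..iterations
sum = 0

Benchmark.bm do |bm|
  bm.report do
    thread_count.times do |counter|
      threads << Thread.new do
        my_sum = 0
        # This thread's slice: from just past the previous boundary up to its own boundary.
        queue = (1 + (increment * counter).to_i)..((increment * (counter + 1)).to_i)
        queue.each do |x|
          my_sum += x
        end
        Thread.current[:sum] = my_sum   # stash the partial sum in a thread-local
      end
    end

    # Wait for every thread, then add up the partial sums.
    threads.each {|thread| thread.join; sum += thread[:sum]}

    puts "The sum of #{iterations} is #{sum}"

  end
end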
This is a simple program that takes a large range of numbers, divides it into smaller ranges, and hands each smaller range to a thread that calculates the sum of the range it was given. The results from each individual thread are then added together to arrive at a final answer. All examples ran on an 8 core Linux machine. The numbers below are averages over 100 runs for each set of inputs (times in seconds).

Threads    50000 iterations    500000 iterations    5000000 iterations
1          0.01730298          0.17149276           1.70610744
2          0.01724724          0.17179465           1.70557474
4          0.01729293          0.17181384           1.70570264
8          0.01741591          0.17210276           1.71201153

As the numbers demonstrate, MRI 1.8 threads are absolutely no help at all for a CPU bound application. In fact, managing them carries a small but measurable overhead. The 8 thread runs were the slowest for every input size, and in the 500000 iteration column the time rose with every increase in thread count.

If you are an MRI 1.8 user, do not despair; threads are but one concurrency option available to you. An option that will better serve you for CPU bound tasks is process based concurrency. The idea is simple: to leverage multiple cores/CPUs, just create more than one process to handle the workload. On platforms that support it, Ruby provides a fork() method that uses the underlying fork() call from the system C library. This creates a new process, with a new process ID, that can be considered an exact copy of the parent process, except that its resource usage counters are reset to 0. Since processes do not share memory spaces, you must use another system provided communication mechanism to pass work to or from processes. This avoids the potential pitfalls that arise when trying to correctly manage locks on shared resources, but it does force you to think more specifically about exactly how to achieve communication.

cpu_bound_processes.rb
require 'benchmark'
processes = []
process_count = ARGV[0].to_i
iterations = ARGV[1].to_i
increment = iterations / process_count.to_f
sum = 0

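# Run the block in a forked child and send its result back over a pipe.
# Returns [pid, read_end] so the parent can collect the result later.
def in_subprocess
  from_subprocess, to_parent = IO.pipe

  pid = fork do
    from_subprocess.close                        # the child only writes
    r = yield
    # Marshal the result, then base64-encode it ("m") so it is safe to send as a text line.
    to_parent.puts([Marshal.dump(r)].pack("m"))
    to_parent.close                              # flush and signal EOF to the parent
    exit!                                        # skip at_exit handlers in the child
  end

  to_parent.close                                # the parent only reads
  [pid,from_subprocess]
end

# Read the child's result until EOF, reap the child, and decode the result.
def get_result_from_subprocess(pid, from_subprocess)
  r = from_subprocess.read
  from_subprocess.close
  Process.waitpid(pid)
  Marshal.load(r.unpack("m")[0])
end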

Benchmark.bm do |bm|
  bm.report do
    process_count.times do |counter|
      processes << in_subprocess do
        my_sum = 0
        queue = (1 + (increment * counter).to_i)..(0 + (increment * (counter + 1)).to_i)
        queue.each do |x|
          my_sum += x
        end
        my_sum
      end
    end

    processes.each {|process| sum += get_result_from_subprocess(*process)}

    puts "The sum of #{iterations} is #{sum}"

  end
end
In this example, each child gets its share of the work through the copy of the parent's memory that fork() gives it, and an IO pipe carries each child's result back to the master process. As before, testing was done on an 8 core Linux machine, with 100 runs of each test. The program is equivalent to the threaded version, and was changed only as necessary to enable it to be used in a multiprocess model instead of a multithread model.

Worker processes    50000 iterations    500000 iterations    5000000 iterations
1                   0.01805432          0.17199047           1.70812685
2                   0.0098329           0.08675517           0.85509328
4                   0.00609409          0.0446612            0.43100698
8                   0.00847991          0.05346145           0.25621009

Take a good look at these numbers. Everything moves in the right direction until you get to the 8 process row. There, the 50000 and 500000 iteration runs are slower than they were with 4 processes. Do you have any theories as to why?

Processes are, in many ways, a great way to handle concurrency. One of their drawbacks, though, is that they are heavy structures. They can take significant time and resources to create. Linux uses copy-on-write semantics when creating forked processes. This means it doesn't actually duplicate the address space of the forked process until pages in that space start changing, and then it copies only the pages that change. As a result, forked processes on Linux can be created fairly quickly. However, MRI 1.8 is not very friendly to copy-on-write semantics. If you are unfamiliar with the way memory is managed and garbage is collected in MRI 1.8, you should check out my article on MRI Memory Allocation. One key aspect is that objects carry all of their status bits with them, including the garbage collector's mark bit. This means that when the garbage collector scans the object space to find objects it can collect, it writes to every object in the address space. For a process forked with copy-on-write semantics, this forces the kernel to make copies of all of those pages. This takes time, and largely negates the fast-creation benefit of copy-on-write forked processes.
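The point that a forked child works on its own copy of the parent's memory, and must hand results back through an IPC channel, can be seen in a few lines. This is a minimal sketch, not code from the benchmark above. It assumes a platform where fork() is available (so not Windows or JRuby), and the variable names are illustrative.

```ruby
# Minimal sketch: a forked child changes its own copy of a variable,
# then reports the value to the parent through a pipe.
# Assumes fork() is supported (it is not on Windows or JRuby).
counter = 0
reader, writer = IO.pipe

pid = fork do
  reader.close                   # the child only writes
  counter += 100                 # changes the child's private copy only
  writer.puts(counter)
  writer.close                   # flush and signal EOF to the parent
  exit!(0)                       # leave without running at_exit handlers
end

writer.close                     # the parent only reads
child_value = reader.read.to_i   # blocks until the child closes its end
reader.close
Process.wait(pid)                # reap the child

puts "child saw #{child_value}, parent still has #{counter}"
# prints: child saw 100, parent still has 0
```

Under copy-on-write, the memory holding the child's data is only duplicated, page by page, when the child writes to it. MRI 1.8's garbage collector triggers the same kind of copying for every page on which it writes mark bits.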
The times for the lower iteration counts in the 8 process test reveal a cost to this form of concurrency. When the work to be done is brief enough, the overhead of creating the forked processes overwhelms the performance gained from dividing the labor. This is a reality for any form of concurrency: there is always a performance tax from some amount of overhead. That tax is just higher when spawning something heavy like a process. Keep this in mind when you explore concurrency options for your task.

These first two examples both represent CPU bound problems. Many real world problems are not CPU bound, though. Rather, they are IO bound. Because an IO bound problem has latencies imposed on it by something outside the program itself, it can be an excellent case for using MRI 1.8's green threads to improve performance.

io_bound_threads.rb
require 'net/http'
require 'thread'
require 'benchmark'

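# Fetch a plain http:// URL, retrying up to five times; returns the body or ''.
def get_data(url)
  tries = 0
  response = nil
  m = /^http:\/\/([^\/]*)(.*)/.match(url)
  if m   # other URL schemes are skipped rather than raising on a nil match
    site = m[1]
    path = m[2].empty? ? '/' : m[2]   # a bare host needs a path of "/"
    begin
      http = Net::HTTP.new(site)
      http.open_timeout = 30
      http.start {|h| response = h.get(path)}
    rescue StandardError, Timeout::Error
      # Timeout::Error is not a StandardError on MRI 1.8, so it is listed explicitly.
      tries += 1
      retry if tries < 5
    end
  end
  response.kind_of?(Array) ? response[1] : response.respond_to?(:body) ? response.body : ''
end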

mutex = Mutex.new
signal = ConditionVariable.new
thread_count = ARGV[0].to_i
fetches = ARGV[1].to_i
url = ARGV[2]
threads = []
count = 0
active_threads = 0

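Benchmark.bm do |bm|
  bm.report do
    while count < fetches
      # Start workers until the fetch total or the concurrency limit is reached.
      while count < fetches && active_threads < thread_count
        mutex.synchronize do
          active_threads += 1
          count += 1
        end
        Thread.new do
          get_data(url)
          mutex.synchronize do
            active_threads -= 1
            threads << Thread.current
            signal.signal
          end
        end
      end

      # Sleep until at least one worker has finished. Testing the condition while
      # holding the lock means a signal sent before we start waiting is not lost.
      finished = []
      mutex.synchronize do
        signal.wait(mutex) while threads.empty?
        finished = threads.dup
        threads.clear
      end
      finished.each {|th| th.join}
    end

    # Wait for the fetches still in flight so the timing covers all of them.
    mutex.synchronize do
      signal.wait(mutex) while active_threads > 0
    end
    threads.each {|th| th.join}
  end
end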
This script makes many HTTP requests. For simplicity's sake, let's say it just makes the same request over and over again, but it could easily be expanded to take a list of URLs, and to do something useful with the returned data. The script uses threads much like the CPU bound example, except that it is a bit more sophisticated in how it counts the work it has assigned to generated threads, and in how it waits for all the threads to be completed.

The table below shows timings from it in action. The target URL was not local to the testing machine. Each run fetched a URL 400 times using the indicated number of threads. The URL was either a "fast" one, with an over-the-net response speed of about 35 requests per second, or a "slow" one, with an over-the-net response speed of about 3 requests per second. There were 100 runs completed, and the numbers below are averages from those runs (times in seconds).

Worker threads    Fast URL (~35 req/s)    Slow URL (~3 req/s)
1                 6.53462668              61.1016239
2                 3.34861606              30.4514539
5                 1.38942396              12.1620945
10                0.72804622              6.0968646
20                0.47964698              3.0411382

Just a glance at these numbers clearly shows that Ruby threads are a big help with an IO bound activity like this. For the fast URL, the reduction in time is not linear in the number of threads (20 threads are about 14 times faster than one, not 20 times), but even at 20 threads there is a significant benefit to adding more. For the slow URL, the benefit is almost exactly linear, because those requests spend more time waiting on IO and less on CPU bound activities.

There are some caveats to be aware of with regard to Ruby threads. First, even though they are green threads, as soon as you start sharing resources between threads, threading becomes hard to get right. Share as little as possible, think your code through thoroughly, and use tests to support your reasoning, because threading problems can be hard to diagnose and solve.

Second, MRI 1.8 has a limit on the number of threads that it will manage.
Because of how the internals are implemented, on most systems (notably excluding win32 systems) the total thread count is limited to 1024. Also, because of the way threads are implemented, the overhead of managing them grows as their number increases. Each thread consumes a significant amount of memory, so do not go crazy with threads or it will backfire on you.

Third, because of the way that Ruby threading is implemented, it is possible for a C extension to Ruby to take control of the process and prevent Ruby from allowing context switches to other threads. It is possible to write extensions so that they do not do this, but many are not written in this way. Where this bites most people is with code that interacts with a database. You can reasonably look at a database query as an IO bound activity: all the Ruby process is really doing is sending a request to the DB and waiting for a response. However, most DB interaction libraries are implemented as C extensions, and some of them do not play well with Ruby threads. One of the most common offenders is Mysql-Ruby, which blocks all of Ruby while waiting for the result of a query, so a long running query will block the whole process until it returns. On the other hand, Ruby-PG, the driver for Postgres, will context switch within pgconn_block(), the function that makes blocking calls to the database, thus permitting other MRI 1.8 threads to run even during a long running query.

Fourth, because MRI 1.8 threads are green threads, they all run inside the context of a single process and a single system thread. Thus, while they give the appearance of concurrency, there is actually only one thread running at once. This is okay, because it is the appearance of concurrency that matters. If you run top on your laptop or VM shell, you will see a large number of processes running on your system.
This number will exceed the number of cores that you have by a large margin, but you rarely have to worry about which processes are actually running on one of the cores at any given time. Your kernel slices up access to the CPU into grains fine enough that all the running processes appear to be executing on a core at the same time (even though most of them probably are not actually running at any given moment). Concurrency in computing doesn't strictly mean that two or more things are actually running at the same time. Rather, it means that they appear to be, that you work with them on the assumption that they are, and that the underlying scheduler is left to make reality fit that appearance.

An entire book could be written about concurrency in Ruby. I've just scratched the surface with this overview of process and thread based concurrency in Ruby. Hopefully this helped answer a few questions or suggested some techniques to consider. Future installments in this series will cover Ruby 1.9.x (which uses system threads as opposed to green threads), JRuby, Rubinius, and using event systems like EventMachine to handle concurrency. So stay tuned! There is a lot more coming soon!
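The IO bound effect measured earlier can be reproduced without a network by letting sleep stand in for a slow response. This is a hypothetical sketch: fake_io_call, LATENCY and CALLS are made up for illustration. Also, sleep only approximates real IO. Like a well-behaved C extension, it lets other threads run while it waits, which the Mysql-Ruby case described above does not.

```ruby
require 'benchmark'

# Hypothetical stand-in for a slow network call; the latency is an assumption.
def fake_io_call(latency)
  sleep(latency)   # other Ruby threads may run while this one sleeps
  :done
end

LATENCY = 0.2
CALLS   = 10

# One call after another: total time is roughly CALLS * LATENCY.
serial = Benchmark.realtime do
  CALLS.times { fake_io_call(LATENCY) }
end

# One thread per call: the waits overlap instead of adding up.
threaded = Benchmark.realtime do
  workers = Array.new(CALLS) { Thread.new { fake_io_call(LATENCY) } }
  workers.each { |t| t.join }
end

printf("serial:   %.2fs\n", serial)
printf("threaded: %.2fs\n", threaded)
```

On MRI, the threaded run should take roughly one LATENCY rather than CALLS of them. That is the same overlap behind the near-linear speedup seen for the slow URL.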

Posted 29 days ago at The Geek Talk

Who is Tyler Weir?

Husband, Father, Coder, Gym Owner. Looking at all the others interviewed on the site, Tyler is also pleasantly surprised to be included.

Why Scala/Lift?

Scala: Type Safety, Functional Aspects and Actors.
Lift: Agility, efficiency, and the community around the framework.
In fact, the communities that surround both Scala and Lift are incredible, which keeps me motivated.

Where and when did you start programming?

My first year at McMaster University.

What does your typical day look like?

Up at 5:30am, teach two classes at CrossFit Quantum, home to eat and squeeze wife and daughter, off to work (startup), home at 6:30, bed at 10, repeat during weekdays.  Weekends are reserved for family stuff and occasional code sprinting.

What do you do in your free time?

Life is simple but busy, with just 3 parts: family, CrossFit Quantum and my startup. Any free time is split among video games (StarCraft II), reading and personal coding projects.

Current favorite apps?

OS X: Terminal, Vim, Firefox, Echofon, Quicksilver and Simplenote.
iOS: Camera+, Simplenote, iMockups, Carcassonne, iBooks, EPL Live! (to keep up with Everton, my English Premier League football club)

What OS do you prefer?

OS X for desktop, CentOS for server.


Favorite: Language, JS Framework?

Scala, C and ObjC. jQuery and ExtJS from Sencha. I wish I had more time to investigate Google’s Go, as systems development is quite interesting.

Name something that has inspired you recently?

I’m inspired by the Lift Community and notably the work of David Pollak, Martin Odersky, Daniel Spiewak and Nathan Hamblen.

What do you prefer (and why)? Freelance work or full time employment?

I’ve done both and I think they fit me at different stages of my life. Right out of university, when I pretty much knew nothing, it was excellent to be among very smart people at IBM. Later on, with a bit of confidence and a desire to prove myself, I think freelance/start-up is the right balance of excitement and fear.

What are your personal projects and goals for 2010?

Making my current startup a success. I have much more influence than the last time, so I’m a bigger part of the success or failure. Making the gym a success in terms of the health of its community and its economics. I also have a number of fitness-related goals which don’t really fit here. :) All of this in addition to being a good husband and father. :)

Posted 29 days ago at The GitHub Blog

We added a special header to all outgoing repository and direct message notifications that should make it much easier to filter emails:

Screenshot of an email with a List-ID header in GMail.

Direct messages get a List-ID of user.github.com, while repositories get repo/user.github.com. Do you want to easily filter all notifications from a user or organization? GMail supports wildcards too:

Screenshot of GMail's filter interface.

This is a quick workaround for those getting flooded with notifications. We feel your pain, and are discussing ways to solve this without drowning you in a sea of checkboxes.
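For mail that is not read in Gmail, the same List-ID values can be matched in code. The following is a hypothetical Ruby sketch, not anything GitHub provides. It assumes the header formats described above, accepts the identifier with or without angle brackets, and ignores folded header lines. The method name classify_github_notification is made up for illustration.

```ruby
# Hypothetical sketch based on the List-ID formats described above:
#   direct messages: user.github.com
#   repositories:    repo/user.github.com
# Returns [kind, owner], or nil if the headers hold no GitHub List-ID.
def classify_github_notification(headers)
  line = headers.each_line.map { |l| l.strip }.find { |l| l =~ /\AList-ID:/i }
  return nil unless line

  value = line.sub(/\AList-ID:\s*/i, '')
  id = value[/<([^>]+)>/, 1] || value       # strip angle brackets if present
  return nil unless id.end_with?('.github.com')

  path  = id.chomp('.github.com')           # "user" or "repo/user"
  owner = path.split('/').last
  kind  = path.include?('/') ? :repository : :direct_message
  [kind, owner]
end

headers = "From: GitHub <noreply@github.com>\nList-ID: <myrepo/octocat.github.com>\n"
p classify_github_notification(headers)     # prints [:repository, "octocat"]
```

Matching on the owner part plays the same role as the wildcard filter shown for Gmail.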

Posted 30 days ago at Aman King's Bliki


Episode 0.3.1 - Websockets:

Wynn and Micheil sat down with Peter Griess from Yahoo Mail, Martyn Loughran from Pusher App, and Guillermo Rauch from Socket.IO to talk about Websockets.

Download MP3

Items mentioned in the show:

  • WebSocket is a technology providing for bi-directional, full-duplex communications channels, over a single Transmission Control Protocol (TCP) socket, designed to be implemented in web browsers and web servers.
  • Socket.IO provides a really simple API to leverage Sockets on the client side.
  • Socket.IO Node.JS server - sockets for the rest of us (in Node.js)
  • Pusher App Hosted HTML5 web sockets service
  • Websocket-js A Flash fallback for browsers that do not support Websockets
  • Long polling Traditional approach to emulating push for web apps
  • HAProxy reliable, high performance TCP/HTTP load balancer
  • True Story A collaborative planning tool for agile teams
  • HTML5 Event Source A one-way websocket with limited browser support
  • Node Websocket Server Micheil’s websocket server written in low-level node.js, should be 90-100% spec compatible.
  • Hummingbird demo - a real time traffic visualizer
  • MongoDB Awesome NoSQL database featured on Episode 0.0.7
  • Redis an advanced key-value store. It is similar to memcached but the dataset is not volatile, and values can be strings, exactly like in memcached, but also lists, sets, and ordered sets.
  • EM-Websocket EventMachine based WebSocket server from Ilya Grigorik
  • Node.js YUI3 bindings YUI3 on the server?!
  • Telehash - a new wire protocol from Jeremie Miller, the guy behind XMPP, for exchanging JSON in a real-time and fully decentralized manner, enabling applications to connect directly and participate as servers on the edge of the network.

Posted 30 days ago at Railscasts

Here we continue upgrading an application to Rails 3 by getting our specs up and running and going through the rails_upgrade plugin.

Posted 30 days ago at RubyLearning Blog

Do YOU want us to continue with the Ruby Challenge for Newbies?

RubyLearning has been conducting the monthly Ruby Programming Challenge for Newbies for over a year now and so far 11 challenges have been completed. The 12th Challenge is in progress. All this was possible due to the extensive support we got from Rubyists across the world.

However, today, probably due to lack of time or other commitments, not many experienced Rubyists are willing to set a Ruby challenge for the newbies. So, what do we do with the newer challenges? Do we discontinue the challenges? Do we change them from monthly to as and when? What are your suggestions?

Newbies, do you find the challenge interesting and useful – what are your thoughts?

In the meantime, are you interested in setting a Ruby challenge for the newbies? If so, do email me at satishtalim [at] gmail.com.

Do post your thoughts and suggestions. I am hopeful that we would be able to continue with the challenges.



Well, as some of you have noticed, things have been a bit quiet around here this week.

Fear not, we’re not about to stop covering the fast-moving world of open source. We’ve just been working to give The Changelog a fresh coat of paint.

We love Tumblr, but creating custom themes isn’t the easiest workflow in the world. Fortunately, Wynn’s Fumblr project lets us use Compass and Sass to speed things up a bit. We are also using Mark Wunsch’s Tumblr gem to post articles and episodes right from the command line.

New features

Some highlights of the new design include:

  • Easy access to the latest episode right from the header or from the /latest vanity URL
  • Browse all episodes from the new Episodes page
  • Goodbye, Flash, hello HTML5 audio for devices that support it (including iPad). For other browsers that don’t support MP3s in HTML5 (we’re looking at you, FF on Mac), we’re using the 1 Bit audio player
  • Meet your contributors. Many of you didn’t know that Adam and Wynn weren’t the only Changeloggers. Meet Micheil, Kenneth, and Tim

Posted about 1 month ago at The GitHub Blog

In a little over twelve hours from now we'll be flipping the switch on GitHub Jobs. That means you only have until midnight PST to post a job at our discounted $150 rate.

We're really excited about the quality of jobs being posted and can't wait to flip the switch. We've seen some awesome listings so far; look for some highlights on the blog tomorrow.

Another reminder to update your location and hiring status

If you're looking for work, we've designed a way to promote jobs you'll be interested in on your GitHub dashboard. But they will only be visible if you explicitly opt-in on the job profile section of your account settings — so if you're looking for work, make sure you've checked the available for hire checkbox.

We will also be using the location field of your public profile to better figure out what jobs you might be interested in — so make sure it's up to date.

Posted about 1 month ago at Ruby Inside

In April, we wrote about IronRuby hitting 1.0 and Microsoft's "3 years with Ruby [paying] off." It's sad, then, to read today that program manager Jimmy Schementi is leaving Microsoft citing a rapidly decreasing interest in dynamic languages (other than JavaScript) at the software giant.

[..] a year ago the team shrunk by half and our agility was severely limited. [..] In short, the team is now very limited to do anything new, which is why the Visual Studio support for IronPython took so long. IronRuby’s IDE support in Visual Studio hasn’t been released yet for the same reasons. [..] many other roadblocks have cropped up that made my job not enjoyable anymore.

Overall, I see a serious lack of commitment to IronRuby, and dynamic language[s] on .NET in general. [..] The bad-news is I will no longer be working on IronRuby full-time, but in the near future I’m definitely staying active. Also, Tomas will definitely continue working on IronRuby when he can; we weren’t the last two people left for no reason. :-)

Given that Tomas and I will only be working part-time on IronRuby now, I invite the Ruby and .NET communities to come help us figure out how to continue the IronRuby project, assuming that Microsoft will eventually stop funding it. I’ll start a thread on the IronRuby Mailing List shortly, so keep an eye on that if you’d like to help.

Jimmy Schementi

Schementi left Microsoft at the end of July and is on his way to work at a NYC-based financial technology consulting firm. I'm sure most Rubyists would be quick to join me in congratulating Schementi and the rest of the IronRuby team (including John Lam, who left in 2009) for making significant strides in a company and environment where the obstacles were piled high. We've wondered for years whether Windows is a first class platform for Ruby and now we know that Ruby certainly isn't even a second class language for Microsoft.

Schementi seems keen for people from outside of Microsoft to get involved with IronRuby to keep it alive. These sorts of efforts aren't often successful, because contributors usually bubble up over time to become more important, and he notes that he is now the first non-MS contributor merely by virtue of no longer working for MS. If, though, you're a .Net and Ruby hotshot and have the time and passion to become a hero in the worlds of DLR and "Ruby on Windows", there's a significant opportunity here for the taking.

Posted about 1 month ago at defunkt's Gists

don't think so
