Using Ruby Fibers for Async I/O: NeverBlock and Revactor
Fibers are slowly entering Ruby programmers' minds as a new concurrency primitive to put to good use. Two approaches combine Fibers with non-blocking or asynchronous I/O to work around the limits of userspace threads and of Ruby 1.9's Global Interpreter Lock (GIL), which allows only one Ruby language thread to be active at a time.
The biggest problem to work around is blocking syscalls, mostly for I/O. A blocking syscall, like a read, will only return once data is available. In a userspace thread system, this means that all the threads in the process are blocked as well. The solution is to decouple the code that issues the I/O from the blocking I/O call itself. One way is to catch an I/O call before it causes a blocking syscall, issue the I/O request in a non-blocking way, and suspend the Fiber, giving other Fibers a chance to run. Once the system gets a response for the I/O request, the Fiber can be scheduled again.
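The suspend-and-resume cycle described above can be sketched in plain Ruby. This is only an illustration under stated assumptions: `fiber_read` is a hypothetical helper, a pipe stands in for a network socket, and the hand-written "scheduler" at the bottom replaces a real event loop such as EventMachine or Rev.

```ruby
# A pipe stands in for a socket; nothing has been written to it yet.
reader, writer = IO.pipe
log = []

# Hypothetical helper: read from io, but suspend the calling fiber
# instead of blocking the process when no data is available.
def fiber_read(io, maxlen = 1024)
  loop do
    result = io.read_nonblock(maxlen, exception: false)
    return result unless result == :wait_readable
    Fiber.yield io # tell the scheduler which IO we are waiting on
  end
end

consumer = Fiber.new do
  log << "consumer: waiting for data"
  data = fiber_read(reader) # looks blocking, but only suspends the fiber
  log << "consumer: got #{data}"
  :done
end

# A hand-rolled stand-in for an event loop.
waiting_on = consumer.resume  # runs until fiber_read suspends the fiber
log << "main: free to do other work"
writer.write("hello")
IO.select([waiting_on])       # wait until the IO is readable
result = consumer.resume      # the fiber continues where it left off
log << "main: fiber finished"
```

The code inside the fiber reads like ordinary blocking code; the non-blocking call and the suspension are hidden inside the helper, which is the effect both libraries discussed here aim for.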
NeverBlock
InfoQ: What exactly does NeverBlock do? As I see it, it allows code wrapped in Fibers to be passed to a pool of connections, which then resumes the Fibers as connections become available. What is the main contribution of NeverBlock?
Mohammad A. Ali: Implementing the pooling features was needed for the other parts of NeverBlock to function properly. When all the pieces fall into place we should provide instant IO concurrency for applications like Rails, Merb or Ramaze without requiring full thread safety.
To achieve that goal we need a web server that wraps requests in fibers from the NeverBlock fiber pool, and IO libraries (not just DB but all IO operations) that are aware of those fibers and use them for pausing and resuming requests. All of this must be orchestrated using an event loop like EventMachine or Rev.
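To make the idea of a fiber pool concrete, here is a minimal sketch. `TinyFiberPool` is a hypothetical class written for this illustration, not NeverBlock's actual implementation: it keeps a fixed number of fibers, runs each submitted block in an idle fiber, and queues the block when every fiber is busy. The `parked` array simulates requests waiting on I/O; in a real stack an event loop would resume them.

```ruby
# Fiber.current is in core on recent Rubies; older ones need the library.
require 'fiber' unless Fiber.respond_to?(:current)

# Hypothetical pool, for illustration only.
class TinyFiberPool
  def initialize(size)
    @queue = [] # jobs waiting for a free fiber
    @idle = Array.new(size) { build_fiber }
  end

  # Run the block in a pooled fiber, or queue it if all are busy.
  def spawn(&job)
    if (fiber = @idle.shift)
      fiber.resume(job)
    else
      @queue << job
    end
  end

  private

  def build_fiber
    fiber = Fiber.new do |job|
      loop do
        job.call
        job = @queue.shift  # pick up queued work before going idle
        next if job
        @idle << fiber      # return this fiber to the pool
        job = Fiber.yield   # sleep until spawn hands over a new job
      end
    end
    fiber
  end
end

pool = TinyFiberPool.new(2)
log = []
parked = [] # fibers waiting on simulated I/O

3.times do |i|
  pool.spawn do
    log << "start #{i}"
    parked << Fiber.current
    Fiber.yield # simulated I/O wait
    log << "finish #{i}"
  end
end

# Requests 0 and 1 are parked; request 2 is queued until a fiber frees up.
parked.shift.resume while parked.any?
```

With a pool of two fibers, the third request only starts once the first one finishes, which caps the number of requests in flight without using threads.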
InfoQ: NeverBlock::PG is built on NeverBlock and the PostgreSQL driver, which already supports non-blocking I/O out of the box. What exactly does NeverBlock::PG add in this case? If you wanted to add another evented driver for, say, MySQL, which doesn't seem to do non-blocking I/O yet, can NeverBlock help with that?
Mohammad A. Ali: The NeverBlock::PG driver makes the non-blocking operations transparent as long as you do them within a fiber spawned by the NeverBlock fiber pool. This allows you to write something like this:
pool.spawn do
  res1 = db.exec(query1)
  res2 = db.exec(query2_that_depends_on_query1)
end
A normal non-blocking implementation would be much more complex than that. Thanks to Fibers, we are able to write seemingly blocking code that runs in a non-blocking manner.
We are taking the NeverBlock::PG driver a step further. We have just released a new activerecord-neverblock-postgresql adapter, which brings non-blocking IO to ActiveRecord. I think it is easy to guess what the next target for NeverBlock is.
The interesting part is that NeverBlock can be applied to a full-blown stack in an almost transparent way.
NeverBlock cannot make a blocking driver non-blocking. It can help a non-blocking driver operate in a seemingly blocking way without sacrificing the non-blocking features. That said, we are looking closely at efforts like Asymy (an evented MySQL driver) for future integration.
InfoQ: Your library requires Fibers, which are only available on Ruby 1.9. Do you think this might be a problem for users (considering Ruby 1.9 is still changing a lot and hasn't really seen much adoption yet)? Could the benefits of your library or Fibers in general give users an incentive to adopt Ruby 1.9?
Mohammad A. Ali: I believe that Fibers are one of the best things that happened to Ruby 1.9. The same functionality can be duplicated for Ruby 1.8 via continuations, but at the expense of much lower performance. Anyway, the stable 1.9 release is upon us now and I really think that we should all be moving forward. I hope the advantages offered by NeverBlock, Revactor and the like will help convince people to switch.
InfoQ: Looking at the source code for NeverBlock: you reopen the class Fiber and add a few methods. What's the reason for this?
Mohammad A. Ali: Fibers lack a facility to store fiber-local variables as you can do with threads. I needed those to replace the current ActiveRecord implementation of transactions and make it Fiber-aware. I also needed them to make the non-blocking operations optional, even within the context of a Fiber.
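As a rough illustration of what fiber-local storage can look like, the sketch below reopens Fiber and adds a per-fiber hash. The method name `locals` and the `:transaction` key are assumptions made for this example; the methods NeverBlock actually adds may be named and implemented differently.

```ruby
# Fiber.current is in core on recent Rubies; older ones need the library.
require 'fiber' unless Fiber.respond_to?(:current)

class Fiber
  # Hypothetical accessor: a lazily created hash owned by this fiber.
  def locals
    @locals ||= {}
  end
end

seen = []

a = Fiber.new do
  Fiber.current.locals[:transaction] = "tx-a"
  Fiber.yield # suspend, as a non-blocking query would
  seen << Fiber.current.locals[:transaction]
end

b = Fiber.new do
  Fiber.current.locals[:transaction] = "tx-b"
  Fiber.yield
  seen << Fiber.current.locals[:transaction]
end

# Interleave the two fibers: each stores a value, then suspends.
a.resume
b.resume
# After resuming, each fiber still sees only its own value.
a.resume
b.resume
```

Because the state lives on the fiber rather than in a global or a thread-local, two requests interleaved on the same thread cannot see each other's transaction.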
InfoQ: Have you heard of Revactor?
Mohammad A. Ali: Yes, I have played a bit with Revactor, and I actually use Rev as a second backend for NeverBlock besides EventMachine (still experimental). But the goals of NeverBlock are different from Revactor's. Revactor introduces a new concurrency model for Ruby, while NeverBlock aims to bring concurrency to current Ruby programs with minimal changes.
Editor's note: a first attempt at an asynchronous MySQL driver, MySQLPlus, was made available after this interview was conducted.
Revactor
InfoQ: What's the current status of Revactor?
Tony Arcieri: A bit neglected among a myriad of other projects; however, it is being used successfully in a commercial setting.
InfoQ: What are plans for future Revactor versions?
Tony Arcieri: Recently Aman Gupta released a "poor man's Fibers" implementation for Ruby 1.8. With something like that it may be possible to port Revactor to 1.8 as well, however the performance would suffer.
Right now Revactor relies on all Actors existing within the same Ruby Thread. Most of the machinery is in place to facilitate sending messages between Actors running in different Threads, however that doesn't work at present. I've been talking with people interested in supporting this and hopefully it will make its way into Revactor soon.
InfoQ: Do users need to program directly with Revactor's actors, or is there a way to use Revactor to implement backends for other libraries, so it can be used transparently (to the developer)?
Tony Arcieri: Revactor is mostly compatible with the Actor implementation in Rubinius, however at this time there isn't an easy way to do network programming with Actors in Rubinius like you can in Erlang with gen_tcp. That said, programmers looking to write network applications using Actors can start on Revactor for now and port their applications over to Rubinius later.
InfoQ: How do you schedule blocking I/O requests? Do you use kernel threads to run I/O requests?
Tony Arcieri: When all outstanding Actor messages have been processed and no more Actors are runnable, Revactor uses an event library I wrote called Rev (similar to EventMachine) to monitor for I/O events. Rev uses the rb_thread_blocking_region() function in Ruby 1.9 to make the blocking system calls that monitor for I/O readiness, so it's not necessary to spin off a separate kernel thread. Revactor exports a duck type of Ruby's TCPSocket class (Revactor::TCP) which lets you make calls that appear blocking on the surface but actually just defer back to the Actor scheduler. It's easy to monkeypatch existing libraries which use Ruby sockets to use Revactor::TCP instead. For example, Revactor ships with a small monkeypatch for Mongrel which uses Actors for concurrency instead of Threads.
InfoQ: How do you handle code that issues a sequence of I/O requests? Do you batch requests?
Tony Arcieri: On the surface the calls appear to be "blocking". When an I/O request is issued, the current Actor is suspended and rescheduled after an I/O completion occurs. This allows libraries which rely on blocking imperative interfaces to function effectively on top of Revactor. Some examples (which don't run on Revactor now) would include things like ActiveRecord and DataMapper.
InfoQ: What's the status of Actors on Rubinius?
Tony Arcieri: They certainly do everything Revactor can do, with the caveat of "active mode" TCP sockets. This means that rather than imperatively reading for input from a TCP socket, incoming data is asynchronously delivered to a specified Actor using the standard inter-Actor messaging. This lets Actors handle I/O and inter-Actor messages side-by-side without issue. The Rubinius VM is presently being rewritten in C++, after which it will hopefully have all the features necessary to implement "active mode" message delivery. As soon as that's all ready to go I'll probably take a crack at getting it going.
InfoQ: Do you plan to support Revactor on Rubinius?
Tony Arcieri: No, Revactor is heavily tied to YARV features. Rubinius has an excellent concurrency model and I/O support in the form of Tasks and Channels, and Rubinius's existing Actor implementation leverages these both quite effectively. Revactor and Rubinius Actors are largely duck typed to each other, so writing programs which are cross-compatible shouldn't be too much of a headache.
InfoQ: Are there advantages of Actors or Revactor running on Rubinius (over Ruby 1.9)?
Tony Arcieri: For the time being Ruby 1.9 generally performs better and has greater compatibility with existing libraries. Rubinius is a work in progress and is presently undergoing a rewrite. In the future there are a number of advantages that Rubinius will have over Ruby 1.9, namely that the Task/Channel abstraction for concurrency and I/O is so much cleaner than what exists in Ruby 1.9. The solution for performant I/O on both 1.8 and 1.9 has long been to run an event framework such as EventMachine or Rev side-by-side with Ruby's built-in I/O, whereas Rubinius actually gets I/O right from the start.
InfoQ: Do you know of projects using Revactor?
Tony Arcieri: I've talked with various people who are using it for internal projects, mostly having to do with concurrent HTTP clients. I don't know of any released projects which are using it.
InfoQ: Is the dependency on Ruby 1.9 a problem for the adoption of Revactor, or of libraries using Fibers in general?
Tony Arcieri: I would certainly imagine so. I've encountered my fair share of bugs dealing with Ruby 1.9 and imagine most people are wary of even trying to use it at all. That said, I've seen a number of projects springing up lately which try to couple an event framework with Fibers, such as Ry Dahl's Flow web server.