InfoQ

Databases Roundup: Data Sharding for ActiveRecord and Faster Postgres IO

Posted by Mirko Stocker on Jul 21, 2008

In this databases roundup we take a look at a new data sharding plug-in for ActiveRecord and how Postgres data access can be improved with the asynchronous client API.

Data Sharding for ActiveRecord

Data sharding is a technique that breaks a database into smaller partitions and distributes them over several servers to improve performance and scalability. How the data is partitioned is highly application dependent; eBay, for example, could partition by article category.
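As a rough illustration of partitioning by a key such as a category, the following sketch maps each key to one of a fixed set of shards. The shard names and the `shard_for` helper are hypothetical and only meant to show the idea; a real application would pick its own partitioning scheme.

```ruby
require "zlib"

# Hypothetical shard names; placeholders, not real servers.
SHARDS = %w[db_shard_0 db_shard_1 db_shard_2].freeze

# Map a partition key (here an article category) to one shard.
# CRC32 is used because it gives the same value in every process,
# which String#hash does not guarantee.
def shard_for(category)
  SHARDS[Zlib.crc32(category.to_s) % SHARDS.size]
end

# The same category always lands on the same shard.
puts shard_for("books")
puts shard_for("electronics")
```

A simple modulo scheme like this one reshuffles most keys when the number of shards changes, which is one reason the partitioning strategy is so application dependent.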

ActiveRecord does not support data sharding out of the box; this is where FiveRuns' DataFabric comes into play. DataFabric is an ActiveRecord plug-in that adds data sharding and replication abilities to your models.

Introducing sharding into your models is quite easy, as this example from the README shows:

class MyHugeVolumeOfDataModel < ActiveRecord::Base
  data_fabric :replicated => true, :shard_by => :city
end

See the FiveRuns blog or the DataFabric GitHub repository for more information.

Faster IO for Postgres

In other database-related news, Muhammad Ali was able to boost the performance of Ruby's Postgres access by about 40%. He uses Postgres' asynchronous client API and Ruby 1.9 Fibers to implement a nonblocking connection pool and a fiber pool. From the user program's perspective, the interaction looks as follows:

[..] once a fiber calls cpool.exec the query is sent to the pool for processing and the fiber is halted, giving way for another one to start processing. The other one will halt as well once it hits a cpool.exec. Later during the event loop you will get notifications of completion of queries (in any order) and resume the fiber associated with the finished query.
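The control flow described in the quote can be sketched with plain Ruby Fibers. This is a toy model under stated assumptions, not Muhammad's code: `ToyPool` is a hypothetical stand-in that talks to no database, and its `run` method simulates the event loop by completing queries in reverse order.

```ruby
require "fiber"

# Hypothetical stand-in for a nonblocking connection pool (no real database).
class ToyPool
  def initialize
    @pending = [] # [fiber, query] pairs waiting for a result
  end

  # Called inside a fiber: register the query, then pause this fiber.
  # Fiber.yield returns the value later passed to resume.
  def exec(query)
    @pending << [Fiber.current, query]
    Fiber.yield
  end

  # Simulated event loop: finish queries in reverse order to show that
  # completion order need not match submission order.
  def run
    until @pending.empty?
      fiber, query = @pending.pop
      fiber.resume("result of #{query}")
    end
  end
end

pool = ToyPool.new
log = []

%w[q1 q2].each do |name|
  Fiber.new do
    log << "sent #{name}"
    log << pool.exec("SELECT '#{name}'") # fiber halts here until resumed
  end.resume
end

pool.run
puts log
```

Both fibers send their query and halt before either result arrives; each one continues only when the loop resumes it with its own result.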

Muhammad is also considering a tighter integration with EventMachine, which might improve performance even more. Take a look at Muhammad's blog, where he describes his analysis in full detail and shares the code he used.
