Bindings, Platforms, and Innovation
This presentation focuses on the Internet and separating myth from fact, history from the future, and the mundane from the imaginative. Bob Frankston presents a vision of what could and should be.
Tracking change and innovation in the enterprise software development community
Posted by Steven Robbins on Jun 12, 2008 03:54 PM
The architecture underlying the very popular social application Twitter has been at the center of several discussions lately. Twitter had several instances of downtime and had turned off several popular features as the team tried to deal with the issues. What can be learned from looking at how Twitter tries to move forward?It doesn't do that [make a copy of the message for every user's follower], but that actually might be more efficient. Right now it goes into a database and when people want to get their timeline we construct the timeline out of the database and then, not every time, we then cache it in memory. But because things are written so often we are going to the database a lot just to update the cache. So there are lots of copies [of a message] in the cache but there's only one on disk. Our future architecture may be more like we're writing it many times because reading it will be a lot faster that way.The possibility of moving away from an SIS message architecture opens the door to using data techniques like Data Sharding that are already popular with many high-volume sites and applications. Randy Shoup talked about ways that eBay architected their systems for high scalability, in part, by using sharding:
The more challenging problem arises at the database tier, since data is stateful by definition. Here we split (or "shard") the data horizontally along its primary access path. User data, for example, is currently divided over 20 hosts, with each host containing 1/20 of the users. As our numbers of users grow, and as the data we store for each user grows, we add more hosts, and subdivide the users further. Again, we use the same approach for items, for purchases, for accounts, etc. Different use cases use different schemes for partitioning the dataBogdan Nicolau wrote an overview on the basics of database sharding. In the series, Bogdan discussed how to decide where and how to divide the data for an application. The main point in deciding was
What I’m trying to say is that no matter the logic you chose to split a table, always keep in mind that you want 0 join, order by or limit clauses which would require more than one table shards.Bogdan moved on to the application side of using shards. Along with providing several code samples to go along with an example problem, Bogdan gave reasons for why they should work:
As you can see, the weight now sits in the writing part, as the mapping table must be populated. When reading, the splitting of the data algorithm involved is no longer a concern.With several people involved in the discussions around how to scale Web 2.0, perhaps Twitter will continue to move towards a more stable, scalable architecture.
Comprehensive Threat Protection for REST, SOA, and Web 2.0 Applications
5 Ways to Ensure Application Performance
The Role of Open Source in Data Integration
Some things to help with shards:
Hibernate Shards
Metamatrix Incredible at federating and mapping data.
Any solution that includes writing to a database as part of the critical path of adding a status is asking for problems. I've actually been working on a micro-blogging app and it's amazing how poorly architected for this purpose Twitter has been. Messaging systems aren't new or novel, there's no reason to re-invent the wheel, only making it square this time.
This presentation focuses on the Internet and separating myth from fact, history from the future, and the mundane from the imaginative. Bob Frankston presents a vision of what could and should be.
This article explores the use of JBoss and jBPM to implement design solutions that effectively address the issue of orchestrating long running activities.
This presentation covers the use of graph databases as an optimal solution for data that is difficult to fit in static tables, rapidly evolving data or data that has a lot of optional attributes.
This session introduces Real Options and shows how it can help in running your project. Real Options is a decision-making process that can be used to manage risk.
This article discusses the use of bindings on services and references (including the instance of non-configured bindings) as the means to implement SCA communications in a Web and SOA environment.
After a short introduction to DSLs, Scott Davis plays with the keyboard showing how to approach the creation of a DSL by typing working snippets of Groovy code that get executed.
IBM Rational and InfoQ present, Scaling Agile with C/ALM, an eBook showing organizations how to become “finely tuned software delivery machines” by enabling team integration and scaling.
Amanda Laucher presents a real life enterprise application written in F#. She shows actual code snippets, explaining design decisions and suggesting how to use some of the F# constructs.
2 comments
Watch Thread Reply