Posted by Abel Avram on Dec 30, 2009
80legs uses Plura’s grid of over 50,000 computers to crawl more than 2 billion pages a day. Shion Deysarkar, 80legs CEO, says that its crawling services are generally requested by smaller search engines that cannot afford a large-capacity grid of their own, companies performing market research, organizations monitoring copyright infringement, and ad companies tracking what their competitors are doing.
The service can be accessed on demand by setting up a job and executing it. As with any crawling process, the job needs a seed list, which can be supplied as a text file of up to 1 GB. Other job parameters are:
When a job runs, the crawler reads web pages beginning with the seeds, follows outgoing links according to the job's link options, and analyzes the content of each page. Simple analysis is available by specifying keywords to match or by selecting information with regular expressions, while more complex analysis can be performed with a custom application or a pre-built 80legs application. The analysis application must be written in Java. 80legs plans to open an application store where developers can sell their applications at a price of their choosing and collect all the revenue. 80legs has also launched a contest to attract developers.
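The two simple analysis modes mentioned above, keyword matching and regular-expression selection, can be illustrated with a small sketch. This is not the 80legs API: the function names `match_keywords` and `extract_with_regex`, the sample page text, and the e-mail pattern are all hypothetical, and the example is written in Python for brevity even though 80legs analysis applications themselves are written in Java.

```python
import re


def match_keywords(text, keywords):
    """Return the keywords that occur in text, ignoring case."""
    lowered = text.lower()
    return [kw for kw in keywords if kw.lower() in lowered]


def extract_with_regex(text, pattern):
    """Return every non-overlapping match of pattern in text."""
    return re.findall(pattern, text)


# Hypothetical page content standing in for a crawled document.
page = "Contact sales@example.com or support@example.com about our crawler pricing."

# Keyword analysis: which of the requested terms appear on the page?
print(match_keywords(page, ["crawler", "copyright"]))

# Regex analysis: pull out anything that looks like an e-mail address.
# The pattern is deliberately loose and only meant for illustration.
print(extract_with_regex(page, r"[\w.+-]+@[\w-]+\.[\w.]+"))
```

A real job would run this kind of logic against every page the crawler fetches and aggregate the results; how 80legs does that internally is not described in the article.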
Paid subscriptions include access to a Python API for interacting with the crawling engine, and a Perl API is planned. Free subscribers can create and control their jobs through the 80legs Portal.
The free plan has some limitations: one job at a time, 100,000 pages of at most 100 KB each, a 10 MB analysis application (Java JAR), no API access, and 1 hit per second per crawled domain. There are two paid subscriptions; the top one offers 5 concurrent repeatable jobs with 10M pages per job, 10 MB per page, a 10 MB JAR, and 10 hits per second per domain, at $2 per million pages crawled plus 3 cents per CPU-hour used.
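As a rough worked example of the top plan's pricing, the sketch below estimates the cost of a single job from the two rates quoted above ($2 per million pages and $0.03 per CPU-hour). The function name `estimate_job_cost` is hypothetical, and the calculation assumes the two charges are simply added together with no minimum fee or rounding rule, which the article does not specify.

```python
from decimal import Decimal

# Rates and limit quoted in the article for the top paid plan.
PRICE_PER_MILLION_PAGES = Decimal("2.00")  # US dollars
PRICE_PER_CPU_HOUR = Decimal("0.03")       # US dollars
MAX_PAGES_PER_JOB = 10_000_000


def estimate_job_cost(pages, cpu_hours):
    """Estimate the cost in dollars of one job (assumption: charges are additive)."""
    if pages < 0 or cpu_hours < 0:
        raise ValueError("pages and cpu_hours must be non-negative")
    if pages > MAX_PAGES_PER_JOB:
        raise ValueError("the top plan allows at most 10M pages per job")
    page_cost = PRICE_PER_MILLION_PAGES * Decimal(pages) / Decimal(1_000_000)
    cpu_cost = PRICE_PER_CPU_HOUR * Decimal(str(cpu_hours))
    return page_cost + cpu_cost


# A full 10M-page job that consumes 100 CPU-hours of analysis time.
print(estimate_job_cost(10_000_000, 100))
```

Under these assumptions, the page charge dominates for crawl-heavy jobs: the 10M-page job above costs $20 in page fees and $3 in CPU time.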