Posted by Jonathan Allen on Feb 28, 2008 12:53 AM
Microsoft and Unisys are claiming that they hold the record for loading information into a relational database. The unofficial benchmark was 1 TB of TPC-H data moved in under 30 minutes using an Extract, Transform, and Load (ETL) tool. The previous record for that volume was 45 minutes and was held by Informatica.
The specific claim is:
More than one terabyte of data was parsed from flat files, transferred over the network and loaded into the destination database in less than 30 minutes, a world record beating all previously published results using an ETL tool. That is a rate in excess of 2 TB per hour (650+ MB/second). To be precise, 1.18TB of flat file data was loaded in 1794 seconds. This is equivalent to 1.00TB in 25 minutes 20 seconds or 2.36TB per hour.
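The figures in the claim can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original announcement: the function name `throughput` is illustrative, and it assumes decimal units (1 TB = 1,000,000 MB), which the source does not specify.

```python
def throughput(terabytes: float, seconds: float) -> dict:
    """Derive load rates from a data volume and an elapsed time.

    Assumption: decimal units, i.e. 1 TB = 1,000,000 MB.
    """
    tb_per_hour = terabytes / seconds * 3600
    mb_per_second = terabytes * 1_000_000 / seconds
    # Time needed to load exactly 1 TB at the same rate.
    seconds_per_tb = seconds / terabytes
    minutes, secs = divmod(round(seconds_per_tb), 60)
    return {
        "tb_per_hour": tb_per_hour,
        "mb_per_second": mb_per_second,
        "seconds_per_tb": seconds_per_tb,
        "minutes": minutes,
        "secs": secs,
    }


# Figures quoted in the claim: 1.18 TB loaded in 1794 seconds.
result = throughput(1.18, 1794)

# Previous record cited in the article: 1 TB in 45 minutes.
previous_seconds_per_tb = 45 * 60
speedup = previous_seconds_per_tb / result["seconds_per_tb"]

print(f"{result['tb_per_hour']:.2f} TB/hour")
print(f"{result['mb_per_second']:.0f} MB/second")
print(f"1 TB in {result['minutes']} min {result['secs']} s")
print(f"{speedup:.2f}x the previous record's rate")
```

Under these assumptions the results agree with the quoted numbers to within rounding: about 2.37 TB per hour (quoted as 2.36), about 658 MB per second, and 25 minutes 20 seconds per terabyte, or roughly 1.8 times the rate of the previous 45-minute record.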
The ETL benchmark uses TPC-H data but is not an official benchmark of the Transaction Processing Performance Council. That has not stopped companies such as Informatica from bragging about their performance. Microsoft says that ETL benchmarks are important because they represent common real-world scenarios.
It is rare in businesses today that data is always available on the destination system, and does not need to be standardized or corrected for errors before loading. These rare cases are the times that bulk loading data makes sense. Data integration can involve complex transformation rules, error checking and data standardization techniques. ETL tools like SSIS can perform these functions such as moving data between systems, reformatting data, integrity checking, key lookups, tracking lineage, and more. SSIS has proven itself to be a versatile ETL tool, and now it is shown to be the fastest one as well.
The hardware used for this feat was far from standard and is out of reach for most companies.
The database server ran on a Unisys ES7000/one Enterprise Server, with 32 socket dual core Intel® Xeon™ 3.4 GHz (7140M) processors, 256 GB RAM and 8 dual port 4Gbit HBA’s. The SQL Server data was stored on an EMC Clariion CX3-80 SAN with 165 (146 GB/15 krpm) spindles. The database server ran a pre-release build of SQL Server 2008 Enterprise Edition (V10.0.1300.4, built just before the “February 2008 CTP”) on the Windows Server 2008 x64 Datacenter Edition operating system.
Four servers acted as data sources, modeling the fact that data comes from a variety of systems in a modern enterprise. Each source server ran SSIS packages that sent data across the network to the database server. The source servers ran SSIS from SQL Server build V10.0.1300.4, on the Windows Server 2008 operating system. Source data came from flat files, as it was generated by DBGEN.
For the source servers, 4 Unisys ES3220L servers with Windows2008 x64 Enterprise Edition were used. Each server is equipped with 2 x 2.0GHz quad core Intel® processors, 4GB RAM, a dual port 4Gbit Emulex HBA and Intel PRO1000/PT network card. The source data was read from 2 x EMC Clariion CX600 SAN’s with 45 spindles each.
The white paper on the benchmark has not been released yet.