Data Warehousing Content on InfoQ

  • Better Developer Experience in Version 1.5 of the Data Access Framework MetaModel

    Eobjects.org's open-source Java framework MetaModel implements a unified API for accessing, exploring, and querying different datastores. Eobjects.org, both a website and an open source software organization dedicated to "the development of Open Source software related to Business Intelligence and Data Warehousing", has recently published version 1.5 of MetaModel.

  • A Case for Graph Databases

    We talk with Daniel Kirstenpfad, founder and CTO of sones GmbH, about graph databases and how they can better model some types of data, such as relations in a social networking application. Graph databases can offer performance benefits over other types of databases because they explicitly represent a graph and are organized to have index-free adjacency.

  • LinkedIn's Data Infrastructure

    Jay Kreps of LinkedIn presented some informative details of how the company processes data at the recent Hadoop Summit. Kreps described how LinkedIn crunches 120 billion relationships per day and blends large-scale data computation with high-volume, low-latency site serving.

  • Facebook on Hadoop, Hive, HBase, and A/B Testing

    The Hadoop Summit of 2010 included presentations from a number of large-scale users of Hadoop and related technologies. Notably, Facebook presented a keynote and detailed information about its use of Hive for analytics. Mike Schroepfer, Facebook's VP of Engineering, delivered a keynote describing the scale of the company's data processing with Hadoop.

  • Emergent Data Architectures Highlights From GigaOm Structure Conference

    The GigaOM Structure conference a couple of weeks ago addressed many areas of cloud computing. One of the key themes of the event was the emergence of new data architectures. Throughout the panels, interviews, and presentations, many speakers identified significant upcoming changes in how data is handled.

  • Mahout 0.3: Open Source Machine Learning

    The need for machine-learning techniques like clustering, collaborative filtering, and categorization has steadily increased over the last decade, along with the number of solutions needing quick and efficient algorithms to transform vast amounts of raw data into relevant information. Apache Mahout 0.3 was announced in March, adding more functionality, stability, and performance.

  • Information Can Be Sold and Bought in “Dallas”

    Microsoft’s service codenamed “Dallas” is an information marketplace that brings together data, imagery, and service providers and their consumers, facilitating information exchange through a single point of access.

  • MapPoint Add-In For SQL Server Released

    Microsoft released a free MapPoint 2009 Add-In for SQL Server 2008 spatial data. The add-in can be used with MapPoint to build map graphics against queries on SQL Server 2008 spatial geography columns.

  • Is the NoSQL Meeting Announcing the End of the RDBMS Era?

    The NoSQL meeting tried to raise awareness of the opportunity to use non-relational databases, which promise to be cheaper, simpler to administer and maintain, and more scalable. Michael Stonebraker, co-creator of Ingres and Postgres, thinks that the end of the RDBMS era is close, while others think that we are not there yet.

  • SQL Server 2008 Major Updates

    Microsoft has released significant updates for SQL Server 2008, including 2008 Service Pack 1, Express Edition Service Pack 1, a large Feature Pack, Upgrade Advisor, SRS Report Builder 2.0, SharePoint integration, and data mining Office add-ins.

  • Event Stream Processing: Scalable Alternative to Data Warehouses?

    Dan Pritchett suggests that analyzing streams of events using an Event Stream Processor could be an interesting alternative to data warehousing applications, which have, in his opinion, important downsides in terms of cost, scalability, and reactivity.

  • Introducing the Microsoft Sync Framework (Again)

    Back in August, we reported on the release of the Microsoft Sync Framework. Strangely enough, Microsoft has recently released it again. In honor of this bizarre event, we are following up with what information we have on this muddled framework.

  • Is Enterprise Data Management the Third Face of the SOA/BPM Coin?

    Fred Cummins, an EDS fellow and SOA veteran, wrote an essay last week on "Data Management for SOA". He looks at how some of the key tenets of service design ("loose coupling" and "autonomy") relate to enterprise data in the context of achieving reuse and enabling change.

  • Agile Business Intelligence

    Large, centrally designed BI systems often don't meet the expectations of their end users. In this article in the Cutter IT Journal, Scott Ambler writes about using Agile methods to help meet users' expectations and deliver business value quickly.

  • Microsoft Claims to Hold the ETL Record at 1 TB in 30 Minutes

    Microsoft and Unisys are claiming that they hold the record for loading information into a relational database. The unofficial benchmark was 1 TB of TPC-H data moved in under 30 minutes using an Extract, Transform, and Load (ETL) tool. The previous record for that volume was 45 minutes and was held by Informatica.
