New-age Transactional Systems - Not Your Grandpa's OLTP
John Hugg discusses high volume transaction processing applications with high and low frequency profiles, and how VoltDB can be used for that purpose.
The content has been bookmarked!
There was an error bookmarking this content! Please retry.
Posted by Jonathan Allen on Jan 07, 2008
For the most part LINQ works very much like SQL. Sources, joins, selects, and where clauses are all pretty standard fare. The Group/By/Into clause is where this breaks down. Unlike SQL, which always returns a rectangular dataset, LINQ is capable of returning hierarchical data.
An example of this would be grouping customers by country and city. In SQL this would require either manually grouping on the client-side or performing 1 + N + (N*M) queries: one to get the countries, one for the cities in each country, and one for each customer list.
This can all be done with a single LINQ query by using a series of sub queries. However, the query gets increasing complex. Mitsu demonstrates:
var q = from c in db.Customers group c by c.Country into g select new { g.Key, Count = g.Count(), SubGroups = from c in g group c by c.City into g2 select g2};
Demonstrating the flexibility the LINQ framework, Mitsu reduces it down this single line:
var result = customers.GroupByMany(c => c.Country, c => c.City);
Mitsu did this in a general fashion suitable for any LINQ query. You can see the source code and an explanation of how it works on his blog.
Using Drools? See what you're missing! Get the Power of Drools with the Assurance of Red Hat
Fair Trade Software Licensing - A Guide to Neo4j Licensing Options
Complimentary Gartner (Hype Cycle for Cloud Security) Report
It can be done in 1 query?
select count(customerid) as numcustomer, country, city
from customers
group by country, city
order by country asc, numcustomer desc
Then all you need is a reader which reads it in, one row at a time. You have to traverse the list just once.
Having it projected for you in a hierarchy is great of course, but I find it misleading you'd need 1+N+(N*M) queries, which is simply nonsense.
The downside of this particular linq construct is that the line between what's executed in the DB and what's executed in memory is blurred. Can you tell which group by is ran in the DB and which one in memory? IMHO it's better to keep things separated, so a developer doesn't fall into the trap where s/he thinks the query runs on the DB but actually it's ran in memory...
Heh, actually, I ran the Linq query above using Linq to Sql and I get... I myriad of queries! Not just 1, but a lot.
This is obvious because Linq to Sql can't merge hierarchical resultsets in memory, it does that by postponing queries for hierarchical data till the parent is evaluated (the customer) and then fetches for that row the child rows.
So, although the idea is OK, don't be fooled by the MS marketing machine: unless the end result is 1 SQL query as I described above OR 2 (one for parents and 1 for children which are merged at runtime), you WILL get a lot of queries, and this will bring down performance tremendously.
Indeed, even the old dataset methods had better ways of doing it than LINQ to SQL, if what you say is true (I didn't play much with LINQ to SQL, but I thought it could actually handle this particular scenario correctly... I'm probably mistaken).
Seriously though, LINQ is getting to be pretty disapointing... LINQ to SQL is a joke (though it is nice to have around for simpler applications, being built in all). Using LINQs on collections is much more viable, but even that is inefficient as hell, and anywhere where you need performance, you're better off doing things with the extension methods on your own...
I guess LINQ to SQL is like everything else in .NET, an entry level tool, in the same way the basic toolbox controls for winforms and asp.net are just entry level stuff, and when you get serious you roll up your owns or use third party controls like Component Art, Telerik, DevExpress... Same deal now. Use LINQ to SQL for prototypes or entry level projects, and use something like LLBLGEN when you get serious.
At least making a LINQ provider doesn't look TOO hard.
It can be done in 1 query?
select count(customerid) as numcustomer, country, city
from customers
group by country, city
order by country asc, numcustomer desc
That won't return all of the customer records, a goal of the above query.
Also, you still have to group all the cities in each country.The downside of this particular linq construct is that the line between what's executed in the DB and what's executed in memory is blurred. Can you tell which group by is ran in the DB and which one in memory? IMHO it's better to keep things separated, so a developer doesn't fall into the trap where s/he thinks the query runs on the DB but actually it's ran in memory...
I do believe that it is going to be a major performance problem down the road. As you pointed out, it can be hard to separate database queries from in-memory queries.
Fortunately it is quite easy to figure out what LINQ to SQL is doing. You just need to pipe the DataContext.Log property to a stream like Console.Out.
If you really want to make sure the query is run in-memory, you can call .ToList or .ToArray. The first query can still be in the database, but at least you know everything against the List or Array will be local.
I think the correct way to handle that would be to run a single query that gets back all the information you need, and turn it into a List. Then run all your grouping queries against that in-memory list.
That said, doing it the lazy, one-query way is going to be really, really tempting.
There are ways to improve in-memory queries. For example, there is "i4o" for indexing in-memory objects. www.infoq.com/news/2007/04/i40-intro
That won't return all of the customer records, a goal of the above query.
No, the initial linq query also doesn't fetch the customer rows. This is logical, as 'c' isn't in scope after the 'into', also because if you would want to fetch all the customer data, you'd have a groupby query which had to group on all customer fields.Also, you still have to group all the cities in each country.
No :) You can use a loop which traverses the results in 1 go and builds the same hierarchy using an O(n) algorithm. The count per city is already there, so all you have to do is add them per country, which can be done when you loop over the rows. You just have to compare previous with current row if it is a new country, but that's it.
Doing everything in memory is one way to do it, but it's still worse than O(n) because of the nature how the linq query has to be constructed: the nested grouping for the city is in the projection, so you can't execute that one AS WELL when traversing the list for grouping the city. If you want to do that internally in linq to objects, you've to analyze the whole query, which looks easy but is actually quite complex.
John Hugg discusses high volume transaction processing applications with high and low frequency profiles, and how VoltDB can be used for that purpose.
Kevlin Henney examines code samples to see what can be learned from them starting from the premise that one won’t write great code unless he knows how to read it.
Jason Ayers share the observations he made watching a team of developers collaborating in real time on the same code base, pushing XP, pair programming and continuous integration to their extremes.
Michael Snoyman presents Yesod, a web framework written in Haskell and containing a web server, templating, ORM, libraries (templating, gravatar, etc.).
Richard Kreuter and Kyle Banker on how to avoid classical RDBMS transactional systems by using compensation mechanisms, transactional messaging or transactional procedures.
Attila Szegedi talks about performance tuning Java and Scala programs at Twitter: how to approach GC problems, the importance of asynchronous I/O, when to use MySQL/Cassandra/Redis, and much more.
One category of risk that project teams need to ensure they address is business value failure – delivering a product that fails to provide value for the business investor.
InfoQ spoke to the authors of Software Systems Architecture on a couple of new topics, the System Context viewpoint and Agile, which have been added to the second edition.
8 comments
Watch Thread Reply