New-age Transactional Systems - Not Your Grandpa's OLTP
John Hugg discusses high volume transaction processing applications with high and low frequency profiles, and how VoltDB can be used for that purpose.
The content has been bookmarked!
There was an error bookmarking this content! Please retry.
Posted by Jonathan Allen on Nov 02, 2008
First, some background those of you not familiar with full text search. In general computer science terms, a full text search simply means you are search through all of the text in a document. The alternative to this is just looking at meta-data such as titles and keywords.
In terms of SQL Server, Full Text Search provides advanced search capabilities against text stored in a relational database or on the file system. Searches are not limited to literal strings, the application understands such as stemming. This allows a search for "swim" to also return "swims", "swimming", and "swam". It also has support for weighted searches, where certain words are more important than other words, and searches where two phrases must be near each other. Depending on the search criteria, the results can be a ranked.
In prior versions, Full Text Search was an external service that ran alongside the rest of SQL Server. With this design, data from tables and columns to be indexed have to be shipped from SQL Server to the Full Text Search service. Full Text Search catalogs are not backed up with the rest of a database and the two services cannot easily share memory and CPU resources.
To address these and other issues, SQL Server 2008 moved Full Text Search into the database. Now server resources can be dynamically managed by SQL Server itself, shifting memory and CPU allotments automatically as demand changes. Unfortunately, developers are running into some unintended consequences of this new design.
The specific problem they keep running into is transactions. Being a transactional database, SQL Server wants to abide by the rules of ACID at all times. This means rows, pages, or even whole tables can be locked while a search is being performed. For uncommon terms this isn't so bad, but as Brent Ozar explains, the wrong search can make things get messy.
If you do a full text search on Revisions and you include a common keyword like, say, SQL, you’re going to match tens of thousands of records. When I look at the query plans for these, I’m seeing 50-100k reads. Doing that inside a table that’s also getting heavy inserts - boom, transactional disaster.
Jeff Attwood continues,
We rely heavily on full-text search on stackoverflow.com, which worked amazingly well for us under SQL Server 2005. Looks like that’s no longer the case for SQL Server 2008, unfortunately.
Brent is following up with the SQL Server team on this, and they have a copy of our database to test against. […] Based on the stunningly poor SQL Server 2008 full text results so far, and the apparent architecture changes, I’m pessimistic that the SQL team will be able to do anything for us.
StackOverflow, the site they are referring to, isn't planning on using Full Text Search over the long run anyways. They have been planning on eventually migrating to competing search engine called Lucene.Net. But those of you who are planning on continuing to use Full Text Search should after upgrading from SQL Server 2005 to 2008 should test this area thoroughly.
Using Drools? See what you're missing! Get the Power of Drools with the Assurance of Red Hat
Fair Trade Software Licensing - A Guide to Neo4j Licensing Options
We (SQL Server team) have been working with StackOverflow and identified the issue they've been hitting.
Turned out the problem had 2 components - one is a query plan issue (can be worked around with a simple re-write), exposing a genuine bug that we're working on fixing in a QFE within a couple of weeks.
Even w/o the QFE we expect that the workaround should get StackOverflow performance back into milliseconds for the search queries.
-Denis, Microsoft SQL Server Full-Text Search team.
Hello, Mr. Churin! I Am sorry for my bad English and the message not on a theme. I search for the relative. His name is Denis Viktorovich Churin, it has finished МФТИ and has moved from Moscow to Seattle where probably and lives. It the programmer. Here my e-mail: 43zdv@rambler.ru. ICQ: 476467654.
John Hugg discusses high volume transaction processing applications with high and low frequency profiles, and how VoltDB can be used for that purpose.
Kevlin Henney examines code samples to see what can be learned from them starting from the premise that one won’t write great code unless he knows how to read it.
Jason Ayers share the observations he made watching a team of developers collaborating in real time on the same code base, pushing XP, pair programming and continuous integration to their extremes.
Michael Snoyman presents Yesod, a web framework written in Haskell and containing a web server, templating, ORM, libraries (templating, gravatar, etc.).
Richard Kreuter and Kyle Banker on how to avoid classical RDBMS transactional systems by using compensation mechanisms, transactional messaging or transactional procedures.
Attila Szegedi talks about performance tuning Java and Scala programs at Twitter: how to approach GC problems, the importance of asynchronous I/O, when to use MySQL/Cassandra/Redis, and much more.
One category of risk that project teams need to ensure they address is business value failure – delivering a product that fails to provide value for the business investor.
InfoQ spoke to the authors of Software Systems Architecture on a couple of new topics, the System Context viewpoint and Agile, which have been added to the second edition.
2 comments
Watch Thread Reply