InfoQ

Performance Problems Mar SQL Server 2008 Full Text Search

Posted by Jonathan Allen on Nov 02, 2008

Sections: Architecture & Design, Development, Operations & Infrastructure
Topics: .NET, Data Access, SQL Server
Tags: SQL Server 2008

First, some background for those of you not familiar with full text search. In general computer science terms, a full text search simply means you are searching through all of the text in a document. The alternative to this is just looking at meta-data such as titles and keywords.

In terms of SQL Server, Full Text Search provides advanced search capabilities against text stored in a relational database or on the file system. Searches are not limited to literal strings; the engine understands linguistic concepts such as stemming. This allows a search for "swim" to also return "swims", "swimming", and "swam". It also has support for weighted searches, where certain words are more important than other words, and searches where two phrases must be near each other. Depending on the search criteria, the results can be ranked.
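To make stemming, proximity, and ranking concrete, here is a minimal Python sketch of a toy inverted index. It is not how SQL Server implements Full Text Search; the `STEMS` table, the sample documents, and every function name are assumptions made up for illustration, and a real engine uses language-specific word breakers and stemmers.

```python
import re
from collections import defaultdict

# Hypothetical, hand-written stem table for illustration only; a real
# full-text engine uses language-specific word breakers and stemmers.
STEMS = {"swims": "swim", "swimming": "swim", "swam": "swim"}


def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def stem(word):
    """Map an inflected form to its base form, if the toy table knows it."""
    return STEMS.get(word, word)


def build_index(docs):
    """Build an inverted index: stem -> {doc_id: [word positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[stem(word)].setdefault(doc_id, []).append(pos)
    return index


def search(index, term):
    """Return ids of documents containing any known inflection of term,
    ranked by number of occurrences (ties broken by document id)."""
    postings = index.get(stem(term.lower()), {})
    return sorted(postings, key=lambda d: (-len(postings[d]), d))


def near(index, term_a, term_b, max_distance):
    """Return ids of documents where both terms occur within
    max_distance word positions of each other."""
    a = index.get(stem(term_a.lower()), {})
    b = index.get(stem(term_b.lower()), {})
    hits = []
    for doc_id in sorted(a.keys() & b.keys()):
        if any(abs(i - j) <= max_distance
               for i in a[doc_id] for j in b[doc_id]):
            hits.append(doc_id)
    return hits


docs = {
    1: "She swims every morning and swam yesterday too.",
    2: "Swimming in the lake is cold.",
    3: "The lake was frozen, so nobody went out.",
}
index = build_index(docs)
print(search(index, "swim"))               # [1, 2]
print(near(index, "swimming", "lake", 3))  # [2]
```

Document 1 ranks first because it contains two inflections of "swim", while a literal string match on "swim" would have found none of the three documents.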

In prior versions, Full Text Search was an external service that ran alongside the rest of SQL Server. With this design, data from the tables and columns to be indexed had to be shipped from SQL Server to the Full Text Search service. Full Text Search catalogs were not backed up with the rest of a database, and the two services could not easily share memory and CPU resources.

To address these and other issues, SQL Server 2008 moved Full Text Search into the database. Now server resources can be dynamically managed by SQL Server itself, shifting memory and CPU allotments automatically as demand changes. Unfortunately, developers are running into some unintended consequences of this new design.

The specific problem they keep running into is transactions. Being a transactional database, SQL Server wants to abide by the rules of ACID at all times. This means rows, pages, or even whole tables can be locked while a search is being performed. For uncommon terms this isn't so bad, but as Brent Ozar explains, the wrong search can get messy.

If you do a full text search on Revisions and you include a common keyword like, say, SQL, you’re going to match tens of thousands of records. When I look at the query plans for these, I’m seeing 50-100k reads. Doing that inside a table that’s also getting heavy inserts - boom, transactional disaster.

Jeff Atwood continues,

We rely heavily on full-text search on stackoverflow.com, which worked amazingly well for us under SQL Server 2005. Looks like that’s no longer the case for SQL Server 2008, unfortunately.

Brent is following up with the SQL Server team on this, and they have a copy of our database to test against. […] Based on the stunningly poor SQL Server 2008 full text results so far, and the apparent architecture changes, I’m pessimistic that the SQL team will be able to do anything for us.
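Ozar's point about common keywords can be illustrated with a small Python sketch. The corpus below is synthetic and the frequencies are assumptions chosen for the example, not measurements from StackOverflow or SQL Server; it only shows that the number of rows a search has to touch grows with how common the term is.

```python
from collections import defaultdict


def build_postings(rows):
    """Map each word to the set of row ids whose text contains it."""
    postings = defaultdict(set)
    for row_id, text in rows:
        for word in text.lower().split():
            postings[word].add(row_id)
    return postings


# Synthetic data (an assumption for illustration): every second row
# mentions "sql", while only one row in a thousand mentions "lucene".
rows = []
for row_id in range(10_000):
    words = ["revision"]
    if row_id % 2 == 0:
        words.append("sql")
    if row_id % 1000 == 0:
        words.append("lucene")
    rows.append((row_id, " ".join(words)))

postings = build_postings(rows)
for term in ("sql", "lucene"):
    # Each matching row is one the engine must read, and possibly lock,
    # while other sessions may be inserting into the same table.
    print(term, len(postings[term]))
```

With these made-up frequencies the common term matches 5,000 rows and the rare one matches 10, which is the kind of gap that makes a common-term search so much more likely to collide with concurrent inserts.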

StackOverflow, the site they are referring to, isn't planning on using Full Text Search over the long run anyway. They have been planning to eventually migrate to a competing search engine called Lucene.Net. But those of you who plan to keep using Full Text Search after upgrading from SQL Server 2005 to 2008 should test this area thoroughly.

    StackOverflow update

    by Denis Churin

    We (SQL Server team) have been working with StackOverflow and identified the issue they've been hitting.
    Turned out the problem had 2 components - one is a query plan issue (can be worked around with a simple re-write), exposing a genuine bug that we're working on fixing in a QFE within a couple of weeks.
    Even w/o the QFE we expect that the workaround should get StackOverflow performance back into milliseconds for the search queries.
    -Denis, Microsoft SQL Server Full-Text Search team.

    Re: StackOverflow update

    by Dmitriy Zolotarjov

    Hello, Mr. Churin! I Am sorry for my bad English and the message not on a theme. I search for the relative. His name is Denis Viktorovich Churin, it has finished МФТИ and has moved from Moscow to Seattle where probably and lives. It the programmer. Here my e-mail: 43zdv@rambler.ru. ICQ: 476467654.
