InfoQ


Performance Problems Mar SQL Server 2008 Full Text Search

Posted by Jonathan Allen on Nov 02, 2008

First, some background for those of you not familiar with full text search. In general computer science terms, a full text search simply means searching through all of the text in a document. The alternative is to look only at metadata such as titles and keywords.
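To make the distinction concrete, here is a toy sketch in Python. It is an illustration only, not SQL Server code, and the `Document` class, both search functions, and the sample data are hypothetical. A metadata search sees only titles and keyword tags, so it misses a word that appears only in the body, while a full text search finds it:

```python
import re
from dataclasses import dataclass, field


def tokenize(text: str) -> set[str]:
    """Lower-case the text and split it into words."""
    return set(re.findall(r"\w+", text.lower()))


@dataclass
class Document:
    title: str
    keywords: set[str] = field(default_factory=set)
    body: str = ""


def metadata_search(docs: list[Document], term: str) -> list[str]:
    """Match only titles and keyword tags (the metadata approach)."""
    term = term.lower()
    return [
        d.title
        for d in docs
        if term in tokenize(d.title) or term in {k.lower() for k in d.keywords}
    ]


def full_text_search(docs: list[Document], term: str) -> list[str]:
    """Match any word in the title or the body."""
    term = term.lower()
    return [d.title for d in docs if term in tokenize(d.title + " " + d.body)]


# Hypothetical sample documents.
docs = [
    Document("Tuning indexes", {"performance"}, "Covering indexes reduce reads."),
    Document("Release notes", {"changelog"}, "Full text search moved into the engine."),
]

print(metadata_search(docs, "search"))   # -> []
print(full_text_search(docs, "search"))  # -> ['Release notes']
```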

In terms of SQL Server, Full Text Search provides advanced search capabilities against text stored in a relational database or on the file system. Searches are not limited to literal strings; the engine understands linguistic features such as stemming. This allows a search for "swim" to also return "swims", "swimming", and "swam". It also supports weighted searches, where certain words are more important than others, and proximity searches, where two phrases must be near each other. Depending on the search criteria, the results can be ranked.
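In T-SQL these features are written as search conditions for the `CONTAINS` and `CONTAINSTABLE` predicates: `FORMSOF(INFLECTIONAL, ...)` for stemming, `ISABOUT(... WEIGHT(...))` for weighting, and `NEAR` for proximity. Weights influence the `RANK` column returned by `CONTAINSTABLE` rather than which rows match. The Python sketch below only assembles those condition strings; the helper names, the `Posts` table, and its `Id`/`Body` columns are hypothetical, and the `?` placeholder assumes an ODBC-style driver such as pyodbc.

```python
import re

# Hypothetical table and columns, used only for illustration. The search
# condition is passed as a parameter instead of being concatenated into SQL.
RANKED_QUERY = (
    "SELECT p.Id, ft.[RANK] "
    "FROM Posts AS p "
    "INNER JOIN CONTAINSTABLE(Posts, Body, ?) AS ft ON p.Id = ft.[KEY] "
    "ORDER BY ft.[RANK] DESC;"
)

_TERM = re.compile(r"\w+( \w+)*")


def _quote(term: str) -> str:
    """Double-quote a word or phrase; reject anything but words and single spaces."""
    if not _TERM.fullmatch(term):
        raise ValueError(f"unsupported search term: {term!r}")
    return f'"{term}"'


def inflectional(term: str) -> str:
    """Stemming: also match inflected forms, e.g. swim -> swims, swimming, swam."""
    return f"FORMSOF(INFLECTIONAL, {_quote(term)})"


def weighted(weights: dict[str, float]) -> str:
    """Weighting: give some terms more influence on RANK (weights 0.0 to 1.0)."""
    parts = []
    for term, weight in weights.items():
        if not 0.0 <= weight <= 1.0:
            raise ValueError("WEIGHT must be between 0.0 and 1.0")
        parts.append(f"{_quote(term)} WEIGHT({weight})")
    return "ISABOUT(" + ", ".join(parts) + ")"


def near(first: str, second: str) -> str:
    """Proximity: require two words or phrases to occur close to each other."""
    return f"{_quote(first)} NEAR {_quote(second)}"


print(inflectional("swim"))                           # -> FORMSOF(INFLECTIONAL, "swim")
print(weighted({"performance": 0.8, "search": 0.2}))  # -> ISABOUT("performance" WEIGHT(0.8), "search" WEIGHT(0.2))
print(near("full text", "search"))                    # -> "full text" NEAR "search"
```

With pyodbc, for example, a condition would be supplied as `cursor.execute(RANKED_QUERY, weighted({...}))`, which keeps user input out of the SQL text itself.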

In prior versions, Full Text Search was an external service that ran alongside the rest of SQL Server. With this design, data from the tables and columns to be indexed had to be shipped from SQL Server to the Full Text Search service. Full Text Search catalogs were not backed up with the rest of the database, and the two services could not easily share memory and CPU resources.

To address these and other issues, SQL Server 2008 moved Full Text Search into the database. Now server resources can be dynamically managed by SQL Server itself, shifting memory and CPU allotments automatically as demand changes. Unfortunately, developers are running into some unintended consequences of this new design.

The specific problem they keep running into involves transactions. As a transactional database, SQL Server wants to abide by the rules of ACID at all times. This means rows, pages, or even whole tables can be locked while a search is being performed. For uncommon terms this isn't so bad, but as Brent Ozar explains, the wrong search can make things messy:

If you do a full text search on Revisions and you include a common keyword like, say, SQL, you’re going to match tens of thousands of records. When I look at the query plans for these, I’m seeing 50-100k reads. Doing that inside a table that’s also getting heavy inserts - boom, transactional disaster.
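Why do common terms hurt so much more than rare ones? One rough way to see it is through an inverted index, the general structure behind full text search, which maps each word to the rows that contain it. The toy Python model below is an analogy, not a description of SQL Server's internals, and the `rows` data is made up: a term that appears in every row forces the search to visit every row, while a rare term touches only a handful.

```python
import re
from collections import defaultdict


def build_index(rows: dict[int, str]) -> dict[str, set[int]]:
    """Inverted index: map each lower-cased word to the ids of rows containing it."""
    index = defaultdict(set)
    for row_id, text in rows.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(row_id)
    return index


def rows_touched(index: dict[str, set[int]], term: str) -> int:
    """How many rows a single-term lookup has to visit."""
    return len(index.get(term.lower(), ()))


# Hypothetical stand-in for a busy "Revisions" table: every row mentions SQL,
# but only every 1,000th row mentions "collation".
rows = {
    i: f"Revision {i}: SQL query tuning" + (" and collation notes" if i % 1000 == 0 else "")
    for i in range(1, 10_001)
}
index = build_index(rows)

print(rows_touched(index, "SQL"))        # -> 10000
print(rows_touched(index, "collation"))  # -> 10
```

Scale that up to the tens of thousands of matches Brent Ozar describes, on a table that is also taking heavy inserts, and the contention he reports follows.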

Jeff Atwood continues:

We rely heavily on full-text search on stackoverflow.com, which worked amazingly well for us under SQL Server 2005. Looks like that’s no longer the case for SQL Server 2008, unfortunately.

Brent is following up with the SQL Server team on this, and they have a copy of our database to test against. […] Based on the stunningly poor SQL Server 2008 full text results so far, and the apparent architecture changes, I’m pessimistic that the SQL team will be able to do anything for us.

StackOverflow, the site they are referring to, isn't planning on using Full Text Search in the long run anyway. The team has been planning to eventually migrate to a competing search engine called Lucene.Net. But those of you who plan to continue using Full Text Search should test this area thoroughly after upgrading from SQL Server 2005 to 2008.


    StackOverflow update

    Nov 7, 2008 1:27 PM by Denis Churin

    We (SQL Server team) have been working with StackOverflow and identified the issue they've been hitting.
    It turned out the problem had two components: a query plan issue (which can be worked around with a simple rewrite), and a genuine bug that it exposed, which we're working on fixing in a QFE within a couple of weeks.
    Even without the QFE, we expect the workaround to get StackOverflow's search query performance back into milliseconds.
    -Denis, Microsoft SQL Server Full-Text Search team.

