InfoQ


Performance Problems Mar SQL Server 2008 Full Text Search

Posted by Jonathan Allen on Nov 02, 2008 06:06 AM


First, some background for those of you not familiar with full text search. In general computer science terms, a full text search simply means searching through all of the text in a document, as opposed to looking only at metadata such as titles and keywords.
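To make the distinction concrete, here is a minimal Python sketch contrasting a metadata-only search with a full-text search. The `Document` type, the tokenizer, and the function names are hypothetical illustrations, not anything SQL Server exposes; in SQL Server such queries are written with the T-SQL `CONTAINS` or `FREETEXT` predicates against a full-text index.

```python
import re
from dataclasses import dataclass


@dataclass
class Document:
    # Hypothetical document shape: metadata (title, keywords) plus body text.
    title: str
    keywords: list[str]
    body: str


def tokenize(text: str) -> list[str]:
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def matches_metadata(doc: Document, term: str) -> bool:
    # Look only at the title words and the keyword list.
    term = term.lower()
    return term in tokenize(doc.title) or term in {k.lower() for k in doc.keywords}


def metadata_search(docs: list[Document], term: str) -> list[Document]:
    return [d for d in docs if matches_metadata(d, term)]


def full_text_search(docs: list[Document], term: str) -> list[Document]:
    # Everything metadata search finds, plus matches anywhere in the body text.
    return [
        d for d in docs
        if matches_metadata(d, term) or term.lower() in tokenize(d.body)
    ]


docs = [
    Document("Swimming tips", ["sport"], "Learn to float before you try the crawl."),
    Document("Weeknight dinners", ["food"], "Boil the pasta in salted water."),
]
print([d.title for d in metadata_search(docs, "pasta")])   # []
print([d.title for d in full_text_search(docs, "pasta")])  # ['Weeknight dinners']
```

A real engine would not rescan every body on each query; it builds an inverted index ahead of time, which is the role a SQL Server full-text index plays.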

In terms of SQL Server, Full Text Search provides advanced search capabilities against text stored in a relational database or on the file system. Searches are not limited to literal strings; the engine understands linguistic features such as stemming, which allows a search for "swim" to also return "swims", "swimming", and "swam". It also supports weighted searches, where certain words count for more than others, and proximity searches, where two phrases must appear near each other. Depending on the search criteria, the results can be ranked.
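The features listed above can be approximated in a toy model. The following Python sketch is an illustration under stated assumptions, not SQL Server's implementation: the `INFLECTIONS` table is a hypothetical, hand-written stand-in for SQL Server's language-specific stemmers, and the integer weights and token-distance rule for proximity are assumptions chosen for clarity. In T-SQL the corresponding features are reached through `CONTAINS` with `FORMSOF(INFLECTIONAL, ...)`, `ISABOUT(... WEIGHT(...))`, and `NEAR`, with ranked results available from `CONTAINSTABLE`.

```python
import re

# Hypothetical, hand-written table of inflectional forms standing in for a
# real language-specific stemmer. An algorithmic stemmer alone would not
# connect irregular forms such as "swam" to "swim".
INFLECTIONS = {
    "swim": {"swim", "swims", "swimming", "swam", "swum"},
}


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def expand(term: str) -> set[str]:
    """Return every known inflectional form of term, or just the term itself."""
    term = term.lower()
    for forms in INFLECTIONS.values():
        if term in forms:
            return forms
    return {term}


def inflectional_match(text: str, term: str) -> bool:
    # True if any token in text is an inflectional form of term.
    forms = expand(term)
    return any(tok in forms for tok in tokenize(text))


def weighted_score(text: str, weights: dict[str, int]) -> int:
    # Add the weight of each query term that appears in any of its forms.
    tokens = set(tokenize(text))
    return sum(w for term, w in weights.items() if tokens & expand(term))


def rank(texts: list[str], weights: dict[str, int]) -> list[str]:
    # Keep texts with a positive score, highest score first (stable for ties).
    scored = [(weighted_score(t, weights), t) for t in texts]
    ordered = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [t for score, t in ordered if score > 0]


def near(text: str, a: str, b: str, max_distance: int = 5) -> bool:
    # True if some form of a occurs within max_distance tokens of some form of b.
    tokens = tokenize(text)
    pos_a = [i for i, tok in enumerate(tokens) if tok in expand(a)]
    pos_b = [i for i, tok in enumerate(tokens) if tok in expand(b)]
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)


docs = [
    "She swam across the lake.",
    "He swims every morning in the pool.",
    "The lake was cold.",
]
weights = {"swim": 3, "lake": 1}
print(inflectional_match(docs[0], "swim"))            # True
print(rank(docs, weights))                            # all three, in list order (scores 4, 3, 1)
print(near(docs[1], "swim", "pool", max_distance=3))  # False: 5 tokens apart
```

Sorting on the score alone, with Python's stable sort, keeps tied documents in their original order; the `RANK` values SQL Server returns come from its own scoring and will differ from these toy weights.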

In prior versions, Full Text Search was an external service that ran alongside the rest of SQL Server. With this design, data from the tables and columns to be indexed had to be shipped from SQL Server to the Full Text Search service. Full Text Search catalogs were not backed up with the rest of a database, and the two services could not easily share memory and CPU resources.

To address these and other issues, SQL Server 2008 moved Full Text Search into the database. Now server resources can be dynamically managed by SQL Server itself, shifting memory and CPU allotments automatically as demand changes. Unfortunately, developers are running into some unintended consequences of this new design.

The specific problem they keep running into involves transactions. As a transactional database, SQL Server tries to abide by the ACID rules at all times. This means rows, pages, or even whole tables can be locked while a search is being performed. For uncommon terms this isn't so bad, but as Brent Ozar explains, the wrong search can make things messy.

If you do a full text search on Revisions and you include a common keyword like, say, SQL, you’re going to match tens of thousands of records. When I look at the query plans for these, I’m seeing 50-100k reads. Doing that inside a table that’s also getting heavy inserts - boom, transactional disaster.

Jeff Atwood continues,

We rely heavily on full-text search on stackoverflow.com, which worked amazingly well for us under SQL Server 2005. Looks like that’s no longer the case for SQL Server 2008, unfortunately.

Brent is following up with the SQL Server team on this, and they have a copy of our database to test against. […] Based on the stunningly poor SQL Server 2008 full text results so far, and the apparent architecture changes, I’m pessimistic that the SQL team will be able to do anything for us.

StackOverflow, the site they are referring to, isn't planning on using Full Text Search over the long run anyway; they have been planning to eventually migrate to a competing search engine called Lucene.Net. But those of you who plan to continue using Full Text Search should test this area thoroughly after upgrading from SQL Server 2005 to 2008.


    StackOverflow update

    Nov 7, 2008 1:27 PM by Denis Churin

    We (SQL Server team) have been working with StackOverflow and identified the issue they've been hitting. It turned out the problem had two components: a query plan issue (which can be worked around with a simple rewrite) and a genuine bug it exposed, which we're working on fixing in a QFE within a couple of weeks. Even without the QFE, we expect the workaround to get StackOverflow performance back into milliseconds for the search queries. -Denis, Microsoft SQL Server Full-Text Search team.

