Performance Problems Mar SQL Server 2008 Full Text Search
First, some background for those of you not familiar with full text search. In general computer science terms, a full text search simply means you are searching through all of the text in a document. The alternative is to look only at metadata such as titles and keywords.
In terms of SQL Server, Full Text Search provides advanced search capabilities against text stored in a relational database or on the file system. Searches are not limited to literal strings; the engine understands linguistic concepts such as stemming. This allows a search for "swim" to also return "swims", "swimming", and "swam". It also supports weighted searches, where certain words are more important than others, and proximity searches, where two phrases must be near each other. Depending on the search criteria, the results can be ranked.
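To make stemming and weighted ranking concrete, here is a minimal toy sketch in Python. It is not how SQL Server implements Full Text Search: the `INFLECTIONS` table, the `normalize` and `search` functions, and the sample documents are all hypothetical names invented for illustration, and a real engine uses language-specific word breakers and stemmers rather than a hand-written lookup table.

```python
from collections import Counter

# Hypothetical, hand-written inflection table (an assumption for this sketch).
# A real engine ships language-specific stemmers instead.
INFLECTIONS = {
    "swims": "swim",
    "swimming": "swim",
    "swam": "swim",
}


def normalize(word):
    """Lower-case a word, strip surrounding punctuation, map it to its base form."""
    word = word.lower().strip('.,!?;:"\'')
    return INFLECTIONS.get(word, word)


def search(documents, weighted_terms):
    """Return (doc_id, score) pairs, highest score first.

    weighted_terms maps each search word to a weight, so some words
    count more than others when the results are ranked.
    """
    weights = {normalize(term): weight for term, weight in weighted_terms.items()}
    results = []
    for doc_id, text in documents.items():
        # Count occurrences of each base form in the document.
        counts = Counter(normalize(w) for w in text.split())
        score = sum(weight * counts[term] for term, weight in weights.items())
        if score > 0:
            results.append((doc_id, score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


documents = {
    1: "She swam across the lake.",
    2: "Swimming in the pool is fun, and he swims daily.",
    3: "The lake was frozen.",
}

# "swim" matters twice as much as "lake" in this query.
ranked = search(documents, {"swim": 1.0, "lake": 0.5})
print(ranked)  # [(2, 2.0), (1, 1.5), (3, 0.5)]
```

Document 2 never contains the literal string "swim", yet it ranks first because both "Swimming" and "swims" reduce to the same base form. Document 3 matches only on the lower-weighted word and lands last.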
In prior versions, Full Text Search was an external service that ran alongside the rest of SQL Server. With this design, data from the tables and columns to be indexed had to be shipped from SQL Server to the Full Text Search service. Full Text Search catalogs were not backed up with the rest of a database, and the two services could not easily share memory and CPU resources.
To address these and other issues, SQL Server 2008 moved Full Text Search into the database. Now server resources can be dynamically managed by SQL Server itself, shifting memory and CPU allotments automatically as demand changes. Unfortunately, developers are running into some unintended consequences of this new design.
The specific problem they keep running into is transactions. Being a transactional database, SQL Server wants to abide by the rules of ACID at all times. This means rows, pages, or even whole tables can be locked while a search is being performed. For uncommon terms this isn't so bad, but as Brent Ozar explains, the wrong search can make things messy:
If you do a full text search on Revisions and you include a common keyword like, say, SQL, you’re going to match tens of thousands of records. When I look at the query plans for these, I’m seeing 50-100k reads. Doing that inside a table that’s also getting heavy inserts - boom, transactional disaster.
Jeff Atwood continues,
We rely heavily on full-text search on stackoverflow.com, which worked amazingly well for us under SQL Server 2005. Looks like that’s no longer the case for SQL Server 2008, unfortunately.
Brent is following up with the SQL Server team on this, and they have a copy of our database to test against. […] Based on the stunningly poor SQL Server 2008 full text results so far, and the apparent architecture changes, I’m pessimistic that the SQL team will be able to do anything for us.
StackOverflow, the site they are referring to, isn't planning on using Full Text Search over the long run anyway. They have been planning to eventually migrate to a competing search engine called Lucene.Net. But those of you who plan to continue using Full Text Search after upgrading from SQL Server 2005 to 2008 should test this area thoroughly.
Turned out the problem had 2 components - one is a query plan issue (can be worked around with a simple re-write), exposing a genuine bug that we're working on fixing in a QFE within a couple of weeks.
Even w/o the QFE we expect that the workaround should get StackOverflow performance back into milliseconds for the search queries.
-Denis, Microsoft SQL Server Full-Text Search team.
Re: StackOverflow update