Thursday, January 14, 2010

State of MySQL Search

Recently I took part in the first meeting of the MySQL Search Team. You can read more about the team here.

My task was to represent customer interests regarding fulltext search. My report made a few main points:

  • MySQL Fulltext Search (FTS) has several big issues compared to other solutions. I have listed these in the order of importance as I see it. Numbers 1 and 2 are very close in importance, however, so I can see those going either way.

    1. MyISAM only: This limits the usefulness in many very busy environments due to table-level locking and the lack of crash safety.

    2. Performance: The design of FTS causes performance to degrade once the index size exceeds RAM. Combining multi-second queries with MyISAM table-level locking is a recipe for failure.

    3. Features: FTS lacks a lot of features that exist in other products. Stemming, synonyms, dictionary usage, CJK support, etc. are all potential avenues of investigation.

  • Third-party solutions are commonly used for searching data in MySQL, but they do require extra work to deploy or implement. Tools like Lucene, Sphinx, etc. are very good at what they do, and are normally faster and more fully featured than MySQL FTS. The big issue is that implementing and maintaining them normally takes some effort, so they are not as easy to use as MySQL FTS.



So the forward-looking question is how to combine the best attributes of these two solutions into one robust solution that is easy to deploy and use. Personally, I can see a few possible routes forward:
  • Improve MySQL FTS! There is a lot of research and many known best practices for search. This route would leverage those to make the MySQL solution more robust and feature-rich. Given the current state of FTS, it would be a pretty decent amount of work, but it could have good returns.
  • Improve the glue code between the third parties and MySQL. This could take a variety of forms. One idea I had was index plugins: as rows are inserted, updated, or deleted, the external datastores are automatically updated as well. This is very similar to triggers, but differs in that it allows tighter integration and easier deployment.
The best solution may be a mix of the two options to allow for more flexibility, or even something completely new that I haven't thought of!
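
To make the index plugin idea a bit more concrete, here is a rough sketch in Python. It is only an illustration of the concept, not anything that exists in MySQL: the `Table` and `InvertedIndex` classes and all of their methods are hypothetical names made up for this example, and the in-memory index is just a toy stand-in for an external engine such as Sphinx or Lucene.

```python
from collections import defaultdict


class InvertedIndex:
    """Toy stand-in for an external search engine (hypothetical)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of row ids

    def add(self, row_id, text):
        for term in text.lower().split():
            self.postings[term].add(row_id)

    def remove(self, row_id, text):
        for term in text.lower().split():
            self.postings[term].discard(row_id)

    def search(self, term):
        return sorted(self.postings.get(term.lower(), set()))


class Table:
    """Hypothetical table that notifies registered index plugins on every row change."""

    def __init__(self):
        self.rows = {}     # row id -> text column
        self.plugins = []  # registered index plugins

    def register(self, plugin):
        self.plugins.append(plugin)

    def insert(self, row_id, text):
        self.rows[row_id] = text
        for plugin in self.plugins:
            plugin.add(row_id, text)

    def update(self, row_id, text):
        old = self.rows[row_id]
        self.rows[row_id] = text
        for plugin in self.plugins:
            # Drop the old terms before indexing the new text.
            plugin.remove(row_id, old)
            plugin.add(row_id, text)

    def delete(self, row_id):
        old = self.rows.pop(row_id)
        for plugin in self.plugins:
            plugin.remove(row_id, old)


index = InvertedIndex()
articles = Table()
articles.register(index)

articles.insert(1, "MySQL fulltext search")
articles.insert(2, "Sphinx search engine")
articles.update(1, "MySQL index plugins")
articles.delete(2)
```

The point of the sketch is that the application only ever talks to the table; the external index is kept in sync as a side effect of each insert, update, and delete, without any application-level trigger code.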

Regardless, I am happy that people at MySQL are finally taking a look at the poor state of the MySQL fulltext search environment and trying to figure out good new solutions for this space.