Monday, August 10, 2009

Introduction to Sphinx Search

About Sphinx

Sphinx Search? What is this?

That might be the question on your mind! If you use some of the finest search engines on the Internet, chances are you have already used Sphinx.
Sphinx is an acronym for SQL Phrase Index. It was created by Andrew Aksyonoff.
Sphinx Search is a full-text search engine. Basically, it is standalone software designed to connect to a database. Currently Sphinx Search works with several data source drivers (the sources from which Sphinx gets its data), such as MySQL, PostgreSQL, xmlpipe and xmlpipe2.

The search API is natively ported to PHP, Python, Perl, Ruby and Java, and is also available as a pluggable MySQL storage engine. The API is very lightweight, so porting it to a new language is known to take only a few hours.
This blog is mainly dedicated to Sphinx's interaction with the MySQL database, and to SphinxSE, the Sphinx storage engine for MySQL.

Sphinx Features

  • high indexing speed (up to 10 MB/sec)
  • high search speed (an average query takes under 0.1 sec on 2-4 GB text collections)
  • high scalability (up to 100 GB of text and up to 100 M documents on a single CPU)
  • provides good relevance ranking through a combination of phrase proximity ranking and statistical (BM25) ranking (phrase ranking rewards phrase matches, while BM25 is based on how frequently the keyword occurs)
  • provides distributed searching capabilities
  • provides searching from within MySQL through a pluggable storage engine (SphinxSE)
  • supports boolean, phrase, and word proximity queries
  • supports multiple full-text fields per document (up to 32 by default; in Sphinx terms, a document is a single database table row)
  • supports multiple additional attributes per document (e.g. groups, timestamps, etc.)
  • supports stopwords
  • supports both single-byte encodings and UTF-8
  • supports English stemming, Russian stemming, and Soundex for morphology
  • supports MySQL natively (both MyISAM and InnoDB tables are supported)
  • supports PostgreSQL natively

*NOTE: Features listed in bold are some of the salient features of Sphinx Search.
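To get a feel for the statistical half of the relevance ranking, here is a small Python sketch of the textbook Okapi BM25 formula. It assumes a naive whitespace tokenizer, the common defaults k1 = 1.2 and b = 0.75, and a smoothed IDF that stays positive. This is not Sphinx's internal code: Sphinx's own formula differs in its details and also mixes in the phrase proximity component, which is not shown here.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one Okapi BM25 score per document in `docs` for `query`."""
    if not docs:
        return []
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(terms) for terms in tokenized) / n_docs

    # Document frequency: in how many documents each word occurs.
    df = Counter()
    for terms in tokenized:
        df.update(set(terms))

    scores = []
    for terms in tokenized:
        tf = Counter(terms)  # term frequency inside this document
        score = 0.0
        for word in query.lower().split():
            if word not in tf:
                continue
            # Rare words get a higher weight; the "1 +" keeps the IDF positive.
            idf = math.log(1 + (n_docs - df[word] + 0.5) / (df[word] + 0.5))
            # More occurrences help with diminishing returns (k1), and
            # length normalization keeps long documents from being favoured (b).
            freq = tf[word]
            length_norm = 1 - b + b * len(terms) / avg_len
            score += idf * freq * (k1 + 1) / (freq + k1 * length_norm)
        scores.append(score)
    return scores


docs = [
    "sphinx is a full text search engine",
    "mysql is a database",
    "sphinx search indexes mysql data",
]
print(bm25_scores("sphinx search", docs))
```

Here the second document contains neither query word and scores 0, while the third document outscores the first because both contain the two words once but the third is shorter.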
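Soundex, one of the morphology options mentioned above, is a classic phonetic algorithm that maps similar-sounding words to the same short code, so a search for "Robert" can also match "Rupert". The Python sketch below follows the standard American Soundex rules and is only meant to illustrate the idea; it is not Sphinx's own morphology code, whose details may differ.

```python
# Digit for each consonant group in American Soundex; vowels, y, h and w have none.
SOUNDEX_CODES = {
    letter: digit
    for letters, digit in [
        ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
        ("l", "4"), ("mn", "5"), ("r", "6"),
    ]
    for letter in letters
}


def soundex(word):
    """Return the 4-character American Soundex code of `word`, e.g. 'R163'."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()              # the first letter is kept as a letter
    last = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue                       # h and w do not separate identical codes
        digit = SOUNDEX_CODES.get(ch, "")
        if digit and digit != last:
            code += digit
            if len(code) == 4:
                break
        last = digit                       # a vowel resets `last`, so a repeat after it counts
    return code.ljust(4, "0")              # pad short codes with zeros


for name in ["Robert", "Rupert", "Ashcraft", "Tymczak"]:
    print(name, soundex(name))
```

With these rules "Robert" and "Rupert" both become R163, "Ashcraft" becomes A261 and "Tymczak" becomes T522, which is why a Soundex-based index treats the first two names as the same word.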

Where To Download Sphinx

Sphinx is available through its official Web site at http://www.sphinxsearch.com/.


Currently, the Sphinx distribution tarball includes the following software:

  • indexer: a utility which creates full-text indexes
  • search: a simple command-line (CLI) test utility which searches through full-text indexes
  • searchd: a daemon which enables external software (e.g. Web applications) to search through full-text indexes
  • sphinxapi: a set of searchd client API libraries for popular Web scripting languages (PHP, Python, Perl, Ruby).
  • spelldump: a simple command-line tool to extract the items from an ispell or MySpell (as bundled with OpenOffice) format dictionary to help customize your index, for use with
  • indextool: a utility to dump miscellaneous debug information about the index, added in version 0.9.9-rc2.
In this blog we will discuss only the first three parts of Sphinx: indexer, search and searchd.
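To get a feel for what indexer builds and what search and searchd look up, here is a toy in-memory inverted index in Python. It is an illustration only, assuming a whitespace tokenizer and a hypothetical stopword list; Sphinx's real on-disk index format, tokenizer settings and query syntax are different. Storing word positions is what makes phrase queries possible, and skipping stopwords mirrors the stopword support listed above.

```python
from collections import defaultdict

STOPWORDS = {"a", "an", "the", "is", "of"}  # hypothetical stopword list


def tokenize(text):
    """Naive tokenizer (assumption): lowercase and split on whitespace."""
    return text.lower().split()


def build_index(docs):
    """'indexer' step: map each word to {doc_id: [positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            if word not in STOPWORDS:  # stopwords are not indexed but still use up a position
                index[word].setdefault(doc_id, []).append(pos)
    return index


def search_all(index, query):
    """Boolean AND query: ids of documents containing every query word."""
    words = [w for w in tokenize(query) if w not in STOPWORDS]
    if not words:
        return set()
    result = set(index.get(words[0], {}))
    for word in words[1:]:
        result &= set(index.get(word, {}))
    return result


def search_phrase(index, phrase):
    """Phrase query: the words must appear in the same order and spacing."""
    terms = [(pos, w) for pos, w in enumerate(tokenize(phrase)) if w not in STOPWORDS]
    if not terms:
        return set()
    first_pos, first_word = terms[0]
    matches = set()
    for doc_id, starts in index.get(first_word, {}).items():
        for start in starts:
            # Each later term must sit at the same offset from the first one as in the phrase.
            if all((start + qpos - first_pos) in index.get(w, {}).get(doc_id, [])
                   for qpos, w in terms[1:]):
                matches.add(doc_id)
                break
    return matches


docs = {
    1: "Sphinx is a full text search engine",
    2: "full search of text",
    3: "MySQL is a database",
}
index = build_index(docs)
print(search_all(index, "full text"))
print(search_phrase(index, "full text"))
```

The boolean query finds documents 1 and 2, while the phrase query "full text" matches only document 1, because in document 2 the words "search of" sit between "full" and "text".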
