Friday, January 8, 2010

Some Bugs That I Have Come Across While Using Sphinx

I have been using Sphinx for more than 10 months, and almost daily I develop something new based on Sphinx search. During my recent development I came across some unknown or undocumented bugs in Sphinx search.

Keyword Length Has a Limit

For example, in my case I had to search my index for the keyword pizza hut in fields such as name and review, and for a bunch of zip codes in the zip code field. The number of zip codes can be anywhere from ten to several hundred, depending on the distance the user selects for the search. So I wrote a query like this:

@(name,review)pizza hut @(zip_code)60606|60605|........ and so on

But I found that when the number of zip codes exceeds 90, Sphinx returns an empty set, which was a big problem in my case.

The solution I came up with was to divide the zip codes into groups of 50, query Sphinx once per group, and then combine the results for further processing.
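Here is a minimal Python sketch of that workaround. It is not the exact code I run: run_query is a hypothetical callable that sends one extended-syntax query to Sphinx and returns a list of document IDs, and the helper names are made up for illustration. Only the group size of 50 and the query shape come from the description above.

```python
# Group size taken from the post; everything else here is illustrative.
CHUNK_SIZE = 50


def chunk(items, size=CHUNK_SIZE):
    """Split a list into consecutive groups of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_query(keywords, zip_codes):
    """Build one extended-syntax query for a single group of zip codes."""
    return "@(name,review) {} @(zip_code) {}".format(
        keywords, "|".join(zip_codes)
    )


def search_in_chunks(keywords, zip_codes, run_query, size=CHUNK_SIZE):
    """Query once per group and merge the matches, dropping duplicate IDs.

    `run_query` is a hypothetical callable (an assumption, not a Sphinx API):
    it takes a query string and returns a list of document IDs.
    """
    seen = set()
    merged = []
    for group in chunk(zip_codes, size):
        for doc_id in run_query(build_query(keywords, group)):
            if doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged
```

Keep in mind that the merged list is in per-group order, so if you need one overall relevance order you have to re-sort the combined results yourself.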

Saturday, August 22, 2009

Presentation on Sphinx

Monday, August 10, 2009

Sphinx Installation

Installing Sphinx on Linux

  1. Extract the tarball and go to the sphinx subdirectory:

    $ tar xzvf sphinx-0.9.8.tar.gz
    $ cd sphinx

  2. Run the configuration program:

    $ ./configure

    There are a number of options to configure. The complete listing can be obtained with the --help switch. The most important ones are:

    • --prefix, which specifies where to install Sphinx; such as --prefix=/usr/local/sphinx (all of the examples use this prefix)
    • --with-mysql, which specifies where to look for MySQL include and library files, if auto-detection fails;
    • --with-pgsql, which specifies where to look for PostgreSQL include and library files.

  3. Build the binaries:

    $ make

  4. Install the binaries in the directory of your choice (defaults to /usr/local/bin/ on *nix systems, but can be overridden with configure --prefix):

    $ make install
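
If you want to script these four steps, the sketch below only assembles the command lines as strings so you can review them before running anything. The function name and its parameters are hypothetical, and passing a directory as --with-mysql=DIR or --with-pgsql=DIR is an assumption on my side; check ./configure --help for the exact form your version accepts.

```python
import shlex


def build_install_commands(tarball="sphinx-0.9.8.tar.gz", prefix=None,
                           mysql_dir=None, pgsql_dir=None):
    """Return the build steps from this post as shell command strings.

    Nothing is executed here; the caller decides how to run the commands.
    """
    configure = ["./configure"]
    if prefix:
        configure.append("--prefix=" + prefix)
    if mysql_dir:
        # Assumed form: directory passed as the option value.
        configure.append("--with-mysql=" + mysql_dir)
    if pgsql_dir:
        # Assumed form: directory passed as the option value.
        configure.append("--with-pgsql=" + pgsql_dir)

    steps = [
        ["tar", "xzvf", tarball],   # step 1: extract the sources
        ["cd", "sphinx"],           # step 1: enter the source directory
        configure,                  # step 2: configure the build
        ["make"],                   # step 3: build the binaries
        ["make", "install"],        # step 4: install them
    ]
    return [" ".join(shlex.quote(part) for part in step) for step in steps]


if __name__ == "__main__":
    for command in build_install_commands(prefix="/usr/local/sphinx"):
        print(command)
```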


Installing Sphinx On Windows


Installing Sphinx on a Windows server is often easier than installing it on Linux; unless you are preparing code patches, you can use the pre-compiled binary files from the Downloads area on the website.

  1. Extract the .zip file you have downloaded: sphinx-0.9.8-win32.zip (or sphinx-0.9.8-win32-pgsql.zip if you need PostgreSQL support as well).

    For the remainder of this guide, we will assume that the folders are unzipped into C:\Sphinx, such that searchd.exe can be found in C:\Sphinx\bin\searchd.exe. If you decide to use any different location for the folders or configuration file, please change it accordingly.

  2. Install the searchd system as a Windows service:

    C:\Sphinx\bin> C:\Sphinx\bin\searchd --install --config C:\Sphinx\sphinx.conf --servicename SphinxSearch

  3. The searchd service will now be listed in the Services panel within the Management Console, available from Administrative Tools. It will not have been started, as you will need to configure it and build your indexes with indexer before starting the service.

Introduction to Sphinx Search

About Sphinx

Sphinx Search? What is this?

That might be the question in your mind!
Let me tell you that if you use some of the finest search engines on the internet, chances are you have already used Sphinx.
Sphinx, an acronym for SQL Phrase Index, was created by Andrew Aksyonoff.
Sphinx search is a full-text search engine. Basically, it is standalone software designed to connect to a database. Currently Sphinx Search works with several data source drivers (the sources Sphinx gets its data from), such as MySQL, PostgreSQL, xmlpipe, and xmlpipe2.

The search API is natively ported to PHP, Python, Perl, Ruby, and Java, and is also available as a pluggable MySQL storage engine. The API is very lightweight, so porting it to a new language is known to take a few hours.
This blog is mainly dedicated to Sphinx's interaction with a MySQL database and to SphinxSE, the Sphinx storage engine for MySQL.

Sphinx Features

  • high indexing speed (up to 10 MB/sec)
  • high search speed (avg query is under 0.1 sec on 2-4 GB text collections)
  • high scalability (up to 100 GB of text, up to 100 M documents on a single CPU)
  • provides good relevance ranking through a combination of phrase proximity ranking and statistical (BM25) ranking (phrase ranking rewards phrase matches, and BM25 is based on the frequency of occurrence of the keyword)
  • provides distributed searching capabilities
  • provides searching from within MySQL through a pluggable storage engine (SphinxSE)
  • supports boolean, phrase, and word proximity queries
  • supports multiple full-text fields per document (a document, in Sphinx terms, is a single database table row) (up to 32 by default)
  • supports multiple additional attributes per document (e.g. groups, timestamps, etc.)
  • supports stopwords.
  • supports both single-byte encodings and UTF-8.
  • supports English stemming, Russian stemming, and Soundex for morphology
  • supports MySQL natively (MyISAM and InnoDB tables are both supported)
  • supports PostgreSQL natively.

*NOTE: Features listed in Bold are some of the Salient features of Sphinx Search.
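
To give a feel for what "statistical (BM25) ranking" means, here is a small Python sketch of the textbook Okapi BM25 score. This is the generic formula, not Sphinx's internal implementation, and it leaves out the phrase proximity part entirely. The function name, the parameter values k1=1.2 and b=0.75, and the sample documents are all my own assumptions for illustration.

```python
import math


def bm25_score(query_terms, doc_tokens, corpus, k1=1.2, b=0.75):
    """Textbook Okapi BM25 score of one document for a list of query terms.

    `corpus` is a non-empty list of tokenized documents (lists of words).
    This is the generic formula, not what Sphinx computes internally.
    """
    n_docs = len(corpus)
    avg_len = sum(len(doc) for doc in corpus) / n_docs
    score = 0.0
    for term in query_terms:
        tf = doc_tokens.count(term)  # how often the keyword occurs here
        if tf == 0:
            continue
        df = sum(1 for doc in corpus if term in doc)  # documents with the term
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
        # Longer-than-average documents are penalized through `b`.
        norm = tf + k1 * (1 - b + b * len(doc_tokens) / avg_len)
        score += idf * tf * (k1 + 1) / norm
    return score


corpus = [
    ["pizza", "hut", "pizza"],
    ["pizza", "place", "downtown"],
    ["burger", "bar", "grill"],
]
scores = [bm25_score(["pizza"], doc, corpus) for doc in corpus]
```

With these sample documents, the first one mentions the keyword twice and scores higher than the second, and the third scores zero because the keyword never occurs in it.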

From Where To Download Sphinx

Sphinx is available through its official Web site at http://www.sphinxsearch.com/.


Currently, Sphinx distribution tarball includes the following software:

  • indexer: a utility which creates fulltext indexes
  • search: a simple command-line (CLI) test utility which searches through fulltext indexes
  • searchd: a daemon which enables external software (e.g. Web applications) to search through fulltext indexes
  • sphinxapi: a set of searchd client API libraries for popular Web scripting languages (PHP, Python, Perl, Ruby)
  • spelldump: a simple command-line tool to extract the items from an ispell or MySpell (as bundled with OpenOffice) format dictionary to help customize your index, for use with
  • indextool: a utility to dump miscellaneous debug information about the index, added in version 0.9.9-rc2.

In this blog we will discuss only the first three of these tools: indexer, search, and searchd.