Friday, January 8, 2010
Some Bugs That I Have Come Across While Using Sphinx
Keyword Count Has a Limit
Say, for example, that in my case I had to search my index for the keyword "pizza hut" in fields like name and reviews, plus a bunch of zip codes in the zip code field. The number of zip codes can be anywhere from ten to hundreds, depending on the distance a user selects for a search. So I wrote a query like this:
@(name,review) pizza hut @(zip_code) 60606|60605|... and so on
But what I found out is that when the number of zip codes goes above about 90, Sphinx gives me an empty set, which was a big problem in my case.
So the solution I came up with was to divide the zip codes into groups of 50, query Sphinx once per group, and then combine those results for further processing of the data.
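The batching workaround above (split the zip codes into groups of 50, run one query per group, merge) can be sketched in Python as follows. This is a minimal sketch, not the author's code: `run_query` is a hypothetical stand-in for whatever actually sends a query to searchd (for example the `Query()` method of the Python `sphinxapi` client), and it is assumed to return a list of match dicts with `id` and `weight` keys, the shape `sphinxapi` uses for its `matches` list. The helper names `batches`, `build_query` and `search_all` are also hypothetical.

```python
from typing import Callable, Dict, Iterator, List

# The post observed an empty result set above roughly 90 OR'd zip codes,
# so 50 per query leaves some headroom.
BATCH_SIZE = 50


def batches(items: List[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_query(keywords: str, zip_codes: List[str]) -> str:
    """Build an extended-syntax query: keywords in name/review, any zip in zip_code."""
    return f"@(name,review) {keywords} @(zip_code) {'|'.join(zip_codes)}"


def search_all(keywords: str,
               zip_codes: List[str],
               run_query: Callable[[str], List[dict]]) -> List[dict]:
    """Query once per batch of zip codes and merge the matches by document id."""
    merged: Dict[int, dict] = {}
    for group in batches(zip_codes):
        for match in run_query(build_query(keywords, group)):
            # De-duplicate by document id in case the same row comes back twice.
            merged.setdefault(match["id"], match)
    # Each batch was ranked independently, so re-sort the combined list by weight.
    return sorted(merged.values(), key=lambda m: m["weight"], reverse=True)
```

Keep in mind that each batch is ranked and limited on its own (the client APIs return 20 matches per query unless you change `SetLimits()`), which is why the merged list is re-sorted. The client APIs also offer `AddQuery()`/`RunQueries()`, which could send all the batches to searchd in a single round trip instead of one call per batch.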
Saturday, August 22, 2009
Monday, August 10, 2009
Sphinx Installation
Extract the distribution tarball and go to the sphinx subdirectory:
$ tar xzvf sphinx-0.9.8.tar.gz
$ cd sphinx
Run the configuration program:
$ ./configure
There are a number of options to configure. The complete listing may be obtained by using the --help switch. The most important ones are:
--prefix, which specifies where to install Sphinx, such as --prefix=/usr/local/sphinx (all of the examples use this prefix);
--with-mysql, which specifies where to look for MySQL include and library files, if auto-detection fails;
--with-pgsql, which specifies where to look for PostgreSQL include and library files.
Build the binaries:
$ make
Install the binaries in the directory of your choice (defaults to /usr/local/bin/ on *nix systems, but is overridden with configure --prefix):
$ make install
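The configure, make and make install steps above can also be scripted. The Python sketch below assembles the same commands as argument lists and, by default, only prints them (a dry run). The helper names `build_commands` and `run`, the `sphinx` source directory default and the MySQL path used in the example are assumptions for illustration; pass `dry_run=False` to actually execute the steps.

```python
import shlex
import subprocess
from typing import List, Optional


def build_commands(prefix: str = "/usr/local/sphinx",
                   mysql_dir: Optional[str] = None,
                   pgsql_dir: Optional[str] = None) -> List[List[str]]:
    """Return the build steps as argument lists, in the order they must run."""
    configure = ["./configure", f"--prefix={prefix}"]
    if mysql_dir:
        # Only needed when configure cannot auto-detect MySQL headers/libraries.
        configure.append(f"--with-mysql={mysql_dir}")
    if pgsql_dir:
        configure.append(f"--with-pgsql={pgsql_dir}")
    return [configure, ["make"], ["make", "install"]]


def run(commands: List[List[str]], source_dir: str = "sphinx",
        dry_run: bool = True) -> None:
    """Print each command (dry run) or execute it inside the source directory."""
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            # check=True stops the build at the first failing step.
            subprocess.run(cmd, cwd=source_dir, check=True)


if __name__ == "__main__":
    run(build_commands())
```

For example, `build_commands(mysql_dir="/usr/local/mysql")` adds `--with-mysql=/usr/local/mysql` to the configure step and leaves `make` and `make install` unchanged.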
Installing Sphinx On Windows
Extract the .zip file you have downloaded, sphinx-0.9.8-win32.zip (or sphinx-0.9.8-win32-pgsql.zip if you need PostgreSQL support as well).
For the remainder of this guide, we will assume that the folders are unzipped into C:\Sphinx, such that searchd.exe can be found in C:\Sphinx\bin\searchd.exe. If you decide to use a different location for the folders or the configuration file, please change it accordingly.
Install the searchd system as a Windows service:
C:\Sphinx> C:\Sphinx\bin\searchd --install --config C:\Sphinx\sphinx.conf --servicename SphinxSearch
The searchd service will now be listed in the Services panel within the Management Console, available from Administrative Tools. It will not have been started, as you will need to configure it and build your indexes with indexer before starting the service.
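The Windows steps above come in a fixed order: register the service, build the indexes with indexer, then start the service. The Python sketch below lists that sequence and, by default, only prints it. It assumes the C:\Sphinx layout described above, that indexer.exe sits next to searchd.exe in C:\Sphinx\bin, that `indexer --all` is used to build every index in the config, and that the service is started with the standard Windows `net start` command; the helper names are hypothetical.

```python
import subprocess
from pathlib import PureWindowsPath
from typing import List


def windows_setup_commands(sphinx_dir: str = r"C:\Sphinx",
                           service: str = "SphinxSearch") -> List[List[str]]:
    """Return install, index and start commands for a C:\\Sphinx style layout."""
    base = PureWindowsPath(sphinx_dir)
    config = str(base / "sphinx.conf")
    return [
        # 1. Register searchd as a Windows service (it is not started yet).
        [str(base / "bin" / "searchd.exe"), "--install", "--config", config,
         "--servicename", service],
        # 2. Build every index defined in the config before starting the service.
        [str(base / "bin" / "indexer.exe"), "--config", config, "--all"],
        # 3. Start the service registered in step 1.
        ["net", "start", service],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them one after another."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(windows_setup_commands())
```

`PureWindowsPath` builds the backslash-separated paths without touching the file system, so the command list can be inspected on any OS before running it on the Windows machine.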
Introduction to Sphinx Search
Sphinx Search? What is this?
That might be the question in your mind!
Let me tell you: if you use some of the finest search engines on the internet, then you have most likely used Sphinx already.
Sphinx is an acronym for SQL Phrase Index, and it was created by Andrew Aksyonoff.
Sphinx is a full-text search engine. Basically, it is standalone software designed to connect to a database. Currently Sphinx works with several data source drivers (the sources Sphinx gets its data from): MySQL, PostgreSQL, xmlpipe and xmlpipe2.
The search API is natively ported to PHP, Python, Perl, Ruby and Java, and Sphinx is also available as a pluggable MySQL storage engine. The API is very lightweight, so porting it to a new language is known to take a few hours.
This blog is mainly dedicated to Sphinx's interaction with the MySQL database, and to SphinxSE, the storage engine for MySQL.
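As a taste of the Python API mentioned above, here is a minimal sketch of running one extended-syntax query and collecting the matching document ids. It assumes the sphinxapi.py client from the distribution is importable, that searchd listens on localhost:3312 (the default port in the 0.9.8 era; later releases moved to 9312), and that an index named test1 exists; the index name and the `search_ids()` helper are hypothetical. The client object is passed in so the function can be exercised without a running server.

```python
try:
    from sphinxapi import SphinxClient, SPH_MATCH_EXTENDED2
except ImportError:
    # Client library not installed: keep the sketch importable anyway.
    SphinxClient = None
    SPH_MATCH_EXTENDED2 = 6  # value sphinxapi.py assigns to this match mode


def search_ids(client, query, index="test1", host="localhost", port=3312,
               limit=100):
    """Run one extended-syntax query and return the matching document ids."""
    client.SetServer(host, port)
    # Extended mode is what understands @(field) and | syntax.
    client.SetMatchMode(SPH_MATCH_EXTENDED2)
    client.SetLimits(0, limit)
    result = client.Query(query, index)
    if result is None:
        # Query() returns None on failure; the reason is in GetLastError().
        raise RuntimeError(client.GetLastError())
    return [match["id"] for match in result["matches"]]
```

With the real client installed and searchd running, a call would look like `search_ids(SphinxClient(), "@(name,review) pizza hut")`.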
Sphinx Features
- high indexing speed (up to 10 MB/sec)
- high search speed (average query is under 0.1 sec on 2-4 GB text collections)
- high scalability (up to 100 GB of text, up to 100 M documents on a single CPU)
- provides good relevance ranking through a combination of phrase proximity ranking and statistical (BM25) ranking (phrase ranking rewards phrase matches, while BM25 is based on how frequently the keyword occurs)
- provides distributed searching capabilities
- provides searching from within MySQL through a pluggable storage engine (SphinxSE)
- supports boolean, phrase, and word proximity queries
- supports multiple full-text fields per document (up to 32 by default; in Sphinx terms, a document is a single database table row)
- supports multiple additional attributes per document (e.g. groups, timestamps, etc.)
- supports stopwords
- supports both single-byte encodings and UTF-8
- supports English stemming, Russian stemming, and Soundex for morphology
- supports MySQL natively (MyISAM and InnoDB tables are both supported)
- supports PostgreSQL natively
*NOTE: Features listed in bold are some of the salient features of Sphinx Search.
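To make the statistical half of the relevance ranking in the list above concrete, here is a small Python sketch of textbook Okapi BM25 over tokenized documents. It is an illustration of the general formula, not Sphinx's internal ranking code: Sphinx combines BM25 with phrase proximity and uses its own scaling. The parameters k1 = 1.2 and b = 0.75 are common defaults, and the IDF uses the "+1" variant so scores stay non-negative; all of these are assumptions.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: List[str], docs: List[List[str]],
                k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Score every tokenized document against the query terms with Okapi BM25."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each query term.
    df = {t: sum(1 for d in docs if t in d) for t in set(query)}
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            # Rare terms get a larger IDF, so they contribute more to the score.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Length normalization: long documents need more occurrences to score high.
            norm = k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores
```

For the query `["pizza", "hut"]`, a document containing both words scores higher than one containing only "pizza", and a document with neither scores 0.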
From Where To Download Sphinx
Sphinx is available through its official Web site at http://www.sphinxsearch.com/.
Currently, the Sphinx distribution tarball includes the following software:
- indexer: a utility which creates fulltext indexes
- search: a simple command-line (CLI) test utility which searches through fulltext indexes
- searchd: a daemon which enables external software (such as Web applications) to search through fulltext indexes
- sphinxapi: a set of searchd client API libraries for popular Web scripting languages (PHP, Python, Perl, Ruby)
- spelldump: a simple command-line tool to extract the items from an ispell or MySpell (as bundled with OpenOffice) format dictionary to help customize your index, for use with wordforms
- indextool: a utility to dump miscellaneous debug information about the index, added in version 0.9.9-rc2