Searching Objects

This How-to applies to: Any version.
This How-to is intended for: Any audience.

Indexing the contents of your objects allows you to perform fast, complex search operations.
Author: Sebastian Ware

Introduction

Relational databases provide ad hoc search capability by means of SQL queries. To search objects stored in your ZODB, you need to create indexes explicitly. These indexes are updated automatically when an object is modified, provided an IObjectModifiedEvent is fired for it. The upside of this approach is that you will be inclined to create simple, well-designed indexes that in turn scale well.

Grok supports the vanilla indexing services available in Zope 3 straight out of the box.

  • FieldIndex: search matching an entire field
  • SetIndex: search for keywords in a field
  • TextIndex: full-text searching
  • ValueIndex: search for values using ranges
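
To make the difference between a field index and a set index concrete, here is a toy, framework-free sketch in plain Python. It is not how the Zope 3 indexes are implemented; the class names ToyFieldIndex and ToySetIndex are hypothetical and exist only for illustration.

```python
from collections import defaultdict


class ToyFieldIndex:
    """Maps one whole field value to the ids of the objects that hold it."""

    def __init__(self):
        self._ids_by_value = defaultdict(set)

    def index_doc(self, doc_id, value):
        self._ids_by_value[value].add(doc_id)

    def search(self, value):
        # Only an exact match on the entire value counts as a hit.
        return set(self._ids_by_value.get(value, ()))


class ToySetIndex:
    """Maps each keyword in a collection to the ids of the objects carrying it."""

    def __init__(self):
        self._ids_by_keyword = defaultdict(set)

    def index_doc(self, doc_id, keywords):
        for keyword in keywords:
            self._ids_by_keyword[keyword].add(doc_id)

    def search_any(self, keywords):
        # An object matches if it carries at least one of the keywords.
        hits = set()
        for keyword in keywords:
            hits |= self._ids_by_keyword.get(keyword, set())
        return hits


titles = ToyFieldIndex()
titles.index_doc(1, "Grok tutorial")
titles.index_doc(2, "Zope notes")

tags = ToySetIndex()
tags.index_doc(1, ["grok", "python"])
tags.index_doc(2, ["zope", "python"])

exact_hits = titles.search("Grok tutorial")   # whole value given
partial_hits = titles.search("Grok")          # only part of the value given
python_hits = tags.search_any(["python"])
```

In this sketch the search for "Grok" alone finds nothing, because a field index compares the whole value; matching individual words is the job of a full-text index.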

You won’t be able to perform SQL-style joins to search related objects. Instead you could index an adapter with calculated properties.
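
The idea of indexing calculated properties can be sketched without any framework. The classes below (Author, Article, ArticleSearchData) are hypothetical; in a Grok application the wrapper would be a grok.Adapter, like the Published adapter later in this how-to.

```python
class Author:
    def __init__(self, name):
        self.name = name


class Article:
    def __init__(self, title, author):
        self.title = title
        self.author = author  # reference to a related object


class ArticleSearchData:
    """Wraps an Article and exposes flat, index-friendly values."""

    def __init__(self, context):
        self.context = context

    @property
    def author_name(self):
        # Calculated from the related object at indexing time,
        # so no join is needed at query time.
        return self.context.author.name


article = Article("Searching Objects", Author("Alice"))
indexable = ArticleSearchData(article)
author_value = indexable.author_name
```

An index defined on author_name would store the value read from the wrapper. Keep in mind that such a value is presumably only refreshed when the wrapped object itself is reindexed, not when the related object changes.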

Setup

The egg (package) containing the indexing functionality is called [zc.catalog-x.x.x-py2.x.egg]. The package is installed by including "zc.catalog" in the list "install_requires" in [setup.py]:

install_requires=['setuptools',
              'grok',
              'zc.catalog',
              'hurry.query',
              ],

The "hurry.query" package gives you some simple tools to perform advanced searching.

VERSION PROBLEMS: There are reports of problems with hurry.query 1.0, so it is recommended that you pin an earlier version of hurry.query in your buildout.cfg file. The error you will experience is: ComponentLookupError: (<InterfaceClass zope.app.intid.interfaces.IIntIds>, '')

[buildout]
...
versions = versions

[versions]
hurry.query = 0.9.2

Don't forget to re-run buildout.

Note

In the example below, the indexes are only added when a new application is installed. If you wish to add indexes to an existing application, you will have to create them manually through the Zope 3 management screens. It would be great if someone wrote some documentation on how to do this in code!

Example

# interfaces.py
from zope.interface import Interface
from zope import schema

class IProtonObject(Interface):
    """
    This is the interface of the class whose objects I want to index.
    """
    body = schema.Text(title=u'Body', required=False)

# protonobject.py
import grok
from zope import interface
import interfaces  # assumes interfaces.py sits next to this module

class ProtonObject(grok.Model):
    """
    This is the actual class.
    """
    interface.implements(interfaces.IProtonObject)

    def __init__(self, body):
        self.body = body

# app.py
import grok
from grok import index
from hurry import query
from hurry.query.query import Query, Text
# hurry.query is a simplified search query language that
# allows you to create ANDs and ORs.

class ContentIndexes(grok.Indexes):
    """
    This is where I setup my indexes. I have two indexes;
    one full-text index called "text_body",
    one field index called "body".
    """
    grok.site(ProtonCMS)

    grok.context(interfaces.IProtonObject)
    # grok.context() tells Grok that objects implementing
    # the interface IProtonObject should be indexed.

    grok.name('proton_catalog')
    # grok.name() tells Grok what to call the catalog.
    # if you have named the catalog anything but "catalog"
    # you need to specify the name of the catalog in your
    # queries.

    text_body = index.Text(attribute='body')
    body = index.Field(attribute='body')
    # The attribute='body' parameter is actually unnecessary if the attribute to
    # be indexed has the same name as the index.

class Index(grok.View):
    grok.context(ProtonCMS)

    def search_content(self, search_query):
        # The following query searches the field index "body".
        # It returns the objects whose entire body attribute matches
        # the search term exactly, i.e. search_query == body.
        result_a = Query().searchResults(
            query.Eq(('proton_catalog', 'body'), search_query)
        )

        # The following query searches the full-text index "text_body".
        # It returns the objects that match search_query. You can use
        # wildcards and boolean operators.
        #
        # Examples:
        # "grok AND zope" returns objects where "body" contains the words
        # "grok" and "zope"
        # "grok OR dev*" returns objects where "body" contains the word
        # "grok" or any word beginning with "dev"
        result_b = Query().searchResults(
            Text(('proton_catalog', 'text_body'), search_query)
        )

        return result_a, result_b

Setting up a value index

You need to import zc.catalog to index values. First you need to create a Grok compatible index class.

from zc.catalog.catalogindex import ValueIndex
from grok.index import IndexDefinition
class Value(IndexDefinition):
    index_class = ValueIndex

Then you can use this to create your actual value index in your catalog.

class SiteCatalog(grok.Indexes):
    grok.site(Testvalueindex)
    grok.context(MyObject)
    grok.name('my_catalog')

    counter = Value()

This will index the property "counter" on objects of type "MyObject". This index supports searches such as greater than, less than, in between. It also supports sorting.

from zope.component import getUtility
from hurry.query.interfaces import IQuery
def displayQuery(q):
    query = getUtility(IQuery)
    r = query.searchResults(q)
    return [e.counter for e in r]

from hurry.query import value
class Index(grok.View):
    grok.context(MyApp)
    def render(self):
        mini = int(self.request.form.get('mini', 1))
        maxi = int(self.request.form.get('maxi', 99))
        # The first tuple element is the catalog name given to grok.name()
        # above ('my_catalog'); the second is the index name.
        return "%s" % displayQuery(value.Between(('my_catalog', 'counter'), mini, maxi))
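
As a rough mental model of what a "between" search on a value index does, here is a plain-Python sketch that keeps values sorted and answers range queries with bisect. ToyValueIndex is a hypothetical name, and treating both bounds as inclusive is an assumption of this toy, not a statement about zc.catalog or hurry.query.

```python
from bisect import bisect_left, bisect_right


class ToyValueIndex:
    """Keeps (value, doc_id) pairs sorted so range lookups are cheap."""

    def __init__(self):
        self._pairs = []  # sorted list of (value, doc_id)

    def index_doc(self, doc_id, value):
        pair = (value, doc_id)
        self._pairs.insert(bisect_left(self._pairs, pair), pair)

    def between(self, minimum, maximum):
        # (minimum,) sorts before every (minimum, doc_id) pair, and
        # (maximum, inf) sorts after every (maximum, doc_id) pair,
        # so both bounds are inclusive.
        low = bisect_left(self._pairs, (minimum,))
        high = bisect_right(self._pairs, (maximum, float("inf")))
        return [doc_id for _, doc_id in self._pairs[low:high]]


counters = ToyValueIndex()
counters.index_doc(1, 5)
counters.index_doc(2, 42)
counters.index_doc(3, 99)
counters.index_doc(4, 150)

in_range = counters.between(1, 99)
single = counters.between(42, 42)
nothing = counters.between(100, 120)
```

Because the pairs are stored in value order, the matching ids also come back ordered by value, which hints at why this kind of index can support sorting as well.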

This will display a list of values. If you are using hurry.query 1.1.0 or higher, you can pass sorting options to the query method. If not, you need to get the catalog and sort by calling the index directly.

import grok
from zope.component import getUtility
from zope.catalog.interfaces import ICatalog
from hurry.query import value
from hurry.query.interfaces import IQuery

class Dates(grok.View):
    grok.context(MyApp)
    def render(self):
        mini = int(self.request.form.get('mini', 1))
        maxi = int(self.request.form.get('maxi', 12))
        limit = int(self.request.form.get('limit', 10))

        # Perform the query, returning a result set
        res = self.findMe(mini, maxi)

        # get the catalog
        content_catalog = getUtility(ICatalog, 'my_catalog')

        # sort the result and return limited result set
        tmp = content_catalog['published'].sort(res.uids, limit=limit)

        # create list of objects
        objs = [res.uidutil.getObject(o) for o in tmp]
        return "%s" % [e.counter for e in objs]

    def findMe(self, mini, maxi):
        # ('my_catalog', 'published') is (catalog name, index name)
        q = value.Between(('my_catalog', 'published'), mini, maxi)
        query = getUtility(IQuery)
        r = query.searchResults(q)
        return r

This also shows how to find a catalog, which is useful if you want to check statistics on the index or need to update (reindex) the index.
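
Conceptually, sorting a result set through an index means ordering the matching ids by their indexed value and keeping only the first few. The plain-Python sketch below illustrates that idea; sort_ids and the sample data are hypothetical and do not reproduce the real index code.

```python
import heapq


def sort_ids(values_by_id, doc_ids, limit=None, reverse=False):
    """Order doc_ids by their indexed value and optionally truncate."""
    # Ids that were never indexed have no value to sort by, so skip them.
    known = [doc_id for doc_id in doc_ids if doc_id in values_by_id]
    key = values_by_id.__getitem__
    if limit is None:
        return sorted(known, key=key, reverse=reverse)
    # With a limit there is no need to sort everything.
    pick = heapq.nlargest if reverse else heapq.nsmallest
    return pick(limit, known, key=key)


# Indexed "published" values keyed by object id.
published = {10: 300, 11: 100, 12: 200, 13: 50}
result_ids = [10, 11, 12]  # ids returned by some earlier query

oldest_two = sort_ids(published, result_ids, limit=2)
newest_two = sort_ids(published, result_ids, limit=2, reverse=True)
all_sorted = sort_ids(published, result_ids)
```

Object 13 has the smallest value but never appears, because the sort only orders the ids it is given; it does not widen the result set.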

If you want to index datetime properties, there is a datetime normalizer which I never got to work. Instead I did something like this.

from zope.interface import Interface
from zope import schema
class IPublished(Interface):
    published = schema.Int(title=u'Normalized datetime')

def _minuteNormalizer(dt):
    tmpin = dt.utctimetuple()[:5]
    multi = (535680, 44640, 1440, 60, 1) # Resolution in minutes
    value = sum(i*j for i,j in zip(tmpin, multi))
    return value

class Published(grok.Adapter):
    grok.implements(IPublished)
    grok.context(MyObj)
    def _published(self):
        return _minuteNormalizer(self.context.published)
    published = property(_published)

class SitePublishCatalog(grok.Indexes):
    grok.site(MyApp)
    grok.context(IPublished)
    grok.name('my_catalog')

    published = Value()

The SitePublishCatalog uses the IPublished() adapter to convert the datetime property "published" to an integer. In order to perform a query you will need to normalize your parameters too. Don't forget timezones or you might get unexpected results. I use the "pytz" egg to get preconfigured timezones.

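from datetime import datetime
from pytz import timezone

cet = timezone('CET')
# pytz recommends localize() over passing tzinfo= to the datetime constructor
d_mini = cet.localize(datetime(2010, 1, 1))
d_maxi = cet.localize(datetime(2010, 12, 31))
# ('my_catalog', 'published') is (catalog name, index name)
q = value.Between(('my_catalog', 'published'), _minuteNormalizer(d_mini), _minuteNormalizer(d_maxi))
query = getUtility(IQuery)
r = query.searchResults(q)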
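
To see why the timezone matters, the normalizer can be tried in isolation. This sketch repeats the arithmetic of _minuteNormalizer under the hypothetical name minute_normalizer and uses the standard library's fixed-offset timezone instead of pytz, purely to stay dependency-free; the +01:00 offset stands in for CET in winter.

```python
from datetime import datetime, timedelta, timezone


def minute_normalizer(dt):
    """Turn a datetime into an ordered integer with minute resolution."""
    parts = dt.utctimetuple()[:5]  # (year, month, day, hour, minute) in UTC
    weights = (535680, 44640, 1440, 60, 1)
    return sum(part * weight for part, weight in zip(parts, weights))


utc = timezone.utc
cet = timezone(timedelta(hours=1))  # fixed +01:00 offset, an assumption

midnight_utc = minute_normalizer(datetime(2010, 1, 1, tzinfo=utc))
midnight_cet = minute_normalizer(datetime(2010, 1, 1, tzinfo=cet))
naive = minute_normalizer(datetime(2010, 1, 1))  # no tzinfo: treated as UTC

offset = midnight_utc - midnight_cet
```

The same wall-clock time normalizes to a smaller value in CET than in UTC, so a query boundary built with the wrong (or no) timezone is shifted by an hour and can silently include or exclude objects near the edge of the range. The weights treat every month as 31 days, so the integers are reliable for ordering and range checks but should not be read as exact minute counts.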

Learning More

The "hurry.query" package contains the DocTest "query.txt" that shows how to perform more complex search queries.