TextIndexNG (3.0.6, 2.1.1 stable)
by cms, last modified 2009-01-08
Released on 2005-09-13 by ZOPYX for Zope 2 and Zope 3 under the Zope Public License (ZPL); available for all platforms.
Software development stage: stable
- TextIndexNG download link: http://sourceforge.net/project/showfiles.php?group_id=50052&package_id=43509
- Homepage of TextIndexNG: http://zopyx.com/OpenSource/TextIndexNG/
TextIndexNG is the next-generation fulltext index for the Zope Catalog and the most feature-complete solution for fulltext indexing under Zope. TextIndexNG V3 is a completely new implementation based on Zope 3 technologies and can be used in Zope 2.7 or 2.8 (with Five) as well as in Zope 3.
Sophisticated search, including indexing of PDF and MS Word documents.
New features:
- multi-field support: one index can index multiple fields/attributes of objects; queries can run against all fields or against a single field
- multi-language support: one index can index documents in different languages. Queries can be limited to a particular language.
- configurable converters: external converters for foreign formats like PDF, DOC, etc. can be configured through ZCML
- custom content types can be indexed either by implementing the required interfaces or by registering an adapter that provides the required interfaces for a given content type
- Integrates with Zope 2 and Zope 3
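To illustrate what multi-field and multi-language support mean for queries, here is a minimal sketch in plain Python. It assumes postings keyed by (field, language, word) and uses dicts in place of Zope content objects; the class and method names (`MultiFieldIndex`, `index_doc`, `search`) are hypothetical and are not TextIndexNG's API or internal data structure.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple


class MultiFieldIndex:
    """Hypothetical sketch: postings keyed by (field, language, word)."""

    def __init__(self, fields: Iterable[str]) -> None:
        self.fields = tuple(fields)
        self._postings: Dict[Tuple[str, str, str], Set[int]] = defaultdict(set)

    def index_doc(self, docid: int, obj: dict, language: str) -> None:
        # Index every configured field; plain dicts stand in for content objects.
        for field in self.fields:
            for word in obj.get(field, "").lower().split():
                self._postings[(field, language, word)].add(docid)

    def search(self, word: str, field: Optional[str] = None,
               language: Optional[str] = None) -> Set[int]:
        # field=None queries all fields; language=None queries all languages.
        # A linear scan keeps the sketch short; a real index would not do this.
        hits: Set[int] = set()
        for (f, lang, w), docids in self._postings.items():
            if w == word.lower() and field in (None, f) and language in (None, lang):
                hits |= docids
        return hits


idx = MultiFieldIndex(["title", "text"])
idx.index_doc(1, {"title": "Zope catalog", "text": "full text search"}, "en")
idx.index_doc(2, {"title": "Katalog", "text": "Zope Volltextsuche"}, "de")
print(idx.search("zope"))                 # {1, 2}
print(idx.search("zope", field="title"))  # {1}
print(idx.search("zope", language="de"))  # {2}
```

Keeping field and language in the posting key is what allows one index to answer both "all fields" and "one field" queries and to restrict results to a single language.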
Features:
- document converters
- stemmer support for 13 languages
- similarity search for English text (based on the Levenshtein distance)
- near search
- pluggable parsers
- extended stop word support
- full integration in ZCatalog
- test functionality through the ZMI
- extensible architecture
- more efficient than the current TextIndex
- full globbing support (wildcard search)
- normalization support (e.g. reducing accented characters to their base form)
- full Unicode awareness
- Relevance ranking of search results added: searches are now ranked using an extended cosine measure. The cosine measure is based on a vector model and calculates a document's "score" from the frequency of the query terms in the documents of the result set.
- Much faster phrase/near search: the old implementation of TextIndexNG had to do very expensive work at query time whenever a phrase or near search was performed. Reusing the WidCode module of ZCTextIndex made this operation less expensive.
- Left-truncation added: TextIndexNG can be configured at creation time to support left-truncation (i.e. you can search for "*suffix"). Left-truncation is optional because it requires a second, reversed index inside the lexicon and much more memory.
- optional auto-expansion support: this feature gives better search results when some of the query terms cannot be found. The index expands a query term "foo" to "foo*" if there was no hit for "foo". The expansion is currently global for the index; it will be available on a per-query basis in a later version, and auto-expansion will also be extended to search for similar terms.
- improved HTML converter: now uses Chris Withers' "Strip-o-Gram" module instead of the Strip-Tag-Parser
- added converter for text/sgml
- Similarity search (soundex, metaphone, double metaphone) was dropped and replaced with a more general, language-independent approach using the Levenshtein distance.
- range searches like "Fi..Foo"
- substring searches "substring"
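The normalization and stop word features listed above amount to a small text-processing pipeline applied before words reach the index. The sketch below assumes Unicode NFKD decomposition for reducing accented characters and uses a tiny, made-up stop word list; TextIndexNG's own parsers, stop word lists and stemmers are not reproduced here (stemming is omitted entirely).

```python
import unicodedata
from typing import List

# Illustrative stop word list (an assumption, not TextIndexNG's own list).
STOP_WORDS = frozenset({"the", "a", "an", "and", "of", "in"})


def normalize(word: str) -> str:
    """Reduce accented characters to their base form, e.g. "café" -> "cafe"."""
    # NFKD splits "é" into "e" plus a combining accent, which is then dropped.
    decomposed = unicodedata.normalize("NFKD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tokenize(text: str) -> List[str]:
    """Split text into lowercase, normalized words without stop words."""
    words = []
    for raw in text.split():
        # str.isalnum() is Unicode-aware, so "ü" survives this filter.
        word = normalize("".join(ch for ch in raw if ch.isalnum()).lower())
        if word and word not in STOP_WORDS:
            words.append(word)
    return words


print(tokenize("The café in Zürich!"))  # ['cafe', 'zurich']
```

Because the same pipeline would be applied to both documents and queries, a search for "cafe" would also match documents containing "café".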
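The relevance ranking described above uses an "extended" cosine measure whose details are not given here. As a rough illustration of the underlying vector model, the following sketch ranks documents by the plain tf-idf cosine similarity between query and document vectors. The smoothed idf, `log(1 + N/df)`, and the function name `rank` are assumptions for this example, not TextIndexNG's actual formula.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def rank(query: List[str], docs: Dict[int, List[str]]) -> List[Tuple[int, float]]:
    """Rank documents by plain tf-idf cosine similarity to the query."""
    n = len(docs)
    # Document frequency: the number of documents containing each term.
    df: Counter = Counter()
    for words in docs.values():
        df.update(set(words))

    def weights(words: List[str]) -> Dict[str, float]:
        # Term frequency times a smoothed idf; unknown terms are skipped.
        return {t: c * math.log(1 + n / df[t])
                for t, c in Counter(words).items() if df[t]}

    def norm(vec: Dict[str, float]) -> float:
        return math.sqrt(sum(w * w for w in vec.values()))

    q = weights(query)
    results = []
    for docid, words in docs.items():
        d = weights(words)
        dot = sum(w * d.get(t, 0.0) for t, w in q.items())
        if dot > 0:
            results.append((docid, dot / (norm(q) * norm(d))))
    # Highest score first; ties are broken by docid.
    results.sort(key=lambda item: (-item[1], item[0]))
    return results


docs = {1: ["zope", "catalog", "index"],
        2: ["zope", "zope", "index", "text"],
        3: ["python", "text"]}
print(rank(["zope", "index"], docs))  # doc 2 first, then doc 1; doc 3 is omitted
```

Document 2 ranks above document 1 because it contains "zope" twice, so its vector points more closely in the direction of the query.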
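Phrase and near searches need to know where each word occurs in a document. TextIndexNG 3 reuses the WidCode module of ZCTextIndex for this, which encodes word-id sequences compactly. The sketch below deliberately swaps that encoding for plain Python lists of word positions, and its function names (`build_positions`, `near`, `phrase`) are hypothetical. It only shows the logic of both query types.

```python
from collections import defaultdict
from typing import Dict, List, Set

Positions = Dict[str, Dict[int, List[int]]]


def build_positions(docs: Dict[int, List[str]]) -> Positions:
    """Map word -> docid -> sorted list of word positions."""
    index: Positions = defaultdict(lambda: defaultdict(list))
    for docid, words in docs.items():
        for pos, word in enumerate(words):
            index[word][docid].append(pos)
    return index


def near(index: Positions, a: str, b: str, distance: int) -> Set[int]:
    """Documents where a and b occur at most `distance` positions apart."""
    hits: Set[int] = set()
    pa, pb = index.get(a, {}), index.get(b, {})
    for docid in pa.keys() & pb.keys():
        xs, ys = pa[docid], pb[docid]
        i = j = 0
        # Two-pointer walk: always advance the smaller position.
        while i < len(xs) and j < len(ys):
            if abs(xs[i] - ys[j]) <= distance:
                hits.add(docid)
                break
            if xs[i] < ys[j]:
                i += 1
            else:
                j += 1
    return hits


def phrase(index: Positions, words: List[str]) -> Set[int]:
    """Documents containing `words` at consecutive positions."""
    if not words:
        return set()
    hits: Set[int] = set()
    for docid, starts in index.get(words[0], {}).items():
        for start in starts:
            if all(start + k in index.get(w, {}).get(docid, [])
                   for k, w in enumerate(words[1:], start=1)):
                hits.add(docid)
                break
    return hits


idx = build_positions({1: ["zope", "full", "text", "index"],
                       2: ["text", "for", "zope"],
                       3: ["zope", "and", "much", "more", "text"]})
print(near(idx, "zope", "text", 2))   # {1, 2}
print(phrase(idx, ["full", "text"]))  # {1}
```

Because the position lists are sorted, the two-pointer walk in `near` finds the closest pair of occurrences in linear time. This per-document work is exactly the query-time cost that a compact position encoding helps reduce.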
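Globbing, left-truncation, range searches, similarity search and auto-expansion can all be understood as lookups in the lexicon, the sorted list of indexed words. The sketch below assumes a lexicon held as a sorted Python list searched with `bisect`. Left-truncation is handled by a second, reversed word list that is only built when enabled, mirroring the memory trade-off described above. The class and method names are hypothetical and do not reflect TextIndexNG's internals. Only a leading or trailing `*` is covered, and auto-expansion is approximated as "the word is not in the lexicon".

```python
import bisect
from typing import Iterable, List


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


class Lexicon:
    """Hypothetical lexicon sketch, not TextIndexNG's implementation."""

    def __init__(self, words: Iterable[str], left_truncation: bool = False) -> None:
        self.words: List[str] = sorted(set(words))
        self.left_truncation = left_truncation
        # The reversed word list costs extra memory, so it is optional.
        self.reversed: List[str] = (
            sorted(w[::-1] for w in self.words) if left_truncation else [])

    @staticmethod
    def _prefix_scan(sorted_words: List[str], prefix: str) -> List[str]:
        start = bisect.bisect_left(sorted_words, prefix)
        out = []
        for w in sorted_words[start:]:
            if not w.startswith(prefix):
                break
            out.append(w)
        return out

    def right_truncation(self, prefix: str) -> List[str]:
        # "foo*": every word starting with the prefix.
        return self._prefix_scan(self.words, prefix)

    def left_truncation_search(self, suffix: str) -> List[str]:
        # "*suffix": a prefix scan over the reversed words.
        if not self.left_truncation:
            raise ValueError("left-truncation was not enabled at creation time")
        return sorted(w[::-1] for w in self._prefix_scan(self.reversed, suffix[::-1]))

    def word_range(self, lo: str, hi: str) -> List[str]:
        # "lo..hi": every word w with lo <= w <= hi.
        return self.words[bisect.bisect_left(self.words, lo):
                          bisect.bisect_right(self.words, hi)]

    def similar(self, word: str, max_distance: int = 1) -> List[str]:
        return [w for w in self.words if levenshtein(word, w) <= max_distance]

    def expand(self, word: str) -> List[str]:
        # Auto-expansion: "foo" becomes "foo*" if "foo" itself is absent.
        i = bisect.bisect_left(self.words, word)
        if i < len(self.words) and self.words[i] == word:
            return [word]
        return self.right_truncation(word)


lex = Lexicon(["foo", "food", "fool", "bar", "seafood", "fig", "fit", "goo"],
              left_truncation=True)
print(lex.right_truncation("foo"))        # ['foo', 'food', 'fool']
print(lex.left_truncation_search("ood"))  # ['food', 'seafood']
print(lex.word_range("fi", "foo"))        # ['fig', 'fit', 'foo']
print(lex.similar("fot"))                 # ['fit', 'foo']
```

Reversing the words turns a suffix search into a prefix search, so the same binary search serves both directions at the cost of storing every word twice.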