BigramSplitter (0.2.1.1)
- BigramSplitter download link: http://code.google.com/p/bigramsplitter/downloads/list
- Homepage of BigramSplitter: http://code.google.com/p/bigramsplitter/
- BigramSplitter repository: http://code.google.com/p/bigramsplitter/source/checkout
- Description source: http://code.google.com/p/bigramsplitter/
Character normalization uses Python's unicodedata module: full-width numerals and alphabetic characters are converted to their half-width equivalents, and half-width Katakana is converted to full-width. As a result, all of these character variations are recognized as the same text.
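Both conversions described above are covered by Unicode NFKC (compatibility composition) normalization, so a minimal sketch of this step needs only the standard library (the function name `normalize` is ours, not BigramSplitter's):

```python
import unicodedata

def normalize(text):
    # NFKC folds full-width ASCII digits/letters (e.g. "ＡＢＣ１２３")
    # to half-width, and half-width Katakana (e.g. "ｶﾀｶﾅ") to full-width.
    return unicodedata.normalize("NFKC", text)

print(normalize("ＡＢＣ１２３"))  # -> ABC123
print(normalize("ｶﾀｶﾅ"))        # -> カタカナ
```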
Language Specifications:
Chinese
-- No spaces between words
-- Text consists only of Kanji (Chinese) characters
-- Processed with a bigram (2-gram) model
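The bigram model above indexes every overlapping pair of adjacent characters; a minimal sketch (BigramSplitter's actual code may differ):

```python
def bigrams(text):
    """Overlapping 2-grams of a string; a lone character is kept as-is."""
    if len(text) < 2:
        return [text] if text else []
    return [text[i:i + 2] for i in range(len(text) - 1)]

print(bigrams("東京都庁"))  # -> ['東京', '京都', '都庁']
```

Because every adjacent pair is indexed, a query for "京都" matches "東京都庁" even though no explicit word boundary exists in the text.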
Japanese
-- No spaces between words
-- A mix of Kanji (Chinese), Katakana, and Hiragana characters
-- Kanji, Hiragana, and Katakana are distinguished and processed with a bigram (2-gram) model
-- Katakana is converted into Hiragana
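The Katakana-to-Hiragana folding can be sketched as a code-point shift: each Katakana letter (U+30A1 through U+30F6) sits exactly 0x60 above its Hiragana counterpart in Unicode. This is an illustrative helper, not necessarily how BigramSplitter implements it:

```python
def katakana_to_hiragana(text):
    # Shift Katakana letters (U+30A1..U+30F6) down by 0x60 to Hiragana.
    # Characters outside the range (e.g. the prolonged sound mark "ー")
    # are left unchanged.
    return "".join(
        chr(ord(ch) - 0x60) if 0x30A1 <= ord(ch) <= 0x30F6 else ch
        for ch in text
    )

print(katakana_to_hiragana("カタカナ"))  # -> かたかな
```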
Korean
-- Words are separated by spaces, but each word may carry an attached particle
-- A mix of the Korean alphabet (Hangul) and Kanji (Chinese) characters
-- Hangul and Kanji (Chinese) characters are distinguished and processed with a bigram (2-gram) model
Thai
-- No spaces between words
-- Very difficult to handle computationally
-- Vowels and consonants are registered separately in Unicode, so it is difficult to recognize them as one word
-- However, Thai text can potentially be handled with the bigram (2-gram) model as well
Other languages (Including English)
-- Words are separated by spaces
-- Each word is indexed as-is
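The per-language rules above come down to discriminating character runs by script and then choosing bigrams or whole-word indexing per run. A simplified sketch, assuming only two classes (CJK ideographs vs. everything else; the real splitter distinguishes more scripts, and the function names are ours):

```python
import re

# Contiguous runs of CJK unified ideographs (basic block U+4E00..U+9FFF).
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")

def bigrams(run):
    """Overlapping 2-grams; a lone character is kept as-is."""
    if len(run) < 2:
        return [run]
    return [run[i:i + 2] for i in range(len(run) - 1)]

def split_terms(text):
    terms = []
    pos = 0
    for m in CJK_RUN.finditer(text):
        # Space-delimited stretch before the CJK run: index words whole.
        terms += text[pos:m.start()].split()
        # CJK run: index overlapping bigrams.
        terms += bigrams(m.group())
        pos = m.end()
    terms += text[pos:].split()
    return terms

print(split_terms("Plone 東京都"))  # -> ['Plone', '東京', '京都']
```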
Source Code
Since no documentation is available on how to develop a word splitter, we referred to the source code of other splitters. We still have a number of questions; if you have any further information, please feel free to let us know.
Hotfix to Plone 3.0 source code
Because the Plone 3.x catalog configuration, catalog.xml, has no mechanism for overwriting an existing index, we developed a hotfix that adds an XML attribute. We believe the Plone 3 XML definition mechanism is simple and clear, which is why we took this approach. We would appreciate any comments.