ERP5 KM

DiscussionLocalization

Localization problems and solutions

Currently ERP5 uses Localizer to achieve both UI and content localization (translation). Supporting Japanese properly exposes several problems caused by architectural limitations. In this discussion, we list those problems and try to find a good solution for each one.

Current Feature

A short description of the current localization features:

  • Localizer
  • erp5_ui message catalog is used for UI localization.
    • UI messages such as labels are static. Users cannot change them. Centralized.
  • erp5_content message catalog is used for user-input data. Centralized.
  • getTranslatedXXXX accessor returns localized data in the currently chosen language from either erp5_ui or erp5_content. The user can choose the message catalog in the type information definition.
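
To make the centralized behaviour concrete, here is a minimal sketch of a lookup in the style of getTranslatedXXXX. It is not Localizer code: the `MESSAGE_CATALOGS` dictionary, the `get_translated` function and the sample messages are all hypothetical, and falling back to the original text when no translation exists is a simplifying assumption.

```python
# Hypothetical stand-ins for the two centralized message catalogs.
# Keys are (original message, language); values are translations.
MESSAGE_CATALOGS = {
    "erp5_ui": {("Title", "fr"): "Titre"},
    "erp5_content": {("Manager", "fr"): "Directeur"},
}


def get_translated(original, catalog_id, language):
    """Look up a translation in one centralized catalog.

    Assumption of this sketch: an untranslated message is returned as-is.
    """
    catalog = MESSAGE_CATALOGS[catalog_id]
    return catalog.get((original, language), original)


ui_label = get_translated("Title", "erp5_ui", "fr")
content_value = get_translated("Manager", "erp5_content", "fr")
untranslated = get_translated("Manager", "erp5_content", "ja")
```

Because the key is only the original text and the language, every document sharing the same original text gets the same translation. This is the limitation described in Problem 0a.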

Problems

Problem 0a: document local localization data

We often want document-local localization data instead of centralized data. One English word may have ten different Japanese equivalents depending on the context.

Problem 0b: UI independent localization

UI localization and content localization are different. For example, even if the UI is in English, a user may want to see address information written in Japanese.

Problem 1: transliterations and translations

There are sometimes multiple ways to represent the same language.

In Japanese:

  • Japanese written normally
  • Japanese written in romaji
  • Japanese written in kana

In Chinese:

  • traditional Chinese
  • simplified Chinese

In Wolof:

  • Wolof written in the Latin alphabet (after French colonialism)
  • Wolof written in the Arabic alphabet (which is the way it used to be written when Arabs were colonising Africa)

There is also a common requirement to keep the original spelling of a product name (ie. title) and to translate and / or transliterate it:

  • ex. to Arabic for addresses to / from Saudi Arabia
  • ex. to kana for searching purposes in Japan

I propose to extend Localizer to support language variations and more languages:

  • ja_kana
  • ja_romaji
  • wo
  • wo_arabic
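
As an illustration of how such codes could be interpreted, the following sketch splits a code into a base language and a variation, then groups codes into language families. The names `LanguageCode`, `parse_language_code` and `group_by_base` are hypothetical and not part of Localizer; accepting both `_` and `-` as separators is an assumption.

```python
from typing import NamedTuple, Optional


class LanguageCode(NamedTuple):
    base: str                # real language, e.g. "ja"
    variant: Optional[str]   # variation such as "kana", or None


def parse_language_code(code):
    """Split 'ja_kana' (or 'ja-kana') into base language and variation."""
    base, _, variant = code.replace("-", "_").partition("_")
    return LanguageCode(base, variant or None)


def group_by_base(codes):
    """Group language codes into families, keeping the input order."""
    families = {}
    for code in codes:
        families.setdefault(parse_language_code(code).base, []).append(code)
    return families


families = group_by_base(["ja", "ja_kana", "ja_romaji", "wo", "wo_arabic"])
```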

Problem 2: localized properties

We need some properties to be available in different languages. Example:

  • description in French, English, Japanese
  • name of person (Title) in kana and / or romaji

Solution

  • add a property to propertysheets to control the localizable scope (ex. only languages, languages + pronunciation, etc.)
  • generate accessors dynamically based on the selection of languages in Localizer and on the families of languages
  • ex. getJaTranslatedTitle, getJaKanaTranslatedTitle, setJaRomajiTranslatedTitle
  • ex. setTranslatedTitle (in user default language)
  • in the accessor code, implement a system which looks up subobjects of the current object to search for a translation. Subobjects can be called "Localized Property" or "Localized Content". There should be one such property per language. For fast lookup, it is better to use ids if possible.
  • calls to setJaRomajiTranslatedTitle automatically create subobjects if needed
  • subobjects may have a workflow (optional) to manage translation tasks
  • add a table to catalog to store all translated properties (generic) and, if customers really need higher performance on large data sets, add columns to catalog
  • use FormBox for properties which may require different forms based on language, and include the appropriate FormBox in localization skins
  • ex. title needs 2 fields in general (title, translated_title) but 4 in Japanese (title, translated_title, ja_romaji_translated_title and ja_kana_translated_title)

Note: the same FormBox approach can be used for localizing address forms.
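
The accessor part of this proposal could look roughly like the sketch below. It is plain Python and does not use the ERP5 accessor machinery: `Document`, `LocalizedProperty`, `accessor_suffix` and `add_translation_accessors` are hypothetical names, subobjects are kept in a simple dictionary keyed by id, and falling back to the original property when no translation exists is an assumption.

```python
import re


def accessor_suffix(language):
    """'ja_kana' -> 'JaKana', as used in getJaKanaTranslatedTitle."""
    return "".join(part.capitalize() for part in re.split(r"[_-]", language))


class LocalizedProperty:
    """Hypothetical subobject holding one translation of one property."""

    def __init__(self, value=None):
        self.value = value


class Document:
    """Hypothetical document; subobjects are looked up directly by id."""

    def __init__(self, title):
        self.title = title
        self._subobjects = {}


def add_translation_accessors(cls, prop, languages):
    """Generate get/set<Lang>Translated<Prop> methods on cls."""
    camel = prop.capitalize()  # enough for single-word names in this sketch
    for language in languages:
        sub_id = "%s_translated_%s" % (language, prop)

        def getter(self, sub_id=sub_id):
            sub = self._subobjects.get(sub_id)
            # Assumption: fall back to the original value if untranslated.
            return sub.value if sub is not None else getattr(self, prop)

        def setter(self, value, sub_id=sub_id):
            # The subobject is created automatically on first use.
            self._subobjects.setdefault(sub_id, LocalizedProperty()).value = value

        suffix = accessor_suffix(language)
        setattr(cls, "get%sTranslated%s" % (suffix, camel), getter)
        setattr(cls, "set%sTranslated%s" % (suffix, camel), setter)


add_translation_accessors(Document, "title", ["ja", "ja_kana", "ja_romaji"])
doc = Document("Nexedi")
doc.setJaKanaTranslatedTitle("ネクセディ")
```

Using the id `ja_kana_translated_title` for the subobject gives the direct lookup by id that the proposal recommends for speed.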

Problem 3: localized properties in listbox

If a listbox contains a property such as title, the "more columns" setting should also allow selecting:

  • ja_translated_title
  • fr_translated_title
  • ja_kana_translated_title

etc.

Solution

  • if the more columns field in listbox uses a static property, automatically expand the columns each time a column refers to a localizable property (small change to Listbox.py)
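
The expansion itself is simple. The sketch below is independent of Listbox.py: the `expand_columns` function and its arguments are hypothetical, and the generated column titles are only an example.

```python
def expand_columns(columns, localized_properties, languages):
    """Add one translated column per language after each localizable column.

    columns: list of (column id, column title) pairs.
    """
    expanded = []
    for column_id, column_title in columns:
        expanded.append((column_id, column_title))
        if column_id in localized_properties:
            for language in languages:
                expanded.append((
                    "%s_translated_%s" % (language, column_id),
                    "%s (%s)" % (column_title, language),
                ))
    return expanded


columns = expand_columns(
    [("title", "Title"), ("reference", "Reference")],
    localized_properties={"title"},
    languages=["ja", "fr", "ja_kana"],
)
column_ids = [column_id for column_id, _title in columns]
```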

Problem 4: searchability

If we search a title using kana, we want to see results in Kanji (Japanese). If we search a French title using a pronunciation alphabet, we want to see results.

Solution:

Search in the translated properties table, or in extra catalog columns whenever available.

Solutions

An experimental implementation by NexediKK

To solve the above problems, we made an experimental implementation.

Solution for problem 1

We extended Localizer to make it possible to use user-defined languages. It is then possible to regard Japanese kana as another independent language (ja-kana), and we can have localized content both for real languages (ja, fr, en) and for such virtual languages (ja-kana).

One problem is that the extended Localizer currently does not distinguish between default (real) languages and user-defined (virtual) ones, so virtual languages are shown in the language drop-down menu. Usually they are useless there and do not need to be shown.

Solution for problem 0 and problem 2

A new translation accessor has been introduced. By checking Localizer's list of enabled languages, this new accessor system generates a getter/setter per language, and localization data is stored in the document itself.

For example, if there are three languages ja-kana, ja and fr, and if the title property is marked as localized, then the new accessor system generates the following accessors:

  • get/setJaKanaTranslatedTitle
  • get/setJaTranslatedTitle
  • get/setFrTranslatedTitle

As the translation data container, I use a dictionary instead of a content-type subobject.

The good point of using a dictionary is that it is small, has little impact on machine resources, and needs no indexing. We also do not have to care about the allowed content types definition on type information: since all portal types can be localized, the subobject approach would force us to allow a "localized property" subobject everywhere.

The bad point is that it does not fit well with workflows. If we want to manage each translation seriously, we need subobjects. For now, this implementation provides a very simple workflow for localized content, which detects differences between the original data and the localized data.
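
A rough sketch of dictionary storage with this kind of difference detection is shown below. It is not the NexediKK code: the class and method names are hypothetical, and remembering the original text at translation time in order to detect later changes is an assumption about how the simple workflow could work.

```python
class TranslatableDocument:
    """Hypothetical document keeping translations in a plain dictionary."""

    def __init__(self, **properties):
        self._properties = dict(properties)
        # (property name, language) -> (original text at translation time,
        #                               translated text)
        self._translations = {}

    def getProperty(self, name):
        return self._properties.get(name)

    def setProperty(self, name, value):
        self._properties[name] = value

    def setTranslation(self, name, language, text):
        self._translations[(name, language)] = (self.getProperty(name), text)

    def getTranslation(self, name, language):
        entry = self._translations.get((name, language))
        return entry[1] if entry is not None else None

    def isTranslationOutdated(self, name, language):
        """True if the original changed after the translation was saved."""
        entry = self._translations.get((name, language))
        return entry is not None and entry[0] != self.getProperty(name)


person = TranslatableDocument(first_name="Yusei")
person.setTranslation("first_name", "ja-kana", "ユウセイ")
fresh = person.isTranslationOutdated("first_name", "ja-kana")
person.setProperty("first_name", "Yusei T.")
stale = person.isTranslationOutdated("first_name", "ja-kana")
```

Nothing here is a persistent subobject, which is the point of the dictionary approach. The price is that a translation has no workflow state of its own beyond this outdated flag.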

I suppose that this is good enough for a normal project whose target is searching.

As for the UI, I did not use FormBox because it is a kind of optimization for a specific language. Instead I prepared a generic UI to view localized data: the user can see all localized content in a matrix box (Y-axis is property and X-axis is language).

Solution for problem 3

Nothing is done.

Solution for problem 4

For searchability, I have introduced a new table named "content_translation".

With this table and a scriptable key, users can search documents by both original title and localized title. By default, the scriptable key script supports only the title property, but it is easy to add other properties.

The content_translation table looks like the following:

uid   language   property_name   translated_text
100   en         title           Nexedi
101   ja-kana    title           ネクセディ
102   en         first_name      Yusei
103   ja         first_name      悠西
104   ja-kana    first_name      ユウセイ

With the scriptable key, if the user searches like this:

context.portal_catalog(title='ネクセディ')

then the query is converted to SQL like this:

select catalog.uid, catalog.path
from catalog, content_translation
where
(catalog.title='ネクセディ' or
 (content_translation.property_name='title' and content_translation.translated_text='ネクセディ')
)
and
catalog.uid=content_translation.uid

Due to a current limitation of SQLCatalog, we cannot generate a query like the following one, so for now every untranslated document needs to have one empty record in the content_translation table:

select catalog.uid, catalog.path
from catalog, content_translation
where
catalog.title='ネクセディ'
or
(content_translation.property_name='title'
 and
 content_translation.translated_text='ネクセディ'
 and
 catalog.uid=content_translation.uid)
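
The need for the empty record can be reproduced outside ERP5 with the first query above. The sketch below uses an in-memory SQLite database instead of SQLCatalog; the sample documents, their paths and the `search` helper are made up for the demonstration, and `DISTINCT` and `ORDER BY` are added only to keep the output stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE catalog (uid INTEGER PRIMARY KEY, path TEXT, title TEXT);
    CREATE TABLE content_translation (
        uid INTEGER, language TEXT, property_name TEXT, translated_text TEXT);
    -- Document 100 has a kana translation of its title.
    INSERT INTO catalog VALUES (100, '/organisation_module/1', 'Nexedi');
    INSERT INTO content_translation
        VALUES (100, 'ja-kana', 'title', 'ネクセディ');
    -- Document 200 matches on its original title but has no translation.
    INSERT INTO catalog VALUES (200, '/organisation_module/2', 'ネクセディ');
""")

QUERY = """
    SELECT DISTINCT catalog.uid, catalog.path
    FROM catalog, content_translation
    WHERE (catalog.title = :text
           OR (content_translation.property_name = 'title'
               AND content_translation.translated_text = :text))
      AND catalog.uid = content_translation.uid
    ORDER BY catalog.uid
"""


def search(text):
    """Return the uids found by the generated query."""
    return [uid for uid, _path in conn.execute(QUERY, {"text": text})]


before = search("ネクセディ")

# Workaround: give the untranslated document one empty record.
conn.execute("INSERT INTO content_translation VALUES (200, '', '', '')")
after = search("ネクセディ")
```

Before the empty record is inserted, document 200 is dropped by the join even though its original title matches. After the insert it is found.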

Side effect

The centralized content localization feature based on the erp5_content message catalog is replaced with the document-local localization feature. But this may be too much.

Comments

Content languages vs. UI languages

The choice of language to translate a content property to is not always the same as the choice of language to translate user interface to. This is similar to the issue with web sites for which the languages of the site are not the same as the languages of the user interface, with no relation between the two.

Example 1:

  • UI language: English (the working language for all users)
  • description language: French, English, Japanese (ex. product descriptions in a catalog)

Example 2:

  • UI language: English, French, Japanese (we have users in different countries who want to use ERP5 in their own language)
  • description language: users are requested to enter all content in English for common data (ex. products)
  • - This has already been considered. That is why we have introduced a new translation accessor which includes the language name, like getJaKanaTranslatedTitle. Content language is now independent of UI language. -yusei

Translation Domains

It would be good to recall how translation domains are currently set:

  • for forms, we use erp5_ui
  • for module titles, we use erp5_ui
  • for categories, we use erp5_ui ? erp5_content ? why not erp5_categories or erp5_configuration ? and why not multiple ?
  • for document titles, we use erp5_content

The message catalog choice which we have now could evolve into a translator choice (ie. which translation class, and therefore which algorithm, we use to get translated content).

Translation centralized or modularized ?

Both methods have good and bad points.

Currently all UI translation is centralized, but is that good? As the next section points out, Zope3 or Plone can split translations into several PO files, one per context (business field). This is a good idea: we do not have to worry about message id conflicts. -yusei

Translation subcontent or independent content ?

Properties in a content dict are one possible way to store content in ad-hoc objects (but rather than using a dict, please consider a MixIn + Interface for this, it is always better). Another way is to store translated properties in a subdocument. Another way is to store translated properties in a document of another module.

There are actually many use cases for translation of messages, which go beyond

  • everything central
  • everything on the same document

In Plone, translation is split into multiple po files, one for each module. This is not stupid. It has some good points.

In KDE, documentation is translated by splitting text into paragraphs and using a po file to translate content. This is not stupid either, since it helps implement a kind of translation memory.

In short: translated messages should be stored on documents whose granularity matches the translation workflow.

  • - Regardless of where the translated content is stored, the new translation accessor system would not change. Instead, the translation storage mechanism should be selectable by the developer/user. To achieve this, an interface definition of how to save/load translation data is required. Then users can choose, or developers can implement, their preferred one. -yusei
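
Such a save/load interface could be sketched as follows. It uses an abstract base class from the Python standard library rather than an ERP5 interface declaration; `TranslationStorage`, `DictStorage` and `CentralStorage` are hypothetical names, and the two implementations only mimic the document-local and the centralized approaches discussed on this page.

```python
from abc import ABC, abstractmethod
from types import SimpleNamespace


class TranslationStorage(ABC):
    """Hypothetical interface: how translation data is saved and loaded."""

    @abstractmethod
    def load(self, document, property_name, language):
        """Return the translated text, or None if there is none."""

    @abstractmethod
    def save(self, document, property_name, language, text):
        """Store the translated text."""


class DictStorage(TranslationStorage):
    """Document-local storage: a dictionary kept on the document itself."""

    def load(self, document, property_name, language):
        translations = getattr(document, "_translation_dict", {})
        return translations.get((property_name, language))

    def save(self, document, property_name, language, text):
        if not hasattr(document, "_translation_dict"):
            document._translation_dict = {}
        document._translation_dict[(property_name, language)] = text


class CentralStorage(TranslationStorage):
    """Centralized storage keyed by original text, like a message catalog."""

    def __init__(self):
        self._messages = {}

    def load(self, document, property_name, language):
        original = getattr(document, property_name)
        return self._messages.get((original, language))

    def save(self, document, property_name, language, text):
        original = getattr(document, property_name)
        self._messages[(original, language)] = text


# The same calls work whichever storage is selected.
first = SimpleNamespace(title="Nexedi")
second = SimpleNamespace(title="Nexedi")
local, central = DictStorage(), CentralStorage()
local.save(first, "title", "ja-kana", "ネクセディ")
central.save(first, "title", "ja-kana", "ネクセディ")
```

With `DictStorage` the translation belongs to one document only, while with `CentralStorage` a second document with the same title gets the same translation. The accessors would not need to know which one is in use.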

What about the translation workflow for local translation properties ?

If someone can speak only Japanese, will he only see "Japanese properties" to translate ? Or all documents with properties to translate ?

Can the translation workflow apply to other implementations ?

The concept of "this document still has some properties to translate" does not in reality depend on how message translation is implemented. The workflow should thus be independent of the implementation.

Time to replace Localizer

It is time to create a new tool for ERP5 localisation (LocalisationTool / portal_localisations or same with z).

  • - I prefer TranslationTool, if the purpose is only translation. Localisation has a broader meaning. -yo

It should meet the following requirements:

  • keep the beautiful "online translation" features of Localizer
  • provide the kind of "translation workflows" which Localizer was defined for and failed to reach
  • provide the kind of "translation memory" which Localizer was defined for and failed to reach
  • translation domain and message catalog are 2 independent things
    • a given domain could use multiple message catalogs
    • same message catalog could be used for multiple domains
  • the concept of "ContentTranslator" class / plugin could be introduced. A given domain could use a given list of ContentTranslator to implement translation. Here are some examples

    • a PO file based translator which uses the data from multiple ordered PO files to provide translation
    • a "paragraph split" translator which splits content before trying to translate it using a given list of PO files
    • a "paragraph split" translator which splits content before trying to translate it using a PO file with same reference as the current document + ".Translation" extension
    • a "Babelfish" translator which invokes an online machine translation service (ex. Babel Fish) each time something needs to be translated and keeps the result in cache
    • a PO file based translator which uses the data from a PO file in relation with the current document (ie. to implement the idea that this document is translated by this one)
  • let us make message catalog ERP5ish (and even DMSish)
    • a subclass of TextDocument

    • with nice features to parse po files
    • with nice features for online editing and search
    • possibly to support other translation file formats
  • include the possibility to generate po files for certain domains / certain documents
  • fully implemented with interfaces and possibly MixIn
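
The "ContentTranslator" idea from the list above can be sketched as a small chain of plugins. Everything below is hypothetical: the class names, the use of plain dictionaries in place of PO files, and the rule that a translator returns None when it has no answer are assumptions made for this sketch.

```python
class ContentTranslator:
    """Hypothetical plugin interface."""

    def translate(self, text, language):
        """Return the translation, or None if this translator has none."""
        raise NotImplementedError


class CatalogTranslator(ContentTranslator):
    """Uses an ordered list of catalogs (dicts standing in for PO files)."""

    def __init__(self, catalogs):
        self.catalogs = catalogs  # each maps (message, language) -> text

    def translate(self, text, language):
        for catalog in self.catalogs:
            if (text, language) in catalog:
                return catalog[(text, language)]
        return None


class ParagraphSplitTranslator(ContentTranslator):
    """Splits content into paragraphs before translating each of them."""

    def __init__(self, inner):
        self.inner = inner

    def translate(self, text, language):
        paragraphs = text.split("\n\n")
        results = [self.inner.translate(p, language) for p in paragraphs]
        if all(result is None for result in results):
            return None
        # Untranslated paragraphs are kept in the original language.
        return "\n\n".join(
            result if result is not None else paragraph
            for result, paragraph in zip(results, paragraphs))


class TranslationDomain:
    """A domain tries its translators in order; falls back to the input."""

    def __init__(self, translators):
        self.translators = translators

    def translate(self, text, language):
        for translator in self.translators:
            result = translator.translate(text, language)
            if result is not None:
                return result
        return text


ui_catalog = {("Title", "fr"): "Titre"}
content_catalog = {("Hello.", "fr"): "Bonjour.",
                   ("Goodbye.", "fr"): "Au revoir."}
domain = TranslationDomain([
    CatalogTranslator([ui_catalog, content_catalog]),
    ParagraphSplitTranslator(CatalogTranslator([content_catalog])),
])
translated_page = domain.translate("Hello.\n\nGoodbye.", "fr")
```

Here one domain uses two message catalogs, and the same content catalog is shared by two translators, which matches the requirement that domains and catalogs are independent things.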

Please also do not forget to select an optimal data structure for performance. Making everything ERP5ish is suicidal, because sub-objects are all independent persistent objects in ZODB, and different types of sub-objects are mixed up in a list or tree in the current implementation. -yo

Discussion/Localization (last edited 2009-11-04 01:29:27 by Yusei)
