OpenInteract2::Manual::I18N - Internationalization in OpenInteract2
This part of the manual will describe i18n efforts in OpenInteract2, how to create message bundles to distribute with your application, and how you can customize the process.
I'm a newbie at i18n/l10n. The main purpose is to find the path I think most web applications will tread and make that path as simple as possible to navigate. The hooks the framework provides for localization should be unobtrusive enough not to preclude other efforts you may have in this area.
So if you have ideas about how things can be done better or more flexibly, please join the openinteract-dev mailing list and chime in. (See SEE ALSO for more info on the mailing list.)
Localizing every aspect of your application is extremely difficult. There are the easy things like translating words on the screen, date/time formats and money. Then there are the tough things: what does this shade of yellow mean in China versus Saudi Arabia? What happens if someone reads this sequence of graphics from right-to-left instead of left-to-right? And on and on, for many more items you haven't even thought of yet.
OpenInteract won't presume to take care of all these for you. Instead we try to make the most common operations as simple as possible. Hopefully that will be sufficient for your needs.
Ordered from most to least important, here's how we identify the language to use for the current request. First match wins.
User logged in? Look in the 'lang' user property.
Language set in session?
Language in GET/POST params? ('oi_language')
Languages passed by the browser? (These are also used as a backup.)
Customized identifiers (register in server.ini)
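To make the ordering concrete, here is a minimal sketch in plain Perl, independent of OpenInteract2. The `pick_languages()` function and the keys of its `%req` argument are hypothetical stand-ins for the request data OI consults. Treating the browser languages as trailing backups is our reading of the browser entry above, and the framework's real code may differ.

```perl
use strict;
use warnings;

# Hypothetical sketch: the %req keys (user, session, params,
# browser_langs) stand in for the request data OpenInteract2 consults.
sub pick_languages {
    my ( %req ) = @_;
    my @browser = @{ $req{browser_langs} || [] };
    my @candidates = (
        ( $req{user} ? $req{user}{lang} : undef ),  # logged-in user's 'lang' property
        $req{session}{lang},                         # language set in the session
        $req{params}{oi_language},                   # 'oi_language' GET/POST parameter
        $browser[0],                                 # browser's first choice
    );

    # First match wins...
    my ( $primary ) = grep { defined $_ && length $_ } @candidates;

    # ...then the browser languages follow as backups (an assumption),
    # with duplicates dropped. A custom identifier class would be
    # handed this list next.
    my %seen;
    return grep { defined $_ && !$seen{$_}++ } ( $primary, @browser );
}

my @langs = pick_languages(
    user          => { lang => 'fr' },
    params        => { oi_language => 'es' },
    browser_langs => [ 'en-US', 'fr' ],
);
print "@langs\n";    # prints "fr en-US"
```

Here the user's 'fr' beats the 'es' parameter because the user check comes first, and 'fr' is not repeated when it shows up again among the browser languages.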
OI has more hooks than your favorite rock band, and this area is no exception. During the request initialization process we identify all the languages available for this request. Normally this means all the languages for a particular user, but you can override it with GET/POST parameters or a setting in the session.
We also provide the means for you to step in and implement your own -- you could parse it from the URL, use Geo::IP, whatever. Just declare your class in the server configuration key 'language.custom_language_id_class':
[language]
...
custom_language_id_class = MyApp::I18N::LanguageId
Then implement the class method 'identify_languages()', which receives the list of languages identified so far. Here's a naive example:
package MyApp::I18N::LanguageId;

use strict;
use warnings;
use Geo::IP;
use OpenInteract2::Context qw( CTX );

my $gi = Geo::IP->new( GEOIP_STANDARD );

sub identify_languages {
    my ( $class, @oi_langs ) = @_;
    my $country = $gi->country_code_by_addr( CTX->request->remote_host );
    return @oi_langs unless $country;    # address not in the GeoIP database
    # _some_nifty_method() is yours to write: map a country code to languages
    my @langs_from_country = $class->_some_nifty_method( $country );
    push @oi_langs, @langs_from_country;
    return @oi_langs;
}

1;
Note that if you return a non-empty list, it replaces the languages OI has identified so far. The example above takes care of this by starting with the languages it was passed and appending its own to them.
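The difference between appending and replacing can be shown with a small simulation. This is a hypothetical stand-in for the framework side, since the actual OI2 calling code isn't shown here. `apply_custom_identifier()` and the three identifier classes are invented for illustration, and keeping the existing list when the method returns nothing is an assumption.

```perl
use strict;
use warnings;

# Hypothetical framework side: a non-empty return replaces the
# languages found so far; an empty return (assumed) keeps them.
sub apply_custom_identifier {
    my ( $class, @so_far ) = @_;
    my @returned = $class->identify_languages( @so_far );
    return @returned ? @returned : @so_far;
}

{
    package Appender;    # keeps what it was given, then adds to it
    sub identify_languages { my ( $class, @langs ) = @_; return ( @langs, 'pt-BR' ) }
}
{
    package Replacer;    # ignores what it was given
    sub identify_languages { return ( 'pt-BR' ) }
}
{
    package Silent;      # finds nothing
    sub identify_languages { return }
}

print join( ' ', apply_custom_identifier( 'Appender', 'en', 'es' ) ), "\n";  # en es pt-BR
print join( ' ', apply_custom_identifier( 'Replacer', 'en', 'es' ) ), "\n";  # pt-BR
```

The Appender class behaves like the Geo::IP example. Replacer silently throws away everything OI found before it.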
The first, and simplest, form of localization uses keys to represent blocks of text: each key gets replaced by the text for whatever language the current user is associated with. Here's an example. Say you set up your music library search form like this:
Artist: _____________
Title: _____________
Year: _____________
<Search>
And you'd like to localize this. As with most programming problems, you just add a layer of abstraction: associate each piece of text with a key, then associate text with that key for each language:
{search.artist}: _____________
{search.title}: _____________
{search.year}: _____________
<{search.button}>
Now you just have sets of data for each language:
en:
search.artist = Artist
search.title = Title
search.year = Year
search.button = Search

es:
search.artist = Artista
search.title = Título
search.year = Año
search.button = Buscar
...
When the page is rendered, these keys get replaced by the associated text. Fortunately Perl comes with Locale::Maketext, which makes this fairly painless. A nice side-effect is that the message files are in a simple enough format that you can ship them off to someone else and just plug them into your application when they're ready.
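As a rough illustration, here is how the search form above could be filled in with Locale::Maketext directly. The `MyApp::L10N` lexicon classes and the `localize_form()` helper are hypothetical. The `{key}` substitution is a standalone sketch, not how OpenInteract2 itself renders templates.

```perl
use strict;
use warnings;
use utf8;
use Locale::Maketext;

# Hypothetical lexicon classes: one base class plus one subclass per language.
{
    package MyApp::L10N;
    our @ISA = ('Locale::Maketext');
}
{
    package MyApp::L10N::en;
    our @ISA = ('MyApp::L10N');
    our %Lexicon = (
        'search.artist' => 'Artist',
        'search.title'  => 'Title',
        'search.year'   => 'Year',
        'search.button' => 'Search',
    );
}
{
    package MyApp::L10N::es;
    our @ISA = ('MyApp::L10N');
    our %Lexicon = (
        'search.artist' => 'Artista',
        'search.title'  => 'Título',
        'search.year'   => 'Año',
        'search.button' => 'Buscar',
    );
}

# Hypothetical helper: replace each {key} with the text for $lh's language.
sub localize_form {
    my ( $lh, $form ) = @_;
    ( my $out = $form ) =~ s/\{([\w.]+)\}/$lh->maketext("$1")/ge;
    return $out;
}

binmode STDOUT, ':encoding(UTF-8)';
my $form = '{search.artist}: ____ {search.title}: ____ {search.year}: ____ <{search.button}>';
for my $lang ( 'en', 'es' ) {
    my $lh = MyApp::L10N->get_handle( $lang ) or die "No handle for '$lang'\n";
    print localize_form( $lh, $form ), "\n";
}
```

`get_handle()` maps the language tag 'es' to the class `MyApp::L10N::es`, so adding a language means adding one more lexicon class (or, in OI's case, one more message file).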
There's more about the messages and the file format below.
A second type of localization is template negotiation. Hopefully you won't need to use it as often because it can require more maintenance. Instead of replacing text in the template you replace the entire template wholesale.
It works in much the same way, except instead of placing text in the various language files you place template names under a particular key. (The name is in the normal 'package::template' syntax.) And just like invoking a template from your action you can do this in two ways:
specify the template in your action
specify the template in your action configuration
Here's a quick example of the first, passing the message key in your action's generate_content() call:
sub mytask {
    my ( $self ) = @_;
    my %params = (
        # ...
    );
    # ...
    return $self->generate_content(
        \%params, { message_key => 'mytask.template' } );
}
And an example of the second, passing the message key in the action configuration (action.ini):
[foo template_source]
mytask = msg:mytask.template
In your message files you'd have:
messages_en.msg: mytask.template = mypackage::mytemplatename_english
messages_es.msg: mytask.template = mypackage::mytemplatename_spanish
The templates get the exact same data under the exact same variable names, but you can control the layout and text per language.
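Here is a minimal sketch of what the 'msg:' prefix means, assuming the lookup is an ordinary message-key lookup like any other. `resolve_template_source()` and the `MyApp::L10N` classes are hypothetical, and this manual does not show OpenInteract2's own resolution code.

```perl
use strict;
use warnings;
use Locale::Maketext;

# Hypothetical lexicons holding template names instead of display text.
{
    package MyApp::L10N;
    our @ISA = ('Locale::Maketext');
}
{
    package MyApp::L10N::en;
    our @ISA = ('MyApp::L10N');
    our %Lexicon = ( 'mytask.template' => 'mypackage::mytemplatename_english' );
}
{
    package MyApp::L10N::es;
    our @ISA = ('MyApp::L10N');
    our %Lexicon = ( 'mytask.template' => 'mypackage::mytemplatename_spanish' );
}

# Hypothetical helper: 'msg:KEY' is looked up as a message key; anything
# else is treated as a literal 'package::template' name.
sub resolve_template_source {
    my ( $source, $lh ) = @_;
    if ( $source =~ /^msg:(.+)$/ ) {
        my $key = $1;
        return $lh->maketext( $key );
    }
    return $source;
}

for my $lang ( 'en', 'es' ) {
    my $lh = MyApp::L10N->get_handle( $lang ) or die "No handle for '$lang'\n";
    print "$lang: ", resolve_template_source( 'msg:mytask.template', $lh ), "\n";
}
```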
See OpenInteract2::Manual::Templates and OpenInteract2::Action for more information.
The filenames we process are fairly flexible, but one aspect is not: the language must be the last distinct set of characters before the file extension. So the following are ok:
myapp-en.msg            # lang is 'en'
myotherapp-es-MX.dat    # ...'es-MX'
messages_en-HK.msg      # ...'en-HK'
The following are not:
english-messages.msg
messages-en-part2.msg
messagesen.msg
If you create a message file whose name does not conform to this specification, it will not only be skipped but will also halt the localization reading process altogether.
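The naming rule can be sketched as a single regular expression. This is an assumption about what "last distinct set of characters" means: a two-letter language code, optionally followed by a two-letter region, set off by '-' or '_'. `language_from_filename()` is hypothetical, and OpenInteract2's own pattern may be looser or stricter.

```perl
use strict;
use warnings;

# Hypothetical reading of the rule: 'en' or 'es-MX', preceded by '-' or
# '_', directly before the file extension.
sub language_from_filename {
    my ( $filename ) = @_;
    my ( $lang ) = $filename =~ /[_-]([a-z]{2}(?:-[A-Z]{2})?)\.\w+$/;

    # Mirror the documented behavior: a bad name stops the whole run.
    die "Cannot find a language in '$filename'\n" unless defined $lang;
    return $lang;
}

for my $file (qw( myapp-en.msg myotherapp-es-MX.dat messages_en-HK.msg )) {
    printf "%-22s %s\n", $file, language_from_filename( $file );
}
```

Each of the three "not ok" names makes this function die. For example, 'messages-en-part2.msg' fails because 'part2' sits between the language and the extension.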
The message file format is fairly simple:
Unless we're in the middle of a continued value, we'll skip all commented lines (those beginning with a '#') and blank lines.
A message key is unique per language and has a single value that is its associated message for that language. It is separated from the message by an '='.
A message value may span multiple lines using the standard '\' notation at the end of a line. (Examples below.)
A message value may have one or more runtime replacements which match up with parameters passed in. These replacement declarations can get relatively sophisticated -- we discuss them briefly below but for true enlightenment read the documentation for Locale::Maketext.
So here is a simple declaration for two message keys without continued values or runtime replacements:
company.title=Welcome to MyCompany!
company.phone = Call 412-555-1212 for more information.
Two things to note:
The keys ('company.title' and 'company.phone') are abstract and semi-hierarchical. There's a FAQ below about why we chose opaque message IDs for the core OI packages, but you don't have to do so. The only tricky part is ensuring you don't stomp on someone else's namespace. One way to avoid this is to use your package/application name as the first part of the hierarchy.
The message reader strips any whitespace around the '='.
Here's a declaration of two keys, one of which has a continued value:
company.intro = You have decided to learn about MyCompany, a leader \
in the maintenance of the status quo around the world. Ensure your \
status is the one that's in quo!
company.title = Welcome to MyCompany!
The main things here are:
The '\' must be at the end of the line or the remainder of your message will get lost. (You may have whitespace between the '\' and the end of line, but that may not be the case forever.)
You can have multiple continuations for a single value.
The value returned will not have any embedded newlines. (TODO: This may change, speak up if you have strong feelings about it.)
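Here is a minimal sketch of a reader that follows the rules above. `parse_messages()` is hypothetical, not OpenInteract2's actual reader. Comment and blank lines are only skipped outside a continued value, and the text before each '\' (including any space) is joined directly to the next line, so no newlines end up in the value.

```perl
use strict;
use warnings;

# Hypothetical reader: returns a hashref of key => message.
sub parse_messages {
    my ( $text ) = @_;
    my %messages;
    my ( $key, $value );    # $key is defined while inside a continued value
    for my $line ( split /\n/, $text ) {
        if ( defined $key ) {
            $value .= $line;    # continuation: even '#' lines are kept
        }
        else {
            next if $line =~ /^#/ || $line =~ /^\s*$/;
            # Key and value are split on the first '='; whitespace around it is dropped.
            ( $key, $value ) = $line =~ /^\s*([^=]+?)\s*=\s*(.*)$/
                or die "Cannot parse line: $line\n";
        }
        # A trailing '\' (optionally followed by whitespace) continues the value.
        next if $value =~ s/\\\s*$//;
        $messages{$key} = $value;
        undef $key;
    }
    die "File ends inside the value for '$key'\n" if defined $key;
    return \%messages;
}

my $msgs = parse_messages( <<'EOT' );
# Company messages
company.intro = You have decided to learn about MyCompany, a leader \
in the maintenance of the status quo around the world.
company.title=Welcome to MyCompany!
EOT
print "$_ => $msgs->{$_}\n" for sort keys %{ $msgs };
```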
Since we just use Locale::Maketext behind the scenes you can do anything in your message values that it allows. Here is a quick summary of the most common options.
First, you often need to embed one or more values in a message. Position is important: the translation of your message may shift around the order of the values, so you cannot treat it like a sprintf format. For instance, you might have:
db.error.process = While processing the statement [_1] the database \
returned an error [_2]
In another language this might be something like the following nonsense:
db.error.process = La base de datos volvio un error [_2] mientras \
que procesaba la declaracion [_1]
When we ask for the message we need to pass in two values which will get plugged into the message at runtime:
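my $sth;
eval {
    # prepare/execute only die on error if the handle has RaiseError set
    $sth = $dbh->prepare( $sql );
    $sth->execute();
};
if ( $@ ) {
    # $lh is the Locale::Maketext handle for the current language
    my $error_msg = $lh->maketext( 'db.error.process', $sql, $@ );
    # ...
}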
Since they're ordered there's no ambiguity.
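To see the reordering in action, here is a small self-contained Locale::Maketext example using the two messages above. The `MyApp::L10N` class names are hypothetical. `get_handle()` and `maketext()` are the standard Locale::Maketext calls.

```perl
use strict;
use warnings;
use Locale::Maketext;

{
    package MyApp::L10N;
    our @ISA = ('Locale::Maketext');
}
{
    package MyApp::L10N::en;
    our @ISA = ('MyApp::L10N');
    our %Lexicon = (
        'db.error.process' =>
            'While processing the statement [_1] the database returned an error [_2]',
    );
}
{
    package MyApp::L10N::es;
    our @ISA = ('MyApp::L10N');
    our %Lexicon = (
        'db.error.process' =>
            'La base de datos volvio un error [_2] mientras que procesaba la declaracion [_1]',
    );
}

# Same arguments in the same order for every language; each message
# decides where [_1] and [_2] go.
for my $lang ( 'en', 'es' ) {
    my $lh = MyApp::L10N->get_handle( $lang ) or die "No handle for '$lang'\n";
    print $lh->maketext( 'db.error.process', 'SELECT * FROM album', 'table not found' ), "\n";
}
```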
Second, you often need to plug in values that change the words around them depending on what the value is. For instance:
cart.numitems = You have [_1] items in your shopping cart.
Easy enough, but what happens when the number is 1? Or 0?
You have 1 items in your shopping cart.
You have 0 items in your shopping cart.
It's understandable, but not user-friendly. Fortunately Locale::Maketext does this for us:
cart.numitems = You have [quant,_1,item,items,no items] in your shopping cart.
With a '1' this will generate:
You have 1 item in your shopping cart.
And with a '0':
You have no items in your shopping cart.
Nifty!
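Here is the same idea as a runnable snippet, using hypothetical `MyApp::L10N` lexicon classes. `quant` is the bracket method Locale::Maketext provides. Its optional fourth form is used when the number is zero.

```perl
use strict;
use warnings;
use Locale::Maketext;

{
    package MyApp::L10N;
    our @ISA = ('Locale::Maketext');
}
{
    package MyApp::L10N::en;
    our @ISA = ('MyApp::L10N');
    our %Lexicon = (
        'cart.numitems' =>
            'You have [quant,_1,item,items,no items] in your shopping cart.',
    );
}

my $lh = MyApp::L10N->get_handle('en') or die "No handle for 'en'\n";
print $lh->maketext( 'cart.numitems', $_ ), "\n" for 0, 1, 3;
# You have no items in your shopping cart.
# You have 1 item in your shopping cart.
# You have 3 items in your shopping cart.
```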
Why did you use opaque IDs for the message keys?
In the Locale::Maketext docs Sean Burke recommends using keys based on the base language -- that is, not using opaque message keys. His suggestion makes for very readable translation documents, but I think in practice it would be extremely brittle: if you change the base-language text, even just its punctuation, you'll need to change the key in every language. Feh. (Then again, Mr. Burke is a bona fide superhero, so we'll see how that shakes out...)
Additionally, a lot of this was inspired by the message (or 'resource') bundle technology built into the Java 2 platform. (See SEE ALSO for more on this.) Message bundles shipped with applications built on Struts or Spring typically use the hierarchical message syntax, with different levels separated by a dot. So you might have 'myapp.search.label.firstname', which gets more specific as you traverse the key from left to right. How specific you want to get is up to you.
That said, there's nothing stopping you from using your own standard for declaring keys in your application. Use ID numbers, letters, days of the week, whatever. Just make sure your package's keys don't tread on another's.
SEE ALSO
OpenInteract2::I18N::Initializer
openinteract-dev mailing list:
http://lists.sourceforge.net/lists/listinfo/openinteract-dev
Article published in TPJ 13 by Sean Burke about Locale::Maketext:
http://search.cpan.org/~sburke/Locale-Maketext-1.06/lib/Locale/Maketext/TPJ13.pod
Web Localization in Perl by Autrijus Tang
http://www.autrijus.org/webl10n/TABLE_OF_CONTENTS.html
Java Internationalization: Localization with ResourceBundles
http://developer.java.sun.com/developer/technicalArticles/Intl/ResourceBundles/
Copyright (c) 2003 Chris Winters. All rights reserved.
Chris Winters <chris@cwinters.com>
Generated from the OpenInteract 1.99_04 source.