Language Studio™ runtime rules can be simple search-and-replace rules using plain text or Regular Expressions, or more complex rules that leverage syntax, part of speech and other linguistic information. Language Studio™ Linguists will guide you on the best approach to solving a specific linguistic challenge.
To learn more about Regular Expressions, see these websites: http://en.wikipedia.org/wiki/Regular_expression, http://www.regular-expressions.info/ and http://regexlib.com/.
- Pre-Translation Correction (PTC): Often the original source material may contain outdated terminology. Language Studio™ allows for the adjustment of terminology so that older terms are transformed into their more modern form prior to translation. As the correct form is passed into the translation engine, the translation will also choose the correct form. Pre-Translation Corrections can also correct known errors such as spelling mistakes, common OCR errors, glued words, etc.
- Runtime Glossaries (GLO): Language Studio™ allows glossaries to be defined at the customer, project and job level. As in RBMT systems, the preferred term can guide the translation. Unlike in RBMT systems, most terms are already known to the SMT platform via the translation memories provided, which reduces the amount of work a specialist or linguist needs to do to refine the engine.
- Non-Translatable Terms (NTT): Some terms such as product names, venue names, etc. should not be translated. Language Studio™ allows a list of non-translatable terms to be specified.
- Post-Translation Adjustments (PTA): Statistics may determine a preferred term based upon the training data provided, but a preferred term for one customer may not be the preferred term for another. For example, one of our clients has two clients of their own in the commercial real-estate business. One prefers to call some buildings by their older name, while the other prefers the new name. A single engine can be used, with a Post-Translation Adjustment making the necessary change for each specific customer.
Rule File Formats
GLO, PTC and PTA all have the same format:
<source pattern><tab><target pattern><tab><rule>
<source pattern><tab><rule>
A rule is one of the following values:
cs: case sensitive
ci: case insensitive
rcs: regex case sensitive
rci: regex case insensitive
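The rule line format and the four rule values above can be illustrated with a small parser. This is a minimal sketch in Python, not Language Studio™ code: the names Rule, parse_rule_line and compile_rule are hypothetical, and it assumes Python's re dialect, which may differ from the Regular Expression dialect Language Studio™ actually uses.

```python
import re
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    source: str            # source pattern (literal text or regex)
    target: Optional[str]  # target pattern; None for the two-field form
    mode: str              # one of: cs, ci, rcs, rci


VALID_MODES = {"cs", "ci", "rcs", "rci"}


def parse_rule_line(line: str) -> Rule:
    """Split one tab-separated rule line into its fields."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) == 3:
        source, target, mode = fields
    elif len(fields) == 2:
        source, mode = fields
        target = None
    else:
        raise ValueError(f"expected 2 or 3 tab-separated fields: {line!r}")
    if mode not in VALID_MODES:
        raise ValueError(f"unknown rule value: {mode!r}")
    return Rule(source, target, mode)


def compile_rule(rule: Rule):
    """Turn a rule into a compiled pattern (assumption: Python regex syntax)."""
    # cs/ci are plain text, so regex metacharacters must be escaped.
    is_regex = rule.mode in ("rcs", "rci")
    pattern = rule.source if is_regex else re.escape(rule.source)
    # ci/rci ignore case; cs/rcs do not.
    flags = re.IGNORECASE if rule.mode in ("ci", "rci") else 0
    return re.compile(pattern, flags)
```

For example, parse_rule_line("Apple\tmanzana\tci") yields a rule whose compiled pattern matches "apple" as well as "Apple", while the same source with cs would match only "Apple".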
Note: To ensure that shorter patterns do not break longer patterns that contain them, sort the rules by the length of the source search pattern, with the longest at the top.
1. Correct
Apple Computer<tab>Apple Computer<tab>ci
Apple<tab>manzana<tab>ci
2. Incorrect
Apple<tab>manzana<tab>ci
Apple Computer<tab>Apple Computer<tab>ci
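If we ran both of the above examples against the phrase "Apple Computer is a company", the results would be:
1. Correct
Source: Apple Computer is a company
Translation: Apple Computer es una empresa
2. Incorrect
Source: Apple Computer is a company
Translation: Manzana Informática es una empresa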
As can be seen from the example, when the shorter pattern Apple is processed first, it rewrites part of the company name before the longer pattern can match, and the translation suffers badly as a result.
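The effect of rule order can be reproduced with a short simulation. The following Python sketch is an illustration only and makes two assumptions that the text above does not confirm: rules are tried in file order at each position of the input, and text that one rule has matched is not re-processed by later rules. The function names apply_rules and sort_longest_first are hypothetical.

```python
import re


def apply_rules(text, rules):
    """Apply (source, target, mode) rules in the given order.

    Assumption: at each position the first rule that matches wins, and
    the text it consumed is not scanned again by any other rule.
    """
    compiled = []
    for source, target, mode in rules:
        is_regex = mode in ("rcs", "rci")
        pattern = source if is_regex else re.escape(source)
        flags = re.IGNORECASE if mode in ("ci", "rci") else 0
        compiled.append((re.compile(pattern, flags), target))

    out = []
    pos = 0
    while pos < len(text):
        for regex, target in compiled:
            m = regex.match(text, pos)
            if m and m.end() > pos:
                out.append(target)  # target is inserted literally
                pos = m.end()
                break
        else:
            # No rule matched here: copy one character unchanged.
            out.append(text[pos])
            pos += 1
    return "".join(out)


def sort_longest_first(rules):
    """Order rules by source pattern length, longest at the top."""
    return sorted(rules, key=lambda r: len(r[0]), reverse=True)


incorrect = [
    ("Apple", "manzana", "ci"),
    ("Apple Computer", "Apple Computer", "ci"),
]
phrase = "Apple Computer is a company"

print(apply_rules(phrase, incorrect))
# manzana Computer is a company
print(apply_rules(phrase, sort_longest_first(incorrect)))
# Apple Computer is a company
```

With the shorter rule first, the company name is already damaged before translation begins; sorting longest first keeps it intact. The sketch does not check word boundaries, so a real rule set would likely need more careful matching than this.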