Language Studio™ runtime rules can be simple search-and-replace rules that use basic text or Regular Expressions, or more complex rules that leverage syntax, part-of-speech and other linguistic information. Language Studio™ Linguists will guide you on the best approach to solving a specific linguistic challenge.


To learn more about Regular Expressions, see these websites: http://en.wikipedia.org/wiki/Regular_expression, http://www.regular-expressions.info/ and http://regexlib.com/.


  • Pre-Translation Correction (PTC): Often the original source material may contain outdated terminology. Language Studio™ allows for the adjustment of terminology so that older terms are transformed into their more modern form prior to translation. As the correct form is passed into the translation engine, the translation will also choose the correct form. Pre-Translation Corrections can also correct known errors such as spelling mistakes, common OCR errors, glued words, etc.

  • Runtime Glossaries (GLO): Language Studio™ allows glossaries to be defined at the customer, project and job level. As in RBMT systems, the preferred term can guide the translation. Unlike RBMT systems, most terms are already known to the SMT platform via the translation memories provided, which reduces the amount of work a specialist or linguist needs to do to refine the engine.

  • Non-Translatable Terms (NTT): Some terms such as product names, venue names, etc. should not be translated. Language Studio™ allows a list of non-translatable terms to be specified.

  • Post-Translation Adjustments (PTA): Statistics may determine a preferred term based upon the training data provided, but a preferred term for one customer may not be the preferred term for another. For example, one of our clients has two clients of their own in the commercial real-estate business. One prefers to call some buildings by their older name, while the other prefers the new name. A single engine can be used, with a Post-Translation Adjustment making the necessary change for each specific customer.


Rule File Formats


GLO, PTC and PTA all have the same format:

<source pattern><tab><target pattern><tab><rule>


NTT has the following format:

<source pattern><tab><rule>


A rule is one of the following values:

cs: case sensitive

ci: case insensitive

rcs: regex case sensitive

rci: regex case insensitive
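As an illustration of the two formats above, a rule line could be split into its fields as follows. This is a minimal sketch and not part of Language Studio™ itself: the names Rule, VALID_FLAGS and parse_rule_line are hypothetical, and it assumes that a line with two fields is an NTT rule and a line with three fields is a GLO, PTC or PTA rule.

```python
from typing import NamedTuple, Optional

# The four rule values listed above.
VALID_FLAGS = {"cs", "ci", "rcs", "rci"}


class Rule(NamedTuple):
    """One parsed rule line (hypothetical helper type)."""
    source: str
    target: Optional[str]  # None for NTT rules, which have no target pattern
    flag: str


def parse_rule_line(line: str) -> Rule:
    """Split one tab-separated rule line into source, target and rule value."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) == 3:
        # GLO, PTC, PTA: <source pattern><tab><target pattern><tab><rule>
        source, target, flag = fields
    elif len(fields) == 2:
        # NTT: <source pattern><tab><rule>
        source, flag = fields
        target = None
    else:
        raise ValueError(f"expected 2 or 3 tab-separated fields, got {len(fields)}")
    if flag not in VALID_FLAGS:
        raise ValueError(f"unknown rule value: {flag!r}")
    return Rule(source, target, flag)


print(parse_rule_line("Apple\tmanzana\tci"))
print(parse_rule_line("Apple Computer\tci"))
```

Rejecting unknown rule values at parse time is a design choice of this sketch; how the product itself treats a malformed line is not stated here.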


Note: To ensure that a shorter pattern does not break a longer pattern that contains it, sort the rules by the length of the source search pattern, with the longest at the top.
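The sort described in the note can be done with a short script before a rule file is saved. The following is a sketch only; sort_longest_first is a hypothetical helper name, and it assumes each line starts with the source pattern followed by a tab, as in the formats above.

```python
def sort_longest_first(lines):
    """Return rule lines ordered by source-pattern length, longest first."""
    # The source pattern is the first tab-separated field on every rule line.
    # sorted() is stable, so rules of equal length keep their original order.
    return sorted(lines, key=lambda line: len(line.split("\t", 1)[0]), reverse=True)


rules = [
    "Apple\tmanzana\tci",
    "Apple Computer\tApple Computer\tci",
]
for line in sort_longest_first(rules):
    print(line)
```

For regex rules (rcs, rci) the length of the pattern text is only a rough guide to the length of the text it matches, so a sorted file containing regex rules may still need a manual check.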


1. Correct

Apple Computer<tab>Apple Computer<tab>ci

Apple<tab>manzana<tab>ci


2. Incorrect

Apple<tab>manzana<tab>ci

Apple Computer<tab>Apple Computer<tab>ci


If we ran both of the above examples against the phrase "Apple Computer is a company", the results would be:


1. Correct

Apple Computer is a company

Apple Computer es una empresa


2. Incorrect

Apple Computer is a company

Manzana Informática es una empresa


As can be seen from the example, when the shorter rule for Apple is processed first, it matches inside Apple Computer, and the company name is translated as if it were the fruit.
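The ordering effect can be reproduced with a toy model. The sketch below is not the Language Studio™ engine: apply_rules is a hypothetical function that treats every rule as plain text with the ci rule value and assumes that, at each position in the text, the first rule in the list that matches wins. It only shows the glossary step on the source sentence, so the remaining words are left untranslated.

```python
import re


def apply_rules(text, rules):
    """Apply (source, target) pairs in a single left-to-right pass.

    Toy model: at each position, the first rule in list order that matches wins.
    All rules are treated as plain text, case insensitive.
    """
    targets = {source.lower(): target for source, target in rules}
    # Regex alternatives are tried in the order given, so rule order decides
    # which rule claims the text when two rules match at the same position.
    pattern = re.compile(
        "|".join(re.escape(source) for source, _ in rules),
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: targets[m.group(0).lower()], text)


correct = [("Apple Computer", "Apple Computer"), ("Apple", "manzana")]
incorrect = [("Apple", "manzana"), ("Apple Computer", "Apple Computer")]

print(apply_rules("Apple Computer is a company", correct))    # Apple Computer is a company
print(apply_rules("Apple Computer is a company", incorrect))  # manzana Computer is a company
```

With the longest rule first, the company name is claimed as a unit and survives intact. With the short rule first, Apple is replaced before the longer rule has a chance to match, which leaves Computer to be translated on its own.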