Character Encoding (http://en.wikipedia.org/wiki/Character_encoding) is the binary format in which each character or letter is stored and processed.


Language Studio uses UTF-8 encoding. If your documents use any other encoding, they must be converted to UTF-8 before you submit them for translation. If you submit content in an encoding other than UTF-8, the result will most likely be garbage or contain broken text.
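
As a rough illustration of the conversion step, the sketch below re-encodes content to UTF-8 in Python. The function names and file names are hypothetical and not part of Language Studio, and the sketch assumes you already know the encoding the document was saved in.

```python
from pathlib import Path


def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Decode bytes from the known source encoding and re-encode them as UTF-8."""
    # Strict decoding (the default) raises UnicodeDecodeError when the bytes
    # are not valid in the declared source encoding, instead of silently
    # producing broken text.
    text = data.decode(source_encoding)
    return text.encode("utf-8")


def convert_file_to_utf8(src: str, dst: str, source_encoding: str) -> None:
    """Hypothetical helper: write a UTF-8 copy of src to dst."""
    # Working on raw bytes keeps the original line endings untouched.
    Path(dst).write_bytes(to_utf8(Path(src).read_bytes(), source_encoding))
```

For example, `convert_file_to_utf8("source_gb.txt", "source_utf8.txt", "gb18030")` would write a UTF-8 copy of a GB18030 file. If the declared source encoding is wrong, the conversion can still succeed and produce incorrect characters, so it is worth checking the converted file before submitting it.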


To the naked eye, the file may look fine. For example, the Chinese text “关于中文维基百科” could be encoded in UTF-8, UTF-16 or GB18030 (and its Traditional Chinese equivalent in BIG5) and look the same on screen, yet the underlying bytes, shown in hexadecimal, are different for each encoding.
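
To make the difference visible, the following Python sketch prints the hexadecimal bytes of the same text under several encodings. The helper name `hex_bytes` is made up for this illustration, and `utf-16-le` is chosen so that no byte order mark is added. BIG5 targets Traditional Chinese, so some Simplified characters may not be encodable in it; the sketch reports that case instead of failing.

```python
def hex_bytes(text: str, encoding: str) -> str:
    """Return the bytes of text in the given encoding as space-separated hex."""
    try:
        return text.encode(encoding).hex(" ")
    except UnicodeEncodeError:
        # The encoding has no representation for at least one character.
        return "(not representable in this encoding)"


sample = "关于中文维基百科"

for name in ("utf-8", "utf-16-le", "gb18030", "big5"):
    print(f"{name:>10}: {hex_bytes(sample, name)}")
```

The printed byte sequences differ from one encoding to the next even though the text displays identically, which is why software must know the encoding to read the file correctly.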


With some encodings the difference is hard to spot because most characters look correct, but subtle differences remain that will have a negative impact on translation. In some cases the text looks the same in different encodings; in others it looks totally different. The situation is similar to receiving a message in Morse code: unless the software (or human) can understand Morse code or convert it to UTF-8, the data is not understandable and cannot be processed.
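
One way to catch such problems before submission is to check whether the bytes decode cleanly as UTF-8. The Python sketch below is only a heuristic with a hypothetical function name: bytes that happen to be valid UTF-8 can still come from another encoding, and plain ASCII text passes because it is byte-for-byte identical in UTF-8.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as UTF-8 without errors."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


utf8_data = "café".encode("utf-8")      # é is stored as two bytes: c3 a9
latin1_data = "café".encode("latin-1")  # é is stored as one byte: e9

print(is_valid_utf8(utf8_data))     # True
print(is_valid_utf8(latin1_data))   # False
print(is_valid_utf8(b"cafe"))       # True: pure ASCII is also valid UTF-8

# Reading the Latin-1 bytes as UTF-8 anyway shows the subtle damage:
# three characters look correct and only the last one is broken.
print(latin1_data.decode("utf-8", errors="replace"))
```

The last line prints "caf" followed by the Unicode replacement character, which is typical of the "mostly correct" text described above.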