Did you know that a translation memory (TM) file is just a text file with structure? Maybe you saw the file extension (*.tmx or *.xlf) and assumed it could only be opened with special software. In reality, all you need is a standard text editor (such as TextEdit on Mac or Notepad on Windows). You also don’t need a PhD or a computer science degree to read it or understand its structure. For the most part, a translation memory file is straightforward and functional; it is not a black box.
A translation memory file holds translation and linguistic data in a structured format. More specifically, it is typically an XML (Extensible Markup Language) file: still plain text, but with a well-defined structure that can represent complicated data.
The main info:
Additional data it might store:
The two most popular file types in the industry are XLIFF and TMX, both of which are XML files. However, translation memory can also be stored as spreadsheet files such as Excel (XLS) or even as plain comma-separated values (CSV) text files. Although XLS and CSV files tend to be smaller, the downside is that they store less data about each translation unit (typically only the segment and its language).
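To make that trade-off concrete, here is a minimal Python sketch of what a CSV-style translation memory might look like. The `units` list and the `units_to_csv` helper are made up for illustration (they are not part of any tool or standard); the point is simply that each row carries the segment text per language and nothing else.

```python
import csv
import io

# Hypothetical in-memory translation units: one dict per unit, keyed by language code.
units = [
    {"EN-US": "This is a pen.", "JA-JP": "これはペンです。"},
]

def units_to_csv(units, languages):
    """Write one column per language and one row per translation unit."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(languages)  # the header row is the only place the language is recorded
    for unit in units:
        writer.writerow([unit.get(lang, "") for lang in languages])
    return buffer.getvalue()

csv_text = units_to_csv(units, ["EN-US", "JA-JP"])
print(csv_text)
```

Notice there is no obvious place in this layout for per-unit details such as a creation date or the tool that produced the translation, which is exactly the kind of data the XML formats can hold.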
Why do XLIFF and TMX both use the XML format? There are a few advantages XML provides over a raw text file:
TMX and XLIFF are both industry-standard, XML-based file types. The two formats have a lot in common, including some inline markup elements; however, each has a slightly different structure and set of elements. The following are a few key differences between the two file types:
What's in a translation memory file? First is the header. The header contains metadata about the file and the localization process. Let's look at an example header for each file type. As you will see, the semantic naming of XML tags makes the files human readable - even without reading the actual specification you can probably understand most of the fields.
<?xml version="1.0"?>
<tmx version="1.4">
  <header
    creationtool="TRADOS Translator's Workbench for Windows"
    creationtoolversion="Edition 8 Build 863"
    segtype="sentence"
    adminlang="EN-US"
    srclang="EN-US"
    creationdate="20131117T140541Z"
  >
  </header>
</tmx>
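Because the header is ordinary XML, you can also read it with a few lines of code. The following is a rough sketch using Python's standard `xml.etree.ElementTree` module; the `read_tmx_header` function name is my own, and the sample string simply repeats the example header above.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# The example TMX header from this post, embedded so the sketch is self-contained.
TMX_SAMPLE = """<?xml version="1.0"?>
<tmx version="1.4">
<header
creationtool="TRADOS Translator's Workbench for Windows"
creationtoolversion="Edition 8 Build 863"
segtype="sentence"
adminlang="EN-US"
srclang="EN-US"
creationdate="20131117T140541Z"
>
</header>
</tmx>"""

def read_tmx_header(xml_text):
    """Return the attributes of the <header> element as a plain dict."""
    root = ET.fromstring(xml_text)
    header = root.find("header")
    if header is None:
        raise ValueError("no <header> element found")
    return dict(header.attrib)

header = read_tmx_header(TMX_SAMPLE)

# The creationdate value here uses the compact form YYYYMMDDThhmmssZ (UTC).
created = datetime.strptime(header["creationdate"], "%Y%m%dT%H%M%SZ")
created = created.replace(tzinfo=timezone.utc)

print(header["creationtool"])
print(header["srclang"], created.isoformat())
```

Every attribute comes back as a string, so anything like the date needs to be converted explicitly, as shown with `strptime`.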
In an XLIFF file the metadata attributes are stored in the <file> element.
<?xml version="1.0"?>
<xliff version="1.1">
  <file
    original="ABC Company Brochure.docx"
    source-language="EN-US"
    datatype="plaintext"
    target-language="JA-JP"
    tool="Transdraft"
    date="2013-09-04"
  >
  </file>
</xliff>
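The same approach works for the XLIFF metadata. Here is a small sketch, again with Python's standard library. One assumption to flag: the example above declares no XML namespace, but XLIFF files you get from real tools often do, so the hypothetical `local_name` helper below strips any namespace prefix before comparing tag names.

```python
import xml.etree.ElementTree as ET

# The example XLIFF header from this post, embedded so the sketch is self-contained.
XLIFF_SAMPLE = """<?xml version="1.0"?>
<xliff version="1.1">
<file
original="ABC Company Brochure.docx"
source-language="EN-US"
datatype="plaintext"
target-language="JA-JP"
tool="Transdraft"
date="2013-09-04"
>
</file>
</xliff>"""

def local_name(tag):
    """Turn '{namespace}file' into 'file'; tags without a namespace pass through unchanged."""
    return tag.rsplit("}", 1)[-1]

def read_file_metadata(xml_text):
    """Return one attribute dict per <file> element (an XLIFF document may contain several)."""
    root = ET.fromstring(xml_text)
    return [dict(el.attrib) for el in root.iter() if local_name(el.tag) == "file"]

files = read_file_metadata(XLIFF_SAMPLE)
for meta in files:
    print(meta["original"], meta["source-language"], "->", meta["target-language"])
```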
After the header comes the body. This is the section that contains the most important data - the translation units and segments. Let's look at an example body for both a TMX and XLIFF file.
<?xml version="1.0"?>
<tmx version="1.4">
  <body>
    <tu creationdate="20110510T103323Z">
      <tuv xml:lang="EN-US">
        <seg>This is a pen.</seg>
      </tuv>
      <tuv xml:lang="JA-JP">
        <seg>これはペンです。</seg>
      </tuv>
    </tu>
  </body>
</tmx>
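If you want to pull the translation units out of a TMX body yourself, a sketch like the following is enough for simple files. The `tmx_units` function is my own naming. The only non-obvious part is that Python's ElementTree exposes the `xml:lang` attribute under its full namespace name, which is what the `XML_LANG` constant holds.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the built-in "xml:" prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# The example TMX body from this post, embedded so the sketch is self-contained.
TMX_BODY = """<?xml version="1.0"?>
<tmx version="1.4">
<body>
<tu creationdate="20110510T103323Z">
<tuv xml:lang="EN-US">
<seg>This is a pen.</seg>
</tuv>
<tuv xml:lang="JA-JP">
<seg>これはペンです。</seg>
</tuv>
</tu>
</body>
</tmx>"""

def tmx_units(xml_text):
    """Return one {language: segment text} dict per <tu> element."""
    root = ET.fromstring(xml_text)
    units = []
    for tu in root.iter("tu"):
        variants = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested inside inline markup elements.
                variants[tuv.get(XML_LANG)] = "".join(seg.itertext())
        units.append(variants)
    return units

units = tmx_units(TMX_BODY)
for unit in units:
    print(unit)
```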
<?xml version="1.0"?>
<xliff version="1.1">
  <file>
    <body>
      <trans-unit id="1">
        <source xml:lang="EN-US">This is a pen.</source>
        <target xml:lang="JA-JP">これはペンです。</target>
      </trans-unit>
    </body>
  </file>
</xliff>
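Reading the XLIFF body is just as simple. This sketch walks every `<trans-unit>` and collects its id, source, and target. The function and helper names are hypothetical, and as a precaution the code ignores any XML namespace on the tags, since files exported from real tools often declare one even though this example does not.

```python
import xml.etree.ElementTree as ET

# The example XLIFF body from this post, embedded so the sketch is self-contained.
XLIFF_BODY = """<?xml version="1.0"?>
<xliff version="1.1">
<file>
<body>
<trans-unit id="1">
<source xml:lang="EN-US">This is a pen.</source>
<target xml:lang="JA-JP">これはペンです。</target>
</trans-unit>
</body>
</file>
</xliff>"""

def local_name(tag):
    """Turn '{namespace}source' into 'source'; plain tags pass through unchanged."""
    return tag.rsplit("}", 1)[-1]

def xliff_pairs(xml_text):
    """Return a list of (id, source text, target text) tuples, one per <trans-unit>."""
    root = ET.fromstring(xml_text)
    pairs = []
    for unit in root.iter():
        if local_name(unit.tag) != "trans-unit":
            continue
        source = target = None
        for child in unit:
            name = local_name(child.tag)
            if name == "source":
                source = "".join(child.itertext())
            elif name == "target":
                target = "".join(child.itertext())
        # target stays None for units that have not been translated yet.
        pairs.append((unit.get("id"), source, target))
    return pairs

pairs = xliff_pairs(XLIFF_BODY)
for unit_id, source, target in pairs:
    print(unit_id, source, "=>", target)
```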
Translation memory files are typically used by translators in their CAT (computer-assisted translation) tool or TEnT (translation environment tool) to help them translate more efficiently. Loading a translation memory file into such a tool allows a translator to leverage their prior work. If a segment in the current document has already been translated (or even partially translated), the tool will automatically alert the translator to the match (or partial match).
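To give a feel for how a match or partial match might be found, here is a toy sketch using Python's standard `difflib` module. This is only an illustration: real translation tools use their own matching algorithms and scoring, and the `best_match` function, the `memory` dict, and the 0.75 threshold are all assumptions of mine.

```python
from difflib import SequenceMatcher

# A tiny hypothetical memory: source segment -> stored translation.
memory = {
    "This is a pen.": "これはペンです。",
}

def best_match(segment, memory, threshold=0.75):
    """Return (stored source, stored target, score) for the closest entry, or None."""
    best = None
    best_score = 0.0
    for source, target in memory.items():
        # ratio() is 1.0 for identical strings and approaches 0.0 for unrelated ones.
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score > best_score:
            best, best_score = (source, target), score
    if best is None or best_score < threshold:
        return None
    return best[0], best[1], best_score

exact = best_match("This is a pen.", memory)
fuzzy = best_match("This is a pencil.", memory)
miss = best_match("Completely different text", memory)

print(exact)
print(fuzzy)
print(miss)
```

An exact match scores 1.0, a near match such as "This is a pencil." scores a bit lower but still clears the threshold, and an unrelated segment returns nothing, so the translator starts from scratch.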
Translation memory files will also help you maintain consistency as a translator. Over your career you will work on many different projects for many different clients. Some projects might require specific terminology or phrases. Utilizing "client-based" or "project-based" translation memory will allow you to ensure accuracy and stay consistent with every translation you work on.
Both TMX and XLIFF are powerful choices. They are both industry-standard file formats, and both are supported by the majority of translation software tools. Ultimately, whether you end up using a TMX or XLIFF file often depends on the project or tool you will be using, and sometimes a TM file will simply be provided to you for a particular job. Both formats can get the job done well, and using translation memory is far better than not using it, regardless of file format. Often you don't have to "choose" at all, as you can download your translation memory from the tool you are using in either format.
However, given the choice on a new translation project, I would prefer TMX for 2 main reasons:
On the other hand, if using the TM file to reconstruct or rebuild the original file is important to you, the XLIFF format is much more powerful in this regard.
Kevin Dias
TM-Town Developer