Unlocking the Black Box of Translation Memory Files

Did you know that a translation memory (TM) file is just a text file with structure? Maybe you saw the file extension (*.tmx or *.xliff) and assumed it could only be opened with special software. In reality, all you need is a standard text editor (such as TextEdit on Mac or Notepad on Windows). You also don’t need a PhD or a computer science degree to read the file or understand its structure. For the most part, a translation memory file is straightforward and functional. It is not a black box.

So what exactly is a translation memory file?

A translation memory file holds translation and linguistic data in a structured format. It’s just a text file. More specifically, it is typically an XML (Extensible Markup Language) file: still plain text, but with a well-defined structure that can represent complicated data.
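To make the "just a text file" point concrete, here is a minimal Python sketch. It writes a tiny TMX sample to a temporary file, reads it back once as ordinary text and once as XML. The file name `sample.tmx` and the sample content are assumptions made for illustration; only the standard library is used.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# A tiny, hypothetical TMX document. It is nothing more than text.
sample = """<?xml version="1.0"?>
<tmx version="1.4">
  <body>
    <tu>
      <tuv xml:lang="EN-US"><seg>This is a pen.</seg></tuv>
      <tuv xml:lang="JA-JP"><seg>これはペンです。</seg></tuv>
    </tu>
  </body>
</tmx>
"""

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sample.tmx")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(sample)

    # 1) Read it exactly like any other text file.
    with open(path, encoding="utf-8") as handle:
        as_text = handle.read()

    # 2) Read the same file as XML to get at the structure.
    root = ET.parse(path).getroot()

# Collect the text of every <seg> element, in document order.
segments = [seg.text for seg in root.iter("seg")]

print(as_text.splitlines()[1])
print(segments)
```

The first read is what a text editor does; the second is roughly what a translation tool does before it shows you segments.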

What type of information does it store?

The main info:

  • Segments (source and target)
  • Language
  • Creation dates and times

Additional data it might store:

  • Author
  • Usage count
  • Change dates and times
  • Creation tool
  • Domain (field)
  • Alternate translations
  • Notes

What are the typical formats of translation memory files?

The two most popular file types in the industry are XLIFF and TMX, both of which are XML files. However, translation memory can also be stored in spreadsheet files such as Excel (XLS) or even in plain comma-separated values (CSV) text files. Although XLS and CSV files tend to be smaller, the downside is that they store less data about each translation unit (typically only the segment and language).
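As a rough illustration of the CSV option, the sketch below stores translation units as CSV rows and reads them back with Python's standard `csv` module. The column names (`source_lang`, `source`, `target_lang`, `target`) are assumptions for this example, not part of any standard. Note that only the segments and languages fit naturally into the rows.

```python
import csv
import io

# Hypothetical column layout; CSV has no agreed-upon schema for TM data.
FIELDS = ["source_lang", "source", "target_lang", "target"]

units = [
    {"source_lang": "EN-US", "source": "This is a pen.",
     "target_lang": "JA-JP", "target": "これはペンです。"},
]

def units_to_csv(units):
    """Serialize translation units to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(units)
    return buffer.getvalue()

def csv_to_units(text):
    """Parse CSV text back into a list of dictionaries."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]

csv_text = units_to_csv(units)
print(csv_text)
```

Anything beyond these columns (author, notes, creation tool) would need extra columns that other tools have no reason to recognize.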

Why do XLIFF and TMX both use the XML format? There are a few advantages XML provides over a raw text file:

  • It is easy to parse because it has a well-defined structure.
  • The structure and tags of an XML file often help indicate what the data means (e.g. semantic tags such as <segment> and <trans-unit>).
  • There are many software tools built around the XML format to validate, import, parse, search, etc.
  • Different applications and systems can interact and exchange data because an XML file typically has a well-defined structure.
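One small example of the tooling mentioned above: Python's standard library can tell you whether a piece of XML is well-formed. This is only a sketch of a well-formedness check; it does not validate a file against the TMX or XLIFF specification, and the function name `is_well_formed` is made up for this example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

good = "<tu><tuv><seg>This is a pen.</seg></tuv></tu>"
bad = "<tu><tuv><seg>This is a pen.</tuv></tu>"  # <seg> is never closed

print(is_well_formed(good))
print(is_well_formed(bad))
```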

Let's break it down

What's in a translation memory file? First is the header. The header contains metadata about the file and the localization process. Let's look at an example header for each file type. As you will see, the semantic naming of XML tags makes the files human-readable - even without reading the actual specification you can probably understand most of the fields.

TMX Header

<?xml version="1.0"?>
<tmx version="1.4">
  <header
    creationtool="TRADOS Translator's Workbench for Windows"
    creationtoolversion="Edition 8 Build 863"
    segtype="sentence"
    adminlang="EN-US"
    srclang="EN-US"
    creationdate="20131117T140541Z"
  >
  </header>
</tmx>
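If you prefer code to a text editor, the header attributes can be read in a few lines. The sketch below parses the header shown above with Python's standard `xml.etree.ElementTree` and converts the `creationdate` value into a datetime. The helper name `parse_tmx_date` is an assumption for this example, and it only handles the compact UTC form used in the sample.

```python
from datetime import datetime, timezone
import xml.etree.ElementTree as ET

tmx_header = """<?xml version="1.0"?>
<tmx version="1.4">
  <header
    creationtool="TRADOS Translator's Workbench for Windows"
    creationtoolversion="Edition 8 Build 863"
    segtype="sentence"
    adminlang="EN-US"
    srclang="EN-US"
    creationdate="20131117T140541Z"
  >
  </header>
</tmx>"""

def parse_tmx_date(value):
    """Parse a timestamp such as 20131117T140541Z (the trailing Z means UTC)."""
    parsed = datetime.strptime(value, "%Y%m%dT%H%M%SZ")
    return parsed.replace(tzinfo=timezone.utc)

root = ET.fromstring(tmx_header)
attrs = dict(root.find("header").attrib)
created = parse_tmx_date(attrs["creationdate"])

for name, value in attrs.items():
    print(name, "=", value)
print("created:", created.isoformat())
```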

XLIFF Header

In an XLIFF file the metadata attributes are stored in the <file> element.

<?xml version="1.0"?>
<xliff version="1.1">
  <file
    original="ABC Company Brochure.docx"
    source-language="EN-US"
    datatype="plaintext"
    target-language="JA-JP"
    tool="Transdraft"
    date="2013-09-04"
  >
  </file>
</xliff>
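The same idea works for XLIFF, with the difference that the metadata sits on each <file> element. The sketch below is written against the simplified sample above, which declares no XML namespace; real XLIFF files usually declare one, in which case the plain tag name `file` would not match as written. The function name `file_metadata` is an assumption for this example.

```python
from datetime import date
import xml.etree.ElementTree as ET

xliff_text = """<?xml version="1.0"?>
<xliff version="1.1">
  <file
    original="ABC Company Brochure.docx"
    source-language="EN-US"
    datatype="plaintext"
    target-language="JA-JP"
    tool="Transdraft"
    date="2013-09-04"
  >
  </file>
</xliff>"""

def file_metadata(text):
    """Return one metadata dictionary per <file> element."""
    root = ET.fromstring(text)
    records = []
    for file_el in root.findall("file"):
        meta = dict(file_el.attrib)
        # The sample stores a plain YYYY-MM-DD date. A full date-time
        # value would need different parsing than date.fromisoformat.
        if "date" in meta:
            meta["date"] = date.fromisoformat(meta["date"])
        records.append(meta)
    return records

records = file_metadata(xliff_text)
print(records[0]["original"], records[0]["source-language"],
      "->", records[0]["target-language"])
```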

After the header comes the body. This is the section that contains the most important data - the translation units and segments. Let's look at an example body for both a TMX and XLIFF file.

TMX Body

<?xml version="1.0"?>
<tmx version="1.4">
  <body>
    <tu creationdate="20110510T103323Z">
      <tuv xml:lang="EN-US">
        <seg>This is a pen.</seg>
      </tuv>
      <tuv xml:lang="JA-JP">
        <seg>これはペンです。</seg>
      </tuv>
    </tu>
  </body>
</tmx>
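Here is a minimal sketch of pulling the translation units out of the TMX body above. Each <tu> becomes a dictionary that maps language code to segment text. The function name `read_units` is an assumption for this example, and the sketch assumes each <seg> holds plain text only; inline markup inside a segment would need extra handling.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

tmx_body = """<?xml version="1.0"?>
<tmx version="1.4">
  <body>
    <tu creationdate="20110510T103323Z">
      <tuv xml:lang="EN-US">
        <seg>This is a pen.</seg>
      </tuv>
      <tuv xml:lang="JA-JP">
        <seg>これはペンです。</seg>
      </tuv>
    </tu>
  </body>
</tmx>"""

def read_units(text):
    """Return a list of translation units with their timestamp and segments."""
    root = ET.fromstring(text)
    units = []
    for tu in root.iter("tu"):
        segments = {tuv.get(XML_LANG): tuv.findtext("seg")
                    for tuv in tu.findall("tuv")}
        units.append({"created": tu.get("creationdate"), "segments": segments})
    return units

units = read_units(tmx_body)
print(units)
```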

XLIFF Body

<?xml version="1.0"?>
<xliff version="1.1">
  <file>
    <body>
      <trans-unit id="1">
        <source xml:lang="EN-US">This is a pen.</source>
        <target xml:lang="JA-JP">これはペンです。</target>
      </trans-unit>
    </body>
  </file>
</xliff>
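A similar sketch for the XLIFF body above collects each <trans-unit> as a (source, target) pair keyed by its id. Because real XLIFF files usually declare an XML namespace while the sample here does not, the code compares local tag names only. The names `local_name` and `read_pairs` are assumptions for this example, and segments are assumed to contain plain text.

```python
import xml.etree.ElementTree as ET

xliff_body = """<?xml version="1.0"?>
<xliff version="1.1">
  <file>
    <body>
      <trans-unit id="1">
        <source xml:lang="EN-US">This is a pen.</source>
        <target xml:lang="JA-JP">これはペンです。</target>
      </trans-unit>
    </body>
  </file>
</xliff>"""

def local_name(tag):
    """Strip a leading {namespace} from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]

def read_pairs(text):
    """Map each trans-unit id to a (source, target) tuple."""
    root = ET.fromstring(text)
    pairs = {}
    for unit in root.iter():
        if local_name(unit.tag) != "trans-unit":
            continue
        texts = {local_name(child.tag): (child.text or "") for child in unit}
        pairs[unit.get("id")] = (texts.get("source", ""), texts.get("target", ""))
    return pairs

pairs = read_pairs(xliff_body)
print(pairs)
```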

Why are translation memory files so important?

Improves efficiency

Translators typically load translation memory files into their CAT (computer-assisted translation) tool or TEnT (translation environment tool) to translate more efficiently. Loading a translation memory file lets a translator leverage their prior work. If a segment in the current job has already been translated (or even partially translated), the tool automatically alerts the translator to the match (or partial match).
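To give a feel for how a match or partial match might be found, here is a rough sketch using `difflib.SequenceMatcher` from the Python standard library. This is not how any particular CAT tool scores matches; the character-based similarity ratio, the 0.75 threshold, and the function name `best_match` are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# A toy memory: source segment -> stored translation.
memory = {
    "This is a pen.": "これはペンです。",
    "This is a book.": "これは本です。",
}

def best_match(segment, memory, threshold=0.75):
    """Return (score, source, target) for the closest stored segment,
    or None when nothing reaches the threshold."""
    best = None
    for source, target in memory.items():
        score = SequenceMatcher(None, segment, source).ratio()
        if score >= threshold and (best is None or score > best[0]):
            best = (score, source, target)
    return best

exact = best_match("This is a pen.", memory)
fuzzy = best_match("This is a red pen.", memory)
nothing = best_match("Good morning.", memory)

print(exact)
print(fuzzy)
print(nothing)
```

An exact match scores 1.0, the "red pen" sentence comes back as a partial match against "This is a pen.", and the unrelated sentence returns nothing.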

Ensures consistency

Translation memory files will also help you maintain consistency as a translator. Over your career you will work on many different projects for many different clients. Some projects might require specific terminology or phrases. Utilizing "client-based" or "project-based" translation memory will allow you to ensure accuracy and stay consistent with every translation you work on.

Which format is better - TMX or XLIFF?

Both TMX and XLIFF are powerful choices. Both are industry-standard file formats, and both are supported by the majority of translation software tools. Ultimately, whether you end up using a TMX or XLIFF file often depends on the project or tool you will be using, and sometimes a TM file will be provided to you for a particular job. Both formats get the job done well, and using translation memory is far better than not using it, regardless of file format. Often you don't have to "choose" at all, as you can download your translation memory from the tool you are using in either format.

However, given the choice on a new translation project, I would prefer TMX for 2 main reasons:

  1. Translation units are (or at least can be) time stamped. Time-stamped translation units allow you to later do a productivity analysis on your work.
  2. Multiple target languages can be stored in one file.
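Both reasons can be seen in a short sketch. The sample below is invented for illustration: two time-stamped translation units, each with one source and two target languages. The code lists the languages in the file and computes a very crude productivity figure (source words divided by the time between the first and last timestamp). The function names and the choice of metric are assumptions, not an established method.

```python
from datetime import datetime
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical TMX data: timestamps and translations are made up.
tmx_text = """<?xml version="1.0"?>
<tmx version="1.4">
  <body>
    <tu creationdate="20110510T103000Z">
      <tuv xml:lang="EN-US"><seg>This is a pen.</seg></tuv>
      <tuv xml:lang="JA-JP"><seg>これはペンです。</seg></tuv>
      <tuv xml:lang="FR-FR"><seg>C'est un stylo.</seg></tuv>
    </tu>
    <tu creationdate="20110510T103600Z">
      <tuv xml:lang="EN-US"><seg>Thank you very much.</seg></tuv>
      <tuv xml:lang="JA-JP"><seg>どうもありがとうございます。</seg></tuv>
      <tuv xml:lang="FR-FR"><seg>Merci beaucoup.</seg></tuv>
    </tu>
  </body>
</tmx>"""

root = ET.fromstring(tmx_text)

def languages_in(root):
    """Return every language code used in the file, sorted."""
    return sorted({tuv.get(XML_LANG) for tuv in root.iter("tuv")})

def source_words_per_hour(root, source_lang):
    """Crude rate: source words / time between first and last timestamp."""
    stamps, words = [], 0
    for tu in root.iter("tu"):
        stamps.append(datetime.strptime(tu.get("creationdate"), "%Y%m%dT%H%M%SZ"))
        for tuv in tu.findall("tuv"):
            if tuv.get(XML_LANG) == source_lang:
                words += len((tuv.findtext("seg") or "").split())
    elapsed_seconds = (max(stamps) - min(stamps)).total_seconds()
    return words * 3600 / elapsed_seconds if elapsed_seconds else None

langs = languages_in(root)
rate = source_words_per_hour(root, "EN-US")
print(langs)
print(rate)
```

A real analysis would need to account for breaks and for the time spent on the first segment, which this sketch ignores.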

On the other hand, if using the TM file to reconstruct or rebuild the original file is important to you, the XLIFF format is much more powerful in this regard.

About the Author

Kevin Dias
TM-Town Developer

Comments (2)

Patricia Brenes
United States
Posted about 10 years ago.

Excellent blog post. May I repost this in my blog on terminology for my readers with due acknowledgement of the author, of course? Thanks

diasks2 Kevin Dias
Japan
Posted about 10 years ago.

@inmyownterms - Thanks for the kind words. Of course, I'd be honored :thumbsup: