TM-Town's translation enablement platform is designed to help translators get the most out of their linguistic assets. TM-Town's system can help you unlock the value of your prior translations in several ways: finding potential new clients, leveraging your Translation Memory (TM), and extracting valuable analytical business data from your linguistic assets.
In future posts I will go into all of these areas in detail; however, today I will explore one innovative feature of TM-Town that will help you quickly and easily get your previous translations into a form where you can benefit from their value. This feature is TM-Town's free alignment tool.
In interviews with countless professional translators, one of the most pressing issues you raised was being unable to access your previous work. Sometimes your documents would be in PDF form, other times in Word. Many of you said there were tons of Word documents and PDFs scattered throughout your hard drives, but trying to remember which document had that phrase you need for the current translation... well, that can be time-consuming to track down. Frustration sets in because you know that these are valuable assets that you could be using to help you in present or future translation jobs.
Translation Memory and Computer Assisted Translation (CAT) tools were created to take care of this problem. These tools perform many functions and one is to allow you to leverage your previous translations so if a similar or identical sentence comes up again, you don’t have to waste your time translating the same thing again. The more your translation memory (TM) grows, the more linguistic assets you have to leverage and the better chance of getting a match. This means that over time your previous work will help you to work more and more efficiently.
However, many translators' previous translations are not in TM file formats (such as TMX or XLIFF). Your files are scattered throughout your hard drive as Word files, PDFs, text files, etc., and you are probably too busy to figure out how to get these documents into a TM file format so that you can use a CAT tool. This is where alignment software comes in.
Alignment software segments a source document and its translation (target document) and then matches the corresponding segments together into translation units. It then creates industry standard TM files such as TMX or XLIFF.
All you do is upload the source document and target document and the alignment tool will automatically create a new TM file for you.
Sometimes it helps to see things visually. Let's go through an example to better understand the alignment process.
Imagine that you are a Spanish to English translator. Your client gave you a document that needs to be translated. This is called the source document. To keep this example simple, let's pretend this is the document you were given to translate:
Bienvenido a Miami. Miami es una ciudad estadounidense ubicada en la parte sureste de Florida alrededor del río Miami, entre los Everglades y el océano Atlántico.
You work hard to translate the document into English. The translated document is called the target document.
Welcome to Miami. Miami is a US city located in the southeastern part of Florida around the Miami River, between the Everglades and the Atlantic Ocean.
The first step in the alignment process is to segment each document (both the source and target). Segmentation is the process of breaking the text into segments (typically a segment is roughly equivalent to a sentence).
Segment #1: Bienvenido a Miami.
Segment #2: Miami es una ciudad estadounidense ubicada en la parte sureste de Florida alrededor del río Miami, entre los Everglades y el océano Atlántico.
Segment #1: Welcome to Miami.
Segment #2: Miami is a US city located in the southeastern part of Florida around the Miami River, between the Everglades and the Atlantic Ocean.
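To make the segmentation step concrete, here is a minimal sketch in Python. It is not TM-Town's segmentation engine; it assumes a deliberately naive rule (split after sentence-ending punctuation when the next word starts with a capital letter), and the function name `segment` is made up for this illustration. Real segmenters also have to deal with abbreviations, numbers, quotations and much more.

```python
import re

# Naive boundary rule (an assumption for this sketch): a segment ends at
# ".", "!" or "?" when followed by whitespace and an uppercase letter or an
# opening Spanish punctuation mark. Abbreviations such as "Dr." would break it.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-ZÁÉÍÓÚÑ¿¡])")


def segment(text):
    """Split text into sentence-like segments using the naive rule above."""
    return [part.strip() for part in _BOUNDARY.split(text) if part.strip()]


source_document = (
    "Bienvenido a Miami. Miami es una ciudad estadounidense ubicada en la "
    "parte sureste de Florida alrededor del río Miami, entre los Everglades "
    "y el océano Atlántico."
)
target_document = (
    "Welcome to Miami. Miami is a US city located in the southeastern part "
    "of Florida around the Miami River, between the Everglades and the "
    "Atlantic Ocean."
)

source_segments = segment(source_document)
target_segments = segment(target_document)

for number, text in enumerate(source_segments, start=1):
    print(f"Source segment #{number}: {text}")
for number, text in enumerate(target_segments, start=1):
    print(f"Target segment #{number}: {text}")
```

Running this on the two example documents gives two segments for each, matching the lists above.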
After each document has been segmented, the next step in the alignment process is to match each segment from the source document to its corresponding segment in the target document. In this example it is easy: segment #1 from the source document matches segment #1 from the target document, and segment #2 from the source document matches segment #2 from the target document.
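For a simple case like this one, the matching step can be sketched as pairing segments by their position. This is only an illustration under the assumption that both documents have the same number of segments in the same order; the helper name `align_by_position` is hypothetical, and real alignment tools use more robust methods because translators often split or merge sentences.

```python
# Segments from the example, listed in document order.
source_segments = [
    "Bienvenido a Miami.",
    "Miami es una ciudad estadounidense ubicada en la parte sureste de "
    "Florida alrededor del río Miami, entre los Everglades y el océano "
    "Atlántico.",
]
target_segments = [
    "Welcome to Miami.",
    "Miami is a US city located in the southeastern part of Florida around "
    "the Miami River, between the Everglades and the Atlantic Ocean.",
]


def align_by_position(source_segments, target_segments):
    """Pair the n-th source segment with the n-th target segment.

    Assumes a strict one-to-one correspondence; raises ValueError otherwise.
    """
    if len(source_segments) != len(target_segments):
        raise ValueError("segment counts differ; 1:1 alignment is not possible")
    return list(zip(source_segments, target_segments))


translation_units = align_by_position(source_segments, target_segments)

for number, (source, target) in enumerate(translation_units, start=1):
    print(f"Translation unit #{number}")
    print(f"  source: {source}")
    print(f"  target: {target}")
```

Each resulting pair is one translation unit.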
After all of the segments have been successfully matched, the final step is to create a Translation Memory file. A Translation Memory file stores the aligned document in a special format so that when the file is read it is obvious which segments match together.
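As a rough idea of what such a file looks like, the sketch below writes the two translation units from the example as a small TMX document using Python's standard library. It is not TM-Town's exporter: the function name `build_tmx` and header values such as `example-aligner` are placeholders chosen for this illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree writes this namespaced attribute as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(translation_units, source_lang, target_lang):
    """Return a TMX 1.4 document (as a string) for the given segment pairs."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "example-aligner",   # placeholder tool name
        "creationtoolversion": "0.1",        # placeholder version
        "segtype": "sentence",
        "o-tmf": "plain-text",
        "adminlang": "en",
        "srclang": source_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in translation_units:
        # One <tu> per translation unit, with one <tuv> per language.
        unit = ET.SubElement(body, "tu")
        for lang, text in ((source_lang, source), (target_lang, target)):
            variant = ET.SubElement(unit, "tuv", {XML_LANG: lang})
            ET.SubElement(variant, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


translation_units = [
    ("Bienvenido a Miami.", "Welcome to Miami."),
    (
        "Miami es una ciudad estadounidense ubicada en la parte sureste de "
        "Florida alrededor del río Miami, entre los Everglades y el océano "
        "Atlántico.",
        "Miami is a US city located in the southeastern part of Florida "
        "around the Miami River, between the Everglades and the Atlantic "
        "Ocean.",
    ),
]

tmx_text = build_tmx(translation_units, "es", "en")
print(tmx_text)
```

Because each `<tu>` element holds the source segment and its translation side by side, a CAT tool reading the file knows exactly which segments belong together.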
TM-Town offers an alignment tool that is not only free but is far superior to most other alignment software. There are some open source alignment tools on the web, but none of them are particularly user-friendly, to say the least. There is no need for an IT certificate to use TM-Town's alignment tool. It is simple: you upload the source document, upload the target document, and TM-Town's system does the rest, creating a new aligned file that you can download in several formats (.tmx, .xliff, .xls, .csv).
If you have never used an alignment tool, try out TM-Town and see how easy it is. With TM-Town your previous work is just a click away, which will save you time and effort. TM-Town has some other fantastic free features, which I will get into in upcoming posts. In the meantime, please send us your feedback or comment below. I enjoy getting to know the community, and it helps me to improve TM-Town.
Kevin Dias
TM-Town Developer
Michael J.W. Beijer United Kingdom |
Posted over 9 years ago. Hi Kevin, I was wondering what alignment engine you are using in the background? For example, are you using Hunalign, or some other open source engine? |
Kevin Dias Japan |
Posted over 9 years ago. Hi Michael, Thanks for your question. The current alignment engine is based on the [Gale-Church](https://en.wikipedia.org/wiki/Gale%E2%80%93Church_alignment_algorithm) algorithm, so somewhat similar to Hunalign. Actually I have developed my own alignment method which I talk about at the end of this [presentation](http://www.slideshare.net/diasks2/exploring-natural-language-processing-in-ruby). It is based on 3 main heuristics: 1. Machine translate A -> B and B -> A 2. Relative sentence length 3. The segment's order/position in the document In my tests it is much more accurate. The reason I don't use it for TM-Town is that it would require me sending data to a 3rd party (such as Microsoft Translate) to get the machine translation results which I can't/won't do (as any documents translators uploaded to TM-Town are strictly private). I plan to open source it at some point, just haven't gotten around to it, too many other things to focus on with TM-Town. Regardless of the alignment method though, I have found that the #1 reason for misalignment is poor segmentation. Therefore I have spent a lot of time on TM-Town's segmentation engine (which is [open source](https://github.com/diasks2/pragmatic_segmenter)). If you are interested in alignment (or segmentation) I would check out TM-Town's [Natural Language Processing](https://www.tm-town.com/natural-language-processing) page, it has links to research papers in those areas. |
Hans van den Broek Indonesia |
Posted about 9 years ago. Excellent! I tried two well-known other alignment tools and one CAT tool, and they failed where TM-T produced a decent result (had to delete only 4 segments, or 2%). |
Kevin Dias Japan |
Posted about 9 years ago. Thanks for the comment Hans. Glad to hear you had success with the alignment tool. |
Maria Pia Montoro Luxembourg |
Posted almost 6 years ago. Hey, where is the alignment tool? |
Learn Quran Center United States |
Posted over 5 years ago. Good for All |