The Past and Present of Translation Memory Technology

The idea of a translation memory (TM) was considered as early as the 1970s and developed further in the 1980s. However, it wasn’t until the 1990s that the breakthrough in translation memories came about for SDL, with Translator’s Workbench for Windows. This was the first truly widely used TM engine, first in 16-bit from 1995, then in 32-bit from 1998. Earlier generations were DOS-based and served communities that were too small, although the DOS versions had some success in the early 1990s.

Why did we have this breakthrough?

Machine translation was evolving at this time, but its quality was considered too poor. Windows PCs were also becoming mainstream in organizations and private homes, so both freelance translators and translators in organizations started adopting more technology to help them cope with the rise of digital content. Additionally, having solutions dedicated to audiences with specific needs, such as the Freelance Edition, was seen as a big positive.

You could say that translation memories are both the heart and the brain of the CAT tool. However, this technology was initially received with some scepticism. Fast forward to the present day and it’s hard to imagine life without translation memory. Since the 1990s, SDL has continued to develop its translation memory technology, always looking to improve it for our customers.

The evolution of the translation memory

When SDL acquired Trados in 2005, the translation memory was redesigned from the ground up for SDL Trados Studio and GroupShare. One of our key goals was to plug the gaps customers had reported with the Workbench TM engine over the years. These included concordance search in the target language, the concepts of context match and structure match, and a fully XML standards-based engine.

Extending translation memory capabilities

SDL’s translation memories are extremely versatile and have evolved over the years to offer more productivity-focused features. Let’s use AutoSuggest Dictionaries as an example.

These are created from your translation memory content and provide you with phrases or fragments via AutoSuggest during the translation process itself. We then have Concordance Search, which searches a translation memory for words or chunks of text that do not appear as matches from a termbase or other sources.
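To make the idea of a concordance search more concrete, here is a minimal Python sketch. It is only an illustration under simple assumptions and not how Studio implements the feature: the `TranslationUnit` class, the `concordance_search` function and the sample TM are all hypothetical, and a real engine would use an index rather than a linear scan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranslationUnit:
    """Hypothetical stand-in for one source/target pair stored in a TM."""
    source: str
    target: str


def concordance_search(tm, query, in_target=False):
    """Return every TU whose source (or target) text contains the query.

    The comparison is case-insensitive; this is a plain substring scan,
    chosen for readability rather than speed.
    """
    needle = query.casefold()
    hits = []
    for tu in tm:
        haystack = tu.target if in_target else tu.source
        if needle in haystack.casefold():
            hits.append(tu)
    return hits


tm = [
    TranslationUnit("Open the translation memory.", "Öffnen Sie das Translation Memory."),
    TranslationUnit("Save the file.", "Speichern Sie die Datei."),
    TranslationUnit("Close the translation memory.", "Schließen Sie das Translation Memory."),
]

source_hits = concordance_search(tm, "translation memory")
target_hits = concordance_search(tm, "datei", in_target=True)
```

Here `source_hits` contains the two units that mention "translation memory" in the source, while `target_hits` finds the unit whose German target contains "Datei".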

AutoSuggest Dictionaries and Concordance Search should be fairly well known to everyday CAT tool users, but as SDL TMs have evolved they have gained some more intricate features that are also very useful and that you might not be aware of.

As well as sentence-based segmentation, SDL translation memories support paragraph-based segmentation. This can be useful when translating from or into Asian languages, where the sequence of thought can differ from Western languages and it is often better to translate whole paragraphs rather than individual segments. Interestingly, paragraph-based segmentation could have a bit of a comeback with Neural Machine Translation (NMT), as it could ensure that translators see the entire context of a paragraph rather than translating segment by segment.

We also have the ability to provide context in a TM through the use of Document Structure, which is unique to SDL. What this means is that we don’t just differentiate Context Matches; we can also use the structural context found in the document (index marker, heading, list item, etc.). Often, it can be necessary to translate segments differently depending on their structural context. For instance, an index entry will be written in lower case in English, whereas the same segment would need upper case in a heading.
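As a rough illustration of why structural context matters, the following Python sketch stores translations keyed by both the source text and a structure label. Everything here is an assumption made for the example: the `lookup` function, the labels `"heading"` and `"index"`, and the match names are hypothetical and do not reflect Studio's internal data model.

```python
def lookup(tm, source, structure):
    """Prefer a translation stored for the same structural context.

    `tm` maps (source text, structure label) to a target text.
    Returns a (target, match kind) pair.
    """
    exact = tm.get((source, structure))
    if exact is not None:
        return exact, "structure match"
    # Fall back to the first stored translation of the same source text,
    # regardless of where in the document it was originally used.
    for (stored_source, _stored_structure), target in tm.items():
        if stored_source == source:
            return target, "text-only match"
    return None, "no match"


tm = {
    ("Übersetzungsspeicher", "heading"): "Translation Memory",
    ("Übersetzungsspeicher", "index"): "translation memory",
}

index_hit = lookup(tm, "Übersetzungsspeicher", "index")
list_hit = lookup(tm, "Übersetzungsspeicher", "list item")
```

The same source term yields the lower-case translation when it appears as an index entry, and only a weaker text-only match when the structure has not been seen before.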

Flexibility

The flexibility of SDL translation memories can really be seen in our industry-unique App Store. SDL Trados Studio itself offers various ways for you to manage and maintain your TMs, but you can benefit from more advanced ways of working with various apps. For example, you can extract source text, target text, or both, in different file formats, with apps such as the ones listed below:

  1. SDLXliff2Tmx
  2. SDL TmReverse Langs
  3. SDL TmExport
  4. SDL TmConvert

With the increasing focus on data and data protection, we can even offer the ability to anonymize data in your TM, with the SDL Data Protection Suite app available to download from the SDL App Store.

Scalability

At SDL we have always been passionate about what we call “scale up and down”. This means that for us it’s key to have a translation memory that not only scales up to hundreds of users at the same time, but also scales down to the individual user working locally on a PC that might not even be connected to the Internet, and to any scenario in between.

In all cases, the experience and performance must be as good as possible. This requires a design approach with different storage mechanisms and different ways of working in the software. We refer to these as the ‘file-based’ way of working in a local desktop environment and the ‘server-based’ way of working, where several users share the same resource at the same time.

SDL file-based TMs are ideal and very efficient for individual users or very small teams of up to three people. Beyond that, a server-based product is available for optimal efficiency.

SDL server-based TMs can serve hundreds of users (Studio and GroupShare) and ensure more consistent translations by providing controlled, time-limited access to centralized translation memories. Sharing assets in real time during translation increases the rate of content reuse in a way that is not possible in a desktop-only environment.

By offering both file-based translation memories with extended productivity functionality and server-based sharing, we ground TM collaboration in the different ways our customers work, supporting the freelance translator as well as LSPs and corporations that deal with large volumes of translation projects under ever-tighter turnaround times.

The emergence of upLIFT translation memory technology

After many years of continuous evolution of the TM, the launch of SDL Trados Studio 2017 marked a real milestone for SDL with the introduction of upLIFT technology, turning the ‘workhorse’ of a CAT tool into something even more intelligent.

Earlier in this blog, we talked about AutoSuggest Dictionaries and Concordance Search as great productivity extensions of the TM. One downside, however, was the manual interaction needed to set them up and work with them. This all changed with upLIFT technology, or ‘Fragment Recall’.

The underlying technology of Fragment Recall is a process called fine-grained alignment. Since a TM contains pairs of aligned segments – that is, translation memory units (TUs) – operations at segment level are straightforward, such as fuzzy matching a segment and retrieving the stored translation proposal. Operations below segment level are more challenging, such as matching just part of a TU segment (e.g. a phrase or term within a sentence) and retrieving the corresponding part of the translation. This all changed in Studio 2017 as Fragment Recall made it possible to see these Whole TU fragments automatically without the user having to do anything.
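The difference between segment-level and sub-segment operations can be sketched in a few lines of Python. This is a simplified illustration, not the upLIFT algorithm: fuzzy scoring here uses the standard library's `difflib.SequenceMatcher` over word tokens, and the fine-grained alignment is assumed to have already produced a list of aligned fragment pairs. The function names, the threshold and the sample data are all hypothetical.

```python
import re
from difflib import SequenceMatcher


def tokenize(text):
    """Lower-case word tokens; punctuation is ignored."""
    return re.findall(r"\w+", text.lower())


def fuzzy_match(tm, segment, threshold=0.7):
    """Segment level: best (score, source, target) at or above the threshold."""
    best = None
    for source, target in tm:
        score = SequenceMatcher(None, tokenize(segment), tokenize(source)).ratio()
        if score >= threshold and (best is None or score > best[0]):
            best = (score, source, target)
    return best


def recall_fragments(aligned_fragments, segment):
    """Sub-segment level: aligned pairs whose source side occurs in the segment.

    `aligned_fragments` stands in for the output of fine-grained alignment,
    which is assumed here rather than computed.
    """
    words = tokenize(segment)
    hits = []
    for src_frag, tgt_frag in aligned_fragments:
        frag = tokenize(src_frag)
        n = len(frag)
        if any(words[i:i + n] == frag for i in range(len(words) - n + 1)):
            hits.append((src_frag, tgt_frag))
    return hits


tm = [("Click the Save button.", "Klicken Sie auf die Schaltfläche Speichern.")]
aligned = [
    ("the Save button", "die Schaltfläche Speichern"),
    ("the printer", "den Drucker"),
]

match = fuzzy_match(tm, "Click the Save button to continue.")
fragments = recall_fragments(aligned, "Right-click the Save button twice.")
```

The first call finds a fuzzy match for the whole segment, while the second recovers only the translation of "the Save button" from a sentence that would otherwise score too low for a segment-level match.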

Since that launch in 2016, Fragment Recall has been refined and improved. Icon tips now show you where a fragment match originated, and you can also reject fuzzy matches that Studio has automatically repaired as part of the Fuzzy Match Repair functionality.

And the improvements have not stopped there. A new feature called LookAhead, introduced with Service Release 1 of Studio 2017, provides faster access to translation memory (TM) search results by retrieving them in the background. When you move to a segment that you are translating, Studio performs a look-up on the following two segments while you’re working on the active segment. The benefit? Almost instantaneous results every time you change segments, as the search results (if any) will already have been “retrieved” for you.
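The general prefetching idea behind a feature like LookAhead can be sketched in Python with a small thread pool. This is an assumption-laden illustration rather than Studio's implementation: the `LookAheadCache` class, its `activate` method and the `search` callable are hypothetical names, and only the look-ahead depth of two segments comes from the description above.

```python
from concurrent.futures import ThreadPoolExecutor


class LookAheadCache:
    """Prefetch TM results for the segments that follow the active one."""

    def __init__(self, search, segments, depth=2):
        self._search = search      # hypothetical TM search callable
        self._segments = segments
        self._depth = depth        # how many following segments to prefetch
        self._pool = ThreadPoolExecutor(max_workers=depth)
        self._pending = {}         # segment index -> Future

    def _schedule(self, index):
        # Start a background look-up unless one already exists for this segment.
        if 0 <= index < len(self._segments) and index not in self._pending:
            self._pending[index] = self._pool.submit(self._search, self._segments[index])

    def activate(self, index):
        """Return results for the active segment and prefetch the next ones."""
        self._schedule(index)
        for ahead in range(1, self._depth + 1):
            self._schedule(index + ahead)
        return self._pending[index].result()

    def close(self):
        # Wait for outstanding look-ups and release the worker threads.
        self._pool.shutdown()


calls = []


def search(segment):
    calls.append(segment)
    return f"results for {segment!r}"


segments = ["One.", "Two.", "Three.", "Four."]
cache = LookAheadCache(search, segments)
first = cache.activate(0)
second = cache.activate(1)
cache.close()
```

When the translator moves to the second segment, its look-up was already started while the first segment was active, so each segment is searched only once and the result is usually ready on arrival.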

Making it easier to add new content

Of course, managing and working with your translation memories is important, but getting content into them is just as important.

Whether you are new to CAT tools or not, translation alignment is an efficient way to create translation assets straight away by making use of existing content to create translation memories. In Service Release 1 for SDL Trados Studio 2019, we have made the process of aligning content much more versatile and easier to use by adding new alignment selection and connection capabilities, as well as advanced split and search functionalities.

Improving translation memory functionality even further

We have improved the accuracy of both context and fuzzy matches to provide more matches than ever before. Not only have we improved the way context matches are calculated to achieve higher-accuracy matches, but we have also enhanced stemming for Western languages, providing better fuzzy matching.

In addition, we have improved the recognition of half-width and full-width characters for Japanese, which are typical in DTP situations. We believe this is a real step forward for this market.
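To show what the half-width/full-width problem looks like in practice, here is a small Python sketch that compares strings after Unicode NFKC normalization, using the standard `unicodedata` module. This is one common way to treat width variants as equal and is offered only as an illustration; it is not a description of how Studio handles these characters, and the helper names are hypothetical. Note that NFKC also folds other compatibility characters, which may be more aggressive than a TM engine would want.

```python
import unicodedata


def normalize_width(text):
    """Fold full-width Latin/digits and half-width katakana to standard forms."""
    return unicodedata.normalize("NFKC", text)


def same_ignoring_width(a, b):
    """True if two strings differ only in character width variants."""
    return normalize_width(a) == normalize_width(b)


full_width_latin = normalize_width("ＴＭ２０１９")
katakana_equal = same_ignoring_width("ｶﾀｶﾅ", "カタカナ")
```

After normalization, the full-width "ＴＭ２０１９" becomes plain "TM2019", and the half-width and full-width spellings of "katakana" compare as equal, so a segment typed either way can find the same TM entry.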

This latest Service Release shows that the refinement of the TM has not stopped and that it is still possible to enhance it even further. It’s great to see how big innovations – such as AutoSuggest, upLIFT Fragment Recall and Fuzzy Match Repair – have been joined by smaller developments – such as improved stemming and fuzzy matching in SDL Trados Studio 2019 – making it possible to dramatically increase leverage from TMs.

As you can see, the translation memory has come a long way over the years. Translation memories continue to be developed with new innovations and features, making it easier than ever before to use and manage your TM.

This year Trados celebrates its 35th birthday and it has got us thinking about what the future holds for the TM. In the second part of this blog, Daniel Brockmann, Product Director for SDL Trados, and Kevin Flanagan, creator of upLIFT translation memory technology, discuss the role TMs will play in the future, including topics such as AI, NMT and the cloud.