Custom Machine Translation Engines


Frequently Asked Questions

What are SDL Language Cloud Custom Machine Translation Engines?

SDL Language Cloud Custom MT Engines are offered through SDL Language Cloud Translation Toolkit. At the core of SDL’s machine translation technology is SDL XMT.

The SDL XMT baseline engines are the starting point for the Custom MT Engines training capability in SDL Language Cloud. This capability lets you customize your own language pairs to suit a particular project, client or industry vertical. Every language pair is supplied in a standard (baseline) state created with SDL XMT training technology, and XMT’s modular approach allows the training process to be optimized for each language. Using the SDL Language Cloud training capability, you can train the MT engine on relevant content from your client or from your own translation assets, so that it returns results that require less post-editing.

What is SDL XMT?

Most MT engines apply a one-size-fits-all approach to every supported language pair. Over the years, SDL has learned that language quirks can be handled in many different ways and that different algorithms work well for different language pairs. We have used that experience to develop SDL XMT, a modular and flexible technology designed to handle the challenges that language poses. SDL XMT moves beyond the monolithic phrase-based design for machine translation: it is a statistical system whose algorithms can be mixed and matched according to the source and target languages, giving the best possible output quality compared with legacy systems. SDL XMT also enables machine translation to be applied in new domains, such as social media, through modules built to handle that type of content.

How do I train a Custom MT Engine?

It’s easy to get started: simply upload your TMX files to our powerful training environment in the ‘Custom Engines’ tab of the Machine Translation area of your SDL Language Cloud account. Drawing on years of SDL MT experience, your files are automatically cleaned and prepared to optimize engine training for the best results. You need a minimum of 90,000 source words to train an engine, and multiple TMX files can be uploaded to train a single engine. Each TMX file is limited to 250 MB; however, you can upload a .zip file to combine multiple TMX files in one upload.
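Before uploading, you may want to check your files against these limits locally. The sketch below is a hypothetical helper, not part of SDL Language Cloud: it assumes TMX 1.4 files whose source-language `<tuv>` elements carry an `xml:lang` code that exactly matches the one you pass in, approximates the word count by splitting segment text on whitespace (SDL's own count may differ), and reads "250 MB" as 250 × 1024 × 1024 bytes, which is also an assumption. The names `count_source_words` and `check_and_bundle` are illustrative.

```python
import os
import xml.etree.ElementTree as ET
import zipfile

# Limits quoted in the FAQ. Reading "250 MB" as 250 * 2**20 bytes is an assumption.
MIN_SOURCE_WORDS = 90_000
MAX_TMX_BYTES = 250 * 1024 * 1024

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def count_source_words(tmx_path, src_lang):
    """Approximate the source word count of one TMX file by whitespace-splitting
    the <seg> text of every <tuv> whose language code equals src_lang."""
    words = 0
    # iterparse streams the file instead of loading the whole tree into memory.
    for _event, elem in ET.iterparse(tmx_path, events=("end",)):
        if elem.tag != "tuv":
            continue
        # TMX 1.4 uses xml:lang; older files may use a plain lang attribute.
        lang = elem.get(XML_LANG) or elem.get("lang", "")
        if lang.lower() == src_lang.lower():
            seg = elem.find("seg")
            if seg is not None:
                # itertext() also picks up text that follows inline tags such as <ph/>.
                words += len("".join(seg.itertext()).split())
        elem.clear()  # release the processed <tuv> element
    return words


def check_and_bundle(tmx_paths, src_lang, zip_path, min_words=MIN_SOURCE_WORDS):
    """Check each file against the size limit and all files together against the
    word minimum, then combine the files into a single .zip for upload."""
    total = 0
    for path in tmx_paths:
        if os.path.getsize(path) > MAX_TMX_BYTES:
            raise ValueError(f"{path} is larger than the per-file limit")
        total += count_source_words(path, src_lang)
    if total < min_words:
        raise ValueError(f"{total} source words found; at least {min_words} required")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in tmx_paths:
            zf.write(path, arcname=os.path.basename(path))
    return total
```

For example, `check_and_bundle(["travel1.tmx", "travel2.tmx"], "en-US", "upload.zip")` (hypothetical file names) returns the combined source word count and writes `upload.zip`, or raises `ValueError` if a limit is not met. Streaming with `iterparse` keeps memory use modest for files close to the size limit.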

What is the training workflow?

The training of custom MT engines is composed of four phases:

  1. Offline data collection: Parallel data must be collected from any available sources (typically Translation Memories) and evaluated to make sure it is suitable for the training project (i.e. it is in the same domain, such as Travel, and of the same content type, such as holiday package brochures).

    There is no need to prepare or clean the content: this is done automatically, using settings based on years of SDL’s machine translation experience, to yield optimal results for each engine. At this stage, you need to decide whether to withhold some parallel data for use as test data (used to tune the engine during training). If none is withheld, the MT training capability sets aside test data automatically.

  2. Engine training: Once the training data has been collected, select ‘Train a new engine’ in the ‘Custom Engines’ tab. Upload one or more TMX files and follow the easy-to-use wizard. Upload evaluation data and example data (UTF-8 encoded) and start the training. Training requires a lot of computational power, so depending on the current load on the computational grid, the training may be queued and run later. A training run can take many hours; status updates are shown in the ‘Custom Engines’ user interface, and you are sent an email when the training is complete.

  3. Engine evaluation: The engine is automatically evaluated using the uploaded evaluation data or, if no evaluation data is uploaded, 1,000 lines selected at random from the training data. The evaluation is used to calculate a BLEU score, which measures how similar the machine translations are to human translations of the same sentences. To test a trained engine, you must first deploy it in your SDL Language Cloud Translation Toolkit account.

    Once deployed, the engine can be accessed via SDL Language Cloud Translation Toolkit and through the API (and therefore through tools such as SDL Trados Studio and Microsoft Office). The results of engine training can be packaged into a .zip file, in CSV, XLIFF or TMX format, containing the source text, the translations produced by the trained engine and the translations produced by the baseline engine. This .zip file can then be sent to translators for evaluation.

  4. Engine activation: To start using the trained engine for production work, it needs to be “activated”. This can be done with a couple of clicks in the user interface.
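Steps 1 and 3 involve two operations that are easy to reproduce locally when reviewing results: holding out randomly chosen segments as evaluation data, and computing BLEU. The sketch below implements the standard corpus-level BLEU definition (clipped n-gram precisions up to 4-grams, geometric mean, brevity penalty) with one reference per sentence and plain whitespace tokenization. It is an illustration under those assumptions, not SDL's scorer: the FAQ does not say how SDL tokenizes text or on which scale it reports BLEU, so scores from this code will not necessarily match the platform's. The function names are hypothetical.

```python
import math
import random
from collections import Counter


def select_evaluation_set(pairs, n=1000, seed=0):
    """Split (source, reference) pairs into training and evaluation sets by holding
    out n randomly chosen pairs. The fixed seed only makes the split repeatable."""
    rng = random.Random(seed)
    held_out = set(rng.sample(range(len(pairs)), min(n, len(pairs))))
    train = [p for i, p in enumerate(pairs) if i not in held_out]
    evaluation = [p for i, p in enumerate(pairs) if i in held_out]
    return train, evaluation


def ngram_counts(tokens, n):
    """Count every n-gram (as a tuple of tokens) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus BLEU on a 0-100 scale: one reference per hypothesis, whitespace
    tokens, uniform n-gram weights, brevity penalty, no smoothing."""
    matches = [0] * max_n  # clipped n-gram matches, per order
    totals = [0] * max_n   # hypothesis n-grams, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngram_counts(h, n), ngram_counts(r, n)
            # Clipping: each hypothesis n-gram earns credit at most as many
            # times as it occurs in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # the geometric mean is zero if any precision is zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Penalize output that is shorter overall than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)
```

Note that the evaluation package described in step 3 contains no human references. If you keep the human translations of the segments you held out, scoring both the trained-engine and the baseline output against them gives a rough side-by-side comparison. Without smoothing, output that shares no 4-gram with its references scores 0, which is one reason BLEU is normally reported over a whole test set rather than per sentence.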

Which languages can be trained?

Training a Custom MT Engine can be done using any supported combination of languages. Note that if there is no corresponding SDL baseline engine, training starts from scratch rather than building incrementally on an existing baseline. Find a full list of the baseline engines here.

How long does it take to train an engine?

Once you have uploaded your TMX file, it is sent via a server API to our MT team. The time taken to train an engine depends first on the queuing time, which is determined by how many training requests have been submitted, and second on the language pair and the size of the uploaded training data. For example, training an engine on a 200 MB TMX upload containing approximately 2.5 million words takes around 4 hours. Please note that training is CPU-intensive and can take up to 24 hours. You will be notified by email when a slot has been allocated and training has started, when training is complete, and if any errors occur during the training process.
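The example figures above also give a rough sense of how file size relates to word count. Dividing 2.5 million words by 200 MB gives about 12,500 words per MB; at that density, the 90,000-word minimum corresponds to roughly 7 MB of TMX, and a file at the 250 MB limit holds roughly 3.1 million words. The helper below simply encodes that ratio. It assumes every TMX has the same density as the example, which will not hold in practice (markup overhead, segment length and writing system all change it), and the FAQ does not say whether the 2.5 million words count only the source side. It estimates size only; the FAQ gives no basis for extrapolating training time.

```python
# Reference point from the FAQ: a 200 MB TMX containing about 2.5 million words.
REFERENCE_SIZE_MB = 200
REFERENCE_WORDS = 2_500_000
WORDS_PER_MB = REFERENCE_WORDS / REFERENCE_SIZE_MB  # 12,500 words per MB


def estimated_words(size_mb):
    """Ballpark word count for a TMX of the given size, assuming the example's density."""
    return size_mb * WORDS_PER_MB


def estimated_size_mb(words):
    """Ballpark TMX size in MB needed to hold the given number of words."""
    return words / WORDS_PER_MB


print(estimated_size_mb(90_000))  # minimum training data, about 7.2 MB
print(estimated_words(250))       # per-file upload limit, about 3.1 million words
```

Counting words directly in your files is always more reliable than this estimate.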

How and where can trained engines be used?

Custom MT Engines are used through SDL Language Cloud Translation Toolkit. They are no different from the engines created by SDL and can be used in the same ways: via the SDL Language Cloud Translation Toolkit API, within SDL Language Cloud online, and from SDL Trados Studio and Microsoft Office, among others.

Note that these engines cannot be used in the SDL Language Weaver Enterprise Translation Server or SDL BeGlobal.

Can I share my self-trained engines?

You can share your trained engines with another user by sharing your API key. Alternatively, SDL can grant sharing permission. You can also download your evaluation data and trained engine files (for example, to free up space to train new engines); SDL can then deploy them again later when you wish to use that trained engine.

Can SDL train an engine for me?

Yes. You can either train your own engine by uploading your TMX data, or the SDL iMT team can train one on your behalf and deploy it to your SDL Language Cloud account. Engines trained by SDL are visible in the ‘SDL trained’ column on the ‘Custom Engines’ tab in your account.

Are the SDL Language Cloud Translation Toolkit APIs available?

How secure is SDL Language Cloud?

With machine translation in SDL Language Cloud you can rest assured that your content is safe. SDL guarantees that none of your data is saved or used. For more information, please refer to the SDL Language Cloud Terms & Conditions.

When you train an MT engine, the data you upload for training is stored on our secure servers in San Jose, USA. We follow industry-standard best practices for encryption protocols to protect your data as it passes between users and the translation engines. All of your data, even while the translation engine is being trained, remains secure and private. SDL will never use this data to enhance or train its own MT engines; for those, SDL uses only data available in the public domain, and that data is never reproduced in its original form.

For more information, please read the Safe Harbour Privacy Policy and the Hosted Products Privacy Policy (includes Language Cloud specifics) here.

Where can I find help?

If you encounter any problems, you can access help from your SDL Language Cloud Translation Toolkit account. Click the question mark icon in the top right-hand corner of the screen and select ‘Help & Support’.