SDL Language Cloud Custom MT Engines are offered through SDL Language Cloud Translation Toolkit. At the core of SDL’s machine translation technology is SDL XMT.
The SDL XMT baseline engines are the starting point for the Custom MT Engines training capability in SDL Language Cloud. This capability enables you to tailor your own language pairs to suit a particular project, client or industry vertical. All language pairs are supplied as standard baseline engines, which have been created using SDL XMT training technology. XMT’s modular approach allows the training process to be optimized for each language. Using the SDL Language Cloud training capability, you can take relevant content from your client or your own translation assets and train the MT engine to return results that require less post-editing.
The training of custom MT engines is composed of four phases:
- Offline data collection: Parallel data must be collected from any available sources (typically Translation Memories) and evaluated to make sure it is suitable for the training project (i.e. it is in the same domain, such as Travel, and of the same content type, such as holiday package brochures).
There is no need to prepare or clean the content. This is done automatically, using settings based on SDL’s years of machine translation experience, to yield optimal results for each engine. At this stage, you need to decide whether to withhold some parallel data for use as test data (used to tune the engine during training). If you do not withhold any, the MT training capability will set some aside automatically.
- Engine training: Once the data has been collected, select ‘Train a new engine’ in the ‘Custom Engines’ tab. Upload one or more TMX files and follow the easy-to-use wizard to get started. Then upload evaluation data and example data (UTF-8 encoded) and start the training. Training requires a lot of computational power, so depending on the current load on the computational grid, the training may be queued and run later. A training run can take many hours. Status updates are shown in the ‘Custom Engines’ user interface, and you receive an email when the training is complete.
- Engine evaluation: The engine is evaluated automatically using the uploaded evaluation data or, if no evaluation data is uploaded, 1000 lines selected at random from the training data. This data is used to calculate a BLEU score, which is a measure of similarity between machine-translated and human-translated sentences. Before you can test a trained engine, you must deploy it in your SDL Language Cloud Translation Toolkit account.
Once it is deployed, the engine can be accessed via SDL Language Cloud Translation Toolkit and through the API (and therefore through tools like SDL Trados Studio and Microsoft Office). The results of engine training can be packaged into a .zip file in CSV, XLIFF or TMX format. The package contains the source text, the translations produced by the trained engine and the translations produced by the baseline engine. This .zip file can then be sent to translators for evaluation.
- Engine activation: To start using the trained engine for production work, it needs to be “activated”. This can be done with a couple of clicks in the user interface.
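As a rough illustration of the evaluation phase described above, the following Python sketch holds out 1000 random segment pairs and computes a simplified corpus-level BLEU score. It is not SDL's implementation: the function names `hold_out` and `corpus_bleu`, the whitespace tokenization, the single reference per segment, the cap on the held-out share and the 0 to 100 scale are all assumptions made for this example.

```python
import math
import random
from collections import Counter


def hold_out(pairs, n_eval=1000, seed=0):
    """Split (source, target) pairs into (training, evaluation) lists."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    # Assumption for this sketch: never hold out more than half of the data.
    n_eval = min(n_eval, len(shuffled) // 2)
    return shuffled[n_eval:], shuffled[:n_eval]


def ngrams(tokens, n):
    """Count the n-grams of length n in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(candidates, references, max_n=4):
    """Simplified corpus-level BLEU: one reference per segment, no smoothing."""
    matched = [0] * max_n
    total = [0] * max_n
    cand_len = ref_len = 0
    for cand, ref in zip(candidates, references):
        c, r = cand.split(), ref.split()
        cand_len += len(c)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            c_counts, r_counts = ngrams(c, n), ngrams(r, n)
            # Clipped counts: a candidate n-gram is credited at most as
            # often as it occurs in the reference.
            matched[n - 1] += sum((c_counts & r_counts).values())
            total[n - 1] += sum(c_counts.values())
    if cand_len == 0 or min(matched) == 0:
        return 0.0
    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # The brevity penalty lowers the score of output shorter than the reference.
    brevity_penalty = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return 100 * brevity_penalty * math.exp(log_precision)
```

In this sketch, machine output that is identical to the human translation scores 100, and output that shares no words with it scores 0. Real BLEU tools typically differ in tokenization and smoothing, so scores are only comparable when they are computed in the same way.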
Custom MT Engines are used through SDL Language Cloud Translation Toolkit. The engines are no different from the ones created by SDL. They can be used in the same ways: via the SDL Language Cloud Translation Toolkit API, within SDL Language Cloud online, and from tools such as SDL Trados Studio and Microsoft Office.
Note that these engines cannot be used in the SDL Language Weaver Enterprise Translation Server or SDL BeGlobal.
With machine translation in SDL Language Cloud you can rest assured that your content is safe. SDL guarantees that none of your data is saved or used. For more information, please refer to the SDL Language Cloud Terms & Conditions.
When you train an MT engine, the data you upload to train the engine is stored on our secure servers in San Jose, USA. We follow industry-standard best practices for encryption to protect your data as it passes between users and the translation engines. All of your data, even while training the translation engine, will be secure and private. SDL will never use this data to enhance or train its own MT engines. SDL only uses data available in the public domain, and the data used is never reproduced in its original form.