Segmentation Rules

Segmentation refers to the rules used to break a paragraph into individual sentences. Segmentation rules can vary from language to language. While a full-stop may be an indicator of a sentence terminator in English and German paragraphs for example, it is not so in Japanese. Alchemy CATALYST has predefined segmentation rules, but these can be replaced and modified in this dialog.

Sentence Based Segmentation

In this mode, Alchemy CATALYST will apply the currently active segmentation rules and detect sentence boundaries. This is the most accurate way of building TMs as translation units will generally consist of complete sentences. This mode also produces more accurate TM Abbreviation: Translation Memory reuse and discovery.

Paragraph Based Segmentation

In this mode, Alchemy CATALYST will ignore sentence boundaries and store complete paragraph objects in its TM. This may be useful when aligning two translations that have significant differences in structure and format.

Alchemy CATALYST will ignore these segmentation rules when working with desktop applications i.e. Win32, Win64 and all .NET platforms. Since these file formats are highly structured and serialized, and text strings tend to be shorter and more abbreviated; it is far more efficient to store each software string as a single segment in the TM.

Handling Segmentation Exceptions: Abbreviated words or anagrams may occasionally cause the segmentation engine to misinterpret a sentence boundary. This can be avoided by specifying any sequence of characters that are to be ignored when applying segmentation rules.