Unicode characters in text parsing rules

Solution

Text files sometimes contain special control or Unicode characters that your text parsing rule must handle so that they are excluded from the content parsed into Catalyst.

Example of the 'DATA LINK ESCAPE' (DLE) and 'LEFT-POINTING DOUBLE ANGLE QUOTATION MARK' special characters in a text file:

To add Unicode characters to a parsing rule in Catalyst, it is best to use the hex value for the character, because the raw character can conflict with the Regular Expression.

In the example above, the UTF-8 hex value for DLE is 0x10; adding \x10 to your parsing rule therefore lets the parsing engine recognise the character.

To find the hex value of a character, look it up in a Unicode table such as https://www.fileformat.info/info/unicode/index.htm
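As an alternative to a lookup table, the hex value can also be checked with a short script. The following is a minimal sketch in Python, not part of Catalyst; the helper name hex_escape is hypothetical, and it assumes the \xNN escape form, which only covers code points up to 0xFF.

```python
import unicodedata


def hex_escape(ch: str) -> str:
    # Hypothetical helper: build a two-digit hex escape for one character.
    code_point = ord(ch)
    if code_point > 0xFF:
        raise ValueError("two-digit hex escapes only cover code points up to 0xFF")
    return "\\x{:02x}".format(code_point)


# DLE (U+0010) and the left-pointing double angle quotation mark (U+00AB).
for ch in ("\u0010", "\u00ab"):
    # Control characters have no Unicode name, so a fallback label is supplied.
    name = unicodedata.name(ch, "<control character>")
    print(hex(ord(ch)), hex_escape(ch), name)
```

Characters above 0xFF would need a different escape form; whether Catalyst supports one is not covered here.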

Returning to the example, the Unicode hex value for the 'LEFT-POINTING DOUBLE ANGLE QUOTATION MARK' character (code point U+00AB) is 0xab, so it is added to the Regular Expression in the parsing rule as \xab.
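To illustrate how hex escapes behave inside a Regular Expression, here is a minimal sketch using Python's re module. It is not a Catalyst parsing rule: the sample string and variable names are assumptions, and the Catalyst regex engine may differ in detail from Python's.

```python
import re

# Hypothetical sample line containing DLE (U+0010) and « (U+00AB).
sample = "Hello\x10 \xabworld"

# Character class written with hex escapes instead of the raw characters.
special_chars = re.compile(r"[\x10\xab]")

# Strip both special characters from the sample line.
cleaned = special_chars.sub("", sample)
print(repr(cleaned))
```

Writing the pattern with escapes keeps it readable and avoids pasting a non-printable character such as DLE into the expression.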

Products or Versions Affected

Alchemy CATALYST 11.0 and greater

Last updated with Catalyst 11 SP1
