Unicode characters in text parsing rules
Solution
Sometimes text files contain special control or Unicode characters that you need to handle in your text parsing rule so that they are not parsed into Catalyst. For example, a text file might contain the 'DATA LINK ESCAPE' (DLE) control character and the 'LEFT-POINTING DOUBLE ANGLE QUOTATION MARK' («) character.
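If you are not sure which special characters a file contains, a short script can list them together with their hex values. The following is a minimal Python sketch for inspecting sample text outside Catalyst, not a Catalyst feature; the function `find_special_characters` and the sample line are hypothetical.

```python
import unicodedata

def find_special_characters(text):
    """Return (position, hex value, name) for each control or non-ASCII character.

    Hypothetical helper for illustration; ordinary tabs and line breaks are skipped.
    """
    found = []
    for pos, ch in enumerate(text):
        if ch in "\t\r\n":
            continue
        # Report control characters (category "Cc") and anything above 0x7F.
        if unicodedata.category(ch) == "Cc" or ord(ch) > 0x7F:
            # Control characters have no Unicode name, so fall back to a label.
            name = unicodedata.name(ch, "<control character>")
            found.append((pos, f"0x{ord(ch):02x}", name))
    return found

# Hypothetical sample line: a DLE before the value and a « before the name.
sample = "\x10Invoice 1234 \u00abACME Ltd"
for entry in find_special_characters(sample):
    print(entry)
```

For the sample line this reports the DLE at position 0 as `0x10` and the « at position 14 as `0xab`, which are the hex values used in the parsing rule.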
To add a Unicode character to a parsing rule in Catalyst, it is best to use the hex value of that character, because using the raw character can cause conflicts with the Regular Expression. In the example above, the hex value (Unicode code point) for DLE is 0x10, so if you add \x10 to your parsing rule, the parsing engine will recognise the character.
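To show the idea outside Catalyst, here is a minimal sketch using Python's `re` module, assuming a hypothetical sample line where a DLE precedes the value. Python's regex syntax is used only for illustration; the escape `\x10` has the same meaning there, but Catalyst's own parsing engine is not shown.

```python
import re

# Hypothetical sample line: a DLE control character (hex 0x10) precedes the value.
line = "\x10Invoice 1234"

# r"\x10" is the escape for the DLE character, so the invisible raw character
# never has to be typed into the pattern. Matching it outside the capture
# group keeps it out of the extracted value.
pattern = re.compile(r"\x10(?P<value>.+)")

match = pattern.search(line)
print(match.group("value"))  # Invoice 1234
```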
To find out the hex value, check a Unicode table such as https://www.fileformat.info/info/unicode/index.htm. Back to the example, the hex value (Unicode code point U+00AB) for the 'LEFT-POINTING DOUBLE ANGLE QUOTATION MARK' character is 0xab, so in the Regular Expression of the parsing rule it is written as \xab.
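As an alternative to a lookup table, Python can report a character's name and hex value directly. This minimal sketch also shows why the value to use is the code point (0xab) rather than the bytes stored in a UTF-8 file (0xc2 0xab). The sample line is hypothetical, and Python's `re` stands in for Catalyst's parsing engine.

```python
import re
import unicodedata

ch = "\u00ab"  # the « character

print(unicodedata.name(ch))          # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
print(f"0x{ord(ch):02x}")            # 0xab: the code point, written as \xab in a pattern
print(ch.encode("utf-8").hex(" "))   # c2 ab: the two bytes UTF-8 stores on disk

# Hypothetical sample line: the « character marks the start of a customer name.
line = "Customer: \u00abACME Ltd"
match = re.search(r"\xab(?P<name>.+)", line)
print(match.group("name"))           # ACME Ltd
```

In Python, `\x` escapes only cover values up to 0xff; characters above that are written as `\uXXXX`.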
Related topics
Video: Creating ezParse rules for text based files in Catalyst
Products or Versions Affected
Last updated with Catalyst 11 SP1