Programmer Reference Guide
This section covers the syntax of regular expressions. It is designed as a programmer's reference guide for localization or development engineers who wish to use advanced regular expressions in their text parsers.
All characters are literals except: ".", "|", "*", "?", "+", "(", ")", "{", "}", "[", "]", "^", "$" and "\". These characters are literals when preceded by a "\". A literal is a character that matches itself.
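The following sketch uses Python's `re` module as a stand-in engine. Its syntax overlaps with, but is not identical to, the syntax described in this guide. It shows that "+" acts as a repeat operator until a "\" turns it into a literal.

```python
import re

# Unescaped, "+" is a repeat: "1+1=2" means "one or more 1s, then 1=2".
unescaped = re.fullmatch(r"1+1=2", "1+1=2")
repeat_match = re.fullmatch(r"1+1=2", "11=2")

# Preceded by "\", "+" is a literal that matches itself.
escaped = re.fullmatch(r"1\+1=2", "1+1=2")

print(unescaped is None, repeat_match is not None, escaped is not None)
# True True True
```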
The dot character "." matches any single character except newline and NULL.
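A minimal sketch of the wildcard in Python's `re` module. One difference to keep in mind: Python's "." also matches the NULL character, which the engine described here does not.

```python
import re

pattern = re.compile(r"a.c")

# "." consumes exactly one character, whatever it is...
matches_letter = bool(pattern.fullmatch("abc"))
matches_symbol = bool(pattern.fullmatch("a-c"))
# ...except a newline.
matches_newline = bool(pattern.fullmatch("a\nc"))
# It cannot match zero characters, so "ac" fails.
matches_nothing = bool(pattern.fullmatch("ac"))
# Python-specific: "." does match NUL, unlike the engine in this guide.
matches_nul = bool(pattern.fullmatch("a\x00c"))

print(matches_letter, matches_symbol, matches_newline, matches_nothing, matches_nul)
# True True False False True
```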
A repeat is an expression that is repeated an arbitrary number of times. There are various types of repeats that can be used within the body of a regular expression.
All repeat expressions apply to the shortest possible preceding sub-expression: for example, a single character, a character set, or a sub-expression grouped with "()".
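A short Python `re` sketch of how a repeat binds to the preceding sub-expression. It assumes that "*", "+" and "?" mean zero or more, one or more, and zero or one repeats, as they do in Python.

```python
import re

# Single character: "ab*" is "a" followed by any number of "b"s.
single_char = [bool(re.fullmatch(r"ab*", s)) for s in ("a", "abbb", "abab")]

# Grouped sub-expression: "(ab)*" repeats the pair "ab" as a unit.
grouped = [bool(re.fullmatch(r"(ab)*", s)) for s in ("", "abab", "abb")]

# Character set: "[0-9]+" is one or more digits.
char_set = bool(re.fullmatch(r"[0-9]+", "2024"))

# "?" makes only the preceding "u" optional.
optional = [bool(re.fullmatch(r"colou?r", s)) for s in ("color", "colour")]

print(single_char)  # [True, True, False]
print(grouped)      # [True, True, False]
print(char_set, optional)  # True [True, True]
```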
Non-greedy repeats
When the "extended" regular expression syntax is in use (the default), non-greedy repeats are possible by appending a "?" after the repeat; a non-greedy repeat matches the shortest possible string. For example, non-greedy repeats are useful for matching HTML tag pairs, where a greedy repeat would run on to the last closing tag in the text.
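The difference between greedy and non-greedy repeats can be seen with Python's `re` module, which uses the same trailing "?" convention. The `<b>` tag is a hypothetical example, and a pattern like this is not a full HTML parser.

```python
import re

html = "<b>one</b> and <b>two</b>"

# Greedy: ".*" runs as far as it can, so one match spans both tag pairs.
greedy = re.findall(r"<b>(.*)</b>", html)

# Non-greedy: ".*?" stops at the first "</b>" that lets the match succeed.
lazy = re.findall(r"<b>(.*?)</b>", html)

print(greedy)  # ['one</b> and <b>two']
print(lazy)    # ['one', 'two']
```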
The bounds operator "{}" is used when it is necessary to specify the minimum and maximum number of repeats.
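A sketch of the bounds operator in Python's `re` module, assuming the engine in this guide accepts the same three forms: "{n}", "{n,}" and "{n,m}". The `matches` helper is hypothetical and only wraps `re.fullmatch`.

```python
import re

def matches(pattern, text):
    """Hypothetical helper: True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

# {n}: exactly n repeats.
exact = [matches(r"a{3}", s) for s in ("aa", "aaa", "aaaa")]
# {n,}: at least n repeats.
at_least = [matches(r"a{2,}", s) for s in ("a", "aa", "aaaaa")]
# {n,m}: between n and m repeats, inclusive.
between = [matches(r"a{2,3}", s) for s in ("a", "aa", "aaa", "aaaa")]

print(exact)     # [False, True, False]
print(at_least)  # [False, True, True]
print(between)   # [False, True, True, False]
```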
Parentheses serve two purposes: to group items together into a sub-expression, and to mark what generated the match.
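Both purposes of parentheses can be sketched with Python's `re` module: the first pattern uses a group to repeat a sub-expression, and the second reads back what each marked group matched. The date format is only an illustrative assumption.

```python
import re

# Grouping: "(ha)+" applies the repeat to the whole sub-expression "ha".
laugh = re.fullmatch(r"(ha)+", "hahaha")

# Marking: each "(...)" records the text it matched, numbered from the left.
date = re.fullmatch(r"([0-9]{4})-([0-9]{2})-([0-9]{2})", "2024-03-15")

# A repeated group keeps only the text from its last repetition.
print(laugh.group(0), laugh.group(1))  # hahaha ha
print(date.groups())                    # ('2024', '03', '15')
```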
Alternatives occur when the expression can match either one sub-expression or another; each alternative is separated by a "|".
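A minimal Python `re` sketch of alternation. Python tries the alternatives from left to right, and parentheses limit how far each alternative extends.

```python
import re

text = "The cat sat next to the dog, not the bird."

# "|" separates alternatives: any one of the three words matches.
animals = re.findall(r"cat|dog|bird", text)

# Parentheses restrict the choice to a single letter.
spellings = [bool(re.fullmatch(r"gr(a|e)y", s)) for s in ("gray", "grey", "griy")]

# Without parentheses, "gra|ey" means "gra" OR "ey".
loose = [bool(re.fullmatch(r"gra|ey", s)) for s in ("gra", "ey", "gray")]

print(animals)    # ['cat', 'dog', 'bird']
print(spellings)  # [True, True, False]
print(loose)      # [True, True, False]
```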
A set matches any single character that is a member of the set. Sets are delimited by "[" and "]" and can contain literals, character ranges, character classes, collating elements and equivalence classes. A set declaration that starts with "^" contains the complement of the elements that follow.
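Sets of literals, ranges and a complemented set behave the same way in Python's `re` module, which this sketch uses for illustration.

```python
import re

word = "regular"

# Literals: a set of individual characters, each match is one character.
vowels = re.findall(r"[aeiou]", word)

# Ranges: "0-9", "a-f" and "A-F" each stand for a run of characters.
hex_number = re.fullmatch(r"[0-9a-fA-F]+", "1aF3")

# Complement: a leading "^" matches any single character NOT listed.
non_vowels = re.findall(r"[^aeiou]", word)

print(vowels)                  # ['e', 'u', 'a']
print(hex_number is not None)  # True
print(non_vowels)              # ['r', 'g', 'l', 'r']
```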
Character classes are denoted using the syntax "[:classname:]" within a set declaration; for example, "[[:space:]]" is the set of all whitespace characters. Character classes are only available if the flag regbase::char_classes is set. The available character classes are:
An anchor matches the null string at the start or end of a line: "^" matches the null string at the start of a line, and "$" matches the null string at the end of a line.
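In Python's `re` module, "^" and "$" match only at the start and end of the whole string unless the `re.MULTILINE` flag is given, so this sketch passes that flag to get the per-line behavior described here. `\w` (a word character) is Python syntax.

```python
import re

text = "first line\nsecond line\nthird"

# Default Python behavior: "^" anchors only at the start of the string.
whole_string = re.findall(r"^\w+", text)

# With re.MULTILINE, "^" and "$" anchor at every line start and end.
per_line_start = re.findall(r"^\w+", text, re.MULTILINE)
per_line_end = re.findall(r"\w+$", text, re.MULTILINE)

print(whole_string)    # ['first']
print(per_line_start)  # ['first', 'second', 'third']
print(per_line_end)    # ['line', 'line', 'third']
```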
Character codes consist of the escape character followed by the digit "0" followed by the octal character code. For example, "\023" represents the character whose octal code is 23. Where ambiguity could occur, use parentheses to break the expression up: "\0103" represents the character whose octal code is 103, whereas "(\010)3" represents the character whose octal code is 10 followed by a literal "3". To match characters by their hexadecimal code, use "\x" followed by a string of hexadecimal digits, optionally enclosed inside "{}", for example "\xf0" or "\x{aff}"; the latter example denotes a Unicode character.
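The sketch below tries the same character codes in Python's `re` module. Python's rules differ in two places, noted in the comments: it reads at most two octal digits after "\0", and it writes longer hexadecimal codes as "\u0aff" rather than "\x{aff}".

```python
import re

# Octal: r"\023" is the single character with octal code 23 (decimal 19).
octal = re.fullmatch(r"\023", chr(0o23))

# Python-specific: at most two octal digits follow "\0", so r"\0103" is
# read as "\010" followed by "3", not as octal 103 as this guide describes.
octal_split = re.fullmatch(r"\0103", chr(0o10) + "3")

# Hexadecimal: Python's "\x" takes exactly two hex digits.
hex_short = re.fullmatch(r"\xf0", chr(0xF0))

# Python has no "\x{...}" form; "\u0aff" names code point U+0AFF instead.
hex_long = re.fullmatch(r"\u0aff", chr(0xAFF))

print(all(m is not None for m in (octal, octal_split, hex_short, hex_long)))
# True
```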
The following operators are provided for compatibility with the GNU regular expression library.
The escape character "\" has several meanings. It may introduce an operator, for example a back reference or a word operator. It may also make the following character a literal; for example, "\*" represents a literal "*" rather than the repeat operator.
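A sketch of both roles of the escape character, using Python's `re` module: "\1" (a back reference) and "\b" (a word-boundary operator) introduce operators, while "\*" makes "*" a literal. These operator spellings are Python's; the word operators available in this guide's syntax may differ.

```python
import re

text = "This is is a test of the the escape operator."

# "\1" matches the same text as group 1; "\b" keeps matches on word boundaries,
# so the "is" inside "This" is not treated as the start of a doubled word.
doubled = [m.group(0) for m in re.finditer(r"\b(\w+) \1\b", text)]

# "\*" is a literal asterisk, not a repeat operator.
literal_star = re.findall(r"\*", "2 * 3 * 4")

print(doubled)       # ['is is', 'the the']
print(literal_star)  # ['*', '*']
```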
Single character escape sequences
The following escape sequences are aliases for single characters:
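The idea of a single-character alias can be sketched with Python's `re` module, which accepts "\t" and "\n" as aliases for the tab and newline characters. The patterns are written as raw strings so that the regular expression engine, not Python's string parser, interprets the escapes.

```python
import re

table = "name\tvalue\nsize\t42"

# r"\t" in the pattern is the regex alias for a tab character.
pairs = re.findall(r"(\w+)\t(\w+)", table)

# r"\n" in the pattern is the regex alias for a newline character.
lines = re.split(r"\n", table)

print(pairs)  # [('name', 'value'), ('size', '42')]
print(lines)  # ['name\tvalue', 'size\t42']
```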
Regular expressions can also be used in the QuickFind and Advance Search & Replace features of Alchemy CATALYST.