Where can I get more information on writing Patterns?
Introduction
Our mapping products include a powerful pattern-matching engine for mapping comments, attributes and qualifiers. It’s based on “regular expressions”.
Basic Structure
All patterns should start with a caret (^) and end with a dollar sign ($). This tells the system to match the entire string. All characters in the pattern are interpreted as case-insensitive literals. So “from”, for example, is the same as “From”.
Special characters are “escaped” with a backslash (\) to differentiate them from literal characters. See the table below for a list of these special characters.
Since Qdb entries and vehicle attributes often require parameters, we need a way to select a portion of the string to use for a parameter. This is done with the following construct:
(?P<0>variable portion to pick up)
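As a rough illustration of the anchors and the capture construct, the sketch below uses Python's `re` module as a stand-in for the product's engine, which may behave differently. Two details are assumptions: case-insensitivity is switched on explicitly with `re.IGNORECASE`, and the group is named `p0` because Python does not accept a group name that starts with a digit, unlike the (?P<0>...) form shown above. The pattern itself is hypothetical.

```python
import re

# Hypothetical pattern: "From", one whitespace character, then a captured parameter.
# The group is named p0 only because Python rejects the group name "0".
PATTERN = re.compile(r"^From\s(?P<p0>.+)$", re.IGNORECASE)

def extract_parameter(text):
    """Return the captured parameter, or None if the string does not match."""
    match = PATTERN.match(text)
    return match.group("p0") if match else None

print(extract_parameter("From 1/1/2000"))        # literal text matches
print(extract_parameter("from 1/1/2000"))        # case is ignored
print(extract_parameter("Built From 1/1/2000"))  # ^ requires "From" at the start
```

The third call returns `None` because the caret anchors the match to the start of the string.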
Character Sets
You can define a group of characters to match a single character by including them in square brackets ([]). For example, the set of digits is [0123456789]. As a shortcut, a hyphen (-) can be used to include a range of characters. Another way to write a set of digits, then, is [0-9]. An alphanumeric character would be matched with [a-zA-Z0-9].
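A minimal sketch of these sets, again using Python's `re` module as an assumed stand-in for the product's engine. The helper names are illustrative only.

```python
import re

# [0-9] is shorthand for [0123456789]; a set always matches exactly one character.
DIGIT = re.compile(r"^[0-9]$")
ALPHANUMERIC = re.compile(r"^[a-zA-Z0-9]$")

def is_digit(ch):
    """True if ch is a single character from the digit set."""
    return DIGIT.match(ch) is not None

def is_alphanumeric(ch):
    """True if ch is a single letter or digit."""
    return ALPHANUMERIC.match(ch) is not None

print(is_digit("7"), is_digit("a"), is_digit("42"))
print(is_alphanumeric("Q"), is_alphanumeric("-"))
```

Note that "42" is rejected: without repetition, a set stands for one character only.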
Repetition
Repetition controls how many times the previous character (or group) may occur.
| Char | Description |
|---|---|
| {n,m} | Match the previous item at least n times but no more than m times |
| {n,} | Match the previous item n or more times |
| {n} | Match exactly n occurrences of the previous item |
| ? | Match zero or one occurrences of the previous item. Equivalent to {0,1} |
| + | Match one or more occurrences of the previous item. Equivalent to {1,} |
| * | Match zero or more occurrences of the previous item. Equivalent to {0,} |
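The repetition operators can be tried out with a small table of checks. This is a sketch in Python's `re` module, assumed here to behave like the product's engine for these operators. The patterns and sample strings are made up for illustration.

```python
import re

def whole_match(pattern, text):
    """True if the anchored pattern matches the string."""
    return re.match(pattern, text) is not None

# (pattern, sample text) pairs, one or two per repetition operator.
CHECKS = [
    (r"^\d{1,2}$", "7"),       # {n,m}: one or two digits
    (r"^\d{1,2}$", "123"),     # three digits is too many
    (r"^\d{2,}$", "12345"),    # {n,}: two or more digits
    (r"^\d{4}$", "2000"),      # {n}: exactly four digits
    (r"^colou?r$", "color"),   # ?: the "u" is optional
    (r"^\d+$", ""),            # +: needs at least one digit
    (r"^\d*$", ""),            # *: zero digits is acceptable
]

for pattern, text in CHECKS:
    print(pattern, repr(text), whole_match(pattern, text))
```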
Grouping
Use parentheses to create “groups” within your pattern. Groups are not helpful on their own, but are used with Repetition (see above) and Alternation (see the Special Characters table below). For example, to match either “Door” or “Dr”, you would write (Door|Dr).
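The sketch below shows a group combined with alternation and a group combined with repetition. It uses Python's `re` module as an assumed stand-in for the product's engine. Both patterns are hypothetical, and the group name `p0` replaces the (?P<0>...) form because Python does not allow a group name starting with a digit.

```python
import re

# Alternation inside a group: "Door" or "Dr" after a captured door count.
DOORS = re.compile(r"^(?P<p0>\d)\s(Door|Dr)$", re.IGNORECASE)

# Repetition applied to a group: digits, then zero or more "-digits" blocks.
DASHED_NUMBER = re.compile(r"^\d+(-\d+)*$")

def door_count(text):
    """Return the captured door count, or None if the text does not match."""
    match = DOORS.match(text)
    return match.group("p0") if match else None

def is_dashed_number(text):
    """True for strings such as 92-289281 or 1-2-3."""
    return DASHED_NUMBER.match(text) is not None

print(door_count("2 Door"), door_count("4 dr"), door_count("4 Drs"))
print(is_dashed_number("92-289281"), is_dashed_number("92-"))
```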
Special Characters
| Char | Description |
|---|---|
| . | Match any one character |
| \w | Match any one character used in a word. Equivalent to [a-zA-Z0-9_] |
| \s | Match any one whitespace character (space or tab) |
| \d | Match any one digit. Equivalent to [0-9] |
| \. | Match a period |
| \( | Match a left parenthesis |
| \) | Match a right parenthesis |
| \| | Alternation. Match either the expression to the left or the right of the vertical bar |
| \\| | Match a vertical bar |
| \? | Match a question mark |
| \+ | Match a plus sign |
| \* | Match an asterisk |
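To see why escaping matters, the sketch below compares escaped and unescaped special characters using Python's `re` module, assumed here to treat these characters the same way as the product's engine. The patterns are illustrative. `re.escape` is a Python standard-library helper and is not part of the mapping products.

```python
import re

def whole_match(pattern, text):
    """True if the anchored pattern matches the string."""
    return re.match(pattern, text) is not None

# An unescaped dot matches any character, so this accepts "9x5" as well as "9.5".
LOOSE = r"^\d.\d$"
# Escaping the dot restricts the match to a literal period.
STRICT = r"^\d\.\d$"
# Parentheses are escaped to match them literally, e.g. "(4WD)".
IN_PARENS = r"^\(\w+\)$"
# re.escape escapes the special characters in a literal string for you.
LITERAL = "^" + re.escape("2+2 (approx.)") + "$"

print(whole_match(LOOSE, "9x5"), whole_match(STRICT, "9x5"))
print(whole_match(IN_PARENS, "(4WD)"))
print(whole_match(LITERAL, "2+2 (approx.)"))
```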
Example Patterns
From Dates: (e.g. From 1/1/2000, Fr: 2/89, Fr: 03/05/98)
^Fr([.:]|om)?\s(?P<0>\d{1,2}(/\d{1,2})?/(\d\d|\d\d\d\d))$
To Chassis: (e.g. To Chassis # ABC-18920-1, To Ch: 92-289281)
^To\sCh([.:]|assis)\s?#?\s?(?P<0>.+)$
Pattern to capture a “size” type like 9 1/2 or 9.9:
^(?P<0>\d+([ -]\d+[/]\d+)?|\d+[.]\d+)$
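The example patterns can be checked against their sample strings with a short script. This is a sketch in Python's `re` module, not the product's engine, and it makes several assumptions: every pattern is written with a leading caret, the `#` in the chassis pattern is treated as optional so that "To Ch: 92-289281" matches, brackets and braces are written unescaped, case-insensitivity is enabled with `re.IGNORECASE`, and the capture group is named `p0` because Python does not accept "0" as a group name.

```python
import re

# Python renderings of the three example patterns (see the assumptions above).
PATTERNS = {
    "from_date": r"^Fr([.:]|om)?\s(?P<p0>\d{1,2}(/\d{1,2})?/(\d\d|\d\d\d\d))$",
    "to_chassis": r"^To\sCh([.:]|assis)\s?#?\s?(?P<p0>.+)$",
    "size": r"^(?P<p0>\d+([ -]\d+[/]\d+)?|\d+[.]\d+)$",
}

def capture(name, text):
    """Return the captured parameter for the named pattern, or None."""
    match = re.match(PATTERNS[name], text, re.IGNORECASE)
    return match.group("p0") if match else None

SAMPLES = [
    ("from_date", "From 1/1/2000"),
    ("from_date", "Fr: 2/89"),
    ("from_date", "Fr: 03/05/98"),
    ("to_chassis", "To Chassis # ABC-18920-1"),
    ("to_chassis", "To Ch: 92-289281"),
    ("size", "9 1/2"),
    ("size", "9.9"),
]

for name, text in SAMPLES:
    print(name, repr(text), "->", capture(name, text))
```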
Debugging Patterns
Here are a couple of great sites for debugging your regular expressions. (Select PCRE from the dropdown).