re: Regular expressions

This module provides operations for matching and manipulating strings using regular expressions. Regular expressions may be represented as strings or instances of the RegExp class.

Examples:

if Match("foo|bar", "bar   ") != nil -- Succeeds
  WriteLn("Match!")
end

var r = RegExp("a*b", IgnoreCase)
var m = Search(r, "... AaaB ...")
m.group(0)                           -- "AaaB"

Note: Case-insensitive matching is possible only by using the RegExp class.

Functions

Match(regexp, str[, pos])
Test if a regular expression matches at the start of the string. If the regular expression matches a (potentially empty) prefix of the string, return a match object describing the match. Otherwise, return nil. If the pos argument is provided, start matching at the specified string index instead of at the start of the string.
Search(regexp, str[, start])
Search a string for a match of a regular expression. Return a match object describing the leftmost match or nil if no match could be found. If the start parameter is provided, start the matching at the specified string index instead of the string start.
Subst(str, regexp, new)
Substitute all non-overlapping occurrences of a regular expression in a string with replacement values described by the new parameter.

If new is a string, it acts as a template for the replacement string. \0 in the new string is replaced with the string matched by the regular expression and \n, where n is a positive integer, is replaced with the string matched by the group n of the regular expression. A backslash not followed by a digit is replaced with the character following the backslash. Finally, the rest of the string is included literally in the replacement string. Example:

Subst("foo fox", "fo+", "<\0>")   -- Result: "<foo> <fo>x"
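
Group references in the template work the same way. The following fragment is only an illustration (the input string and pattern are invented for this example) and assumes "import re" at the top of the file; it swaps the two parts around "@" using \1 and \2:

```alore
-- Group 1 matches "user" and group 2 matches "host".
Subst("user@host", "([a-z]+)@([a-z]+)", "\2 at \1")   -- Result: "host at user"
```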

If new is a callable object, call the object with the corresponding match object as the argument for each occurrence of the regular expression in the string. The object should return the replacement string when called. Example:

Subst("cat sits on a table", "cat|table", sub (m)
                                            return m.group(0).upper()
                                          end)
  -- Result: "CAT sits on a TABLE"
Split(str, regexp)
Split the string into fields at each non-overlapping occurrence of the regular expression. Return an array containing the fields. Example:
Split("cat;  dog;horse", "; *")   -- Result: ["cat", "dog", "horse"]
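
The difference between Match and Search described above can be sketched as follows. This is an illustrative fragment, not a complete program: it assumes "import re" at the top of the file and zero-based string indices.

```alore
-- Fragment; assumes "import re" and zero-based string indices.
var s = "abbb"

Match("b+", s)                -- nil: "b+" does not match at the start of s
Search("b+", s).group(0)      -- "bbb": the leftmost match anywhere in s
Match("b+", s, 1).group(0)    -- "bbb": matching starts at index 1
```

Because both functions return nil on failure, code that is not certain of a match would normally check the result against nil before calling group.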

RegExp class

class RegExp(regexp, ...)
Construct a regular expression object. The first parameter must be a regular expression string. Optionally, the IgnoreCase constant can be given as an additional parameter to enable case-insensitive matching.

Constants

IgnoreCase
Flag for case-insensitive matching.

Match result objects

Match result objects have the following methods:

group(n)
Return the substring matched by a specific group. Group 0 is the substring matched by the entire regular expression. Return nil if the group exists in the regular expression but is within a part of the regular expression that was not matched.
span(n)
Return a Range object representing the (non-negative) start and end indices of the substring matched by a specific group. Group 0 refers to the substring matched by the entire regular expression. Return nil if the group exists in the regular expression but is within a part of the regular expression that was not matched. If a group matched an empty string, the span start index is equal to the stop index.
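
As a rough sketch of how group and span behave for unmatched and empty groups, consider the fragment below. The patterns are invented for illustration, "import re" is assumed, and the Range members start and stop are an assumption based on the wording of the span description.

```alore
-- Fragment; assumes "import re".
var m = Match("(a)|(b)", "b")
m.group(0)            -- "b"
m.group(1)            -- nil: group 1 exists but took no part in the match
m.group(2)            -- "b"

var e = Match("(x*)b", "b")
e.group(1)            -- "": group 1 matched the empty string
var r = e.span(1)     -- Range object (members start and stop are assumed)
r.start == r.stop     -- True for a group that matched the empty string
```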

Exceptions

class RegExpError
Raised when one of the operations in this module is passed an invalid regular expression string. Inherits from std::ValueError.
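
A minimal sketch of handling this exception is shown below. It assumes "import re" and that the pattern "(a" is rejected because of its unbalanced parenthesis; the messages are placeholders.

```alore
-- Fragment; assumes "import re". The pattern "(a" is assumed to be
-- invalid because the parenthesis is never closed.
try
  var r = RegExp("(a")
  WriteLn("Compiled")
except e is RegExpError
  WriteLn("Invalid regular expression")
end
```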

Regular expression syntax overview

Any character that does not have any other significance is a regular expression that matches itself, i.e. "x" matches the letter "x" and so on. Additionally, regular expressions can be constructed by following the rules below (a and b may refer to any regular expression):

. Match any single character.
^ Anchor match at the beginning of a string.
$ Anchor match at the end of a string.
ab Match a followed by b.
a* Match a repeated zero or more times.
a+ Match a repeated one or more times.
a? Optionally match a.
a|b Match a or b.
a{n} Match a exactly n times.
a{n,} Match a at least n times.
a{n,m} Match a at least n and at most m times.
(a) Match a. Use parentheses to group expressions. Each regular expression within parentheses is a group. Groups within a regular expression are numbered so that the leftmost parenthesized expression (the one where the index of the "(" character is smallest) is the group 1, the next is the group 2, etc.
[...] Match any character inside the brackets. If the brackets contain character ranges of the form x-y, each such range matches any character in the range from x to y. Backslash sequences have the same behavior as described below, unless otherwise noted.
[^...] Match the inverse of the corresponding [...] expression.
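
A few of the rules above can be sketched in use as follows. The fragment is illustrative only and assumes "import re"; the patterns are anchored with ^ and $ so that the results do not depend on how far a repetition extends.

```alore
-- Fragment; assumes "import re".
Match("colou?r", "color") != nil        -- True: the "u" is optional
Search("^[a-c]{3}$", "abc") != nil      -- True: exactly three characters in a-c
Search("^[a-c]{3}$", "abcd") != nil     -- False: a fourth character follows
Search("^[^0-9]+$", "abc") != nil       -- True: no character is a digit

-- Groups are numbered by the position of their opening parenthesis.
var m = Match("((a)(b))c", "abc")
m.group(1)                              -- "ab"
m.group(2)                              -- "a"
m.group(3)                              -- "b"
```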

Finally, combining backslash ("\") and another character or characters forms special regular expressions. If the backslash sequence is not special, a backslash followed by any character matches the following character. These are the special backslash sequences:

\n Here n stands for an integer, as in \1: match the string matched by the parenthesized group n (back reference) or the character with octal code n.
\xnn Match the character with hexadecimal code nn. nn must be an integer in hexadecimal.
\< Match the empty string at the beginning of a word.
\> Match the empty string at the end of a word.
\a Match ASCII bell.
\b Match ASCII backspace.
\d Match any decimal digit [0-9].
\D Match any character except a decimal digit [^0-9].
\f Match ASCII form feed.
\n Match ASCII linefeed.
\r Match ASCII carriage return.
\s Match any whitespace character.
\S Match any non-whitespace character.
\t Match ASCII horizontal tab.
\v Match ASCII vertical tab.
\w Match any alphanumeric character or the underscore ("_").
\W Match any non-alphanumeric and non-underscore character.
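
Some of the backslash sequences can be sketched in use as follows. The fragment assumes "import re" and that backslashes in string literals reach the regular expression unchanged, as the \0 template example in the Subst description suggests.

```alore
-- Fragment; assumes "import re" and that string literals pass
-- backslashes through to the regular expression unchanged.
Search("^\d{4}-\d{2}$", "2024-01") != nil   -- True: four digits, "-", two digits
Search("^\w+$", "foo_bar1") != nil          -- True: letters, digits and "_" only
Search("^\w+$", "foo bar") != nil           -- False: \w does not match the space
Search("^(a|b)\1$", "aa") != nil            -- True: \1 repeats what group 1 matched
Search("^(a|b)\1$", "ab") != nil            -- False: group 1 matched "a", not "b"
Search("\<cat\>", "concat cat")             -- Matches the separate word "cat" only
```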

See also: The section "Regular expression matching details" contains additional information about regular expression matching.