Sunday, 10 October 2010

Regular Expression Cheat-Sheet (JavaScript examples)

Question:
What are the most common regular expression rules and operators?


Answer:

To match a meta character literally, escape it with a backslash. The meta characters are: . $ ^ { [ ( | ) ] } * + ? \
'\.' matches a literal dot, while '.' matches any character.
Any other character matches itself.
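
The escaping rule above can be sketched as follows. The helper name escapeRegExp is an assumption made for this illustration; it is not a built-in function.

```javascript
// Hypothetical helper (not built in): put a backslash in front of every
// meta character so that the text is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

var literalDot = /\./; // matches only a real dot
var anyChar = /./;     // matches any single character except a line break

var dotOnA = literalDot.test("a");    // false
var anyOnA = anyChar.test("a");       // true
var escaped = escapeRegExp("1+1=2?"); // the characters + and ? get a backslash
var safe = new RegExp("^" + escaped + "$").test("1+1=2?"); // true
```

Without the escaping step, the + and ? in "1+1=2?" would be read as quantifiers and the last test would fail.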

Anchors

Char Sequence Description
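^ Matches start of string, or start of each line when the m modifier is set
$ Matches end of string, or end of each line when the m modifier is set
\A Matches start of string (not supported in JavaScript; use ^ without the m modifier)
\Z Matches end of string (not supported in JavaScript; use $ without the m modifier)
\b Matches a word boundary: the position between a \w and a \W character.
\B Matches a non-word boundary: any position that is not a \b boundary.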
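
A small sketch of how the anchors behave in JavaScript. The sample strings are made up for this illustration.

```javascript
var text = "first line\nsecond line";

var startOnly = /^second/.test(text);        // false: ^ is the start of the string
var startOfLine = /^second/m.test(text);     // true: with m, ^ also matches after a line break
var endOfLine = /first line$/m.test(text);   // true: with m, $ also matches before a line break
var wholeWord = /\bcat\b/.test("a cat sat"); // true: "cat" stands alone
var partOfWord = /\bcat\b/.test("concatenate"); // false: "cat" is inside a word
var insideWord = /\Bline\b/.test("inline");  // true: "line" starts inside a word
```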

Character Classes etc

Char Sequence Description
. Matches any single character (except a line break)
[aoxz] Matches any one of the characters between the brackets
[^aoxz] Matches any one character that is not between the brackets
[0-9] Matches any digit from 0 to 9; hyphen (-) allows for contiguous character ranges.
[A-Z] Matches any character from uppercase A to uppercase Z
[a-z] Matches any character from lowercase a to lowercase z
[A-z] Matches any character from uppercase A to lowercase z; this range also includes [ \ ] ^ _ and `, so use [A-Za-z] to match letters only
(one|two) Matches alternatives; has lowest precedence of all operators.
\d Matches any digit
\D Matches any non-digit character
\s Matches any whitespace character
\S Matches any non-whitespace character
\w Matches any word character [a-zA-Z_0-9].
\W Matches any non-word character
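
The character classes above can be tried out like this. The input strings are made up for this illustration.

```javascript
var digits = "Room 42, floor 7".match(/\d+/g);  // ["42", "7"]
var onlyDigits = "a1b22c".replace(/\D/g, "");   // "122"
var words = "one two_three  4".split(/\s+/);    // ["one", "two_three", "4"]
var kept = "xazoy".replace(/[^aoxz]/g, "");     // "xazo"
var alternative = /^(one|two)$/.test("two");    // true

// The [A-z] pitfall: the range also covers the characters between Z and a.
var caretInAz = /[A-z]/.test("^");              // true
var caretInLetters = /[A-Za-z]/.test("^");      // false
```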

Special Escapes

Char Sequence Description
\0 Matches a NUL character
\n Matches a new line character
\f Matches a form feed character
\r Matches a carriage return character
\t Matches a tab character
\v Matches a vertical tab character
\011 Matches the character specified by the octal number 011 (here a tab); octal digits run from 0 to 7 only
\x20 Matches the character specified by a hexadecimal number 20
\u00E0 Matches a specific Unicode code point (here à)
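
A short sketch of some of these escapes in use. The test strings are assumptions chosen for this illustration.

```javascript
var hasTab = /\t/.test("a\tb");               // true
var hasSpace = /\x20/.test("a b");            // true: hexadecimal 20 is the space character
var hasAGrave = /\u00E0/.test("voil\u00E0");  // true: U+00E0 is à
var hasNul = /\0/.test("a\u0000b");           // true
var lines = "a\r\nb\nc".split(/\r?\n/);       // ["a", "b", "c"]
```

The last line splits on both Windows (\r\n) and Unix (\n) line endings.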

Quantifiers

Char Sequence Description
* = {0,} Matches zero or more occurrences of the previous expression; greedy by default
+ = {1,} Matches one or more occurrences of the previous expression; greedy by default
? = {0,1} Matches zero or one occurrence of the previous expression
{n} Matches exactly n occurrences of the previous expression
{n,} Matches at least n occurrences of the previous expression; greedy by default
{min,max} Matches at least min and at most max occurrences of the previous expression
#? Where # is any of the quantifiers above: makes the greedy quantifier lazy, so it matches as few characters as possible
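
The difference between greedy and lazy matching is easiest to see side by side. This is a minimal sketch; the sample markup is made up.

```javascript
var html = "<b>one</b><b>two</b>";

var greedy = html.match(/<b>.*<\/b>/)[0];  // "<b>one</b><b>two</b>": .* takes as much as it can
var lazy = html.match(/<b>.*?<\/b>/)[0];   // "<b>one</b>": .*? stops at the first </b>

var optional = /^colou?r$/.test("color");  // true: the u may be missing
var exactly = /^\d{4}$/.test("2010");      // true: exactly four digits
var inRange = /^a{2,3}$/.test("aaaa");     // false: four is more than the maximum of three
var atLeast = /^a{2,}$/.test("aaaa");      // true
```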

Grouping, Backreferences and capturing substrings

Char Sequence Description
(bla) Captures the matched subexpression and adds it to the backreference array. It can be accessed by number, for example $1 for the first group.
(?:foo)+bar Non-capturing group: matches one or more occurrences of foo, followed directly by bar. The foo group is not added to the backreference array.
a(?=b) Positive lookahead assertion. Matches a if followed directly by b
a(?!b) Negative lookahead assertion. Matches a if it is not followed directly by b
(?<=a)b Positive lookbehind assertion. Matches b if preceded directly by a (not supported in JavaScript when this was written; added in ECMAScript 2018)
(?<!a)b Negative lookbehind assertion. Matches b if it is not preceded directly by a (not supported in JavaScript when this was written; added in ECMAScript 2018)
(?(if)then|else) Conditional regular expressions (not supported in JavaScript).
(?# comment) Inline comments (not supported in JavaScript).
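
A minimal sketch of capturing groups, non-capturing groups and lookahead in JavaScript. It leaves out lookbehind, conditionals and inline comments; the sample strings are assumptions.

```javascript
// Capturing groups are numbered from left to right: $1, $2, $3.
var date = "2010-10-10".replace(/(\d{4})-(\d{2})-(\d{2})/, "$3.$2.$1"); // "10.10.2010"

// (?:foo) is not captured, so "bar" ends up in group 1.
var groups = /(?:foo)+(bar)/.exec("foofoobar"); // ["foofoobar", "bar"]

// Lookahead checks what follows without making it part of the match.
var followed = "ab ac".match(/a(?=b)/g);           // ["a"]
var notFollowed = "ab ac".replace(/a(?!b)/g, "X"); // "ab Xc"

// Inside the pattern itself, a captured group is referenced as \1.
var doubled = /(\w)\1/.test("hello");              // true: "ll"
```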

Pattern Modifiers

Char Sequence Description
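g global (find all matches, not only the first one)
i case-insensitive
s single line: . also matches line breaks (in JavaScript only since ECMAScript 2018)
m multi-line: ^ and $ also match at the start and end of each line
x comments and white-space allowed (not supported in JavaScript)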
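
The effect of the g, i and m modifiers can be sketched like this. The input strings are made up for this illustration.

```javascript
var plain = "aXbxc".replace(/x/, "_");        // "aXb_c": first lowercase x only
var insensitive = "aXbxc".replace(/x/i, "_"); // "a_bxc": first x in either case
var all = "aXbxc".replace(/x/gi, "_");        // "a_b_c": every x in either case
var lineStarts = "a\nb\nc".match(/^\w/gm);    // ["a", "b", "c"]

// With the RegExp constructor, the modifiers go in the second argument.
var pat = new RegExp("x", "gi");
var isGlobal = pat.global;                    // true
var ignoresCase = pat.ignoreCase;             // true
```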


JavaScript example using a regular expression literal:
Test whether a given string matches a given pattern exactly (case-insensitive, /i)
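
The code for this example did not survive in the post. A minimal sketch of what such a test could look like follows; the sample string and pattern are assumptions.

```javascript
var str = "Hello";
var pat = /^hello$/i;        // ^ and $ force the whole string to match
var result = pat.test(str);  // true
var partial = pat.test("Hello world"); // false: extra text after the pattern
console.log(result);
```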



JavaScript example using the RegExp object:
Replace the first 'xx' in a string (case-insensitive, /i)
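
The code for this example is also missing. One way to write it, as a sketch, reuses the sample string from the example further down.

```javascript
var str = "aaXXbbXX";
var pat = new RegExp("xx", "i");      // no g modifier: only the first match is replaced
var result = str.replace(pat, "__");  // "aa__bbXX"
console.log(result);
```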



JavaScript example using a pattern literal:
Replace all occurrences of 'xx' in a string (case-insensitive and global, /gi)

var str = "aaXXbbXX";
var pat = /XX/gi;  // g: replace every match, i: ignore case
var result = str.replace(pat, "__");
document.write(result);  // writes "aa__bb__"

JavaScript example using a pattern literal and a group reference:
Replace all occurrences of 'xx' that are preceded by at least one 'a'

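var str = "oaaaXXbabXX";
var pat = /(a+)XX/gi;  // group 1 captures the run of 'a' characters before XX
var result = str.replace(pat, "$1__");  // $1 puts the captured 'a' characters back
document.write(result);  // writes "oaaa__babXX"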

Advanced Regular Expression Site: www.regular-expressions.info

A good Regular Expression Test site: regextester.com/

Another Regular Expression Cheat Sheet (.NET): RegExLib.com/Cheatsheet
