Now that we’ve covered how to form regular expressions, it is time to look at how to use them. We do so by discussing the properties and methods of the RegExp and String objects that can be used to test and parse strings. Recall that regular expressions created with the literal syntax in the previous section are in fact RegExp objects. In this section, we favor the object syntax so the reader will be familiar with both.
The simplest RegExp method, which we have already seen in this chapter numerous times, is test(). This method returns a Boolean value indicating whether the given string argument matches the regular expression. Here we construct a regular expression and then use it to test against two strings:
var pattern = new RegExp("a*bbbc", "i"); // case-insensitive matching
alert(pattern.test("1a12c")); // displays false
alert(pattern.test("aaabBbcded")); // displays true
The RegExp object provides an easy way to extract pieces of a string that match parts of your patterns. This is accomplished by grouping (placing parentheses around) the portions of the pattern you wish to extract. For example, suppose you wished to extract first names and phone numbers from strings that look like this,
Firstname Lastname NNN-NNNN
where N’s are the digits of a phone number.
You could use the following regular expression, grouping the part that is intended to match the first name as well as the part intended to match the phone number:
var pattern = /(\w+) \w+ ([\d-]{8})/;
This pattern is read as one or more word characters, followed by a space and another sequence of one or more word characters, followed by another space and then followed by an eight-character string composed of digits and dashes.
When this pattern is applied to a string, the parentheses induce subexpressions. When a match is successful, these parenthesized subexpressions can be referred to individually by using static properties $1 to $9 of the RegExp class object. To continue our example:
var customer = "Alan Turing 555-1212";
var pattern = /(\w+) \w+ ([\d-]{8})/;
pattern.test(customer);
Since the pattern contained parentheses that created two subexpressions, \w+ and [\d-]{8}, we can reference the two substrings they match, “Alan” and “555-1212,” individually. Substrings accessed in this manner are numbered from left to right, beginning with $1 and ending typically with $9. For example,
var customer = "Alan Turing 555-1212";
var pattern = /(\w+) \w+ ([\d-]{8})/;
if (pattern.test(customer))
  alert("RegExp.$1 = " + RegExp.$1 + "\nRegExp.$2 = " + RegExp.$2);
displays an alert with two lines: RegExp.$1 = Alan and RegExp.$2 = 555-1212.
Notice the use of the RegExp class object to access the subexpression components, not the RegExp instance or pattern we created.
Note: According to the ECMA specification, you should be able to reference more than nine subexpressions. In fact, up to 99 should be allowed using identifiers like $10, $11, and so on. At the time of this book's writing, however, common browsers support no more than nine.
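As a hedged alternative to the static $1 to $9 properties, the same two substrings can be read from the array returned by exec(), which is covered later in this section. The sketch below is only an illustration; the helper name parseCustomer is hypothetical and does not come from the text.

```javascript
// Hypothetical helper: pulls the first name and phone number out of a
// "Firstname Lastname NNN-NNNN" string using the grouped pattern above.
function parseCustomer(text) {
  var pattern = /(\w+) \w+ ([\d-]{8})/;
  var match = pattern.exec(text);   // null when the string does not match
  if (match === null) {
    return null;
  }
  // match[0] is the whole match; match[1] and match[2] are the two groups.
  return { firstName: match[1], phone: match[2] };
}

var parsed = parseCustomer("Alan Turing 555-1212");
console.log(parsed.firstName + ", " + parsed.phone);   // Alan, 555-1212
```

Reading the groups from the returned array keeps the result tied to one particular match, whereas the static properties are overwritten by whichever regular expression runs next.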
A rather infrequently used method is compile(), which replaces an existing regular expression with a new one. This method takes the same arguments as the RegExp() constructor (a string containing the pattern and an optional string containing the flags) and can be used to create a new expression by discarding an old one:
var pattern = new RegExp("http:.* ","i");
// do something with your regexp
pattern.compile("https:.* ", "i"); // replaces the regexp in pattern with the new pattern
Another, largely historical, use of this function is efficiency. In the interpreters this text describes, regular expressions declared with the RegExp constructor are “compiled” (turned into string matching routines by the interpreter) each time they are used, and this can be a time-consuming process, particularly if the pattern is complicated. Explicitly calling compile() saves the recompilation overhead at each use by compiling a regexp once, ahead of time. Be aware that compile() is now deprecated in the ECMAScript standard, so treat it as a legacy technique rather than something to rely on in new code.
The RegExp object also provides a method called exec(). This method is used when you’d like to test whether a given string matches a pattern and would additionally like more information about the match, for example, the offset in the string at which the pattern first appears. You can also repeatedly apply this method to a string in order to step through the portions of the string that match, one by one.
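The exec() method accepts a string to match against. Some older browsers also allowed a shorthand in which the regexp itself is invoked as a function, but this form is non-standard and throws a TypeError in current engines, so prefer the explicit exec() call. In browsers that support the shorthand, the two invocations in the following example are equivalent:
var pattern = /http:.*/;
pattern.exec("http://www.w3c.org/");
pattern("http://www.w3c.org/"); // non-standard shorthand; fails in current engines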
The exec() method returns an array with a variety of properties. Included are the length of the array; input, which holds the original input string; and index, which holds the character index at which the matching portion of the string begins. Some older browsers (notably Internet Explorer) also add lastIndex to the returned array, pointing to the character after the match. In standard JavaScript that value lives on the regexp object itself (pattern.lastIndex) and is updated only when the global flag is set. The script here illustrates the exec() method and its returned values:
var pattern = /cat/;
var result = pattern.exec("He is a big cat, a fat black cat named Rufus.");
document.writeln("result = "+result+"<br />");
document.writeln("result.length = "+result.length+"<br />");
document.writeln("result.index = "+result.index+"<br />");
// result.lastIndex is non-standard and is undefined in most current engines
document.writeln("result.lastIndex = "+result.lastIndex+"<br />");
document.writeln("result.input = "+result.input+"<br />");
This example reports result = cat, result.length = 1, result.index = 12, and result.input equal to the full original sentence.
The array returned may have more than one element if subexpressions are used. For example, the following script has a set of three parenthesized subexpressions that are parsed out in the array separately:
var pattern = /(cat) (and) (dog) /;
var result = pattern.exec("My cat and dog are black.");
document.writeln("result = "+result);
document.writeln("result.length = "+result.length);
document.writeln("result.index = "+result.index);
// result.lastIndex is non-standard and is undefined in most current engines
document.writeln("result.lastIndex = "+result.lastIndex);
document.writeln("result.input = "+result.input);
As you can see from the result, the exec() method places the entire matched string in element zero of the array and any substrings that match parenthesized subexpressions in subsequent elements.
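The layout of the returned array can be checked directly. The following is a minimal sketch that reuses the pattern and sentence from the example above; the variable names whole and groups are introduced here purely for illustration.

```javascript
var pattern = /(cat) (and) (dog) /;
var result = pattern.exec("My cat and dog are black.");

// Element zero is the entire matched string (including the trailing space
// that the pattern requires after "dog").
var whole = result[0];

// Elements one onward are the parenthesized subexpressions, left to right.
var groups = result.slice(1);

console.log(result.length);   // 4: the whole match plus three groups
console.log(result.index);    // 3: offset of "cat" in the input string
console.log(groups.join("|")); // cat|and|dog
```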
Sometimes you might wish to extract not just the first occurrence of a pattern in a string, but each occurrence of it. Adding the global flag (g) to a regular expression indicates the intent to search for every occurrence (i.e., globally) instead of just the first.
The way the global flag is interpreted by RegExp and by String is a bit subtle. In RegExp, it’s used to perform a global search incrementally, that is, by parsing out each successive occurrence of the pattern one at a time. In String, it’s used to perform a global search all at once, that is, by parsing out all occurrences of the pattern in one single function call. We’ll cover using the global flag with String methods in the following section.
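The contrast between the two interpretations can be sketched as follows. This is an illustration only, using the standard exec() and String match() methods; the sample string and variable names are assumptions made for the example.

```javascript
var text = "The lucky numbers are 3, 14, and 27";

// RegExp: each call to exec() yields one occurrence and remembers
// where to resume on the next call.
var pattern = /\d+/g;
var first = pattern.exec(text);
var second = pattern.exec(text);

// String: a single call to match() with a global regexp returns
// every occurrence at once.
var all = text.match(/\d+/g);

console.log(first[0] + " then " + second[0]);   // 3 then 14
console.log(all.join(","));                     // 3,14,27
```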
To demonstrate the difference between a regexp with the global flag set and one without, consider the following simple example:
var lucky = "The lucky numbers are 3, 14, and 27";
var pattern = /\d+/;
document.writeln("Without global we get:");
document.writeln(pattern.exec(lucky));
document.writeln(pattern.exec(lucky));
document.writeln(pattern.exec(lucky));
pattern = /\d+/g;
document.writeln("With global we get:");
document.writeln(pattern.exec(lucky));
document.writeln(pattern.exec(lucky));
document.writeln(pattern.exec(lucky));
As you can see in Figure 8-2, when the global flag is set, exec() starts searching where the previous match ended. Without the global flag, exec() always returns the first matching portion of the string.
How does global matching work? When the global flag is set, exec() sets the lastIndex property of the regular expression object (and, in some older browsers, of the returned array as well) to point to the character immediately following the substring that was most recently matched. Subsequent calls to the exec() method begin their search from the offset lastIndex in the string. If no match is found, lastIndex is set to zero.
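To make the bookkeeping visible, the sketch below records the regexp's lastIndex after every exec() call on the lucky-numbers string. The trace array is a hypothetical helper structure used only for this illustration.

```javascript
var lucky = "The lucky numbers are 3, 14, and 27";
var pattern = /\d+/g;
var trace = [];
var match;

do {
  match = pattern.exec(lucky);
  // Record what was matched (null on failure) and where the next
  // search will begin.
  trace.push({
    text: match ? match[0] : null,
    lastIndex: pattern.lastIndex
  });
} while (match !== null);

for (var i = 0; i < trace.length; i++) {
  console.log(trace[i].text + " -> lastIndex " + trace[i].lastIndex);
}
// 3 -> lastIndex 23
// 14 -> lastIndex 27
// 27 -> lastIndex 35
// null -> lastIndex 0
```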
A common use of exec() is to loop through each substring matching a regular expression, obtaining complete information about each match. This use is illustrated in the following example, which matches words in the given string. The result (when used within a <pre> tag) is shown in Figure 8-3. Notice how lastIndex is set appropriately, as we discussed.
var sentence = "A very interesting sentence.";
var pattern = /\b\w+\b/g;   // recognizes words; global
var token = pattern.exec(sentence);   // get the first match
while (token != null)
{
  // if we have a match, print information about it
  document.writeln("Matched " + token[0] + " ");
  document.writeln("\ttoken.input = " + token.input);
  document.writeln("\ttoken.index = " + token.index);
  // lastIndex is read from the regexp; the array copy is non-standard
  document.writeln("\tpattern.lastIndex = " + pattern.lastIndex + "\n ");
  token = pattern.exec(sentence);   // get the next match
}
One caveat when using the exec() method: If you stop a search before finding the last match, you need to manually set the lastIndex property of the regular expression to zero. If you do not, the next time you use that regexp, it will automatically start matching at offset lastIndex rather than at the beginning of the string.
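The caveat can be reproduced with a small sketch. The two sample strings are assumptions chosen so that the leftover offset skips a match in the second string.

```javascript
var pattern = /\d+/g;

// Stop after the first match: "66" ends at offset 8, so lastIndex stays at 8.
pattern.exec("order 66 and order 99");
var leftover = pattern.lastIndex;

// Reusing the regexp on a new string starts at offset 8 and skips the "7".
var stale = pattern.exec("7 days, 30 nights");

// Resetting lastIndex makes the search begin at the start of the string again.
pattern.lastIndex = 0;
var fresh = pattern.exec("7 days, 30 nights");

console.log(leftover);   // 8
console.log(stale[0]);   // 30
console.log(fresh[0]);   // 7
```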
Note: The test() method obeys lastIndex as well, so it can be used to incrementally search a string in the same manner as exec(). Think of test() as a simplified, Boolean version of exec().
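As a brief illustration of test() stepping through a string, the following sketch counts occurrences with a global regexp. The sample text and the count variable are assumptions made for the example.

```javascript
var pattern = /cat/g;
var text = "cat, cat";
var count = 0;

// Each successful test() advances lastIndex past the match, so the loop
// visits every occurrence once and ends when test() returns false.
while (pattern.test(text)) {
  count++;
}

console.log(count);               // 2
console.log(pattern.lastIndex);   // 0, reset by the final failed test()
```

The same behavior means that calling test() repeatedly on one string with a global regexp does not always return the same answer, which is worth remembering when a regexp is shared between calls.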
Examining the internals of regular expression instance objects as well as the static (class) properties of the RegExp object can be helpful when performing complex matching tasks and during debugging. The instance properties of RegExp objects are listed in Table 8-6 and, with a few exceptions, should be familiar to the reader by this point.
Property | Value | Example |
---|---|---|
global | Boolean indicating whether the global flag (g) was set. This property is ReadOnly. | var pattern = /(cat) (dog)/g; |
ignoreCase | Boolean indicating whether the case-insensitive flag (i) was set. This property is ReadOnly. | var pattern = /(cat) (dog)/g; |
lastIndex | Integer specifying the position in the string at which to start the next match. You may set this value. | var pattern = /(cat) (dog)/g; |
multiline | Boolean indicating whether the multiline flag (m) was set. This property is ReadOnly. | var pattern = /(cat) (dog)/g; |
source | The string form of the regular expression. This property is ReadOnly. | var pattern = /(cat) (dog)/g; |
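The instance properties can be inspected directly on the example pattern. This is a minimal sketch; the props object and the sample input string are assumptions introduced for illustration.

```javascript
var pattern = /(cat) (dog)/g;

// Snapshot of the instance properties before any matching is done.
var props = {
  global: pattern.global,
  ignoreCase: pattern.ignoreCase,
  multiline: pattern.multiline,
  source: pattern.source,
  lastIndex: pattern.lastIndex
};

// After a successful global match, lastIndex points just past "cat dog".
pattern.exec("cat dog cat dog");
var afterExec = pattern.lastIndex;

console.log(props.global + " " + props.ignoreCase + " " + props.multiline);
// true false false
console.log(props.source);   // (cat) (dog)
console.log(afterExec);      // 7
```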
The RegExp class object also has static properties that can be very useful. These properties are listed in Table 8-7 and come in two forms. The alternate form uses a dollar sign and a special character and may be recognized by those who are already intimately familiar with regexps. A downside to the alternate form is that it has to be accessed in an associative array fashion. Note that using this form will probably confuse those readers unfamiliar with languages like Perl, so it is definitely best to just stay away from it.
Property | Alternate Form | Value | Example |
---|---|---|---|
$1, $2, …, $9 | None | Strings holding the text matched by the parenthesized subexpressions of the most recent match. | var pattern = /(cat) (dog)/g; |
 | None | Holds the string | var pattern = /(cat) (dog)/g; |
input | $_ | String containing the default string to match against the pattern. | var pattern = /(cat) (dog)/g; |
lastIndex | None | Integer specifying the position in the string at which to start the next match. Same as the instance property, which should be used instead. | var pattern = /(cat) (dog)/g; |
lastMatch | $& | String containing the most recently matched text. | var pattern = /(cat) (dog)/g; |
lastParen | $+ | String containing the text of the last parenthesized subexpression of the most recent match. | var pattern = /(cat) (dog)/g; |
leftContext | $` | String containing the text to the left of the most recent match. | var pattern = /(cat) (dog)/g; |
rightContext | $' | String containing the text to the right of the most recent match. | var pattern = /(cat) (dog)/g; |
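The static properties can be read right after a match, as in the sketch below. It assumes an engine that still exposes these legacy properties on the RegExp constructor, which is not guaranteed everywhere; the sample input and the statics object are hypothetical.

```javascript
var pattern = /(cat) (dog)/g;
pattern.exec("my cat dog runs");

// Assumption: the legacy static properties are available in this engine.
var statics = {
  first: RegExp.$1,                 // first parenthesized subexpression
  second: RegExp.$2,                // second parenthesized subexpression
  lastMatch: RegExp.lastMatch,      // entire text of the most recent match
  lastParen: RegExp.lastParen,      // last parenthesized subexpression
  leftContext: RegExp.leftContext,  // text before the match
  rightContext: RegExp.rightContext, // text after the match
  alternate: RegExp["$&"]           // alternate form of lastMatch
};

console.log(statics.first + "/" + statics.second);   // cat/dog
console.log("[" + statics.leftContext + "][" + statics.lastMatch + "][" +
            statics.rightContext + "]");             // [my ][cat dog][ runs]
```

Note how the alternate form has to be written with bracket notation, which is part of why the named forms are easier to read.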
One interesting aspect of the static RegExp class properties is that they are global and therefore change every time you use a regular expression, whether with String or RegExp methods. For this reason, they are the exception to the rule that JavaScript is statically scoped. These properties are dynamically scoped—that is, changes are reflected in the RegExp object in the context of the calling function, rather than in the enclosing context of the source code that is invoked. For example, JavaScript in a frame that calls a function using regular expressions in a different frame will update the static RegExp properties in the calling frame, not the frame in which the called function is found. This rarely poses a problem, but it is something you should keep in mind if you are relying upon static properties in a framed environment.
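The first point, that the static properties change whenever any regular expression is used, can be sketched as follows. The example assumes an engine that still supports the legacy static properties and does not attempt to reproduce the framed scenario; the sample strings are hypothetical.

```javascript
// A RegExp method updates the static properties...
/(\d+)/.exec("room 101");
var afterExec = RegExp.$1;

// ...and so does a String method that takes a regular expression,
// silently overwriting the value left by the earlier match.
var replaced = "floor 7".replace(/(\d+)/, "#");
var afterReplace = RegExp.$1;

console.log(afterExec);      // 101
console.log(replaced);       // floor #
console.log(afterReplace);   // 7
```

Because any intervening match overwrites these values, it is safest to copy what you need into local variables immediately after the match that produced it.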