attempts to match an optional lowercase s (in other words, to match zero or one occurrence of
lowercase s). Because there is no occurrence of lowercase s, again, there is a match. Finally, an
attempt is made to match an optional apostrophe. Because there is no occurrence of an apostrophe,
another match is found. Because a match exists for all the components of the regular expression
pattern, there is a match for the whole regular expression pattern colou?r’?s?’?.
Now, how does the pattern colou?r’?s?’? match the word colour? Assume that the regular expression
engine is at the position immediately before the first letter of colour. It first attempts to match
lowercase c, because one lowercase c must be matched. That matches. Next, attempts are made to match
a subsequent lowercase o, l, and another o. These also match. Then an attempt is made to match an
optional lowercase u. In other words, zero or one occurrence of the lowercase character u is needed.
Because there is one occurrence of lowercase u, there is a match. Next, an attempt is made to match
lowercase r. The lowercase r in colour matches. Next, the engine attempts to match an optional
apostrophe. Because there is no occurrence of an apostrophe, there is a match. Next, the regular
expression engine attempts to match an optional lowercase s (in other words, to match zero or one
occurrence of lowercase s). Because there is no occurrence of lowercase s, a match exists. Finally,
an attempt is made to match an optional apostrophe. Because there is no occurrence of an apostrophe,
there is a match. All the components of the regular expression pattern have a match; therefore, the
entire regular expression pattern colou?r’?s?’? matches.
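The walk-through above can be sketched in JavaScript. This is only an illustration, not how a real
regular expression engine is implemented: parseSimplePattern and traceMatch are hypothetical helper
names, the sketch handles only literal characters with an optional ? after them, it does no
backtracking, and it assumes the apostrophe is the plain ASCII ' character.

```javascript
// Hypothetical helper: split a pattern such as colou?r'?s?'? into components,
// each one a literal character that is either required or optional (followed by ?).
function parseSimplePattern(source) {
  const components = [];
  for (let i = 0; i < source.length; i += 1) {
    const optional = source[i + 1] === "?";
    components.push({ ch: source[i], optional });
    if (optional) i += 1; // skip over the ? quantifier
  }
  return components;
}

// Hypothetical helper: try each component in turn, starting at the first
// character of the text, and record what happened at every step.
function traceMatch(components, text) {
  const steps = [];
  let pos = 0;
  for (const { ch, optional } of components) {
    if (text[pos] === ch) {
      steps.push(`${ch} found at position ${pos}`);
      pos += 1;
    } else if (optional) {
      steps.push(`optional ${ch} absent at position ${pos}, still a match`);
    } else {
      steps.push(`required ${ch} missing at position ${pos}, no match`);
      return { matched: false, steps };
    }
  }
  return { matched: true, steps };
}

const colourTrace = traceMatch(parseSimplePattern("colou?r'?s?'?"), "colour");
colourTrace.steps.forEach((step) => console.log(step));
console.log("matched:", colourTrace.matched);
```

Passing "color" instead of "colour" records the optional u as absent, and the remaining steps
proceed in the same way.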
Work through the other six word forms shown earlier, and you’ll find that each of the word forms does,
in fact, match the regular expression pattern.
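If you would rather check the word forms with code than by hand, a short sketch such as the
following may help. The list of eight word forms is an assumption reconstructed from the pattern,
because the original list is not repeated here. The ^ and $ anchors and the plain ASCII apostrophe
are also assumptions added so that each word is tested as a whole.

```javascript
// Assumed pattern: the pattern from the text, anchored to the whole word.
const colorPattern = /^colou?r'?s?'?$/;

// Assumed list of the eight word forms (not taken from the original list).
const assumedWordForms = [
  "color", "colour",
  "colors", "colours",
  "color's", "colour's",
  "colors'", "colours'",
];

// Test each word form and report whether the pattern matches it.
const wordFormResults = assumedWordForms.map((word) => ({
  word,
  matches: colorPattern.test(word),
}));

for (const { word, matches } of wordFormResults) {
  console.log(`${word}: ${matches}`);
}
```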
The pattern colou?r’?s?’? matches all eight of the word forms that were listed earlier, but will the
pattern match the following sequence of characters?
colour’s’
Can you see that it does match? Can you see why it matches the pattern? If each of the optional
characters in the regular expression (the u, the s, and both apostrophes) is present, the preceding
sequence of characters matches. That rather odd sequence of characters likely won’t exist in your
sample document, so the possibility of false matches (reduced specificity) won’t be an issue for you.
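To see which optional components took part in such a match, you could wrap each one in a capturing
group, roughly as follows. The groups, the ^ and $ anchors, and the plain ASCII apostrophe are
assumptions added for the illustration; they do not change which sequences of characters match.

```javascript
// Same pattern as in the text, with each optional component captured
// so that you can see whether it was present in the match.
const capturingPattern = /^colo(u?)r('?)(s?)('?)$/;

const oddSequence = "colour's'";
const oddMatch = capturingPattern.exec(oddSequence);

if (oddMatch !== null) {
  // Elements 1 to 4 hold the text matched by each optional component;
  // an empty string means that the component was absent.
  const [, u, firstApostrophe, s, secondApostrophe] = oddMatch;
  console.log({ u, firstApostrophe, s, secondApostrophe });
}
```

For a word such as color, the same groups are all empty strings, which corresponds to zero
occurrences of each optional character.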
How can you avoid the problem caused by such odd sequences of characters as colour’s’? You want to be
able to express it something like this:
Match a lowercase c. If a match is present, attempt to match a lowercase o. If that match is
present, attempt to match a lowercase l. If there is a match, attempt to match a lowercase o. If a
match exists, attempt to match an optional lowercase u. If there is a match, attempt to match a
lowercase r. If there is a match, attempt to match an optional apostrophe. And if a match exists
here, attempt to match an optional lowercase s. If the earlier optional apostrophe was not present,
attempt to match an optional apostrophe.
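One way to approximate that description is with a group and alternation, which go beyond the simple
patterns discussed so far. The following is a sketch under those assumptions, not necessarily the
approach you would settle on: either an apostrophe followed by an optional s, or an optional s
followed by an optional apostrophe. The ^ and $ anchors and the plain ASCII apostrophe are also
assumptions.

```javascript
// Assumed stricter pattern: the final apostrophe is allowed only on the
// branch where no apostrophe came before the s.
const stricterPattern = /^colou?r(?:'s?|s?'?)$/;

// Assumed test data: word forms expected to match, plus the odd sequence.
const expectedToMatch = ["color", "colours", "colour's", "colors'"];
const expectedToFail = ["colour's'"];

for (const word of expectedToMatch.concat(expectedToFail)) {
  console.log(`${word}: ${stricterPattern.test(word)}`);
}
```

The cost is the one the text anticipates: the pattern is more specific, but it is also harder to
read than colou?r'?s?'?.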
With the techniques that you have seen so far, you aren’t able to express ideas such as “match
something only if it is not preceded by something else.” That sort of approach might help achieve
higher specificity at the expense of increased complexity.
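As a preview of what such an approach can look like, the following sketch uses a negative lookbehind,
a feature not covered so far. It assumes a JavaScript engine that supports lookbehind (part of
ECMAScript 2018), a plain ASCII apostrophe, and ^ and $ anchors; treat it as an illustration of the
idea rather than a recommended pattern.

```javascript
// The final apostrophe is matched only if it is NOT preceded by an
// apostrophe, or by an apostrophe followed by s.
const lookbehindPattern = /^colou?r'?s?(?:(?<!'s?)')?$/;

// Assumed test data: a few ordinary word forms plus the odd sequence.
const lookbehindSamples = ["color", "colour's", "colours'", "colour's'"];

for (const word of lookbehindSamples) {
  console.log(`${word}: ${lookbehindPattern.test(word)}`);
}
```

The extra group and lookbehind make the pattern noticeably harder to read, which is the increased
complexity mentioned above.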
Appendix A: Simple Regular Expressions