Suggested solutions to these questions can be found in Appendix A.
What problem does the code below solve?
var myString = "This sentence has has a fault and and we need to fix it.";
var myRegExp = /(\b\w+\b) \1/g;
myString = myString.replace(myRegExp,"$1");
If we now change our code, so that we create our RegExp object like this:
var myRegExp = new RegExp("(\b\w+\b) \1");
why would this not work, and how could we rectify the problem?
A: The problem is that the sentence contains "has has" and "and and," clearly a mistake. Many word processors have an autocorrect feature that fixes common mistakes like this, and our regular expression mimics that feature. So the erroneous myString "This sentence has has a fault and and we need to fix it." becomes "This sentence has a fault and we need to fix it." Let's look at how the code works, starting with the regular expression.
/(\b\w+\b) \1/g;
By using parentheses we have defined a group, so (\b\w+\b) is group 1. This group matches a word boundary, followed by one or more word characters (that is, a–z, A–Z, 0–9, and _), followed by another word boundary. Following the group we have a space and then \1. What \1 means is "match exactly the same characters that group 1 just matched." So, for example, if group 1 matched "has," then \1 will match only "has." It's important to note that \1 always refers to the most recent match of group 1: when group 1 later matches "and," \1 matches "and" and not the "has" that was matched earlier.
We use the group again in our replace() method; this time the group is specified using the $ symbol, so $1 stands for the text matched by group 1. It's this that causes each doubled "has has" and "and and" to be replaced by a single word.
Turning to the second part of the question, how do we need to change the following code so that it works?
var myRegExp = new RegExp("(\b\w+\b) \1");
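Because we are now passing a string to the RegExp object's constructor, JavaScript processes the backslash escapes in the string literal before the regular expression ever sees them: "\b" becomes a backspace character and "\w" becomes a plain "w". We therefore need to use two \ characters rather than one wherever we mean a regular expression syntax character, like this: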
var myRegExp = new RegExp("(\\b\\w+\\b) \\1","g");
Notice we've also passed "g" as the second parameter to make it a global match.
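As a rough check of the explanation above, the following sketch compares the regex literal with the constructor form and looks at what an unescaped string really contains. The variable names (sentence, literalRegExp, constructedRegExp, unescaped) are made up for this illustration and are not part of the exercise code.

```javascript
var sentence = "This sentence has has a fault and and we need to fix it.";

// Regex literal: the backslashes go straight to the regular expression engine.
var literalRegExp = /(\b\w+\b) \1/g;

// Constructor with doubled backslashes and the "g" flag: the same pattern.
var constructedRegExp = new RegExp("(\\b\\w+\\b) \\1", "g");

// A single backslash is consumed by the string literal itself:
// "\b" becomes a backspace (character code 8) and "\w" becomes a plain "w".
var unescaped = "(\b\w+\b)";

var fixedByLiteral = sentence.replace(literalRegExp, "$1");
var fixedByConstructor = sentence.replace(constructedRegExp, "$1");

console.log(constructedRegExp.source === literalRegExp.source); // -> true
console.log(unescaped.length, unescaped.charCodeAt(1));         // -> 6 8
console.log(fixedByLiteral);     // -> This sentence has a fault and we need to fix it.
console.log(fixedByConstructor); // -> This sentence has a fault and we need to fix it.
```

Comparing the source property of the two objects is a quick way to confirm that a pattern built from a string is the one you intended.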
Imagine you have a website with a message board. Write a regular expression that would remove barred words. (I'll let you make up your own words!)
A:
<html>
<body>
<script type="text/javascript">
var myRegExp = /(sugar )?candy|choc(olate|oholic)?/gi;
var myString = "Mmm, I love chocolate, I'm a chocoholic. " +
   "I love candy too, sweet, sugar candy";
myString = myString.replace(myRegExp,"salad");
alert(myString);
</script>
</body>
</html>
Save this as ch08_q3.htm.
For our example, we'll pretend we're creating script for a board on a dieting site where text relating to candy is barred and will be replaced with a much healthier option, salad. My barred words are
chocolate
choc
chocoholic
sugar candy
candy
Let's see how I built up the regular expression to remove the offending words. I started with the two basic words, so to match "choc" or "candy," I use
candy|choc
Next I added the matching for "sugar candy." Since the "sugar " bit (including the space after it) is optional, we group it by placing it in parentheses and adding the "?" after it. This means match the group zero times or one time.
(sugar )?candy|choc
Finally we need to add the optional "olate" and "oholic" end bits. We add these as a group after the "choc" word and again make the group optional. We can match either of the endings in the group by using the | character.
(sugar )?candy|choc(olate|oholic)?
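The three stages of the build-up can be tried side by side. This is only a sketch: the sample string and the names stage1, stage2, and stage3 are invented here for illustration, and the g and i flags are added so that match() returns every occurrence.

```javascript
// Hypothetical sample containing each of the barred words once.
var sample = "candy, sugar candy, choc, chocolate, chocoholic";

// Stage 1: the two base words.
var stage1 = /candy|choc/gi;
// Stage 2: an optional "sugar " prefix in front of "candy".
var stage2 = /(sugar )?candy|choc/gi;
// Stage 3: optional "olate" or "oholic" endings after "choc".
var stage3 = /(sugar )?candy|choc(olate|oholic)?/gi;

var found1 = sample.match(stage1).join(" | ");
var found2 = sample.match(stage2).join(" | ");
var found3 = sample.match(stage3).join(" | ");

console.log(found1); // -> candy | candy | choc | choc | choc
console.log(found2); // -> candy | sugar candy | choc | choc | choc
console.log(found3); // -> candy | sugar candy | choc | chocolate | chocoholic
```

Only the final stage picks up the whole of each barred word; the earlier stages would leave fragments such as "olate" behind after a replace.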
We then declare it as
var myRegExp = /(sugar )?candy|choc(olate|oholic)?/gi;
The gi at the end means the regular expression will find and replace words on a global, case-insensitive basis. So, to sum up
/(sugar )?candy|choc(olate|oholic)?/gi
reads as: either match "candy," optionally preceded by "sugar " (zero or one occurrence), or match "choc," optionally followed by either "olate" or "oholic." Finally, the following:
myString = myString.replace(myRegExp,"salad");
replaces the offending words with "salad" and sets myString to the new clean version: "Mmm, I love salad, I'm a salad. I love salad too, sweet, salad".
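To see what the g and i flags each contribute, here is a small sketch that runs the same pattern with no flags, with i only, and with both. The sample text in post and the variable names are assumptions made up for this illustration.

```javascript
// Hypothetical sample post, chosen to mix upper and lower case.
var post = "Candy, CHOCOLATE and more candy";

// No flags: case-sensitive, and only the first match is replaced.
var noFlags = post.replace(/(sugar )?candy|choc(olate|oholic)?/, "salad");
// i only: case-insensitive, but still only the first match is replaced.
var ignoreCaseOnly = post.replace(/(sugar )?candy|choc(olate|oholic)?/i, "salad");
// gi: every match is replaced, regardless of case.
var globalIgnoreCase = post.replace(/(sugar )?candy|choc(olate|oholic)?/gi, "salad");

console.log(noFlags);          // -> Candy, CHOCOLATE and more salad
console.log(ignoreCaseOnly);   // -> salad, CHOCOLATE and more candy
console.log(globalIgnoreCase); // -> salad, salad and more salad
```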