Consider the task of validating a phone number entered into a form on a Web page.
The goal is to verify that the data entered has the proper format before permitting it to be submitted to the server for processing. If you’re only interested in validating North American phone numbers of the form NNN-NNN-NNNN, where each N is a digit, you might write code like this:
// Returns true if character is a digit
function isDigit(character)
{
   return (character >= "0" && character <= "9");
}

// Returns true if phone is of the form NNN-NNN-NNNN
function isPhoneNumber(phone)
{
   if (phone.length != 12)
      return false;
   // For each character in the string...
   for (var i=0; i<12; i++)
   {
      // If there should be a dash here...
      if (i == 3 || i == 7)
      {
         // Return false if there's not
         if (phone.charAt(i) != "-")
            return false;
      }
      // Else there should be a digit here...
      else
      {
         // Return false if there's not
         if (!isDigit(phone.charAt(i)))
            return false;
      }
   }
   return true;
}
This is a lot of code for such a seemingly simple task. The code is far from elegant, and just imagine how much more complicated it would have to be if you wanted to validate other formats—for example, phone numbers with extensions, international numbers, or numbers with the dashes or area code omitted.
Regular expressions simplify tasks like this considerably by allowing programmers to specify a pattern against which a string is “matched.” This frees developers from having to write complicated and error-prone text-matching code, as we did in the preceding example. But regular expressions are not limited to determining whether a string matches a particular pattern (like our NNN-NNN-NNNN in the preceding listing); if the string does match, it is possible to locate, extract, or even replace the matching portions. This vastly simplifies the recognition and extraction of structured data like URLs, e-mail addresses, phone numbers, and cookies. Just about any type of string data with a predictable format can be operated upon with regular expressions.
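As a rough sketch of how this might look, the NNN-NNN-NNNN check can be collapsed into a single pattern. The name isPhoneNumber is reused from the earlier listing; splitPhoneNumber and maskPhoneNumber are hypothetical helpers introduced here only to illustrate extracting and replacing the matched portions.

```javascript
// Returns true if phone is of the form NNN-NNN-NNNN.
// ^ and $ anchor the pattern to the whole string, \d matches one digit,
// and {n} repeats the preceding item exactly n times.
function isPhoneNumber(phone) {
   return /^\d{3}-\d{3}-\d{4}$/.test(phone);
}

// Hypothetical helper: returns the three parts of a valid number,
// or null if the string does not match. Parentheses capture each group.
function splitPhoneNumber(phone) {
   var match = /^(\d{3})-(\d{3})-(\d{4})$/.exec(phone);
   if (match === null)
      return null;
   return { areaCode: match[1], prefix: match[2], line: match[3] };
}

// Hypothetical helper: hides all but the last four digits of the first
// phone number found in text. $1 refers to the captured group.
function maskPhoneNumber(text) {
   return text.replace(/\d{3}-\d{3}-(\d{4})/, "XXX-XXX-$1");
}
```

Supporting another format, such as an optional extension, would then mostly be a matter of adjusting the pattern rather than adding more branching logic.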