The RegExp Object

Regular expressions provide a powerful way to search and manipulate text. If you’re familiar with SQL, you can think of regular expressions as being somewhat similar: you use SQL to find and update data inside a database, and you use regular expressions to find and update data inside a piece of text. Different languages have different implementations (think “dialects”) of the regular expression syntax. JavaScript’s dialect is modeled on the Perl 5 syntax. Instead of saying “regular expression”, people often shorten it to “regex” or “regexp”. A regular expression consists of:

  • A pattern you use to match text
  • Zero or more modifiers (also called flags) that provide more instructions on how the pattern should be applied

The pattern can be as simple as literal text to be matched verbatim, but that is rare, and in such cases you’re better off using indexOf(). Most of the time, the pattern is more complex and can be difficult to understand. Mastering regular expression patterns is a large topic that won’t be discussed in detail here; instead, you’ll see what JavaScript provides in terms of syntax, objects, and methods to support the use of regular expressions. JavaScript provides the RegExp() constructor, which allows you to create regular expression objects.

var re = new RegExp("j.*t");

There is also the more convenient regexp literal:

var re = /j.*t/;

In the example above, j.*t is the regular expression pattern. It means, “Match any text that starts with j, ends with t, and has zero or more characters in between”. The asterisk (*) means “zero or more of the preceding”; the dot (.) means “any character” (except a line break). Because the pattern is not anchored, the match can occur anywhere inside a longer string. The pattern needs to be placed in quotation marks when used in a RegExp() constructor.
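To make the difference between the two forms concrete, here is a minimal sketch. The variable names and the sample input are illustrative assumptions, not part of the original example. It shows that both forms produce the same pattern, and that a backslash has to be doubled inside a constructor string:

```javascript
// Two ways to build the same pattern: constructor and literal.
var fromConstructor = new RegExp("j.*t");
var fromLiteral = /j.*t/;

// Both objects hold the same pattern text.
console.log(fromConstructor.source === fromLiteral.source); // true

// Inside a constructor string, a backslash must itself be escaped.
// Both lines below create a pattern that matches one or more digits.
var digitsFromConstructor = new RegExp("\\d+");
var digitsFromLiteral = /\d+/;
console.log(digitsFromConstructor.source === digitsFromLiteral.source); // true

// The constructor is handy when part of the pattern is only known at run time.
var suffix = "script"; // hypothetical run-time value
var dynamic = new RegExp("j.*" + suffix);
console.log(dynamic.test("javascript")); // true
```

The literal is usually easier to read; the constructor is mostly worth it when the pattern is assembled from other strings.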

Properties of the RegExp Objects

The regular expression objects have the following properties:

  • global: If this property is false, which is the default, the search stops when the first match is found. Set this to true if you want all matches.
  • ignoreCase: Case sensitive match or not, defaults to false.
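  • multiline: If true, ^ and $ match at the start and end of every line instead of only at the start and end of the whole string, defaults to false.
  • lastIndex: The position at which to start the next search, defaults to 0. It is consulted by global searches.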
  • source: Contains the regexp pattern.

None of these properties, except for lastIndex, can be changed once the object has been created.

The first three properties represent the regex modifiers. If you create a regex object using the constructor, you can pass any combination of the following characters as a second parameter:

  • “g” for global
  • “i” for ignoreCase
  • “m” for multiline

These letters can be in any order. If a letter is passed, the corresponding modifier is set to true. In the following example, all modifiers are set to true:

var re = new RegExp('j.*t', 'gmi');

Let’s verify:

re.global;
//true

Once set, the modifier cannot be changed:

re.global = false;
re.global
//true
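By contrast, lastIndex can be assigned. The following sketch uses a hypothetical helper, tryToChangeGlobal(), and a made-up sample string to show both behaviors. One detail worth knowing: in strict mode code, an assignment to a read-only property such as global throws a TypeError instead of being silently ignored.

```javascript
// Hypothetical helper: attempts the assignment in strict mode
// and reports what happened.
function tryToChangeGlobal(regex) {
  "use strict";
  try {
    regex.global = false;
    return "assigned";
  } catch (e) {
    return e.name;
  }
}

var re = /a/g;
var outcome = tryToChangeGlobal(re);
console.log(outcome);    // "TypeError"
console.log(re.global);  // true, unchanged

// lastIndex is writable: start the next global search at position 2.
var text = "banana";
re.lastIndex = 2;
var found = re.test(text);        // finds the "a" at position 3
var nextStart = re.lastIndex;     // position right after that match
console.log(found, nextStart);    // true 4
```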

To set any modifiers using the regex literal, you add them after the closing slash.

var re = /j.*t/ig;
re.global
//true
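As a quick check of the points above, this sketch compares a constructor call and a literal that list the same modifiers in a different order, then shows the effect of the m modifier on the ^ anchor. The sample text and variable names are assumptions made for illustration:

```javascript
// The order of the modifier letters does not matter.
var viaConstructor = new RegExp("j.*t", "gmi");
var viaLiteral = /j.*t/img;

console.log(viaConstructor.global === viaLiteral.global);         // true
console.log(viaConstructor.ignoreCase === viaLiteral.ignoreCase); // true
console.log(viaConstructor.multiline === viaLiteral.multiline);   // true

// Effect of the m modifier: ^ can match at the start of any line.
var text = "first line\njava line";
var withoutM = /^java/.test(text);
var withM = /^java/m.test(text);
console.log(withoutM); // false, "java" is not at the start of the string
console.log(withM);    // true, "java" is at the start of the second line
```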

Methods of the RegExp Objects

The regex objects provide two methods you can use to find matches: test() and exec(). They both accept a string parameter. test() returns a boolean (true when there’s a match, false otherwise), while exec() returns an array containing the matched text followed by any captured groups, or null when there’s no match. exec() does more work, so use test() unless you really need to do something with the matches. People often use regular expressions for validation purposes; in this case, test() would probably be enough.

//No match, because of the capital J:
var re = /j.*t/;
re.test("Javascript");
//false

//Case insensitive test gives a positive result:
var re = /j.*t/i;
re.test("Javascript");
//true

The same test using exec() returns an array and you can access the first element as shown below:

var re = /j.*t/i;
re.exec("Javascript")[0]
//"Javascript"
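The array returned by exec() carries more than the matched text. The sketch below is an illustration with made-up input strings. It adds parentheses to the pattern to capture its parts, reads the index property of the result, and shows the null returned when nothing matches. The last part uses a global regex in a loop, where each call to exec() continues from lastIndex:

```javascript
// Parentheses capture parts of the match as extra array elements.
var re = /(j)(.*)(t)/i;
var result = re.exec("Hello Javascript");
console.log(result[0]);    // "Javascript" (the whole match)
console.log(result[1]);    // "J"
console.log(result[2]);    // "avascrip"
console.log(result[3]);    // "t"
console.log(result.index); // 6 (position where the match starts)

// No match: exec() returns null, so check before indexing.
var noMatch = /xyz/.exec("Javascript");
console.log(noMatch); // null

// With the g modifier, repeated calls walk through all matches.
var finder = /a/g;
var positions = [];
var m;
while ((m = finder.exec("banana")) !== null) {
  positions.push(m.index);
}
console.log(positions); // [1, 3, 5]
```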

String Methods that Accept Regular Expressions as Parameters

Previously in this chapter we talked about the String object and how you can use the methods indexOf() and lastIndexOf() to search within text. Using these methods you can only specify literal string patterns to search. A more powerful solution would be to use regular expressions to find text. String objects offer you this ability.

The string objects provide the following methods that accept regular expression objects as parameters:

  • match() returns an array of matches
  • search() returns the position of the first match
  • replace() allows you to substitute matched text with another string
  • split() also accepts a regexp when splitting a string into array elements
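Of the four methods, split() is the simplest to illustrate. Here is a small sketch, using a made-up comma-separated string, that splits on a comma together with any whitespace around it:

```javascript
// Sample input with inconsistent spacing around the commas.
var csv = "one, two,three ,  four";

// \s* matches zero or more whitespace characters on either side of the comma.
var parts = csv.split(/\s*,\s*/);
console.log(parts); // ["one", "two", "three", "four"]

// For comparison, splitting on a plain string keeps the stray spaces.
var rough = csv.split(",");
console.log(rough); // ["one", " two", "three ", "  four"]
```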

search() and match()

Let’s see some examples of using the methods search() and match(). First, you create a string object.

var s = new String('HelloJavaScriptWorld');

Using match() you get an array containing only the first match:

s.match(/a/);
//["a"]

Using the g modifier, you perform a global search, so the result array contains two elements:

s.match(/a/g);
//["a", "a"]

Case insensitive match:

s.match(/j.*a/i);
//["Java"]

The search() method gives you the position of the matching string:

s.search(/j.*a/i);
//5
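It is worth knowing what these two methods return when nothing matches, because code that indexes into the result of match() without checking will fail. A minimal sketch, reusing the same text as a plain string and searching for a word chosen for illustration that does not occur in it:

```javascript
var s = "HelloJavaScriptWorld";

// search() reports a missing match with -1, like indexOf().
var position = s.search(/python/i);
console.log(position); // -1

// match() returns null rather than an empty array.
var matches = s.match(/python/i);
console.log(matches); // null

// A common guard: fall back to an empty array before using the result.
var found = s.match(/o/g) || [];
console.log(found.length); // 2
```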

replace()

replace() allows you to replace the matched text with some other string. The following example removes all capital letters (it replaces them with empty strings):

s.replace(/[A-Z]/g, '');
//"elloavacriptorld"

If you omit the g modifier, you’re only going to replace the first match:

s.replace(/[A-Z]/, '');
//"elloJavaScriptWorld"
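Note that replace() returns a new string and leaves the original untouched, which is why both calls above start from the same text. A short sketch with illustrative variable names, including a non-empty replacement:

```javascript
var s = "HelloJavaScriptWorld";

// replace() returns a new string; s itself is not modified.
var stripped = s.replace(/[A-Z]/g, "");
console.log(stripped); // "elloavacriptorld"
console.log(s);        // "HelloJavaScriptWorld"

// The replacement does not have to be empty.
var underscored = s.replace(/[A-Z]/g, "_");
console.log(underscored); // "_ello_ava_cript_orld"
```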