Regular expressions, commonly known as regexes, are a
pattern-matching standard for text processing, and are a powerful tool
when dealing with strings. With regular expressions, an expression serves
as a pattern to compare with the text being searched. You can use regular
expressions to search for patterns in a string, replace text, and extract
substrings from the original string.
1. Introduction to Regular Expressions
In its simplest form, you can use a regular expression to match a
literal string; for example, the regular expression “string” will match
the string “this is a string”. Each character in the expression will match
itself, unless it is one of the special characters +, ?, ., *, ^, $, (, ), [, {, |, or \.
The special meaning of these characters can be escaped by
prepending a backslash character, \.
We can also tie our expression to the start of a string (^string) or the end of
a string (string$). For the string
“this is a string”, ^string will not
match the string, while string$ will.
We can also use quantified patterns. Here, * matches the preceding character zero or more
times, ? matches it zero or one time, and + matches it one or more times. So, the regular expression
“23*4” would match “1245”, “12345”, and “123345”, while the expression “23?4” would match
“1245” and “12345” but not “123345”. Finally, the expression “23+4” would
match “12345” and “123345” but not “1245”.
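As a rough illustration, the three quantifiers can be checked with the built-in NSPredicate class, which is covered in Section 1.2. This is only a sketch: WholeStringMatches and LogQuantifierMatches are hypothetical helper names, and because the predicate's MATCHES operator must match the entire string rather than search inside it, the patterns are extended to “123*45”, “123?45”, and “123+45” so that they also cover the leading “1” and trailing “5” of each sample.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: YES if the whole of `text` matches `pattern`.
// The pattern is passed as a %@ argument so the predicate parser
// does not reinterpret any of its characters.
static BOOL WholeStringMatches(NSString *pattern, NSString *text) {
    NSPredicate *predicate =
        [NSPredicate predicateWithFormat:@"SELF MATCHES %@", pattern];
    return [predicate evaluateWithObject:text];
}

// Logs which of the three sample strings each quantified pattern accepts.
static void LogQuantifierMatches(void) {
    NSArray *patterns = [NSArray arrayWithObjects:
        @"123*45",   // zero or more 3s
        @"123?45",   // zero or one 3
        @"123+45",   // one or more 3s
        nil];
    NSArray *samples = [NSArray arrayWithObjects:
        @"1245", @"12345", @"123345", nil];

    for (NSString *pattern in patterns) {
        for (NSString *sample in samples) {
            BOOL matched = WholeStringMatches(pattern, sample);
            NSLog(@"%@ against %@: %@", pattern, sample,
                  matched ? @"match" : @"no match");
        }
    }
}
```

Calling LogQuantifierMatches() from applicationDidFinishLaunching: should report a match for every sample except “123345” against “123?45” and “1245” against “123+45”.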
Unless told otherwise, regular expressions are greedy; they will match the longest string
possible. For example, applied to “123345”, the expression “3+” matches “33” rather than just the first “3”.
While a backslash escapes the meaning of the special characters in
an expression, it turns most alphanumeric characters into special
characters. Many special characters are available; however, the main
ones are:
\d
Matches a numeric character
\D
Matches a nonnumeric character
\s
Matches a whitespace character
\S
Matches a nonwhitespace character
\w
Matches an alphanumeric character or the underscore
\W
Matches the inverse of \w
All of these special character expressions can be modified by the
quantifiers described earlier; for example, \d+ matches one or more numeric characters.
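The character classes above can be tried out in the same way with the built-in NSPredicate class (see Section 1.2). The following is a sketch under a few assumptions: WholeStringMatches and LogCharacterClassMatches are hypothetical helper names, the sample strings are made up for illustration, and MATCHES tests the whole string rather than a substring. Each backslash is doubled because the patterns are written as C string literals.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: YES if the whole of `text` matches `pattern`.
static BOOL WholeStringMatches(NSString *pattern, NSString *text) {
    NSPredicate *predicate =
        [NSPredicate predicateWithFormat:@"SELF MATCHES %@", pattern];
    return [predicate evaluateWithObject:text];
}

// Logs the result of a few character-class patterns on sample strings.
static void LogCharacterClassMatches(void) {
    // One or more numeric characters.
    NSLog(@"%d", WholeStringMatches(@"\\d+", @"2012"));
    NSLog(@"%d", WholeStringMatches(@"\\d+", @"20x12"));

    // One or more alphanumeric or underscore characters (no spaces).
    NSLog(@"%d", WholeStringMatches(@"\\w+", @"iPhone_3G"));
    NSLog(@"%d", WholeStringMatches(@"\\w+", @"iPhone 3G"));

    // Two words separated by a single whitespace character.
    NSLog(@"%d", WholeStringMatches(@"\\w+\\s\\w+", @"iPhone 3G"));

    // One or more nonnumeric characters.
    NSLog(@"%d", WholeStringMatches(@"\\D+", @"iPhone"));
    NSLog(@"%d", WholeStringMatches(@"\\D+", @"iPhone3"));
}
```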
1.1. RegexKitLite
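Unfortunately, there is no built-in support for regular
expressions in the Objective-C language itself, and Cocoa Touch only gained a
dedicated regular expression class, NSRegularExpression, in iOS 4.0.
However, the RegexKitLite library adds regular
expression support directly to the base NSString class and also works with earlier versions of the SDK. See http://regexkit.sourceforge.net/RegexKitLite/.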
Warning:
RegexKitLite uses the regular expression
engine provided by the ICU library. Apple does not officially
support linking directly to the libicucore.dylib library. Despite this, many iPhone applications are available
on the App Store that use this library, and it is unlikely that
Apple will reject your application during the App Store review
process for making use of it. However, if you’re worried about using
the ICU library, there are alternatives, such as the
libregex wrapper GTMRegex provided as part of the Google
Toolbox for Mac.
To add RegexKitLite to your own project,
download the RegexKitLite-<X.X>.tar.bz2
compressed tarball (X.X will be the current
version, such as 3.3), and double-click it to extract
it. Open the resulting directory and drag and drop the two files, RegexKitLite.h and
RegexKitLite.m, into your project. Remember to
select the “Copy items into destination group’s folder” checkbox
before adding the files.
We’re not done yet; we still need to add the
libicucore.dylib library to our project.
Double-click on the project icon in the Groups & Files pane in
Xcode and go to the Build tab of the Project Info window. In the
Linking subsection of the tab, double-click on the Other Linker Flags
field and add -licucore to the
flags using the pop-up window.
You’ll want to use regular expressions to perform three main
tasks: matching strings, replacing strings, and extracting strings.
RegexKitLite allows you to do all of these, but
remember that when you want to use it, you need to import the
RegexKitLite.h file into your class.
Note:
Regular expressions use the backslash (\) character to
escape characters that have special meaning inside the regular
expression. However, since the backslash is also the C escape
character, you in turn have to escape any backslash
inside your regular expression by prepending it with another
backslash. For example, to match a literal period
(.) character, you must first
prepend it with a backslash to escape it for the regular expression
engine, and then prepend that with another backslash to escape it in
turn for the compiler, which gives \\. in your source code. To match a single literal
backslash (\) character with a
regular expression therefore requires four backslashes: \\\\.
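The two levels of escaping can be made concrete by looking at how many characters actually end up in each string. The sketch below assumes the built-in NSPredicate class (see Section 1.2) as the matching engine and uses the period as an example of a character that is special inside a regular expression; WholeStringMatches and LogEscapingExamples are hypothetical helper names.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: YES if the whole of `text` matches `pattern`.
static BOOL WholeStringMatches(NSString *pattern, NSString *text) {
    NSPredicate *predicate =
        [NSPredicate predicateWithFormat:@"SELF MATCHES %@", pattern];
    return [predicate evaluateWithObject:text];
}

// Logs what the compiler stores for each literal and what it matches.
static void LogEscapingExamples(void) {
    NSString *anyCharacter = @".";        // unescaped: any single character
    NSString *literalPeriod = @"\\.";     // two characters: backslash, period
    NSString *literalBackslash = @"\\\\"; // two characters: two backslashes
    NSString *oneBackslash = @"\\";       // a string holding one backslash

    // The compiler halves each pair of backslashes.
    NSLog(@"lengths: %lu and %lu",
          (unsigned long)[literalPeriod length],
          (unsigned long)[literalBackslash length]);

    // The unescaped period accepts any character, the escaped one does not.
    NSLog(@"%d", WholeStringMatches(anyCharacter, @"x"));
    NSLog(@"%d", WholeStringMatches(literalPeriod, @"x"));
    NSLog(@"%d", WholeStringMatches(literalPeriod, @"."));

    // Four backslashes in source match a single backslash in the text.
    NSLog(@"%d", WholeStringMatches(literalBackslash, oneBackslash));
}
```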
The RegexKitLite library operates by
extending the NSString class with an
Objective-C category, making it very easy to use.
If you want to match a string, you simply operate directly on the
string you want to match. You can create a view-based project and add
the following code into the applicationDidFinishLaunching: method. Just
be sure to add #import
"RegexKitLite.h" to the top of the app delegate’s
.m (implementation) file.
NSString *string = @"This is a string";
NSString *match = [string stringByMatching:@"a string$" capture:0];
NSLog(@"%@", match);
If the match fails, the match
variable will be set to nil. If
you want to replace a string, it’s almost as easy:
NSString *string2 = @"This is a string";
NSString *regexString = @"a string$";
NSString *replacementString = @"another string";
NSString *newString = nil;
newString = [string2
    stringByReplacingOccurrencesOfRegex:regexString
    withString:replacementString];
NSLog(@"%@", newString);
If you run the application, you’ll just get a gray window.
Return to Xcode and choose Run→Console
to see the output of the NSLog
calls.
The replacement code matches “a string” at the end of
the variable string2 and replaces it,
creating the string “This is another string” in the variable
newString.
1.2. Faking regex support with the built-in NSPredicate
While Cocoa Touch before iOS 4.0 does not provide “real” regular expression
support, the Foundation framework does provide the NSPredicate
class (the same class used to filter Core Data fetch requests), which allows you to carry out some operations that
would normally be done via regular expressions in other languages. For
those familiar with SQL, the NSPredicate class operates in a very similar
manner to the SQL WHERE
clause.
Let’s assume we have an NSArray of NSDictionary objects, structured like
this:
NSArray *arrayOfDictionaries = [NSArray arrayWithObjects:
    [NSDictionary dictionaryWithObjectsAndKeys:
        @"Learning iPhone Programming", @"title", @"2010", @"year", nil],
    [NSDictionary dictionaryWithObjectsAndKeys:
        @"Arduino Orbital Lasers", @"title", @"2012", @"year", nil],
    nil];
We can test whether each object in the array matches the
criterion year = '2012' as follows (criteria can also be combined,
as in foo = "bar" AND baz = "qux"):
NSPredicate *predicate =
    [NSPredicate predicateWithFormat:@"year = '2012'"];
for (NSDictionary *dictionary in arrayOfDictionaries) {
    BOOL match = [predicate evaluateWithObject:dictionary];
    if (match) {
        NSLog(@"Found a match!");
    }
}
Alternatively, we can extract all entries in the array that
match the predicate:
NSPredicate *predicate2 =
    [NSPredicate predicateWithFormat:@"year = '2012'"];
NSArray *matches =
    [arrayOfDictionaries filteredArrayUsingPredicate:predicate2];
for (NSDictionary *dictionary in matches) {
    NSLog(@"%@", [dictionary objectForKey: @"title"]);
}
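To show how the SQL-like syntax composes, here is a small sketch that combines several conditions in one predicate. TitlesMatching is a hypothetical helper written for this illustration; it rebuilds the same two-book array so that the block stands on its own. AND, OR, BEGINSWITH, and CONTAINS[c] (the case-insensitive form of CONTAINS) are standard predicate operators.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the titles of the sample books that
// satisfy the given predicate format string.
static NSArray *TitlesMatching(NSString *format) {
    NSArray *books = [NSArray arrayWithObjects:
        [NSDictionary dictionaryWithObjectsAndKeys:
            @"Learning iPhone Programming", @"title", @"2010", @"year", nil],
        [NSDictionary dictionaryWithObjectsAndKeys:
            @"Arduino Orbital Lasers", @"title", @"2012", @"year", nil],
        nil];

    NSPredicate *predicate = [NSPredicate predicateWithFormat:format];
    NSArray *matches = [books filteredArrayUsingPredicate:predicate];

    // valueForKey: on an array collects that key from every element.
    return [matches valueForKey:@"title"];
}

// Logs the titles selected by a few compound predicates.
static void LogCompoundPredicates(void) {
    // Both conditions must hold.
    NSLog(@"%@", TitlesMatching(
        @"year = '2012' AND title BEGINSWITH 'Arduino'"));

    // Either condition may hold; [c] ignores case.
    NSLog(@"%@", TitlesMatching(
        @"title CONTAINS[c] 'iphone' OR year = '2012'"));

    // No book satisfies both of these at once.
    NSLog(@"%@", TitlesMatching(
        @"year = '2010' AND title BEGINSWITH 'Arduino'"));
}
```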
However, we can also use predicates to test strings against
regular expressions. For instance, the following code will test the
email string against the regex we provided, returning YES if it has the form of a valid email address:
NSString *email = @"user@example.com";
NSString *regex = @"^\\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,4}\\b$";
NSPredicate *predicate3 =
    [NSPredicate predicateWithFormat:@"SELF MATCHES %@", regex];
BOOL emailMatches = [predicate3 evaluateWithObject:email];
if (emailMatches) {
    NSLog(@"Found a match!");
}
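The same kind of predicate can also filter a whole array of strings in one call, combining the two techniques shown in this section. The following is a sketch only: StringsMatchingRegex and LogValidAddresses are hypothetical helper names, and the candidate addresses are invented for the example.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: keeps only the strings that match `regex` in full.
static NSArray *StringsMatchingRegex(NSArray *strings, NSString *regex) {
    NSPredicate *predicate =
        [NSPredicate predicateWithFormat:@"SELF MATCHES %@", regex];
    return [strings filteredArrayUsingPredicate:predicate];
}

// Logs the candidates that have the form of an email address.
static void LogValidAddresses(void) {
    NSString *regex =
        @"^\\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,4}\\b$";
    NSArray *candidates = [NSArray arrayWithObjects:
        @"alice@example.com",   // well formed
        @"bob@example",         // no top-level domain
        @"not an address",      // no @ sign at all
        nil];

    for (NSString *address in StringsMatchingRegex(candidates, regex)) {
        NSLog(@"%@", address);
    }
}
```

Only the first candidate should be logged. A pattern like this checks the shape of an address, not whether the mailbox actually exists.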