Introduction
A regular expression (regex) is a sequence of characters that defines a search pattern, used for string processing tasks such as find/replace and input validation. Working with regular expressions through NSRegularExpression has long been cumbersome and error-prone. Swift 5.7 introduces a new set of APIs that let developers write regular expressions that are more robust and easier to understand.
Regex Literals
Regex literals are useful when the regex pattern is static, because the Swift compiler can check the pattern for syntax errors at compile time. To create a regular expression using a regex literal, wrap the pattern in slash delimiters: /…/
let regex = /My flight is departing from (.+?) \((\w{3}?)\)/
Notice that the regex literal above also defines captures using parentheses (…). A capture allows information to be extracted from a match for further processing. After the regex is created, we call wholeMatch(of:) on the input string to see whether the entire string matches the regex. The regex output is a tuple whose elements can be accessed by index: .0 returns the whole matched string, and .1 and .2 return the matches from the first and second captures, respectively.
let input = "My flight is departing from Los Angeles International Airport (LAX)"

if let match = input.wholeMatch(of: regex) {
    print("Match: \(match.0)")
    print("Airport Name: \(match.1)")
    print("Airport Code: \(match.2)")
}

// Match: My flight is departing from Los Angeles International Airport (LAX)
// Airport Name: Los Angeles International Airport
// Airport Code: LAX
You can also assign a name to each capture by placing ?<capture_name> right after the opening parenthesis of the capture group. That way, you can reference the intended match result by name, as in the example below:
let regex = /My flight is departing from (?<name>.+?) \((?<code>\w{3}?)\)/

if let match = input.wholeMatch(of: regex) {
    print("Airport Name: \(match.name)")
    print("Airport Code: \(match.code)")
}

// Airport Name: Los Angeles International Airport
// Airport Code: LAX
Regex
Along with regex literals, the Regex type can be used to create a regular expression when the pattern is constructed dynamically. Search fields in editors are a good example of where dynamic regex patterns may be needed. Keep in mind that the pattern is only parsed at run time, so the Regex initializer throws an error if the pattern is invalid. You create a Regex by passing the pattern as a String. Writing the pattern as an extended string literal #"…"# means that backslashes within the regex do not need to be escaped.
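As a rough sketch of the above, the earlier flight pattern can be built from a string at run time. The airport(in:matching:) helper is hypothetical and exists only for this illustration; the as: parameter is one way to ask for a typed tuple output instead of the default type-erased one.

```swift
// Hypothetical helper: builds a Regex from a runtime string and extracts the airport.
func airport(in text: String, matching pattern: String) throws -> (name: String, code: String)? {
    // Throws if `pattern` is invalid or its captures do not fit the `as:` tuple type.
    let regex = try Regex(pattern, as: (Substring, Substring, Substring).self)
    guard let match = text.wholeMatch(of: regex) else { return nil }
    return (name: String(match.1), code: String(match.2))
}

let input = "My flight is departing from Los Angeles International Airport (LAX)"
// Extended string literal: backslashes are passed to the regex engine as written.
let pattern = #"My flight is departing from (.+?) \((\w{3}?)\)"#

do {
    if let result = try airport(in: input, matching: pattern) {
        print("Airport Name: \(result.name)")
        print("Airport Code: \(result.code)")
    }
    // An unbalanced parenthesis is a syntax error, reported only at run time.
    _ = try Regex("(unclosed")
} catch {
    print("Invalid regex pattern: \(error)")
}
```

Because the pattern is not checked by the compiler, wrapping the initializer in do/catch (or try?) is usually the safer choice when the pattern comes from user input.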
Regex Builder
Another great tool for creating regular expressions is regex builder. Regex builder lets developers use a domain-specific language (DSL) to create and compose well-structured regular expressions. As a result, regex patterns become much easier to read and maintain. The DSL lives in the RegexBuilder module, which must be imported before use. If you are already familiar with SwiftUI code, using regex builder will be straightforward.
The following input data represents flight schedules, which consist of four fields: flight date, departure airport code, arrival airport code, and flight status.
import Foundation
import RegexBuilder

let input = """
9/6/2022   LAX   JFK   On Time
9/6/2022   YYZ   SNA   Delayed
9/7/2022   LAX   SFO   Scheduled
"""

// Reusable separator between fields
let fieldSeparator = OneOrMore(.whitespace)

let regex = Regex {
    // Flight date, parsed into a Date by Foundation
    Capture {
        One(.date(.numeric, locale: Locale(identifier: "en-US"), timeZone: .gmt))
    }
    fieldSeparator
    // Departure airport code
    Capture {
        OneOrMore(.word)
    }
    fieldSeparator
    // Arrival airport code
    Capture {
        OneOrMore(.word)
    }
    fieldSeparator
    // Flight status
    Capture {
        ChoiceOf {
            "On Time"
            "Delayed"
            "Scheduled"
        }
    }
}
Components such as One and OneOrMore specify the number of occurrences needed for a match. Other quantifiers are also available, such as Optionally, ZeroOrMore, and Repeat.
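To give a feel for the other quantifiers, here is a small sketch that uses Optionally and Repeat. The flight-number format (two word characters, an optional space, then one to four digits) is an assumption made up for this example and is not part of the flight schedule data.

```swift
import RegexBuilder

// Hypothetical format: two-character airline code, optional space, 1 to 4 digits.
let flightNumber = Regex {
    Capture { Repeat(.word, count: 2) }   // exactly two word characters
    Optionally { " " }                    // zero or one space
    Capture { Repeat(.digit, 1...4) }     // between one and four digits
}

if let match = "AC 123".wholeMatch(of: flightNumber) {
    print("Airline: \(match.1), Number: \(match.2)")
}
// Airline: AC, Number: 123
```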
To parse the flight date, we could have written a regex literal such as /\d{1,2}\/\d{1,2}\/\d{4}/ and converted the matched string manually, but there is a better way. Regex builder can reuse the parsers that the Foundation framework provides for dates, numbers, and more. Therefore, we can simply use Foundation's date parser, .date(_:locale:timeZone:), for the flight date.
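For comparison, the manual route could look like the sketch below. It uses the extended delimiters #/…/# so that the forward slashes in the date need no escaping, and it allows one or two digits for the month and day because the sample data contains dates like 9/6/2022. The names manualDate, month, day, and year are chosen only for this illustration.

```swift
// Extended delimiters allow unescaped forward slashes inside the pattern.
let manualDate = #/(\d{1,2})/(\d{1,2})/(\d{4})/#

if let match = "9/6/2022".wholeMatch(of: manualDate),
   let month = Int(match.1), let day = Int(match.2), let year = Int(match.3) {
    // The captures are Substrings; turning them into a Date is still left to us.
    print("month: \(month), day: \(day), year: \(year)")
}
// month: 9, day: 6, year: 2022
```

The pattern only checks the shape of the string. Validating the values and building a Date would need extra code, which is the work the Foundation parser takes over.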
Each field in the input data is separated by 3 whitespace characters. Here we declare a reusable pattern and assign it to a fieldSeparator variable. The variable can then be inserted into the regex builder wherever a field separator is needed.
Parsing the departure/arrival airport code is straightforward. We use the OneOrMore quantifier with word as the character class, which is enough to capture the 3-letter airport codes. Note that word also matches digits and underscores, and OneOrMore accepts any length, so this pattern is looser than the 3-letter format strictly requires.
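If a stricter match is wanted, one option is to swap OneOrMore for Repeat with a fixed count. The sketch below contrasts the two; looseCode and strictCode are illustrative names, and the second pattern still accepts any three word characters rather than only letters.

```swift
import RegexBuilder

let looseCode = Regex { OneOrMore(.word) }          // any run of word characters
let strictCode = Regex { Repeat(.word, count: 3) }  // exactly three word characters

print("LAX".wholeMatch(of: strictCode) != nil)       // true
print("LAX_2022".wholeMatch(of: looseCode) != nil)   // true
print("LAX_2022".wholeMatch(of: strictCode) != nil)  // false
```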
Finally, ChoiceOf lets us define a fixed set of possible values for parsing the flight status field.
Once the complete regex pattern has been constructed using regex builder, calling matches(of:) on the input string returns a collection of all matches, which we can iterate over:
for match in input.matches(of: regex) {
    print("Flight Date: \(match.1)")
    print("Origin: \(match.2)")
    print("Destination: \(match.3)")
    print("Status: \(match.4)")
    print("========================================")
}

// Flight Date: 2022-09-06 00:00:00 +0000
// Origin: LAX
// Destination: JFK
// Status: On Time
// ========================================
// Flight Date: 2022-09-06 00:00:00 +0000
// Origin: YYZ
// Destination: SNA
// Status: Delayed
// ========================================
// Flight Date: 2022-09-07 00:00:00 +0000
// Origin: LAX
// Destination: SFO
// Status: Scheduled
// ========================================
Captures can also take an optional transform closure, which converts the captured data into a custom type. Here we use the transform closure to convert the captured value (a Substring) from the flight status field into a custom FlightStatus enum, which makes operations such as filtering easier to perform on the transformed type.
enum FlightStatus: String {
    case onTime = "On Time"
    case delayed = "Delayed"
    case scheduled = "Scheduled"
}

let regex = Regex {
    ...
    Capture {
        ChoiceOf {
            "On Time"
            "Delayed"
            "Scheduled"
        }
    } transform: {
        FlightStatus(rawValue: String($0))
    }
}

// Status: FlightStatus.onTime
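The filtering mentioned above could look roughly like the following sketch. It makes two assumptions that are not in the original example: the date column is left out to keep the code short, and TryCapture is used instead of Capture. Since FlightStatus(rawValue:) is a failable initializer, a Capture with this transform yields an optional FlightStatus, whereas TryCapture rejects the match when the transform returns nil and yields a non-optional value.

```swift
import RegexBuilder

enum FlightStatus: String {
    case onTime = "On Time"
    case delayed = "Delayed"
    case scheduled = "Scheduled"
}

// Simplified input: the date column is omitted to keep the sketch short.
let flights = """
LAX   JFK   On Time
YYZ   SNA   Delayed
LAX   SFO   Scheduled
"""

let separator = OneOrMore(.whitespace)

let flightRegex = Regex {
    Capture { OneOrMore(.word) }   // origin
    separator
    Capture { OneOrMore(.word) }   // destination
    separator
    // TryCapture fails the match if the transform returns nil,
    // so this output element is FlightStatus rather than FlightStatus?.
    TryCapture {
        ChoiceOf {
            "On Time"
            "Delayed"
            "Scheduled"
        }
    } transform: {
        FlightStatus(rawValue: String($0))
    }
}

// Keep only the delayed flights by comparing against the enum case.
let delayed = flights.matches(of: flightRegex).filter { $0.3 == .delayed }
for match in delayed {
    print("Delayed: \(match.1) to \(match.2)")
}
// Delayed: YYZ to SNA
```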
Final Thoughts
Developers who want to adopt the new Swift Regex APIs may wonder which one to use when converting existing NSRegularExpression code or writing new code that needs regular expressions. The answer depends on your requirements, since each API has its own advantage. Regex literals are good for simple, static patterns that can be validated at compile time. The Regex type is better suited for patterns that are constructed dynamically at run time. When working with a large input data set that requires more complex patterns, regex builder lets developers build regular expressions that are well structured and easy to understand and maintain.