Regular expressions, also known as regex, are a tool for finding specific patterns within strings. They let you extract data for further processing and are commonly used for validation and parsing tasks. In this tutorial, we will look at Ruby regular expressions and learn how to build patterns that match, capture, and replace elements such as dates, phone numbers, emails, and URLs.
Character Classes
A character class in a regular expression allows you to define a range or a list of characters to match. For example, the character class [aeiou] would match any vowel in a string. Let’s take a look at some common character classes and their usage:
Basic Character Classes
Character Class | Description |
---|---|
. | Matches any single character except for newline |
\w | Matches any word character (letters, numbers, underscore) |
\d | Matches any digit |
\s | Matches any whitespace character (space, tab, newline) |
\W | Matches any non-word character |
\D | Matches any non-digit character |
\S | Matches any non-whitespace character |
For example, the regex /h.t/ would match “hat”, “hot”, “hit”, and “hut”, because the dot (.) matches any single character. It would not match “ht”, since the dot must consume exactly one character.
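To make the table above concrete, here is a small sketch that runs a few of these classes through String#scan. The sample strings and variable names are made up for illustration.

```ruby
# scan returns every substring of the input that the pattern matches.
sample = "Room 42, floor_3!"

digits     = sample.scan(/\d/)    # each digit on its own
word_chars = sample.scan(/\w+/)   # runs of letters, digits, and underscores
dot_match  = "hat hot hut h\nt".scan(/h.t/)  # . does not match the newline

p digits      # => ["4", "2", "3"]
p word_chars  # => ["Room", "42", "floor_3"]
p dot_match   # => ["hat", "hot", "hut"]
```

Note that “h\nt” is skipped: by default the dot matches any character except a newline.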
Custom Character Classes
You can also create your own custom character classes by enclosing a list of characters inside square brackets. For example, the regex /[aeiou]/ would match any lowercase vowel. You can also use a hyphen to specify a range of characters, such as /[0-9]/ to match any digit or /[a-z]/ to match any lowercase letter.
Ranges
Ranges in regular expressions allow you to specify a span of characters or digits to match. For example, the regex /[0-9]/ would match any digit from 0 to 9. Several ranges can be combined in one character class, such as /[a-zA-Z0-9]/ to match any ASCII letter or digit.
Negated Ranges
You can also create negated ranges by placing the caret symbol (^) as the first character inside a character class. This will match any character that is not in the specified set. For example, the regex /[^aeiou]/ would match any character that is not a lowercase vowel, including consonants, digits, uppercase letters, and whitespace.
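Here is a short sketch of custom classes, ranges, and a negated class in action. The input strings are arbitrary examples.

```ruby
word = "Regular Expressions"

vowels     = word.scan(/[aeiou]/)        # lowercase vowels only; "E" is skipped
lower_runs = word.scan(/[a-z]+/)         # runs of lowercase letters
non_vowels = "banana".scan(/[^aeiou]/)   # anything that is not a lowercase vowel

p vowels      # => ["e", "u", "a", "e", "i", "o"]
p lower_runs  # => ["egular", "xpressions"]
p non_vowels  # => ["b", "n", "n"]
```

Character classes are case-sensitive unless you list both cases, as in /[a-zA-Z]/.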
Modifiers
Modifiers (more commonly called quantifiers) allow you to specify how many times a pattern should be matched. They are written directly after the element they apply to and make your regex more flexible.
Modifier | Description |
---|---|
* | Matches zero or more occurrences of the preceding character |
+ | Matches one or more occurrences of the preceding character |
? | Matches zero or one occurrence of the preceding character |
{n} | Matches exactly n occurrences of the preceding character |
{n,} | Matches at least n occurrences of the preceding character |
{n,m} | Matches between n and m occurrences of the preceding character |
For example, the regex /a*/ would match an empty string, “a”, “aa”, “aaa”, and so on. The regex /a+/ would match “a”, “aa”, “aaa”, but not an empty string. The regex /a?/ would match an empty string or “a”. The regex /a{3}/ matches exactly three consecutive “a” characters, so it finds “aaa” but not “aa”; in “aaaa” it matches only the first three characters.
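The following sketch tries each quantifier on a small input so you can see the differences. The variable names and sample strings are illustrative only.

```ruby
zero_or_more  = "".match?(/a*/)                  # true: * accepts zero occurrences
one_or_more   = "".match?(/a+/)                  # false: + needs at least one "a"
optional      = "color colour".scan(/colou?r/)   # ? makes the "u" optional
exactly_three = "aaaa"[/a{3}/]                   # first three "a" characters
at_least_two  = "a aa aaaa".scan(/a{2,}/)        # the single "a" is too short
two_to_three  = "aaaa"[/a{2,3}/]                 # greedy: takes as many as allowed

p zero_or_more   # => true
p one_or_more    # => false
p optional       # => ["color", "colour"]
p exactly_three  # => "aaa"
p at_least_two   # => ["aa", "aaaa"]
p two_to_three   # => "aaa"
```

Quantifiers are greedy by default, which is why /a{2,3}/ takes three characters rather than two.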
Exact String Matching
Sometimes, you may want to match a specific string exactly. In this case, you can use the \A and \z anchors. The \A anchor matches the beginning of a string, while the \z anchor matches the end of a string. For example, the regex /\Ahello\z/ would match only the string “hello” and not “hello world”.
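As a quick check of how these anchors behave, the sketch below compares \A and \z with the line anchors ^ and $, which in Ruby match at the start and end of every line. The test strings are made up for illustration.

```ruby
exact = /\Ahello\z/

whole_string  = exact.match?("hello")             # the entire string is "hello"
with_suffix   = exact.match?("hello world")       # extra text after "hello"
with_newline  = exact.match?("hello\n")           # \z does not allow a trailing newline
inner_line    = exact.match?("say\nhello\nbye")   # "hello" is only one line of three
line_anchored = /^hello$/.match?("say\nhello\nbye")  # ^ and $ work per line

p whole_string   # => true
p with_suffix    # => false
p with_newline   # => false
p inner_line     # => false
p line_anchored  # => true
```

This difference is why \A and \z are usually the safer choice when validating a whole input string.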
Capture Groups
Capture groups in regular expressions allow you to extract specific parts of a matched pattern. They are denoted by parentheses and can be referenced later in your code. Let’s take a look at an example:
string = "My email is john@example.com"
regex = /(\w+)@(\w+)\.com/
match_data = string.match(regex)
puts match_data[1]
# john
puts match_data[2]
# example
In this example, we have two capture groups: one for the username and one for the domain name. We can access these captured values by indexing into the MatchData object, where index 0 holds the entire match and indexes 1 and 2 hold the first and second groups.
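Keep in mind that match returns nil when nothing matches, so indexing the result directly can raise an error. Here is a minimal sketch of a guard around the same pattern; the helper name split_email is hypothetical and only used for illustration.

```ruby
# Hypothetical helper: returns [username, domain] or nil when there is no match.
def split_email(text)
  match_data = text.match(/(\w+)@(\w+)\.com/)
  return nil unless match_data   # match returns nil when the pattern is not found

  match_data.captures            # all capture groups as an array of strings
end

p split_email("My email is john@example.com")  # => ["john", "example"]
p split_email("no address here")               # => nil
```

MatchData#captures returns only the groups, without the full match that sits at index 0.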
Look Ahead & Look Behind
Lookahead and look behind assertions in regular expressions allow you to specify conditions that must be met before or after a pattern is matched. They are denoted by special characters and can help you build more complex patterns. Let’s take a look at some examples:
Assertion | Description |
---|---|
(?=...) | Positive look ahead assertion – matches if the pattern is followed by the specified condition |
(?!...) | Negative look ahead assertion – matches if the pattern is not followed by the specified condition |
(?<=...) | Positive look behind assertion – matches if the pattern is preceded by the specified condition |
(?<!...) | Negative look behind assertion – matches if the pattern is not preceded by the specified condition |
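For example, the regex /\d+(?= dollars)/ would match a number that is followed by the word “dollars”, such as the “10” in “10 dollars” or the “100” in “100 dollars”. The text inside the assertion is checked but is not part of the match. The regex /\d+(?! dollars)/ would match a number not followed by the word “dollars”, such as the “10” in “10 cents” or the “100” in “100 euros”. Be aware that \d+ can backtrack, so this pattern still matches the “1” in “10 dollars”; adding word boundaries, as in /\b\d+\b(?! dollars)/, prevents that partial match.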
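Let’s try all four assertions on sample text. The strings below are invented for illustration, and the word boundaries (\b) are added so that the negative assertions cannot succeed on part of a number.

```ruby
text = "Pay 10 dollars or 25 euros, then 7 dollars more"

followed_by_dollars     = text.scan(/\d+(?= dollars)/)      # look ahead
not_followed_by_dollars = text.scan(/\b\d+\b(?! dollars)/)  # negative look ahead
partial_match           = "10 dollars"[/\d+(?! dollars)/]   # backtracks to "1"

prices = "Price: $30, qty 4"
after_dollar_sign     = prices.scan(/(?<=\$)\d+/)     # look behind
not_after_dollar_sign = prices.scan(/(?<!\$)\b\d+/)   # negative look behind

p followed_by_dollars      # => ["10", "7"]
p not_followed_by_dollars  # => ["25"]
p partial_match            # => "1"
p after_dollar_sign        # => ["30"]
p not_after_dollar_sign    # => ["4"]
```

In every case the matched text contains only the digits; the surrounding condition is checked but never consumed.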
Ruby’s Regexp Class
In Ruby, regular expressions are represented by the Regexp class. Together with the String class, it provides methods for matching, replacing, and splitting strings. Let’s take a look at some of the most commonly used methods (match and =~ are available on both classes, while scan, sub, and gsub are String methods):
Method | Description |
---|---|
match | Matches a string against a regular expression and returns a MatchData object if successful |
=~ | Matches a string against a regular expression and returns the index of the first occurrence if successful or nil otherwise |
scan | Scans a string for matches against a regular expression and returns an array of all matches |
sub | Replaces the first occurrence of a pattern in a string with a specified replacement |
gsub | Replaces all occurrences of a pattern in a string with a specified replacement |
Let’s see these methods in action:
string = "I have 3 cats and 2 dogs"
regex = /\d+/
match_data = string.match(regex)
puts match_data[0]
# 3
puts string =~ regex
# 7
p string.scan(regex)
# ["3", "2"]
puts string.sub(regex, "5")
# I have 5 cats and 2 dogs
puts string.gsub(regex, "1")
# I have 1 cats and 1 dogs
Regex Options
Ruby also provides options that can be added to your regular expressions to make them more powerful. These options are denoted by special characters and can change the behavior of your regex. Here are some commonly used options:
Option | Description |
---|---|
i | Case-insensitive matching |
m | Multiline mode – allows the dot (.) to match newline characters |
x | Extended mode – allows you to add comments and whitespace to your regex for better readability |
For example, the regex /hello/i would match “hello”, “Hello”, and “HELLO”. The regex /a.b/m would match the string "a\nb", because the m option lets the dot match a newline. In Ruby, the ^ and $ anchors already match at the beginning and end of each line without any option. The regex /h e l l o/x is equivalent to /hello/, because whitespace inside the pattern is ignored in extended mode; it matches “hello world” and “helloworld” alike.
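Here is a short sketch that checks each option against a small input. The variable names and sample strings are illustrative.

```ruby
case_insensitive = /hello/i.match?("HELLO")       # i ignores letter case

dot_default   = /a.b/.match?("a\nb")              # . stops at a newline by default
dot_multiline = /a.b/m.match?("a\nb")             # m lets . match the newline

line_anchors = /^hello$/.match?("hi\nhello")      # ^ and $ are per-line without m

extended_joined = /h e l l o/x.match?("helloworld")  # spaces in the pattern are ignored
extended_spaced = /h e l l o/x.match?("h e l l o")   # so literal spaces are not matched

p case_insensitive  # => true
p dot_default       # => false
p dot_multiline     # => true
p line_anchors      # => true
p extended_joined   # => true
p extended_spaced   # => false
```

To match a literal space in extended mode, escape it (\ ) or use \s.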
Formatting Long Regular Expressions
As you can see, regular expressions can quickly become long and complex. To make them more readable, Ruby provides the x option which allows you to add comments and whitespace to your regex. Let’s take a look at an example:
regex = /
\d+
# Matches one or more digits
\s
# Matches a whitespace character
[a-z]+
# Matches one or more lowercase letters
/x
This makes it easier to understand what each part of the regex is doing.
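An extended-mode regex is used exactly like any other. The sketch below writes the same pattern with the comments placed next to each part and runs it on two made-up strings.

```ruby
regex = /
  \d+      # one or more digits
  \s       # a whitespace character
  [a-z]+   # one or more lowercase letters
/x

found   = "I have 3 cats"[regex]   # digits, a space, then letters
missing = "3cats"[regex]           # no whitespace between the digit and the letters

p found    # => "3 cats"
p missing  # => nil
```

String#[] with a regex returns the matched text, or nil when nothing matches.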
Ruby regex: Putting It All Together
Now that we have covered the basics of Ruby regular expressions, let’s put everything together and build a more advanced pattern. We will create a regex that matches simple email addresses made up of a lowercase alphanumeric username, an @ sign, a lowercase domain name, and a two- or three-letter top-level domain.
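regex = /\A[a-z0-9]+@[a-z]+\.[a-z]{2,3}\z/
# \A - Matches the start of a string
# [a-z0-9]+ - Matches one or more lowercase letters or digits
# @ - Matches the @ sign
# [a-z]+ - Matches one or more lowercase letters
# \. - Matches a literal dot
# [a-z]{2,3} - Matches two or three lowercase letters
# \z - Matches the end of a string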
This regex would match email addresses such as “john@example.com” but not “john@com” or “jane@domain”, which lack a dot followed by a top-level domain.
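Let’s check the pattern against a few inputs. This sketch assumes the intended regex ends in a two- or three-letter top-level domain; the constant name EMAIL_PATTERN and the sample addresses are made up for illustration. It is a deliberately simple pattern, not a complete email validator.

```ruby
# Assumed pattern: lowercase username, @, lowercase domain, dot, 2-3 letter TLD.
EMAIL_PATTERN = /\A[a-z0-9]+@[a-z]+\.[a-z]{2,3}\z/

samples = [
  "john@example.com",      # valid
  "jane99@domain.org",     # digits are allowed in the username
  "john@com",              # no dot and top-level domain
  "jane@domain",           # no dot and top-level domain
  "John@example.com",      # uppercase letter is outside [a-z0-9]
  "john.doe@example.com"   # dot in the username is outside [a-z0-9]
]

results = samples.map { |address| EMAIL_PATTERN.match?(address) }

p results  # => [true, true, false, false, false, false]
```

The last two samples show how strict the pattern is; adding the i option or widening the character classes would accept them.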
Conclusion
Congratulations, you have now covered the basics of Ruby regular expressions! You have learned about character classes, ranges, modifiers, exact string matching, capture groups, lookahead and look behind assertions, the Regexp class, options, and formatting long regular expressions. With this knowledge, you can use regular expressions in your Ruby programs to validate and parse data. Regular expressions may seem daunting at first, but with practice you will become more comfortable using them and will be able to tackle more complex patterns. Keep exploring and experimenting with regular expressions to become a regex master!