Ruby Regular Expressions: A Comprehensive Guide

Regular expressions, also known as regex, are powerful tools for finding specific patterns within strings. They allow you to extract data for further processing and are commonly used for validation and parsing tasks. In the world of programming, mastering regular expressions is a valuable skill that can greatly enhance your coding abilities. In this tutorial, we will delve into the world of Ruby regular expressions and learn how to build advanced patterns to match, capture, and replace various elements such as dates, phone numbers, emails, URLs, and more.

Character Classes

A character class in a regular expression allows you to define a range or a list of characters to match. For example, the character class [aeiou] would match any vowel in a string. Let’s take a look at some common character classes and their usage:

Basic Character Classes

Character Class	Description
.	Matches any single character except for newline
\w	Matches any word character (letters, numbers, underscore)
\d	Matches any digit
\s	Matches any whitespace character (space, tab, newline)
\W	Matches any non-word character
\D	Matches any non-digit character
\S	Matches any non-whitespace character

For example, the regex /h.t/ would match “hat”, “hot”, “hit”, and “hut”, because the dot (.) matches any single character. It would not match “ht”, since the dot must consume exactly one character.
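
The shorthand classes in the table can be tried out with String#scan, which returns every match. The following is a minimal sketch; the SAMPLE string is an illustrative assumption and does not come from the original text.

```ruby
# Each shorthand class matches a single character from a predefined set.
# SAMPLE is a made-up string used only for illustration.
SAMPLE = "Room 42, floor_3"

p SAMPLE.scan(/\d/)  # => ["4", "2", "3"]
p SAMPLE.scan(/\s/)  # => [" ", " "]
p SAMPLE.scan(/\W/)  # => [" ", ",", " "]

# The dot stands for any one character except a newline.
p %w[hat hot hut ht].grep(/h.t/)  # => ["hat", "hot", "hut"]
p "h\nt".match?(/h.t/)            # => false
```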

Custom Character Classes

You can also create your own custom character classes by enclosing a list of characters inside square brackets. For example, the regex /[aeiou]/ would match any lowercase vowel. A class can also contain ranges, such as /[0-9]/ to match any digit or /[a-z]/ to match any lowercase letter.

Ranges

Ranges let you specify a run of consecutive characters by giving its two endpoints separated by a hyphen. For example, the regex /[0-9]/ would match any digit from 0 to 9. Several ranges can share one character class, such as /[a-zA-Z0-9]/ to match any ASCII letter or digit.

Negated Ranges

You can also create negated ranges by using the caret symbol (^) as the first character inside a character class. This will match any character that is not in the specified range. For example, the regex /[^aeiou]/ would match any character that is not a lowercase vowel, including consonants, digits, spaces, and uppercase letters.
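
Custom classes, ranges, and negated ranges can be compared side by side on one string. This is a small sketch; the WORD string is an assumption chosen for illustration.

```ruby
# WORD is a made-up sample string.
WORD = "Regexp 101"

p WORD.scan(/[aeiou]/)        # => ["e", "e"]
p WORD.scan(/[0-9]/)          # => ["1", "0", "1"]
p WORD.scan(/[a-z]+/)         # => ["egexp"]
p WORD.scan(/[A-Za-z0-9]+/)   # => ["Regexp", "101"]

# A leading ^ inside the brackets negates the class.
p WORD.scan(/[^a-z]/)         # => ["R", " ", "1", "0", "1"]
```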

Modifiers

Modifiers, more commonly called quantifiers, specify how many times the preceding element should be matched. They are written directly after that element and make your regex more flexible.

Modifier	Description
*	Matches zero or more occurrences of the preceding character
+	Matches one or more occurrences of the preceding character
?	Matches zero or one occurrence of the preceding character
{n}	Matches exactly n occurrences of the preceding character
{n,}	Matches at least n occurrences of the preceding character
{n,m}	Matches between n and m occurrences of the preceding character

For example, the regex /a*/ would match an empty string, “a”, “aa”, “aaa”, and so on. The regex /a+/ would match “a”, “aa”, “aaa”, but not an empty string. The regex /a?/ would match an empty string or “a”. The regex /a{3}/ matches exactly three consecutive “a” characters; when anchored as /\Aa{3}\z/ it would match “aaa” but not “aa” or “aaaa”.
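
The counted forms use braces: {n}, {n,}, and {n,m}. The sketch below tries each one. It relies on \b (a word boundary), which is not introduced in this guide and is used here only so that each match covers a whole run of letters; the TEXT string is likewise an assumption.

```ruby
# TEXT is a made-up sample. \b is a word boundary, used so that a match
# cannot start or stop in the middle of a run of letters.
TEXT = "a aa aaa aaaa"

p TEXT.scan(/\ba+\b/)       # => ["a", "aa", "aaa", "aaaa"]
p TEXT.scan(/\ba{3}\b/)     # => ["aaa"]
p TEXT.scan(/\ba{2,}\b/)    # => ["aa", "aaa", "aaaa"]
p TEXT.scan(/\ba{2,3}\b/)   # => ["aa", "aaa"]

# ? makes the preceding character optional.
p "color colour".scan(/colou?r/)   # => ["color", "colour"]

# * can match nothing at all, while + needs at least one occurrence.
p "".match?(/a*/)   # => true
p "".match?(/a+/)   # => false
```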

Exact String Matching

Sometimes, you may want to match a specific string exactly. In this case, you can use the \A and \z anchors. The \A anchor matches the beginning of a string, while the \z anchor matches the end of a string. For example, the regex /\Ahello\z/ would match only the string “hello” and not “hello world”.
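
A quick way to see the anchors at work is Regexp#match?, which returns true or false. The constant name EXACT_HELLO is an assumption for this sketch. The last line contrasts \A and \z with ^ and $, which anchor to lines rather than to the whole string in Ruby.

```ruby
# EXACT_HELLO is a hypothetical name for the pattern from the text.
EXACT_HELLO = /\Ahello\z/

p EXACT_HELLO.match?("hello")        # => true
p EXACT_HELLO.match?("hello world")  # => false
p EXACT_HELLO.match?("hello\n")      # => false, \z requires the very end

# ^ and $ match at line boundaries, so extra lines slip through.
p(/^hello$/.match?("hello\nworld"))  # => true
```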

Capture Groups

Capture groups in regular expressions allow you to extract specific parts of a matched pattern. They are denoted by parentheses and can be referenced later in your code. Let’s take a look at an example:

string = "My email is john@example.com"
regex = /(\w+)@(\w+)\.com/

match_data = string.match(regex)
puts match_data[1]  # john
puts match_data[2]  # example

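In this example, we have two capture groups – one for the username and one for the domain name. We can access the captured values by index on the MatchData object: index 0 holds the entire match, and the groups are numbered from 1 in the order of their opening parentheses.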
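
Because String#match returns nil when nothing matches, indexing the result directly can raise an error on unexpected input. One way to guard against that is sketched below; split_email and EMAIL_PARTS are hypothetical names introduced for illustration.

```ruby
# EMAIL_PARTS and split_email are hypothetical names for this sketch.
EMAIL_PARTS = /(\w+)@(\w+)\.com/

def split_email(text)
  match = text.match(EMAIL_PARTS)
  return nil unless match   # String#match returns nil when nothing matches
  match.captures            # the captured groups, without the full match
end

user, domain = split_email("Contact: john@example.com")
puts user    # john
puts domain  # example

p split_email("no address here")  # => nil
```

MatchData#captures returns only the groups, which makes it convenient for multiple assignment.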

Look Ahead & Look Behind

Lookahead and look behind assertions in regular expressions allow you to specify conditions that must be met before or after a pattern is matched. They are denoted by special characters and can help you build more complex patterns. Let’s take a look at some examples:

Assertion	Description
(?=...)	Positive look ahead assertion – matches if the pattern is followed by the specified condition
(?!...)	Negative look ahead assertion – matches if the pattern is not followed by the specified condition
(?<=...)	Positive look behind assertion – matches if the pattern is preceded by the specified condition
(?<!...)	Negative look behind assertion – matches if the pattern is not preceded by the specified condition

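For example, the regex /\d+(?= dollars)/ matches a number only when it is followed by “ dollars”, so it finds “10” in “10 dollars” and “100” in “100 dollars”; the lookahead text itself is not part of the match. The regex /\d+(?! dollars)/ matches a number that is not followed by “ dollars”, such as the “10” in “10 cents”. Be aware that backtracking lets it also match the “10” inside “100 dollars”, because that shorter run of digits is followed by “0” rather than “ dollars”; writing /\d+(?!\d| dollars)/ closes that gap.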
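
The four assertions can be checked with scan. This is a sketch on made-up sample strings. It also shows how a plain negative lookahead can return partial numbers through backtracking, and one possible way to tighten it.

```ruby
# PRICES is a made-up sample string.
PRICES = "10 dollars, 25 cents, 100 dollars"

# Positive lookahead: only numbers followed by " dollars".
p PRICES.scan(/\d+(?= dollars)/)      # => ["10", "100"]

# Negative lookahead alone backtracks into shorter digit runs.
p PRICES.scan(/\d+(?! dollars)/)      # => ["1", "25", "10"]

# Also forbidding a following digit keeps each number whole.
p PRICES.scan(/\d+(?!\d| dollars)/)   # => ["25"]

# Lookbehind checks the text before the match instead.
p '$5 and #7'.scan(/(?<=\$)\d+/)      # => ["5"]
p '$5 and #7'.scan(/(?<!\$)\d+/)      # => ["7"]
```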

Ruby’s Regexp Class

In Ruby, regular expressions are represented by the Regexp class. Both Regexp and String provide methods for working with patterns: match and =~ are available on either class, while scan, sub, and gsub are String methods that take a regular expression as an argument. Let’s take a look at some of the most commonly used methods:

Method	Description
match	Matches a string against a regular expression and returns a MatchData object if successful
=~	Matches a string against a regular expression and returns the index of the first occurrence if successful or nil otherwise
scan	Scans a string for matches against a regular expression and returns an array of all matches
sub	Replaces the first occurrence of a pattern in a string with a specified replacement
gsub	Replaces all occurrences of a pattern in a string with a specified replacement

Let’s see these methods in action:

string = "I have 3 cats and 2 dogs"
regex = /\d+/

match_data = string.match(regex)
puts match_data[0]            # 3
puts string =~ regex          # 7
p string.scan(regex)          # ["3", "2"]
puts string.sub(regex, "5")   # I have 5 cats and 2 dogs
puts string.gsub(regex, "1")  # I have 1 cats and 1 dogs
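
A few behaviors of these methods are easy to miss: =~ returns nil when nothing matches, gsub accepts a block, and neither sub nor gsub modifies the original string. A brief sketch, with INVENTORY as a hypothetical constant name:

```ruby
# INVENTORY is a hypothetical name holding the sample sentence.
INVENTORY = "I have 3 cats and 2 dogs"

# =~ returns nil rather than raising when there is no match.
p(INVENTORY =~ /\d+/)     # => 7
p(INVENTORY =~ /birds/)   # => nil

# gsub also accepts a block, which receives each match as a string.
doubled = INVENTORY.gsub(/\d+/) { |n| (n.to_i * 2).to_s }
puts doubled              # I have 6 cats and 4 dogs

# sub and gsub return new strings; the receiver is unchanged.
puts INVENTORY            # I have 3 cats and 2 dogs
```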

Regex Options

Ruby also provides options that change how a regular expression behaves. Each option is a single letter placed after the closing slash, and several can be combined. Here are some commonly used options:

Option	Description
i	Case-insensitive matching
m	Multiline mode – makes the dot (.) match newline characters as well; in Ruby, ^ and $ already match at the start and end of each line without it
x	Extended mode – allows you to add comments and whitespace to your regex for better readability

For example, the regex /hello/i would match “hello”, “Hello”, and “HELLO”. The regex /a.b/m would match “a\nb”, whereas /a.b/ would not, because without m the dot stops at a newline. The regex /hello world/x ignores the space in the pattern, so it matches “helloworld” but not “hello world”; escape the space (/hello\ world/x) to match it literally.
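
The effect of each option is easiest to see with true/false checks. The sketch below reflects Ruby's own behavior, where m changes what the dot matches and x makes whitespace in the pattern insignificant; TWO_LINES is a made-up sample.

```ruby
# i: ignore case.
p "HELLO".match?(/hello/i)   # => true
p "HELLO".match?(/hello/)    # => false

# TWO_LINES is a made-up sample string.
TWO_LINES = "first\nsecond"

# m: lets the dot match a newline.
p TWO_LINES.match?(/first.second/)    # => false
p TWO_LINES.match?(/first.second/m)   # => true

# ^ and $ match at line boundaries even without m.
p TWO_LINES.match?(/^second$/)        # => true

# x: literal whitespace in the pattern is ignored.
p "helloworld".match?(/hello world/x)    # => true
p "hello world".match?(/hello world/x)   # => false
```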

Formatting Long Regular Expressions

As you can see, regular expressions can quickly become long and complex. To make them more readable, Ruby provides the x option which allows you to add comments and whitespace to your regex. Let’s take a look at an example:

regex = /
  \d+     # Matches one or more digits
  \s      # Matches a whitespace character
  [a-z]+  # Matches one or more lowercase letters
/x

This makes it easier to understand what each part of the regex is doing.
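
A commented pattern behaves like its compact equivalent. The sketch below applies it to a sample sentence; the constant name COUNT_AND_NOUN is an assumption. The last two lines show that a literal space must be escaped in extended mode.

```ruby
# COUNT_AND_NOUN is a hypothetical name for the pattern above.
COUNT_AND_NOUN = /
  \d+      # one or more digits
  \s       # one whitespace character
  [a-z]+   # one or more lowercase letters
/x

p "I have 3 cats and 2 dogs".scan(COUNT_AND_NOUN)  # => ["3 cats", "2 dogs"]

# In x mode an unescaped space is ignored, so escape it to match one.
p "a b".match?(/a\ b/x)   # => true
p "a b".match?(/a b/x)    # => false
```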

Ruby regex: Putting It All Together

Now that we have covered the basics of Ruby regular expressions, let’s put everything together and build a more advanced pattern. We will create a regex that matches simple email addresses: a lowercase alphanumeric username, an @ sign, a lowercase domain name, a dot, and a two- or three-letter ending.

regex = /\A[a-z0-9]+@[a-z]+\.[a-z]{2,3}\z/

Here [a-z]{2,3} matches two or three lowercase letters, and \z matches the end of the string.

This regex would match email addresses such as “john@example.com” or “jane@domain.org” but not “john@com” or “jane@domain”.
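
Wrapping the pattern in a predicate method makes it easy to try against different inputs. This is a sketch under an assumption: the {2,3} quantifier is inferred from the "two or three lowercase letters" description, and SIMPLE_EMAIL and simple_email? are hypothetical names. The pattern is deliberately strict and is not a full email validator.

```ruby
# SIMPLE_EMAIL and simple_email? are hypothetical names; the {2,3}
# quantifier is assumed from the "two or three lowercase letters" note.
SIMPLE_EMAIL = /\A[a-z0-9]+@[a-z]+\.[a-z]{2,3}\z/

def simple_email?(address)
  SIMPLE_EMAIL.match?(address)
end

p simple_email?("john@example.com")    # => true
p simple_email?("john@com")            # => false
p simple_email?("jane@domain")         # => false
p simple_email?("John@example.com")    # => false, uppercase is outside [a-z]
p simple_email?("john@example.info")   # => false, ending longer than three letters
```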

Conclusion

Congratulations, you have now covered the basics of Ruby regular expressions! You have learned about character classes, ranges, modifiers, exact string matching, capture groups, lookahead and look behind assertions, the Regexp class, options, and formatting long regular expressions. With this knowledge, you can use regular expressions in your Ruby programs to validate and parse data. Regular expressions may seem daunting at first, but with practice, you will become more comfortable using them and will be able to tackle more complex patterns. Keep exploring and experimenting with regular expressions to become a regex master!