Python RegEx allows you to create patterns that match particular strings, search for text within longer strings, and extract particular sections of a string based on those patterns.
A RegEx, or regular expression, is a sequence of characters that defines a search pattern. In Python, a RegEx is used to check whether a string contains the specified search pattern.
Example for Python RegEx:
Let the search pattern be ^t...e$
This matches any five-character string that starts with 't' and ends with 'e'. Matching is case-sensitive by default.
taste – match
turtle – not a match (six letters)
table – match
Python re module
Python has a built-in module called re, which is used to work with regular expressions.
Syntax
Syntax for using the Python re module:
Importing:
import re
Example:
import re
search_pattern = '^t...e$'
tester_string = 'taste'
result = re.match(search_pattern, tester_string)
if result: print("Search successful.")
else: print("Search unsuccessful.")
Here we have used the re.match() function to test the search_pattern against the tester_string. re.match() only looks for a match at the beginning of the string. It returns None if the search_pattern does not match, and a match object if it does.
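As a rough sketch of how the pattern behaves on several inputs, the snippet below runs the same search_pattern over a small list of words. The words list and the results dictionary are illustrative names, not part of the example above. The capitalized 'Taste' is included to show that matching is case-sensitive unless a flag such as re.IGNORECASE is passed.

```python
import re

search_pattern = r'^t...e$'

# Illustrative inputs; 'Taste' is capitalized on purpose.
words = ['taste', 'turtle', 'table', 'Taste']

# re.match() returns a match object or None, so compare against None.
results = {word: re.match(search_pattern, word) is not None for word in words}
print(results)

# Passing re.IGNORECASE makes the pattern accept 'Taste' as well.
ignore_case = re.match(search_pattern, 'Taste', re.IGNORECASE) is not None
print(ignore_case)
```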
There are some more functions defined in the re module which are used when we work with regular expressions.
The re module offers the set of functions listed below:
Python re findall
The re.findall() function returns a list of strings containing all non-overlapping matches of the pattern.
# Program to extract all integers from the string
import re
tester_string = 'hello 12 hi 89. Howdy 34'
search_pattern = r'\d+'  # raw string so the backslash reaches the regex engine
result = re.findall(search_pattern, tester_string)
print(result)
Output :
['12', '89', '34']
re.findall() returns an empty list if the search_pattern is not found in the tester_string.
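The empty-list case can be checked with a short sketch like the one below. The second call is an additional illustration, not from the example above: when the pattern contains capturing groups, re.findall() returns a list of tuples, one per match. The variable names no_digits and pairs are made up for this sketch.

```python
import re

# No digits in the input, so the result is an empty list.
no_digits = re.findall(r'\d+', 'hello hi howdy')
print(no_digits)

# With two capturing groups, each match becomes a (word, number) tuple.
pairs = re.findall(r'(\w+):(\d+)', 'Twelve:12 Eighty:89')
print(pairs)
```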
Python split function
The re.split() function splits the string wherever the pattern matches and returns a list of the resulting substrings.
Ex:
import re
tester_string = 'Twelve:12 Eighty nine:89.'
search_pattern = r'\d+'
result = re.split(search_pattern, tester_string)
print(result)
Output:
['Twelve:', ' Eighty nine:', '.']
re.split() returns a list containing the original string if the pattern is not found.
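A minimal sketch of the no-match case follows, together with the optional maxsplit argument of re.split(), which limits how many splits are performed. The names unchanged and first_only are illustrative.

```python
import re

tester_string = 'Twelve:12 Eighty nine:89.'

# Pattern not found: the list holds the original string as its only item.
unchanged = re.split(r'\d+', 'no digits here')
print(unchanged)

# maxsplit=1 stops after the first split; the rest of the string is kept whole.
first_only = re.split(r'\d+', tester_string, maxsplit=1)
print(first_only)
```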
Python sub function
The syntax for python re.sub() function is:
re.sub(search_pattern, replace, tester_string)
This function returns a string in which every match of search_pattern is replaced with the contents of the replace variable.
Example :
import re
# multiline string
tester_string = 'xyz 24 de 25 \n f45 6'
# matches one or more whitespace characters
search_pattern = r'\s+'
# empty string
replace = ''
new_string = re.sub(search_pattern, replace, tester_string)
print(new_string)
Output:
xyz24de25f456
The original string is returned if the search_pattern is not found.
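The sketch below illustrates the no-match case and the optional count argument of re.sub(), which caps the number of replacements. The input strings and the names all_removed, limited, and unchanged are chosen only for illustration.

```python
import re

text = 'one two  three \n four'

# Remove every run of whitespace.
all_removed = re.sub(r'\s+', '', text)
print(all_removed)

# count=1 replaces only the first run of whitespace.
limited = re.sub(r'\s+', '', text, count=1)
print(repr(limited))

# No digits to replace, so the original string comes back unchanged.
unchanged = re.sub(r'\d+', '#', 'no digits here')
print(unchanged)
```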
Python search function
The re.search() function takes two arguments: search_pattern and tester_string. It scans the tester_string and finds the first location where the search_pattern matches. If a match is found, it returns a match object; otherwise it returns None.
match = re.search(search_pattern, tester_string)
Example :
import re
tester_string = "Python is fun"
# check if 'Python' is at the beginning
match = re.search(r'\APython', tester_string)
if match:
    print("pattern found inside the string")
else:
    print("pattern not found")
Output: pattern found inside the string
Here, the match variable contains a match object.
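To contrast re.search() with re.match(), here is a small sketch using the same tester_string. The variable names found_anywhere, found_at_start, and anchored are illustrative. re.search() scans the whole string, while re.match() and the \A anchor only accept a match at the very beginning.

```python
import re

tester_string = "Python is fun"

# search() finds 'fun' even though it is at the end of the string.
found_anywhere = re.search(r'fun', tester_string)

# match() only tries at position 0, so 'fun' is not found.
found_at_start = re.match(r'fun', tester_string)

# \A anchors the search to the start, so this also fails.
anchored = re.search(r'\Afun', tester_string)

print(found_anywhere is not None, found_at_start is None, anchored is None)
```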
Match Object
A Match Object is an object containing data related to the result and the search.
If there is a match, a match object is returned; otherwise None is returned. A match object has its own attributes and methods, which can be used to get data about the search and the result.
The dir() function can be used to list the methods and attributes of a match object. Some commonly used methods and attributes of match objects are explained below.
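As a quick sketch, dir() can be called on any match object to see what it offers. The input string and the name public_names are illustrative, and the exact list printed may vary between Python versions.

```python
import re

match = re.search(r'\d+', 'hello 12')

# Keep only the public names (those not starting with an underscore).
public_names = [name for name in dir(match) if not name.startswith('_')]
print(public_names)
```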
Python match group
The Match.group() method returns the part of the string where there is a match.
Example :
import re
string = '39801 356, 2102 1111'
# three-digit number, then a space, then a two-digit number
pattern = r'(\d{3}) (\d{2})'
# the match variable contains a match object
match = re.search(pattern, string)
if match:
    print(match.group())
else:
    print("pattern not found")
Output: 801 35
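Because the pattern contains two parenthesized groups, each captured part can also be read separately. The sketch below reuses the same string and pattern; the names whole, first, second, and both are illustrative.

```python
import re

string = '39801 356, 2102 1111'
pattern = r'(\d{3}) (\d{2})'

match = re.search(pattern, string)
if match:
    whole = match.group(0)   # same as match.group(): the full match
    first = match.group(1)   # text captured by the first parentheses
    second = match.group(2)  # text captured by the second parentheses
    both = match.groups()    # tuple of all captured groups
    print(whole, first, second, both)
```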
Start() and End()
The start() method returns the index where the matched substring starts, and end() returns the index just past the end of the matched substring.
>>> match.start()
2
>>> match.end()
8
Here, the match variable contains a match object.
Python Match Span()
The match.span() method returns a tuple containing the start and end indices of the matched part of the string.
>>> match.span()
(2, 8)
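One way to see what these indices mean is to slice the original string with them, as in the sketch below. It assumes the same string and pattern as the group example; the names start, end, sliced, and second_span are illustrative. span() also accepts a group number to report where a single group matched.

```python
import re

string = '39801 356, 2102 1111'
match = re.search(r'(\d{3}) (\d{2})', string)

# Slicing with the span gives back exactly the matched text.
start, end = match.span()
sliced = string[start:end]
print(sliced)

# Span of the second captured group only.
second_span = match.span(2)
print(second_span)
```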
CONCLUSION:
With Python RegEx, we can define patterns that are as simple or as complex as we need, and use them to search, replace, or manipulate text in different ways.