Python Regex Replace

Complete Python Regex Replace Guide using re.sub()


Regular expressions are a powerful tool for text processing in Python. The re module supports regex find-and-replace operations through its sub() function, which makes it easy to modify the parts of a string that match a pattern. In this article, we’ll explore Python regex replace in detail with examples.


Python Regex Replace with re.sub()

In Python, all regular expression functionality lives in the built-in re module. You simply import the module into your Python file and call re.sub() to replace text in a string.

The re.sub() function provides search-and-replace functionality using regular expressions. You give it a regex pattern to match, a replacement, and the string to modify. Here is its syntax:

Python

re.sub(pattern, repl, string, count=0, flags=0)

Here, pattern, repl, and string are required, while count and flags are optional and both default to 0. Let’s go through the five parameters one by one:

  1. pattern (required): The regular expression to search for in the input string. Raw strings like r"pattern" are recommended so that backslashes reach the regex engine unchanged.
  2. repl (required): The replacement for each match. It can be a string, which may contain group references such as \1, or a function that receives the match object and returns the replacement string.
  3. string (required): The input text in which matching and substitution are performed.
  4. count (optional): The maximum number of matches to replace. The default of 0 replaces all occurrences; pass count=1 to replace only the first match.
  5. flags (optional): Modifies the regex engine’s behavior. The value is a bitmask of re flags such as re.IGNORECASE and re.MULTILINE, combined with |.

re.sub() scans string for non-overlapping matches of pattern, replaces up to count of them with repl, and returns the new string. The original string is left unchanged.
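As a quick illustration of the two optional parameters, here is a minimal sketch. The sample text and variable names are made up for this example, and count and flags are passed by keyword so they cannot be mixed up.

```python
import re

text = "Cat, cat, CAT"  # sample text, chosen only for illustration

# count=1 replaces only the first match; matching is case-sensitive by default.
first_only = re.sub(r"cat", "dog", text, count=1)

# flags=re.IGNORECASE makes the pattern match regardless of letter case.
all_cases = re.sub(r"cat", "dog", text, flags=re.IGNORECASE)

print(first_only)  # Output: Cat, dog, CAT
print(all_cases)   # Output: dog, dog, dog
```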

Python Regex Replace Examples

Regex Example to remove hyphen from a string:

In this first example, we will remove all the hyphens from a telephone number so that only the digits remain. Here’s how we are going to do it:

Python

import re

phone = "412-555-1234"
formatted = re.sub(r"\D", "", phone)

print(formatted) # Output: 4125551234

Here, “\D” matches any non-digit character, and the empty string replaces each match with nothing, which removes every non-numeric symbol. Remember, “\d” matches digits; “\D” matches non-digits.

Regex Example to remove all whitespaces from a string

In our second example, we will remove all the whitespace that occurs in a string. Here’s the code:

Python

import re

text = "Remove Whitespace from this text"
trimmed_text = re.sub(r"\s", "", text)

print(trimmed_text) # Output: RemoveWhitespacefromthistext

The “\s” pattern matches every whitespace character in the text (spaces, tabs, and newlines), and each match is replaced with the empty string “”, which closes the gaps between the words.

Regex Example to extract domain from a URL

To extract the domain from a URL, you can use re.sub() to strip the scheme and the optional www. prefix with this pattern: r"^https?://(www\.)?"

Here’s the code for extracting the pypixel.com domain from its URL.

Python

import re

url = "https://www.pypixel.com"
domain = re.sub(r"^https?://(www\.)?", "", url)

print(domain) # Output: pypixel.com
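The pattern above only strips the prefix, so a URL with a path would keep that path. Here is a minimal sketch of one way to keep just the host, assuming simple http(s) URLs. The sample URL and the variable names are made up for illustration, and for real-world URLs the standard library’s urllib.parse module is usually the more robust choice.

```python
import re

# Hypothetical URL with a path and a query string, used only for illustration.
url = "https://www.pypixel.com/blog/regex?page=2"

# Group 2 captures the host; \2 in the replacement keeps only that group,
# while the scheme, the optional "www." and everything after the host are dropped.
host = re.sub(r"^https?://(www\.)?([^/:?#]+).*$", r"\2", url)

print(host)  # Output: pypixel.com
```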

Regex Example to replace a text from a string

This is the simplest and most common case you will come across: replacing an unwanted word in your string with another one. Here’s how you can do that:

Python

import re

random_string = "This website are PyPixel"
updated_string = re.sub(r"are", "is", random_string)

print(updated_string) # Output: This website is PyPixel

Regex Example to remove characters from a string

In this example, we’ll remove all the punctuation characters from a string, such as a period (.), an exclamation mark (!), or a semicolon (;).

Python

import re

text = "Hello, world! This has many symbols like . , : ; ?"
cleaned = re.sub(r'[^\w\s]', '', text)

print(cleaned) # Output: Hello world This has many symbols like

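The pattern r'[^\w\s]' matches every character that is neither a word character (letter, digit, or underscore) nor whitespace, and each match is replaced with the empty string, which strips it out. The spaces between the removed symbols are kept, so the result ends with trailing whitespace.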

Conclusion

In this article, we discussed Python regex replace patterns with some commonly used examples. Python’s re module brings regex capabilities into our code, and its re.sub() function covers many replace use cases. You will often need regex when building projects in which user input has to be cleaned up or reformatted.

There is also a great tool called Regex 101 that lets you test your regex patterns online. It’s really handy and worth trying when you need quick results.

FAQs

Is regex replace case-sensitive?

Yes, regex operations are case-sensitive by default. Pass flags=re.IGNORECASE to make the match case-insensitive.

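How do I replace only the first or the nth match?

Pass the count parameter to re.sub() with the number of replacements you need, and it will replace only that many matches, starting from the left. For example, count=1 replaces just the first match. count cannot single out the nth match on its own; for that, use a replacement function that keeps track of how many matches it has seen.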
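The difference between the first N matches and only the nth match can be sketched as follows. replace_nth is a hypothetical helper written for this example, not part of the re module.

```python
import re
from itertools import count as counter

def replace_nth(pattern, repl, text, n):
    """Hypothetical helper: replace only the n-th (1-based) match of pattern."""
    seen = counter(1)  # yields 1, 2, 3, ... once per match

    def swap(match):
        # Return the replacement for the n-th match, the original text otherwise.
        return repl if next(seen) == n else match.group(0)

    return re.sub(pattern, swap, text)

first_two = re.sub(r"o", "0", "foo boo zoo", count=2)
third_only = replace_nth(r"o", "0", "foo boo zoo", 3)

print(first_two)   # Output: f00 boo zoo
print(third_only)  # Output: foo b0o zoo
```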

What if I want to access matches and reformat dynamically?

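You can pass a callback function as the replacement argument. It is called with each match object and returns the text to substitute, so you can customize the formatting for every match.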
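A minimal sketch of a replacement callback; the function name and the sample text are made up for this example.

```python
import re

def double_number(match):
    # match.group(0) is the text matched by the whole pattern.
    return str(int(match.group(0)) * 2)

text = "3 apples and 10 pears"
doubled = re.sub(r"\d+", double_number, text)

print(doubled)  # Output: 6 apples and 20 pears
```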

Can I match and replace without knowing full string beforehand?

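Yes, a pattern does not depend on the input it will be applied to. You can precompile it with re.compile() and then call the compiled pattern’s sub() method on any string later.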
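For instance, a pattern can be compiled once and applied to strings that arrive later. This is a minimal sketch; the phone numbers are invented sample data.

```python
import re

# Compile once; the pattern does not need to know the input in advance.
non_digits = re.compile(r"\D")

phones = ["412-555-1234", "(412) 555 9876"]  # sample inputs for illustration
cleaned = [non_digits.sub("", phone) for phone in phones]

print(cleaned)  # Output: ['4125551234', '4125559876']
```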

How can I improve regex pattern readability?

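You can use verbose mode, either with the re.VERBOSE flag or with an inline r"(?x)pattern" prefix. It ignores whitespace inside the pattern and allows comments, so you can spread the pattern over several lines and document each part.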
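As a sketch, here is the URL prefix pattern from earlier in this article written in verbose mode; the variable name url_prefix is chosen for this example.

```python
import re

# In verbose mode, whitespace in the pattern is ignored and # starts a comment.
url_prefix = re.compile(r"""
    ^https?://   # scheme: http or https
    (www\.)?     # optional www. prefix
""", re.VERBOSE)

domain = url_prefix.sub("", "https://www.pypixel.com")

print(domain)  # Output: pypixel.com
```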


