How to input a regex in string.replace?

Posted on

Solving problem is about exposing yourself to as many situations as possible like How to input a regex in string.replace? and practice these strategies over and over. With time, it becomes second nature and a natural way you approach any problems in general. Big or small, always start with a plan, use other strategies mentioned here till you are confident and ready to code the solution.
In this post, my aim is to share an overview the topic about How to input a regex in string.replace?, which can be followed any time. Take easy to follow this discuss.

How to input a regex in string.replace?

I need some help on declaring a regex. My inputs are like the following:

this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>.
and there are many other lines in the txt files
with<[3> such tags </[3>

The required output is:

this is a paragraph with in between and then there are cases ... where the number ranges from 1-100.
and there are many other lines in the txt files
with such tags

I’ve tried this:

#!/usr/bin/python
import os, sys, re, glob
for infile in glob.glob(os.path.join(os.getcwd(), '*.txt')):
    for line in reader:
        line2 = line.replace('<[1> ', '')
        line = line2.replace('</[1> ', '')
        line2 = line.replace('<[1>', '')
        line = line2.replace('</[1>', '')
        print line

I’ve also tried this (but it seems like I’m using the wrong regex syntax):

    line2 = line.replace('<[*> ', '')
    line = line2.replace('</[*> ', '')
    line2 = line.replace('<[*>', '')
    line = line2.replace('</[*>', '')

I dont want to hard-code the replace from 1 to 99 . . .

Asked By: alvas

||

Answer #1:

This tested snippet should do it:

import re
line = re.sub(r"</?[d+>", "", line)

Edit: Here’s a commented version explaining how it works:

line = re.sub(r"""
  (?x) # Use free-spacing mode.
  <    # Match a literal '<'
  /?   # Optionally match a '/'
  [   # Match a literal '['
  d+  # Match one or more digits
  >    # Match a literal '>'
  """, "", line)

Regexes are fun! But I would strongly recommend spending an hour or two studying the basics. For starters, you need to learn which characters are special: “metacharacters” which need to be escaped (i.e. with a backslash placed in front – and the rules are different inside and outside character classes.) There is an excellent online tutorial at: www.regular-expressions.info. The time you spend there will pay for itself many times over. Happy regexing!

Answered By: ridgerunner

Answer #2:

str.replace() does fixed replacements. Use re.sub() instead.

Answer #3:

I would go like this (regex explained in comments):

import re
# If you need to use the regex more than once it is suggested to compile it.
pattern = re.compile(r"</{0,}[d+>")
# </{0,}[d+>
# 
# Match the character “<” literally «<»
# Match the character “/” literally «/{0,}»
#    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «{0,}»
# Match the character “[” literally «[»
# Match a single digit 0..9 «d+»
#    Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
# Match the character “>” literally «>»
subject = """this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>.
and there are many other lines in the txt files
with<[3> such tags </[3>"""
result = pattern.sub("", subject)
print(result)

If you want to learn more about regex I recomend to read Regular Expressions Cookbook by Jan Goyvaerts and Steven Levithan.

Answer #4:

The easiest way

import re
txt='this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>.  and there are many other lines in the txt files with<[3> such tags </[3>'
out = re.sub("(<[^>]+>)", '', txt)
print out
Answered By: Ezequiel Marquez

Answer #5:

replace method of string objects does not accept regular expressions but only fixed strings (see documentation: http://docs.python.org/2/library/stdtypes.html#str.replace).

You have to use re module:

import re
newline= re.sub("</?[[0-9]+>", "", line)
Answered By: Zac

Answer #6:

don’t have to use regular expression (for your sample string)

>>> s
'this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>. nand there are many other lines in the txt filesnwith<[3> such tags </[3>n'
>>> for w in s.split(">"):
...   if "<" in w:
...      print w.split("<")[0]
...
this is a paragraph with
 in between
 and then there are cases ... where the
 number ranges from 1-100
.
and there are many other lines in the txt files
with
 such tags
Answered By: kurumi

Answer #7:

import os, sys, re, glob
pattern = re.compile(r"<[d>")
replacementStringMatchesPattern = "<[1>"
for infile in glob.glob(os.path.join(os.getcwd(), '*.txt')):
   for line in reader:
      retline =  pattern.sub(replacementStringMatchesPattern, "", line)
      sys.stdout.write(retline)
      print (retline)
Answered By: Abena Saulka
The answers/resolutions are collected from stackoverflow, are licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0 .

Leave a Reply

Your email address will not be published.