python .replace() regex [duplicate]

Question:

I am trying to grab everything after the '</html>' tag and delete it, but my code doesn't seem to be doing anything. Does .replace() not support regex?

z.write(article.replace('</html>.+', '</html>'))

Answer #1:

No. str.replace() only matches literal text; regular expressions in Python are handled by the re module.

article = re.sub(r'(?is)</html>.+', '</html>', article)

In general:

text_after = re.sub(regex_search_term, regex_replacement, text_before)
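
To see the difference side by side, here is a minimal sketch. The sample string is made up for illustration. str.replace() looks for the literal characters '</html>.+', which never occur in the text, while re.sub() treats the same string as a pattern; the (?is) prefix makes the match case-insensitive and lets '.' match newlines.

```python
import re

# Made-up sample input: a page with unwanted text after the closing tag.
article = "<html><body>Hi</body></HTML>\n<script>tracker()</script>\n"

# str.replace() searches for the literal text '</html>.+', which never occurs,
# so the string comes back unchanged.
literal_result = article.replace('</html>.+', '</html>')

# re.sub() interprets the pattern as a regex. (?i) ignores case and (?s) lets
# '.' match newlines, so everything after the tag is consumed.
regex_result = re.sub(r'(?is)</html>.+', '</html>', article)

print(literal_result == article)  # True
print(regex_result)               # <html><body>Hi</body></html>
```

Note that '.+' needs at least one character after the tag; if nothing follows '</html>', the pattern does not match and the text is returned as is, which is harmless here.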

Answer #2:

To replace text using a regular expression, use the re.sub function:

sub(pattern, repl, string[, count, flags])

It replaces non-overlapping instances of pattern in string with the text passed as repl. If you need to analyze the match, for instance to extract information about specific group captures, you can pass a function as the repl argument.


>>> import re
>>> re.sub(r'a', 'b', 'banana')
'bbnbnb'

>>> re.sub(r'/\d+', '/{id}', '/andre/23/abobora/43435')
'/andre/{id}/abobora/{id}'
Answered By: André Pena
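
As a rough illustration of passing a function as repl, and of the count argument, here is a small sketch. The helper name double_number and the idea of doubling the ids are made up for the example; only re.sub itself comes from the answer above.

```python
import re

def double_number(match):
    # Hypothetical helper: match.group(1) holds the digits captured by (\d+).
    return '/' + str(int(match.group(1)) * 2)

path = '/andre/23/abobora/43435'

# re.sub calls double_number once per match and inserts its return value.
doubled = re.sub(r'/(\d+)', double_number, path)
print(doubled)  # /andre/46/abobora/86870

# count=1 limits the substitution to the first match only.
first_only = re.sub(r'a', 'b', 'banana', count=1)
print(first_only)  # bbnana
```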

Answer #3:

You can use the re module for regexes, but regexes are probably overkill for what you want. I might try something like

z.write(article[:article.index("</html>") + 7])

This is much cleaner, and should be much faster than a regex-based solution.

Answered By: Julian
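
One caveat with the slicing approach: str.index raises ValueError when the tag is missing, and the match is case-sensitive. A slightly more defensive sketch, using a hypothetical helper name truncate_after and len(marker) in place of the hard-coded 7, could look like this:

```python
def truncate_after(text, marker="</html>"):
    """Return text up to and including the first marker (unchanged if absent)."""
    pos = text.find(marker)  # find() returns -1 instead of raising ValueError
    if pos == -1:
        return text
    return text[:pos + len(marker)]

print(truncate_after("<html>body</html>trailing junk"))  # <html>body</html>
print(truncate_after("no closing tag here"))             # no closing tag here
```

It would be used in the same way as the one-liner, for example z.write(truncate_after(article)).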

Answer #4:

For this particular case, if using the re module is overkill, how about using the split (or rsplit) method instead?


For example,


Ponta Monta 
Waff Moff


outputs out.txt as

Ponta Monta 
Answered By: norio
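
A minimal sketch of what a split-based approach might look like. The helper names and the layout of the sample article are assumptions made for illustration, not the answer's own code. split cuts at the first occurrence of the tag and rsplit at the last; both drop the separator, so it is added back.

```python
def keep_through_first(text, marker="</html>"):
    # Hypothetical helper: keep everything up to and including the first marker.
    if marker not in text:
        return text
    return text.split(marker, 1)[0] + marker

def keep_through_last(text, marker="</html>"):
    # Hypothetical helper: same idea, but cut at the last marker via rsplit.
    if marker not in text:
        return text
    return text.rsplit(marker, 1)[0] + marker

# Assumed sample input; only the two text lines come from the example above.
article = "<html>Ponta Monta\n</html>Waff Moff\n"
print(keep_through_first(article))
```

The membership check matters: without it, a text that has no closing tag would get one appended.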
