Question:

I have this regexp:

 re.compile(r"((https?):((//)|(\\))+[\w\d:#@%/;$()~_?+-=\.&]*)", re.MULTILINE|re.UNICODE)

But it doesn’t match hashbangs (#!). What do I need to change to get it working? I know I can add ! to the character class alongside #@% etc., but then it would also match something like

Check this out:!!!

and I want to avoid that.

Asked By: ThomK


Answer #1:

Don’t try to write your own regular expression for matching URLs; use one from someone who has already solved these problems, like this one.

Answered By: kindall

Answer #2:

It could be very long, but in practice mine works pretty well. Please try this one:

It matches all of the examples below.
Answered By: Asad

Answer #3:

I’ll admit that I’m a little bit worried about an application that requires a regex like that to match URLs. That said, this seems to work for me:

Answered By: tsm

Answer #4:

This is a common problem; use the standard libraries.

For Python, use urlparse.
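
As a rough illustration (not part of the original answer), here is how a hashbang URL splits with the standard library. The URL is a made-up example, and the import assumes Python 3, where the old urlparse module lives in urllib.parse:

```python
from urllib.parse import urlparse  # Python 2: from urlparse import urlparse

# Hypothetical hashbang URL, used only for illustration.
url = "http://example.com/#!/page"

parts = urlparse(url)

# Everything after '#' ends up in the fragment, including the '!'.
print(parts.scheme)    # http
print(parts.netloc)    # example.com
print(parts.path)      # /
print(parts.fragment)  # !/page
```

Note that urlparse only splits a string you already know is a URL; it does not find URLs inside free text, so some way of extracting candidate strings is still needed before calling it.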

Answered By: estani

