Question:
Python regular expression again – match url
I have this regexp:
import re
re.compile(r"((https?):((//)|(\\))+[\w\d:#@%/;$()~_?+-=\.&]*)", re.MULTILINE|re.UNICODE)
But that doesn’t handle hashbangs (#!). What do I need to change to get it working? I know I can add ! to the character class with #@%
etc., but then it would also match something like
Check this out: http://example.com/something/!!!
and I want to avoid that.
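To make the problem concrete, here is a small sketch. It assumes the backslashes lost from the pattern above were `\w\d`; the names `ORIGINAL`, `NAIVE`, `hashbang` and `trailing` are illustrative only. It shows the pattern stopping at the `#` of a hashbang, and the naive fix (adding `!` to the character class) swallowing the trailing `!!!`.

```python
import re

# The question's pattern, assuming the lost escapes were \w\d.
ORIGINAL = re.compile(
    r"((https?):((//)|(\\))+[\w\d:#@%/;$()~_?+-=\.&]*)",
    re.MULTILINE | re.UNICODE,
)

# Naive fix: simply add "!" to the character class.
NAIVE = re.compile(
    r"((https?):((//)|(\\))+[\w\d:#@%/;$()~_?+-=\.&!]*)",
    re.MULTILINE | re.UNICODE,
)

hashbang = "See http://example.com/#!/page for details"
trailing = "Check this out: http://example.com/something/!!!"

print(ORIGINAL.search(hashbang).group(0))  # http://example.com/#  (stops at "!")
print(NAIVE.search(hashbang).group(0))     # http://example.com/#!/page
print(NAIVE.search(trailing).group(0))     # http://example.com/something/!!!  (unwanted)
```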
Answer #1:
Don’t try to write your own regular expression for matching URLs; use one from someone who has already solved these problems, like this one.
Answer #2:
It may look long, but in practice mine works pretty well. Please try this one:
((http|https)://)?[a-zA-Z0-9./?:@\-_=#]+\.([a-zA-Z]){2,6}([a-zA-Z0-9\.\&/?:@\-_=#])*
It matches all of the examples below:
http://wwww.stackoverflow.com
abc.com
http://test.test-75.1474.stackoverflow.com/
stackoverflow.com/
stackoverflow.com
rfordyce@broadviewnet.com
http://www.example.com/etcetc
www.example.com/etcetc
example.com/etcetc
user:pass@example.com/etcetc
(www.itmag.com)
example.com/etcetc?query=aasd
example.com/etcetc?query=aasd&dest=asds
http://stackoverflow.com/questions/6427530/regular-expression-pattern-to-match-url-with
www/Christina.V.Scott@gmail.com
line.lundvoll.nilsen@telemed.no.
s.hossain@unsw.edu.au
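A quick way to check this pattern against a few of the examples is sketched below. It assumes the scraped copy lost its backslashes, so `\.` (the literal dot before the TLD) and `\-` (a literal hyphen) are restored; `URL_LIKE` and `samples` are illustrative names. `re.search` is used rather than `re.fullmatch`, because inputs like "(www.itmag.com)" only match once the parentheses are skipped.

```python
import re

# Answer #2's pattern, assuming "\." and "\-" were stripped when it was copied.
URL_LIKE = re.compile(
    r"((http|https)://)?[a-zA-Z0-9./?:@\-_=#]+\.([a-zA-Z]){2,6}([a-zA-Z0-9\.\&/?:@\-_=#])*"
)

samples = [
    "http://wwww.stackoverflow.com",
    "abc.com",
    "(www.itmag.com)",
    "user:pass@example.com/etcetc",
    "example.com/etcetc?query=aasd&dest=asds",
]

for text in samples:
    m = URL_LIKE.search(text)  # find the first URL-like substring
    print(repr(text), "->", m.group(0) if m else None)
```

Note that `!` appears in neither character class, so, as written, this pattern stops just before a hashbang: on "http://example.com/#!/page" it returns "http://example.com/#".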
Answer #3:
I’ll admit that I’m a little bit worried about an application that requires a regex like that to match URLs. That said, this seems to work for me:
((https?):((//)|(\\))+([\w\d:#@%/;$()~_?+-=\.&](#!)?)*)
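The trick here is that `!` never appears on its own: each repetition consumes one character from the class, optionally followed by the two-character sequence `#!`. The following sketch checks both cases from the question. It assumes the lost escapes were `\w\d`; `HASHBANG_AWARE` is an illustrative name.

```python
import re

# Answer #3's pattern, assuming the lost escapes were \w\d.
# "!" is not in the character class; it is only accepted inside "#!".
HASHBANG_AWARE = re.compile(
    r"((https?):((//)|(\\))+([\w\d:#@%/;$()~_?+-=\.&](#!)?)*)",
    re.MULTILINE | re.UNICODE,
)

for text in (
    "See http://example.com/#!/page for details",
    "Check this out: http://example.com/something/!!!",
):
    print(HASHBANG_AWARE.search(text).group(0))
# http://example.com/#!/page
# http://example.com/something/
```

The hashbang URL is kept whole, while the trailing `!!!` is left out because no `#` precedes it.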