How to get everything after last slash in a URL?

Question:

How can I extract whatever follows the last slash in a URL in Python? For example, these URLs should return the following:

URL: http://www.test.com/TEST1
returns: TEST1

URL: http://www.test.com/page/TEST2
returns: TEST2

URL: http://www.test.com/page/page/12345
returns: 12345

I’ve tried urlparse, but that gives me the full path filename, such as page/page/12345.

Asked By: mix

Answer #1:

You don’t need anything fancy. The built-in string methods let you split your URL into the ‘filename’ part and the rest:

url.rsplit('/', 1)

So you can get the part you’re interested in simply with:

url.rsplit('/', 1)[-1]
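
As a quick illustration of the two expressions above, here is a minimal sketch. The helper name last_segment is made up for this example. Note that a URL ending in a slash yields an empty string.

```python
def last_segment(url: str) -> str:
    """Hypothetical helper: return whatever follows the last '/' in url."""
    # maxsplit=1 makes rsplit cut only once, at the rightmost slash
    return url.rsplit('/', 1)[-1]


print('http://www.test.com/page/TEST2'.rsplit('/', 1))
# ['http://www.test.com/page', 'TEST2']
print(last_segment('http://www.test.com/page/page/12345'))  # 12345
print(repr(last_segment('http://www.test.com/page/')))      # ''
```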
Answered By: Luke404

Answer #2:

One more (idio(ma)tic) way:

URL.split("/")[-1]
Answered By: Kimvais

Answer #3:

rsplit should be up to the task:

In [1]: 'http://www.test.com/page/TEST2'.rsplit('/', 1)[1]
Out[1]: 'TEST2'
Answered By: Benjamin Wohlwend

Answer #4:

You can do it like this:

import os

head, tail = os.path.split(url)

Here tail will be the part after the last slash, i.e. your file name.
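
A small sketch of how this behaves on one of the question's URLs. It uses posixpath, the '/'-only flavour of os.path, which is my own assumption so that the result does not depend on the operating system. The variable names are illustrative.

```python
import posixpath

url = 'http://www.test.com/page/TEST2'

# split() cuts at the last '/', returning (everything before, everything after)
head, tail = posixpath.split(url)
print(head)  # http://www.test.com/page
print(tail)  # TEST2

# A trailing slash leaves nothing after the last '/', so the tail is empty
empty_tail = posixpath.split('http://www.test.com/page/')[1]
print(repr(empty_tail))  # ''
```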

Answered By: neowinston

Answer #5:

urlparse is fine to use if you want to (say, to get rid of any query string parameters).

import urllib.parse

urls = [
    'http://www.test.com/TEST1',
    'http://www.test.com/page/TEST2',
    'http://www.test.com/page/page/12345',
    'http://www.test.com/page/page/12345?abc=123'
]

for i in urls:
    url_parts = urllib.parse.urlparse(i)
    path_parts = url_parts[2].rpartition('/')
    print('URL: {}\nreturns: {}\n'.format(i, path_parts[2]))

Output:

URL: http://www.test.com/TEST1
returns: TEST1

URL: http://www.test.com/page/TEST2
returns: TEST2

URL: http://www.test.com/page/page/12345
returns: 12345

URL: http://www.test.com/page/page/12345?abc=123
returns: 12345
Answered By: Jacob Wan

Answer #6:

import os

os.path.basename(os.path.normpath('/folderA/folderB/folderC/folderD/'))
# returns 'folderD'
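
To apply the same idea to a URL rather than a filesystem path, one possible sketch is to normalise only the path component. The helper name last_path_segment and the use of posixpath together with urlparse are my own assumptions, not part of the answer above.

```python
import posixpath
from urllib.parse import urlparse


def last_path_segment(url: str) -> str:
    """Hypothetical helper: last path segment, ignoring a trailing slash."""
    path = urlparse(url).path
    # normpath drops the trailing slash, so basename sees a real segment
    return posixpath.basename(posixpath.normpath(path))


print(last_path_segment('http://www.test.com/page/TEST2/'))      # TEST2
print(last_path_segment('http://www.test.com/page/page/12345'))  # 12345
```

Keep in mind that normpath also resolves '.' and '..' segments, which may or may not be what you want for URLs.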
Answered By: Rochan

Answer #7:

Here’s a more general, regex way of doing this:

    import re

    re.sub(r'^.+/([^/]+)$', r'\1', url)
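
A short sketch of how this substitution behaves, with the backreference written as r'\1' and the pattern stored in an illustrative variable. When the pattern does not match (for example, a URL ending in a slash), re.sub returns the input unchanged.

```python
import re

pattern = r'^.+/([^/]+)$'

# Group 1 captures the run of non-slash characters at the end of the string
matched = re.sub(pattern, r'\1', 'http://www.test.com/page/TEST2')
print(matched)  # TEST2

# No match here (nothing follows the last slash), so the URL comes back as-is
unmatched = re.sub(pattern, r'\1', 'http://www.test.com/page/')
print(unmatched)  # http://www.test.com/page/
```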
Answered By: sandoronodi

Answer #8:

First extract the path element from the URL:

from urllib.parse import urlparse
parsed = urlparse('https://www.dummy.example/this/is/PATH?q=/a/b&r=5#asx')

and then you can extract the last segment with string functions:

parsed.path.rpartition('/')[2]

(for this example, the result is 'PATH')
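
To see why extracting the path first matters for this URL, here is a minimal sketch that prints the parsed components and compares the result with splitting the raw string. The variable names naive and segment are illustrative.

```python
from urllib.parse import urlparse

url = 'https://www.dummy.example/this/is/PATH?q=/a/b&r=5#asx'
parsed = urlparse(url)

print(parsed.path)      # /this/is/PATH
print(parsed.query)     # q=/a/b&r=5
print(parsed.fragment)  # asx

# Splitting the raw URL picks up the slash inside the query string
naive = url.rsplit('/', 1)[-1]
print(naive)  # b&r=5#asx

# Splitting only the path gives the intended segment
segment = parsed.path.rpartition('/')[2]
print(segment)  # PATH
```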

Answered By: tzot
