Downloading a picture via urllib and python

Posted on

Solving problem is about exposing yourself to as many situations as possible like Downloading a picture via urllib and python and practice these strategies over and over. With time, it becomes second nature and a natural way you approach any problems in general. Big or small, always start with a plan, use other strategies mentioned here till you are confident and ready to code the solution.
In this post, my aim is to share an overview the topic about Downloading a picture via urllib and python, which can be followed any time. Take easy to follow this discuss.

Downloading a picture via urllib and python

So I’m trying to make a Python script that downloads webcomics and puts them in a folder on my desktop. I’ve found a few similar programs on here that do something similar, but nothing quite like what I need. The one that I found most similar is right here (http://bytes.com/topic/python/answers/850927-problem-using-urllib-download-images). I tried using this code:

>>> import urllib
>>> image = urllib.URLopener()
>>> image.retrieve("http://www.gunnerkrigg.com//comics/00000001.jpg","00000001.jpg")
('00000001.jpg', <httplib.HTTPMessage instance at 0x1457a80>)

I then searched my computer for a file “00000001.jpg”, but all I found was the cached picture of it. I’m not even sure it saved the file to my computer. Once I understand how to get the file downloaded, I think I know how to handle the rest. Essentially just use a for loop and split the string at the ‘00000000’.’jpg’ and increment the ‘00000000’ up to the largest number, which I would have to somehow determine. Any reccomendations on the best way to do this or how to download the file correctly?

Thanks!

EDIT 6/15/10

Here is the completed script, it saves the files to any directory you choose. For some odd reason, the files weren’t downloading and they just did. Any suggestions on how to clean it up would be much appreciated. I’m currently working out how to find out many comics exist on the site so I can get just the latest one, rather than having the program quit after a certain number of exceptions are raised.

import urllib
import os
comicCounter=len(os.listdir('/file'))+1  # reads the number of files in the folder to start downloading at the next comic
errorCount=0
def download_comic(url,comicName):
    """
    download a comic in the form of
    url = http://www.example.com
    comicName = '00000000.jpg'
    """
    image=urllib.URLopener()
    image.retrieve(url,comicName)  # download comicName at URL
while comicCounter <= 1000:  # not the most elegant solution
    os.chdir('/file')  # set where files download to
        try:
        if comicCounter < 10:  # needed to break into 10^n segments because comic names are a set of zeros followed by a number
            comicNumber=str('0000000'+str(comicCounter))  # string containing the eight digit comic number
            comicName=str(comicNumber+".jpg")  # string containing the file name
            url=str("http://www.gunnerkrigg.com//comics/"+comicName)  # creates the URL for the comic
            comicCounter+=1  # increments the comic counter to go to the next comic, must be before the download in case the download raises an exception
            download_comic(url,comicName)  # uses the function defined above to download the comic
            print url
        if 10 <= comicCounter < 100:
            comicNumber=str('000000'+str(comicCounter))
            comicName=str(comicNumber+".jpg")
            url=str("http://www.gunnerkrigg.com//comics/"+comicName)
            comicCounter+=1
            download_comic(url,comicName)
            print url
        if 100 <= comicCounter < 1000:
            comicNumber=str('00000'+str(comicCounter))
            comicName=str(comicNumber+".jpg")
            url=str("http://www.gunnerkrigg.com//comics/"+comicName)
            comicCounter+=1
            download_comic(url,comicName)
            print url
        else:  # quit the program if any number outside this range shows up
            quit
    except IOError:  # urllib raises an IOError for a 404 error, when the comic doesn't exist
        errorCount+=1  # add one to the error count
        if errorCount>3:  # if more than three errors occur during downloading, quit the program
            break
        else:
            print str("comic"+ ' ' + str(comicCounter) + ' ' + "does not exist")  # otherwise say that the certain comic number doesn't exist
print "all comics are up to date"  # prints if all comics are downloaded
Asked By: Mike

||

Answer #1:

Python 2

Using urllib.urlretrieve

import urllib
urllib.urlretrieve("http://www.gunnerkrigg.com//comics/00000001.jpg", "00000001.jpg")

Python 3

Using urllib.request.urlretrieve (part of Python 3’s legacy interface, works exactly the same)

import urllib.request
urllib.request.urlretrieve("http://www.gunnerkrigg.com//comics/00000001.jpg", "00000001.jpg")
Answered By: Matthew Flaschen

Answer #2:

import urllib
f = open('00000001.jpg','wb')
f.write(urllib.urlopen('http://www.gunnerkrigg.com//comics/00000001.jpg').read())
f.close()
Answered By: DiGMi

Answer #3:

Just for the record, using requests library.

import requests
f = open('00000001.jpg','wb')
f.write(requests.get('http://www.gunnerkrigg.com//comics/00000001.jpg').content)
f.close()

Though it should check for requests.get() error.

Answered By: ellimilial

Answer #4:

For Python 3 you will need to import import urllib.request:

import urllib.request
urllib.request.urlretrieve(url, filename)

for more info check out the link

Answered By: HISI

Answer #5:

Python 3 version of @DiGMi’s answer:

from urllib import request
f = open('00000001.jpg', 'wb')
f.write(request.urlopen("http://www.gunnerkrigg.com/comics/00000001.jpg").read())
f.close()
Answered By: Dennis Golomazov

Answer #6:

I have found this answer and I edit that in more reliable way

def download_photo(self, img_url, filename):
    try:
        image_on_web = urllib.urlopen(img_url)
        if image_on_web.headers.maintype == 'image':
            buf = image_on_web.read()
            path = os.getcwd() + DOWNLOADED_IMAGE_PATH
            file_path = "%s%s" % (path, filename)
            downloaded_image = file(file_path, "wb")
            downloaded_image.write(buf)
            downloaded_image.close()
            image_on_web.close()
        else:
            return False
    except:
        return False
    return True

From this you never get any other resources or exceptions while downloading.

Answered By: Janith Chinthana

Answer #7:

If you know that the files are located in the same directory dir of the website site and have the following format: filename_01.jpg, …, filename_10.jpg then download all of them:

import requests
for x in range(1, 10):
    str1 = 'filename_%2.2d.jpg' % (x)
    str2 = 'http://site/dir/filename_%2.2d.jpg' % (x)
    f = open(str1, 'wb')
    f.write(requests.get(str2).content)
    f.close()
Answered By: len

Answer #8:

It’s easiest to just use .read() to read the partial or entire response, then write it into a file you’ve opened in a known good location.

Leave a Reply

Your email address will not be published.