In Python, when given the URL for a text file, what is the simplest way to access the contents of the text file and print them out locally line-by-line without saving a local copy of the text file?
    TargetURL=http://www.myhost.com/SomeFile.txt
    #read the file
    #print first line
    #print second line
    #etc
Edit 09/2016: In Python 3 and up use urllib.request instead of urllib2
Actually the simplest way is:
    import urllib2  # the lib that handles the url stuff

    data = urllib2.urlopen(target_url)  # it's a file like object and works just like a file
    for line in data:  # files are iterable
        print line
You don’t even need “readlines”, as Will suggested. You could even shorten it to: *
    import urllib2

    for line in urllib2.urlopen(target_url):
        print line
But remember in Python, readability matters.
However, while this is the simplest way, it is not the safe way: in network programming you usually cannot be sure the server will send only the amount of data you expect. So it is generally better to read a fixed, reasonable amount of data, enough for what you expect but capped so that your script cannot be flooded:
    import urllib2

    data = urllib2.urlopen("http://www.google.com").read(20000)  # read at most 20 000 bytes
    data = data.split("\n")  # then split it into lines
    for line in data:
        print line
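For Python 3, the same capped read might look like the sketch below. The helper names, the 20,000-byte cap, and the 10-second timeout are illustrative choices, not part of the answer above. Note that `read()` returns bytes in Python 3, so the cap counts bytes rather than characters.

```python
import io
import urllib.request


def read_limited_lines(response, max_bytes=20000, encoding="utf-8"):
    """Read at most max_bytes from a file-like object and return decoded lines."""
    raw = response.read(max_bytes)  # bytes; never more than max_bytes
    # errors="replace" tolerates a multi-byte character cut in half by the cap
    text = raw.decode(encoding, errors="replace")
    return text.splitlines()


def print_url_lines(target_url, max_bytes=20000):
    """Fetch target_url and print its (capped) contents line by line."""
    with urllib.request.urlopen(target_url, timeout=10) as response:
        for line in read_limited_lines(response, max_bytes):
            print(line)


# Offline demonstration: io.BytesIO stands in for the network response.
fake_response = io.BytesIO(b"first\nsecond\nthird\n")
lines = read_limited_lines(fake_response, max_bytes=13)
```

With a real URL, `print_url_lines(target_url)` does the fetch; anything past the cap is simply never read.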
* Second example in Python 3:
    import urllib.request  # the lib that handles the url stuff

    for line in urllib.request.urlopen(target_url):
        print(line.decode('utf-8'))  # utf-8 or iso8859-1 or whatever the page encoding scheme is
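Rather than hard-coding the encoding, the charset can often be taken from the response's Content-Type header. A minimal sketch, assuming a UTF-8 fallback when the server declares no charset; the function names are hypothetical:

```python
import urllib.request


def decode_lines(raw_lines, charset):
    """Decode an iterable of byte lines and strip the trailing line ending."""
    for raw_line in raw_lines:
        yield raw_line.decode(charset).rstrip("\r\n")


def print_decoded_lines(target_url, default_charset="utf-8"):
    with urllib.request.urlopen(target_url) as response:
        # get_content_charset() returns None when the header names no charset
        charset = response.headers.get_content_charset() or default_charset
        for line in decode_lines(response, charset):
            print(line)
```

Stripping the line ending before printing avoids the extra blank line that `print` would otherwise add after each line.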
I’m a newbie to Python and the offhand comment about Python 3 in the accepted solution was confusing. For posterity, the code to do this in Python 3 is
    import urllib.request

    data = urllib.request.urlopen(target_url)
    for line in data:
        ...

Alternatively:

    from urllib.request import urlopen

    data = urlopen(target_url)
Note that just “import urllib” does not work.
The requests library has a simpler interface and works with both Python 2 and 3.
    import requests

    response = requests.get(target_url)
    data = response.text
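To get from `response.text` to line-by-line output, one option is to split the text into lines. A sketch, assuming the third-party requests package is installed; `response_lines` and `print_url_lines` are hypothetical helper names and the 10-second timeout is an arbitrary choice:

```python
import requests


def response_lines(response):
    """Return the body of a requests.Response as a list of lines."""
    response.raise_for_status()  # raises requests.HTTPError on a 4xx/5xx status
    return response.text.splitlines()


def print_url_lines(target_url):
    response = requests.get(target_url, timeout=10)
    for line in response_lines(response):
        print(line)
```

Checking the status first keeps an error page from being printed as if it were the file.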
There’s really no need to read line-by-line. You can get the whole thing like this:
    import urllib

    txt = urllib.urlopen(target_url).read()
    import urllib2

    for line in urllib2.urlopen("http://www.myhost.com/SomeFile.txt"):
        print line

    import urllib2

    f = urllib2.urlopen(target_url)
    for l in f.readlines():
        print l
Another way in Python 3 is to use the urllib3 package.
    import urllib3

    http = urllib3.PoolManager()
    response = http.request('GET', target_url)
    data = response.data.decode('utf-8')
This can be a better option than urllib, since urllib3 boasts:
- Thread safety.
- Connection pooling.
- Client-side SSL/TLS verification.
- File uploads with multipart encoding.
- Helpers for retrying requests and dealing with HTTP redirects.
- Support for gzip and deflate encoding.
- Proxy support for HTTP and SOCKS.
- 100% test coverage.
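As an illustration of the retry helpers and connection pooling listed above, here is a hedged sketch, assuming the third-party urllib3 package is installed. The retry count, backoff factor, timeout, and function names are illustrative choices only:

```python
import urllib3
from urllib3.util.retry import Retry


def body_to_lines(body, encoding="utf-8"):
    """Decode a response body (bytes) and split it into lines."""
    return body.decode(encoding).splitlines()


def print_url_lines(target_url):
    # A PoolManager pools connections and can be reused for many requests.
    retries = Retry(total=3, backoff_factor=0.5)
    http = urllib3.PoolManager(retries=retries)
    response = http.request("GET", target_url, timeout=10.0)
    if response.status != 200:
        raise RuntimeError("unexpected HTTP status %d" % response.status)
    for line in body_to_lines(response.data):
        print(line)
```

In a longer-running program the `PoolManager` would normally be created once and shared, rather than built inside the function on each call.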
For me, none of the above responses worked out of the box. Instead, I had to do the following (Python 3):
    from urllib.request import urlopen

    data = urlopen("[your url goes here]").read().decode('utf-8')
    # Do what you need to do with the data.