Max retries exceeded with URL in requests

Question:

I’m trying to get the content of App Store > Business:

import requests
from lxml import html

page = requests.get("")
tree = html.fromstring(page.text)

flist = []
plist = []
for i in range(0, 100):
    app = tree.xpath("//div[@class='column first']/ul/li/a/@href")
    ap = app[0]
    page1 = requests.get(ap)

When I use range(0, 2) it works, but with a range in the hundreds it shows this error:

Traceback (most recent call last):
  File "/home/preetham/Desktop/", line 17, in <module>
    page1 = requests.get(ap)
  File "/usr/local/lib/python2.7/dist-packages/requests/", line 55, in get
    return request('get', url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/", line 44, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/", line 383, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/", line 486, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/", line 378, in send
    raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='', port=443): Max retries exceeded with url: /in/app/adobe-reader/id469337564?mt=8 (Caused by <class 'socket.gaierror'>: [Errno -2] Name or service not known)

Answer #1:

What happened here is that the iTunes server refused your connection (you're sending too many requests from the same IP address in a short period of time).

Max retries exceeded with url: /in/app/adobe-reader/id469337564?mt=8

The error trace is misleading; it should be something like "No connection could be made because the target machine actively refused it".

There is an open issue about this on the python-requests GitHub tracker; check it out here.

To work around this (it's not so much an issue as a misleading debug trace), you should catch connection-related exceptions, like so:

try:
    page1 = requests.get(ap)
except requests.exceptions.ConnectionError:
    page1 = None  # connection refused; log it, back off, or retry here

Another way to overcome this problem is to leave a big enough time gap between requests to the server. This can be achieved with the sleep(timeinsec) function in Python (don't forget to import sleep):

from time import sleep

All in all, requests is an awesome Python lib; hope that solves your problem.
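Putting the two suggestions together, a minimal sketch might look like the following (the function name, the 2-second delay, and the timeout are my own illustrative choices, not from the question):

```python
import time
import requests

def fetch_with_delay(urls, delay=2.0):
    """Fetch each URL in turn, pausing between requests and
    recording connection failures instead of crashing."""
    results = {}
    for url in urls:
        try:
            resp = requests.get(url, timeout=10)
            results[url] = resp.status_code
        except requests.exceptions.ConnectionError:
            results[url] = "Connection refused"
        time.sleep(delay)  # give the server a gap between requests
    return results
```

The delay keeps the request rate low enough that the server is less likely to start refusing connections in the first place, and the except clause keeps one refused connection from killing the whole loop.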

Answered By: djra

Answer #2:

Just use requests' features:

import requests
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.util.retry import Retry

session = requests.Session()
retry = Retry(connect=3, backoff_factor=0.5)
adapter = HTTPAdapter(max_retries=retry)
session.mount('http://', adapter)
session.mount('https://', adapter)

page = session.get(url)  # url is the page you want to fetch


This will GET the URL and retry 3 times in case of requests.exceptions.ConnectionError. The backoff_factor applies an increasing delay between attempts, which helps avoid failing again when the server enforces a request quota.

Take a look at requests.packages.urllib3.util.retry.Retry; it has many options to simplify retries.
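For reference, a fuller Retry configuration might look like the sketch below. The specific counts and the status_forcelist are illustrative choices, not canonical values, and on current versions the class can be imported from urllib3 directly (requests.packages is a legacy alias):

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry  # direct import; requests.packages is a legacy alias

# Illustrative values -- tune them for your workload.
retry = Retry(
    total=5,                 # overall cap across all error types
    connect=3,               # retries for connection errors
    read=2,                  # retries for read timeouts
    backoff_factor=0.5,      # sleeps roughly 0.5s, 1s, 2s, ... between attempts
    status_forcelist=(429, 500, 502, 503, 504),  # also retry on these HTTP statuses
)

session = requests.Session()
adapter = HTTPAdapter(max_retries=retry)
session.mount("http://", adapter)
session.mount("https://", adapter)
```

Mounting the adapter on both prefixes means every request made through this session, for either scheme, inherits the retry policy.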

Answered By: Zulu

Answer #3:

Just do this,

Paste the following code in place of page = requests.get(url):

import time

page = ''
while page == '':
    try:
        page = requests.get(url)
    except requests.exceptions.ConnectionError:
        print("Connection refused by the server..")
        print("Let me sleep for 5 seconds")
        time.sleep(5)
        print("Was a nice sleep, now let me continue...")
        continue

You’re welcome 🙂

Answered By: jatin

Answer #4:

pip install pyopenssl seemed to solve it for me.

Answered By: Akshar

Answer #5:

I got a similar problem, but the following code worked for me.

url = <some REST url>    
page = requests.get(url, verify=False)

verify=False disables SSL certificate verification. Try/except can be added as usual.
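Note that verify=False also removes protection against man-in-the-middle attacks. When the server uses an internal certificate authority, a safer alternative is to point requests at that CA bundle instead; a minimal sketch (the bundle path below is a hypothetical placeholder):

```python
import requests

session = requests.Session()
# Hypothetical path to the internal CA bundle that signed the server's cert.
session.verify = "/path/to/internal-ca.pem"

# As a last resort, verification can still be disabled per request:
# resp = session.get(url, verify=False)
```

Setting verify on the session applies it to every request made through that session, so you don't have to repeat it per call.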

Answered By: Raj Stha

Answer #6:

It is always good to implement exception handling. It not only helps avoid an unexpected exit of the script but can also help log errors and notifications. When using Python requests I prefer to catch exceptions like this:

try:
    res = requests.get(adress, timeout=30)
except requests.ConnectionError as e:
    print("OOPS!! Connection Error. Make sure you are connected to the Internet. Technical details given below.\n")
    print(str(e))
    renewIPadress()
except requests.Timeout as e:
    print("OOPS!! Timeout Error")
    print(str(e))
except requests.RequestException as e:
    print("OOPS!! General Error")
    print(str(e))
except KeyboardInterrupt:
    print("Someone closed the program")

Here renewIPadress() is a user-defined function which can change the IP address if it gets blocked. You can go without this function.

Answered By: Tanmoy Datta

Answer #7:

Specifying the proxy in a corporate environment solved it for me.

page = requests.get("", proxies={"http": ""})

The full error is:

requests.exceptions.ConnectionError: HTTPSConnectionPool(host='', port=80): Max retries exceeded with url: / (Caused by NewConnectionError(': Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond'))

Answered By: Jeremy Thompson

Answer #8:

Adding my own experience for those who are experiencing this in the future. My specific error was

Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known'

It turns out that this was actually because I had reached the maximum number of open files on my system. It had nothing to do with failed connections, or even a DNS error as indicated.
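If you hit this variant, reusing a single Session and closing each response keeps the number of open sockets and file descriptors bounded, instead of opening a fresh connection per requests.get() call. A minimal sketch (fetch_statuses is my own helper name, not a requests API):

```python
import requests

session = requests.Session()  # one connection pool for the whole program

def fetch_statuses(urls):
    """Fetch each URL through the shared session; closing each response
    returns its connection to the pool instead of leaking a descriptor."""
    codes = []
    for url in urls:
        with session.get(url, stream=True, timeout=10) as resp:
            codes.append(resp.status_code)
    return codes
```

The with block guarantees the response is closed even if processing raises, which is what prevents descriptors from accumulating over a long run.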

Answered By: Oded
