Question:
So I’m trying to log in to Quora using Python and then scrape some content.
I’m using Selenium to log in to the site. Here’s my code:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
driver = webdriver.Firefox()
driver.get('http://www.quora.com/')
username = driver.find_element_by_name('email')
password = driver.find_element_by_name('password')
username.send_keys('email')
password.send_keys('password')
password.send_keys(Keys.RETURN)
driver.close()
Now the questions:
- It took about 4 minutes to find and fill in the login form, which is painfully slow. Is there anything I can do to speed up the process?
- Once it has logged in, how do I make sure there were no errors? In other words, how do I check the response code?
- How do I save cookies with Selenium so I can continue scraping after I log in?
- If there is no way to make Selenium faster, is there an alternative way to log in? (Quora doesn’t have an API.)
Answer #1:
I had a similar problem with very slow find_element_xxx() calls in Python Selenium using ChromeDriver. I eventually traced the trouble to a driver.implicitly_wait() call I had made before the find_element_xxx() calls; when I took it out, the find_element_xxx() calls ran quickly.
Now, I know those elements were present when I made the find_element_xxx() calls, so I cannot see why the implicit wait should have affected the speed of those operations, but it did.
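One way to check whether an implicit wait is responsible is to time the same lookup with different wait settings. The sketch below is only an illustration: time_lookup is a hypothetical helper name, while implicitly_wait() and find_elements() are the regular WebDriver methods. It uses find_elements() because that returns an empty list instead of raising when nothing matches; with a non-zero implicit wait, a lookup that matches nothing blocks for the whole timeout.

```python
import time


def time_lookup(driver, by, value, implicit_wait_seconds):
    """Return (seconds_elapsed, match_count) for a single find_elements call.

    `driver` is assumed to be a Selenium WebDriver; `by` is a locator
    strategy such as "name" (the value of By.NAME) or "css selector".
    """
    driver.implicitly_wait(implicit_wait_seconds)  # applies to later lookups
    start = time.perf_counter()
    matches = driver.find_elements(by, value)      # [] when nothing matches
    return time.perf_counter() - start, len(matches)
```

Calling time_lookup(driver, "name", "email", 0) and then time_lookup(driver, "name", "email", 10) on the same page shows whether the wait setting changes the elapsed time for that locator.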
Answer #2:
- I have been there; Selenium is slow, although usually not so slow that filling in a form takes 4 minutes. I switched to PhantomJS, which is much faster than Firefox because it is headless. After installing the latest PhantomJS, you can simply replace Firefox() with PhantomJS() in the webdriver line.
- To check that you are logged in, you can assert on some element that is displayed only after login.
- As long as you do not quit the driver, the cookies remain available while you follow links.
- You can try using urllib to post directly to the login URL, with cookiejar to save the cookies. You can even save the cookie yourself; after all, a cookie is just a string in an HTTP header.
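The urllib and cookiejar suggestion in the last point might look roughly like the sketch below. Everything site-specific here is an assumption: LOGIN_URL is a placeholder rather than Quora's real endpoint, and the form field names "email" and "password" are guesses based on the question's code. A real login form often also needs hidden fields such as a CSRF token, so this may not work against a given site as-is.

```python
import http.cookiejar
import urllib.parse
import urllib.request

LOGIN_URL = "https://example.com/login"  # hypothetical endpoint, not a real Quora URL


def make_session(cookie_file):
    """Return (opener, jar); the opener stores every cookie it receives in jar."""
    jar = http.cookiejar.MozillaCookieJar(cookie_file)
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def log_in(opener, jar, email, password):
    """POST the credentials, save the cookies to disk, return the HTTP status."""
    # Field names are assumed; inspect the real form to find the right ones.
    form = urllib.parse.urlencode({"email": email, "password": password}).encode()
    with opener.open(LOGIN_URL, data=form, timeout=30) as response:
        status = response.status
    # Keep session cookies too; they are normally discarded on save.
    jar.save(ignore_discard=True, ignore_expires=True)
    return status
```

A later run can call make_session() with the same file and then jar.load(ignore_discard=True, ignore_expires=True) to reuse the saved cookies. Unlike Selenium, this route also exposes the response status code directly.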
Answer #3:
You can speed up form filling by setting the attribute directly with your own setAttribute method. Here is Java code for it:
public void setAttribute(By locator, String attribute, String value) {
    ((JavascriptExecutor) getDriver()).executeScript(
        "arguments[0].setAttribute('" + attribute + "', arguments[1]);",
        getElement(locator),
        value);
}
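Since the question uses Python, the same idea could be sketched with execute_script(), which is the Python counterpart of executeScript(). The helper name set_attribute is hypothetical, and the element is assumed to have been located already. Note that setting the value attribute this way does not fire keyboard or input events, so pages that listen for those may not notice the change.

```python
SET_ATTRIBUTE_JS = "arguments[0].setAttribute(arguments[1], arguments[2]);"


def set_attribute(driver, element, attribute, value):
    """Set an attribute on an already-located element through JavaScript.

    `driver` is assumed to be a Selenium WebDriver and `element` a
    WebElement obtained from it.
    """
    # Extra positional arguments are exposed to the script as
    # arguments[0], arguments[1], ... in the order they are passed.
    return driver.execute_script(SET_ATTRIBUTE_JS, element, attribute, value)
```

For example, set_attribute(driver, username, "value", "email") would fill the field without sending individual keystrokes.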
Answer #4:
On Windows 7 with IEDriver and Python Selenium, closing the Windows command prompt and restarting it cured my issue.
I was having trouble with find_element...click() calls: each one took a little over 30 seconds. Here is the kind of code I was running, including the timing:
import time  # needed for the timing calls below

timeStamp = time.time()
driver.find_element_by_css_selector(clickDown).click()  # click() returns None, so there is nothing to keep
print("1 took:", time.time() - timeStamp)
timeStamp = time.time()
driver.find_element_by_id("cSelect32").click()
print("2 took:", time.time() - timeStamp)
Each click was taking about 31 seconds. After closing the command prompt and restarting it (which also ends any IEDriverServer.exe processes), each click took about 1 second.
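If leftover driver processes were the cause here (an assumption; the answer only shows that restarting the command prompt helped), one precaution is to make sure quit() always runs, even when the script fails partway. The sketch below uses a hypothetical helper named quitting; quit() is the regular WebDriver method that ends the session and shuts down the driver process.

```python
from contextlib import contextmanager


@contextmanager
def quitting(driver):
    """Yield the driver and always call driver.quit() afterwards."""
    try:
        yield driver
    finally:
        # Runs on normal exit and on exceptions alike, so the driver
        # process is not left behind by a crashed script.
        driver.quit()
```

Usage would be along the lines of: with quitting(webdriver.Ie()) as driver: followed by the clicks inside the block.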
Answer #5:
Running the web driver headlessly should improve its execution speed to some degree.
from selenium.webdriver import Firefox
from selenium.webdriver.firefox.options import Options
options = Options()
options.add_argument('-headless')  # run Firefox without a visible window
browser = Firefox(options=options)  # 'options' replaces the deprecated 'firefox_options' keyword
browser.get('https://google.com/')
browser.close()
Answer #6:
I changed the locators and it now runs quickly. I also added cookie handling. See the code below:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.keys import Keys
import pickle
driver = webdriver.Firefox()
driver.get('http://www.quora.com/')
wait = WebDriverWait(driver, 5)
username = wait.until(EC.presence_of_element_located((By.XPATH, '//div[@class="login"]//input[@name="email"]')))
password = wait.until(EC.presence_of_element_located((By.XPATH, '//div[@class="login"]//input[@name="password"]')))
username.send_keys('email')
password.send_keys('password')
password.send_keys(Keys.RETURN)
wait.until(EC.presence_of_element_located((By.XPATH, '//span[text()="Add Question"]'))) # checking that user logged in
with open("cookies.pkl", "wb") as f:
    pickle.dump(driver.get_cookies(), f)  # saving cookies
driver.close()
We have saved cookies and now we will apply them in a new browser:
driver = webdriver.Firefox()
driver.get('http://www.quora.com/')
with open("cookies.pkl", "rb") as f:
    cookies = pickle.load(f)
for cookie in cookies:
    driver.add_cookie(cookie)
driver.get('http://www.quora.com/')
Hope this helps.
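The save and restore steps above can also be wrapped in two small helpers. This sketch swaps pickle for JSON, which is a substitution on my part and not what the answer uses; it works because get_cookies() returns a list of plain dictionaries. The names save_cookies and load_cookies are hypothetical. As in the answer's code, the browser has to be on the site's domain already (via driver.get) before add_cookie() is called.

```python
import json


def save_cookies(driver, path):
    """Write the driver's current cookies to `path` as JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(driver.get_cookies(), f)


def load_cookies(driver, path):
    """Add the cookies stored in `path` to the driver; return how many."""
    with open(path, encoding="utf-8") as f:
        cookies = json.load(f)
    for cookie in cookies:
        driver.add_cookie(cookie)  # driver must already be on the cookie's domain
    return len(cookies)
```

JSON has the side benefit that the saved file is human-readable and loading it cannot execute code, which unpickling an untrusted file can.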