I’m confused about parallel execution in Python using Selenium. There seem to be a few ways to go about it, but some seem out of date.
What is the current way to do parallel execution using Selenium?
There’s a Python module called python-wd-parallel which seems to have some functionality to do this, but it’s from 2013. Is it still useful now?
There’s also concurrent.futures, which seems a lot newer but not so easy to implement. Does anyone have a working example of parallel execution in Selenium?
There’s also using just threading and executors to get the job done, but I feel this will be slower, because it’s not using all the cores and is still running serially.
Use joblib’s Parallel class to do that; it’s a great library for parallel execution.
Let’s say we have a list of URLs named urls and we want to take a screenshot of each one in parallel.
First, let’s import the necessary libraries:
from selenium import webdriver
from joblib import Parallel, delayed
Now let’s define a function that takes a screenshot as base64:
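def take_screenshot(url):
    # Each call starts its own PhantomJS process
    phantom = webdriver.PhantomJS('/path/to/phantomjs')
    try:
        phantom.get(url)
        return phantom.get_screenshot_as_base64()
    finally:
        # quit() ends the browser process; close() only closes the window
        phantom.quit()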
Now, to execute that in parallel, you would do:
screenshots = Parallel(n_jobs=-1)(delayed(take_screenshot)(url) for url in urls)
When this line finishes executing, screenshots will hold the results from all of the processes that ran, in the same order as urls.
Explanation about Parallel
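Parallel(n_jobs=-1) means use all available CPU cores.
delayed(take_screenshot)(url) is joblib’s way of creating the input for the function you are trying to run in parallel: it captures the function and its arguments without calling it, so that Parallel can dispatch the call to a worker.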
More information can be found in the joblib documentation.
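Since the question also asks about concurrent.futures, here is a rough sketch of the same task using the standard library's ThreadPoolExecutor. Threads are usually adequate here because each browser runs in its own process and the Python side mostly waits on it. The names make_driver, screenshots_in_parallel and the driver_factory parameter are made up for this illustration, and headless Chrome is an assumption standing in for PhantomJS.

```python
from concurrent.futures import ThreadPoolExecutor


def make_driver():
    # Imported lazily so this module loads even where Selenium is missing.
    from selenium import webdriver

    # Assumption: headless Chrome is used here in place of PhantomJS.
    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    return webdriver.Chrome(options=options)


def take_screenshot(url, driver_factory=make_driver):
    # One fresh browser per URL, as in the joblib version above.
    driver = driver_factory()
    try:
        driver.get(url)
        return driver.get_screenshot_as_base64()
    finally:
        driver.quit()  # always shut the browser process down


def screenshots_in_parallel(urls, max_workers=4, driver_factory=make_driver):
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as `urls`.
        return list(pool.map(lambda u: take_screenshot(u, driver_factory), urls))
```

Calling screenshots_in_parallel(urls) would then play the same role as the Parallel(...) line. Passing a different driver_factory lets you swap browsers, or substitute a stub when testing without a real browser.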
I created a project to do this and it reuses webdriver instances for better performance:
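The project itself is not reproduced here, so the following is not its code. It is only a minimal sketch of the general idea of reusing driver instances: each worker thread creates one driver on first use and keeps it for every URL it handles. DriverPool, screenshots_with_reuse and the driver_factory parameter are hypothetical names, and headless Chrome is an assumption.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def make_driver():
    # Lazy import; assumption: headless Chrome as the browser.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    return webdriver.Chrome(options=options)


class DriverPool:
    """Hands out one driver per thread and remembers them for cleanup."""

    def __init__(self, driver_factory=make_driver):
        self._factory = driver_factory
        self._local = threading.local()
        self._drivers = []
        self._lock = threading.Lock()

    def get(self):
        driver = getattr(self._local, "driver", None)
        if driver is None:
            # First request from this thread: start a browser and keep it.
            driver = self._factory()
            self._local.driver = driver
            with self._lock:
                self._drivers.append(driver)
        return driver

    def close_all(self):
        with self._lock:
            for driver in self._drivers:
                driver.quit()
            self._drivers.clear()


def screenshots_with_reuse(urls, max_workers=4, driver_factory=make_driver):
    pool = DriverPool(driver_factory)

    def shoot(url):
        driver = pool.get()  # reused across URLs handled by this thread
        driver.get(url)
        return driver.get_screenshot_as_base64()

    try:
        with ThreadPoolExecutor(max_workers=max_workers) as executor:
            return list(executor.map(shoot, urls))
    finally:
        pool.close_all()  # quit every browser that was started
```

With this layout at most max_workers browsers are started no matter how many URLs are processed, which is where the saving over one browser per URL would come from.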