Python parallel execution with selenium

Question :

I’m confused about parallel execution in Python using Selenium. There seem to be a few ways to go about it, but some look out of date.

What is the current way to do parallel execution with Selenium?

There’s a Python module called python-wd-parallel that seems to offer this, but it dates from 2013. Is it still useful now?

There is also concurrent.futures, which seems a lot newer but not so easy to implement. Does anyone have a working example of parallel execution with Selenium?

Another option is plain threading with executors, but I suspect this will be slower, because it doesn’t use all the cores and still runs serially.

Asked By: Ke.


Answer #1:

Use joblib’s Parallel class to do that; joblib is a great library for parallel execution.

Let’s say we have a list of URLs named urls and we want to take a screenshot of each one in parallel.

First, let’s import the necessary libraries:

from selenium import webdriver
from joblib import Parallel, delayed

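Now let’s define a function that loads a URL and returns a screenshot of it as base64. (PhantomJS support was removed in Selenium 4; on current versions, substitute a headless Chrome or Firefox driver.)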

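def take_screenshot(url):
    phantom = webdriver.PhantomJS('/path/to/phantomjs')
    try:
        phantom.get(url)  # load the page before capturing it
        screenshot = phantom.get_screenshot_as_base64()
    finally:
        phantom.quit()  # always shut the browser process down

    return screenshot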

To execute that in parallel, run:

screenshots = Parallel(n_jobs=-1)(delayed(take_screenshot)(url) for url in urls)

When this line finishes executing, screenshots holds the results from all of the processes that ran, in the same order as urls.

Explanation about Parallel

  • Parallel(n_jobs=-1) means use all available CPU cores
  • delayed(function)(input) is joblib’s way of packaging the function and its input so the call can be run later in a worker, in parallel

More information can be found on the joblib docs
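
For comparison, here is a minimal sketch of the same fan-out using concurrent.futures from the standard library, which the question mentions. Selenium calls spend most of their time waiting on the browser process, so a thread pool usually overlaps that waiting well enough even though threads do not spread Python code across cores. The names run_in_parallel and fake_screenshot are hypothetical; fake_screenshot only stands in for take_screenshot so the sketch runs without a browser.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_parallel(task, items, max_workers=4):
    """Call task(item) for every item on a thread pool.

    Results come back in the same order as items.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(task, items))


def fake_screenshot(url):
    # Hypothetical stand-in so the sketch runs without a browser.
    # In practice, pass take_screenshot from above instead.
    return "screenshot-of-" + url


if __name__ == "__main__":
    urls = ["https://example.com/a", "https://example.com/b"]
    print(run_in_parallel(fake_screenshot, urls))
```

With a real browser, the call would be run_in_parallel(take_screenshot, urls). Keep max_workers modest, since each worker starts its own browser process.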

Answered By: bluesummers

Answer #2:

I created a project to do this and it reuses webdriver instances for better performance:
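
The general idea of reusing webdriver instances can be sketched roughly as below. This is not the project's code, only an illustration under assumptions: each worker thread creates one driver on first use and keeps it for every URL it handles, so the browser start-up cost is paid once per thread instead of once per URL. DriverPool and chrome_factory are hypothetical names, and chrome_factory assumes Selenium and a Chrome driver are installed.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def chrome_factory():
    # Assumption: Selenium and a matching Chrome driver are installed.
    from selenium import webdriver
    return webdriver.Chrome()


class DriverPool:
    """Hypothetical helper: one reusable driver per worker thread."""

    def __init__(self, factory=chrome_factory, max_workers=4):
        self._factory = factory
        self._max_workers = max_workers
        self._local = threading.local()   # per-thread storage for the driver
        self._drivers = []                # every driver created, for cleanup
        self._lock = threading.Lock()

    def _driver(self):
        # Create the driver on this thread's first call, then reuse it.
        driver = getattr(self._local, "driver", None)
        if driver is None:
            driver = self._factory()
            self._local.driver = driver
            with self._lock:
                self._drivers.append(driver)
        return driver

    def _screenshot(self, url):
        driver = self._driver()
        driver.get(url)
        return driver.get_screenshot_as_base64()

    def screenshot_all(self, urls):
        """Screenshot every URL; results keep the order of urls."""
        try:
            with ThreadPoolExecutor(max_workers=self._max_workers) as pool:
                return list(pool.map(self._screenshot, urls))
        finally:
            # Quit all browsers once the batch is done, even on error.
            for driver in self._drivers:
                driver.quit()
            self._drivers.clear()
```

Usage would look like DriverPool(max_workers=4).screenshot_all(urls). Passing a different factory lets the same pool drive another browser or a test double.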

Answered By: chrismead
