Why can’t PySpark find py4j.java_gateway?

Question:

I installed Spark, ran the sbt assembly, and can open bin/pyspark with no problem. However, I am running into problems loading the pyspark module in IPython. I’m getting the following error:

In [1]: import pyspark
ImportError                               Traceback (most recent call last)
<ipython-input-1-c15ae3402d12> in <module>()
----> 1 import pyspark

/usr/local/spark/python/pyspark/__init__.py in <module>()
     62 from pyspark.conf import SparkConf
---> 63 from pyspark.context import SparkContext
     64 from pyspark.sql import SQLContext
     65 from pyspark.rdd import RDD

/usr/local/spark/python/pyspark/context.py in <module>()
     28 from pyspark.conf import SparkConf
     29 from pyspark.files import SparkFiles
---> 30 from pyspark.java_gateway import launch_gateway
     31 from pyspark.serializers import PickleSerializer, BatchedSerializer, UTF8Deserializer, \
     32     PairDeserializer, CompressedSerializer

/usr/local/spark/python/pyspark/java_gateway.py in <module>()
     24 from subprocess import Popen, PIPE
     25 from threading import Thread
---> 26 from py4j.java_gateway import java_import, JavaGateway, GatewayClient

ImportError: No module named py4j.java_gateway

Answer #1:

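In my environment (Docker with the image sequenceiq/spark:1.1.0-ubuntu), I ran into this. If you look at the pyspark shell script, you’ll see that it adds a few things to PYTHONPATH, namely $SPARK_HOME/python and the py4j source zip under $SPARK_HOME/python/lib. You need the same entries on your own PYTHONPATH.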

That worked in ipython for me.

Update: as noted in the comments, the name of the py4j zip file changes with each Spark release, so look around for the right name.
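
Because the zip name varies, one option is to look it up rather than hard-code it. A minimal sketch in Python, assuming the usual layout in which the archive sits at $SPARK_HOME/python/lib/py4j-*-src.zip; the helper name add_spark_to_sys_path is hypothetical and not part of Spark:

```python
import glob
import os
import sys


def add_spark_to_sys_path(spark_home):
    """Put Spark's python directory and its bundled py4j zip on sys.path.

    Hypothetical helper. Assumes the py4j archive is located at
    <spark_home>/python/lib/py4j-*-src.zip.
    """
    python_dir = os.path.join(spark_home, "python")
    pattern = os.path.join(python_dir, "lib", "py4j-*-src.zip")
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise FileNotFoundError("no py4j zip matching " + pattern)
    # If several archives match, take the last one in sorted order.
    added = [python_dir, matches[-1]]
    for path in added:
        if path not in sys.path:
            sys.path.insert(0, path)
    return added
```

Calling add_spark_to_sys_path(os.environ["SPARK_HOME"]) before import pyspark should have roughly the same effect as exporting the two PYTHONPATH entries in the shell.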

Answered By: nealmcb

Answer #2:

I solved this problem by adding some paths to .bashrc:

export SPARK_HOME=/home/a141890/apps/spark

After this, ImportError: No module named py4j.java_gateway was never raised again.

Answered By: Anderson

Answer #3:

Install the py4j package with pip:

pip install py4j

I hit this problem with Spark 2.1.1 and Python 2.7.x. I am not sure whether Spark stopped bundling this package in the latest distributions, but installing the py4j module solved the issue for me.
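
To see whether py4j is visible to the interpreter before or after installing it, a quick check can help. A minimal sketch for Python 3 using the standard library's importlib.util.find_spec; the function name module_available is hypothetical:

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module can be located on sys.path."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Report on the two modules involved in the traceback.
    for name in ("py4j", "pyspark"):
        status = "found" if module_available(name) else "MISSING"
        print(name, status)
```

Run it with the same interpreter that IPython uses, since pip may have installed py4j into a different Python environment.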

Answered By: kn_pavan

Answer #4:

In PyCharm, before running the script, make sure you have unzipped the py4j*.zip file, then add a reference to its location in the script:

sys.path.append("path to spark*/python/lib")

It worked for me.
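
As a side note, Python can also import from a zip archive placed on sys.path without unpacking it, which is why the other answers put the py4j zip itself on PYTHONPATH. A small self-contained demonstration with a throwaway archive; the module name demo_mod and the archive name demo-src.zip are made up for illustration:

```python
import os
import sys
import tempfile
import zipfile

# Build a throwaway zip containing one tiny module.
tmp_dir = tempfile.mkdtemp()
zip_path = os.path.join(tmp_dir, "demo-src.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("demo_mod.py", "VALUE = 42\n")

# Add the zip file itself (not its parent directory) to sys.path.
sys.path.append(zip_path)

import demo_mod  # resolved from inside the archive

print(demo_mod.VALUE)
```

The key detail is that the sys.path entry is the archive's own path; adding only the directory that contains the zip is not enough unless the archive has been extracted there.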

Answered By: shubham gorde

Answer #5:

import os
import sys

# Set the path to the Spark installation.
# This is the directory where you built Spark using sbt/sbt assembly.
os.environ['SPARK_HOME'] = "/home/shubham/spark-1.6.2"
# os.environ['SPARK_HOME'] = "/home/jie/d2/spark-0.9.1"

# Append to PYTHONPATH so that pyspark can be found
# sys.path.append("/home/jie/d2/spark-0.9.1/python")

# Now we are ready to import the Spark modules
try:
    from pyspark import SparkContext
    from pyspark import SparkConf
    print("Hey nice")
except ImportError as e:
    print("Error importing Spark Modules", e)

Answered By: shubham gorde

Answer #6:

To set up PySpark with Python 3.8, add the paths below to your bash profile (Mac):

export SPARK_HOME=/Users/<username>/spark-3.0.1-bin-hadoop2.7
export PATH=$PATH:/Users/<username>/spark-3.0.1-bin-hadoop2.7/bin
export PYSPARK_PYTHON=python3
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.10.9-src.zip:$PYTHONPATH

NOTE: Use the py4j path that is actually present in your downloaded Spark package.

Save the updated bash profile: Ctrl + X.

Reload the profile: source ~/.bash_profile
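
If the import still fails after sourcing the profile, one possible cause is a PYTHONPATH entry that does not exist on disk, for example a py4j version that differs from the one shipped with your Spark download. A minimal sketch that lists such entries; missing_path_entries is a hypothetical helper name:

```python
import os


def missing_path_entries(pythonpath):
    """Return the entries of a PYTHONPATH-style string that do not exist on disk."""
    entries = [p for p in pythonpath.split(os.pathsep) if p]
    return [p for p in entries if not os.path.exists(p)]


if __name__ == "__main__":
    # Check whatever PYTHONPATH the current shell exported.
    for entry in missing_path_entries(os.environ.get("PYTHONPATH", "")):
        print("missing:", entry)
```

Any path it prints is worth comparing against the actual contents of $SPARK_HOME/python/lib.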

Answered By: Rohan Harode
