Py4JException: Constructor org.apache.spark.sql.SparkSession([class org.apache.spark.SparkContext, class java.util.HashMap]) does not exist

I am trying to start a Spark session in a Jupyter notebook on an EC2 Linux machine through Visual Studio Code. My code looks like this:

from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("spark_app").getOrCreate()

The error is:

{
    "name": "Py4JError",
    "message": "An error occurred while calling None.org.apache.spark.sql.SparkSession. Trace:
py4j.Py4JException: Constructor org.apache.spark.sql.SparkSession([class org.apache.spark.SparkContext, class java.util.HashMap]) does not exist
	at py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:179)
	at py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:196)
	at py4j.Gateway.invoke(Gateway.java:237)
	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.base/java.lang.Thread.run(Thread.java:829)"
}

The output from running the cell, before I opened the full error in a text editor, looked like this:

Output exceeds the size limit. Open the full output data in a text editor
---------------------------------------------------------------------------
Py4JError                                 Traceback (most recent call last)
/tmp/ipykernel_5260/8684085.py in <module>
      1 from pyspark.sql import SparkSession
----> 2 spark = SparkSession.builder.appName("spark_app").getOrCreate()

~/anaconda3/envs/zupo_env_test1/lib64/python3.7/site-packages/pyspark/sql/session.py in getOrCreate(self)
    270                     # Do not update `SparkConf` for existing `SparkContext`, as it's shared
    271                     # by all sessions.
--> 272                     session = SparkSession(sc, options=self._options)
    273                 else:
    274                     getattr(

~/anaconda3/envs/zupo_env_test1/lib64/python3.7/site-packages/pyspark/sql/session.py in __init__(self, sparkContext, jsparkSession, options)
    305                 )
    306             else:
--> 307                 jsparkSession = self._jvm.SparkSession(self._jsc.sc(), options)
    308         else:
    309             getattr(getattr(self._jvm, "SparkSession$"), "MODULE$").applyModifiableSettings(

~/anaconda3/envs/zupo_env_test1/lib64/python3.7/site-packages/py4j/java_gateway.py in __call__(self, *args)
   1584         answer = self._gateway_client.send_command(command)
   1585         return_value = get_return_value(
-> 1586             answer, self._gateway_client, None, self._fqn)
   1587 
...
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.base/java.lang.Thread.run(Thread.java:829)

I have googled a lot without success. Does anyone know what is going wrong?

I am using an IPython kernel with Python 3.9 installed.

Warnings that appear before the error:

WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/home/ec2-user/spark/spark-3.1.2-bin-hadoop2.7/jars/spark-unsafe_2.12-3.1.2.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
22/07/05 21:06:22 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Answers:

I had the same problem, and I fixed it by installing from pip the same version of pyspark as the Spark installed on the machine. You should check that the two versions you have installed are identical.
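
A minimal sketch for comparing the two versions side by side (assuming pyspark is importable in the notebook's environment and spark-submit is on the PATH):

import subprocess
import pyspark

# Version of the pip-installed pyspark package
print("pyspark (pip):", pyspark.__version__)

# Version of the Spark installation; spark-submit prints its banner to stderr
result = subprocess.run(["spark-submit", "--version"],
                        capture_output=True, text=True)
print(result.stderr)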

The version of Spark installed on your machine does not appear to match the version of your pyspark package: in the traceback, pyspark passes an extra options map to the JVM SparkSession constructor, a constructor signature that older Spark releases (such as the 3.1.2 shown in your warnings) do not have.

Check the version of Spark with the following command:

<path-to-spark-bin>/spark-submit --version
# example output
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.1.3
      /_/

As you can see from the example output, the installed Spark version is 3.1.3, so you need to install the same version of the Python library for Spark (pyspark) by running:

pip install pyspark==<the-version-of-your-spark>
# Example
pip install pyspark==3.1.3
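
After reinstalling, a quick check (assuming you restart the kernel so the new package is picked up) is to print the version and try the session again:

import pyspark
from pyspark.sql import SparkSession

# Should now match the version reported by spark-submit --version
print(pyspark.__version__)

# With matching versions the constructor call succeeds
spark = SparkSession.builder.appName("spark_app").getOrCreate()
print(spark.version)  # version of the underlying Spark installation
spark.stop()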
