$ Pitfalls Encountered Using Zeppelin

$ Environment

Component   Version
zeppelin    0.8.0
CDH         5.15
spark       2.3.0
python      3.6.5

$ PySpark

$ netty version conflict

PySpark paragraphs failed to run, throwing the following NoSuchMethodError:

WARN [2018-12-14 14:14:40,397] ({pool-2-thread-67} NotebookServer.java[afterStatusChange]:2302) - Job 20181204-201952_953985087 is finished, status: ERROR, exception: null, result: %text java.lang.NoSuchMethodError: io.netty.buffer.PooledByteBufAllocator.metric()Lio/netty/buffer/PooledByteBufAllocatorMetric;
            at org.apache.spark.network.util.NettyMemoryMetrics.registerMetrics(NettyMemoryMetrics.java:80)
            at org.apache.spark.network.util.NettyMemoryMetrics.<init>(NettyMemoryMetrics.java:76)
            at org.apache.spark.network.client.TransportClientFactory.<init>(TransportClientFactory.java:109)
            at org.apache.spark.network.TransportContext.createClientFactory(TransportContext.java:99)
            ... // remaining stack frames omitted
            at org.apache.zeppelin.spark.BaseSparkScalaInterpreter.spark2CreateContext(BaseSparkScalaInterpreter.scala:189)

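Roughly speaking, Zeppelin's netty-all jar is older than the one Spark expects: Spark calls PooledByteBufAllocator.metric(), which the older netty does not have. Replacing netty-all-4.0.23.Final.jar in Zeppelin's lib directory with netty-all-4.1.17.Final.jar fixed this error, but then a different exception appeared, a Jackson version conflict.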
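This kind of error comes down to Zeppelin's lib directory and Spark's jar directory shipping different versions of the same library. A minimal Python sketch for spotting such pairs before they show up as runtime errors, assuming jars are named <artifact>-<version>.jar and that you know where Spark's jars live (for Spark 2.x typically a jars directory under the Spark home; adjust for your CDH layout). The helper names jar_versions and find_conflicts are hypothetical:

```python
import re
from pathlib import Path

# Splits names like "netty-all-4.1.17.Final.jar" into ("netty-all", "4.1.17.Final"):
# the artifact ends at the first "-" that is followed by a digit.
JAR_RE = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d[\w.\-]*)\.jar$")


def jar_versions(directory):
    """Map artifact name -> version for every *.jar directly inside directory."""
    versions = {}
    for path in Path(directory).glob("*.jar"):
        match = JAR_RE.match(path.name)
        if match:  # jars without a recognizable version suffix are skipped
            versions[match.group("artifact")] = match.group("version")
    return versions


def find_conflicts(zeppelin_lib, spark_jars):
    """Return {artifact: (zeppelin_version, spark_version)} for artifacts in both dirs with different versions."""
    zeppelin = jar_versions(zeppelin_lib)
    spark = jar_versions(spark_jars)
    return {
        name: (zeppelin[name], spark[name])
        for name in sorted(zeppelin.keys() & spark.keys())
        if zeppelin[name] != spark[name]
    }
```

Call it with your own directories, e.g. find_conflicts("/path/to/zeppelin/lib", "/path/to/spark/jars") (placeholder paths). Filename parsing is only a heuristic: a renamed jar, or the same library published under a different artifact name, will slip past it.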

$ jackson version conflict

The exception was:

com.fasterxml.jackson.databind.JsonMappingException: Incompatible Jackson version: 2.8.11-1
       at com.fasterxml.jackson.module.scala.JacksonModule$class.setupModule(JacksonModule.scala:64)
       at com.fasterxml.jackson.module.scala.DefaultScalaModule.setupModule(DefaultScalaModule.scala:19)
       at com.fasterxml.jackson.databind.ObjectMapper.registerModule(ObjectMapper.java:747)
       at org.apache.spark.util.JsonProtocol$.<init>(JsonProtocol.scala:59)
       at org.apache.spark.util.JsonProtocol$.<clinit>(JsonProtocol.scala)

Zeppelin ships jackson-databind 2.8.11.1; replacing it with 2.6.7.1 solved the problem.
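The message is raised by jackson-module-scala (JacksonModule.setupModule in the trace), which, as far as we understand its setup check, rejects a jackson-databind on a different minor line than the module itself. If the module Spark 2.3 uses is on the 2.6 line (an assumption here), that would explain why 2.8.11.1 fails and 2.6.7.1 works. A simplified Python sketch of that rule; the "same major.minor" condition is our approximation, not the library's exact code, and the function names are hypothetical:

```python
import re


def parse_major_minor(version):
    """Extract (major, minor) from version strings such as '2.8.11.1', '2.8.11-1' or '2.6.7'."""
    match = re.match(r"^(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % (version,))
    return int(match.group(1)), int(match.group(2))


def jackson_compatible(module_scala_version, databind_version):
    """Approximated rule: both artifacts must be on the same major.minor line."""
    return parse_major_minor(module_scala_version) == parse_major_minor(databind_version)
```

With a hypothetical module version of "2.6.7", jackson_compatible("2.6.7", "2.8.11.1") is False and jackson_compatible("2.6.7", "2.6.7.1") is True, matching the fix above. Note that "2.8.11-1" in the exception is just how the 2.8.11.1 release reports itself.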

$ commons-lang3 version conflict

Run the following paragraph:

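%spark.pyspark
# Read a CSV file with a header row into a DataFrame
df = spark.read.format("csv").option("header", "true").load("test.csv")
# show() prints the rows itself and returns None, so it is not wrapped in print()
df.show(5)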

This reads a CSV file into a DataFrame with PySpark; calling df.show() then raised the following exception:

java.io.InvalidClassException: org.apache.commons.lang3.time.FastDateParser; local class incompatible: stream classdesc serialVersionUID = 2, local class serialVersionUID = 3
	at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:621)
	at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1623)
	at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
	... // remaining stack frames omitted
(<class 'py4j.protocol.Py4JJavaError'>, Py4JJavaError('An error occurred while calling o98.showString.\n', JavaObject id=o100), <traceback object at 0x7faeb0e56048>)	

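Roughly speaking, the class FastDateParser was serialized with serialVersionUID 2 but deserialized against a local class with serialVersionUID 3, meaning the two sides loaded different versions of the class. On inspection, Zeppelin ships commons-lang3 3.4 while Spark uses 3.5; replacing Zeppelin's jar with 3.5 resolved the problem.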
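Since the fix hinges on which commons-lang3 each side actually loads, it can help to read the version from inside the jar rather than trusting its filename. A jar is a zip archive, so Python's standard zipfile module can open META-INF/MANIFEST.MF. This sketch assumes the manifest's main section carries Implementation-Version (Apache Commons jars usually do, but not every jar does) and falls back to Bundle-Version; main_manifest_attributes and jar_version are hypothetical names:

```python
import zipfile


def main_manifest_attributes(jar_path):
    """Return the main-section attributes of a jar's META-INF/MANIFEST.MF as a dict."""
    with zipfile.ZipFile(jar_path) as jar:
        text = jar.read("META-INF/MANIFEST.MF").decode("utf-8")
    attrs = {}
    last_key = None
    for line in text.splitlines():
        if not line:
            break  # a blank line ends the main section; per-entry sections follow
        if line.startswith(" ") and last_key is not None:
            attrs[last_key] += line[1:]  # continuation line: drop the single leading space
        else:
            key, _, value = line.partition(":")
            last_key = key.strip()
            attrs[last_key] = value.lstrip()
    return attrs


def jar_version(jar_path):
    """Best-effort version lookup: Implementation-Version, else Bundle-Version, else None."""
    attrs = main_manifest_attributes(jar_path)
    return attrs.get("Implementation-Version") or attrs.get("Bundle-Version")
```

Running jar_version on the commons-lang3 jar in Zeppelin's lib directory and on the one Spark uses should show whether the two still differ after the swap.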

$ Development environment setup

$ Build

# Maven build options for zeppelin 0.9
mvn clean install -DskipTests -Drat.skip=true -Dcheckstyle.skip -pl '!groovy,!angular,!shell,!livy,!hbase,!pig,!jdbc,!file,!flink,!ignite,!kylin,!lens,!cassandra,!elasticsearch,!bigquery,!alluxio,!scio,!neo4j,!sap,!scalding,!java,!beam,!hazelcastjet,!geode' -Pscala-2.11 -Pspark-2.3 -Dhadoop.version=2.6.0-cdh5.15.0
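The -pl list in the command above is easy to get wrong by hand: each !module entry excludes that module from the Maven reactor build, and the single quotes keep an interactive bash from treating ! as history expansion. A small Python sketch that assembles the same command from a list of modules to skip; zeppelin_build_command and to_shell are hypothetical helpers, and the module and profile names are simply those from the command above:

```python
import shlex


def zeppelin_build_command(excluded_modules, profiles, hadoop_version):
    """Build the mvn argument list: skip tests, RAT and Checkstyle, exclude modules via '!name'."""
    args = [
        "mvn", "clean", "install",
        "-DskipTests",
        "-Drat.skip=true",
        "-Dcheckstyle.skip",
        "-pl", ",".join("!" + module for module in excluded_modules),
    ]
    args += ["-P" + profile for profile in profiles]
    args.append("-Dhadoop.version=" + hadoop_version)
    return args


def to_shell(args):
    """Quote each argument for a POSIX shell (works on Python 3.6, unlike shlex.join)."""
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    excluded = ["groovy", "angular", "shell", "livy"]  # shortened; extend with the full list above
    cmd = zeppelin_build_command(excluded, ["scala-2.11", "spark-2.3"], "2.6.0-cdh5.15.0")
    print(to_shell(cmd))
```

shlex.quote wraps the exclusion list in single quotes because it contains !, which reproduces the quoting used in the original command.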

$ Maven Checkstyle plugin

Skip the Checkstyle check:

mvn [goal] -Dcheckstyle.skip

IDE plugin:

  1. Install the CheckStyle IDEA plugin.
  2. Import the project's Checkstyle configuration into the editor.

References:

https://stackoverflow.com/questions/8409074/how-can-i-easily-fix-checkstyle-errors/8417213#8417213

https://stackoverflow.com/questions/35149422/how-to-fix-the-maven-check-style-error/35149647

Last updated: 9/24/2019, 6:01:53 AM