Why the incomplete page of the Spark history UI does not show an application

1. Background:

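While testing the complete and incomplete pages of the Spark history server, we started an on-YARN application with spark-shell. If we only start the shell and run no computation, the application is not visible on the incomplete page. After we exit the shell, it shows up on the complete page. So why does the incomplete page not show it?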

2. The official documentation explains complete and incomplete as follows:

1. The history server displays both completed and incomplete Spark jobs. If an application makes multiple attempts after failures, the failed attempts will be displayed, as well as any ongoing incomplete attempt or the final successful attempt.

2. Incomplete applications are only updated intermittently. The time between updates is defined by the interval between checks for changed files (spark.history.fs.update.interval). On larger clusters the update interval may be set to large values. The way to view a running application is actually to view its own web UI.

3. Applications which exited without registering themselves as completed will be listed as incomplete —even though they are no longer running. This can happen if an application crashes.

4. One way to signal the completion of a Spark job is to stop the Spark Context explicitly (sc.stop()), or in Python using the with SparkContext() as sc: construct to handle the Spark Context setup and tear down.

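In other words:

1. The Spark history server shows both completed and incomplete Spark jobs. If an application is retried after failures, the failed attempts are listed, along with any attempt that is still running or the final successful attempt.

2. Incomplete applications are refreshed only intermittently. The refresh period is the interval between checks for changed files, set by spark.history.fs.update.interval, and on large clusters it may be set to a large value. To watch a running application, use the application's own web UI.

3. Applications that exit without registering themselves as completed are listed as incomplete even though they are no longer running. This can happen when an application crashes.

4. One way to signal that a Spark job has completed is to stop the Spark context explicitly with sc.stop(), or in Python to use the with SparkContext() as sc: construct, which handles setup and teardown of the context.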
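As a minimal sketch of point 4, the two ways of ending an application might look like this in PySpark. It assumes PySpark is installed, that the master is supplied from outside (for example by spark-submit), and that event logging is enabled so the history server has something to read. The function names and the app name "history-demo" are hypothetical and chosen only for illustration.

```python
def run_with_explicit_stop(app_name="history-demo"):
    """Run one small job, then stop the context explicitly."""
    # Imported lazily so this module still loads where PySpark is absent.
    from pyspark import SparkContext

    sc = SparkContext(appName=app_name)
    try:
        # A trivial action so the application actually runs a job.
        return sc.parallelize(range(100)).count()
    finally:
        # Explicit stop, the first option described in point 4.
        sc.stop()


def run_with_context_manager(app_name="history-demo"):
    """Run the same job, letting the with-block tear the context down."""
    from pyspark import SparkContext

    # Leaving the with-block stops the context, even on an exception.
    with SparkContext(appName=app_name) as sc:
        return sc.parallelize(range(100)).count()
```

Either function can be called from a script submitted with spark-submit. The context-manager form is harder to get wrong because the context is stopped on every exit path.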

3. Summary:

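1) If you start a spark-shell and perform no operations (meaning Spark operator calls), the application does not appear on the incomplete page. After you exit, it appears on the complete page.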

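2) An application started from spark-shell or submitted with spark-submit appears on the complete page once sc.stop() has been executed. If sc.stop() is never executed, or the application quits directly without it, the application appears on the incomplete page.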

3) In the HDFS log directory, the files behind the complete and incomplete pages are also named differently:

[ scripts]$ hadoop fs -ls /spark/logs
Found 7 items
-rwxrwx---   3 biztech  supergroup       6092 2020-03-26 16:00 /spark/logs/application_1584437515809_1102.lz4
-rwxrwx---   3 biztech  supergroup       6111 2020-03-26 16:47 /spark/logs/application_1584437515809_1103.lz4
-rwxrwx---   3 bizdevcd supergroup      13931 2020-03-26 16:40 /spark/logs/application_1584437515809_1105_1.lz4
-rwxrwx---   3 biztech  supergroup     300172 2020-03-26 18:40 /spark/logs/application_1584437515809_1106_1.lz4.inprogress
-rwxrwx---   3 bizdevcd supergroup     243646 2020-03-26 18:37 /spark/logs/application_1584437515809_1109_1.lz4.inprogress
-rwxrwx---   3 bizdevcd supergroup      12729 2020-03-26 17:43 /spark/logs/application_1584437515809_1120_1.lz4
-rwxrwx---   3 bizdevcd supergroup      15690 2020-03-26 17:50 /spark/logs/application_1584437515809_1121.lz4
-rwxrwx---   3 biztech  supergroup    2386877 2019-08-21 20:17 /spark/logs/local-1566389515129.lz4

Files with the .inprogress suffix are the log files of applications that have not completed or are still running.
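The suffix rule above can be sketched as a small script that splits a directory listing into completed and incomplete applications. This is only an illustration: the function name classify_event_logs is hypothetical, the sample is a two-line excerpt of the listing shown earlier, and the parsing assumes the eight-column layout of that hadoop fs -ls output.

```python
INPROGRESS_SUFFIX = ".inprogress"


def classify_event_logs(ls_output):
    """Split `hadoop fs -ls` output into (completed, incomplete) log names."""
    completed, incomplete = [], []
    for line in ls_output.splitlines():
        fields = line.split()
        # Entry lines have 8 columns and end with an absolute path;
        # this skips the "Found N items" header and blank lines.
        if len(fields) < 8 or not fields[-1].startswith("/"):
            continue
        name = fields[-1].rsplit("/", 1)[-1]
        if name.endswith(INPROGRESS_SUFFIX):
            # Drop the suffix so both lists use the same naming.
            incomplete.append(name[: -len(INPROGRESS_SUFFIX)])
        else:
            completed.append(name)
    return completed, incomplete


# Two entries excerpted from the listing above.
sample = """\
-rwxrwx---   3 biztech  supergroup       6092 2020-03-26 16:00 /spark/logs/application_1584437515809_1102.lz4
-rwxrwx---   3 biztech  supergroup     300172 2020-03-26 18:40 /spark/logs/application_1584437515809_1106_1.lz4.inprogress
"""

completed, incomplete = classify_event_logs(sample)
print("completed: ", completed)
print("incomplete:", incomplete)
```

In practice the text passed in would be the captured output of hadoop fs -ls on the event log directory.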