Spark / SPARK-14958

Failed task hangs if error is encountered when getting task result

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.0, 2.0.0, 2.1.0
    • Fix Version/s: 2.2.0
    • Component/s: None
    • Labels: None

    Description

      In TaskResultGetter, if we get an error while deserializing the TaskEndReason, scheduler.handleFailedTask is never called, so the TaskScheduler doesn't get a chance to handle the failed task and the task just hangs.

        def enqueueFailedTask(taskSetManager: TaskSetManager, tid: Long, taskState: TaskState,
          serializedData: ByteBuffer) {
          var reason : TaskEndReason = UnknownReason
          try {
            getTaskResultExecutor.execute(new Runnable {
              override def run(): Unit = Utils.logUncaughtExceptions {
                val loader = Utils.getContextOrSparkClassLoader
                try {
                  if (serializedData != null && serializedData.limit() > 0) {
                    reason = serializer.get().deserialize[TaskEndReason](
                      serializedData, loader)
                  }
                } catch {
                  case cnd: ClassNotFoundException =>
                    // Log an error but keep going here -- the task failed, so not catastrophic
                    // if we can't deserialize the reason.
                    logError(
                      "Could not deserialize TaskEndReason: ClassNotFound with classloader " + loader)
                  case ex: Exception => {}
                }
                scheduler.handleFailedTask(taskSetManager, tid, taskState, reason)
              }
            })
          } catch {
            case e: RejectedExecutionException if sparkEnv.isStopped =>
              // ignore it
          }
        }
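
To make the failure mode concrete, here is a minimal standalone sketch (not Spark code). `CatchDemo`, `deserialize`, and `handlerReached` are hypothetical names made up for this illustration; the inner try/catch mirrors the two catch clauses above, and the outer try exists only so the demo itself does not crash.

```scala
object CatchDemo {
  // Hypothetical stand-in for a deserialization call that fails with an Error.
  def deserialize(): String =
    throw new NoClassDefFoundError("com/example/Missing")

  // Returns true only if the statement after the inner try/catch is reached,
  // i.e. the place where scheduler.handleFailedTask(...) would be called.
  def handlerReached(): Boolean = {
    var reached = false
    try {
      try {
        deserialize()
      } catch {
        // Same shape as the catch clauses in enqueueFailedTask.
        case _: ClassNotFoundException => ()
        case _: Exception => ()
      }
      reached = true // stands in for scheduler.handleFailedTask(...)
    } catch {
      // Demo-only guard: NoClassDefFoundError is an Error, so it skips both
      // clauses above and lands here instead.
      case _: NoClassDefFoundError => ()
    }
    reached
  }

  def main(args: Array[String]): Unit =
    println(handlerReached())
}
```

Because NoClassDefFoundError extends Error rather than Exception, neither inner clause matches and the line standing in for handleFailedTask is skipped.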
      

      In my specific case, I got a NoClassDefFoundError, which is an Error rather than an Exception, so neither catch clause matched and the failed task hung forever.
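
One possible way to avoid the hang, sketched here under assumptions rather than taken from any actual patch, is to notify the scheduler from a finally block so the notification happens even when an Error escapes the catch clauses. `FinallyDemo` and `notifyAfter` are hypothetical names; the `notified` flag stands in for the call to scheduler.handleFailedTask, and the outer try is only there to keep the demo from terminating.

```scala
object FinallyDemo {
  // Runs a (possibly failing) deserializer and reports the reason that would
  // be passed on, plus whether the scheduler notification happened.
  def notifyAfter(deserialize: () => String): (String, Boolean) = {
    var reason = "UnknownReason" // default, as in the original snippet
    var notified = false
    try {
      try {
        reason = deserialize()
      } catch {
        // The task already failed, so a bad reason is not catastrophic.
        case _: Exception => ()
      } finally {
        // Runs on success, on Exception, and on Error alike.
        notified = true // stands in for scheduler.handleFailedTask(...)
      }
    } catch {
      // Demo-only guard so the Error does not propagate out of the sketch.
      case _: LinkageError => ()
    }
    (reason, notified)
  }

  def main(args: Array[String]): Unit = {
    println(notifyAfter(() => "ExceptionFailure"))
    println(notifyAfter(() => throw new NoClassDefFoundError("com/example/Missing")))
  }
}
```

With this shape the Error still propagates (and would still be logged by a wrapper such as Utils.logUncaughtExceptions), but the scheduler hears about the failure first, with the reason left at its default.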

    People

      Assignee: Rui Li (lirui)
      Reporter: Rui Li (lirui)