Resource Allocation Errors

Error: Insufficient Resources

Typical warning: "Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources."

What went wrong?

The memory allocated to the driver and executors is less than what the application requests, so the scheduler cannot place the job on any worker.

Solution

Increase the memory allocated to the driver and executors (or reduce what the application requests) so that the workers can satisfy the allocation.
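
As a minimal sketch (the sizing values below are placeholders, not recommendations), the memory settings can be supplied when the session is built; in client mode spark.driver.memory must instead be set before the driver JVM starts, for example via spark-submit --driver-memory.

    import org.apache.spark.sql.SparkSession

    // Hypothetical sizing values; match them to what your workers can actually offer.
    // spark.driver.memory only takes effect if known before the driver JVM starts,
    // so in client mode pass it on the spark-submit command line instead.
    val spark = SparkSession.builder()
      .appName("resource-sizing-example")
      .config("spark.executor.memory", "4g")  // heap per executor
      .config("spark.executor.cores", "2")    // cores per executor
      .config("spark.driver.memory", "2g")    // driver heap
      .getOrCreate()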

Error: OOMKilled (Out of Memory, exit code 137)

What went wrong?

  • Inefficient queries
  • High concurrency
  • Incorrect configuration

Solution

See Komodor's guide to diagnosing and fixing OOMKilled (exit code 137): https://komodor.com/learn/how-to-fix-oomkilled-exit-code-137/
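
A container is OOMKilled when its total memory use, not just the JVM heap, exceeds its limit. The back-of-the-envelope check below uses hypothetical numbers and Spark's default overhead formula (the larger of 384 MB or 10% of executor memory):

    // Rough sizing arithmetic with hypothetical numbers: the whole container is
    // killed (exit code 137), so the heap alone fitting under the limit is not enough.
    val executorMemoryGb = 4.0                                 // spark.executor.memory
    val overheadGb = math.max(0.375, executorMemoryGb * 0.10)  // default: max(384 MB, 10%)
    val totalAskGb = executorMemoryGb + overheadGb             // ~4.4 GB in this example
    // The container / pod memory limit must be at least totalAskGb, otherwise the
    // kernel or kubelet kills the executor process.
    println(f"Each executor needs at least $totalAskGb%.1f GB of container memory")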

Resource Material

https://towardsdatascience.com/6-recommendations-for-optimizing-a-spark-job-5899ec269b4b

https://blog.clairvoyantsoft.com/apache-spark-out-of-memory-issue-b63c7987fff

Memory Usage of Reduce Tasks

Sometimes you will get an OutOfMemoryError not because your RDDs don't fit in memory, but because the working set of one of your tasks was too large. Spark's shuffle operations (sortByKey, groupByKey, reduceByKey, join, etc.) build a hash table within each task to perform the grouping, which can often be large.

The simplest fix here is to increase the level of parallelism so that each task’s input set is smaller.
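
For example, in a hypothetical word count over an assumed HDFS path, an explicit partition count can be passed to the shuffle so that each reduce task handles a smaller slice:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("parallelism-example").getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical input path; the point is the explicit partition count below.
    val pairs = sc.textFile("hdfs:///data/words.txt")
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))

    // With more partitions, each task builds a smaller hash table during the
    // shuffle, at the cost of more (but cheaper) tasks.
    val counts = pairs.reduceByKey(_ + _, 400)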

Error: Driver maxResultSize exceeded

What went wrong?

It is caused by actions such as collect() that send a large volume of data back to the driver, exceeding the spark.driver.maxResultSize limit.

Solutions

  • Increase spark.driver.maxResultSize, e.g. conf.set("spark.driver.maxResultSize", "3g"), or set it to 0 for unlimited (see the sketch after this list)
  • If broadcast joins are the culprit, you can reduce spark.sql.autoBroadcastJoinThreshold to limit the size of broadcast join data
  • Reduce the number of partitions.
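
A sketch of the first option, assuming a hypothetical DataFrame read from Parquet; where possible, also avoid pulling the full result to the driver at all:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("max-result-size-example")
      .config("spark.driver.maxResultSize", "3g")  // or "0" to remove the cap (risky)
      .getOrCreate()

    // Hypothetical input and output paths.
    val df = spark.read.parquet("hdfs:///data/events")

    // Prefer driver-friendly alternatives over a full collect():
    df.limit(100).show()                               // inspect a small sample
    df.write.mode("overwrite").parquet("hdfs:///out")  // keep large results distributed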

Error: Missing output location for shuffle

What went wrong?

This usually means an executor died under memory pressure, so the shuffle output it had written is no longer available to the tasks that need it.

Solution

Either add swap or configure the worker/executor to use less memory, and use the MEMORY_AND_DISK storage level for RDDs that are persisted and reused, so cached partitions can spill to disk instead of competing with task memory.
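
A sketch of the persist part of that fix; the input path and the parsing step are placeholders for whatever dataset is reused across several actions:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.storage.StorageLevel

    val spark = SparkSession.builder().appName("memory-and-disk-example").getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical input path; the map step stands in for any expensive transformation.
    val parsed = sc.textFile("hdfs:///data/raw").map(_.toLowerCase.split(",").toSeq)

    // MEMORY_AND_DISK lets partitions that do not fit in memory spill to local disk
    // instead of being dropped (and recomputed) under memory pressure.
    parsed.persist(StorageLevel.MEMORY_AND_DISK)

    val total = parsed.count()    // first action materializes the cache
    val sample = parsed.take(10)  // later actions reuse the cached partitions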

Error: Executor Lost Failure

What went wrong?

Executors are typically lost because their containers exceeded the allowed memory limit (JVM heap plus off-heap overhead) and were killed by the cluster manager.

Solution

If you're running on YARN, pass --conf spark.yarn.executor.memoryOverhead=600 to spark-submit; if your cluster uses Mesos, try --conf spark.mesos.executor.memoryOverhead=600 instead.
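
The same setting can also be applied when the session is built; note that Spark 2.3 and later use spark.executor.memoryOverhead in place of the YARN-specific name, and 600 MB is simply the value quoted above, not a general recommendation:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("memory-overhead-example")
      // Pre-2.3 YARN name as used in the text; newer versions use spark.executor.memoryOverhead.
      .config("spark.yarn.executor.memoryOverhead", "600")  // value in MB
      .getOrCreate()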