Tags: hadoop, hive, apache-spark, shark-sql

How to make shark/spark clear the cache?


When I run my Shark queries, memory gets hoarded in main memory. This is my top command result:


    Mem:  74237344k total, 70080492k used, 4156852k free,  399544k buffers
    Swap:  4194288k total,      480k used, 4193808k free, 65965904k cached


This doesn't change even if I kill/stop the Shark, Spark, and Hadoop processes. Right now, the only way to clear the cache is to reboot the machine.

Has anyone faced this issue before? Is it a configuration problem or a known issue in Spark/Shark?


Solution

  • To remove all cached data:

    sqlContext.clearCache()
    

    Source: https://spark.apache.org/docs/2.0.1/api/java/org/apache/spark/sql/SQLContext.html

    If you want to remove a specific DataFrame from the cache:

    df.unpersist()
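
  • As a sketch of how these calls fit together, assuming a local PySpark installation (the `spark` session and `df` names below are illustrative, not from the original post; in newer Spark versions `sqlContext.clearCache()` is also exposed as `spark.catalog.clearCache()`):

    ```python
    # Hedged sketch: demonstrates caching a DataFrame, dropping it
    # individually, and clearing the whole in-memory cache.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").appName("cache-demo").getOrCreate()

    df = spark.range(1000)
    df.cache()    # mark the DataFrame for caching
    df.count()    # an action materializes the cache

    print(df.is_cached)          # True while the DataFrame is cached

    df.unpersist(blocking=True)  # drop just this DataFrame; blocking=True
                                 # waits until the blocks are actually freed

    spark.catalog.clearCache()   # or: drop *all* cached tables/DataFrames
    spark.stop()
    ```

    Note that this only releases memory held by Spark's block manager; it does not affect the operating system's page cache shown in `top`.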