The problem is quite simple: every time I run a query on Drill, heap memory usage keeps accumulating. The heap is 7 GB, but it is never freed. Every 15 minutes or so I have to kill Drill and restart it to clear the heap.
-) I am running Apache Drill on a single node. Queries are executed on Drill using the R package 'sergeant', and the target files are usually Parquet files. The current OS is Windows 7 Enterprise.
-) We first build the query using src_drill and then use drl_con to execute it. Separating query building from query execution is an architectural choice, as we want the application to be able to switch between different query engines, such as SQL, Hive, Spark, etc.
library(sergeant)
library(dplyr)     # tbl()
library(magrittr)  # %<>% assignment pipe
# set up the Drill query lazily; collect() is deliberately not called here
ds <- src_drill("localhost")
query <- tbl(ds, "cp.`employee.json`")
# render the lazy table to a SQL string
query %<>% dbplyr::sql_render()
# use a separate Drill connection to execute the rendered query
drl_con <- drill_connection("localhost")
Mapping <- drill_query(drl_con, query, .progress = FALSE)
## # A tibble: 100 x 16
## employee_id full_name first_name last_name position_id position_title store_id department_id birth_date hire_date
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 1 Sheri No… Sheri Nowmer 1 President 0 1 1961-08-26 1994-12-…
## 2 2 Derrick … Derrick Whelply 2 VP Country Ma… 0 1 1915-07-03 1994-12-…
## 3 4 Michael … Michael Spence 2 VP Country Ma… 0 1 1969-06-20 1998-01-…
## 4 5 Maya Gut… Maya Gutierrez 2 VP Country Ma… 0 1 1951-05-10 1998-01-…
## 5 6 Roberta … Roberta Damstra 3 VP Informatio… 0 2 1942-10-08 1994-12-…
## 6 7 Rebecca … Rebecca Kanagaki 4 VP Human Reso… 0 3 1949-03-27 1994-12-…
## 7 8 Kim Brun… Kim Brunner 11 Store Manager 9 11 1922-08-10 1998-01-…
## 8 9 Brenda B… Brenda Blumberg 11 Store Manager 21 11 1979-06-23 1998-01-…
## 9 10 Darren S… Darren Stanz 5 VP Finance 0 5 1949-08-26 1994-12-…
## 10 11 Jonathan… Jonathan Murraiin 11 Store Manager 1 11 1967-06-20 1998-01-…
## # … with 90 more rows, and 6 more variables: salary <chr>, supervisor_id <chr>, education_level <chr>,
## # marital_status <chr>, gender <chr>, management_role <chr>
Ideally, I would expect Drill to garbage-collect heap memory on its own after every query, but that is not happening.
Apache Drill has its own memory manager. In Task Manager, it never appears to release heap memory, but in the background it starts reusing the heap once it is full.
If you are running into memory issues, chances are you are exceeding one of the other memory parameters, such as the total memory allotted to a single query.
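To check which of these is the case, one option is to ask Drill itself rather than rely on Task Manager. The following is only a minimal sketch: it assumes the sys.memory and sys.options system tables are available in your Drill version, and check_drill_memory is a hypothetical helper name, not part of sergeant.

```r
# SQL against Drill's built-in system tables.
# sys.memory reports current and maximum heap/direct memory per Drillbit.
memory_sql <- "SELECT * FROM sys.memory"

# sys.options lists configurable options; the filter keeps the planner
# memory settings, e.g. planner.memory.max_query_memory_per_node.
options_sql <- "SELECT * FROM sys.options WHERE name LIKE 'planner.memory%'"

# Hypothetical helper (not part of sergeant): takes a connection created with
# sergeant::drill_connection() and returns both result sets as a named list.
check_drill_memory <- function(con) {
  list(
    memory  = sergeant::drill_query(con, memory_sql, .progress = FALSE),
    options = sergeant::drill_query(con, options_sql, .progress = FALSE)
  )
}
```

Calling check_drill_memory(drl_con) before and after a batch of queries should show whether the current heap figure falls back after approaching the maximum (the heap is being reused) or whether a query is more likely hitting a per-query limit instead.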
Recycling of heap memory is not something you should worry about. For more details, refer to: https://books.google.com.au/books?id=-Tp7DwAAQBAJ&printsec=frontcover&dq=apache+drill+nook&hl=en&sa=X&ved=0ahUKEwil7LeJuPzkAhXKZSsKHUDoBw4Q6AEIKjAA#v=onepage&q&f=false