I am running Microsoft SQL Server on Ubuntu 16.04.2 LTS in a QEMU VM, with SQL Server Agent installed as well. The VM has 16 GB of RAM and 6 processors assigned. The SQL Server upper memory limit is set to 10 GB.
I have a single 1.2 GB database using the simple recovery model, and a single SQL Agent job that backs up the database.
Problem: the sqlservr process is killed by the OOM killer shortly after the job finishes.
What settings should I be looking at to fix this? I do not see anything in the SQL logs, only the messages in dmesg.
BACKUP JOB:
-- Script 1: Backup specific database
-- 1. Variable declaration
DECLARE @path VARCHAR(500)
DECLARE @name VARCHAR(500)
DECLARE @pathwithname VARCHAR(500)
DECLARE @time DATETIME
DECLARE @year VARCHAR(4)
DECLARE @month VARCHAR(2)
DECLARE @day VARCHAR(2)
DECLARE @hour VARCHAR(2)
DECLARE @minute VARCHAR(2)
DECLARE @second VARCHAR(2)
-- 2. Setting the backup path
SET @path = 'C:\sqldata\SQLBACKUPS\'
-- 3. Getting the time values
SELECT @time = GETDATE()
SELECT @year = (SELECT CONVERT(VARCHAR(4), DATEPART(yy, @time)))
SELECT @month = (SELECT CONVERT(VARCHAR(2), FORMAT(DATEPART(mm,@time),'00')))
SELECT @day = (SELECT CONVERT(VARCHAR(2), FORMAT(DATEPART(dd,@time),'00')))
SELECT @hour = (SELECT CONVERT(VARCHAR(2), FORMAT(DATEPART(hh,@time),'00')))
SELECT @minute = (SELECT CONVERT(VARCHAR(2), FORMAT(DATEPART(mi,@time),'00')))
SELECT @second = (SELECT CONVERT(VARCHAR(2), FORMAT(DATEPART(ss,@time),'00')))
-- 4. Defining the filename format
SELECT @name ='DBNAME' + '_' + @year + @month + @day + @hour + @minute + @second
SET @pathwithname = @path + @name + '.bak'
--5. Executing the backup command
BACKUP DATABASE [DBNAME]
TO DISK = @pathwithname
ERROR MESSAGE in dmesg:
[617521.605059] kthreadd invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=2, oom_score_adj=0
[617521.605060] kthreadd cpuset=/ mems_allowed=0
[617521.605076] CPU: 1 PID: 2 Comm: kthreadd Not tainted 4.8.0-46-generic #49~16.04.1-Ubuntu
[617521.605077] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[617521.605082] 0000000000000286 00000000ac5a0d51 ffff8806ed5dbb00 ffffffffa0e2e073
[617521.605086] ffff8806ed5dbc90 ffff8806ea450ec0 ffff8806ed5dbb68 ffffffffa0c2e97b
[617521.605088] 0000000000000000 ffff8802fb7b8a80 ffff8806ea450ec0 ffff8806ed5dbb58
[617521.605090] Call Trace:
[617521.605117] [<ffffffffa0e2e073>] dump_stack+0x63/0x90
[617521.605130] [<ffffffffa0c2e97b>] dump_header+0x5c/0x1dc
[617521.605143] [<ffffffffa0dbd629>] ? apparmor_capable+0xe9/0x1a0
[617521.605152] [<ffffffffa0ba58d6>] oom_kill_process+0x226/0x3f0
[617521.605154] [<ffffffffa0ba5e4a>] out_of_memory+0x35a/0x3f0
[617521.605156] [<ffffffffa0bab079>] __alloc_pages_slowpath+0x959/0x980
[617521.605157] [<ffffffffa0bab35a>] __alloc_pages_nodemask+0x2ba/0x300
[617521.605166] [<ffffffffa0a80726>] copy_process.part.30+0x146/0x1b50
[617521.605176] [<ffffffffa0a63eee>] ? kvm_sched_clock_read+0x1e/0x30
[617521.605183] [<ffffffffa0aa3ed0>] ? kthread_create_on_node+0x1e0/0x1e0
[617521.605194] [<ffffffffa0a2c78c>] ? __switch_to+0x2dc/0x700
[617521.605196] [<ffffffffa0a82327>] _do_fork+0xe7/0x3f0
[617521.605213] [<ffffffffa1295b17>] ? __schedule+0x307/0x790
[617521.605215] [<ffffffffa0a82659>] kernel_thread+0x29/0x30
[617521.605219] [<ffffffffa0aa48e0>] kthreadd+0x160/0x1b0
[617521.605222] [<ffffffffa129aa1f>] ret_from_fork+0x1f/0x40
[617521.605224] [<ffffffffa0aa4780>] ? kthread_create_on_cpu+0x60/0x60
[617521.605225] Mem-Info:
[617521.605231] active_anon:1075398 inactive_anon:4083 isolated_anon:0
active_file:2616493 inactive_file:328306 isolated_file:160
unevictable:1 dirty:327621 writeback:785 unstable:0
slab_reclaimable:21286 slab_unreclaimable:7420
mapped:10714 shmem:5451 pagetables:6225 bounce:0
free:33879 free_pcp:498 free_cma:0
[617521.605234] Node 0 active_anon:4301592kB inactive_anon:16332kB active_file:10465972kB inactive_file:1313224kB unevictable:4kB isolated(anon):0kB isolated(file):640kB mapped:42856kB dirty:1310484kB writeback:3140kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 3321856kB anon_thp: 21804kB writeback_tmp:0kB unstable:0kB pages_scanned:17790528 all_unreclaimable? yes
[617521.605235] Node 0 DMA free:15900kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:8kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[617521.605238] lowmem_reserve[]: 0 2952 15988 15988 15988
[617521.605240] Node 0 DMA32 free:64576kB min:12464kB low:15580kB high:18696kB active_anon:733012kB inactive_anon:0kB active_file:2107244kB inactive_file:145520kB unevictable:0kB writepending:145520kB present:3129192kB managed:3063624kB mlocked:0kB slab_reclaimable:6992kB slab_unreclaimable:1272kB kernel_stack:1280kB pagetables:2844kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[617521.605243] lowmem_reserve[]: 0 0 13036 13036 13036
[617521.605244] Node 0 Normal free:55040kB min:55048kB low:68808kB high:82568kB active_anon:3568580kB inactive_anon:16332kB active_file:8358728kB inactive_file:1167704kB unevictable:4kB writepending:1168104kB present:13631488kB managed:13352220kB mlocked:4kB slab_reclaimable:78152kB slab_unreclaimable:28400kB kernel_stack:5168kB pagetables:22056kB bounce:0kB free_pcp:1992kB local_pcp:100kB free_cma:0kB
[617521.605264] lowmem_reserve[]: 0 0 0 0 0
[617521.605266] Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15900kB
[617521.605277] Node 0 DMA32: 208*4kB (UE) 148*8kB (UE) 260*16kB (UE) 115*32kB (UME) 121*64kB (UME) 73*128kB (UME) 67*256kB (UME) 22*512kB (UME) 9*1024kB (UME) 0*2048kB 0*4096kB = 64576kB
[617521.605284] Node 0 Normal: 856*4kB (UMEH) 604*8kB (UEH) 278*16kB (UMEH) 373*32kB (UMEH) 185*64kB (UMEH) 53*128kB (UMEH) 14*256kB (UMEH) 6*512kB (UME) 5*1024kB (MH) 0*2048kB 0*4096kB = 55040kB
[617521.605293] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[617521.605294] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[617521.605294] 2950382 total pagecache pages
[617521.605295] 0 pages in swap cache
[617521.605296] Swap cache stats: add 0, delete 0, find 0/0
[617521.605296] Free swap = 0kB
[617521.605297] Total swap = 0kB
[617521.605297] 4194168 pages RAM
[617521.605297] 0 pages HighMem/MovableOnly
[617521.605298] 86230 pages reserved
[617521.605298] 0 pages cma reserved
[617521.605298] 0 pages hwpoisoned
[617521.605299] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
[617521.605304] [ 337] 0 337 10867 3412 25 3 0 0 systemd-journal
[617521.605306] [ 382] 0 382 25742 291 17 3 0 0 lvmetad
[617521.605307] [ 384] 0 384 11276 897 22 3 0 -1000 systemd-udevd
[617521.605308] [ 780] 108 780 90615 2349 78 3 0 0 whoopsie
[617521.605309] [ 789] 106 789 11833 986 27 3 0 -900 dbus-daemon
[617521.605311] [ 803] 0 803 1100 312 7 3 0 0 acpid
[617521.605312] [ 823] 104 823 65138 701 29 3 0 0 rsyslogd
[617521.605313] [ 835] 0 835 129671 2914 40 6 0 0 snapd
[617521.605314] [ 836] 0 836 7137 729 18 3 0 0 systemd-logind
[617521.605315] [ 838] 0 838 7252 644 20 3 0 0 cron
[617521.605316] [ 857] 0 857 84342 1436 65 3 0 0 ModemManager
[617521.605317] [ 965] 0 965 16380 1344 35 3 0 -1000 sshd
[617521.605318] [ 967] 0 967 4884 65 14 3 0 0 irqbalance
[617521.605320] [ 992] 0 992 17496 788 40 3 0 0 login
[617521.605321] [ 1098] 0 1098 74129 1986 47 3 0 0 polkitd
[617521.605322] [ 1116] 120 1116 11105 983 23 3 0 0 ntpd
[617521.605323] [ 1152] 0 1152 71840 2120 136 4 0 0 winbindd
[617521.605324] [ 1153] 0 1153 105122 3484 203 4 0 0 winbindd
[617521.605325] [ 1159] 0 1159 73413 2856 140 4 0 0 winbindd
[617521.605326] [ 1161] 0 1161 71832 1924 135 4 0 0 winbindd
[617521.605327] [ 1163] 0 1163 71832 1295 136 4 0 0 winbindd
[617521.605328] [ 1721] 1000 1721 11312 932 26 3 0 0 systemd
[617521.605329] [ 1722] 1000 1722 16318 466 34 3 0 0 (sd-pam)
[617521.605337] [ 1725] 1000 1725 5613 1066 16 3 0 0 bash
[617521.605338] [ 1789] 0 1789 14274 787 33 3 0 0 sudo
[617521.605339] [ 1790] 0 1790 14109 719 33 3 0 0 su
[617521.605340] [ 1791] 0 1791 5619 1120 17 3 0 0 bash
[617521.605342] [ 1935] 0 1935 60002 1421 114 4 0 0 nmbd
[617521.605343] [ 1948] 0 1948 86040 3924 165 3 0 0 smbd
[617521.605345] [ 1949] 0 1949 82452 1067 155 3 0 0 smbd
[617521.605347] [ 1951] 0 1951 86171 1589 160 3 0 0 smbd
[617521.605349] [19081] 0 19081 87063 4262 167 3 0 0 smbd
[617521.605351] [19253] 0 19253 24889 1458 52 3 0 0 sshd
[617521.605352] [19275] 1000 19275 24889 891 51 3 0 0 sshd
[617521.605354] [19276] 1000 19276 5605 1104 16 3 0 0 bash
[617521.605356] [19307] 0 19307 14274 778 33 3 0 0 sudo
[617521.605357] [19308] 0 19308 14109 737 32 3 0 0 su
[617521.605359] [19309] 0 19309 5618 1184 16 3 0 0 bash
[617521.605360] [16347] 999 16347 18952 4419 40 4 0 0 sqlservr
[617521.605361] [16349] 999 16349 3028846 1043058 2562 26 0 0 sqlservr
[617521.605362] [20193] 0 20193 88057 4618 168 3 0 0 smbd
[617521.605363] [30023] 0 30023 87931 4038 167 3 0 0 smbd
[617521.605364] [ 4801] 0 4801 87627 4088 167 3 0 0 smbd
[617521.605365] [ 5266] 0 5266 68705 2451 66 4 0 0 cups-browsed
[617521.605366] [ 7563] 0 7563 88008 4183 167 3 0 0 smbd
[617521.605368] [10495] 0 10495 88072 4621 168 3 0 0 smbd
[617521.605369] [12342] 0 12342 88008 4292 167 3 0 0 smbd
[617521.605371] [12797] 0 12797 12555 719 30 3 0 0 cron
[617521.605373] [12798] 0 12798 12555 719 30 3 0 0 cron
[617521.605375] [12799] 0 12799 1127 213 8 3 0 0 sh
[617521.605376] [12800] 0 12800 1127 187 7 3 0 0 sh
[617521.605377] [12801] 0 12801 4902 785 15 3 0 0 rsync
[617521.605378] [12802] 0 12802 4732 483 14 3 0 0 rsync
[617521.605379] [12803] 0 12803 3911 690 12 3 0 0 rsync
[617521.605380] [12804] 0 12804 3741 452 11 3 0 0 rsync
[617521.605381] [12805] 0 12805 4878 477 15 3 0 0 rsync
[617521.605382] [12806] 0 12806 3911 515 11 3 0 0 rsync
[617521.605383] Out of memory: Kill process 16349 (sqlservr) score 254 or sacrifice child
[617521.608484] Killed process 16349 (sqlservr) total-vm:12115384kB, anon-rss:4164616kB, file-rss:7616kB, shmem-rss:0kB
[617521.832626] oom_reaper: reaped process 16349 (sqlservr), now anon-rss:0kB, file-rss:236kB, shmem-rss:0kB
You can use sp_configure to limit SQL Server's memory consumption (the 'max server memory (MB)' option) if other processes on the machine are consuming memory and causing it to run out. Alternatively, add swap (though you don't want SQL Server itself to be swapped out) or give the machine more memory.
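As a rough illustration of the sp_configure route, the sketch below only prints the T-SQL that would cap SQL Server's memory. The function name `build_memory_cap_sql` is hypothetical, and any value passed to it is a placeholder rather than a recommendation for this VM.

```shell
#!/bin/sh
# Print the T-SQL statements that cap SQL Server memory at the given
# number of megabytes. 'max server memory (MB)' is an advanced option,
# so 'show advanced options' is enabled first.
# build_memory_cap_sql is a hypothetical helper name.
build_memory_cap_sql() {
    limit_mb="$1"
    cat <<EOF
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', ${limit_mb};
RECONFIGURE;
EOF
}
```

Assuming sqlcmd is installed and an SA login is available, the output could be applied with something like `sqlcmd -S localhost -U SA -Q "$(build_memory_cap_sql 8192)"`. Printing the statements first makes it easy to review them before they touch the server.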
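We can also tune the way the OOM killer handles OOM conditions. To make the SQL Server process (PID 3452 in this example) less likely to be killed by the OOM killer, run as root:
echo -15 > /proc/3452/oom_adj
Note that oom_adj (range -17 to 15) is deprecated on current kernels in favor of /proc/<pid>/oom_score_adj (range -1000 to 1000).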
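The same adjustment can be sketched against the newer oom_score_adj interface. This is a minimal sketch, not a tested recipe for this server: the function name `set_oom_score_adj` and the `PROC_ROOT` override are hypothetical (the override exists only so the function can be tried against a scratch directory), and the score used in the usage note is an arbitrary example.

```shell
#!/bin/sh
# Write an OOM score adjustment for one process.
# Accepted values are -1000 (never kill) through 1000 (kill first).
# PROC_ROOT is a hypothetical override; it defaults to the real /proc.
set_oom_score_adj() {
    pid="$1"
    score="$2"
    root="${PROC_ROOT:-/proc}"
    target="${root}/${pid}/oom_score_adj"

    # Accept only an optionally negative integer.
    case "$score" in
        ''|-|*[!0-9-]*|?*-*)
            echo "score must be an integer" >&2
            return 1
            ;;
    esac

    # Reject values outside the range the kernel accepts.
    if [ "$score" -lt -1000 ] || [ "$score" -gt 1000 ]; then
        echo "score must be between -1000 and 1000" >&2
        return 1
    fi

    # The per-process file only exists while the process is running.
    if [ ! -e "$target" ]; then
        echo "no such process entry: $target" >&2
        return 1
    fi

    echo "$score" > "$target"
}
```

Run as root, `set_oom_score_adj 3452 -500` would make PID 3452 a less attractive target. The value is per process and is lost when the process restarts, so it would need to be reapplied after each restart (or set through the service manager, for example systemd's `OOMScoreAdjust=` directive, if the service is managed that way).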