
MS SQL Server crashing on Linux


I am running Microsoft SQL Server on Ubuntu 16.04.2 LTS in a QEMU VM, with SQL Server Agent installed as well. The VM has 16 GB of RAM and 6 processors assigned. The SQL Server upper memory limit is set to 10 GB.

I have a single 1.2 GB database in the simple recovery model, and a single SQL Agent job that backs up the DB.

Problem: the sqlservr process is killed by the OOM killer shortly after the job finishes.

What settings should I be looking at to fix this? I do not see anything in the SQL logs, only the messages in dmesg.

BACKUP JOB:

-- Script 1: Backup specific database

-- 1. Variable declaration

DECLARE @path VARCHAR(500)
DECLARE @name VARCHAR(500)
DECLARE @pathwithname VARCHAR(500)
DECLARE @time DATETIME
DECLARE @year VARCHAR(4)
DECLARE @month VARCHAR(2)
DECLARE @day VARCHAR(2)
DECLARE @hour VARCHAR(2)
DECLARE @minute VARCHAR(2)
DECLARE @second VARCHAR(2)

-- 2. Setting the backup path

SET @path = 'C:\sqldata\SQLBACKUPS\'

-- 3. Getting the time values

SELECT @time   = GETDATE()
SELECT @year   = (SELECT CONVERT(VARCHAR(4), DATEPART(yy, @time)))
SELECT @month  = (SELECT CONVERT(VARCHAR(2), FORMAT(DATEPART(mm,@time),'00')))
SELECT @day    = (SELECT CONVERT(VARCHAR(2),  FORMAT(DATEPART(dd,@time),'00')))
SELECT @hour   = (SELECT CONVERT(VARCHAR(2), FORMAT(DATEPART(hh,@time),'00')))
SELECT @minute = (SELECT CONVERT(VARCHAR(2), FORMAT(DATEPART(mi,@time),'00')))
SELECT @second = (SELECT CONVERT(VARCHAR(2), FORMAT(DATEPART(ss,@time),'00')))

-- 4. Defining the filename format

SELECT @name ='DBNAME' + '_' + @year + @month + @day + @hour + @minute + @second

SET @pathwithname = @path + @name + '.bak'

-- 5. Executing the backup command

BACKUP DATABASE [DBNAME] TO DISK = @pathwithname

ERROR MESSAGE in dmesg:

[617521.605059] kthreadd invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=2, oom_score_adj=0
[617521.605060] kthreadd cpuset=/ mems_allowed=0
[617521.605076] CPU: 1 PID: 2 Comm: kthreadd Not tainted 4.8.0-46-generic #49~16.04.1-Ubuntu
[617521.605077] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[617521.605082]  0000000000000286 00000000ac5a0d51 ffff8806ed5dbb00 ffffffffa0e2e073
[617521.605086]  ffff8806ed5dbc90 ffff8806ea450ec0 ffff8806ed5dbb68 ffffffffa0c2e97b
[617521.605088]  0000000000000000 ffff8802fb7b8a80 ffff8806ea450ec0 ffff8806ed5dbb58
[617521.605090] Call Trace:
[617521.605117]  [<ffffffffa0e2e073>] dump_stack+0x63/0x90
[617521.605130]  [<ffffffffa0c2e97b>] dump_header+0x5c/0x1dc
[617521.605143]  [<ffffffffa0dbd629>] ? apparmor_capable+0xe9/0x1a0
[617521.605152]  [<ffffffffa0ba58d6>] oom_kill_process+0x226/0x3f0
[617521.605154]  [<ffffffffa0ba5e4a>] out_of_memory+0x35a/0x3f0
[617521.605156]  [<ffffffffa0bab079>] __alloc_pages_slowpath+0x959/0x980
[617521.605157]  [<ffffffffa0bab35a>] __alloc_pages_nodemask+0x2ba/0x300
[617521.605166]  [<ffffffffa0a80726>] copy_process.part.30+0x146/0x1b50
[617521.605176]  [<ffffffffa0a63eee>] ? kvm_sched_clock_read+0x1e/0x30
[617521.605183]  [<ffffffffa0aa3ed0>] ? kthread_create_on_node+0x1e0/0x1e0
[617521.605194]  [<ffffffffa0a2c78c>] ? __switch_to+0x2dc/0x700
[617521.605196]  [<ffffffffa0a82327>] _do_fork+0xe7/0x3f0
[617521.605213]  [<ffffffffa1295b17>] ? __schedule+0x307/0x790
[617521.605215]  [<ffffffffa0a82659>] kernel_thread+0x29/0x30
[617521.605219]  [<ffffffffa0aa48e0>] kthreadd+0x160/0x1b0
[617521.605222]  [<ffffffffa129aa1f>] ret_from_fork+0x1f/0x40
[617521.605224]  [<ffffffffa0aa4780>] ? kthread_create_on_cpu+0x60/0x60
[617521.605225] Mem-Info:
[617521.605231] active_anon:1075398 inactive_anon:4083 isolated_anon:0
             active_file:2616493 inactive_file:328306 isolated_file:160
             unevictable:1 dirty:327621 writeback:785 unstable:0
             slab_reclaimable:21286 slab_unreclaimable:7420
             mapped:10714 shmem:5451 pagetables:6225 bounce:0
             free:33879 free_pcp:498 free_cma:0
[617521.605234] Node 0 active_anon:4301592kB inactive_anon:16332kB active_file:10465972kB inactive_file:1313224kB unevictable:4kB isolated(anon):0kB isolated(file):640kB mapped:42856kB dirty:1310484kB writeback:3140kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 3321856kB anon_thp: 21804kB writeback_tmp:0kB unstable:0kB pages_scanned:17790528 all_unreclaimable? yes
[617521.605235] Node 0 DMA free:15900kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:8kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[617521.605238] lowmem_reserve[]: 0 2952 15988 15988 15988
[617521.605240] Node 0 DMA32 free:64576kB min:12464kB low:15580kB high:18696kB active_anon:733012kB inactive_anon:0kB active_file:2107244kB inactive_file:145520kB unevictable:0kB writepending:145520kB present:3129192kB managed:3063624kB mlocked:0kB slab_reclaimable:6992kB slab_unreclaimable:1272kB kernel_stack:1280kB pagetables:2844kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[617521.605243] lowmem_reserve[]: 0 0 13036 13036 13036
[617521.605244] Node 0 Normal free:55040kB min:55048kB low:68808kB high:82568kB active_anon:3568580kB inactive_anon:16332kB active_file:8358728kB inactive_file:1167704kB unevictable:4kB writepending:1168104kB present:13631488kB managed:13352220kB mlocked:4kB slab_reclaimable:78152kB slab_unreclaimable:28400kB kernel_stack:5168kB pagetables:22056kB bounce:0kB free_pcp:1992kB local_pcp:100kB free_cma:0kB
[617521.605264] lowmem_reserve[]: 0 0 0 0 0
[617521.605266] Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15900kB
[617521.605277] Node 0 DMA32: 208*4kB (UE) 148*8kB (UE) 260*16kB (UE) 115*32kB (UME) 121*64kB (UME) 73*128kB (UME) 67*256kB (UME) 22*512kB (UME) 9*1024kB (UME) 0*2048kB 0*4096kB = 64576kB
[617521.605284] Node 0 Normal: 856*4kB (UMEH) 604*8kB (UEH) 278*16kB (UMEH) 373*32kB (UMEH) 185*64kB (UMEH) 53*128kB (UMEH) 14*256kB (UMEH) 6*512kB (UME) 5*1024kB (MH) 0*2048kB 0*4096kB = 55040kB
[617521.605293] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[617521.605294] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[617521.605294] 2950382 total pagecache pages
[617521.605295] 0 pages in swap cache
[617521.605296] Swap cache stats: add 0, delete 0, find 0/0
[617521.605296] Free swap  = 0kB
[617521.605297] Total swap = 0kB
[617521.605297] 4194168 pages RAM
[617521.605297] 0 pages HighMem/MovableOnly
[617521.605298] 86230 pages reserved
[617521.605298] 0 pages cma reserved
[617521.605298] 0 pages hwpoisoned
[617521.605299] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[617521.605304] [  337]     0   337    10867     3412      25       3        0             0 systemd-journal
[617521.605306] [  382]     0   382    25742      291      17       3        0             0 lvmetad
[617521.605307] [  384]     0   384    11276      897      22       3        0         -1000 systemd-udevd
[617521.605308] [  780]   108   780    90615     2349      78       3        0             0 whoopsie
[617521.605309] [  789]   106   789    11833      986      27       3        0          -900 dbus-daemon
[617521.605311] [  803]     0   803     1100      312       7       3        0             0 acpid
[617521.605312] [  823]   104   823    65138      701      29       3        0             0 rsyslogd
[617521.605313] [  835]     0   835   129671     2914      40       6        0             0 snapd
[617521.605314] [  836]     0   836     7137      729      18       3        0             0 systemd-logind
[617521.605315] [  838]     0   838     7252      644      20       3        0             0 cron
[617521.605316] [  857]     0   857    84342     1436      65       3        0             0 ModemManager
[617521.605317] [  965]     0   965    16380     1344      35       3        0         -1000 sshd
[617521.605318] [  967]     0   967     4884       65      14       3        0             0 irqbalance
[617521.605320] [  992]     0   992    17496      788      40       3        0             0 login
[617521.605321] [ 1098]     0  1098    74129     1986      47       3        0             0 polkitd
[617521.605322] [ 1116]   120  1116    11105      983      23       3        0             0 ntpd
[617521.605323] [ 1152]     0  1152    71840     2120     136       4        0             0 winbindd
[617521.605324] [ 1153]     0  1153   105122     3484     203       4        0             0 winbindd
[617521.605325] [ 1159]     0  1159    73413     2856     140       4        0             0 winbindd
[617521.605326] [ 1161]     0  1161    71832     1924     135       4        0             0 winbindd
[617521.605327] [ 1163]     0  1163    71832     1295     136       4        0             0 winbindd
[617521.605328] [ 1721]  1000  1721    11312      932      26       3        0             0 systemd
[617521.605329] [ 1722]  1000  1722    16318      466      34       3        0             0 (sd-pam)
[617521.605337] [ 1725]  1000  1725     5613     1066      16       3        0             0 bash
[617521.605338] [ 1789]     0  1789    14274      787      33       3        0             0 sudo
[617521.605339] [ 1790]     0  1790    14109      719      33       3        0             0 su
[617521.605340] [ 1791]     0  1791     5619     1120      17       3        0             0 bash
[617521.605342] [ 1935]     0  1935    60002     1421     114       4        0             0 nmbd
[617521.605343] [ 1948]     0  1948    86040     3924     165       3        0             0 smbd
[617521.605345] [ 1949]     0  1949    82452     1067     155       3        0             0 smbd
[617521.605347] [ 1951]     0  1951    86171     1589     160       3        0             0 smbd
[617521.605349] [19081]     0 19081    87063     4262     167       3        0             0 smbd
[617521.605351] [19253]     0 19253    24889     1458      52       3        0             0 sshd
[617521.605352] [19275]  1000 19275    24889      891      51       3        0             0 sshd
[617521.605354] [19276]  1000 19276     5605     1104      16       3        0             0 bash
[617521.605356] [19307]     0 19307    14274      778      33       3        0             0 sudo
[617521.605357] [19308]     0 19308    14109      737      32       3        0             0 su
[617521.605359] [19309]     0 19309     5618     1184      16       3        0             0 bash
[617521.605360] [16347]   999 16347    18952     4419      40       4        0             0 sqlservr
[617521.605361] [16349]   999 16349  3028846  1043058    2562      26        0             0 sqlservr
[617521.605362] [20193]     0 20193    88057     4618     168       3        0             0 smbd
[617521.605363] [30023]     0 30023    87931     4038     167       3        0             0 smbd
[617521.605364] [ 4801]     0  4801    87627     4088     167       3        0             0 smbd
[617521.605365] [ 5266]     0  5266    68705     2451      66       4        0             0 cups-browsed
[617521.605366] [ 7563]     0  7563    88008     4183     167       3        0             0 smbd
[617521.605368] [10495]     0 10495    88072     4621     168       3        0             0 smbd
[617521.605369] [12342]     0 12342    88008     4292     167       3        0             0 smbd
[617521.605371] [12797]     0 12797    12555      719      30       3        0             0 cron
[617521.605373] [12798]     0 12798    12555      719      30       3        0             0 cron
[617521.605375] [12799]     0 12799     1127      213       8       3        0             0 sh
[617521.605376] [12800]     0 12800     1127      187       7       3        0             0 sh
[617521.605377] [12801]     0 12801     4902      785      15       3        0             0 rsync
[617521.605378] [12802]     0 12802     4732      483      14       3        0             0 rsync
[617521.605379] [12803]     0 12803     3911      690      12       3        0             0 rsync
[617521.605380] [12804]     0 12804     3741      452      11       3        0             0 rsync
[617521.605381] [12805]     0 12805     4878      477      15       3        0             0 rsync
[617521.605382] [12806]     0 12806     3911      515      11       3        0             0 rsync
[617521.605383] Out of memory: Kill process 16349 (sqlservr) score 254 or sacrifice child
[617521.608484] Killed process 16349 (sqlservr) total-vm:12115384kB, anon-rss:4164616kB, file-rss:7616kB, shmem-rss:0kB
[617521.832626] oom_reaper: reaped process 16349 (sqlservr), now anon-rss:0kB, file-rss:236kB, shmem-rss:0kB
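
A note on reading the numbers above: the per-process table and the "pages" lines count 4 kB pages, which the report itself confirms (active_anon:1075398 pages is listed as 4301592kB). The sketch below only converts a few of those figures to kB and MiB so they can be compared directly; the function names are made up for illustration.

```shell
#!/bin/sh
# Convert page counts from the OOM report into kB and whole MiB.
# Page size of 4 kB is taken from the report itself
# (active_anon:1075398 pages = 4301592kB).
PAGE_KB=4

pages_to_kb()  { echo $(( $1 * PAGE_KB )); }
pages_to_mib() { echo $(( $1 * PAGE_KB / 1024 )); }

# Figures copied from the dmesg output above.
echo "sqlservr rss (pid 16349): $(pages_to_kb 1043058) kB ($(pages_to_mib 1043058) MiB)"
echo "total pagecache:          $(pages_to_kb 2950382) kB ($(pages_to_mib 2950382) MiB)"
echo "total RAM:                $(pages_to_kb 4194168) kB ($(pages_to_mib 4194168) MiB)"
```

The sqlservr figure comes out to 4172232 kB, which equals the anon-rss plus file-rss reported on the "Killed process" line (4164616 kB + 7616 kB). By the same conversion, the page cache was holding roughly 11.5 GB at the time of the kill, considerably more than sqlservr itself.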

Solution

  • If other processes on the machine are consuming memory and causing it to run out, you can use sp_configure to cap SQL Server's memory consumption (the 'max server memory (MB)' option). Alternatively, add swap (though you don't want SQL Server itself to be swapped out) or add more RAM.

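    We can also tune how the OOM killer treats a given process. To make the SQL Server process (PID 3452 in this example) less likely to be killed by the OOM killer, lower its OOM adjustment: echo -15 > /proc/3452/oom_adj. Note that oom_adj is the legacy interface (range -17 to 15); current kernels prefer /proc/<pid>/oom_score_adj (range -1000 to 1000).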
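
As a rough sketch of the OOM-killer tuning described above, the script below uses the newer oom_score_adj file instead of oom_adj. The helper names (oom_adj_to_score_adj, protect_sqlservr) are hypothetical, and the conversion is meant to mirror the scaling the kernel applies when the legacy file is written (15 maps to 1000, other values are scaled by 1000/17); treat it as an illustration rather than a supported procedure.

```shell
#!/bin/sh
# Map a legacy oom_adj value (-17..15) onto the oom_score_adj scale (-1000..1000).
# Assumption: same scaling as the kernel's legacy oom_adj handling.
oom_adj_to_score_adj() {
    if [ "$1" -eq 15 ]; then
        echo 1000
    else
        echo $(( $1 * 1000 / 17 ))
    fi
}

# Hypothetical helper: apply the adjustment to every running sqlservr process.
# Lowering the score requires root (CAP_SYS_RESOURCE).
protect_sqlservr() {
    score=$(oom_adj_to_score_adj "$1")
    for pid in $(pgrep -x sqlservr); do
        echo "$score" > "/proc/$pid/oom_score_adj"
    done
}

# Example, run as root:
#   protect_sqlservr -15
```

The adjustment applies per process and is lost when sqlservr restarts, so it has to be reapplied after each start; under systemd, an OOMScoreAdjust= setting in a unit override is one way to make it persistent. It also only changes which process the OOM killer picks, not the underlying memory shortage.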