SaltStack (version 3004) has recently been returning a variety of errors on different SLS files without those files having changed recently. Different runs complain about different files, or simply succeed. This happens across our fleet of 20 hosts, not just on one host. We're using salt-call in a masterless context on Ubuntu 20.04 LTS hosts.
The key point is that re-running salt-call usually succeeds without a problem. If it doesn't, the next run will. And maybe the run after that will fail, with nothing having changed in our SLS repository. No law of the universe seems to require these failures before successes; it's more like a random roll of the dice.
Needless to say, looking at the SLS files at the point indicated has so far been fruitless.
Some examples:
[myhost.example.com] sudo: salt-call --local state.highstate
[myhost.example.com] out: sudo password:
[myhost.example.com] out: [CRITICAL] Rendering SLS 'base:dulcia' failed: while parsing a block node
[myhost.example.com] out: did not find expected node content
[myhost.example.com] out: in "<unicode string>", line 148, column 17
[myhost.example.com] out: local:
[myhost.example.com] out: Data failed to compile:
[myhost.example.com] out: ----------
[myhost.example.com] out: Rendering SLS 'base:dulcia' failed: while parsing a block node
[myhost.example.com] out: did not find expected node content
[myhost.example.com] out: in "<unicode string>", line 148, column 17
Another:
[otherhost.example.com] sudo: salt-call --local state.highstate
[otherhost.example.com] out: sudo password:
[otherhost.example.com] out: [CRITICAL] Rendering SLS 'base:dulcia' failed: did not find expected comment or line break
[otherhost.example.com] out: local:
[otherhost.example.com] out: Data failed to compile:
[otherhost.example.com] out: ----------
[otherhost.example.com] out: Rendering SLS 'base:dulcia' failed: did not find expected comment or line break
Yet another:
[host-3.example.com] sudo: salt-call --local state.highstate
[host-3.example.com] out: sudo password:
[host-3.example.com] out: [CRITICAL] Rendering SLS 'base:sftp' failed: while parsing a block node
[host-3.example.com] out: did not find expected node content
[host-3.example.com] out: in "<unicode string>", line 235, column 17
[host-3.example.com] out: local:
[host-3.example.com] out: Data failed to compile:
[host-3.example.com] out: ----------
[host-3.example.com] out: Rendering SLS 'base:sftp' failed: while parsing a block node
[host-3.example.com] out: did not find expected node content
[host-3.example.com] out: in "<unicode string>", line 235, column 17
[host-3.example.com] out:
Or even
[host-3.example.com] sudo: salt-call --local state.highstate
[host-3.example.com] out: sudo password:
[host-3.example.com] out: [CRITICAL] Rendering SLS 'base:sftp' failed: did not find expected alphabetic or numeric character
[host-3.example.com] out: local:
[host-3.example.com] out: Data failed to compile:
[host-3.example.com] out: ----------
[host-3.example.com] out: Rendering SLS 'base:sftp' failed: did not find expected alphabetic or numeric character
[host-3.example.com] out:
I'm at a complete loss on this, and it's more than tricky to debug, because more than half the time it doesn't happen.
So, the good news: all of these errors are YAML rendering errors, not Jinja errors. They could be caused by the Jinja rendering, but the Jinja is finishing its render cycle without throwing an error. Most likely some value is not getting set to what you think it is, or Jinja is not pulling the right value when it should. Maybe a pillar variable that should be set in the masterless config is missing on one run and then set on the next. Or pillar is taking too long to render.
The easiest way to start debugging this is to render the Jinja and validate the resulting YAML. This can be done with slsutil.renderer:
salt-call slsutil.renderer salt://sftp/init.sls default_renderer=jinja
Since the problem is intermittent, you are going to have to keep rendering the states that seem to fail most often, over and over, until they fail. Maybe try after a long pause and before running a highstate.
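The "render over and over until it fails" step can be scripted. Here is a minimal sketch, not a tested recipe: until_fail is a hypothetical helper name, the output path is arbitrary, and it assumes salt-call exits non-zero when the render raises an error. If that doesn't hold on your version, grep the saved output for the error text instead.

```shell
#!/bin/sh
# Hypothetical helper: run a command repeatedly until it fails,
# keeping the output of the failing run for inspection.
# Usage: until_fail MAX_RUNS OUTFILE COMMAND [ARGS...]
until_fail() {
    max_runs=$1
    outfile=$2
    shift 2
    i=1
    while [ "$i" -le "$max_runs" ]; do
        # Capture stdout and stderr; each run overwrites the previous output,
        # so the file holds the failing run when the loop stops early.
        if ! "$@" >"$outfile" 2>&1; then
            echo "run $i failed; output saved to $outfile"
            return 1
        fi
        i=$((i + 1))
    done
    echo "no failure in $max_runs runs"
    return 0
}

# Example (assumed invocation, adjust the state path to the one that fails):
#   until_fail 100 /tmp/sftp-render.out \
#       salt-call --local slsutil.renderer salt://sftp/init.sls default_renderer='jinja|yaml'
```

Note the example passes 'jinja|yaml' rather than jinja alone, on the assumption that you want the YAML step to run as well so a parse error actually makes the command fail. With default_renderer=jinja you get the rendered text back and have to check the YAML yourself.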
Another thing that can help is inserting logging into the Jinja. This can be done with a simple {% do salt["log.info"]('string to log') %}
I find this can be useful in issues where the jinja isn't rendering right.
Also run the highstate with -l debug. It will show the rendering of each YAML file as it goes through, so you can see what is happening and see the errors as they occur.
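Because most runs succeed, it helps to keep the debug output of every run so a failing one can be compared with a good one. A rough sketch under stated assumptions: run_logged is a hypothetical helper, the log directory is arbitrary, and the exit status is assumed to reflect the failure (I'd add --retcode-passthrough to salt-call to make that more likely).

```shell
#!/bin/sh
# Hypothetical helper: run a command, save stdout+stderr to a uniquely named
# log file, mark the file if the command failed, and print the log path.
# Usage: run_logged LOGDIR COMMAND [ARGS...]
run_logged() {
    logdir=$1
    shift
    mkdir -p "$logdir" || return 1
    # mktemp guarantees a unique name even for runs within the same second.
    logfile=$(mktemp "$logdir/run-$(date +%Y%m%dT%H%M%S)-XXXXXX") || return 1
    status=0
    "$@" >"$logfile" 2>&1 || status=$?
    # Tag failing runs so they are easy to pick out of the directory.
    if [ "$status" -ne 0 ]; then
        mv "$logfile" "$logfile.FAILED"
        logfile="$logfile.FAILED"
    fi
    echo "$logfile"
    return "$status"
}

# Example (assumed invocation, run as root or via sudo):
#   run_logged /var/tmp/highstate-logs \
#       salt-call --local --retcode-passthrough -l debug state.highstate
```

Diffing a FAILED log against a successful one from the same host should show where the rendered YAML starts to differ.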