Search code examples
algorithmcomplexity-theorysatsat-solvers

Specialized SAT solver (?)


(Context)

Given two byte arrays of length 16, say L and H, one can define a mapping M from the set of all bytes to itself in the following way.

If 0 <= b < 256 is a byte, let lo(b) denote the lower 4 bits of b and let hi(b) denote the higher 4 bits of b.

Let L[i] (resp. H[i]) denote the i-th byte of L (resp. H). Also let L[i, j] (resp. H[i, j]) denote the j-th bit of the i-th byte of L (resp. H).

With all this, we can define M(b) = L[lo(b)] & H[hi(b)].

Where & is bitwise AND.

If we want M to satisfy a constraint of the form M(b) == m. Then L and H have to be chosen accordingly (if possible). This constraint is equivalent to L[lo(b)] & H[hi(b)] == m. Or more precisely to L[lo(b), j] & H[hi(b), j] == m[j], where m[j] is the j-th bit of m.

In general, any equation of the form X & Y = Z where X, Y, Z are bits (or booleans) is equivalent the following boolean clauses in propositional logic:

~X | ~Y | Z
X | ~Z
Y | ~Z

Where ~X is the negation of X and | is bitwise OR.

The last remaining piece of the problem is the fact that the m bytes should be distinct from each other (assuming more than one constraint). Two bytes are non-equal if at least one of their bits aren't. Two bits X and Y are non-equal iff the following clauses hold:

X | Y
~X | ~Y

Hence this problem can be solved using 3-SAT. I have three question with regards to this:

  1. Is my problem equivalent to 3-SAT, i.e. can an arbitrary 3-SAT problem be reduced to it? Or is it further simplifiable into something less difficult?
  2. If not, do you see an algorithm for solving it efficiently?
  3. If yes, would a "simple" CDCL-based solver suffice? (We're dealing with around 3000 clauses and 300 variables).

I have already tried a basic backtracking solver and it failed to terminate even after multiple hours. I'm writing this after having spent multiple weeks thinking about this, and having failed to come up with a specialised algorithm. I could of course just use an off-the-shelf SAT solver but I'm interested in solving this as efficiently as possible.


Solution

  • Let me do the original problem by hand to illustrate the essence of this problem.

    I’m going to write bytes 0x2C as pairs of nibbles (2, C). If we consider 1-bit outputs, then the functions that we can express are, for every set of nibbles S and set of nibbles T, the indicator function of S × T. The original paper assigns each class a separate bit:

    • {2} × {C} = {(2, C)}
    • {3} × {A} = {(3, A)}
    • {5, 7} × {B, D} = {(5, B), (5, D), (7, B), (7, D)}
    • {0} × {9, A, D} = {(0, 9), (0, A), (0, D)}
    • {2} × {0} = {(2, 0)}.

    I’ve elided the table values because they don’t matter. That points to the main weakness of the SAT formulation: it’s highly symmetric, and basic branching strategies will struggle. Sophisticated SAT solvers can detect symmetry and prune redundant branches, but the logic is complicated, and you really don’t want to implement it yourself for one application.

    I’m going to assume that all of the bytes that don’t fall in any class should map to 0. Then a necessary condition is that every class can be expressed as a product. If there are at most as many classes as output bits, then this condition is also sufficient, and you don’t need a SAT solver. Otherwise, you have a restricted test cover problem. I don’t have an NP-hardness proof, but I would be surprised if it weren’t NP-hard.