I am trying to puzzle out a way to de-obfuscate javascript that looks like this:
https://jsfiddle.net/douglasg14b/4951br9f/2/
var testString = 'Test | String'
var wf6 = {
    fq4: 'su',
    k8d: 'bs',
    l8z: 'tri',
    cy1: 'ng',
    t5j: 'te',
    ol: 'stS',
    x3q: 'tri',
    l9x: 'ng',
    gh: 'xO'
};
//Obfuscated
let test1 = testString[wf6.fq4 + wf6.k8d + wf6.l8z + wf6.cy1](4,11);
//Normal
let test2 = testString.substring(4,11);
let test3;
//More complex obfuscation
(function moreComplex(){
    let h = "i",
        w = "nde",
        T0 = "f",
        hj = '|',
        a = eval(wf6.t5j + wf6.ol + wf6.x3q + wf6.l9x).length;
    //Obfuscated
    test3 = testString[wf6.fq4 + wf6.k8d + wf6.l8z + wf6.cy1](testString[h + w + wf6.gh + T0](hj), a);
    //Normal
    let test4 = testString.substring(testString.indexOf('|'), testString.length);
})();
$('.span1').text(test1);
$('.span2').text(test3);
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<span class="span1"></span><br>
<span class="span2"></span>
This is a small example; the file I'm working with is ~60k lines long and is full of this kind of obfuscation. Everywhere a string can be used as a property name, this kind of obfuscation is used.
The way I can think of doing this is to evaluate all the string concatenations so they are turned into a readable equivalent. However, I am not sure how to go about that while leaving alone all the other working code that sits between the concatenations.
Thoughts?
Bonus question: Is there a commonly used name for this kind of obfuscation that might make searches a bit easier?
Edit: Added a more complex example.
You have the basic idea right: you have to partially evaluate the program and precompute all of its constant computations. In your case, the constant computations of main interest are the concatenation steps over values that don't change.
To do this, you need a program transformation system (PTS). This is a tool that reads and parses source code for a specified language, builds an abstract syntax tree (AST), lets you specify transformations and analyses over the AST, runs them, and then prints the modified AST back out as source code.
In your case, you want a PTS that either knows JavaScript out of the box (rare) or accepts a description of JavaScript and can then parse JavaScript (more typical), in which case you hope you can build or obtain a JavaScript description easily. [I build a PTS that has JavaScript descriptions available; see my bio.]
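With that in hand, you need to:
1. Parse all of the JavaScript into ASTs.
2. Resolve every identifier to its declaration (build symbol tables), so you know which "wf6" a given reference means.
3. Run a data-flow analysis to find the variables, and object properties such as wf6.fq4, that are assigned a constant value and never modified afterwards.
4. Replace each reference to such a variable or property with its constant value.
5. Evaluate every operator whose operands are now all constants (e.g., 'su' + 'bs' becomes 'subs'), and repeat steps 3-5 until nothing changes.
6. Prettyprint the modified ASTs back out as JavaScript source.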
The above process is called "constant propagation" in the compiler literature (the step that evaluates operators over constant operands is called "constant folding"), and many compilers implement it.
In your case, you could restrict the constant folding to string concatenation alone. However, once you have adequate machinery for constant propagation, folding all or most operators over constants isn't much harder. You may need that to undo other obfuscations involving constants, since that seems to be the obfuscation style used in the code you are working on.
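To make the propagate-then-fold idea concrete, here is a minimal sketch in JavaScript. It works on ESTree-shaped AST objects (the node format produced by parsers such as acorn or esprima), built by hand here so the example is self-contained; in practice a parser would produce them from the real file. The helper names rewrite, collectStringTables and propagateAndFold are hypothetical, and the sketch ignores scoping and shadowing and simply assumes that each table object such as wf6 is never modified after its declaration.

```javascript
// Rebuild the tree bottom-up, applying `rule` to every node after its children,
// so folded children are already literals when their parent is examined.
function rewrite(node, rule) {
  if (Array.isArray(node)) return node.map((child) => rewrite(child, rule));
  if (node === null || typeof node !== 'object') return node;
  const copy = {};
  for (const [key, child] of Object.entries(node)) copy[key] = rewrite(child, rule);
  return rule(copy);
}

// Find `var name = { key: 'str', ... }` declarations whose values are all string literals.
function collectStringTables(node, tables = new Map()) {
  if (Array.isArray(node)) {
    node.forEach((child) => collectStringTables(child, tables));
    return tables;
  }
  if (node === null || typeof node !== 'object') return tables;
  if (node.type === 'VariableDeclarator' && node.id.type === 'Identifier' &&
      node.init && node.init.type === 'ObjectExpression' &&
      node.init.properties.every((p) => p.type === 'Property' && !p.computed &&
        p.key.type === 'Identifier' && p.value.type === 'Literal' &&
        typeof p.value.value === 'string')) {
    tables.set(node.id.name,
      new Map(node.init.properties.map((p) => [p.key.name, p.value.value])));
  }
  for (const child of Object.values(node)) collectStringTables(child, tables);
  return tables;
}

// Replace table reads with their values, then fold string concatenations.
function propagateAndFold(ast) {
  const tables = collectStringTables(ast);
  return rewrite(ast, (node) => {
    // wf6.fq4  ->  'su'
    if (node.type === 'MemberExpression' && !node.computed &&
        node.object.type === 'Identifier' && tables.has(node.object.name) &&
        tables.get(node.object.name).has(node.property.name)) {
      return { type: 'Literal', value: tables.get(node.object.name).get(node.property.name) };
    }
    // 'su' + 'bs'  ->  'subs'
    if (node.type === 'BinaryExpression' && node.operator === '+' &&
        node.left.type === 'Literal' && typeof node.left.value === 'string' &&
        node.right.type === 'Literal' && typeof node.right.value === 'string') {
      return { type: 'Literal', value: node.left.value + node.right.value };
    }
    return node;
  });
}

// Hand-built AST for:
//   var wf6 = { fq4: 'su', k8d: 'bs', l8z: 'tri', cy1: 'ng' };
//   testString[wf6.fq4 + wf6.k8d + wf6.l8z + wf6.cy1](4, 11);
const lit = (value) => ({ type: 'Literal', value });
const id = (name) => ({ type: 'Identifier', name });
const prop = (key, value) => ({ type: 'Property', key: id(key), value: lit(value), computed: false, kind: 'init' });
const read = (key) => ({ type: 'MemberExpression', object: id('wf6'), property: id(key), computed: false });
const plus = (left, right) => ({ type: 'BinaryExpression', operator: '+', left, right });

const program = {
  type: 'Program',
  body: [
    { type: 'VariableDeclaration', kind: 'var', declarations: [{
      type: 'VariableDeclarator', id: id('wf6'),
      init: { type: 'ObjectExpression',
              properties: [prop('fq4', 'su'), prop('k8d', 'bs'), prop('l8z', 'tri'), prop('cy1', 'ng')] },
    }] },
    { type: 'ExpressionStatement', expression: {
      type: 'CallExpression',
      callee: { type: 'MemberExpression', object: id('testString'), computed: true,
                property: plus(plus(plus(read('fq4'), read('k8d')), read('l8z')), read('cy1')) },
      arguments: [lit(4), lit(11)],
    } },
  ],
};

const folded = propagateAndFold(program);
console.log(folded.body[1].expression.callee.property); // { type: 'Literal', value: 'substring' }
```

Because the rewrite is bottom-up, the left-nested chain of "+" nodes collapses one step at a time ('subs', then 'substri', then 'substring') in a single pass.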
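Extending the fold from string concatenation to other operators might look like the following minimal sketch, again over hand-built ESTree-style nodes. The operator tables and the names rewrite, toLiteral and foldConstants are illustrative assumptions. Since the target language is JavaScript, the sketch simply evaluates each operator in the transformer itself, which reuses JavaScript's own coercion rules; it refuses to fold results that can't be written back as a single literal (NaN, Infinity, negative numbers).

```javascript
// Bottom-up rebuild, applying `rule` to each node after its children.
function rewrite(node, rule) {
  if (Array.isArray(node)) return node.map((child) => rewrite(child, rule));
  if (node === null || typeof node !== 'object') return node;
  const copy = {};
  for (const [key, child] of Object.entries(node)) copy[key] = rewrite(child, rule);
  return rule(copy);
}

// Operators that, on primitive operands, depend only on the operand values and
// cannot run user code. `in`, `instanceof`, `delete` and `void` are left out.
const BINARY = {
  '+': (a, b) => a + b, '-': (a, b) => a - b, '*': (a, b) => a * b,
  '/': (a, b) => a / b, '%': (a, b) => a % b, '**': (a, b) => a ** b,
  '==': (a, b) => a == b, '!=': (a, b) => a != b,
  '===': (a, b) => a === b, '!==': (a, b) => a !== b,
  '<': (a, b) => a < b, '<=': (a, b) => a <= b, '>': (a, b) => a > b, '>=': (a, b) => a >= b,
  '&': (a, b) => a & b, '|': (a, b) => a | b, '^': (a, b) => a ^ b,
  '<<': (a, b) => a << b, '>>': (a, b) => a >> b, '>>>': (a, b) => a >>> b,
};
const UNARY = { '!': (a) => !a, '~': (a) => ~a, '+': (a) => +a, '-': (a) => -a, typeof: (a) => typeof a };

// Operands must be primitive literals (no regex literals, no BigInt).
const isConstLiteral = (n) => n.type === 'Literal' &&
  (n.value === null || ['string', 'number', 'boolean'].includes(typeof n.value));

// Only fold results that can be written back as one Literal node: strings,
// booleans, null and finite non-negative numbers other than -0. Parsers represent
// -1 as a unary minus applied to 1, so negative results, NaN and Infinity stay unfolded.
function toLiteral(value) {
  const numberOk = typeof value === 'number' && Number.isFinite(value) &&
    value >= 0 && !Object.is(value, -0);
  if (value === null || typeof value === 'string' || typeof value === 'boolean' || numberOk) {
    return { type: 'Literal', value };
  }
  return null;
}

function foldConstants(ast) {
  return rewrite(ast, (node) => {
    let result = null;
    if (node.type === 'BinaryExpression' && node.operator in BINARY &&
        isConstLiteral(node.left) && isConstLiteral(node.right)) {
      result = toLiteral(BINARY[node.operator](node.left.value, node.right.value));
    } else if (node.type === 'UnaryExpression' && node.operator in UNARY &&
               isConstLiteral(node.argument)) {
      result = toLiteral(UNARY[node.operator](node.argument.value));
    }
    return result || node;
  });
}

const lit = (value) => ({ type: 'Literal', value });
const bin = (operator, left, right) => ({ type: 'BinaryExpression', operator, left, right });

// (2 * 3 + 1) % 4 === 3  folds all the way down to the literal true
const arith = foldConstants(bin('===', bin('%', bin('+', bin('*', lit(2), lit(3)), lit(1)), lit(4)), lit(3)));
// 'ab' + 1  folds to 'ab1' (JavaScript's own string/number coercion)
const mixed = foldConstants(bin('+', lit('ab'), lit(1)));
// 1 - 2  stays a BinaryExpression, because -1 is not emitted as a single Literal
const negative = foldConstants(bin('-', lit(1), lit(2)));
console.log(arith.value, mixed.value, negative.type);
```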
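You'll need a special rule that transforms
obj['name'](args)
into
obj.name(args)
as a final step. The rule only applies when the string is a legal identifier name; obj['foo-bar'](args) has to stay as it is.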
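A sketch of that final rule, assuming ESTree-style nodes (hand-built here; a parser such as acorn produces the same shape). The regular expression approximates ECMAScript's IdentifierName using Unicode property escapes and ignores \u escape sequences. The names dotify and print are hypothetical, and print handles only the few node types in the example.

```javascript
// Bottom-up rebuild, applying `rule` to each node after its children.
function rewrite(node, rule) {
  if (Array.isArray(node)) return node.map((child) => rewrite(child, rule));
  if (node === null || typeof node !== 'object') return node;
  const copy = {};
  for (const [key, child] of Object.entries(node)) copy[key] = rewrite(child, rule);
  return rule(copy);
}

// Approximates ECMAScript's IdentifierName (ignoring \u escapes): an ID_Start
// character, `$` or `_`, then ID_Continue characters, `$`, ZWNJ or ZWJ.
// Reserved words pass too, which is fine: ES5+ allows obj.class.
const IDENTIFIER_NAME = /^[\p{ID_Start}$_][\p{ID_Continue}$\u200C\u200D]*$/u;

// obj['name']  ->  obj.name, but only when 'name' is a valid IdentifierName.
function dotify(node) {
  if (node.type === 'MemberExpression' && node.computed &&
      node.property.type === 'Literal' && typeof node.property.value === 'string' &&
      IDENTIFIER_NAME.test(node.property.value)) {
    return { ...node, computed: false, property: { type: 'Identifier', name: node.property.value } };
  }
  return node;
}

// Printer for just the node types used below; a real PTS prettyprints every node type.
function print(node) {
  switch (node.type) {
    case 'Identifier': return node.name;
    // JSON.stringify is close enough to a JavaScript string literal for this demo.
    case 'Literal': return JSON.stringify(node.value);
    case 'MemberExpression':
      return node.computed
        ? `${print(node.object)}[${print(node.property)}]`
        : `${print(node.object)}.${print(node.property)}`;
    case 'CallExpression':
      return `${print(node.callee)}(${node.arguments.map(print).join(', ')})`;
    default:
      throw new Error(`print: unsupported node type ${node.type}`);
  }
}

const lit = (value) => ({ type: 'Literal', value });
const id = (name) => ({ type: 'Identifier', name });
// Builds obj['name'](...args)
const call = (object, name, args) => ({
  type: 'CallExpression',
  callee: { type: 'MemberExpression', object: id(object), property: lit(name), computed: true },
  arguments: args.map(lit),
});

const fixed = print(rewrite(call('testString', 'substring', [4, 11]), dotify));
const kept = print(rewrite(call('obj', 'foo-bar', [1]), dotify));
console.log(fixed); // testString.substring(4, 11)
console.log(kept);  // obj["foo-bar"](1)
```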
Another complication is knowing that you have all of the JavaScript relevant to producing constant-valued variables. A single web page may include many chunks of JavaScript, and you need all of them to demonstrate that nothing has side effects on a given variable. I assume in your case you are sure you have it all.
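One conservative way to approach this, sketched below under stated assumptions: walk the ASTs of every script on the page and reject a table such as wf6 if it is declared more than once, written through any of a few common write forms, or used in any way other than as the object of a member read (passing it to a function, for example, could let that function modify it). The helper isReadOnlyTable is hypothetical; it ignores scoping and shadowing, does not see through eval or with, and does not detect writes through destructuring or for-in/for-of targets, so a true result means "no obvious writes", not a proof.

```javascript
// Hypothetical, conservative check that `name` behaves like a read-only constant
// table across ALL the supplied program ASTs (one per script on the page).
// Assumptions: no shadowing of `name`, no eval/with. Writes detected:
// `name = ...` (as a non-member use), `name.p = ...` / `name[k] op= ...`,
// `name.p++` / `--name.p`, and `delete name.p`. Writes through destructuring
// patterns or for-in/for-of targets are NOT detected.
function isReadOnlyTable(programs, name) {
  let declarations = 0;
  let safe = true;
  const isTableMember = (n) => n.type === 'MemberExpression' &&
    n.object.type === 'Identifier' && n.object.name === name;

  function visit(node, parent, key) {
    if (Array.isArray(node)) {
      node.forEach((child) => visit(child, parent, key));
      return;
    }
    if (node === null || typeof node !== 'object') return;

    if (node.type === 'AssignmentExpression' && isTableMember(node.left)) safe = false;
    if (node.type === 'UpdateExpression' && isTableMember(node.argument)) safe = false;
    if (node.type === 'UnaryExpression' && node.operator === 'delete' &&
        isTableMember(node.argument)) safe = false;

    if (node.type === 'Identifier' && node.name === name) {
      const isDeclaration = parent.type === 'VariableDeclarator' && key === 'id';
      const isMemberObject = parent.type === 'MemberExpression' && key === 'object';
      // `foo.wf6` and `{ wf6: ... }` mention the name without referring to the variable.
      const notAReference = !parent.computed &&
        ((parent.type === 'MemberExpression' && key === 'property') ||
         (parent.type === 'Property' && key === 'key'));
      if (isDeclaration) declarations++;
      // Any other use (argument, alias, reassignment...) could let the table be mutated.
      else if (!isMemberObject && !notAReference) safe = false;
    }
    for (const [childKey, child] of Object.entries(node)) visit(child, node, childKey);
  }

  programs.forEach((program) => visit(program, null, null));
  return safe && declarations === 1;
}

const id = (name) => ({ type: 'Identifier', name });
const lit = (value) => ({ type: 'Literal', value });
const read = (obj, key) => ({ type: 'MemberExpression', object: id(obj), property: id(key), computed: false });

// Script 1:  var wf6 = { fq4: 'su' };  f(wf6.fq4);
const script1 = { type: 'Program', body: [
  { type: 'VariableDeclaration', kind: 'var', declarations: [{ type: 'VariableDeclarator', id: id('wf6'),
    init: { type: 'ObjectExpression', properties: [
      { type: 'Property', key: id('fq4'), value: lit('su'), computed: false, kind: 'init' }] } }] },
  { type: 'ExpressionStatement',
    expression: { type: 'CallExpression', callee: id('f'), arguments: [read('wf6', 'fq4')] } },
] };
// Script 2, included elsewhere on the same page:  wf6.fq4 = 'xx';
const script2 = { type: 'Program', body: [
  { type: 'ExpressionStatement', expression: { type: 'AssignmentExpression', operator: '=',
    left: read('wf6', 'fq4'), right: lit('xx') } },
] };

const safeAlone = isReadOnlyTable([script1], 'wf6');
const safeTogether = isReadOnlyTable([script1, script2], 'wf6');
console.log(safeAlone, safeTogether); // true false
```

The usage shows exactly the risk described above: looking at the first script alone, wf6.fq4 looks constant, but a second script on the same page overwrites it.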
With respect to producing known-constant values, you may also have to worry about a tricky case: an expression that produces a constant value from non-constant operands. Imagine the obfuscated expression was:
x = Math.random(); // produces a value in [0, 1)
one = x + (1 - x); // not constant by constant propagation, but constant by algebraic relations
testString['ts'[one] + 'vu'[one] + 'bz'[one - 1] + ...](4, 11)
You can see it always computes 'substring' as the property name. You can add a transformation rule that understands the trick used to compute "one", i.e., one rule for each algebraic trick used to compute known constants. Unfortunately for you, there are infinitely many algebraic identities one can use to manufacture constants; how many are actually used in your code? [Welcome to the problem of reverse engineering against a smart adversary.]
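A rule for the specific trick above might look like this minimal sketch (hypothetical name makeOneRule, ESTree-style nodes built by hand). Note that x + (1 - x) = 1 is an identity over the real numbers; for JavaScript's doubles it does not hold for every x, so the sketch assumes some separate range analysis (not shown) has already established that x lies in [0, 1), as values from Math.random() do. It also matches only this one exact shape, which illustrates the point above: every variant needs its own rule.

```javascript
// Bottom-up rebuild, applying `rule` to each node after its children.
function rewrite(node, rule) {
  if (Array.isArray(node)) return node.map((child) => rewrite(child, rule));
  if (node === null || typeof node !== 'object') return node;
  const copy = {};
  for (const [key, child] of Object.entries(node)) copy[key] = rewrite(child, rule);
  return rule(copy);
}

// Hypothetical rule for one algebraic trick:  X + (1 - X)  ->  1.
// The identity holds over the reals but not over IEEE-754 doubles in general
// (x = 1e20 gives 1e20 + (1 - 1e20) === 0), so the rule only fires for names a
// separate range analysis (assumed, not shown) has proven to lie in [0, 1), such
// as values assigned from Math.random(). Only this exact shape is matched;
// (1 - X) + X, X - X + 1, etc. would each need their own rule.
function makeOneRule(unitIntervalVars) {
  const isUnitVar = (n) => n.type === 'Identifier' && unitIntervalVars.has(n.name);
  return (node) => {
    if (node.type === 'BinaryExpression' && node.operator === '+' && isUnitVar(node.left)) {
      const r = node.right;
      if (r.type === 'BinaryExpression' && r.operator === '-' &&
          r.left.type === 'Literal' && r.left.value === 1 &&
          r.right.type === 'Identifier' && r.right.name === node.left.name) {
        return { type: 'Literal', value: 1 };
      }
    }
    return node;
  };
}

const id = (name) => ({ type: 'Identifier', name });
const lit = (value) => ({ type: 'Literal', value });
const bin = (operator, left, right) => ({ type: 'BinaryExpression', operator, left, right });

// one = x + (1 - x)
const oneExpr = bin('+', id('x'), bin('-', lit(1), id('x')));
const withRange = rewrite(oneExpr, makeOneRule(new Set(['x'])));  // becomes the Literal 1
const withoutRange = rewrite(oneExpr, makeOneRule(new Set()));    // left as a BinaryExpression
console.log(withRange.value, withoutRange.type);
```

Once "one" has been rewritten to 1, ordinary constant propagation can carry it into 'ts'[one] and the other index expressions.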
Nope, none of this is "easy". Presumably that's why this obfuscation method was chosen.