Tags: javascript, node.js, google-closure-compiler, callstack, deobfuscation

Get a deobfuscated TypeScript call stack from obfuscated JavaScript code


Problem:

I have log files from a server that contain the call stack of the thrown error that triggered the creation of each log file. The server application is written in TypeScript for Node.js, but it gets transpiled to JavaScript, and the JavaScript code is then obfuscated with the Google Closure Compiler. As a result, my call stacks are rather hard to interpret. I want to change that by deobfuscating the JS call stack using the source map created by the Closure Compiler, and then, again using a source map, "untranspiling" the JS call stack into a TypeScript call stack.

My limitations

I have access to the source maps, the source code (TS and JS), and the obfuscated code, but I can't change the code itself, so I'm stuck with the current call stacks. I also have access to all options and to the tool that obfuscates the code, so I could store additional information in a file (information that is not present in the source map), such as additional mappings.

Ideas and Attempts

My first attempt was to simply interpret the source maps and use that information to deobfuscate the call stack (deobfuscating is the hard part). But while trying to understand how the Closure Compiler (CC) creates its source maps, I ran into a problem: the CC doesn't just map one name to another, because it reuses certain names multiple times (such as a or f). So there might be a function with anonymous or nested functions in which the name f is used several times but means something different in each context, depending on the scope.

My next idea was to simply trust the call stack. To understand what I mean, you have to understand (if I understood it correctly) how the CC creates and manages the mappings:

                return method.call(thisObj, args[0], args[1]);

This line is obfuscated to the following (I kept the leading whitespace to make the column indexes easier to follow):

        return f.call(d, a[0], a[1]);

Several mappings are created for this single line. A single mapping looks like this:

export interface MappingItem {
    source: string;
    generatedLine: number;
    generatedColumn: number;
    originalLine: number;
    originalColumn: number;
    name: string | null;
}

The only important pieces of information in such a mapping are the columns and the name. Some mappings contain a name, others don't. Those without a name are used to build some sort of scope around those that have one, to mark where a name (or replaced name) starts and ends (by index).

An example of this logic using the two statements above:

Generated   │   Original    │   Name            │   Scope
0           │       16      │   null            │   ━━━┓
15          │       23      │   method          │   x  │
16          │       23      │   call            │   x  │
21          │       23      │   null            │   ━┓ │
22          │       35      │   thisObj         │   x│ │
23          │       23      │   null            │   ━┛ │
25          │       44      │   args            │   x  │
26          │       44      │   null            │   ━┓ │
27          │       49      │   null            │   ?│ │
28          │       44      │   null            │   ━┛ │
29          │       23      │   null            │   ━━┓│
31          │       53      │   args            │   x ││
32          │       53      │   null            │   ━┓││
33          │       58      │   null            │   ?│││
34          │       53      │   null            │   ━┛││
35          │       23      │   null            │   ━━┛│
36          │       16      │   null            │   ━━━┛
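
For reference, a common way to read such a list is that each mapping covers the generated text from its own column up to the next mapping, so a lookup just finds the last mapping at or before the requested column. The following is a minimal sketch of that lookup, not the Closure Compiler's own logic. The helper name `findMapping`, the file name `invoke.ts`, and the line number 1 are made up for the example; the columns and names are the first rows of the table above.

```typescript
// Same shape as the MappingItem interface above, repeated so the sketch is self-contained.
interface MappingItem {
  source: string;
  generatedLine: number;
  generatedColumn: number;
  originalLine: number;
  originalColumn: number;
  name: string | null;
}

// Hypothetical helper: returns the last mapping on `line` that starts at or
// before `column`. Assumes `mappings` is sorted by generatedLine, then generatedColumn.
function findMapping(mappings: MappingItem[], line: number, column: number): MappingItem | null {
  let lo = 0;
  let hi = mappings.length - 1;
  let best: MappingItem | null = null;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const m = mappings[mid];
    const atOrBefore =
      m.generatedLine < line || (m.generatedLine === line && m.generatedColumn <= column);
    if (atOrBefore) {
      best = m;
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  // A mapping from an earlier line does not cover this position.
  return best !== null && best.generatedLine === line ? best : null;
}

// First rows of the table above; source name and line numbers are placeholders.
const rows: MappingItem[] = [
  { source: "invoke.ts", generatedLine: 1, generatedColumn: 0, originalLine: 1, originalColumn: 16, name: null },
  { source: "invoke.ts", generatedLine: 1, generatedColumn: 15, originalLine: 1, originalColumn: 23, name: "method" },
  { source: "invoke.ts", generatedLine: 1, generatedColumn: 16, originalLine: 1, originalColumn: 23, name: "call" },
  { source: "invoke.ts", generatedLine: 1, generatedColumn: 21, originalLine: 1, originalColumn: 23, name: null },
];

const hit = findMapping(rows, 1, 15);
console.log(hit?.name, hit?.originalColumn);
```

Read this way, the entries without a name need no special handling: they are simply returned with `name` set to `null`.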

Using this call stack, I want to resolve every frame from application.js, which contains all the transpiled and obfuscated JS code. The rest is irrelevant:

at do2 (c:\Users\me\test\js\test.js:14:11)
at do1 (c:\Users\me\test\js\test.js:11:5)
at Server.<anonymous> (c:\Users\me\test\js\test.js:6:5)
at f (c:\Users\me\build\transpiled\obfuscated\application.js:235:18)
at Object.a.safeInvoke (c:\Users\me\build\transpiled\obfuscated\application.js:285:27)
at Server.g.getWrappedListener (c:\Users\me\build\transpiled\obfuscated\application.js:3313:17)
at emitTwo (events.js:106:13)
at Server.emit (events.js:191:7)
at HTTPParser.parserOnIncoming [as onIncoming] (_http_server.js:546:12)
at HTTPParser.parserOnHeadersComplete (_http_common.js:99:23)
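
Before any source map lookup, the frames have to be pulled out of the text. A minimal sketch of that step, assuming the V8 frame format shown above ("at name (file:line:column)" or "at file:line:column"); `RawFrame`, `FRAME_RE`, and `parseFrame` are names invented for this example:

```typescript
interface RawFrame {
  functionName: string | null;
  file: string;
  line: number;   // 1-based, as printed in the stack
  column: number; // 1-based, as printed in the stack
}

// Matches "at name (file:line:column)" and "at file:line:column".
const FRAME_RE = /^\s*at (?:(.+?) \()?(.+?):(\d+):(\d+)\)?\s*$/;

function parseFrame(text: string): RawFrame | null {
  const m = FRAME_RE.exec(text);
  if (m === null) return null;
  return {
    functionName: m[1] ?? null,
    file: m[2],
    line: Number(m[3]),
    column: Number(m[4]),
  };
}

const stackLines = [
  "at do2 (c:\\Users\\me\\test\\js\\test.js:14:11)",
  "at f (c:\\Users\\me\\build\\transpiled\\obfuscated\\application.js:235:18)",
  "at Object.a.safeInvoke (c:\\Users\\me\\build\\transpiled\\obfuscated\\application.js:285:27)",
  "at emitTwo (events.js:106:13)",
];

// Keep only the frames that point into the obfuscated bundle.
const toResolve = stackLines
  .map((text) => parseFrame(text))
  .filter((f): f is RawFrame => f !== null && f.file.endsWith("application.js"));

for (const f of toResolve) {
  // The source-map package, for example, expects 1-based lines and 0-based
  // columns, so the printed column is shifted by one before a lookup.
  console.log(f.functionName, f.line, f.column - 1);
}
```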

Using the info from the source map, it is easy to get the original line and column, but the names are harder. I first tried to do it without information from the code, by reading the previous position (line and column) to infer the name for the next line of the call stack.

So to resolve f, I would look at where it was called (285:27) and look that position up in the source map, where I would find its name. But this process always requires knowing where the function was called, and that's the problem: if the function had been stored in a variable, or had been anonymous, or something similar, the approach breaks down.

f.call(d, a[0], a[1]);

I also noticed that certain methods, like call in this context, don't show up in the call stack, which is another problem. So for now I can at least resolve names when I know where they were called and when they appear in the call stack. But I don't want a half solution like this.
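
The caller-based lookup described above can be sketched roughly like this. Each frame's location comes from its own position, while its name is taken from the mapping at the next frame's position (the call site). All names here (`RawFrame`, `OriginalPosition`, `Lookup`, `resolveFrames`) and the values in the demo table are hypothetical; the sketch has exactly the limitation mentioned above, since it falls back to the obfuscated name whenever the call site has no named mapping.

```typescript
interface RawFrame {
  functionName: string | null; // obfuscated name as printed in the stack
  line: number;
  column: number;
}

interface OriginalPosition {
  source: string;
  line: number;
  column: number;
  name: string | null;
}

// Hypothetical signature: any source map lookup can be plugged in here.
type Lookup = (line: number, column: number) => OriginalPosition | null;

// `frames` is ordered like a V8 stack: innermost call first.
function resolveFrames(frames: RawFrame[], lookup: Lookup): string[] {
  return frames.map((frame, i) => {
    const here = lookup(frame.line, frame.column);
    const caller = i + 1 < frames.length ? frames[i + 1] : null;
    const callSite = caller !== null ? lookup(caller.line, caller.column) : null;
    // Prefer the original name found at the call site, else keep the obfuscated one.
    const resolvedName = callSite?.name ?? frame.functionName ?? "<anonymous>";
    return here !== null
      ? `at ${resolvedName} (${here.source}:${here.line}:${here.column})`
      : `at ${resolvedName} (unmapped)`;
  });
}

// Made-up lookup results for two frames of the stack above.
const demoTable: Record<string, OriginalPosition> = {
  "235:18": { source: "invoke.ts", line: 40, column: 8, name: null },
  "285:27": { source: "invoke.ts", line: 90, column: 12, name: "method" },
};
const demoLookup: Lookup = (line, column) => demoTable[`${line}:${column}`] ?? null;

const demoFrames: RawFrame[] = [
  { functionName: "f", line: 235, column: 18 },
  { functionName: "Object.a.safeInvoke", line: 285, column: 27 },
];

const resolved = resolveFrames(demoFrames, demoLookup);
console.log(resolved.join("\n"));
```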

My second attempt was to use a promising JavaScript module I found: stacktrace-js

This module is made for browser JS, though, and has poor TypeScript documentation/typings, although it is clearly written in TypeScript. As a result, there is next to no support for reading files locally, because files are always fetched with XMLHttpRequests. There are some workarounds for that part, but the module is so complex (probably because it is transpiled code) that other parts don't support local files either. It's just too much to rewrite or change to make it work properly with Node.js.

Do you know a cleaner way of doing this with the module? I also thought of using a source code parser to get more context to support the source maps (for those vicious .call methods). Maybe I could write my own source code parser if there were documentation of all the exceptions I have to watch out for when parsing and interpreting the code. Maybe there is another way to do this that I have overlooked.


Solution

  • Composed Source Maps

    First off, ensure you have a fully composed source map. You mention two tools that generate source maps, the typescript compiler and closure compiler. Was closure-compiler provided input source maps? If so, it would output a source map that mentions the original files. If not, you'll have twice the work cut out for you. It's possible to use the source-map package to compose source maps after the fact.
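
To make "composing" concrete, here is a hand-rolled sketch over already-decoded mappings, reusing the shape of the MappingItem interface from the question. It is not how the source-map package does it internally; `lookup`, `compose`, the file name `src/invoke.ts`, and all numbers in the demo are invented for illustration. For every token in the Closure Compiler map, it looks up where the transpiled JS position came from in the TypeScript map and emits a direct obfuscated-to-TypeScript mapping.

```typescript
interface MappingItem {
  source: string;
  generatedLine: number;
  generatedColumn: number;
  originalLine: number;
  originalColumn: number;
  name: string | null;
}

// Last mapping at or before (line, column) on the same generated line.
// Linear scan for brevity; a real tool would use a sorted index.
function lookup(mappings: MappingItem[], line: number, column: number): MappingItem | null {
  let best: MappingItem | null = null;
  for (const m of mappings) {
    if (
      m.generatedLine === line &&
      m.generatedColumn <= column &&
      (best === null || m.generatedColumn > best.generatedColumn)
    ) {
      best = m;
    }
  }
  return best;
}

// outer: obfuscated JS -> transpiled JS (Closure Compiler's map)
// inner: transpiled JS -> TypeScript    (TypeScript compiler's map)
function compose(outer: MappingItem[], inner: MappingItem[]): MappingItem[] {
  const result: MappingItem[] = [];
  for (const o of outer) {
    const i = lookup(inner, o.originalLine, o.originalColumn);
    if (i === null) continue; // no TypeScript position known for this token
    result.push({
      source: i.source,
      generatedLine: o.generatedLine,
      generatedColumn: o.generatedColumn,
      originalLine: i.originalLine,
      originalColumn: i.originalColumn,
      name: i.name ?? o.name,
    });
  }
  return result;
}

// Made-up example: one token in each map.
const closureMap: MappingItem[] = [
  { source: "application.js", generatedLine: 235, generatedColumn: 17, originalLine: 412, originalColumn: 4, name: "method" },
];
const tsMap: MappingItem[] = [
  { source: "src/invoke.ts", generatedLine: 412, generatedColumn: 4, originalLine: 98, originalColumn: 8, name: null },
];

const composed = compose(closureMap, tsMap);
console.log(composed[0]);
```

With the source-map package, `SourceMapGenerator.fromSourceMap` followed by `applySourceMap` is, as far as I know, the intended route for doing this on real map files.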

    Properly Understand Source Maps

    It's clear from your original question that you don't fully understand a source map. For instance, entries without a name are often language semantics. For example:

    document.createElement('div')
    

    A source map could contain mappings for document and createElement, but also for the . and ( characters. There is no scope involved here.
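
Whether an entry carries a name is visible in the raw format. In a version 3 source map, each segment of the `mappings` string is a list of Base64 VLQ numbers: generated column, source index, original line, original column, and optionally a name index (all relative to the previous segment). A segment with four fields simply has no name attached. The decoder below is a minimal sketch for poking at segments by hand; `decodeSegment` is a name chosen for this example.

```typescript
const BASE64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Decodes one comma-separated segment of the "mappings" field into its numeric fields.
function decodeSegment(segment: string): number[] {
  const fields: number[] = [];
  let value = 0;
  let shift = 0;
  for (let i = 0; i < segment.length; i++) {
    const digit = BASE64.indexOf(segment[i]);
    if (digit < 0) throw new Error("invalid base64 character: " + segment[i]);
    value += (digit & 31) << shift; // the low 5 bits carry data
    if ((digit & 32) !== 0) {
      shift += 5; // continuation bit set: more digits follow
    } else {
      const negative = (value & 1) === 1; // lowest bit of the value is the sign
      const magnitude = value >> 1;
      fields.push(negative ? -magnitude : magnitude);
      value = 0;
      shift = 0;
    }
  }
  return fields;
}

// Four fields: a position without a name.
console.log(decodeSegment("IAAM"));
// Five fields: the last one is the (relative) index into the "names" array.
console.log(decodeSegment("AAgBC"));
```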

    Visualization Tools

    There are multiple visualization tools that can assist here. Some of my favorites are:

    The idea here is that you load the source maps and sources in the tool, then click around to see how things mapped. It takes a bit of poking around but you should be able to find the line and column in the original source that matches the line/col information in the stack trace.

    Automate The Process

    Tools like https://sentry.io/ exist for a reason. They will automatically de-obfuscate a call stack for you.