Preemptive Protection for JavaScript (Early Preview)

Overview

PPJS is a tool that protects JavaScript files by transforming them, using several different code transformation and injection techniques.

Supported Platforms

This implementation is built as an npm package with TypeScript and supports all platforms where Node.js runs.

Installing PPJS

Download the ppjs-cli-{version}.tgz package (where {version} is the version you received) and put it outside your project's directory; the best place is next to your project's directory. Then install it with your package manager:

npm install <package-directory>/ppjs-cli-{version}.tgz --save-dev

OR

yarn add file:<package-directory>/ppjs-cli-{version}.tgz --dev

Note: If you work in a team, the downloaded package must be in the same location on every developer's machine, because the package manager records the path of the .tgz file in package.json.

Updating:

If you receive a new version of ppjs-cli as a ppjs-cli-{newVersion}.tgz file, place it outside your project's directory and run the install command again with the new package. It replaces the old version:

npm install <package-directory>/ppjs-cli-{newVersion}.tgz --save-dev

OR

yarn add file:<package-directory>/ppjs-cli-{newVersion}.tgz --dev

Installing from a URL:

You can upload the ppjs package to an online file store (e.g. Blob Storage) or to your own package registry, which makes it easier for your team members to install the package. For a file store, use the package URL in the install command; for your own registry, use the registry's own installation method:

npm install https://<package-url> --save-dev

OR

yarn add https://<package-url> --dev

Installing Globally:

ppjs-cli also works as a system-wide (global) CLI tool.

npm install <package-directory>/ppjs-cli-{version}.tgz -g

OR

yarn global add <package-absolute-directory>/ppjs-cli-{version}.tgz

Note: Due to Yarn limitations you must specify the absolute path of the package to be able to install it globally.

Note: If you have not done so already, you must add the output of the yarn global bin command to your PATH to be able to run global Yarn packages.

Note: The Yarn Team considers global installation of a package as a bad practice in most cases.

After that, you can use the ppjs command anywhere on your system, just as you would use it as a local dev package.

Using the PPJS Webpack Plugin

As an alternative to the PPJS command line tool, you can include the ppjs-webpack-plugin in your webpack pipeline to protect your bundle with PPJS before distributing it to your users.

Installation:

The installation procedure is the same as for the ppjs-cli package, except that you install the ppjs-webpack-plugin-{version}.tgz package.

Note: The plugin declares ppjs-cli as a peer dependency, so you must also install the ppjs-cli package yourself to make the webpack plugin work.

Usage:

For detailed usage and configuration see the plugin's README file.

Using the PPJS command line tool

The PPJS command line tool has two mandatory arguments: the input JavaScript source file name (with optional path), and the output file name for the protected version. For example:

ppjs myJsBundle.js myProtectedScript.js

In this mode, the tool uses the default protection settings.

You can also run the tool with a configuration file that contains specific protection settings:

ppjs myJsBundle.js myProtectedScript.js -c myConfig.json
ppjs myJsBundle.js myProtectedScript.js --config myConfig.json

Note: The -c and --config command line options are identical.

Instead of using a configuration file, you can apply settings from the command line. For example, any of the following commands run the tool with string literal extraction and encoding turned on:

ppjs myJsBundle.js myProtectedScript.js -e
ppjs myJsBundle.js myProtectedScript.js --extstr

Note: The -e and --extstr command line options are identical.

When using both command line options and the configuration file, the command line options override the settings read from the file. For example, the following command turns on string literal extraction, even if this setting is turned off in the configuration file:

ppjs myJsBundle.js myProtectedScript.js -c myConfig.json --extstr

The command line also provides options that turn settings off. The next example uses the identical -E and --extstroff options to turn off string literal extraction, even if the configuration file turns it on:

ppjs myJsBundle.js myProtectedScript.js -c myConfig.json -E
ppjs myJsBundle.js myProtectedScript.js -c myConfig.json --extstroff

In the future, PPJS will allow declaring protection settings right in the source code using ECMAScript/TypeScript annotations. The -a (or --anns) command line option tells the tool to consider these annotations.

Annotations override the tool configuration. For example, consider these equivalent commands:

ppjs myJsBundle.js myProtectedScript.js -c myConfig.json -E -a
ppjs myJsBundle.js myProtectedScript.js -c myConfig.json --extstroff --anns

In this case, PPJS determines the final set of protection settings in these steps:

  1. The initial set is read from myConfig.json.
  2. The -E (or --extstroff) option overrides the settings collected in the previous step.
  3. The annotations in the source code override the settings calculated in the steps above.
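The precedence rules above amount to a layered merge in which each later layer wins. The following JavaScript is a minimal sketch rather than PPJS's code: it assumes settings can be modeled as plain objects keyed by the configuration file's transform names, treats null as "not specified" (as the configuration file does), and uses a hypothetical resolveSettings function. Annotations appear as a third layer even though they are a future feature.

```javascript
// Hypothetical model: each layer maps a transform name (as used in the
// configuration file) to its setting. Layers are ordered from lowest to
// highest precedence: configuration file, command line, annotations.
function resolveSettings(configFile, cliOptions, annotations) {
  const result = {};
  for (const layer of [configFile, cliOptions, annotations]) {
    if (!layer) continue; // an absent layer contributes nothing
    for (const [transform, setting] of Object.entries(layer)) {
      // null or undefined means "not specified", so a lower layer's value survives
      if (setting !== null && setting !== undefined) {
        result[transform] = setting;
      }
    }
  }
  return result;
}

// myConfig.json turns string literal extraction on; -E (--extstroff) turns it off.
const settings = resolveSettings(
  { stringLiterals: true, booleanLiterals: true },
  { stringLiterals: false },
  null // annotations are considered only with -a (--anns)
);
console.log(settings); // { stringLiterals: false, booleanLiterals: true }
```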

Note: No command line option turns annotations off. By default, the tool does not check annotations.

Protection modes

By default, the tool operates in "exclusive mode": it protects the entire source code in the input file. You can then use the -s (or the identical --scope) option to define a list of function names (or, more precisely, function name patterns). PPJS omits these functions from protection while protecting everything outside of them.

ppjs myJsBundle.js myProtectedScript.js --scope "myOmittedFunction, otherFunction"

Note: Of course, the tool can handle nested function declarations. It provides a function name pattern format to find nested functions.

You can change to "inclusive mode" with the --inclusive command line option. This option tells the tool to protect only the specified functions, while leaving everything else unchanged.

ppjs myJsBundle.js myProtectedScript.js --inclusive --scope "myIPFunction"
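The two modes differ only in how membership in the scope list is interpreted. A minimal sketch, assuming exact function-name matching (the real pattern format, including nested-function patterns, is not modeled) and assuming that code nested inside a listed function counts as part of that function; parseScope and shouldProtect are hypothetical names:

```javascript
// Splits a --scope value such as "a, b" into function names.
function parseScope(value) {
  return value
    .split(",")
    .map((name) => name.trim())
    .filter((name) => name.length > 0);
}

// enclosingFunctions: names of the functions that contain a code element,
// outermost first (an empty array means top-level code).
// Assumption: exact name matching; nested code belongs to its enclosing function.
function shouldProtect(enclosingFunctions, scope, inclusive) {
  const inScope = enclosingFunctions.some((name) => scope.includes(name));
  // exclusive mode (default): protect everything except the listed functions
  // inclusive mode (--inclusive): protect only the listed functions
  return inclusive ? inScope : !inScope;
}

const scope = parseScope("myOmittedFunction, otherFunction");
console.log(shouldProtect([], scope, false)); // true: top-level code is protected
console.log(shouldProtect(["myOmittedFunction", "helper"], scope, false)); // false
console.log(shouldProtect(["myIPFunction"], parseScope("myIPFunction"), true)); // true
```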

Hint: With the configuration file you can define fine-grained settings. You can define separate settings for each function in the scope (or even groups of functions), and different settings for the source code outside of the specified scope. As a result, you can apply particular settings for the entire source file and others for specific functions. PPJS also supports inline configuration. This mode allows you to add settings to the JavaScript source code with PPJS directives.

Command line options

The tool supports short and long notations to specify options. The short names start with a single dash (-), while long names start with two dashes (--). Please note, a few options have only long names.

PPJS supports these command line options:

Default command line options

By default, PPJS turns on string literal extraction and local declaration renaming. Thus, these command lines are equivalent:

ppjs myJsBundle.js myProtectedScript.js
ppjs myJsBundle.js myProtectedScript.js -e -l

Let's assume you intend to use only Boolean literal replacement without the default transform methods. You need to issue this command:

ppjs myJsBundle.js myProtectedScript.js -E -L -b

Here, -E turns off string literal extraction, -L turns off local declaration renaming, and -b turns on Boolean literal replacement.
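For the options shown in this section, the short forms follow a visible pattern: the lowercase letter turns a transform on and its uppercase counterpart turns it off. The sketch below models only -e/-E, -l/-L, and -b/-B (the last pair appears in the inline configuration examples); the option table, the defaults object, and applyShortOptions are assumptions for illustration, not PPJS's argument parser.

```javascript
// Hypothetical mapping of the short options used in this document.
const SHORT_OPTIONS = {
  e: "stringLiterals", // -e / -E
  l: "localDeclarations", // -l / -L
  b: "booleanLiterals", // -b / -B
};

// String literal extraction and local declaration renaming are on by default.
const DEFAULTS = { stringLiterals: true, localDeclarations: true, booleanLiterals: false };

function applyShortOptions(args) {
  const settings = { ...DEFAULTS };
  for (const arg of args) {
    const match = /^-([a-zA-Z])$/.exec(arg);
    if (!match) continue; // file names and long options are not modeled here
    const letter = match[1];
    const transform = SHORT_OPTIONS[letter.toLowerCase()];
    if (!transform) continue; // options outside this small table are skipped
    // a lowercase letter turns the transform on, an uppercase one turns it off
    settings[transform] = letter === letter.toLowerCase();
  }
  return settings;
}

const onlyBooleans = applyShortOptions(["myJsBundle.js", "myProtectedScript.js", "-E", "-L", "-b"]);
console.log(JSON.stringify(onlyBooleans));
// {"stringLiterals":false,"localDeclarations":false,"booleanLiterals":true}
```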

Configuration file format

The PPJS configuration file is JSON formatted. Although the tool does not require it, it is good practice to give this file the .json extension.

This JSON code snippet shows the overall schema of the configuration file:

{
  "booleanLiterals": {},
  "integerLiterals": {},
  "stringLiterals": {},
  "domainLock": {},
  "propertyIndirection": {},
  "localDeclarations": {},
  "debuggerRemoval": {},
  "localContexts": [
    {
      "namePattern": "myFunc",
      "booleanLiterals": {},
      "integerLiterals": {},
      "stringLiterals": {},
      "propertyIndirection": {},
      "localDeclarations": {},
      "debuggerRemoval": {}
    },
    {
      "namePattern": "otherFunc",
      "booleanLiterals": {},
      "integerLiterals": {},
      "stringLiterals": {},
      "propertyIndirection": {},
      "localDeclarations": {},
      "debuggerRemoval": {}
    }
  ],
  "idPrefix": ""
}

All properties except namePattern and idPrefix may have multiple types, be null, or be entirely omitted from the configuration. namePattern and idPrefix can be strings or left undefined.

How PPJS uses the configuration file

When the tool runs, it applies transforms sequentially in a specific order. Each of the top-level JSON properties from booleanLiterals to debuggerRemoval sets the configuration of a particular transform. The set of objects behind these properties is called a transform set. The JSON description contains a global transform set and a localContexts collection that specifies transform sets for specific groups of functions, determined by the namePattern property.

When a transform runs, it visits elements within the semantic tree of the source code. If it finds an element it intends to transform, it follows these steps:

  1. If the element is in a function that matches a single local context, the transform uses its associated property from the transform set specified in that context.
  2. If the element is in a function that matches multiple local contexts, the transform uses its associated property from the transform set of the last matching context.
  3. If no matching local context is found, or that context does not specify the property associated with the transform, the tool picks up the property from the global transform set.
  4. If no configuration object is found for the transform, or the property has a false value, the transform keeps the element intact.
  5. Otherwise, the transform modifies the semantic tree according to the settings in the configuration object.
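The lookup in the steps above can be sketched as follows. This is an illustration, not PPJS's implementation: name patterns are compared as exact function names (the real pattern syntax is richer), functionName stands for the function that contains the element, and resolveTransformConfig is a hypothetical name.

```javascript
// Returns the configuration object a transform uses for an element, or null
// when the element must be left intact. functionName is the name of the
// function that contains the element (undefined for top-level code).
function resolveTransformConfig(config, transform, functionName) {
  const matching = (config.localContexts || []).filter(
    (ctx) => functionName !== undefined && ctx.namePattern === functionName // exact match assumed
  );
  // Steps 1-2: with one or more matches, the last matching context is used.
  const context = matching.length > 0 ? matching[matching.length - 1] : null;
  let setting = context ? context[transform] : undefined;
  // Step 3: fall back to the global transform set (null counts as undefined).
  if (setting === undefined || setting === null) {
    setting = config[transform];
  }
  // Step 4: no configuration, or an explicit false, keeps the element intact.
  if (setting === undefined || setting === null || setting === false) {
    return null;
  }
  // Step 5 would transform the element; true and {} both select the defaults.
  return setting === true ? {} : setting;
}

const config = {
  booleanLiterals: true,
  localContexts: [{ namePattern: "myFunc", booleanLiterals: false }],
};
console.log(resolveTransformConfig(config, "booleanLiterals", "myFunc")); // null
console.log(resolveTransformConfig(config, "booleanLiterals", "otherFunc")); // {}
```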

Note: If the configuration property is omitted from the file, or its value is set to null, it is considered undefined.

Configuration objects

In the current release, the tool applies these transforms:

domainLock: Allows binding the code to a specific domain (or its subdomains). When the code running in the browser originates from a non-matching domain, it breaks with an error. Note: If the host page of the protected JavaScript file is opened from the file system (and not through an http/https request), the domain lock is not applied.
booleanLiterals: Transforms the false and true literals to other expressions that evaluate to the same false and true values, respectively. (The list of expressions can be easily extended in the source code.)
integerLiterals: Transforms integer literals to other, less obvious expressions that evaluate to the same value. It can also rewrite all integer literals in a specific radix (binary, decimal, hexadecimal, or octal).
propertyIndirection: Transforms direct property access to indirect property access.
stringLiterals: Extracts string literals into variables, initializes those variables from encoded string literals, and replaces the original strings with the corresponding variables.
localDeclarations: Mangles the names of local declarations.
debuggerRemoval: Removes all occurrences of the debugger statement from the source code.

Turning a transform on or off

Each transform's configuration object can be set to false to indicate that the transform should not be applied. Setting the property to true or to an empty object ({}) tells the protection engine to apply the transform with its default configuration. (Note that the domainLock transform does not have a default value, so it cannot be applied with true or an empty object.)

A configuration object may have several setups with different behavior (demonstrated with booleanLiterals):

  1. Explicitly states that booleanLiterals is not configured:
{
  "booleanLiterals": null
}
  2. Explicitly states that booleanLiterals should be applied with its default configuration:
{
  "booleanLiterals": true
}

This is exactly the same as this:

{
  "booleanLiterals": {}
}
  3. Explicitly turns off the transform:
{
  "booleanLiterals": false
}

Turning off a transform is useful when you intend to apply it partially. For example, the next configuration disables the booleanLiterals transform for the myFunc function:

{
  "booleanLiterals": true,
  "localContexts": [
    {
      "namePattern": "myFunc",
      "booleanLiterals": false
    }
  ]
}

Similarly, you can disable this transform for the entire source code except the myFunc function:

{
  "booleanLiterals": false,
  "localContexts": [
    {
      "namePattern": "myFunc",
      "booleanLiterals": true
    }
  ]
}

Or, shorter, because booleanLiterals is not turned on by default:

{
  "localContexts": [
    {
      "namePattern": "myFunc",
      "booleanLiterals": true
    }
  ]
}

domainLock configuration

The domainLock transform has two configuration properties: domainPattern and errorScript. If domainPattern is null, empty, or contains only whitespace, the domainLock transform does not run. You can set domainPattern to a full domain name such as www.mydomain.net, or to a partial domain name such as .mydomain.net to allow the code to run in any subdomain of mydomain.net. In errorScript, you can provide a valid JavaScript code snippet that runs when the protection detects that the code is running from an invalid domain.

Example:

{
  "domainLock": {
    "domainPattern": "mysuperapp.azurewebsites.net"
  }
}

Instead of using the domainPattern property, you can use a shorter form and set domainLock to the pattern string itself:

{
  "domainLock": "mysuperapp.azurewebsites.net"
}

You can specify an error script:

{
  "domainLock": {
    "domainPattern": "mysuperapp.azurewebsites.net",
    "errorScript": "alert(\"Invalid domain\");"
  }
}
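The matching rule can be sketched like this, under assumptions: a pattern that starts with a dot accepts any host name ending with it, any other pattern must equal the host name exactly, and, since the text does not say whether .mydomain.net also accepts mydomain.net itself, the sketch rejects it. In a browser, the hostname and protocol arguments would come from location.hostname and location.protocol. isDomainAllowed is a hypothetical name, not the code PPJS injects, and errorScript handling is omitted.

```javascript
// Returns false when the domain lock should break the code.
function isDomainAllowed(hostname, protocol, domainPattern) {
  const pattern = (domainPattern || "").trim().toLowerCase();
  if (pattern === "") return true; // no pattern: the transform does not run
  if (protocol === "file:") return true; // opened from the file system: not applied
  const host = hostname.toLowerCase();
  if (pattern.startsWith(".")) {
    return host.endsWith(pattern); // partial pattern: any subdomain
  }
  return host === pattern; // full pattern: exact host name
}

console.log(isDomainAllowed("mysuperapp.azurewebsites.net", "https:", "mysuperapp.azurewebsites.net")); // true
console.log(isDomainAllowed("www.mydomain.net", "https:", ".mydomain.net")); // true
console.log(isDomainAllowed("evil-mydomain.net", "https:", ".mydomain.net")); // false
```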

Note: Though technically you can configure domainLock in a local context, the tool applies only the global configuration.

Note: The PPJS command-line interface does not have a direct option for setting the errorScript property. You need to specify this setting in the configuration file and use the -c (--config) option.

booleanLiterals configuration

This transform supports the randomize property. If it is false (this is the default value), the transform will always use the same expression to replace the false and true literals. Should it be true, the transform will choose a replacement expression randomly from its predefined expression repository.

Example:

{
  "booleanLiterals": {
    "randomize": true
  }
}

By default, randomize is turned off. So if you write this:

{
  "booleanLiterals": true
}

It is the same as its longer form:

{
  "booleanLiterals": {
    "randomize": false
  }
}
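A minimal sketch of the randomize switch follows. The expression repository is hypothetical: each list starts with the expression that appears in the protected output of the inline configuration example (!![] for true, (NaN===NaN) for false), followed by a few alternatives of our own; PPJS's actual repository and selection logic may differ.

```javascript
// Hypothetical repository: the first entry of each list is the expression
// used when randomize is false.
const REPLACEMENTS = {
  true: ["!![]", "!0", "(1===1)"],
  false: ["(NaN===NaN)", "!1", "![]"],
};

// Returns the source text that replaces a true or false literal.
// `random` is injectable so the choice can be tested deterministically.
function replaceBooleanLiteral(value, randomize, random = Math.random) {
  const candidates = REPLACEMENTS[String(value)];
  if (!randomize) return candidates[0];
  return candidates[Math.floor(random() * candidates.length)];
}

// Sanity check: every candidate evaluates to the literal it stands for.
for (const [literal, expressions] of Object.entries(REPLACEMENTS)) {
  for (const expression of expressions) {
    const evaluated = new Function(`return ${expression};`)();
    if (String(evaluated) !== literal) {
      throw new Error(`${expression} does not evaluate to ${literal}`);
    }
  }
}

console.log(`var x = ${replaceBooleanLiteral(true, false)};`); // var x = !![];
```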

integerLiterals configuration

This transform supports the randomize property. If it is false (the default value), the transform always uses the same expression to replace the predefined integer literals (0, 1, and a few others). If it is true, the transform chooses a replacement expression randomly from its predefined expression repository. integerLiterals has another property, radix, which you can set to one of these values: Binary, Decimal, Hexadecimal, or Octal. When radix has any of these values, all integer literals are changed to use that radix, except those that have already been replaced with an expression (e.g. 0, 1, and others).

Example:

{
  "integerLiterals": {
    "randomize": false,
    "radix": "Octal"
  }
}
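The radix option corresponds to JavaScript's own integer literal prefixes. The sketch below shows only this formatting step, assuming non-negative integers and the ES2015 0o prefix for octal (the exact octal form PPJS emits is not stated here); formatIntegerLiteral is a hypothetical helper, and the expression replacement of 0, 1, and the other predefined values is not modeled.

```javascript
// JavaScript integer literal prefixes for each radix value.
const RADIXES = {
  Binary: { base: 2, prefix: "0b" },
  Octal: { base: 8, prefix: "0o" },
  Decimal: { base: 10, prefix: "" },
  Hexadecimal: { base: 16, prefix: "0x" },
};

// Both "Octal" and "octal" appear in this document, so the lookup ignores case.
function formatIntegerLiteral(value, radix) {
  const key = Object.keys(RADIXES).find((name) => name.toLowerCase() === radix.toLowerCase());
  if (!key) throw new Error(`unknown radix: ${radix}`);
  if (!Number.isSafeInteger(value) || value < 0) {
    // a leading minus sign is a unary operator, not part of the literal
    throw new RangeError("expected a non-negative safe integer");
  }
  const { base, prefix } = RADIXES[key];
  return prefix + value.toString(base);
}

console.log(formatIntegerLiteral(493, "Octal")); // 0o755
console.log(formatIntegerLiteral(255, "Hexadecimal")); // 0xff
```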

propertyIndirection configuration

This transform does not have any additional configuration properties.

stringLiterals configuration

This transform does not have any additional configuration properties.
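What the stringLiterals transform does can be illustrated with a hand-written before/after pair. The encoding (every character code shifted by one) and the names _decode and _s0 are stand-ins chosen for this sketch; the encoding and naming PPJS actually uses are not described here.

```javascript
// Before protection:
function greet() {
  return "Hello world";
}

// After protection (illustration only). Hypothetical encoding:
// every character code is shifted up by one.
function _decode(encoded) {
  return Array.from(encoded, (ch) => String.fromCharCode(ch.charCodeAt(0) - 1)).join("");
}
var _s0 = _decode("Ifmmp!xpsme"); // extracted variable, initialized from the encoded literal
function greetProtected() {
  return _s0; // the original literal is replaced with the variable
}

console.log(greetProtected() === greet()); // true
```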

localDeclarations configuration

localDeclarations has a nameMangling property that allows you to specify one of these values: sequential, hexadecimal, base52; it works as discussed in the description of the -n (--names) command line option.

debuggerRemoval configuration

This transform does not have any additional configuration properties.

Using PPJS with Multiple Files

PPJS protects a single file at a time. It works seamlessly with bundlers (such as Webpack or Browserify) as long as you produce a single output bundle (chunk). When you create and load multiple chunks, however, existing code may break because of colliding identifiers: a single protected bundle has no name collisions, but separately protected bundles may mangle different declarations to the same name, and loading them together can break the running code.

The solution is to set a different identifier prefix for each bundle with the idPrefix property in the configuration file. Because name collisions can happen only in the program's global declaration scope, idPrefix does not affect declarations in nested scopes. This sample shows the configuration of two bundles:

First bundle:

{
  "propertyIndirection": true,
  "stringLiterals": true,
  "localDeclarations": {
    "nameMangling": "base52"
  },
  "idPrefix": "_1"
}

Second bundle:

{
  "propertyIndirection": true,
  "stringLiterals": true,
  "localDeclarations": {
    "nameMangling": "base52"
  },
  "idPrefix": "_2"
}
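To see why distinct prefixes help, the sketch below runs two "bundles" in the same global scope through indirect eval, which, like a classic script tag, turns top-level var declarations into properties of the global object. The mangled name Acjgb and the assumption that the prefix is simply prepended to it (_1Acjgb, _2Acjgb) are illustrative; PPJS's actual output may combine them differently.

```javascript
// Indirect eval runs code in the global scope, much like a classic <script>
// tag, so top-level var declarations become properties of the global object.
const runBundle = (code) => (0, eval)(code);

// Without idPrefix: both bundles happened to mangle a global name to Acjgb.
runBundle('var Acjgb = "state of bundle one";');
runBundle('var Acjgb = "state of bundle two";');
console.log(globalThis.Acjgb); // state of bundle two (bundle one's value is lost)

// With idPrefix "_1" and "_2" (assumed to be prepended to the mangled name).
runBundle('var _1Acjgb = "state of bundle one";');
runBundle('var _2Acjgb = "state of bundle two";');
console.log(globalThis._1Acjgb); // state of bundle one
console.log(globalThis._2Acjgb); // state of bundle two
```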

Note: The command-line --idprefix option overrides the setting in the configuration file.

Inline configuration

As you learned earlier, PPJS supports partial protection. You can select the parts of your source code you intend to protect. If you have a large JavaScript file, it might be difficult to fine-tune which blocks of the code should be protected using just command-line options or a configuration file. PPJS also allows inline configuration: you can set the protection options right in the source code. You do not need to use command-line options or a configuration file, as the source code provides the necessary information. To define protection settings, you use JavaScript directives (similar to "use strict").

Let's see an example. This is the source code to protect (input.js):

var x = true;
var y = false;
function w() {
    var x = false;
    var y = true;
}

Assume you want to apply Boolean literal replacement in addition to the default transforms. With the CLI, you can issue this command:

ppjs input.js protected.js -b

The result (protected.js) would be this:

var Acjgb=!![];
var cemgb=(NaN===NaN);
function wZcgb() {
  var Yaggb=(NaN===NaN);
  var sWWfb=!![];
}

With inline configuration, you can add PPJS directives to the source code:

"@ppjs -b";
var x = true;
var y = false;
function w() {
    var x = false;
    var y = true;
}

Now, you can omit the -b switch from the command line to get the same result:

ppjs input.js protected.js

The "@ppjs -b" string in the first line of the input is a valid JavaScript construct called a directive. JavaScript itself gives special meaning only to the "use strict" directive; the engine ignores other directives such as this one.

A Short Recap on JavaScript Directives

In JavaScript, directives are string literals placed at the beginning of the code or at the beginning of block statements. If you put any other statement into the code, subsequent string literals are not considered to be directives. This short code snippet demonstrates which literals are directives and which are not:

"directive #1";
"directive #2";

var counter = 0;

"this is not a directive";
"nor this"

function square(n) {
  "directive #3";
  var result = n*n;
  "not a directive";
  return result;
}

if (square(2) === 4) {
  "directive #4"
  console.log("test passes.");
  "not a directive"
}
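JavaScript engines act on only one directive, "use strict", which makes it a convenient way to observe the position rule in plain JavaScript, independently of PPJS. The sketch uses the Function constructor because a function created that way is non-strict unless its own body begins with a "use strict" directive, so the result does not depend on whether the surrounding file runs as a script or as a module:

```javascript
// "use strict" opens the body, so it is a directive and the function is strict.
const directiveFirst = new Function('"use strict"; return this;');
// Another statement comes first, so "use strict" is an ordinary expression
// statement and the function stays non-strict.
const directiveLater = new Function('var counter = 0; "use strict"; return this;');

// In strict mode, `this` is undefined for a plain function call.
console.log(directiveFirst() === undefined); // true
// In non-strict mode, `this` falls back to the global object.
console.log(directiveLater() === globalThis); // true
```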

PPJS Inline Configuration Directives

PPJS directives start with the @ppjs prefix. Although JavaScript allows directives at the beginning of every block statement, PPJS considers only these directives (and ignores the others):

PPJS accepts two forms of directives: CLI-option-format directives, in which command line options follow the @ppjs prefix, and CLI-config-format directives, in which the @ppjs: prefix is followed by a fragment of a JSON configuration.

If you use both forms of directives in the same context, the CLI-config-format directives take priority over the CLI-option-format directives.

While executing the protection process, PPJS parses these directives. If any of them is malformed, PPJS issues a warning, ignores the faulty directive, and continues the protection.

CLI-Option-Format Directives

You can use a single directive where command line switch declarations follow the @ppjs prefix, like in this example:

"@ppjs -i --names sequential -i --radix octal";

When you specify multiple directives in the same location, PPJS uses only the first one, and ignores the others:

"@ppjs -i --names sequential -i --radix octal";
"@ppjs -p"; // --- This directive is ignored
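The "first directive wins" rule can be sketched as follows. The sketch assumes the directive strings at one location have already been collected by a parser, splits options on whitespace without handling quotes, and uses the hypothetical name parseOptionDirective:

```javascript
// Returns the options of the first CLI-option-format directive at one location.
function parseOptionDirective(directives) {
  for (const text of directives) {
    // "@ppjs" followed by whitespace; "@ppjs:" (config format) does not match
    const match = /^@ppjs\s+(.*)$/.exec(text);
    if (match) {
      // later option-format directives at the same location are ignored
      return match[1].trim().split(/\s+/).filter(Boolean);
    }
  }
  return [];
}

const options = parseOptionDirective([
  "@ppjs -i --names sequential -i --radix octal",
  "@ppjs -p", // ignored: only the first directive is used
]);
console.log(options.join(" ")); // -i --names sequential -i --radix octal
```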

CLI-Config-Format Directives

The protection engine combines the directives that have the @ppjs: prefix into a JSON string. For example, assume you have these directives:

'@ppjs: {'
'@ppjs:   "integerLiterals": {'
'@ppjs:     "radix": "octal"'
'@ppjs:   }'
'@ppjs: }'

PPJS extracts the contents of these directives to this JSON:

{
  "integerLiterals": {
    "radix": "octal"
  }
}

Of course, you can write this configuration with a single directive:

'@ppjs: { "integerLiterals": { "radix": "octal" } }'
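The combining step can be sketched as follows, under the assumption that PPJS strips the @ppjs: prefix from each directive, joins the remainders in source order, and parses the result as JSON. parseConfigDirectives is a hypothetical name; on invalid JSON it warns and ignores the directives, mirroring the behavior described earlier for faulty directives.

```javascript
const CONFIG_PREFIX = "@ppjs:";

// Joins the contents of all "@ppjs:" directives at one location and parses them.
function parseConfigDirectives(directives) {
  const fragments = directives
    .filter((text) => text.startsWith(CONFIG_PREFIX))
    .map((text) => text.slice(CONFIG_PREFIX.length));
  if (fragments.length === 0) return null;
  try {
    return JSON.parse(fragments.join("\n"));
  } catch (err) {
    // faulty directives produce a warning and are ignored
    console.warn(`Ignoring invalid @ppjs: configuration (${err.message})`);
    return null;
  }
}

const multiLine = parseConfigDirectives([
  '@ppjs: {',
  '@ppjs:   "integerLiterals": {',
  '@ppjs:     "radix": "octal"',
  '@ppjs:   }',
  '@ppjs: }',
]);
const singleLine = parseConfigDirectives(['@ppjs: { "integerLiterals": { "radix": "octal" } }']);
console.log(JSON.stringify(multiLine) === JSON.stringify(singleLine)); // true
```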

Configuration Merging

The inline configuration settings are merged with the command line and configuration file settings, and the inline configuration always overrides the others. When executing the protection process, PPJS applies the closest inline configuration to each source code element it transforms. This configuration mode allows setting up more complex (but still easy-to-follow) protection scenarios than the ones possible with command line options or configuration files.

For example, this input turns on Boolean literal replacement for the entire program:

"@ppjs -b";
var x = true;
var y = false;
function w() {
    var x = false;
    var y = true;
}

With this change, only the body of the w function gets this protection:

var x = true;
var y = false;
function w() {
    "@ppjs -b";
    var x = false;
    var y = true;
}

You can declare a scenario where everything except the body of w gets protected:

"@ppjs -b";
var x = true;
var y = false;
function w() {
    "@ppjs -B";
    var x = false;
    var y = true;
}

Declaration map

One of the most potent protection techniques PPJS uses is the local declaration transform. PPJS visits all local declarations (variables and functions that are not visible outside of the protected module) and renames them together with their references, including occurrences in template literals.

Overview of local declaration transform

The easiest way to understand how this method works is to look at examples. So, here is one:

var myVar = 123;
function increment(arg) {
  return arg+1;
}
console.log(increment(myVar));

When you run the tool with the -l (--localdecl) command line option, the protection process results in this code:

var _0x000000=123;
function _0x000001(_0x000002) {
  return _0x000002+1;
}
console.log(_0x000001(_0x000000));

Note: Although the result is shown here with line breaks and indentation, PPJS automatically compacts the code.

As you can see, the tool renamed myVar, increment, and arg, but it did not touch console: the three renamed identifiers are declared in the original code snippet, while console is not.
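The renamed identifiers look like a counter rendered as six hexadecimal digits after _0x. The generator below reproduces the three names in this example under that assumption; with only three names, the sample cannot distinguish hexadecimal from decimal numbering, so the real scheme may differ. sequentialNames is a hypothetical name.

```javascript
// Hypothetical generator for names such as _0x000000, _0x000001, ...
// (assumed: a counter printed as six hexadecimal digits).
function* sequentialNames() {
  for (let index = 0; ; index += 1) {
    yield "_0x" + index.toString(16).padStart(6, "0");
  }
}

// Declarations receive names in the order they are visited.
const names = sequentialNames();
const renamed = new Map(
  ["myVar", "increment", "arg"].map((original) => [original, names.next().value])
);
console.log(renamed.get("arg")); // _0x000002
```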

PPJS also renames references inside ES2015 template literals. Change the body of increment to this:

function increment(arg) {
  return `Result: ${arg+1}`;
}

Now, the tool emits this output:

var _0x000000=123;
function _0x000001(_0x000002) {
  return `Result: ${_0x000002+1}`;
}
console.log(_0x000001(_0x000000));

Creating a declaration map

Sometimes you need to have a good understanding of how PPJS renames declarations. The -m (--mapout) command line option allows you to output a CSV text file that shows these mappings. You can view this file with a standard text editor, or load it into Excel to see the contents in a spreadsheet.

Note: The declaration map is an early preview feature. In the future, declaration maps will be replaced by source maps.

Declaration map format

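Each line of the declaration map file describes a map entry. PPJS emits several map entry types, including scope (a declaration scope), decln (a declaration), refer (a reference to a declaration), and unres (an unresolved reference).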

Each line may contain several fields that are separated by a comma (,) character:

Declaration map sample

Let's see a sample to examine the declaration map output:

var myVar = 123;
function increment(arg) {
  return `Result: ${arg+1}`;
}
console.log(increment(myVar));

When running PPJS with the -l -n sequential -m <mapfile> command line options, it produces this transformed code:

var _0x000000=123;
function _0x000001(_0x000002) {
  return `Result: ${_0x000002+1}`;
}
console.log(_0x000001(_0x000000));

The map file has 12 entries:

#01: scope,,0,(module),,,,
#02: decln,varD,0,myVar,0,0,_0x000000,2
#03: decln,func,0,increment,1,1,_0x000001,2
#04: unres,,0,console,,,,
#05: refer,,0,myVar,,,_0x000000,
#06: refer,,0,increment,,,_0x000001,
#07: refer,,0,increment,,,_0x000001,
#08: refer,,0,myVar,,,_0x000000,
#09: scope,,1,(module)::_0x000001,,,,
#10: decln,argD,1,arg,2,0,_0x000002,2
#11: refer,,1,arg,,,_0x000002,
#12: refer,,1,arg,,,_0x000002,

Note: The map file does not have line numbers; here, we use them for the sake of explanation.

The map contains two scopes: the module's scope in line #01, and the scope of the increment function in line #09. Lines #02 to #08 contain the declarations and references in the module's scope, while lines #10 to #12 contain the map entries within the increment function's scope.

The module scope has two declarations, myVar with index 0 (later renamed to _0x000000), and increment with index 1 (renamed to _0x000001). You can see that the subtypes of these items are varD (variable declaration), and func (function declaration) respectively.

The module scope contains four references to these declarations; two for each declaration. Two references stand for the identifiers in the variable and function declaration statements. The other two come from the console.log(increment(myVar)); source code line. Line #04 describes an unresolved reference to the console identifier. Because this name is defined somewhere else, PPJS does not rename it.

Line #09 names the scope of increment as (module)::_0x000001, suggesting that this is a nested scope: one declared within a function that belongs to the module's root scope. Line #10 declares the arg function argument (this is why it uses the argD subtype). Lines #11 and #12 are references to arg (renamed to _0x000002).

Should you invoke PPJS with the -l -n Base52 -m <mapfile> command line options, both the protected code and the map file would use different identifiers:

(Protected output, indented for the sake of readability)

var Acjgb=123;
function cemgb(wZcgb) {
  return `Result: ${wZcgb+1}`;
}
console.log(cemgb(Acjgb));

(Map file)

scope,,0,(module),,,,
decln,varD,0,myVar,0,0,Acjgb,2
decln,func,0,increment,1,1,cemgb,2
unres,,0,console,,,,
refer,,0,myVar,,,Acjgb,
refer,,0,increment,,,cemgb,
refer,,0,increment,,,cemgb,
refer,,0,myVar,,,Acjgb,
scope,,1,(module)::cemgb,,,,
decln,argD,1,arg,2,0,wZcgb,2
refer,,1,arg,,,wZcgb,
refer,,1,arg,,,wZcgb,
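Because the declaration map is plain CSV, a few lines of code can summarize it. The sketch below counts entry types and lists the renamed declarations from the Base52 map above. The field positions (0: entry type, 1: subtype, 2: scope index, 3: original name, 6: new name) are inferred from the samples in this section rather than from a published format description, quoted fields are not handled, and readDeclarationMap is a hypothetical name.

```javascript
const mapFile = `scope,,0,(module),,,,
decln,varD,0,myVar,0,0,Acjgb,2
decln,func,0,increment,1,1,cemgb,2
unres,,0,console,,,,
refer,,0,myVar,,,Acjgb,
refer,,0,increment,,,cemgb,
refer,,0,increment,,,cemgb,
refer,,0,myVar,,,Acjgb,
scope,,1,(module)::cemgb,,,,
decln,argD,1,arg,2,0,wZcgb,2
refer,,1,arg,,,wZcgb,
refer,,1,arg,,,wZcgb,`;

// Field positions are inferred from the samples, not from a format specification.
function readDeclarationMap(text) {
  const counts = {};
  const renamings = [];
  for (const line of text.split(/\r?\n/)) {
    if (line.trim() === "") continue;
    const fields = line.split(","); // quoted fields are not handled
    const type = fields[0];
    counts[type] = (counts[type] || 0) + 1;
    if (type === "decln") {
      renamings.push({ scope: Number(fields[2]), kind: fields[1], original: fields[3], renamed: fields[6] });
    }
  }
  return { counts, renamings };
}

const { counts, renamings } = readDeclarationMap(mapFile);
console.log(counts); // { scope: 2, decln: 3, unres: 1, refer: 6 }
for (const r of renamings) {
  console.log(`${r.original} -> ${r.renamed} (scope ${r.scope}, ${r.kind})`);
}
```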