Compiling LibPDQ to JS with Emscripten Part 3: All in JS → Pushing to NPM

In part 1 I explained the PDQ library and its book, and managed to run the C library in the browser with Emscripten. In part 2 I got the same model to run, but written in JavaScript. Here we’ll be writing everything in JS and pushing it to NPM.

I’ve read more of the book, and I’m really liking it. The part on Mean Value Analysis is very cool! I started an Observable Notebook with almost every Perl listing translated to JavaScript. After PDQ.js is out I’ll add the PDQ models, and ideally some charts.

Last time I wrote a Node.js script that require()d the file, used cwrap to create JS functions that call into the wasm, and then called them.

Now I’ve made it nicer to use by cwrapping all of the PDQ functions ahead of time and exporting the C constants directly on the module. I did this by writing a post.js file and appending it to the Emscripten module output with the --post-js flag:

//post.js
Module.init = Module.cwrap('PDQ_Init', "number", ["string"]);
Module.createOpen = Module.cwrap('PDQ_CreateOpen', null, ["string", "number"]);
Module.createClosed = Module.cwrap('PDQ_CreateClosed', null, ["string", "number", "number", "number"]);
Module.createMultiNode = Module.cwrap('PDQ_CreateMultiNode', "number", ["number", "string", "number", "number"]);
Module.createNode = Module.cwrap('PDQ_CreateNode', null, ["string", "number", "number"]);
... // more cwraps

Module.VOID = 0;           // Queueing network types
Module.OPEN = 1;           // (the 'network' field of the JOB_TYPE struct in the C header)
Module.CLOSED = 2;
... // more constants

Now my example model looks like this:

require('./dist/pdq.js')().then(pdq => {
    const requests = 400;
    const threads = 300;
    const service_time = 0.444;

    pdq.init("My model");
    pdq.createClosed("Requests", pdq.BATCH, requests, 0.0);
    pdq.createMultiNode(threads, "Threads", pdq.MSC, pdq.FCFS);
    pdq.setDemand("Threads", "Requests", service_time);
    pdq.setWUnit("Reqs");
    pdq.solve(pdq.EXACT);
    pdq.report();
});

Which is nice! The function names are nicer too! I was able to run that file with Node as node test.js, but I also wanted to make sure I could use it in the browser. Emscripten can generate a wrapper page that runs your code for you, but I wanted to be able to write my own.

I was able to run the script with a similar loading approach, using this HTML file:

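<html>
<head>
    <script type="text/javascript" src="dist/pdq.js"></script>
</head>
<body>
    <h4>Hello PDQ.js</h4>
    <p>The next lines should contain a PDQ Report:</p>
    <pre id="out"></pre>
</body>
<script type="text/javascript">
    (async function () {
        // Mirror everything passed to console.log into the <pre> element
        var oldLog = console.log;
        console.log = function () {
            var message = Array.prototype.join.call(arguments, ' ');
            // textContent, so report text is never parsed as HTML
            document.getElementById('out').textContent += message + '\n';
            oldLog.apply(console, arguments);
        };
        // Module is the global factory function defined by dist/pdq.js
        const pdq = await Module();
        // ...
        //  same model
    })()
</script>
</html>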

The first issue is that pdq.report() prints its output directly to console.log, so I had to monkey patch console.log to write into the <pre> element. The second issue is that this still depends on a global object called Module. You’re supposed to be able to get around this with a dynamic import('./dist/pdq.js'), but that returned an empty module namespace object (its only property was Symbol(Symbol.toStringTag): "Module", since that build doesn’t export anything), and I couldn’t get it to work…

I spent a lot of time flailing here, but in the end found a clean solution: by adding the -s EXPORT_ES6=1 flag I was able to change the import to look like this:

<script type="module">
    import Module from "./dist/pdq.js";
    (async function () {
        const pdq = await Module();
        // ...
    })();
</script>

When adding it to this page, I looked into doing it with a dynamic import instead. It looks like this:

<script type="module">
    (async function () {
        const module = await import("/js/pdq.mjs");
        const pdq = await module.default();
        // ...
    })();
</script>

The first await is to wait for the .mjs file to be downloaded & imported, and the second await is to run the default export of our code, which is itself async because it needs to download and compile the .wasm file.
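Since both awaits only need to happen once per page, an on-demand loader can memoize the promise so that repeated clicks (or several widgets) share a single download and a single wasm compile. A minimal sketch, assuming the module is served at /js/pdq.mjs as above; `lazyOnce` and `getPdq` are hypothetical helper names, not part of the package:

```javascript
// Hypothetical helper: wraps an async loader so it runs at most once,
// but forgets a failed attempt so a later call can retry.
function lazyOnce(loader) {
  let promise = null;
  return function get() {
    if (promise === null) {
      promise = loader().catch((err) => {
        promise = null; // allow a retry after e.g. a network error
        throw err;
      });
    }
    return promise;
  };
}

// First await: download and evaluate the ES module.
// default() returns a promise that resolves once the .wasm file
// has been fetched and compiled.
const getPdq = lazyOnce(async () => {
  const module = await import('/js/pdq.mjs');
  return module.default();
});

// Usage, e.g. inside a click handler:
//   const pdq = await getPdq();
//   pdq.init("My model");
```

The import only starts the first time getPdq() is called, so the page pays for the download only when the model is actually needed.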

So here is the ESM version running in your browser, loaded on demand:

The web module is 155.1 kB uncompressed plus 60.1 kB of wasm, which is fine for an initial release. It gzips to 30 kB (+23 kB of wasm), which is downright small for a numerical analysis library.

Making a Node package

I pushed this to npm! (My first package!) It’s under @amedee/pdq.

I started by making only an ESM module (it’s 2021 after all), but I ran into issues running the Node.js version of the script afterward (https://github.com/emscripten-core/emscripten/issues/11792): Emscripten hadn’t planned for Node to support ES modules, so the ES module it generated didn’t work when run in Node. So I decided to build two versions: an ESM one for the web and web workers, and one for Node with require() support.

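I exposed both in the npm package by writing this in my package.json. "main" is the entry point that require() resolves in Node, and "module" is the field bundlers look at to find the ESM build:

// package.json (excerpt)
{
	"main": "dist/pdq.js",
	"module": "dist/pdq.mjs"
}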

A small version bump

After I published 0.1.0, I noticed that a function was missing from EXPORTED_FUNCTIONS (so it didn’t get linked into the build), and decided to compute that list at build time with this medium-gnarly pipeline:

#!/bin/bash
# compute_exported_functions.sh
# Looks for all the cwrap calls in our --post-js file and formats them for -s EXPORTED_FUNCTIONS, prepending the underscore.
quoted_functions=($(grep -o "cwrap('\(\w\+\)" post.js | cut -d"'" -f2 | awk '{ print "\"_" $1 "\"" }'))
IFS=, #We change the IFS to turn the array from `"_funca" "_funcb" ` to `"_funca","_funcb"`
echo "[${quoted_functions[*]}]"

Observable support

I’m now able to use this in an Observable Notebook, which supports importing anything from npm:

pdq = (await import('@amedee/pdq')).default()

It’s not as clean as a regular import, but it works fine. I’ll probably go more into Observable examples (with graphs) in the next post.

Next steps

The big problem I ran into is that PDQ’s report function only prints to stdout, which Emscripten pipes to console.log. In the examples I’ve shown we monkey patch console.log, but I think I’ll configure the Emscripten Module to send stdout to a string, or to something more reusable.

I don’t like having to do that, so I wanted to write my own reportToJSON function in JS, but it turns out a lot of the variables used in the report function are internal to the C side and not exposed to JS via getters… Bummer.
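For getting the report out as text, one option is Emscripten’s own hook: the generated glue reads `print` and `printErr` from the object passed to the module factory and uses them for stdout and stderr, falling back to console.log and console.error. A minimal sketch, assuming the MODULARIZE factory used in this post; `loadPdqWithCapture` and `takeOutput` are hypothetical names:

```javascript
// Sketch: capture PDQ's stdout through Emscripten's `print` option
// instead of monkey patching console.log.
// `createPdq` is the factory: require('./dist/pdq.js') in Node,
// or the default export of pdq.mjs in the browser.
async function loadPdqWithCapture(createPdq) {
  const lines = [];
  const pdq = await createPdq({
    // Called by the Emscripten glue for each line written to stdout
    print: (text) => lines.push(text),
    // stderr still goes to the console
    printErr: (text) => console.error(text),
  });
  // Returns everything printed since the last call and empties the buffer
  const takeOutput = () => lines.splice(0).join('\n');
  return { pdq, takeOutput };
}
```

After building and solving a model, calling pdq.report() and then takeOutput() would give the whole report as one string. One caveat: Emscripten’s stdout is line-buffered, so a final line without a trailing newline may only arrive once the stream is flushed.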

I’m going to keep reading the book and implementing the models in JS.

My NPM package needs documentation too, which is tricky since my exported functions are just cwraps. I’m going to look into JSDoc, or whether there’s a better place to put my cwraps.
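One way JSDoc could work without touching the cwrap lines is a @typedef describing the object the factory resolves to. A sketch, assuming the signatures match the cwraps shown in post.js. The parameter names are my paraphrases of PDQ’s C parameters and should be checked against the PDQ header. The setDemand signature is assumed, because its cwrap isn’t shown, and `PdqModule` and `buildThreadPoolModel` are hypothetical names:

```javascript
// Sketch: document the PDQ module with a JSDoc @typedef so post.js
// can stay a plain list of cwraps. Only a few members are listed.

/**
 * @typedef {Object} PdqModule
 * @property {function(string): number} init
 *   PDQ_Init(modelName): starts a new model.
 * @property {function(string, number, number, number): void} createClosed
 *   PDQ_CreateClosed(name, workloadClass, population, thinkTime).
 * @property {function(number, string, number, number): number} createMultiNode
 *   PDQ_CreateMultiNode(servers, name, device, sched).
 * @property {function(string, string, number): void} setDemand
 *   PDQ_SetDemand(nodeName, workloadName, serviceDemand).
 * @property {number} BATCH Workload class constant.
 * @property {number} MSC Multi-server device constant.
 * @property {number} FCFS Scheduling discipline constant.
 */

/**
 * Builds the thread-pool model from earlier in this post.
 * @param {PdqModule} pdq Object resolved by the PDQ factory.
 * @param {number} requests Number of requests in the closed workload.
 * @param {number} threads Number of servers in the multi-server node.
 * @param {number} serviceTime Service demand per request at the Threads node.
 */
function buildThreadPoolModel(pdq, requests, threads, serviceTime) {
  pdq.init('My model');
  pdq.createClosed('Requests', pdq.BATCH, requests, 0.0);
  pdq.createMultiNode(threads, 'Threads', pdq.MSC, pdq.FCFS);
  pdq.setDemand('Threads', 'Requests', serviceTime);
}
```

Editors that understand JSDoc would then offer completion and parameter hints on `pdq`, and the jsdoc tool can render the typedef as a page.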

See ya later!