I love the book Analyzing Computer System Performance with Perl::PDQ
. It’s the clearest applied Queuing Theory book I’ve found.
IMO I think the Universal Scalability Law
should be called Gunther’s law, given how much the author has done to popularize it. (Plus it sounds less
grandiose than the Universal Scalability Law
, and it would be clearer that it’s on the same level as Amdahl’s Law).
Sadly, I hate to say it, but the book sounds outdated with Perl in the title. I wish it wasn’t so but it sounds Perl/PDQ specific, when it’s a great applied queuing theory book all around.
I’m going to be re-reading it over the winter break, and one of my goals is to translate the examples to Javascript with Observable notebooks.
I think Queuing Theory overall needs some heavy marketing/outreach/explanation, and the combined accessiblity and visualization excellence of Observable notebooks would help a lot.
Plus the book would sound so much more enticing if it had a version called Analyzing Computer System Performance with PDQ.js
;)
So I looked into porting or rewriting PDQ to Javascript. Turns out it’s written as a C library (with separate bindings for R and python2), so it should be compilable to WASM with emscripten!
Emscripten
While “bringing performance to the browser” nowadays often brings up WASM+Rust, Emscripten and asm.js are the grandparents of this movement, and still used all the time (likely more than the Rust->Wasm backend). Emscripten compiles C/C++ code to WASM and provides several shims to run existing code in the WASM environment.
It turns out Emscripten is really easy to use! It wrap/replaces all the common C building commands:
- emconfigure
- emmake
- emcc
It also provides shims for things like stdio.h
, malloc/free
and a mock file system that lives in WASM memory.
For example, you can run a regular Hello World program in the browser:
#include <stdio.h>
void main() {
printf("Hello World\n");
}
And the printf will be redirected to console.log()
output.
If you compile with emcc hello.c -o hello.html
it will build you an html file that runs the code and displays its output directly in the browser, which is nice!
Using Emscripten with PDQ
PDQ (Pretty Damn Quick) is a small (3KLOC) library for computing properties of queuing theory models.
I was able to generate both MacOS (Mach-O) and WASM binaries the following example program:
#include "PDQ_Lib.h"
double requests = 400;
double threads = 300;
double service_time = 0.444;
int main() {
PDQ_Init("AWS Tomcat Model");
PDQ_CreateClosed("Requests", BATCH, requests, 0.0);
PDQ_CreateMultiNode(threads, "Threads", MSC, FCFS);
PDQ_SetDemand("Threads", "Requests", service_time);
PDQ_SetWUnit("Reqs");
PDQ_Solve(EXACT);
PDQ_Report();
return 0;
}
The emcc command I used was (starting from pdq-qnm-pkg/lib/
in the PDQ repo):
emcc PDQ_*.c MVA*.c test.c -o test.html -s INITIAL_MEMORY=94240768 -s ALLOW_MEMORY_GROWTH=1
It was really nice to be able to compile first to C, run the program and then run it in the browser with the -o test.html
flag on emcc
. This is the output:
==========================================
******** PDQ Model OUTPUTS ********
==========================================
Solution method: EXACT
******** SYSTEM Performance ********
Metric Value Unit
------ ------- ----
Workload: "Requests"
Mean concurrency 400.0000 Reqs
Mean throughput 675.6757 Reqs/Sec
Response time 0.5920 Sec
Stretch factor 1.3333
Bounds Analysis:
Max throughput 675.6757 Reqs/Sec
Min response 0.4440 Sec
Max demand 0.0015 Sec
Total demand 0.4440 Sec
Optimal jobs 300.0000 Jobs
******** RESOURCE Performance ********
Metric Resource Work Value Unit
------ -------- ---- ------- ----
Capacity Threads Requests 300 Servers
Throughput Threads Requests 675.6757 Reqs/Sec
In service Threads Requests 300.0000 Reqs
Utilization Threads Requests 100.0000 Percent
Queue length Threads Requests 400.0000 Reqs
Waiting line Threads Requests 100.0000 Reqs
Waiting time Threads Requests 0.1480 Sec
Residence time Threads Requests 0.5920 Sec
However the WASM build was giving different answers than the macOS build! Half the report was NaNs.
I dug into it and the execution was behaving so weirdly: I found an array involved in the
subcomputation (sm_x
) was having its values set to 0 in the middle of a loop…
Here is the code in question, just to show that sm_x
was never being written to :
// Call the submodel defined below to calculate sm_x[] array
MServers(N, m, D);
// Solve the composite model
for (n = 1; n <= N; n++) {
R = 0.0; // reset
// response time at the FESC
for (j = 1; j <= n; j++) { ///sm_x has different values over time here
R += (j / sm_x[j]) * pq[j - 1][n - 1];
}
// mean thruput and mean qlength at the FESC
X = n / (Z + R);
Q = X * R;
// compute qlength dsn at the FESC
for (j = 1; j <= n; j++) {
pq[j][n] = (X / sm_x[j]) * pq[j - 1][n - 1];
}
// include j = 0 row
pq[0][n] = 1.0;
for (j = 1; j <= n; j++) {
pq[0][n] -= pq[j][n];
}
} // loop over n
The subroutine where the computations diverged. MServers()
sets the sm_x
array correctly, but the second inner loop sometimes reads 0 values
out of it.
I read everything I could find on Emscripten memory corruption, and the only hunch I had was some kind of alignment issue. I found this cool blog post from the Figma team from 2016 debugging an alignment issue deep in their codebase, but my issue wasn’t the same.
I also tried -s SAFE_HEAP=1
but that didn’t change the behavior or call out any errors.
My array was only being read inside this simple numerical loop. It was never written after being computed, just
read a number of times in the loop.
I was able to get the program above to run after an afternoon of throwing pasta at the wall, by compiling it with both [UBSan] (https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html) (undefined behavior sanitizer) and ASan (address sanitizer). These are extra checks that are compiled into the program to check for either undefined behavior or memory issues (like reading a null pointer). This extra instrumetation makes your program slower but it catches important errors. I used this command:
emcc PDQ_*.c MVA*.c test.c -o test.html -s TOTAL_MEMORY=358744064 -s ALLOW_MEMORY_GROWTH=1 -fsanitize=undefined -fsanitize=address
Woof Asan makes my WASM program require 384MB of memory! IMO that’s a lot for a browser program.
Since ASan got the program to work I suspected some kind of memory corruption.
I then found the Emscripten ASSERTIONS checks,
and by compiling with -s ASSERTIONS=2 -s STACK_OVERFLOW_CHECK=2
I was able to get an exception thrown that told me a stack overflow was happening in the pre-computation step for that array.
I went looking for stack allocations and found that before the pre-computation we initialized a huge array on the stack:
//A little bit above the corrupted loop, in the same function. MAX_USERS = 1200
float pq[MAX_USERS + 1][MAX_USERS+1];
I count 4 * 1201 * 1201 = 5,769,604
bytes, and the Emscripten default stack size is 5MB. That’ll overflow the stack!
A note on stack overflows: I had never considered how big stacks, are, but they’re usually not that big! The default stack size on Mac OS X is 8MB
(you can check with ulimit -a
).
If you’re doing any kind of numerical math storing 1 million+ numbers like above, you really need to do it on the heap!
Also stack overflows have a theme song, to the tune of Under the Sea:
I turned up the WASM stack size to 10MB and was able to run the program without any errors or needing sanitizers! The final command I used was
emcc PDQ_*.c MVA*.c test.c -o test.html -s ASSERTIONS=2 -s STACK_OVERFLOW_CHECK=2 -s TOTAL_STACK=10485760 -s INITIAL_MEMORY=94240768 -s ALLOW_MEMORY_GROWTH=1
Now using “only” ~94MB of memory and 10MB of stack. Wow!
I’m not really sure why ASan was initially protecting the program but not giving me an error. My guess is that since it required so much memory (384MB) it allowed
the stack overflow to happen without touching anything important on the heap and getting the sm_x
array overwritten.
But I have no clue! I compiled the same program with twice the array size (so it would take 11MB and be sure to overflow) on macOS
and Asan threw an error during execution. So I don’t know why it wasn’t doing it when compiled to WASM.
It looks like the Emscripten flag I needed was really -s STACK_OVERFLOW_CHECK=2
.
I searched the codebase for other 2D arrays (by grepping for \]\[.*\]
) and didn’t find any others allocated on the stack (ie by a function),
so this 10MB limit should be fine.
Conclusion
Emscripten is not magic fairy dust you can sprinkle on C libraries (even small ones) to run them in the browser! Especially if the work is numerical.
Porting programs to different environments is not easy. I thought it would be quick because PDQ is small and not very platform dependent, but there are so many variables at play in the execution of a program (including the stack size) you can’t assume that. I recommend starting the compilations with as many sanitizers and ASSERTIONS as possible, to find places where things are breaking and assumptions your program makes about its environment.
C programming is not my day job, so I try to stay positive when I run against some low-level issues. I’m getting better at it! At the Recurse Center I learned about linkers and how they relocate symbols, how to investigate object files, and the difference between shared objects and archive files. I’m much less intimidated by binary programs and their memory layouts. Here I got to debug a real stack overflow and see its downstream effects, I really internalized the difference between allocating on the stack and the heap. And I submitted a tiny patch to make Emscripten easier to use.
I’m going to continue to port to JS pieces of PDQ as I re-read Peformance System Analysis with Perl:PDQ
, with a goal of making an npm package we can use.
comments powered by Disqus