Earlier this week I was writing some code that used a statically linked C library, and I was trying to use a default the library documentation used in its examples, DEFAULT_CAMERA_CONFIGURATION.

I was able to use everything else in the library just fine, but when I tried to use this constant, my program wouldn’t build: it failed with a linker error instead, SYMBOL_NOT_FOUND.

When you compile code into a library or an executable, the names (variables, functions, constants) are called symbols. In my case the symbol DEFAULT_CAMERA_CONFIGURATION wasn’t showing up in the compiled static library. I went to look at the source code for the library, and my constant had the keyword static in front of it:

static const DEFAULT_CAMERA_CONFIGURATION = ...

Ori took one look at it and told me “yeah it’s not going to be exported”. Then he explained it all to me:

The static keyword in C

The point of the static keyword in C (when applied at file scope) is to say “this variable or function won’t be visible outside this file”. And technically by “this file”, I mean “this object file” (.o), which I’ll explain now.

First I’ll do a small recap of the C compilation process, and I don’t recommend skipping it; I thought I was familiar with how the parts interacted myself until this morning.

A small C compiling recap

We’re going to recap the three separate steps in the C code -> program process that we care about: Preprocessing, Compilation and Linking.

Our example program will have these files:

a.h, contains #include <math.h>
a.c, contains #include "a.h"
b.h, contains #include "a.h"
b.c, contains #include "b.h" 

Preprocessing

When the preprocessor sees an #include directive, like a.c’s #include "a.h", it will literally (literally!) copy all of a.h into a.c at the location of the #include.

So a.c will have all the contents of a.h, which are themselves all the contents of math.h.

Sometimes #includes have loops in them, like if a.h also included b.h (which already includes a.h). This would cause problems: each header would keep pasting in the other.

Include guards

This is what include guards are for: breaking loops in the includes of a .c file. Include guards look like this in C:

#ifndef A_HEADER
#define A_HEADER 
...contents of a.h
#endif

This includes the contents of a.h only once, so that if another header also includes a.h, it won’t be copied into the same file again. Many compilers also accept a shorter, non-standard form:

#pragma once

All of this #include, #define and #ifndef business is the C preprocessor’s work. Next is compilation!

Compilation

After a.c has been preprocessed, the compiler will come in and compile a.c into an “object file”, a.o. An object file is “machine code” that could execute, but instead of pointing to the correct functions, has placeholders like (waves hand)

c = CALL_FUNCTION(__LOCATION_OF_FUNCTION_ADD__, a, b)

Linking

The linker is going to combine a.o and b.o into a final executable. Importantly, to do that it needs to replace all the references to symbols with their final location in the executable.

For instance, in our previous example it would replace the placeholder for add() with its actual location, something like “memory location 3456”.

The linker looks up the location of add() in the final program in something called a Symbol Table.

What the static keyword does

Let’s add a little bit to our example.

What if both a.c and b.c defined a constant called FILE_LOCATION?

// in a.c
const char *const FILE_LOCATION = "/my/first/path";
// in b.c
const char *const FILE_LOCATION = "/my/other/path";

Each C file would compile fine on its own into a.o and b.o.

However, when linking a.o and b.o together, the linker wouldn’t know which FILE_LOCATION you mean. Specifically, it would find duplicate symbols in the symbol table and fail with a “duplicate symbol” (or “multiple definition”) error.

So the idea of static is to say “don’t export this symbol outside this object file”.

The technical term for this is internal linkage.

My mistake was thinking that include guards could fix this issue if the constants were defined in header files. Include guards can’t fix this because they are fundamentally at an earlier step in the pipeline (preprocessing): they can’t prevent duplicate symbols at the linking step.

Constants in C

From what I’ve learned, C constants are usually defined with either #defines, like

#define PI 3.14159265

or with static consts, so they don’t leak into other objects.

And what about extern?

extern also affects symbol resolution at link time. In short, extern declarations are like import statements in JavaScript.

Instead of defining const FILE_LOCATION in a.c, we change it to an extern declaration (extern plus the type and name, with no initializer), while keeping the definition of const FILE_LOCATION in b.c. That tells the compiler and linker “I’m going to use this symbol FILE_LOCATION, but I don’t define it in this object file.”

So here, a.o would be using b.o’s FILE_LOCATION.

The technical term for this is external linkage.

Aside: There is a third possible setting for linkage in C, and that is no linkage, used for local variables, since they can never be referenced outside of their functions.

So static and extern in C are kind of opposites. static is kind of like a private variable/function (and in C by default everything is public), and extern is like an import.

But what about the header files that include each other?

Yeah, if a.h includes the symbols from b.h, won’t that mean that when linking a.o to b.o, the former will already have symbols for everything in b?

The last detail I learned is that

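Function declarations that don’t have definitions later on are implicitly extern, so a header full of prototypes only tells a.o that those functions exist somewhere; it doesn’t copy b’s definitions into a.o. Variables are stricter: a file-scope variable declared without extern counts as a (tentative) definition, so headers should spell out extern for shared variables.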

So there you have it, static and extern in C, and the different linkage types!

I hope this post helps at least one person, in the way that it could have saved me half a day at the Recurse Center earlier!

