Earlier this week I was writing some code that used a statically linked C library, and I was trying to use a default the library documentation used in its examples, DEFAULT_CAMERA_CONFIGURATION.
I was able to use everything else in the library just fine, but whenever I tried to use this constant, my program wouldn’t build and gave a linker error instead: SYMBOL_NOT_FOUND.
When you compile code into a library or an executable, the names (variables, functions, constants) are called symbols. In my case the symbol DEFAULT_CAMERA_CONFIGURATION wasn’t showing up in the compiled static library.
I went to look at the source code for the library, and my constant had the keyword static in front of it:
static const DEFAULT_CAMERA_CONFIGURATION = ...
Ori took one look at it and told me “yeah, it’s not going to be exported”. Then he explained it all to me:
The static keyword in C
The point of the static keyword in C is to mean “this variable won’t be visible outside this file”. And technically by “this file”, I mean “this object file” (.o), which I’ll explain now.
First I’ll do a small recap of the C compilation process. I don’t recommend skipping it; I thought I was familiar with how the parts interacted until this morning.
A small C compiling recap
We’re going to recap the three separate steps in the C code -> program process that we care about: preprocessing, compilation and linking.
Our example program will have these files:
a.h, contains #include <math.h>
a.c, contains #include "a.h"
b.h, contains #include "a.h"
b.c, contains #include "b.h"
Preprocessing
When the preprocessor sees the term #include, like a.c’s #include "a.h", it will literally (literally!) copy all of a.h into a.c at the location of the #include.
So a.c will have all the contents of a.h, which in turn include all the contents of math.h.
Sometimes #includes have loops in them, like if a.h also included b.h. This would cause problems!
Include guards
This is what include guards are for: breaking loops in the includes of a .c file. Include guards look like this in C:
#ifndef A_HEADER
#define A_HEADER
...contents of a.h
#endif
This only includes the contents of a.h once, so that if someone else also includes a.h it won’t be copied into the file again. Sometimes include guards look like this too:
#pragma once
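To see the guard doing its job, here is a minimal sketch of what a .c file might look like after the preprocessor has pasted the same header in twice. The function twice() and the caller four_times() are hypothetical names made up for this illustration.

```c
/* First copy of a hypothetical a.h, pasted in by the preprocessor. */
#ifndef A_HEADER
#define A_HEADER
static inline int twice(int x) { return 2 * x; }
#endif

/* Second copy, e.g. pulled in indirectly through another header.
   A_HEADER is already defined, so everything up to #endif is skipped.
   Without the guard, this would be a "redefinition of twice" error. */
#ifndef A_HEADER
#define A_HEADER
static inline int twice(int x) { return 2 * x; }
#endif

/* Code from the including .c file sees exactly one definition of twice(). */
int four_times(int x)
{
    return twice(twice(x));
}
```

Deleting the two #ifndef/#define/#endif wrappers should make the compiler reject the file, which is the problem the guard exists to prevent.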
All of this #include, #define, and #ifndef business is the C preprocessor’s work. Next is compilation!
Compilation
After a.c has been preprocessed, the compiler will come in and compile a.c into an “object file”, a.o.
An object file is “machine code” that could almost execute, but instead of pointing to the correct functions, it has placeholders like (waves hand):
c = CALL_FUNCTION(__LOCATION_OF_FUNCTION_ADD__, a, b)
Linking
The linker is going to combine a.o and b.o into a final executable.
Importantly, to do that it needs to replace all the references to symbols with their final location in the executable.
For example, in our previous example it would insert the location of add() as “memory location 3456”.
The linker looks up the location of add() in the final program in something called a Symbol Table.
What the static keyword does
Let’s add a little bit to our example.
What if both a.c and b.c defined a constant called FILE_LOCATION?
// in a.c
const char *const FILE_LOCATION = "/my/first/path";
// in b.c
const char *const FILE_LOCATION = "/my/other/path";
Each C file would compile fine on its own into a.o and b.o.
However, when linking a.o and b.o together, the linker wouldn’t know which FILE_LOCATION you mean. Specifically, it would find duplicate symbols in the symbol table.
So the idea of static is to say “don’t export this symbol outside this object file”.
The technical term for this is internal linkage.
My mistake was thinking that include guards could fix this issue if the constants were defined in header files. Include guards can’t fix this because they are fundamentally at an earlier step in the pipeline (preprocessing): they can’t prevent duplicate symbols at the linking step.
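As a rough sketch of the static fix, this is what a.c could look like with its constant kept private. The accessor a_file_location() is a hypothetical helper added for illustration, and the char pointer type is an assumption, since the snippets above don’t spell out a type.

```c
/* Contents of a hypothetical a.c. */

/* Internal linkage: the name FILE_LOCATION stays private to this object
   file, so b.c can define its own FILE_LOCATION without the linker
   reporting a duplicate symbol. */
static const char *const FILE_LOCATION = "/my/first/path";

/* Functions have external linkage by default, so other object files can
   call this accessor even though they cannot name FILE_LOCATION itself. */
const char *a_file_location(void)
{
    return FILE_LOCATION;
}
```

If b.c also marks its FILE_LOCATION as static, each object file keeps its own copy and the two never meet in the linker’s symbol table.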
Constants in C
From what I’ve learned, C constants are usually defined with either #defines, like
#define PI 3.14159265
or with static consts, so they don’t leak into other objects.
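Here are the two styles side by side, as a small sketch. TWO_PI and circumference() are hypothetical names chosen for this example.

```c
/* Style 1: a preprocessor macro. The text 3.14159265 is pasted in wherever
   PI appears, so the linker never sees a symbol called PI. */
#define PI 3.14159265

/* Style 2: a typed constant with internal linkage. It is not visible to
   other object files, so it cannot clash with their names. */
static const double TWO_PI = 2.0 * PI;

double circumference(double radius)
{
    return TWO_PI * radius;
}
```

The macro has no type and no address, while the static const is a real typed object that the compiler can check, which is one reason to prefer it when the value is more than a bare number.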
And what about extern?
Extern also affects symbol resolution at link time. In short, externs are like import statements in JavaScript.
Instead of having const FILE_LOCATION in a.c, we change it to extern FILE_LOCATION, while keeping the const FILE_LOCATION in b.c.
That tells the linker “I’m going to use this symbol FILE_LOCATION, but I don’t define it in this object file.”
So here, a.o would be using b.o’s FILE_LOCATION.
The technical term for this is external linkage.
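The extern arrangement can be sketched like this. In a real project the two halves would live in a.c and b.c; they are placed in one file here only so the sketch builds on its own. The accessor a_file_location() is a hypothetical helper, and the char pointer type is an assumption.

```c
/* What a.c would contain: a declaration only. "extern" promises that some
   object file defines FILE_LOCATION, without creating it here. */
extern const char *const FILE_LOCATION;

const char *a_file_location(void)
{
    return FILE_LOCATION;
}

/* What b.c would contain: the one and only definition. The linker points
   a.o's reference at this symbol. */
const char *const FILE_LOCATION = "/my/other/path";
```

If no object file provided the definition, a.c would still compile, and the failure would only show up at link time as an undefined symbol.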
Aside: There is a third possible setting for linkage in C, and that is no linkage, used for local variables, since they can never be referenced outside of their functions.
So static and extern in C are kind of opposites. static is kind of like a private variable/function (and in C by default everything is public), and extern is like an import.
But what about the header files that include each other?
Yeah, if a.h includes the symbols from b.h, won’t that mean that when linking a.o to b.o, the former will already have symbols for everything in b?
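The last detail I learned is that function and variable declarations that don’t have definitions later on are implicitly extern. Headers normally contain only declarations like these, so a header that pulls in another header only tells the compiler those symbols exist; it doesn’t define them a second time in the including object file.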
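For functions, this can be sketched as follows. The sketch only covers the function case, and add_three() is a hypothetical caller made up for this example.

```c
/* These two declarations mean the same thing: a function declaration with
   no storage-class keyword is treated as if it were written with extern. */
int add(int a, int b);
extern int add(int a, int b);

/* The caller compiles with just the declaration in view... */
int add_three(int a, int b, int c)
{
    return add(add(a, b), c);
}

/* ...and the linker connects the call to the single definition, which in a
   real project would live in another .c file. */
int add(int a, int b)
{
    return a + b;
}
```

This is why a header full of function prototypes can be included from many .c files without producing duplicate symbols: each inclusion adds a reference, not a definition.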
So there you have it, static and extern in C, and the different linkage types!
I hope this post helps at least one person, in the way that it could have saved me half a day at the Recurse Center earlier!