For some reason, when I was a younger programmer, I developed the belief that I needed to learn Haskell and Category Theory to become a better programmer. It would be a long, arduous challenge, but then one day, I’d finally get it. I’d grok the pure math of function composition and use it to design perfect systems, entirely free of bugs. Once I learned what a monoid in the category of endofunctors was, I would be slinging those around all day, cranking out provably correct code with mathematically optimal API surfaces, instead of throwing another ad-hoc unit test on the pile and trying to unflake my integration tests for the 1000th time, like a schmuck.
I actually never got around to learning that fancy stuff because it turns out the schmucky approach works just fine (actually it’s much better than waiting for silver bullets like fancy programming languages!), but recently I had extra time and watched Bartosz Milewski’s 10 hour Category Theory for Programmers class on YouTube. It’s a great course! Because of how clear the material is and how well Bartosz explains it, I can definitely say the problem is with the topic itself. Specifically, I’m disappointed with the credibility that category theory has as a tool for programmers. It’s often spoken of as an extremely high level, powerful tool, but it really isn’t worth learning to upgrade your skillset, not even in an “I don’t use this every day but it will make me smarter learning it” way. (If you’re looking for a math power-up, I’ll explain why I think Abstract Algebra is the topic to study.)
I think Category Theory for programming is not as popular these days, but for younger me, and particularly in academic/intellectual circles, it definitely had an aura. I’m writing this in case I can help someone like my younger self and persuade them to stick with (e.g.) Python instead and learn it well.
Maybe it was the obsession everybody had with monads that gave it that aura. They’re a great example of why learning category theory isn’t worth it.
Beforehand, I understood a Monad as a generic container type like `List[T]` with a function `fmap()` (or `map`) that lets you apply a function “to the thing inside the container” and that you can chain together. I really like the train tracks metaphor for it.
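To make the metaphor concrete, here’s a minimal Python sketch; the `Maybe` class and its method names are my own invention for illustration, not a real library:

```python
from __future__ import annotations
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")
U = TypeVar("U")

class Maybe(Generic[T]):
    """A hypothetical container: either holds a value or is empty."""

    def __init__(self, value: Optional[T]) -> None:
        self.value = value

    def map(self, f: Callable[[T], U]) -> Maybe[U]:
        # Apply f "to the thing inside the container", if there is one.
        if self.value is None:
            return Maybe(None)
        return Maybe(f(self.value))

# The "train tracks": values ride the happy rail through each map,
# while an empty container just passes through untouched.
print(Maybe(3).map(lambda x: x + 1).map(lambda x: x * 2).value)     # 8
print(Maybe(None).map(lambda x: x + 1).map(lambda x: x * 2).value)  # None
```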
Now, I’m finally one of those wizards who understands what “a monad is a monoid in the category of endofunctors” means, and I’m worse off for it. It’s the same description I just used, but it took 10 hours to learn and added 0 to my programming skillset or intuition. It did not enlighten me as to why monads are useful, when they should be used, or why they are powerful. I know the phrase is used tongue in cheek, but it’s a joke how vacuous it is. It’s not even worth trying to understand it. Let me show you.
A complicated definition that gives little
The “category of endofunctors” in Haskell is actually just types like `List[T]`. Functors are mappings between categories that “preserve structure”, and endofunctors map a category to itself. In Haskell there’s only one category, so all functors are endofunctors. This category is called `Hask` and it contains the types that variables can have, like `Bool` or `Number`. The arrows (morphisms) are functions between types like `isEven: Number -> Bool` 1. Since `Hask` is the category of “simple types”, endofunctors are just generic types like `List[T]`.
To be able to “preserve the original structure” of `Hask`, wherever you have a function `T -> Something` you need to be able to run it on a `List[T]` and get a `List[Something]`, aka “call the function on the value inside the container” with `fmap`.
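A minimal sketch of that lifting in Python, with `fmap` written as a plain function and `is_even` standing in for the morphism above (illustrative names, not a library API):

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")

def is_even(n: int) -> bool:
    # A morphism between "simple types": Number -> Bool.
    return n % 2 == 0

def fmap(f: Callable[[T], U], xs: List[T]) -> List[U]:
    # Lift a plain function T -> U so it works on List[T] -> List[U].
    return [f(x) for x in xs]

print(fmap(is_even, [1, 2, 3]))  # [False, True, False]
```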
Generic types for containers with a `map()` function are powerful! The idea that things like `Option` can have a `map` function just like `Result`, `List` and `Promise` is really nice and ergonomic. This is 90% of the value of monads anyway, and it’s actually from their parent concept, `Functor`.
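For instance, a hypothetical `Result` type can expose the exact same `map` interface as the `Maybe` sketch above; none of these names come from a real library:

```python
from __future__ import annotations
from typing import Callable, Generic, TypeVar, Union

T = TypeVar("T")
U = TypeVar("U")

class Ok(Generic[T]):
    """Success half of a hypothetical Result type."""

    def __init__(self, value: T) -> None:
        self.value = value

    def map(self, f: Callable[[T], U]) -> Union[Ok[U], Err]:
        return Ok(f(self.value))

class Err:
    """Failure half: errors pass through map untouched, like an empty Maybe."""

    def __init__(self, error: str) -> None:
        self.error = error

    def map(self, f: Callable) -> Err:
        return self

print(Ok(21).map(lambda x: x * 2).value)       # 42
print(Err("boom").map(lambda x: x * 2).error)  # boom
```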
So that’s “the category of endofunctors” bit. It’s container generics. The “monoid in” the category of endofunctors bit is just one more function that lets you unwrap the container and unlocks the ability to chain calls to `fmap()` 2. Haskell comes with an operator (`>>=`, pronounced “bind”) to chain the calls for you.
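Continuing the hypothetical `Maybe` sketch from earlier, that extra function is commonly called bind or flat_map (it’s what Haskell’s `>>=` does); here I’ve named it `and_then` for illustration:

```python
from __future__ import annotations
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")
U = TypeVar("U")

class Maybe(Generic[T]):
    def __init__(self, value: Optional[T]) -> None:
        self.value = value

    def and_then(self, f: Callable[[T], Maybe[U]]) -> Maybe[U]:
        # Like map, but f already returns a Maybe, so we don't end up
        # with a Maybe[Maybe[U]] -- this is the "unwrap" step.
        if self.value is None:
            return Maybe(None)
        return f(self.value)

def parse_int(s: str) -> Maybe[int]:
    return Maybe(int(s)) if s.isdigit() else Maybe(None)

def reciprocal(n: int) -> Maybe[float]:
    return Maybe(1 / n) if n != 0 else Maybe(None)

# Each step can fail; the chain short-circuits on the first failure.
print(Maybe("4").and_then(parse_int).and_then(reciprocal).value)  # 0.25
print(Maybe("0").and_then(parse_int).and_then(reciprocal).value)  # None
```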
So in the end, monads are still “generic containers that can be chained together”. We’re right back where we started, but it took a lot of time to define all of those terms in actual category theory. If I wanted to actually learn about monads I could have spent those 10 hours reading tutorials, figuring out how people use them in practice with things like monad transformers, or actually writing code that uses them. That’s the problem: this stuff is so abstract it takes a long time to learn, and the payoff isn’t worth it.
Vague descriptions instead of concrete tools
Imagine the kind of programming model you would need to unify a 30 year old COBOL codebase, a Java FactoryFactory enterprise swamp, a compiler written in LISP, and an accounting program written in Forth. You couldn’t use any special features of those languages, and would need to have the simplest programming model. While it would be “the most abstract”, that abstractness wouldn’t really give you much power as a programmer. It’s like thinking that Turing Machines 3 are the best programming language because they’re the most abstract.
Bartosz in his course makes the point that category theory works by completely ignoring the specific properties of the objects it studies, and reducing them to points connected by arrows. For example, if your Category is “types available in Haskell”, you can’t point to “the numeric types” because you can’t talk about anything specific like a number. You can say an object has more arrows pointing to it than other objects, or that there are arrows that form a cycle, or that for every pair of arrows there’s a third arrow that looks like this… Then it becomes a lot of work to define even simple things like the empty set: it’s “an initial object”, an object that has exactly one arrow from it to every other object. It’s a lot of overhead to work at this high an abstraction level.
In programming, we have a lot of information about our domain that we constantly use. All the quantities we deal with are finite, like memory, bandwidth and IEEE floats. Sequential reads from disk are faster than random reads. Accesses to RAM at locations aligned to word boundaries are faster than other locations. Sometimes these things cause bugs and we work around them and other times they’re features we use to our benefit. We exploit concrete knowledge about computers to do our jobs, but Category Theory’s MO is to ignore that knowledge so it ends up being too vague to be useful. It’s the polar opposite of Leetcode, which is too specific. But grinding Leetcode is 10x more useful than studying CT: every so often you’ll actually use a DFS.
I feel like this whole Category Theory thing was some kind of gatekeepy “I’m smarter than you” shtick. I’m glad it’s no longer trendy in my circles. If you’re a young student who’s heard about the topic and is thinking of learning it, I hope you’ll reconsider. Even learning “regular” Haskell and bikeshedding between monad transformer libraries is more useful than learning Category Theory.
The shame is that there is pure math that would help most programmers model their systems better and work with composition! It’s Abstract Algebra.
Abstract Algebra is math that will make you better as a programmer
Abstract algebra, or modern algebra (in math departments it’s just called Algebra), studies how sets of objects behave under different operations, and what happens when the properties of the operations vary. It’s usually taught starting with group theory.
Group theory is kind of what laypeople imagine category theory to be! It’s also about abstract objects, with arrows between them. The main practical difference is that there’s an extra stipulation that each element needs to have an inverse: e.g. 5 needs to have a -5 to add back to 0 to form a group.
Learning category theory is like trying to climb a vertical ice sheet: there’s very little to grab onto. Group theory is less steep and made of rock, with examples you can grab onto and learn from. The “everything needs an inverse” requirement of groups gives them structure, and makes them deeply related to the notion of symmetry 4. Symmetry is in and of itself immensely powerful for programmers: you definitely want to know when you can cut your problem in half because of it.
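As a taste, here’s a tiny Python sketch of a group: the four rotations of a square under composition (which is just addition mod 4), where every rotation has an inverse that undoes it:

```python
# Rotations of a square by 0, 90, 180, 270 degrees form a group:
# composing two rotations is addition mod 4, the identity is 0,
# and every rotation has an inverse that undoes it.
ROTATIONS = range(4)  # quarter turns

def compose(a: int, b: int) -> int:
    return (a + b) % 4

def inverse(a: int) -> int:
    return (-a) % 4

for a in ROTATIONS:
    assert compose(a, inverse(a)) == 0      # every element can be undone
    for b in ROTATIONS:
        assert compose(a, b) in ROTATIONS   # closure: we never leave the group
```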
Abstract Algebra has a very high ceiling too, and you can keep squeezing coding-insight juice out of it for a long way. I’ve barely started but it’s been useful the whole way, with applications like elliptic curves (their points form a cyclic group! that’s why they’re used in cryptography), error correcting codes and Advent of Code cycle finding problems 5.
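For the cycle-finding flavor, here’s a sketch of the standard tortoise-and-hare (Floyd’s) algorithm; the example function at the bottom is arbitrary:

```python
from typing import Callable, Tuple, TypeVar

S = TypeVar("S")

def find_cycle(f: Callable[[S], S], x0: S) -> Tuple[int, int]:
    """Return (start, length) of the eventual cycle in x0, f(x0), f(f(x0)), ..."""
    # Phase 1: the hare moves twice as fast until both meet inside the cycle.
    tortoise, hare = f(x0), f(f(x0))
    while tortoise != hare:
        tortoise, hare = f(tortoise), f(f(hare))
    # Phase 2: restart the tortoise; moving in lockstep, they meet at the cycle's start.
    start, tortoise = 0, x0
    while tortoise != hare:
        tortoise, hare = f(tortoise), f(hare)
        start += 1
    # Phase 3: walk the hare around once to measure the cycle length.
    length, hare = 1, f(tortoise)
    while tortoise != hare:
        hare, length = f(hare), length + 1
    return start, length

print(find_cycle(lambda x: (x * x + 1) % 255, 3))  # (2, 6)
```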
It’s still plenty abstract if you like that! It builds a lot of muscles for thinking about abstract things like cosets and group actions, but it also builds intuition about how binary operations compose. In programming, we often define new operations, like “a customer can pause their subscription for 2 weeks at a time.” Then we have to reason about how that interacts with all the other operations in our domain model (a sketch follows the list):
- Can you cancel a pause before it happens? Does it have an inverse?
- Is pause-then-cancel the same as cancel-then-pause? Do the operations commute?
- If two user accounts get merged, do the pauses merge? How do mappings (homomorphisms) affect it?
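A toy sketch of that kind of check, with all names invented for illustration:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Subscription:
    active: bool = True
    paused: bool = False

def pause(s: Subscription) -> Subscription:
    return replace(s, paused=True)

def cancel(s: Subscription) -> Subscription:
    return replace(s, active=False, paused=False)

def unpause(s: Subscription) -> Subscription:
    # The inverse of pause -- but only if nothing else happened in between!
    return replace(s, paused=False)

s = Subscription()
# Do the operations commute? Here they don't: order changes the result.
print(cancel(pause(s)))  # Subscription(active=False, paused=False)
print(pause(cancel(s)))  # Subscription(active=False, paused=True)
```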
It doesn’t give you any solved answers or blueprints, but you go through so many instances of groups when learning the concept that you get dramatically stronger at reasoning about binary operations. And the really nice thing about it is, it goes really well with Linear Algebra (which is arguably THE MOST USEFUL MATH FOR PROGRAMMERS) and thinking about commutativity. A lot of the reason matrices are weird is that they don’t naturally commute: given two matrices A and B, in general AB != BA, and group theory can explain some of the consequences.
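For instance, two shear matrices in numpy:

```python
import numpy as np

A = np.array([[1, 1],
              [0, 1]])  # shear along x
B = np.array([[1, 0],
              [1, 1]])  # shear along y

print(A @ B)  # [[2 1]
              #  [1 1]]
print(B @ A)  # [[1 1]
              #  [1 2]]  -- a different matrix: AB != BA
```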
Programmers don’t need “the math of math”
Back when I was an impressionable youngster, one of my naïve reasons for wanting to learn CT was to wield “the math behind math itself”. I thought that I would learn this special thing and then get superpowers: x-ray beams would shoot out of my eyes and I’d be able to run laps around mathematicians.
In practice you need to learn a ton of (non-CT) math before “exhausting” its patterns and needing to abstract. There is a lot of powerful knowledge in each Category that you can learn and apply before you need to generalize across Categories.
A lot of important developments, from the impossibility of solving the general quintic equation by radicals to the proof of Fermat’s Last Theorem, use Abstract Algebra. You can do a whole undergrad (several years of study) in math before needing the “power” of the genericness of CT.
How useful is math to programming?
Math isn’t even the best avenue to upskill your programming. Ask every super-senior programmer, they’ll all tell you: the most effective way to get better as a programmer is to write mountains of real world code. The second most effective is to read mountains of real world code.
Those two things are so valuable that for the rest of the list of potential improvements, the ranking doesn’t really matter. Reading and writing tons of code will make you at least 3x better in your day to day; everything else, like IDEs/workflows, leetcode, and even math, just pales in comparison, probably like 1.03x each. (The real percentage improvement is impossible to measure and is the source of much debate online.)
I’m not against doing math, I’m doing a ton of it. Eventually, if you hit a point where you’ve read mountains of code already and want to get to the next level, the smaller improvements add up and can be worth it. Just don’t start with Category Theory, pick up Abstract Algebra 6 and Linear Algebra first. They will have a much higher payoff, both in short-term applications and in long-term conceptual patterns. You can even layer on CT after that, and I’m sure it will enhance the base you’ve built with algebra.
1. That’s another one of my disappointments: it’s entirely about doing logic on types and combining them instead of the contents of the functions. It can say next to nothing about what’s in the functions themselves and how they’re built. ↩︎
2. It also requires an identity “empty box” function called `pure`, but that’s not hard to understand. If you think of them as “container types” you’ll already assume there’s a way to build an empty container. ↩︎
3. The same can be said for the Lambda Calculus. Understanding it doesn’t make you that much better of a programmer. At least LISP is so simple you have no choice but to go and write lots of code. ↩︎
4. A symmetry is “an action you can do to an object that leaves it looking the same”. Since it looks the same, it means there’s also an opposite action that “undoes” the symmetry. Groups are symmetries. ↩︎
5. A lot of AoC problems benefit from undergrad level math :) ↩︎
6. For linear algebra, Linear Algebra Done Right by Axler is fantastic. For algebra, I skimmed Contemporary Abstract Algebra by Gallian, then watched a lecture series of a Harvard algebra class by Benedict Gross and followed along in the textbook, Algebra by Artin. ↩︎