Interesting article, and a good analogy. I've recently taken to comparing software to physical systems. A physical interconnect can only connect so many things (think of the steering column in your car) because the laws of physics allow only so much physical space. Software has no such constraint: each object is a point that can connect to, and be connected from, an arbitrary number of other objects directly. This potential complexity must be actively fought against to keep the system maintainable and comprehensible.
I actually thought this was going to go in another direction. When the proof of the four color theorem was published, there was a good deal of controversy about it. It had not been fully verified by a person; the bulk of it was generated by computers. The method of generating the elements of the proof was vetted, but not the final output. This has some relation to the issues we have now with machine learning, where systems are built on generated models we don't fully understand. It also relates to the more complex chains of compilation we use these days, where each layer of translation introduces the potential for subtle errors or malicious action.
I think the four color theorem teaches us that sometimes software has to be shit. A lot of people held out for an elegant proof, but what they got was auto-generated spaghetti. I can't think of a better demonstration that sometimes you just have to write a lot of boring code and handle all the cases than the fact that an important theorem in mathematics was settled this way.
Nice analogy, I wouldn't have thought of that. But I'm unconvinced that this is a good measure of software complexity; for one, there are only three nontrivial complexity levels (2, 3, 4).
But it's also unstable: a cycle with an odd number of nodes (e.g. a triangle) requires three colours, while a cycle with an even number of nodes needs only two, even though both models seem similarly complex.
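Easy to check with a throwaway brute-force search (a quick Python sketch; the helper names are mine):

```python
from itertools import product

def chromatic_number(edges, n_nodes):
    """Smallest k such that the graph has a proper k-coloring (brute force)."""
    for k in range(1, n_nodes + 1):
        for coloring in product(range(k), repeat=n_nodes):
            if all(coloring[u] != coloring[v] for u, v in edges):
                return k
    return n_nodes

def cycle_edges(n):
    """Edges of a cycle on n nodes: 0-1, 1-2, ..., (n-1)-0."""
    return [(i, (i + 1) % n) for i in range(n)]

print(chromatic_number(cycle_edges(3), 3))  # odd cycle  -> 3
print(chromatic_number(cycle_edges(4), 4))  # even cycle -> 2
print(chromatic_number(cycle_edges(5), 5))  # odd cycle  -> 3
```

Adding or removing a single node flips the answer between 2 and 3, which is exactly the instability in question.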
I understand and appreciate the analogy the author made. I wish more people thought about code complexity from first principles.
That being said, I doubt chromatic number is the right measure for software dependency graphs, especially when there are hundreds of graph complexity metrics (tree-width, centrality measures, measures based on cuts and flows, etc.). Just now I came across cyclomatic complexity for the first time: https://en.wikipedia.org/wiki/Cyclomatic_complexity
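For the curious: McCabe's cyclomatic complexity is just M = E − N + 2P over the control-flow graph (E edges, N nodes, P connected components). A toy calculation (the helper name and example graph are mine):

```python
def cyclomatic_complexity(edges, nodes, components=1):
    """McCabe's metric: M = E - N + 2P for a control-flow graph."""
    return len(edges) - len(nodes) + 2 * components

# Control-flow graph of a function containing a single if/else:
# entry -> cond; cond -> then; cond -> else; then -> exit; else -> exit
nodes = ["entry", "cond", "then", "else", "exit"]
edges = [("entry", "cond"), ("cond", "then"), ("cond", "else"),
         ("then", "exit"), ("else", "exit")]
print(cyclomatic_complexity(edges, nodes))  # 5 - 5 + 2 = 2
```

M equals the number of independent paths through the function, so a single branch gives 2.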
I'd be interested to hear from someone who has experience applying graph metrics to static code analysis in practice. What's useful and informative?
Excellent thought piece - wish HN was filled with more original analogies like this one.
This reminds me a bit of domain-driven contexts (something just added in Elixir/Phoenix 1.3) - reducing the surface area sub-nodes expose to each other by routing everything through a single interface point.
Quick addendum: although the four color theorem proof always seems to require a case analysis at some point, proving that planar graphs admit 5-colorings can be done with a very short proof. Not particularly relevant to the analogy in the post, but if you want proof that planar graphs admit constant-sized colorings that's the one for you.
I can't help thinking that it should then be possible to write a tool that ingests a codebase and shows you a graph similar to the ones in this article. Perhaps that would be a good way to figure out where to start working on reducing the code complexity. But I have to think on this for a while to see if I still believe it after mulling it over.
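As a rough proof of concept (Python-specific, and the helper name is hypothetical), you can get most of the way to such a tool by walking each file's AST for import statements and feeding the result into any graph-drawing library:

```python
import ast
from pathlib import Path

def module_dependencies(source_dir):
    """Map each .py module in source_dir to the set of top-level
    modules it imports. The resulting adjacency data can be rendered
    as a node graph like the ones in the article."""
    deps = {}
    for path in Path(source_dir).rglob("*.py"):
        tree = ast.parse(path.read_text())
        imports = set()
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                imports.update(alias.name.split(".")[0] for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                imports.add(node.module.split(".")[0])
        deps[path.stem] = imports
    return deps
```

Module-level imports are of course a coarse proxy for real coupling, but it's enough to see which nodes are hubs.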
It's not obvious, but the graphs on this page are interactive. You can drag the nodes around, and click them to change their colors.
I think if you want to represent dependencies by a graph, it should be a directed graph, not an undirected graph.
I think once you put in direction, you will see that cycles matter much more than overlapping edges in determining complexity. If you can keep the graph acyclic, life is pretty easy. Big cycles are worse than small cycles, and cycles that overlap each other make things much more complex.
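Checking that a dependency graph stayed acyclic is also cheap: a plain DFS finds a cycle or certifies there is none. A sketch with made-up module names for illustration:

```python
def find_cycle(graph):
    """Return one cycle in a directed graph (adjacency dict), or None if acyclic."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited / on current path / done
    color = {node: WHITE for node in graph}
    stack = []

    def dfs(node):
        color[node] = GRAY
        stack.append(node)
        for nxt in graph.get(node, ()):
            if color.get(nxt, WHITE) == GRAY:   # back edge: cycle found
                return stack[stack.index(nxt):] + [nxt]
            if color.get(nxt, WHITE) == WHITE:
                found = dfs(nxt)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            found = dfs(node)
            if found:
                return found
    return None

acyclic = {"ui": ["core"], "core": ["utils"], "utils": []}
cyclic  = {"ui": ["core"], "core": ["utils"], "utils": ["ui"]}
print(find_cycle(acyclic))  # None
print(find_cycle(cyclic))   # e.g. ['ui', 'core', 'utils', 'ui']
```

Running something like this in CI is one way to enforce the "keep it acyclic" rule before the big overlapping cycles ever form.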
In the context of HoTT? That most code in the real world can be embedded in low dimensions, and when you embed something on a surface of small genus you can do a bunch of optimizations.
Also, that you can use the finite set of forbidden minors for a surface of genus X to do case analysis instead of having an infinite number of cases to test.
> The first thing that struck me, as a software engineer, is that four colors is not very many.
I really like the visualization. It's a happy medium between something like UML and something that's actually useful, and creates useful constraints around communication channels between classes.
I'm sure there's some design pattern that breaks the rule, but overall trying to force a "4 color" programming pattern seems like it can't do anything but help.
When I was told about this many years ago, it was described as the 'n factorial problem'. This framing assumes that the directionality of the communication is important. If you have two nodes, there are 2! = 2 ways to communicate; with three, there are 3! = 6 ways. (Strictly speaking, the number of directed pairs is n(n-1), which only coincides with n! for two or three nodes, but the point stands.) The suggested solution was similar: group components together behind a black-box facade so that 30 components can be reduced to 3 groups with 6 main communication paths or fewer.
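The payoff of the grouping is easy to quantify if you count directed pairs, n(n-1), which matches the factorial story for small n (names below are mine):

```python
def directed_paths(n):
    """Number of directed communication channels among n fully
    interconnected components: one per ordered pair, n * (n - 1)."""
    return n * (n - 1)

print(directed_paths(2))   # 2   (matches 2!)
print(directed_paths(3))   # 6   (matches 3!)
print(directed_paths(30))  # 870 channels among 30 ungrouped components
print(directed_paths(3))   # 6 channels once grouped behind 3 facades
```

Going from 870 potential channels to 6 (plus the internal ones hidden inside each facade) is the whole argument for the black-box grouping in one number.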
If you were weirdly interested in the four color theorem, you might like this common coding interview question about graph coloring: https://www.interviewcake.com/question/graph-coloring
How do you color something like depending on whether a method returns an ordered or unordered list? There's no code to "color" there.
Did he miss a connection between the top red blob and the green one?
Also, what's the name of the process of transforming a "map" into a connected network, like he did at the beginning?
I thought this article was going to be about Appel and Haken's proof of the Four Color Theorem, as it was the first major proof to rely on software rather than a purely analytical argument. As a result, mathematicians were split: while the proof may have shown the theorem was true, it didn't explain why it was true, because the "why" had been reduced to a brute-force search.
Shouldn't there be a connection between the top-right red circle and green circle in that very first graph?
Also known as tight vs loose coupling.
This is a nice analogy, but these days I feel like abstraction is usually a bad thing. By its very nature, making things more abstract makes it less clear what you are actually doing; you end up going through all these layers of indirection. If I want to use the data from some node (aka object) way over there, I should just be able to use it, without it getting pulled through all of these other things first.
To use the same node graph analogy, when you focus too much on reducing the number of colors, you end up needing to create tons of nodes between the nodes that actually DO SOMETHING. Your software ends up more complex and difficult to manage.