
The Big Refactor

Written by Jarl Ostensen

Lessons learned transforming a deep OO C++ code base

TL;DR

Over time, codebases that evolve with traditional OO methods can become hard to read and maintain. In this article, we look at how we transformed a large-ish code base from OO-style C++ to “c-with-c++” and improved readability, maintainability, and our ability to reason about our code.

The starting point

We had developed a codebase to support a Windows desktop client application. Fundamentally it consisted of a number of service layers interacting with remote servers and marshalling information to and from the client PC.
The architecture had started out with a high degree of flexibility in mind, including support for multiple platforms, and had evolved organically. Maintainability and extensibility had slowly suffered, but “if it ain’t broke, don’t fix it”, so we had not undertaken a proper overhaul until we saw a need to move the code base to multiple platforms and to meet the demands of an ever-growing set of feature requirements.

Our analysis highlighted a handful of “anti-patterns” which were at the root of the problems with the code base:

Runtime manifest architecture

The code depended on a number of features (including virtual inheritance) which only become manifest at runtime, making static reasoning very hard. In our case it also turned out that most of these dynamic dependencies could be made static and decided upon at compile time, so we were paying the cost of all this extra plumbing for no real benefit. What exacerbated the situation was that all this plumbing was invisible – hidden in virtual method calls and behind pointer dereferences – so it was only when you traced through things step by step that the path the code was actually taking became clear. Our problem wasn’t the virtual mechanisms themselves but our unnecessary use of them, which ended up adding complexity at the call site.
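As a minimal sketch of the difference (hypothetical names, not our actual code), compare a call whose target is only decided at runtime with one whose target is visible at the call site and decided at compile time:

struct Transport
{
    virtual ~Transport() = default;
    virtual void send(const char* data) = 0;
};

struct TcpTransport : Transport
{
    void send(const char*) override { /* ... */ }
};

// runtime-manifest: you cannot tell which send() runs without tracing
// back to wherever the Transport object was created
void publish(Transport& t) { t.send("hello"); }

// compile-time: the target is right there at the call site
namespace tcp { void send(const char* /*data*/) { /* ... */ } }
void publish_static() { tcp::send("hello"); }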

Documentation can help, and we had made efforts as far as was sensible, but it does not help if you have to trace the code by hand at runtime in order to understand it.

“Action at a distance” is what we call these types of architectures, where the behaviour of code in one location depends on state and decisions made in different parts of the code at different points in time during execution. The amount of mental state one has to maintain in order to trace these actions and form a coherent picture of execution paths is unwieldy. This in turn can lead to shortcuts and slicing through the intended architecture in order to get things done, in particular when new people start working on it.

The remaining anti-patterns are contributing factors to this fundamental problem, and so the goal of the refactoring exercise largely became one of making the code statically and locally understandable.

Classes as namespaces

This seems fairly common and is a result of the tendency to default to classes when designing code in C++. Remember that a class is a type in C++, whereas a namespace is something you use to group functionality logically. The former is something you can have many of, manifest as objects; the latter is just a handy way of allowing you to name things better. The rule-of-zero certainly helps here, but much of the code had become intertwined with custom types and classes, which made it difficult to take full advantage of it without considerable refactoring, and we were left with many classic boilerplate examples like this:

// rule-of-five boilerplate, all in service of one little function
class MySimpleClass
{
public:
    MySimpleClass();
    MySimpleClass(const MySimpleClass&);
    MySimpleClass(MySimpleClass&&);
    MySimpleClass& operator=(const MySimpleClass&);
    MySimpleClass& operator=(MySimpleClass&&);
    ~MySimpleClass();
    
    bool MyLittleFunction(int);
};

In addition to the “cognitive overhead”, we’ve created quite a lot of code to support the type MySimpleClass when perhaps it is nothing but a logical grouping of functionality. You might consider a singleton for this job – a purely static class of which there is only one – but a namespace (or nothing!) can be just as good an option. Arguably this looks pretty simple, doesn’t it?

bool MyLittleFunction(int);

A note on singletons
They’re marginally useful in cases where you need a class (i.e. a type) and have to restrict objects of that type to one instance, but mostly you probably don’t…
The design principle we adopted in this refactoring exercise was to allow classes in the public APIs if (and only if) they strictly corresponded to a type. This meant that pretty much all classes in the public APIs from the original design disappeared and were replaced with free functions in namespaces, as sketched below.
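A hedged sketch of what that looked like (hypothetical names, not our actual API): the former class becomes a namespace of free functions, with no object whose construction, copying, or destruction anyone has to think about.

namespace my_service
{
    // was MySimpleClass::MyLittleFunction; there is no longer an object
    // whose lifetime, copies, or moves we have to manage
    bool MyLittleFunction(int value) { return value > 0; }
}

// at the call site:
// const bool ok = my_service::MyLittleFunction(42);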

Stateful pipelines, or “high-latency programming”

This reminds me of rendering pipelines in graphics programming, where you build up a large amount of state for the various stages of a pipeline and then execute the pipeline. This is analogous to classes that contain a large amount of state that is set with mutators, with this state influencing methods executed (sometimes much) later. This approach can greatly exacerbate the problem of reasoning about code flow because the state is built up in different stages, in different physical (i.e. source file) locations, at different times during execution. When you finally get to the point where the state is being realised, the decisions that influenced the buildup of this state are lost or hidden.
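As an illustration (hypothetical names, deliberately simplified), contrast the mutator-driven style with one where everything that influences the call is provided at the call site:

#include <string>
#include <utility>

// "high-latency" style: state accumulated via mutators, realised later
class Uploader
{
    std::string host_;
    int retries_ = 0;
public:
    void set_host(std::string h) { host_ = std::move(h); }
    void set_retries(int r) { retries_ = r; }
    bool execute();  // behaviour depends on whatever was set, wherever, whenever
};

// refactored style: everything influencing the call is visible at the call site
struct UploadParams
{
    std::string host;
    int retries = 0;
};
bool upload(const UploadParams& params);

// e.g. upload({ "example.com", 3 });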

Deeply nested responsibilities

This is not so much a C++ problem as a problem of overzealous encapsulation in general. It is probably a form of “premature optimisation”: we see some code that we think could be useful elsewhere and therefore pull it out into a separate function. It is a sound principle, but when overused, or used prematurely, it reduces locality and further complicates our ability to reason about the code.
Just as with classes: unless there really is more than one use of it, don’t treat it as if there is. In our codebase there were a number of utility and helper functions that turned out to have only one reference, but because these lived in different source files they added to the cognitive load when parsing the code by hand. Simply doing a manual “inlining” of these functions greatly improved readability, in addition to being a pragmatic and healthy approach of not fixing problems that are not there. Yes, you may later have to break that code out into a function, but at that point you will probably also have a much clearer idea of its use and benefits than you have at the point where you are just “scratching an itch.”
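A before-and-after sketch of such a manual “inlining” (hypothetical code, not from our code base):

// before: a single-use helper in a far-away utility file
// int clamp_percentage(int v) { return v < 0 ? 0 : (v > 100 ? 100 : v); }

// after: the logic lives at its only call site; no file hop required
void render_progress(int raw)
{
    const int pct = raw < 0 ? 0 : (raw > 100 ? 100 : raw);  // was clamp_percentage(raw)
    // ... draw pct ...
    (void)pct;  // sketch only; real code would use pct here
}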

Functions and classes as programmes

This appears to be a fairly common anti-pattern: building a class with complex functionality and system resource management hidden inside it. In our case we had classes that, under the hood, managed threads behind seemingly harmless functions. This made global setup and teardown difficult, in addition to negatively impacting performance. For code like ours, where a lot of it resides in DLLs, there are also strict limitations on what can be done at what point during the loading process, and a verbose and explicit API will save you a lot of hard-to-squash bugs later, far more than overly clever encapsulation will. Also, threads might seem like a nice way of hiding work and making things seamlessly asynchronous, but threads must be managed globally across an application. Leaving them to sub-modules or hiding them behind class implementations is a very bad idea, as are memory allocation, network access, and any other performance-critical, and costly, resources.
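A minimal sketch of the kind of verbose, explicit API we mean (hypothetical names): the module’s thread is created and joined only when the application says so, never as a side effect of a constructor or of DLL load:

#include <atomic>
#include <thread>

namespace sync_service
{
    namespace
    {
        std::thread worker;
        std::atomic<bool> running{ false };
    }

    // explicit lifecycle: the application decides when the thread exists
    void start()
    {
        running = true;
        worker = std::thread([] {
            while (running) { /* ... do the actual work ... */ }
        });
    }

    void stop()
    {
        running = false;
        if (worker.joinable()) worker.join();
    }
}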

After the refactor

  • The class hierarchy is completely gone as part of the API. Classes are used internally but not publicly exposed.
  • The code is broken out into a variant of a service-oriented architecture, i.e. presented as self-contained black boxes that are as stateless as possible.
  • An emphasis on locality; instead of setting state somewhere in a class object, which is then implicitly assumed by its methods, we explicitly provide the state at the call site.
    • Oh look, it’s C, I hear you say…yes, yes, alright then.
  • In addition to removal of class hierarchies, we have overall reduced the depth of implementations, i.e. the number of function calls you have to step into to comprehend the path the code takes. We’re now trying to keep it as shallow as possible.
  • The template method pattern worked really well because it isolated large amounts of boilerplate and flow-control code and avoided copy-and-paste in implementations; ultimately this lowered maintenance cost, and we kept this pattern in the refactored version.
  • Where template methods would have been implemented using virtual inheritance, we use std::function, or just plain function pointers, again emphasising the locality of state and functionality (see the sketch after this list). Furthermore, we use C++ lambdas…a lot…and this encourages locality even more.
  • We adopted a principle of minimising the nesting depth of code, tearing code out of its little functions where it was much better kept right there at the call site.
  • Peer code reviews! We use GitHub, and enforcing pull requests means that more eyes can help spot at least some of the issues arising from misunderstandings of code intent.
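As promised above, a minimal sketch (hypothetical names) of the template method pattern realised with std::function instead of virtual inheritance:

#include <functional>

// the invariant flow control and boilerplate live in one place;
// the variable step is injected at the call site, typically as a lambda
bool run_request(const std::function<bool(int)>& handle_payload)
{
    // ... common setup, validation, logging ...
    const int payload = 42;  // stand-in for the real payload
    const bool ok = handle_payload(payload);
    // ... common teardown, error reporting ...
    return ok;
}

// usage: the customisation point is right there at the call site
// const bool ok = run_request([](int p) { return p > 0; });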

In Conclusion

Writing good, readable, and maintainable C++ code is HARD. Refactoring often, questioning your assumptions often, and remaining pragmatic about which solution you pick for a given job is crucial. Resist the urge to jump straight into a class hierarchy, ask yourself if it makes sense, and think about how runtime and static behaviour are mixed. An architecture that only becomes manifest at runtime is very difficult to reason about.

Stay safe, and good luck.
