Do Unit Tests Make Refactoring Harder?

This is probably one of the most common arguments against TDD (test-driven development) that I hear. It often comes from developers who briefly tried TDD at some point and had a bad experience with it. It is very common for developers (and humans in general) to conclude that a technique does not work instead of looking deeper into what they are missing. Let’s look into why “TDD (or unit tests in general) makes it harder to refactor” is a big misconception.

The tests are strongly coupled to the implementation

The obvious reason why someone might find it hard to refactor with unit tests in place is that the tests are too strongly coupled to implementation details, so they need to change significantly whenever those details change. I emphasize “significantly” because very small automated refactorings, like renaming or reordering parameters, should be possible with or without tests, and tests should definitely not make them harder.
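To make the coupling concrete, here is a minimal sketch in Python. The ShoppingCart class and both tests are hypothetical and only serve as an illustration. The first test reaches into the internal storage; the second only uses the public API.

```python
class ShoppingCart:
    """Hypothetical example class, used only for illustration."""

    def __init__(self):
        # Implementation detail: items are kept as (name, quantity) tuples.
        self._items = []

    def add(self, name, quantity=1):
        self._items.append((name, quantity))

    def total_quantity(self):
        return sum(quantity for _, quantity in self._items)


def test_coupled_to_internal_storage():
    cart = ShoppingCart()
    cart.add("apple", 2)
    # Asserts on the private list, so it breaks as soon as _items
    # becomes a dict, even though the behavior stays the same.
    assert cart._items == [("apple", 2)]


def test_observable_behavior():
    cart = ShoppingCart()
    cart.add("apple", 2)
    cart.add("pear", 1)
    # Only uses the public API, so it survives a change of storage.
    assert cart.total_quantity() == 3
```

Both tests pass today. Only the first one has to be rewritten when the internal list is replaced by another data structure.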

That the tests are too coupled is also the first counterargument you often hear from TDD practitioners. I agree with it, but I find it problematic because it focuses too much on the tests. I suspect the skills required to understand the intention of that argument are the same as those required for effective TDD. Without those skills, it can still sound like the tests are the problem, because the tests are what is coupled.

All callers are strongly coupled to the implementation

Unit tests, especially with TDD, are in a sense just examples of API invocations that specify expected behavior. When you practice TDD, meaning you write the test first and let the test specify the API (thus driving the design), that test is just the first caller of your new API. Once that piece of functionality is implemented, you call it from other places in the code base, trusting that it behaves as expected. Without TDD you still have a first caller, and probably many more. You still decide what the API (e.g., the function signature) should look like; the only difference is that you verify the behavior in a few minutes by clicking through your GUI instead of in 10 milliseconds. With a unit test written first, you would just have one more caller in total.
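As a rough sketch of the “first caller” idea, assume a hypothetical parse_duration function (the name, the signature, and the session_expired caller are made up for this example). The test and the production code depend on exactly the same signature.

```python
import re


def parse_duration(text: str) -> int:
    """Return the duration in seconds for strings like '1h30m' or '45s'."""
    seconds_per_unit = {"h": 3600, "m": 60, "s": 1}
    parts = re.findall(r"(\d+)([hms])", text)
    return sum(int(value) * seconds_per_unit[unit] for value, unit in parts)


# First caller: the test. With TDD it exists before the implementation
# and decides what the signature looks like.
def test_parse_duration():
    assert parse_duration("1h30m") == 5400
    assert parse_duration("45s") == 45


# Second caller: production code, coupled to the very same signature.
def session_expired(elapsed_seconds: int, timeout: str) -> bool:
    return elapsed_seconds > parse_duration(timeout)
```

If parse_duration later returned something other than seconds, both callers would need the same fix. The test is not special in that respect.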

Unit tests are not completely at fault for being strongly coupled to implementation details. Since they are just the first caller, it is easy to miss that without a unit test written first you still call your function somewhere, and you still strongly couple that caller to the implementation details of that function. The techniques necessary to create better abstractions (“zooming out”, technical empathy) apply in the same way. So you change the implementation details and then fix all of the coupled callers. The difference is that if you’re not used to having tests in place, or even changing them first, fixing the tests afterwards feels like additional work rather than one more caller on top of the six or so call sites you just fixed.

So, what can we do to reduce the coupling from tests to implementation details? We can test the same behavior at a higher abstraction level. This does not mean replacing the unit test with an integration test; instead, we can create a sociable unit test at the boundary of the component. We can (and should) also listen to our test: after writing it, we pause for a moment and ask ourselves (or a colleague), “How easy is this to change?” Then we make it easier to change while there is only a single caller, before calling the API in five different places.
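A sociable unit test at the component boundary could look roughly like the following sketch. All names here (Checkout, DiscountPolicy, TaxPolicy, LineItem) and the discount and tax rules are assumptions made up for the example. The point is that the tests only call the boundary and use the real collaborators instead of mocking them.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class LineItem:
    name: str
    unit_price_cents: int
    quantity: int


class DiscountPolicy:
    """Internal collaborator (hypothetical rule): 10% off from 100.00 up."""

    def apply(self, subtotal_cents: int) -> int:
        if subtotal_cents >= 10_000:
            return subtotal_cents * 90 // 100
        return subtotal_cents


class TaxPolicy:
    """Internal collaborator (hypothetical rule): flat 20% tax."""

    def apply(self, amount_cents: int) -> int:
        return amount_cents * 120 // 100


class Checkout:
    """Component boundary: the only API that callers and tests use."""

    def __init__(self):
        self._discount = DiscountPolicy()
        self._tax = TaxPolicy()

    def total_cents(self, items: Sequence[LineItem]) -> int:
        subtotal = sum(i.unit_price_cents * i.quantity for i in items)
        return self._tax.apply(self._discount.apply(subtotal))


def test_large_order_is_discounted_before_tax():
    items = [LineItem("keyboard", 6_000, 2)]  # subtotal 120.00
    # 12000 after 10% discount is 10800, plus 20% tax is 12960.
    assert Checkout().total_cents(items) == 12_960


def test_small_order_pays_full_price_plus_tax():
    items = [LineItem("cable", 500, 1)]
    assert Checkout().total_cents(items) == 600
```

Because neither test mentions DiscountPolicy or TaxPolicy, those two classes can be merged, split, or renamed without touching the tests, as long as Checkout.total_cents keeps behaving the same way.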

But the specific techniques for reducing the coupling are not the focus of this article. Instead, I want to emphasize this mindset shift:

The test is only coupled to the implementation details because we didn’t improve the API while the test was its only caller.
All callers are strongly coupled in the same way to the implementation details, and the test is only one of those callers.
This coupling is caused by the API (which we should have actively driven through the test), and the callers may not have a choice. Therefore we should listen to the test and improve the API instead of blaming the messenger; the test only tells us about the coupling.