Teaching people to write tests by telling them to use mocks is a huge mistake. Mocks are useful in very limited situations. Most of the time, they do nothing more than let broken code pass unnoticed. The vast majority of everyday tests shouldn’t use mocks at all. Bold claim? I have code to back it up.
Code
A typical mock-based test:
@Test
void emergencySignalClosesAllRoadsSoThatAmbulanceCanPass() {
    var lights = (Map<String, LightColour>) mock(Map.class);
    var control = new TrafficLightsControl(lights);
    control.signalEmergency();
    verify(lights).put("road1", RED);
    verify(lights).put("road2", RED);
}
What could possibly go wrong?
The test is green, readable, coverage is at 100%, all metrics look perfect. There is nothing to think about. Push, approve, merge, deploy to production, next task, fast, faster.
That’s how I was taught; that’s how it was, is, and will be. If there is a test, there has to be a mock. There is no other way.

Now the production code.
void signalEmergency() {
    lights.put("road1", LightColour.RED);
    lights.put("road1", LightColour.AMBER);
    lights.put("road1", LightColour.GREEN);
    lights.put("road2", LightColour.RED);
    lights.put("road2", LightColour.AMBER);
    lights.put("road2", LightColour.GREEN);
}
How well does this code actually work?

A problem with the person or the code
The biggest misunderstanding is to treat a botched test like this one as a one-off mistake, a bad day, or a bad developer. I disagree. This is not an isolated case. It is a flawed approach to testing that generates recurring problems. If you keep stuffing mocks everywhere, bugs will keep appearing.
Here’s an experiment: in the test above, replace the mock with a simple HashMap. Now try to find a way to break something. Hard, isn’t it?
It is worth pausing to note that turning a crappy test into a solid one required neither special skills nor much effort. The test did not become any less readable after removing the mocks. The real barrier is not knowledge or effort; on the path to better tests, the main obstacles are bad habits and misguided beliefs.
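A minimal sketch of that experiment, for readers who want to run it. The `LightColour` enum and the constructor of `TrafficLightsControl` are assumptions reconstructed from the snippets above; the body of `signalEmergency` is the production code exactly as shown, bug included. A plain `AssertionError` stands in for a test framework's assertions, so the block has no dependencies beyond the JDK (Java 9 or newer for `Map.of`).

```java
import java.util.HashMap;
import java.util.Map;

// Assumed shape of the enum used in the post.
enum LightColour { RED, AMBER, GREEN }

// Field and constructor are assumptions; signalEmergency is the production code from the post.
class TrafficLightsControl {
    private final Map<String, LightColour> lights;

    TrafficLightsControl(Map<String, LightColour> lights) {
        this.lights = lights;
    }

    void signalEmergency() {
        lights.put("road1", LightColour.RED);
        lights.put("road1", LightColour.AMBER);
        lights.put("road1", LightColour.GREEN);
        lights.put("road2", LightColour.RED);
        lights.put("road2", LightColour.AMBER);
        lights.put("road2", LightColour.GREEN);
    }
}

class EmergencySignalStateTest {
    // Same scenario as the mock-based test, but with a real HashMap:
    // the check looks at the final state, not at the calls that led to it.
    static void emergencySignalClosesAllRoads() {
        Map<String, LightColour> lights = new HashMap<>();
        new TrafficLightsControl(lights).signalEmergency();

        Map<String, LightColour> expected =
                Map.of("road1", LightColour.RED, "road2", LightColour.RED);
        if (!lights.equals(expected)) {
            throw new AssertionError("expected " + expected + " but was " + lights);
        }
    }
}
```

Against the production code above, this check fails, because both roads end up GREEN. The mock-based version of the same test stays green.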
Real projects have the same problem
“This example proves nothing; no one would mock a Map! Traffic lights, twenty lines of code, it’s easy to come up with examples like that. In my code there are real, complex problems, I have to use mocks there.”
The artificial example above is not detached from reality; it is a simplified picture of reality. For several years now, across different projects, I have repeatedly encountered tests where people mock an HTTP response in order to test that their code sets the correct headers in the response sent to the client.
This use of mocks is just as unreasonable as in my made-up traffic-lights code, because it is testing a very similar problem. An HTTP response is, in a sense, a Map with decorations.
In fact, the vast majority of mock usage boils down to situations where exactly the same flaws appear as with the traffic lights above. That is, taking certain tools and applying them where they simply do not fit.
In real code, we do things that are just as stupid as overwriting a good put with a bad put in the fictional traffic lights. The difference is that we do not notice our mistakes, because we have long methods, uncontrolled mutations instead of immutable data, JPA, SQL, transactions, and lazy loading. All the complications that obscure the foolish things we are doing.
The wrong tool

I will now describe what lies at the root of the mismatch between mocks and the testing of traffic lights, HTTP request processing, and most other everyday use cases.
When we use mocks, we focus on interactions. Mocking libraries make it easy to express that calling a given method in a given way is expected, or that calling another method is undesirable. As a result, the programmer will focus in the test on what is easy to encode: which methods were called and with what parameters.
Embedded in the claim that mocks are useful for writing tests is a hidden assumption: that thinking in terms of method calls is a good way to design a test. In other words, the assumption that counting interactions is a good mental model for checking whether the code meets its specification.
I am convinced that this assumption is wrong. In the vast majority of code that contemporary programmers deal with, dropping down in a test to the level of concrete method calls leads you astray. The programmer works hard to control things in the test that are not important (which method was called how many times), and forgets to control the things that are fundamentally important (what the final result is). A mock is like a leaky bucket: you can put a lot of effort into writing the test, and still fail to get the situation under control. Just as a leaky bucket is an inappropriate tool for bailing water out of a lifeboat (it is probably suitable only for plants that dislike being overwatered), mocks are an inappropriate tool for writing typical tests—because they represent a model that does not fit typical code.
Moreover, this assumption is wrong regardless of the test layer. It is wrong in very low-level tests: a mock is not a good model of something that has HTTP headers. It is also wrong in high-level tests: a mock is not a good model of a database or a service like AWS S3.
The uselessness of mocks
The assumption that mocks accurately describe code in action survives a clash with reality only as long as our code strictly follows the style described by Martin Fowler as the Minimal Interface. That is, roughly speaking, that an interface has just a few methods, and if you want to do something, it is obvious which method to use—most likely a single specific one, not any of the others.
In a world governed by the Minimal Interface, mocks do in fact allow you to control what the code does. In such a world, tests that use mocks are able to answer whether the code under test works or not.
The problem is that such a world does not exist. What dominates is code where there is more than one way to achieve a goal. Even the Java standard library is not minimalist. If I want to set a red light, I can do put("road1", RED), but I can also do put("road1", AMBER); put("road1", RED)—a different sequence of method calls, the same effect. And that is not the end of it: there are other methods; we have putIfAbsent, we can insert in bulk using putAll. I can call replace.
In real-world code, there are more methods than we remember at the moment of writing a test. The behavior of these methods can be quite complex, because it depends on previous calls (putIfAbsent). The absence of a method call expected by the test does not necessarily mean that the result is wrong (there is nothing wrong with omitting put if the code correctly inserted the value by calling putIfAbsent). Conversely, the presence of a method call expected by the test does not mean that the result is correct (calling put as expected is meaningless if the code then calls another put, or breaks everything by invoking replace, an operation the test does not even account for).
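Both observations can be checked on a real `HashMap`. The sketch below uses only `java.util`; the `LightColour` enum and the class and method names are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Assumed shape of the enum used in the post.
enum LightColour { RED, AMBER, GREEN }

class MapCallSequences {
    // Three different call sequences, one final state: road1 is RED.
    static Map<String, LightColour> viaPut() {
        Map<String, LightColour> lights = new HashMap<>();
        lights.put("road1", LightColour.RED);
        return lights;
    }

    static Map<String, LightColour> viaPutIfAbsent() {
        Map<String, LightColour> lights = new HashMap<>();
        // Inserts only because the key is still missing at this point.
        lights.putIfAbsent("road1", LightColour.RED);
        return lights;
    }

    static Map<String, LightColour> viaPutAll() {
        Map<String, LightColour> lights = new HashMap<>();
        lights.putAll(Map.of("road1", LightColour.RED));
        return lights;
    }

    // The put that a mock-based test would verify is present,
    // yet the final state is wrong.
    static Map<String, LightColour> expectedPutThenReplace() {
        Map<String, LightColour> lights = new HashMap<>();
        lights.put("road1", LightColour.RED);
        // replace(key, value) overwrites the existing mapping.
        lights.replace("road1", LightColour.GREEN);
        return lights;
    }
}
```

A `verify(lights).put("road1", RED)` would reject the second and third variants even though their result is correct, and accept the fourth even though its result is wrong.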
We can get lost even in the most mundane, low-level types from the standard library, in classes used day in and day out, from company to company and from project to project (such as the Map interface shown here). If that is the case, then we will get lost even more in high-level interfaces written for the needs of a specific project, or in the interfaces of some new library. We will not mock them well, because we will not anticipate the possible ways they can be used.
Meticulous work thrown away, or the short life of mocks
Mocks not only make it harder to write correct tests. They also lead to wasted time and money, because the programmer ends up writing tests that will have to be fixed over and over again.
Mocks force us to focus on methods—that is, on the lowest-level layer of the code. They bind the test very tightly to the implementation. Every change, even the most innocent one in a method signature, will require the test to be updated. That is a minor inconvenience if the change is something like reordering method parameters, because the IDE will update the test for us. It gets worse when we change the meaning of a parameter (for example, whether null causes an error or triggers the use of a default value). Then we have to go through the tests and manually update usage after usage.
Or we stop calling one method and start calling another (putAll instead of put)—again, we have to run around fixing tests. Will the test catch the fact that we made a mistake by using putAll? Of course it will not, because we simply throw the test away and write it from scratch.
A better model
It is more productive to focus in a test not on interactions, but on their effects. In the end, the light should be red; it does not matter by what means the code achieved this—whether it called put or putIfAbsent. In the database, there should be an employee with access revoked; it does not matter whether the code called the ORM’s save with the full entity once or twice, whether it executed an update using a prepared statement or a hand-crafted SQL string. What matters is that, in the end, access is revoked in the database. The code may be terrible today but still achieve the goal, and tomorrow it may be refactored to be super clean and fast; what matters is that the test, both today and tomorrow, ensures that the final effect is correct.
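As a rough illustration of a test that survives refactoring, here is one effect-based check applied to two implementations. Both implementations, the starting state, and all names are hypothetical; only the idea of swapping `put` for `putAll` comes from the text.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Assumed shape of the enum used in the post.
enum LightColour { RED, AMBER, GREEN }

class EmergencyRefactoring {
    // Today's implementation: one put per road.
    static void signalEmergencyWithPut(Map<String, LightColour> lights) {
        lights.put("road1", LightColour.RED);
        lights.put("road2", LightColour.RED);
    }

    // Tomorrow's refactoring: a single bulk update.
    static void signalEmergencyWithPutAll(Map<String, LightColour> lights) {
        lights.putAll(Map.of("road1", LightColour.RED, "road2", LightColour.RED));
    }

    // One check for both versions: only the final state is inspected.
    static boolean closesAllRoads(Consumer<Map<String, LightColour>> signalEmergency) {
        Map<String, LightColour> lights = new HashMap<>();
        lights.put("road1", LightColour.GREEN); // state before the emergency
        lights.put("road2", LightColour.AMBER);

        signalEmergency.accept(lights);

        return lights.equals(Map.of("road1", LightColour.RED, "road2", LightColour.RED));
    }
}
```

Passing `EmergencyRefactoring::signalEmergencyWithPut` or `EmergencyRefactoring::signalEmergencyWithPutAll` to `closesAllRoads` gives the same answer, so the check does not need to change when the implementation does.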
How can we write such good tests? One option is to use an implementation that behaves like a HashMap: its state is open, so the test can inspect it and compare it with the expected values. Another option is to change not the code, but the “infrastructure”: we set up a test database or a test REST API.
Expanding on the first approach: good libraries provide ready-to-use implementations designed specifically for tests. For example, Spring offers MockHttpServletResponse. Instead of trying to mock more than thirty methods of the HttpServletResponse interface (which is just as foolish as trying to write a mock for the Map interface), we take a ready-made, proven implementation and focus on the test logic. If an interface does not come with an implementation suitable for tests, we write our own, following a similar philosophy: everything should be open to inspection in the test, just like with a HashMap.
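When no ready-made test implementation exists, a hand-written one might look roughly like the sketch below. Everything here is hypothetical: the `ResponseHeaders` interface stands in for a project-specific abstraction, `CachingPolicy` is made-up code under test, and the fake deliberately ignores details such as case-insensitive header names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical project interface, standing in for something like a response object.
interface ResponseHeaders {
    void setHeader(String name, String value); // replaces any existing values
    void addHeader(String name, String value); // appends another value
}

// Hand-written fake: it behaves like the real thing in the ways the tests care about
// and keeps its state open to inspection, much like a HashMap.
class InMemoryResponseHeaders implements ResponseHeaders {
    final Map<String, List<String>> headers = new LinkedHashMap<>();

    @Override
    public void setHeader(String name, String value) {
        List<String> values = new ArrayList<>();
        values.add(value);
        headers.put(name, values);
    }

    @Override
    public void addHeader(String name, String value) {
        headers.computeIfAbsent(name, key -> new ArrayList<>()).add(value);
    }
}

// Hypothetical code under test.
class CachingPolicy {
    static void disableCaching(ResponseHeaders response) {
        response.addHeader("Cache-Control", "no-cache");
        response.setHeader("Cache-Control", "no-store");
    }
}

class CachingPolicyCheck {
    // Returns the final header values so that a test can compare them with the expectation.
    static List<String> cacheControlAfterDisableCaching() {
        InMemoryResponseHeaders response = new InMemoryResponseHeaders();
        CachingPolicy.disableCaching(response);
        return response.headers.get("Cache-Control");
    }
}
```

A test then compares the returned list with `List.of("no-store")` and does not care whether the code used `setHeader`, `addHeader`, or both.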
As for the second approach—using “test infrastructure”—there are convenient tools for this, such as WireMock for REST or Testcontainers for databases (and also for queues, and so on). Using them, it is possible to write tests that are still very fast.
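To give a feel for the “test infrastructure” approach without pulling in any dependency, the sketch below swaps in the JDK's built-in `com.sun.net.httpserver.HttpServer` as a stand-in for a tool like WireMock. It is not WireMock's API. `LightStatusClient`, the `/lights/road1` path, and the plain-text `RED` response are all made up for illustration, and the code assumes Java 11 or newer for `java.net.http`.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical code under test: asks a remote service for the colour of a light.
class LightStatusClient {
    private final URI baseUri;
    private final HttpClient http = HttpClient.newHttpClient();

    LightStatusClient(URI baseUri) {
        this.baseUri = baseUri;
    }

    String colourOf(String road) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(baseUri.resolve("/lights/" + road)).build();
        return http.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}

class LightStatusClientCheck {
    // Starts a real HTTP server on a free local port, runs the client against it,
    // and stops the server again. The client talks real HTTP the whole time.
    static String colourReportedByStubServer() {
        try {
            HttpServer server = HttpServer.create(
                    new InetSocketAddress(InetAddress.getByName("127.0.0.1"), 0), 0);
            server.createContext("/lights/road1", exchange -> {
                byte[] body = "RED".getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            server.start();
            try {
                URI baseUri = URI.create("http://127.0.0.1:" + server.getAddress().getPort());
                return new LightStatusClient(baseUri).colourOf("road1");
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new IllegalStateException("stub server round trip failed", e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the stub server", e);
        }
    }
}
```

The check inspects what the client finally returns. Which `HttpClient` methods were called along the way does not appear in the test at all.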
Education
I believe the programming world needs to mature a bit in its approach to mocks, just as it matured in its approach to design patterns and other “discoveries.”
I remember being taught the Gamma book (Design Patterns, by the Gang of Four) at university as if it were sacred truth, as if there were no other legitimate way for a competent programmer to write code. Strategy, Flyweight, Visitor: give me more! Job interviews tested knowledge of patterns, and then a new hire would sit down at their desk and immediately try to prove they deserved the position: let’s see who can apply more patterns in the code they write. This overuse was supposed to be “by the book” and to prepare the code for changing requirements. And then you would join a project and find overengineering so severe that nothing was visible anymore, and the only way to understand why the code was written at all was to ask the longest-serving person on the team.
That phase is over. No one stuffs a Strategy into the code just in case something might need to become configurable a year from now. Code is supposed to be simple; there are things more important than design patterns.
In a similar way, the Java world has matured in its approach to concurrency. If you put new Thread and synchronized into production code, you will get a question in code review: hey, what are you doing? You do not drop to such a low level—use java.util.concurrent, or Reactor (or whatever concurrency library the project has chosen). I hope that using mocks in tests will start triggering a similar reaction in code reviews.
This is not to say that low-level work with threads is forbidden. In specialized software (for example, when you are writing a concurrency library), working exclusively at that level is necessary. Even in ordinary projects, there are isolated places where such code is required. The same applies to mocks. There will be specific projects (again, likely libraries) where mocks fit naturally. In ordinary projects, there will be cases where testing something without mocks is simply too inconvenient. But everyone should be aware that such usage is something extraordinary, dictated by circumstances, and not the default way of working with code.
It’s time mocks started raising suspicion and doubt in code reviews. It’s time they stopped being treated as something normal. It’s time most work happened without them. It’s time juniors were taught with sample tests that avoid mocks entirely—or, if necessary, use only classes like MockHttpServletResponse, and no Mockito.
Nothing new under the sun
I am not claiming to discover anything new here. Similar topics were raised by Ian Cooper in his well-known talk “TDD, Where Did It All Go Wrong”. Ten years have passed, yet the situation has not improved significantly!
Still, sometimes it is possible to talk about old issues in a new, perhaps more understandable way. I came up with the Map example not long ago, while answering questions after my presentation on problems with mocking. I think the example is interesting, hence this post. Even topics that seem to have been debated a hundred times are worth discussing.
It is also worth discussing in a friendly way. The dialogues in this post are entirely fictional and were not inspired by real events.
Summary
- Checking interactions (calls to specific methods) is a poor approach to writing tests.
- It is not easy to notice that a test using mocks is flawed; mocks can convincingly pretend that they are checking something.
- For the correctness of the code, it does not matter which methods are called or in what order; what matters is the final effect.
- Mocks distract attention from the final effect, causing the programmer to focus on low-level method calls.
- The world is too complex to be fully mocked and verified: for almost every method, there is usually another method with a different name that does the same thing; and even a single method may behave differently depending on previous calls.
- Use tools that allow you to observe the final effect directly, without depending on method names or call sequences.
- Test with real dependencies, or use fakes that can be inspected like a List or a Map.
Photo credits: Богдан Митронов-Слободской under CC BY 3.0, Tony Webster under CC BY 2.0, Andy Beales, The Silverdalex.