Microservices — The Good, The Bad, The Ugly
Lantum embarked on the journey of building a microservices based architecture over a year ago, and it has been a long journey. Over this time we have learned quite a few things about such an architecture that no one tells you in the textbooks. I share this knowledge in the hope that, if you are in the process of making architecture decisions or have already chosen to build a microservices based architecture, you will have some practical experience to base your future decisions on.
If you are unfamiliar with what a microservices based architecture is then I would recommend you start with Martin Fowler’s introduction to it.
The Good
- Speed of deployment: When your code base is broken down into very small, manageable microservices, it becomes easier to modify one microservice. You can be confident about your change and test it quickly.
- Resilience: A microservices based architecture provides resilience based on a simple principle: If a monolith fails, everything fails, but if a microservice fails, only a small part of the overall system (delivered by that microservice) fails.
- Scalability: We are referring to scalability of development here. As the team grows, it is natural to split it into smaller sub-teams responsible for specific parts of the system. A microservices based architecture allows you to define separate code bases for each of your microservices. This makes it easy to allocate microservices to sub-teams as your company grows.
- Right language/tech for the job: The separate code bases provide another advantage: you can use the right language/tool for the job. If one microservice is compute-intensive, you can write it in a compiled language. If another microservice requires integration with a legacy system that only works with Java, you can write that microservice in Java.
- Maximisation of developer skills: The separate code bases also allow you to maximise the use of existing developer skills. If you have developers who know different programming languages or technologies well, they can use their favourite technologies within their own microservices (provided the technologies are suitable for the job at hand, of course). This will boost developer productivity and reduce the technical debt that one picks up when writing production systems in languages one is new to.
- Independent deployability: Microservices are usually deployed in their own individual containers, allowing them to be brought up/down or deployed independently. This independence usually reduces the amount of downtime the overall system would face. With the right tooling there need be no downtime at all, which helps move towards continuous delivery.
- Ease of maintenance: Microservices by definition perform a very specific function and thus have a small code base compared to a monolith. This makes it easy to read and maintain. This is very valuable as the team that manages the microservice changes quite frequently.
- Multiversion deployment: Because microservices are deployed as independent “machines” talking to each other over a network, it is easy to deploy two different versions of the same microservice together. With appropriate tooling, you can then use this capability to do Canary Releases, A/B testing, etc.
- Separate Database: A good microservices architecture will also ensure that each microservice has its own database. This ensures that the database is neither a single point of failure nor a bottleneck.
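To make the multiversion deployment point a little more concrete, here is a minimal sketch of how a router might split traffic between two versions of one microservice for a canary release or an A/B test. In practice this logic usually lives in a load balancer or service mesh rather than in application code. The version labels `v1` and `v2`, the 5% canary share, and the function names are all assumptions made for illustration.

```python
import hashlib

# Hypothetical share of users (in percent) routed to the new version.
CANARY_PERCENT = 5


def bucket(user_id: str) -> int:
    """Map a user id to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def pick_version(user_id: str, canary_percent: int = CANARY_PERCENT) -> str:
    """Return the (hypothetical) version label that should serve this user.

    Hashing the user id keeps the assignment sticky: the same user always
    lands on the same version for a given canary_percent.
    """
    return "v2" if bucket(user_id) < canary_percent else "v1"


if __name__ == "__main__":
    for user in ("alice", "bob", "carol"):
        print(user, "->", pick_version(user))
```

Raising `canary_percent` step by step towards 100 moves all users to the new version, and setting it back to 0 rolls everyone back to the old one.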
The Bad
Microservices architectures come with their own set of headaches and challenges, which I briefly describe below.
- Orchestration complexity: In a monolith, one only needs to worry about keeping one application and its replicas up. However, with a microservices based architecture the number of applications can be quite significant. For example, Lantum has about 40 microservices and Monzo has about 200 microservices today. One needs to make sure each microservice is up, reachable, has replicas, and can be deployed easily. Further, deploying new versions of microservices and rolling them back should also be easy to do. Traditional orchestration tools struggle at this scale. Luckily, tools like Kubernetes and Docker Swarm have emerged that make our lives easier.
- Bespoke development tools: In a monolith world, a developer can easily run the application and test their code locally on the dev machine. With a large number of microservices, though, this becomes harder. So effort is needed to explore ways of running all microservices locally and doing things like livereload for faster development. Since each microservice is a “machine” in itself, even once the tooling has been figured out, the developer machine’s computing and disk capacities may not allow running the whole setup locally.
- Code repetition: It is common to have pieces of code that get used as utilities across an application. Usually such utilities are organised into a file/directory of their own and the application uses them by directly importing the relevant parts. Something as trivial as this can become a logistical nightmare in a microservices based architecture. An obvious approach would be to convert this common code into a library that each microservice imports. But there are two challenges to this approach. First, the moment you have to make a change in the library, you need to rebuild and redeploy all of your microservices for them to use the latest library. If you do not do this, then different microservices may be using different versions of your library, which may lead to errors. Second, you need to implement the library in all the programming languages you use across your microservices, or at least find a reasonable way to import your library across different languages. An alternative to using a library is to make the library a microservice and allow all the microservices to access the functions over API calls. This, however, makes the functions orders of magnitude slower due to the network overheads.
- Transaction management: Managing transactions with a monolith and a single database is a solved problem. However, what do you do when you have a transaction that needs to span multiple microservices? There are a few approaches being explored right now, but I don’t believe this has been solved yet.
- Database join limits: In a single database, getting data from two different tables joined on a field is a quick operation. However, with a microservices based architecture, performing a database join on data held by two different microservices (and thus in two different databases) is not possible. One workaround is to define an aggregator microservice that keeps a joined version of the data, which then requires you to keep that copy up to date. The other approach is to carefully design microservices so that frequent and large joins happen within a single microservice.
- API delays: When data from multiple microservices is required to perform an operation, the API calls can contribute significantly to the processing times. To illustrate, suppose you need to fetch data from 5 microservices to perform a task. For each API call you make, you will incur delays related to DNS lookup, TCP connection, HTTP communication, serialisation and deserialisation of the data, and any initialisation the microservices need to do to serve the data. None of these delays would have occurred if you were operating a monolith where fetching data simply required database lookups.
- Asynchronous communication: In a monolith, one often chooses to carry out certain tasks asynchronously, to optimise end user experience. The same principle applies in a microservices based architecture. However, with a large number of microservices, one ends up relying on asynchronous communication a lot more. This usually means that the design and implementation of asynchronous channels needs significant thought.
- Log collation: Collecting and rotating logs in monoliths is trivial with several solutions often baked into the tools you use to write your applications. However, with microservices, you will need special tooling to collate your logs across microservices to be able to gain a unified view of system activities. Solutions like Papertrail and Datadog can help here.
- Debugging: Debugging faults across microservices can get tricky without chronologically ordered combined logs.
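To put rough numbers on the API delays point above, the following sketch adds up the per-call overheads for a task that needs data from 5 microservices, assuming the calls are made one after another. The millisecond values are placeholders I made up for illustration, not measurements from Lantum or any real system, and the function names are hypothetical.

```python
# Hypothetical per-call overheads in milliseconds. These are placeholder
# values for illustration, not measurements from a real system.
PER_CALL_OVERHEAD_MS = {
    "dns_lookup": 1.0,
    "tcp_connection": 1.0,
    "http_communication": 2.0,
    "serialisation_and_deserialisation": 0.5,
    "service_initialisation": 0.5,
}


def call_overhead_ms(overheads=PER_CALL_OVERHEAD_MS):
    """Overhead added by a single API call: the sum of its components."""
    return sum(overheads.values())


def sequential_overhead_ms(n_calls, overheads=PER_CALL_OVERHEAD_MS):
    """Total overhead when n_calls are made one after another."""
    return n_calls * call_overhead_ms(overheads)


if __name__ == "__main__":
    print(f"one call:   {call_overhead_ms():.1f} ms")
    print(f"five calls: {sequential_overhead_ms(5):.1f} ms")
```

With these assumed values, each call adds 5 ms and the five calls together add 25 ms before any real work is done. Substituting figures measured in your own environment gives a quick estimate of what a given operation pays for crossing service boundaries.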
The Ugly
The ugly thing about microservices is that once you decide to do microservices, you get all the bad things about microservices for free, but you need to work to take advantage of their benefits. For example, at Lantum we continue to struggle with common code being used across microservices. We took the library approach, but it has been hard to guarantee that all the microservices are using the latest version of the library. We use Papertrail to collate our logs in production environments but don’t have a tool that would do the same on developer machines, which makes things hard for developers when they need to build a feature that spans multiple microservices. Therefore, choosing a microservices based architecture requires careful consideration.
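As a rough idea of what log collation on a developer machine involves, here is a minimal sketch that merges per-service logs into one chronologically ordered stream. It is not the tool we use. It assumes that every log line starts with an ISO 8601 timestamp in the same format and timezone, followed by a space and the message, and that each service's log is already in time order. The service names in the example are hypothetical.

```python
import heapq


def tag_lines(service, lines):
    """Yield (timestamp, service, message) for each 'TIMESTAMP message' line."""
    for line in lines:
        timestamp, _, message = line.rstrip("\n").partition(" ")
        yield timestamp, service, message


def merge_logs(logs_by_service):
    """Merge per-service logs into one stream ordered by timestamp.

    Assumes each input is already sorted by time and that all timestamps
    share one ISO 8601 format, so that string order equals time order.
    """
    streams = [tag_lines(name, lines) for name, lines in logs_by_service.items()]
    return heapq.merge(*streams)


if __name__ == "__main__":
    # Hypothetical services and log lines, for illustration only.
    logs = {
        "auth": [
            "2024-01-01T10:00:00Z login ok",
            "2024-01-01T10:00:02Z token issued",
        ],
        "billing": ["2024-01-01T10:00:01Z invoice created"],
    }
    for timestamp, service, message in merge_logs(logs):
        print(timestamp, f"[{service}]", message)
```

Because `heapq.merge` is lazy, the same function also works when the inputs are open file objects rather than lists, without loading whole log files into memory.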
For an early stage startup, going the microservices way frontloads a lot of the orchestration and tooling work. Time that should ideally be spent finding product-market fit could end up being spent keeping things running reliably. A better approach may be to carefully define apps/modules and build an abstraction layer/stub through which apps/modules communicate with each other. Later in the company’s life, when it makes sense to split the code base into microservices, each app/module can be made into a microservice one by one using the strangler pattern.
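One way such an abstraction layer could look is sketched below. Callers depend only on an interface, which is backed by in-process code at first and can later be backed by a microservice without changing the callers. Everything here is hypothetical: the `Pricing` module, the `/prices/<id>` path, the JSON shape, and the injected `fetch` transport are assumptions made for illustration, not a description of our code.

```python
import json
from typing import Callable, Dict, List, Protocol


class Pricing(Protocol):
    """The abstraction layer: the only thing other modules depend on."""

    def price_of(self, item_id: str) -> float: ...


class InProcessPricing:
    """Monolith-era implementation: a plain in-process lookup."""

    def __init__(self, prices: Dict[str, float]) -> None:
        self._prices = prices

    def price_of(self, item_id: str) -> float:
        return self._prices[item_id]


class RemotePricing:
    """Later implementation: same interface, backed by a microservice.

    `fetch` is a hypothetical transport that takes a URL path and returns
    the response body as a JSON string, e.g. '{"price": 2.0}'.
    """

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch

    def price_of(self, item_id: str) -> float:
        payload = json.loads(self._fetch(f"/prices/{item_id}"))
        return float(payload["price"])


def order_total(pricing: Pricing, item_ids: List[str]) -> float:
    """Caller code: unaware of which implementation it is talking to."""
    return sum(pricing.price_of(item_id) for item_id in item_ids)


if __name__ == "__main__":
    local = InProcessPricing({"a": 2.0, "b": 3.0})
    print(order_total(local, ["a", "b"]))
```

Swapping `InProcessPricing` for `RemotePricing` one module at a time is the kind of incremental move the strangler pattern relies on: `order_total` and every other caller stay untouched.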