Lessons learned from building Microservices – Part 3: Patterns, Architecture, Implementation And Testing

Introduction

In this blog post I will go over things I’ve learned when working with microservices. I will cover things to do and things that you should not do. There won’t be alot of code here, mostly theory and ideas.

I will discuss most of the topic from a smaller development teams point of view. Some of these things are applicable for larger teams and projects but not all necessarily. It all depends on your own project and needs.

  • Architecture
  • Sharing functionality (Common code base)
  • Continuous Integration / Continuous Delivery (CI/CD)
  • High Availability
  • Messaging and Event driven architecture
  • Security
  • Templates and scripting
  • Logging and Monitoring (+metrics)
  • Configuration pattern
  • Exception handling and Errors
  • Performance and Testing

General advice

Generally I advice to use and consider existing technologies, products and solution with your application and architecture to speed up your development and keep the amount of custom code at a minimum.

This will allow you to save on errors, save on time and money.

Still, make sure that you choose technologies and products that support your and your clients solutions; not things that you think are “cool” right now or would be fun to use. Your choices should fit the needs and requirements on not only your project but the whole of the architecture and the future vision of your project.

Architecture

To keep things simple I would say there are two ways to approach microservices.

Approach one: Starting big but small

The first approach is the one you probably are familiar with. These are big projects by big companies like Amazon, Netflix, Google, Uber etc.

Usually this involves creating hundreds or even thousands of microservice on multiple platforms and technologies. This usually require large teams of people both developing, deploying and up keeping the microservice solution they are working on.

This approach is definitely not for everyone; it requires alot of people, resources and money.

So this is why I recommend approach number two.

Approach two: Starting small but plan big

Most likely you have a team of a few people and limited resources. In this case I recommend starting between microservice architecture and monolith one.

By this I mean that you start the process by designing it and implementing all the infrastructure of a microservice but do not start splitting you application into microservices from the start. Create one microservice, expand it then split it when things start to grow so that it feels like a new microservice is needed.

By this time you have had time to understand and analyze your business domain. Now you have an idea what kind of a communication between microservices you need; perhaps HTTP based or decoupled messaging based.

When you are creating you microservice keep you design pattern for you microservices simple. Do not implement overly complicated patterns to impress anyone, including yourself. It will make the upkeep of you microservices a hell. You want to keep the code as simple and as common inside a microservice and between them.

Share as much as possible between microservices.

Create good Continuous Integration and Continuous Deployment procedures as soon as possible. It will save you time.

Verify you have proper availability and scalability based on your application needs.

Prefer scripting to automate and speed up development and upkeep.

Use templates everywhere you can, especially for creating new microservices.

Have a common way to do exception handling and logging.

Have a good configuration plan when deploying your Microservices.

You also need team member who are not afraid to do many different things with multiple technologies. You need people who can learn anything, adapt and develop with any tool, tech or language.

With these approaches and check list you should be able to manage a microservice architecture with only a handful of people. For up-keeping even one person is enough, but constant development at least two or three would be a good amount.

Sharing functionality (Common code base)

When it comes to code I prefer the “golden” rule in programming to not repeat myself but with microservices you will end up with duplication.

The wise thing to do with microservices is to know when to not duplicate and have a common access to shareable code and functionality; and why do this?:

  • Developer and up doing similar code that is used again and again in multiple microservices.
  • These common pieces of code and functionality end up having the same king of problems and bug which have to be corrected in every place
  • The problems and bugs cause security issues
  • With performance issues
  • With possible hard to understand code even when the logic and functionality is the same but the code ends of being slightly or vastly different.
  • And lastly all of the above combined cause you to spend time and money that you may not have

The next question to ask is:

What should you share? The main rule is that is must be common. The code should not be specific to a certain microservice or a domain.

Again it all depends on your project but here are a few things:

  • Logging, I highly recommend this to a unified logging output format that is easy to index and analyze.
  • Access Logs
  • Security
    • User authorization but not authentication or registration. Registration is better suited as an external microservice as it’s own domain.
    • Encryption
    • JSON Web Token related security code and processing
    • API Key
    • Basic Auth
  • Metrics
  • HTTP Client class for HTTP requests. Create an instance of this class with different parameters to share common functionality for logging, metrics and security.
  • Code to use and access Cloud based resources from AWS, Azure
    • CloudWatch
    • SQL Database
    • AppInsights
    • SQS
    • ServiceBus
    • Redis
    • etc…
  • Email Client
  • Web Related Base classes like for controllers
  • Validations and rules
  • Exception and error handling
  • Metrics logic
  • Configuration and settings logic

How should you distribute your shared functionality and code? Well it all depends on your project but here are a few ways:

  • One library to rule them all :D. Create one library which all projects need to use. Notice: This might become a problem later on when your common library code amount grows. You will end up with functionality that you may not need in a particular Microservice.
  • Create multiple libraries that are used on need basis. Thus using only bits of functionality which you need.
  • Create Web API’s or similar services that you request them to perform a certain action. This might work for things like logging or caching but not for all functionality. Also notice that you will lose speed of code to latency if you outsource your common functionality to a common service that is run independently from your actual code that needs that common functionality.
  • A combination of all of the above.

Dependency injection

Use your preferred dependency injection library to manage your classes and dependencies.

When using your DI I recommend thinking of combining classes into “packages” of functionality by feature, domain, logic, data source etc. By doing this you can target specific parts of your code without “contaminating” the project with unneeded code even if you have a large library like a common library.

For example you could pack a set of classes that provide you the functionality to communicate with a CRM, get, modify and add data.

This would include your model classes. A CRM client class, some logic that operate on the model to clean them up etc.

Once you identify them, make your code that that you can add them into your project with the least amount of code.

Also consider creating a logic to automatically tell a developer which configurations are missing once a set of functionalities are added. The easiest way to achieve this is to add this at compile and/or runtime with checks.

See my previous article on this matter for a more detailed description:

https://lionadi.wordpress.com/2019/10/01/spring-boot-bean-management-and-speeding-development/

Continuous Integration/Continuous Delivery (CI/CD)

There are may ways of doing CI and CD but the main point is that ADD it and automate as much as possible. This is especially important with microservices and small team sizes.

It will speed things up and keep things working.

Here are a few things to take into consideration:

  1. Create unit tests that are run during your pipelines
  2. Create API or Service level tests that verify that things work with mock or real life data. You can do this by mocking external dependencies or using them for real if available.
  3. Add performance tests and stability tests to your pipelines if possible to verify that things run smoothly.
  4. Think of using the same tool for creating your API or service tests when developing and when running the same tests in a pipeline. You can just reuse the same tests and be sure that what you test manually is the same that should work in production. For example: https://www.postman.com/ and https://github.com/postmanlabs/newman
  5. Script as much as possible and parametrize your scripts for reuse. Identify which scripts can be used and shared to avoid doing things twice.
  6. Use semantic versioning https://semver.org/
  7. Have a deployment plan on how you are going to use branches to deploy to different environments (https://www.atlassian.com/git/tutorials/comparing-workflows/gitflow-workflow), for example:
    1. You can use release branches to deploy to different environments based on pipeline steps
    2. Or have specific branches for specific environment. Once things are merged into them certain things start to happen
  8. Use automated build, test and deployment for dev environment once things are merged to your development branch.
  9. Use manual steps for other environments deployment, this is to avoid testers to receiving bad builds in QA or production crashing on bugs not caught up.
  10. If you do decide to automate everything all the way to production, make sure you have good safe guards that things don’t blow up.
  11. And lastly; nothing is eternal. Experiment and re-iterate often and especially if you notice problems.

Common tools to CI/CD

https://www.sonatype.com/product-nexus-repository

https://www.ansible.com/

https://www.rudder.io/

https://www.saltstack.com/

https://puppet.com/try-puppet/puppet-enterprise/

https://cfengine.com/

https://about.gitlab.com/

https://www.jenkins.io/

https://codenvy.com/

https://www.postman.com/

https://www.sonarqube.org/

High Availability

The main point in high availability is that your solution will continue to work as well as possible or as normal even if some parts of it fail.

Here are the three main points:

  • Redundancy—ensuring that any elements critical to system operations will have an additional, redundant components that can take over in case of failure.
  • Monitoring—collecting data from a running system and detecting when a component fails or stops responding.
  • Failover—a mechanism that can switch automatically from the currently active component to a redundant component, if monitoring shows a failure of the active component.

Technical components enabling high availability

  • Data backup and recovery—a system that automatically backs up data to a secondary location, and recovers back to the source.
  • Load balancing—a load balancer manages traffic, routing it between more than one system that can serve that traffic.
  • Clustering—a cluster contains several nodes that serve a similar purpose, and users typically access and view the entire cluster as one unit. Each node in the cluster can potentially failover to another node if failure occurs. By setting up replication within the cluster, you can create redundancy between cluster nodes.

Things that help in high availability

  • Make your application stateless
  • Use messaging/events to ensure that business critical functionality is performed at some point in time. This is especially true to any write, update or delete operations.
  • Avoid heavy coupling between services, if possible and if you have to do use a lightweight messaging system. The most troublesome aspect of communicating between microservices is going to be over HTTP.
  • Have good health checks that are fast to respond when requested. This can be divided into two categories:
    • Liveness: Checks if the microservice is alive, that is, if it’s able to accept requests and respond.
    • Readiness: Checks if the microservice’s dependencies (Database, queue services, etc.) are themselves ready, so the microservice can do what it’s supposed to do.
  • Use “circuit breaker” to quickly kill unresponsive services and quickly run them back up
  • Make sure that you have enough physical resources(CPU, Memory, disk space etc) to run your solution and your architecture
  • Make sure you have enough request threads supported in your web server and web application
  • Make sure you verify how sizable HTTP requests your web server and application is allowed to receive, the header is usually that will fail your application.
  • Test your solution broadly in stress and load balancing tests to identity problems. Attach a profiler during these tests to see how your application perform, what bottlenecks there are in your code, what hogs resources etc.
  • Keep your microservices image sizes to the minimum for optimal run in production and optimal deployment. Don’t add things that you don’t use, it will slow your application down and deployment will suffer; all of this will lead to more needed physical resources and more money needed.

Messaging and Event driven architecture

I will be covering this topic in an upcoming post but until then here are a few pointers.

Because of the nature of Microservices; that they can be quickly scaled up and down based on needs I would very highly recommend that you use messaging for business critical operations and logic.

The most important ones I would say are: Writing, Updating and Deleting data.

Also all long running operations I would recommend using messaging.

Notice: One of the most important thing I consider is that you log and monitor the success of messages being send, processed and finished, with trailing to the original request to connect logs and metrics together to get a whole picture when troubleshooting.

I have coveted this in my previous logging post: https://lionadi.wordpress.com/2019/12/03/lessons-learned-from-building-microservices-part-1-logging/

Security

Generally security is an important aspect of any application and has many different topics and details to cover.

Related on security I covered this extensively in my last post in this series on Microservices, go check it out: https://lionadi.wordpress.com/2020/03/23/lessons-learned-from-building-microservices-part-2-security/

Templates and scripting

To speed up development, keep things the same and thus avoiding duplicate errors + unnecessary fixes, use templates where possible. This is especially true for Microservices.

What are possible templates that you could have:

  • Templates for deploying Cloud resources like ARM for Azure or Cloudformation for AWS.
  • Beckend Application templates
  • Front application templates
  • CI/CD templates
  • Kubernetes templates
  • and so on…

Anything that you know you will end up having multiple copies is good to standardize and create templates.

Also I recommend that for applications (front or backend), it is a very good practice to have the applications up and running as soon as you duplicate them from your repository. They should be able to be up and running as soon as you start them.

Script as much as possible and make the scripts reusable.

Parametrize all of the variables you can in your scrips and templates

Here are a few things you would need for a backend application template:

  • Security such as authentication and authorization.
  • Logging and metrics
  • Configuration and settings logic
  • Access Logs
  • Exception handling and errors
  • Validations

Logging and Monitoring (+metrics)

Again as with security this is a large topic, I’ve also written about this in my previous post in the series and recommend go checking it out:

https://lionadi.wordpress.com/2019/12/03/lessons-learned-from-building-microservices-part-1-logging/

Configurations pattern

For microservice configurations I recommend a following pattern where your deployment environments (DEV, QA, PROD etc) configuration files configurations/settings values are left empty. You still have the configuration/setting in your configuration/settings files but you leave them empty.

Next you need to make sure that your code knows how to report empty configuration values when your application is started. You can achieve this by creating a common way to retrieve configurations/settings value and being able to analyze which of the needed and loaded configurations are present.

This way when your docker image is started and the application inside the image starts running and retrieving configurations, you should be able to see what is missing in your environment.

This is mostly because you don’t want to have your environment specific configurations set in your git repository, especially the secrets. You will end up setting these values in your actual QA, PROD etc environments through a mechanism. If you forget to add a setting/configuration in your mechanism your docker image may crash and you will end up searching for the problem a long time, even with proper logging it may not be immediately clear.

I’ve written an previous post on this matter which opens things up on the code level:

https://lionadi.wordpress.com/2019/10/01/spring-boot-bean-management-and-speeding-development/

Exception handling and Errors

Three main points with exceptions and errors:

  • Global exception handling
  • Make sure you do not “leak” exceptions to your clients
  • Use a standardized error response
  • Log things properly
  • And take into consideration security issues with errors

Again for for details on logging and security check my previous posts:

https://lionadi.wordpress.com/2019/12/03/lessons-learned-from-building-microservices-part-1-logging/

https://lionadi.wordpress.com/2020/03/23/lessons-learned-from-building-microservices-part-2-security/

For error responses, you have two choices:

  1. Make up your own
  2. Or use an existing system

I would say avoid making your own if possible but it all depends on your application and architecture.

Consider first existing ones for a reference:

https://www.hl7.org/fhir/operationoutcome.html

https://developers.google.com/search-ads/v2/standard-error-responses

https://developers.facebook.com/docs/graph-api/using-graph-api/error-handling/

Still here is also an official standard which you can use and may be supported by your preferred framework or library: https://www.rfc-editor.org/rfc/rfc7807.html

The RFC 7807 specifies the following for error responses and details from https://www.rfc-editor.org/rfc/rfc7807.html:

  • Error responses MUST use standard HTTP status codes in the 400 or 500 range to detail the general category of error.
  • Error responses will be of the Content-Type application/problem, appending a serialization format of either json or xml: application/problem+json, application/problem+xml.
  • Error responses will have each of the following keys(Internet Engineering Task Force (IETF)):
    • detail (string) – A human-readable description of the specific error.
    • type (string) – a URL to a document describing the error condition (optional, and “about:blank” is assumed if none is provided; should resolve to a human-readable document).
    • title (string) – A short, human-readable title for the general error type; the title should not change for given types.
    • status (number) – Conveying the HTTP status code; this is so that all information is in one place, but also to correct for changes in the status code due to the usage of proxy servers. The status member, if present, is only advisory as generators MUST use the same status code in the actual HTTP response to assure that generic HTTP software that does not understand this format still behaves correctly.
    • instance (string) – This optional key may be present, with a unique URI for the specific error; this will often point to an error log for that specific response.

RFC 7807 example error response:

HTTP/1.1 403 Forbidden
Content-Type: application/problem+json
Content-Language: en

{
  "type": "https://example.com/invalid-account",
  "title": "Your account is invalid.",
  "detail": "Your account is invalid, your account is not confirmed.",
  "instance": "/account/34122323/data/abc",
  "balance": 30,
  "accounts": ["/account/34122323", "/account/8786875"]
}
   HTTP/1.1 400 Bad Request
   Content-Type: application/problem+json
   Content-Language: en

   {
   "type": "https://example.net/validation-error",
   "title": "Your request parameters didn't validate.",
   "invalid-params": [ {
                         "name": "age",
                         "reason": "must be a positive integer"
                       },
                       {
                         "name": "color",
                         "reason": "must be 'green', 'red' or 'blue'"}
                     ]
   }

Performance and Testing

Testing

To make sure that your solution and architecture works and performs I recommend doing extensive testing. Familiarize yourself with the testing pyramid which hold the following test procedures:

  • Units tests:
    • Small units of code tests which tests preferably one specific thing in your code
    • The tests makes sure things work as intended
    • The number of unit tests will outnumber all or tests
    • Your unit tests should run very fast
    • Mock things used in your tested functionality: replace a real thing with a fake version
    • Stub things; set up test data that is then returned and tests are verified against
    • You end up leaving out external dependencies for better isolation and faster tests.
    • Test structure:
      • Set up the test data
      • Call your method under test
      • Assert that the expected results are returned
  • Integration tests:
    • Here you test your code with external dependencies
    • Replace your real life dependencies with test doubles that perform and return same kind of data
    • You can run them locally by spinning them up using technologies like docker images
    • Your can run them as part of your pipeline by creating and starting a specific cluster that hold test double instances
    • Example database integration test:
      • start a database
      • connect your application to the database
      • trigger a function within your code that writes data to the database
      • check that the expected data has been written to the database by reading the data from the database
    • Example REST API test:
      • start your application
      • start an instance of the separate service (or a test double with the same interface)
      • trigger a function within your code that reads from the separate service’s API
      • check that your application can parse the response correctly
  • Contract tests
    • Tests that verify how two separate entities communicate and function with each other based on a commonly predefined contract (provider/publisher and consumer/subscriber. Common communications between entities:
      • REST and JSON via HTTPS
      • RPC using something like gRPC
      • building an event-driven architecture using queues
    • Your tests should cover both the publisher and the consumer logic and data
  • UI Tests:
    • UI tests test that the user interface of your application works correctly.
    • User input should trigger the right actions, data should be presented to the user
    • The UI state should change as expected.
    • UI Tests does not need to be performed end-to-end; the backend could be stubbed
  • End-to-End testing:
    • These tests are covering the whole spectrum of your application, UI, to backend, to database/external services etc.
    • These tests verify that your applications work as intended; you can use tools such as Selenium with the WebDriver Protocol.
    • Problems with end-to-end tests
      • End-to-end tests require alot of maintenance; even the slightest change somewhere will affect the end result in the UI.
      • Failure is common and may be unclear why
      • Browser issues
      • Timing issues
      • Animation issues
      • Popup dialogs
      • Performance and long wait times for a test to be verified; long run times
    • Consider keeping end-to-end to the bare minimum due to the problems described above; test the main and most critical functionalities
  • Acceptance testing:
    • Making sure that your application works correctly from a user’s perspective, not just from a technical perspective.
    • These tests should describe what the users sees, experiences and gets as an end result.
    • Usually done through the user interface
  • Exploratory testing:
    • Manual testing by human beings that try to find out creative ways to destroy the application or unexpected ways an end user might use the application which might cause problems.
    • After these finding you can automate these things down the testing pyramid line in units tests, or integration or UI.

All of the automated tests can be integrated to you integration and deployment pipeline and you should consider to do so to as many of the automated tests as possible.

Performance

For performance tests the only good way to get an idea of your solutions and architectures performance is to break it and see how it works under long sustained duration.

Two test types are good for this:

  • Stress testing: Trying to break things by scaling the load up constantly untill your application stop totally working. Then you analyze your finding based on logs, metrics, test tool results etc.
  • Load testing: A sustained test where you keep on making the same requests as you would expect in real life to get an idea how things work in the long run; these tests can go on from a few hours to a few days.

The main idea is that you see problems in your code like:

  • Memory leaks
  • CPU spikes
  • Resource hogging pieces of code
  • Slow pieces of code
  • Network problems
  • External dependencies problem
  • etc

One of my favorite tool for this is JMeter https://jmeter.apache.org/.

And to get the most out of these tests I recommend attaching a code profiler to your solutions and see what happens during these tests.

There is a HUGE difference how your code behaves when you manually test your code under a profiles and how it behaves when thousands or millions or requests are performed. Some problems only become evident when they are called thousands of times, especially memory allocations and releases.

And lastly; cover at least the most important and critical sections of your solutions and keep adding new ones when possible or problems areas are discovered.

These tests can also be added as part of the pipelines.

Topology tests

Do your performances tests while simulating possible error in your architecture or down times.

  • Simulate slow start times for servers.
  • Simulate slow response times from servers.
  • Simulate servers going down; specific or randomly

Test how your system works under missing resources and problems.

Test how expensive your system is

When you are creating tests consider testing the financial impact of your overall system and architecture. By doing different levels of load tests and stress tests you should be able to get a view on what kind of costs you will end up with.

This is especially important with cloud resources where what you pay is related to what to consume.

Modern C++ 11/14 – C# alike events

For a project of mine I needed a functionality like that found in C# events. Since my C++ is a bit rusty 🙂 I looked up some options and found this example which I edited to support variadic templates.

The original implementation(now modified with variadic templates) from: http://geekswithblogs.net/raccoon_tim/archive/2011/09/28/lambdas-and-events-in-c.aspx

template<typename… T1>
class Event
{
public:
typedef std::function<void(T1…)> Func;

public:
void Call(T1 …arg)
{
for (auto i = m_handlers.begin(); i != m_handlers.end(); i++)
{
(*i)(arg…);
}
}

void operator ()(T1 …arg)
{
Call(arg…);
}

Event& operator += (Func f)
{
if (typeid(f) != typeid(Func))
throw std::bad_function_call(“Error in type”);

m_handlers.push_back(f);
return *this;
}

Event& operator -= (Func f)
{
for (auto i = m_handlers.begin(); i != m_handlers.end(); i++)
{
if ((*i).target<void(T1)>() == f.target<void(T1)>())
{
m_handlers.erase(i);
break;
}
}

return *this;
}

private:
std::vector<Func> m_handlers;
};

 

To use the event class in you code:

Example 1 – Single parameter:

int stackValue = 200; Event<int> e; e += [=](int i) { printf( “%d\n”, i + stackValue ); }; e += [&](int i) { printf( “%d\n”, i + stackValue ); stackValue += 100; }; e += [&stackValue](int i) { printf( “%d\n”, i + stackValue ); }; e += [stackValue](int i) { printf( “%d\n”, i + stackValue ); }; e( 100 );

Example 2 – Multiple Parameters:
int stackValue = 200;
float stackValue2 = 12;
double stackValue3 = 200;
Event<int, float, double> e;
e += [=](int i, float j, double k) { printf(“%d\n”, i + stackValue + (int)j + (int)k); };
e += [&](int i, float j, double k) { printf(“%d\n”, i + stackValue + (int)j + (int)k); stackValue += 100; };
e += [&stackValue](int i, float j, double k) { printf(“%d\n”, i + stackValue + (int)j + (int)k); };
e += [stackValue](int i, float j, double k) { printf(“%d\n”, i + stackValue + (int)j + (int)k); };

e(100, stackValue2, 53);

Resources: SharePoint Site Definitions, Site/Web Templates and Provisioning

Creating a Custom SharePoint 2007 Portal Site Definition using the PortalProvisioningProvider Class – Paul Ballard’s WebLog

Understanding Onet.xml Files

Site Definition (Onet.xml) Files

Understanding Web*Temp.xml Files

Site Schema

Exploring The SharePoint 12 Hive :TEMPLATE Directory. – Windows Live

Supported and unsupported scenarios for working with custom site definitions and custom area definitions in SharePoint 2003, 2007 and 2010

Save a site as a site template – SharePoint Server – Microsoft Office

Adding a ‘Save site as template’ Link to Site Settings in WSS v3/MOSS 2007 using a CustomAction Feature — SharePoint Solutions Team Blog

How To Tell Which Template is in Use in SharePoint

Know the site template used for the SharePoint site | SharePoint 2010, SharePoint, C-sharp, ASP.Net, JQuery, SQL Server Solutions

SharePoint 2010 and web templates – Vesa “vesku” Juvonen – Site Home – MSDN Blogs

Understanding webtemp*.xml – Bill Baer – Site Home – TechNet Blogs

Extending the Project Workspace Template

Develop new custom site definitions and create upgrade definition files (Office SharePoint Server)

Site Definitions in SharePoint 2007 – Syrinx on SharePoint

MOSS 2007 – Save site as a template missing – The SharePoint Farmer’s Almanac

Finding site template names and ID’s in SharePoint using PowerShell | Get-SPScripts : PowerShell Scripts for SharePoint

The template you have chosen is invalid or cannot be found. – fengzhimei – 博客园

Microsoft SharePoint: Find the SharePoint web Template ID

SharePoint site creation using SPWebProvisioningProvider « Jamesemann’s Blog

WebTemplate Element (Upgrade)

SharePoint 2010 and web templates – Vesa “vesku” Juvonen – Site Home – MSDN Blogs

Modules

SPSite – The Web application at <address> could not be found – SharePoint