Lessons learned from building Microservices – Part 3: Patterns, Architecture, Implementation And Testing

Introduction

In this blog post I will go over things I’ve learned when working with microservices. I will cover things to do and things that you should not do. There won’t be alot of code here, mostly theory and ideas.

I will discuss most of the topic from a smaller development teams point of view. Some of these things are applicable for larger teams and projects but not all necessarily. It all depends on your own project and needs.

  • Architecture
  • Sharing functionality (Common code base)
  • Continuous Integration / Continuous Delivery (CI/CD)
  • High Availability
  • Messaging and Event driven architecture
  • Security
  • Templates and scripting
  • Logging and Monitoring (+metrics)
  • Configuration pattern
  • Exception handling and Errors
  • Performance and Testing

General advice

Generally I advice to use and consider existing technologies, products and solution with your application and architecture to speed up your development and keep the amount of custom code at a minimum.

This will allow you to save on errors, save on time and money.

Still, make sure that you choose technologies and products that support your and your clients solutions; not things that you think are “cool” right now or would be fun to use. Your choices should fit the needs and requirements on not only your project but the whole of the architecture and the future vision of your project.

Architecture

To keep things simple I would say there are two ways to approach microservices.

Approach one: Starting big but small

The first approach is the one you probably are familiar with. These are big projects by big companies like Amazon, Netflix, Google, Uber etc.

Usually this involves creating hundreds or even thousands of microservice on multiple platforms and technologies. This usually require large teams of people both developing, deploying and up keeping the microservice solution they are working on.

This approach is definitely not for everyone; it may require alot of people, resources and money.

In this approach you can minimize the impact of needing many resources and people by sharing code but this creates coupling which may or may not be what you are looking for. I’ll explain the benefits of shared code in second approach.

You could also minimize needed resources and people by going very small on microservices size allowing them to be easily deleted or created in any language and technology. In this approach you should be ready to just delete something and start from scratch with ease. The idea is to avoid permanence, so you may end up having little or no unit tests that add permanence .

Also if there is a need to coupling between services, create a service that provides a needed functionality.

Approach two: Starting small but plan big

Most likely you have a team of a few people and limited resources. In this case I recommend starting between microservice architecture and monolith one.

By this I mean that you start the process by designing it and implementing all the infrastructure of a microservice but do not start splitting you application into microservices from the start. Create one microservice, expand it then split it when things start to grow so that it feels like a new microservice is needed.

By this time you have had time to understand and analyze your business domain. Now you have an idea what kind of a communication between microservices you need; perhaps HTTP based or decoupled messaging based.

When you are creating you microservice keep you design pattern for you microservices simple. Do not implement overly complicated patterns to impress anyone, including yourself. It will make the upkeep of you microservices a hell. You want to keep the code as simple and as common inside a microservice and between them.

Share as much as possible between microservices.

Create good Continuous Integration and Continuous Deployment procedures as soon as possible. It will save you time.

Verify you have proper availability and scalability based on your application needs.

Prefer scripting to automate and speed up development and upkeep.

Use templates everywhere you can, especially for creating new microservices.

Have a common way to do exception handling and logging.

Have a good configuration plan when deploying your Microservices.

You also need team member who are not afraid to do many different things with multiple technologies. You need people who can learn anything, adapt and develop with any tool, tech or language.

With these approaches and check list you should be able to manage a microservice architecture with only a handful of people. For up-keeping even one person is enough, but constant development at least two or three would be a good amount.

Sharing functionality (Common code base)

When it comes to code I prefer the “golden” rule in programming to not repeat myself but with microservices you will end up with duplication.

The wise thing to do with microservices is to know when to not duplicate and have a common access to shareable code and functionality; and why do this?:

  • Developer and up doing similar code that is used again and again in multiple microservices.
  • These common pieces of code and functionality end up having the same king of problems and bug which have to be corrected in every place
  • The problems and bugs cause security issues
  • With performance issues
  • With possible hard to understand code even when the logic and functionality is the same but the code ends of being slightly or vastly different.
  • And lastly all of the above combined cause you to spend time and money that you may not have

The next question to ask is:

What should you share? The main rule is that is must be common. The code should not be specific to a certain microservice or a domain.

Again it all depends on your project but here are a few things:

  • Logging, I highly recommend this to a unified logging output format that is easy to index and analyze.
  • Access Logs
  • Security
    • User authorization but not authentication or registration. Registration is better suited as an external microservice as it’s own domain.
    • Encryption
    • JSON Web Token related security code and processing
    • API Key
    • Basic Auth
  • Metrics
  • HTTP Client class for HTTP requests. Create an instance of this class with different parameters to share common functionality for logging, metrics and security.
  • Code to use and access Cloud based resources from AWS, Azure
    • CloudWatch
    • SQL Database
    • AppInsights
    • SQS
    • ServiceBus
    • Redis
    • etc…
  • Email Client
  • Web Related Base classes like for controllers
  • Validations and rules
  • Exception and error handling
  • Metrics logic
  • Configuration and settings logic

How should you distribute your shared functionality and code? Well it all depends on your project but here are a few ways:

  • One library to rule them all :D. Create one library which all projects need to use. Notice: This might become a problem later on when your common library code amount grows. You will end up with functionality that you may not need in a particular Microservice.
  • Create multiple libraries that are used on need basis. Thus using only bits of functionality which you need.
  • Create Web API’s or similar services that you request them to perform a certain action. This might work for things like logging or caching but not for all functionality. Also notice that you will lose speed of code to latency if you outsource your common functionality to a common service that is run independently from your actual code that needs that common functionality.
  • A combination of all of the above.

Dependency injection

Use your preferred dependency injection library to manage your classes and dependencies.

When using your DI I recommend thinking of combining classes into “packages” of functionality by feature, domain, logic, data source etc. By doing this you can target specific parts of your code without “contaminating” the project with unneeded code even if you have a large library like a common library.

For example you could pack a set of classes that provide you the functionality to communicate with a CRM, get, modify and add data.

This would include your model classes. A CRM client class, some logic that operate on the model to clean them up etc.

Once you identify them, make your code that that you can add them into your project with the least amount of code.

Also consider creating a logic to automatically tell a developer which configurations are missing once a set of functionalities are added. The easiest way to achieve this is to add this at compile and/or runtime with checks.

See my previous article on this matter for a more detailed description:

https://lionadi.wordpress.com/2019/10/01/spring-boot-bean-management-and-speeding-development/

Continuous Integration/Continuous Delivery (CI/CD)

There are may ways of doing CI and CD but the main point is that ADD it and automate as much as possible. This is especially important with microservices and small team sizes.

It will speed things up and keep things working.

Here are a few things to take into consideration:

  1. Create unit tests that are run during your pipelines
  2. Create API or Service level tests that verify that things work with mock or real life data. You can do this by mocking external dependencies or using them for real if available.
  3. Add performance tests and stability tests to your pipelines if possible to verify that things run smoothly.
  4. Think of using the same tool for creating your API or service tests when developing and when running the same tests in a pipeline. You can just reuse the same tests and be sure that what you test manually is the same that should work in production. For example: https://www.postman.com/ and https://github.com/postmanlabs/newman
  5. Script as much as possible and parametrize your scripts for reuse. Identify which scripts can be used and shared to avoid doing things twice.
  6. Use semantic versioning https://semver.org/
  7. Have a deployment plan on how you are going to use branches to deploy to different environments (https://www.atlassian.com/git/tutorials/comparing-workflows/gitflow-workflow), for example:
    1. You can use release branches to deploy to different environments based on pipeline steps
    2. Or have specific branches for specific environment. Once things are merged into them certain things start to happen
  8. Use automated build, test and deployment for dev environment once things are merged to your development branch.
  9. Use manual steps for other environments deployment, this is to avoid testers to receiving bad builds in QA or production crashing on bugs not caught up.
  10. If you do decide to automate everything all the way to production, make sure you have good safe guards that things don’t blow up.
  11. And lastly; nothing is eternal. Experiment and re-iterate often and especially if you notice problems.

Common tools to CI/CD

https://www.sonatype.com/product-nexus-repository

https://www.ansible.com/

https://www.rudder.io/

https://www.saltstack.com/

https://puppet.com/try-puppet/puppet-enterprise/

https://cfengine.com/

https://about.gitlab.com/

https://www.jenkins.io/

https://codenvy.com/

https://www.postman.com/

https://www.sonarqube.org/

High Availability

The main point in high availability is that your solution will continue to work as well as possible or as normal even if some parts of it fail.

Here are the three main points:

  • Redundancy—ensuring that any elements critical to system operations will have an additional, redundant components that can take over in case of failure.
  • Monitoring—collecting data from a running system and detecting when a component fails or stops responding.
  • Failover—a mechanism that can switch automatically from the currently active component to a redundant component, if monitoring shows a failure of the active component.

Technical components enabling high availability

  • Data backup and recovery—a system that automatically backs up data to a secondary location, and recovers back to the source.
  • Load balancing—a load balancer manages traffic, routing it between more than one system that can serve that traffic.
  • Clustering—a cluster contains several nodes that serve a similar purpose, and users typically access and view the entire cluster as one unit. Each node in the cluster can potentially failover to another node if failure occurs. By setting up replication within the cluster, you can create redundancy between cluster nodes.

Things that help in high availability

  • Make your application stateless
  • Use messaging/events to ensure that business critical functionality is performed at some point in time. This is especially true to any write, update or delete operations.
  • Avoid heavy coupling between services, if possible and if you have to do use a lightweight messaging system. The most troublesome aspect of communicating between microservices is going to be over HTTP.
  • Have good health checks that are fast to respond when requested. This can be divided into two categories:
    • Liveness: Checks if the microservice is alive, that is, if it’s able to accept requests and respond.
    • Readiness: Checks if the microservice’s dependencies (Database, queue services, etc.) are themselves ready, so the microservice can do what it’s supposed to do.
  • Use “circuit breaker” to quickly kill unresponsive services and quickly run them back up
  • Make sure that you have enough physical resources(CPU, Memory, disk space etc) to run your solution and your architecture
  • Make sure you have enough request threads supported in your web server and web application
  • Make sure you verify how sizable HTTP requests your web server and application is allowed to receive, the header is usually that will fail your application.
  • Test your solution broadly in stress and load balancing tests to identity problems. Attach a profiler during these tests to see how your application perform, what bottlenecks there are in your code, what hogs resources etc.
  • Keep your microservices image sizes to the minimum for optimal run in production and optimal deployment. Don’t add things that you don’t use, it will slow your application down and deployment will suffer; all of this will lead to more needed physical resources and more money needed.

Messaging and Event driven architecture

I will be covering this topic in an upcoming post but until then here are a few pointers.

Because of the nature of Microservices; that they can be quickly scaled up and down based on needs I would very highly recommend that you use messaging for business critical operations and logic.

The most important ones I would say are: Writing, Updating and Deleting data.

Also all long running operations I would recommend using messaging.

Notice: One of the most important thing I consider is that you log and monitor the success of messages being send, processed and finished, with trailing to the original request to connect logs and metrics together to get a whole picture when troubleshooting.

I have coveted this in my previous logging post: https://lionadi.wordpress.com/2019/12/03/lessons-learned-from-building-microservices-part-1-logging/

Security

Generally security is an important aspect of any application and has many different topics and details to cover.

Related on security I covered this extensively in my last post in this series on Microservices, go check it out: https://lionadi.wordpress.com/2020/03/23/lessons-learned-from-building-microservices-part-2-security/

Templates and scripting

To speed up development, keep things the same and thus avoiding duplicate errors + unnecessary fixes, use templates where possible. This is especially true for Microservices.

What are possible templates that you could have:

  • Templates for deploying Cloud resources like ARM for Azure or Cloudformation for AWS.
  • Beckend Application templates
  • Front application templates
  • CI/CD templates
  • Kubernetes templates
  • and so on…

Anything that you know you will end up having multiple copies is good to standardize and create templates.

Also I recommend that for applications (front or backend), it is a very good practice to have the applications up and running as soon as you duplicate them from your repository. They should be able to be up and running as soon as you start them.

Script as much as possible and make the scripts reusable.

Parametrize all of the variables you can in your scrips and templates

Here are a few things you would need for a backend application template:

  • Security such as authentication and authorization.
  • Logging and metrics
  • Configuration and settings logic
  • Access Logs
  • Exception handling and errors
  • Validations

Logging and Monitoring (+metrics)

Again as with security this is a large topic, I’ve also written about this in my previous post in the series and recommend go checking it out:

https://lionadi.wordpress.com/2019/12/03/lessons-learned-from-building-microservices-part-1-logging/

Configurations pattern

For microservice configurations I recommend a following pattern where your deployment environments (DEV, QA, PROD etc) configuration files configurations/settings values are left empty. You still have the configuration/setting in your configuration/settings files but you leave them empty.

Next you need to make sure that your code knows how to report empty configuration values when your application is started. You can achieve this by creating a common way to retrieve configurations/settings value and being able to analyze which of the needed and loaded configurations are present.

This way when your docker image is started and the application inside the image starts running and retrieving configurations, you should be able to see what is missing in your environment.

This is mostly because you don’t want to have your environment specific configurations set in your git repository, especially the secrets. You will end up setting these values in your actual QA, PROD etc environments through a mechanism. If you forget to add a setting/configuration in your mechanism your docker image may crash and you will end up searching for the problem a long time, even with proper logging it may not be immediately clear.

I’ve written an previous post on this matter which opens things up on the code level:

https://lionadi.wordpress.com/2019/10/01/spring-boot-bean-management-and-speeding-development/

Exception handling and Errors

Three main points with exceptions and errors:

  • Global exception handling
  • Make sure you do not “leak” exceptions to your clients
  • Use a standardized error response
  • Log things properly
  • And take into consideration security issues with errors

Again for for details on logging and security check my previous posts:

https://lionadi.wordpress.com/2019/12/03/lessons-learned-from-building-microservices-part-1-logging/

https://lionadi.wordpress.com/2020/03/23/lessons-learned-from-building-microservices-part-2-security/

For error responses, you have two choices:

  1. Make up your own
  2. Or use an existing system

I would say avoid making your own if possible but it all depends on your application and architecture.

Consider first existing ones for a reference:

https://www.hl7.org/fhir/operationoutcome.html

https://developers.google.com/search-ads/v2/standard-error-responses

https://developers.facebook.com/docs/graph-api/using-graph-api/error-handling/

Still here is also an official standard which you can use and may be supported by your preferred framework or library: https://www.rfc-editor.org/rfc/rfc7807.html

The RFC 7807 specifies the following for error responses and details from https://www.rfc-editor.org/rfc/rfc7807.html:

  • Error responses MUST use standard HTTP status codes in the 400 or 500 range to detail the general category of error.
  • Error responses will be of the Content-Type application/problem, appending a serialization format of either json or xml: application/problem+json, application/problem+xml.
  • Error responses will have each of the following keys(Internet Engineering Task Force (IETF)):
    • detail (string) – A human-readable description of the specific error.
    • type (string) – a URL to a document describing the error condition (optional, and “about:blank” is assumed if none is provided; should resolve to a human-readable document).
    • title (string) – A short, human-readable title for the general error type; the title should not change for given types.
    • status (number) – Conveying the HTTP status code; this is so that all information is in one place, but also to correct for changes in the status code due to the usage of proxy servers. The status member, if present, is only advisory as generators MUST use the same status code in the actual HTTP response to assure that generic HTTP software that does not understand this format still behaves correctly.
    • instance (string) – This optional key may be present, with a unique URI for the specific error; this will often point to an error log for that specific response.

RFC 7807 example error response:

HTTP/1.1 403 Forbidden
Content-Type: application/problem+json
Content-Language: en

{
  "type": "https://example.com/invalid-account",
  "title": "Your account is invalid.",
  "detail": "Your account is invalid, your account is not confirmed.",
  "instance": "/account/34122323/data/abc",
  "balance": 30,
  "accounts": ["/account/34122323", "/account/8786875"]
}
   HTTP/1.1 400 Bad Request
   Content-Type: application/problem+json
   Content-Language: en

   {
   "type": "https://example.net/validation-error",
   "title": "Your request parameters didn't validate.",
   "invalid-params": [ {
                         "name": "age",
                         "reason": "must be a positive integer"
                       },
                       {
                         "name": "color",
                         "reason": "must be 'green', 'red' or 'blue'"}
                     ]
   }

Performance and Testing

Testing

To make sure that your solution and architecture works and performs I recommend doing extensive testing. Familiarize yourself with the testing pyramid which hold the following test procedures:

  • Units tests:
    • Small units of code tests which tests preferably one specific thing in your code
    • The tests makes sure things work as intended
    • The number of unit tests will outnumber all or tests
    • Your unit tests should run very fast
    • Mock things used in your tested functionality: replace a real thing with a fake version
    • Stub things; set up test data that is then returned and tests are verified against
    • You end up leaving out external dependencies for better isolation and faster tests.
    • Test structure:
      • Set up the test data
      • Call your method under test
      • Assert that the expected results are returned
  • Integration tests:
    • Here you test your code with external dependencies
    • Replace your real life dependencies with test doubles that perform and return same kind of data
    • You can run them locally by spinning them up using technologies like docker images
    • Your can run them as part of your pipeline by creating and starting a specific cluster that hold test double instances
    • Example database integration test:
      • start a database
      • connect your application to the database
      • trigger a function within your code that writes data to the database
      • check that the expected data has been written to the database by reading the data from the database
    • Example REST API test:
      • start your application
      • start an instance of the separate service (or a test double with the same interface)
      • trigger a function within your code that reads from the separate service’s API
      • check that your application can parse the response correctly
  • Contract tests
    • Tests that verify how two separate entities communicate and function with each other based on a commonly predefined contract (provider/publisher and consumer/subscriber. Common communications between entities:
      • REST and JSON via HTTPS
      • RPC using something like gRPC
      • building an event-driven architecture using queues
    • Your tests should cover both the publisher and the consumer logic and data
  • UI Tests:
    • UI tests test that the user interface of your application works correctly.
    • User input should trigger the right actions, data should be presented to the user
    • The UI state should change as expected.
    • UI Tests does not need to be performed end-to-end; the backend could be stubbed
  • End-to-End testing:
    • These tests are covering the whole spectrum of your application, UI, to backend, to database/external services etc.
    • These tests verify that your applications work as intended; you can use tools such as Selenium with the WebDriver Protocol.
    • Problems with end-to-end tests
      • End-to-end tests require alot of maintenance; even the slightest change somewhere will affect the end result in the UI.
      • Failure is common and may be unclear why
      • Browser issues
      • Timing issues
      • Animation issues
      • Popup dialogs
      • Performance and long wait times for a test to be verified; long run times
    • Consider keeping end-to-end to the bare minimum due to the problems described above; test the main and most critical functionalities
  • Acceptance testing:
    • Making sure that your application works correctly from a user’s perspective, not just from a technical perspective.
    • These tests should describe what the users sees, experiences and gets as an end result.
    • Usually done through the user interface
  • Exploratory testing:
    • Manual testing by human beings that try to find out creative ways to destroy the application or unexpected ways an end user might use the application which might cause problems.
    • After these finding you can automate these things down the testing pyramid line in units tests, or integration or UI.

All of the automated tests can be integrated to you integration and deployment pipeline and you should consider to do so to as many of the automated tests as possible.

Performance

For performance tests the only good way to get an idea of your solutions and architectures performance is to break it and see how it works under long sustained duration.

Two test types are good for this:

  • Stress testing: Trying to break things by scaling the load up constantly untill your application stop totally working. Then you analyze your finding based on logs, metrics, test tool results etc.
  • Load testing: A sustained test where you keep on making the same requests as you would expect in real life to get an idea how things work in the long run; these tests can go on from a few hours to a few days.

The main idea is that you see problems in your code like:

  • Memory leaks
  • CPU spikes
  • Resource hogging pieces of code
  • Slow pieces of code
  • Network problems
  • External dependencies problem
  • etc

One of my favorite tool for this is JMeter https://jmeter.apache.org/.

And to get the most out of these tests I recommend attaching a code profiler to your solutions and see what happens during these tests.

There is a HUGE difference how your code behaves when you manually test your code under a profiles and how it behaves when thousands or millions or requests are performed. Some problems only become evident when they are called thousands of times, especially memory allocations and releases.

And lastly; cover at least the most important and critical sections of your solutions and keep adding new ones when possible or problems areas are discovered.

These tests can also be added as part of the pipelines.

Topology tests

Do your performances tests while simulating possible error in your architecture or down times.

  • Simulate slow start times for servers.
  • Simulate slow response times from servers.
  • Simulate servers going down; specific or randomly

Test how your system works under missing resources and problems.

Test how expensive your system is

When you are creating tests consider testing the financial impact of your overall system and architecture. By doing different levels of load tests and stress tests you should be able to get a view on what kind of costs you will end up with.

This is especially important with cloud resources where what you pay is related to what to consume.

Lessons learned from building microservices – Part 2: Security

In this blog post I will go through some of the things I have learned regarding security when it comes to micro-services. It is not a comprehensive guide and things change constantly, so keep learning and investigating.

Best advice here is to avoid re-inventing the wheel. Avoid making your own solutions related to security. This is because someone else with more resources and time has done it before you. Think of the libraries provided by the .NET Core or Java, these have been developed and tested for years. A good example of this would be encryption libraries.

Topics on this post are the following:

  • JSON Web Tokens
  • Monitoring, Logging and Audit trailing
  • Identity and access management
  • Encryption
  • Requests and data validations
  • Error handling
  • CORS & CSP & CSRF
  • OWASP (Open Web Application Security Project)
  • Configurations
  • Quality
  • Security Audit
  • Logs
  • Architecture

JSON Web Tokens

Basically they are JSON objects compacted and secured to transfer data between two or more entities, depending on usage.

https://jwt.io/introduction/

https://cheatsheetseries.owasp.org/cheatsheets/JSON_Web_Token_Cheat_Sheet_for_Java.html

The most common usage is to use them with authentication and authorization.

Another usage could be when you want a person to take some action but the action is delayed to a further date and moment. This is a common thing related to registration and verifying the person.

At some point during the registration process you would need to verify the user, so you can generate a token with needed metadata and send him/her an email. Later the user clicks a link on the received email containing the token as a parameter. One the data gets to your application you can open and validate the token and finish the registration.

There are many other uses but these are just an example. Any time you need to pass data or send it over the internet and that data needs to be secured and not long lived you could consider to use a token.

Security things to do with tokens

Summary: Always validate your tokens

  • Set the audience
  • Set the issuer
  • Set an expiration time
    • You don’t want your tokens in the outside world to live forever, meaning that they should not be used after a certain amount of time.
  • Sign the token to detect tampering attempts
    • Notice: In this scenario the token will not hide the data in the payload, it will only verify that the token hasn’t been tampered with. You need to combine it with encryption
  • Encrypt the token to hide the data in the token
    • Be aware of different encryption methods and when to use them. Generally symmetric encryption is preferred for data that is not in transit; like data that reside in a database. Asymmetric encryption is good for data that is moving and not stationary; data that is moving through the internet is a good example.

Monitoring, Logging and Audit Trailing

From a security perspective I consider logging a very important thing to do. When talking about this category the following things are important to your microservice (or any other application also).

  • Being able to trace activity in your application.
    • Who is operating
    • What is being done
    • How long things take and not just your application requests but also any external resources
    • Errors/warnings and successes
  • Being able to tell if possible attacks happen that want to cause damage or steal something valuable to your or your clients
  • Being able to tell the health of your solutions
  • Have a monitoring tool that can aggregate and display detailed information about your solutions
  • Being able to create alerts that inform you and your team of possible problems
  • Consider automatic actions to avoid issues if certain things are triggered, like possible attack attempts.
  • Consider having a plan on what to do when things seem to break or go bad based on gathered data, alerts, monitoring tools etc. Having an idea what will happen next will make things easier and help avoid public problems.

For more details on logging check out my previous blog entry on these series: https://lionadi.wordpress.com/2019/12/03/lessons-learned-from-building-microservices-part-1-logging/

Identity and access management

Application users

First things first as before with the example of encryption libraries I would recommend using a ready solution, especially if you plan to do a cloud based solution or an app with thousands or more users.

Consider AWS Cognito or Azure AD B2C.

The reason for this is that they provide all the security you need and more in some cases. You will have a huge possible security risk off your shoulders. There are many details to take into consideration if you go the way of manually creation an identity solution with authentication and authorization.

These ready solutions allow you to modify many details how you will use the authentication tokens and authorization tokens in your app. You can add custom attributes, you can use social media to create accounts, support for MFA, mobile users etc.

Require Re-authentication for Sensitive Features.

Proxy

Does the above mean that you can’t create a proxy service with custom logic and logging when users authenticate against Cognito or AD B2C? The answer is NO but consider if you really need it.

Possible situations where you might need an identity proxy are:

  • You need to verify that the authenticated user is allowed to authenticate. The account may not be disabled but might require a human step to be performed somewhere
  • A custom registration flow with custom business logic; for example a person can’t register is his/hers data is not in a certain state or if the data is in certain states then the registration will look, behave and end differently for different users.
  • Custom security logging; for maximum traceability and analysis. You might want to create custom logging and use proper tools to analyze what each person is doing. Especially things related to registration are critical and it is very common that people forget passwords, don’t know how to reset their password, problems logging in might occur etc. In all these cases logging saves hours if not days of troubleshooting.

Admin users

For admin users there are many good best practices to follow and I recommed looking overt them for you particular needs and technology uses. Here are a few links on the matter for AWS, Azure:

https://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html

https://docs.microsoft.com/en-us/azure/security/fundamentals/identity-management-best-practices

Here is a quick list on the top things at the moment:

  • Require MFA
  • Limit the number of Super Admins
  • Enforce a limited session lifetime (Reducing the time a malicious 3rd party can take advantage of an open active session)
  • Enable user notifications for new sign-ons, password resets, account changes etc
  • Consider limiting access from specific locations/IPs etc, or between a certain date and time range, require SSL
  • Use Strong Password Policies (although some of these may make people “lazy” when changing password and pick low security or bad passwords)
    • Lockout
    • Password history
    • Password age
    • Minimum length
  • Create individual user accounts, do not share accounts
  • Grant least privilege
  • Do not share access keys
  • Remove unnecessary credentials
  • Monitor admin activity and send alerts of things that are suspicious

Encryption

Encryption comes in many forms, usually done by two methods: symmetric and asymmetric

Notice: Strong recommendation use existing common libraries for encryption all the way. Do not re-invent the wheel.

Symmetric

You should prefer symmetric encryption in stationary data, data that resides in databases. Also remember to add a salt to the encryption key to avoid possible guess work by an attacker. A salt is added to the hashing process to force their uniqueness, increase their complexity without increasing user requirements, and to mitigate password attacks like rainbow tables.

Probably the most recommended symmetric encryption is AES.

More info: https://cheatsheetseries.owasp.org/cheatsheets/Password_Storage_Cheat_Sheet.html

Asymmetric

Asymmetric encryption is best suited and recommended for data in transit, data that is moving from one place to another over the web. Also in my opinion data that is leaving your secure environment to another location. The most common asymmetric is RSA that is used very broadly like any time you are using HTTPS application protocol to view a site.

Requests and data validations

Generally security authentication and authorization is based on user signin and user roles. While this is a good option there are drawbacks which I will discuss later.

So I will discuss here a way of achieving security through individual permissions for each action the person tries to do. The proper name for this is: Permission Based Access Control

Notice: Still take into consideration your needs in your project. As always some methods of doing things maybe suited for vastly different purposes. I would say that if you chose the Permission Based Access Control, I would recommend that you have many users, in the thousands and even many time greater than that. Also if your users permissions need to change dynamically this is a good solution.

Other options are:

  • Role-Based Access Control (RBAC)
  • Discretionary Access Control (DAC)
  • Mandatory Access Control (MAC)

The main focus is on individual permissions that are defined in a policy/access map. These maps can then be assigned on the fly in necessary to users and/or groups. So if you where to choose a role based access control, you will likely narrow down very much of what a person can do in a system. Also as your application grows it may start to become very rigid. All persons under that role must comply to role(s) exactly in the locations you apply that role.

This most likely will force you to apply multiple roles to a location or users, this will add security options/accesses which are greater than what a particular functionality requires. You may be opening up your application to security vulnerabilities by giving to much access.

So I would suggest creating or starting to work from the idea of individual access management based on permission. Your data in your system(s) should be able to tell you which permission/access maps can a user use.

Now in this situation I am talking about the authorization of a user. When a request happens the code will check the users data and determine which permission he/he has. Your permission/access maps are usually created manually and shared in your system in a secure manner so that they can only be read and not modified by any entity that reads them.

This permission/access map can be used in two very important situations; that is to control what a person can do and what a person can see, so below are our two main requirements:

  1. Test can the person use the requested functionality
  2. If the first step has passed: Test what the user can see. A person may see only parts of data or none at all

Notice: Steps 1 and 2 above are not the same thing. Step 1 is usually something you do on a controller level, on the level where your requests starts to be processed. Step 2 is something you should be doing at the data level; like a service that operates on a data source.

Steps for security validations:

  • Find out can the user access the system
  • Based on the authentication find out who the user is
  • Gather user related data to generate a permission/access map
    • User ID(s)
    • Permission/access map(s); this should be determined based on persons data in the system. A person can have multiple access maps.
    • For each role, have a list of read/create/write/delete for each “category” of importance/bounded context/models etc. This depends on your application and the size of the application and what you are trying to achieve.
  • Use this permission/access map to determine which requests the person can access and which data can he/she see

Taking this approach you are able to:

  • be as loose as you want
  • as rigid as you want
  • exactly where you want.
{
    "AccessCategory": {
      "USER": [
        "READ",
        "UPDATE"
      ],
      "ORDERS": [
        "READ",
        "UPDATE",
        "CREATE"
      ]
    },
    "id": "DEFAULT"
  }

For the requests in the controller make checks on what kind of operations you want the person to be able to do. Have a generated “access/permission map” that knows what the person can do based on his data and states. Have the access/permission map generated frequently, preferably each request.

API Request check example at controller level:

hasUserRights(EnumSet.of(AccessCategory.USER, AccessCategory.ORDERS), EnumSet.of(Permission.READ, Permission.UPDATE));

The above function will go through the access/permission map defined above in the JSON data and see if the requested categories have the requested permissions.

Data request: Does the person have the right to view all of the data requested; if partial show only partial or nothing.

So when you read the access/permission map you need to associate that map to the data that the users can view. This connection can be done inside the code based on the access categories.

Then when a user requests data you have to have internal business logic that will determine can the user view the requested data.

Your access/permission map by itself can’t tell your code how the code should behave, you have to associate the business logic by which to filter out data or deny data access.

I would recommend having a user access service that is responsible for generating the permissions and performs the main logic for the security checks. This way you can ask your service to generate for any user a an access service just by providing a user id. Then you can use this user specific access service to make security checks.

A good example on this would be AWS access permissions and policies:

https://docs.aws.amazon.com/IAM/latest/UserGuide/access_controlling.html

Or Azure:

https://docs.microsoft.com/en-us/azure/governance/policy/tutorials/create-custom-policy-definition

Error handling

It is important that you do not “leak” or give away any exceptions to the outside world.

I would recommend that you have a global way of catching all of your exceptions and replacing the response of your application with a client friendly message that tells the client what possibly went wrong but does not give away sensitive information that can be used against your application or their users. Always return a generic error: https://cipher.com/blog/a-complete-guide-to-the-phases-of-penetration-testing/

Remember to log your error properly.

Also regarding your responses to the outside world consider inserting your error friendly message in the body of the response as a custom JSON with data that might help your client app to response properly to the end user. This might be:

  • An error id
  • Possible translation error id for fetching the appropriate error message
  • Error source ID, like a database, 3rd party API, CRM etc, but be carefull not to give away this info to carelessly. Think how this info can be used against you.

Other things to consider regarding any error response is for you definitely think how the things you send to the outside world might be used against you.

This is especially true regarding authentication, authorization situation and registration. Depending on what you are doing you need to mask as much as possible in your responses when something goes wrong, even to the point of sending 200 HTTP status code in error situations. https://cheatsheetseries.owasp.org/cheatsheets/Authentication_Cheat_Sheet.html#authentication-and-error-messages

CORS & CSP & CSRF

The following security measures are a combination of procedures and steps you need to take on both a client and server side applications. I won’t go into the details of implementing them or the in depth knowledge on them. There are many ways to implement these measures in your preferred technological stack. The end result should be the same but how you do this can be different on your choices of technology, being in Azure App service and enabling CORS can be as simple as pressing a button but do a microservice within kubernetes and things change drastically. Just be aware of these measures and seek out examples how to implement them.

Important: Don’t use pick one of them but use them in combination for maximum security.

Cross-Origin Resource Sharing (CORS): In this security measure you can specify who can communicate with your server, which HTTP methods are OK and which headers are allowed. This happens from request that originate from a different different origin (domain, protocol, or port) from its own.

Implementing has to be done in your server configuration and/or code. The client application will usually make an options request to the server with what is wants to do and from where it tries to do it, the server will then say of it is OK to continue by sending what it knows is allowed. The browser will then continue or stop the request there.

https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS

Content Security Policy (CSP):

In this measure you are defining which resources are allowed from which sources to be used in your client applications. This includes, font, media, images, javascript, objects etc.

Notice that for dynamic script/content you need a nonce value for those contents. This nonce value need to be generated on a server each time the web application is loaded. If you assign a static nonce value this leaves an attack opportunity in your application to execute things which you do not intend.

https://developer.mozilla.org/en-US/docs/Web/HTTP/CSP

https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Security-Policy

https://cheatsheetseries.owasp.org/cheatsheets/Content_Security_Policy_Cheat_Sheet.html

Cross-Site Request Forgery (CSRF):

By this I mean unwanted actions performed in your users browser. For more details I recommed OWASP source for more detailed information:

https://cheatsheetseries.owasp.org/cheatsheets/Cross-Site_Request_Forgery_Prevention_Cheat_Sheet.html

https://cheatsheetseries.owasp.org/cheatsheets/Cross_Site_Scripting_Prevention_Cheat_Sheet.html

OWASP (Open Web Application Security Project)

OWASP is great resource for information related web appication security. If you want to know more I very strongly recommend to look at their stuff. I’ll post here some of their material which I consider a must to know or atleast to have an idea and come back to.

A good thorough cheat sheet: https://cheatsheetseries.owasp.org/

Top Ten security issues project: https://owasp.org/www-project-top-ten/

Top API security issues: https://github.com/OWASP/API-Security

Top Serverless application issues: https://github.com/OWASP/Serverless-Top-10-Project

A tool to check for security holes and vulnerabilities within your 3rd party libraries and dependencies:

https://owasp.org/www-project-dependency-check/

https://jeremylong.github.io/DependencyCheck/

Configurations

For configurations the most important thing that I think all developer have done at least by accident is to push production credentials into git. So avoid this :).

But other than that here are a few tips:

  • Only add configurations in your configuration files for local development
  • For any other environment have your desired environment configuration file configuration empty. What I mean is that you configuration keys are there but they are empty. You want to do this to make sure that once your test, qa or prod environment configuration files are loaded they are empty unless an outside source sets them, the next step.
  • In your non local development environment load your application with the desired environment configuration like test, qa, prod and replace the empty configurations from a secure secrets store. For example in kubernetes secrets files, in Azure use the Key Vault in AWS Key Management service.
  • Now at this point your should also have a piece of code that can determine if a configuration key is not set and thus is empty. At this stage you should throw an exception and stop the application running. This is usually something that can happen during application start processing. For this I have a post that gives a sample code: https://lionadi.wordpress.com/2019/10/01/spring-boot-bean-management-and-speeding-development

The steps here will improve both your security and quality of your code which I think go hand in hand.

Quality

Quality is important for security because if you have the time and take the interest to create good code that can live for years then it is likely that you will have a secure code, or at least more secure.

Simple things like having good coding practices, common tools and way of doings things within your team can reduce the number of error that can reduce the number of security problems.

Here are a few tools that can help improve your code quality and workflows:

https://www.sonarqube.org/

https://www.sonarlint.org/

https://www.sonatype.com/product-nexus-repository

I will write more about quality in my next post in this series and link it here.

Security Audit

Lastly have someone do a security audit on your application and the entire ecosystem if possible. Have them try to hack into your application, your ecosystem. Have them create a threat analysis with out etc.

If you can’t afford someone then think of learning the basics yourself. This will also improve your code quality and things that you will start to automatically take into consideration when you work on your code.

Logs

The important things is that you have logs about your system that reveal possible security problems or threats. The previous part I went into logging details.

Architecture

When designing your Microservices architecture be aware of every detail and entity within your design.

  • Be aware of the traffic between your containers.
  • Be aware of encryption data between your containers.
  • Be aware of access to your resources within your architecture. Can resource X access resources Y? Are the given privileges too much? etc.
  • Only open ports and routes to your resources that are truly needed.
  • Don’t store sensitive information in places that are not secure, prefer ready made products like KeyVault in Azure or Key Management Service in AWS.
  • Use system identities between resources in Cloud environments, they are more secure than manually handled security accounts.
  • Prefer ready made solutions than creating/reinventing the wheel, if possible. Usually a good popular product has a large team and resources to keep things secure and up to date.
  • Have audit trails on what happens in your architecture, how does what etc.
  • Give the least amount of privileges to people within your architecture, only what is needed for that person or group of people to do their job.
  • Set expiration dates to secrets and privileges to resources, where applicable.

My Kubernetes Cheat Sheet, things I find useful everyday

Hi,

Here is a list of my personal most used and useful commands with Kubernetes.

kubectl config current-context # Get the Kuberneste context where you are operating

kubectl get services # List all services in the namespace
kubectl get pods # Get all pods
kubectl get pods –all-namespaces # List all pods in all namespaces
kubectl get pods -o wide # List all pods in the namespace, with more details
kubectl get deployment my-dep # List a particular deployment
kubectl get pods –include-uninitialized # List all pods in the namespace, including uninitialized ones

kubectl describe nodes my-node
kubectl describe pods my-pod
kubectl describe my-dep

kubectl scale –replicas=0 my-dep # roll down a deployment to zore instances
kubectl scale –replicas=1 my-dep # roll up a deployment to desired instaces number

kubectl set image my-dep my-containers=my-image –record # Update the image of the diven deployment containers

kubectl apply -f my-file.yaml # apply a kubernetes specific conifiguration, secrets file, deployment file

kubectl logs -f –tail=1 my-pod # Attach to the pods output and print one line of at a time

kubectl exec my-podf — printenv | sort # print all environmental variables from a pod and sort them

kubectl get my-dep –output=yaml # Print a deployment yaml file the deployment is using

kubectl get pod my-pod –output=yaml # Print the pod related configurations it is using

kubectl logs -p my-pod # Print the logs of the previous container instance, you can use this if there was a crash

kubectl run -i –tty busybox –image=busybox –restart=Never — sh # run a busybox pod for troubleshooting

More useful commands: https://kubernetes.io/docs/reference/kubectl/cheatsheet/