Lessons learned from building Microservices – Part 3: Patterns, Architecture, Implementation And Testing

Introduction

In this blog post I will go over things I’ve learned when working with microservices. I will cover things you should do and things you should not do. There won’t be a lot of code here, mostly theory and ideas.

I will discuss most of the topics from the point of view of a smaller development team. Some of these things apply to larger teams and projects as well, but not necessarily all of them. It all depends on your own project and needs.

  • Architecture
  • Sharing functionality (Common code base)
  • Continuous Integration / Continuous Delivery (CI/CD)
  • High Availability
  • Messaging and Event driven architecture
  • Security
  • Templates and scripting
  • Logging and Monitoring (+metrics)
  • Configuration pattern
  • Exception handling and Errors
  • Performance and Testing

General advice

In general, I advise considering existing technologies, products and solutions for your application and architecture to speed up development and keep the amount of custom code to a minimum.

This helps you avoid errors and saves time and money.

Still, make sure that you choose technologies and products that support your solutions and your clients’ solutions, not things that you think are “cool” right now or would be fun to use. Your choices should fit the needs and requirements not only of your project but of the architecture as a whole and the future vision of your project.

Architecture

To keep things simple I would say there are two ways to approach microservices.

Approach one: Starting big but small

The first approach is the one you are probably familiar with: big projects by big companies like Amazon, Netflix, Google, Uber etc.

Usually this involves creating hundreds or even thousands of microservices on multiple platforms and technologies. This usually requires large teams of people developing, deploying and maintaining the microservice solution they are working on.

This approach is definitely not for everyone; it may require a lot of people, resources and money.

In this approach you can reduce the need for many resources and people by sharing code, but this creates coupling, which may or may not be what you are looking for. I’ll explain the benefits of shared code in the second approach.

You could also reduce the needed resources and people by keeping your microservices very small, so that they can easily be deleted or recreated in any language and technology. In this approach you should be ready to simply delete something and start from scratch. The idea is to avoid permanence, so you may end up with few or no unit tests, since tests add permanence.

Also, if services need to be coupled, create a service that provides the needed functionality.

Approach two: Starting small but plan big

Most likely you have a team of a few people and limited resources. In this case I recommend starting somewhere between a microservice architecture and a monolith.

By this I mean that you start by designing and implementing all the infrastructure of a microservice architecture, but you do not split your application into microservices from the start. Create one microservice and expand it, then split it when it grows to the point where a new microservice feels necessary.

By then you have had time to understand and analyze your business domain, and you have an idea of what kind of communication you need between microservices; perhaps HTTP-based or decoupled, messaging-based communication.

When you are creating your microservices, keep their design patterns simple. Do not implement overly complicated patterns to impress anyone, including yourself; it will make maintaining your microservices hell. You want to keep the code as simple and as consistent as possible, both inside a microservice and between microservices.

Share as much as possible between microservices.

Create good Continuous Integration and Continuous Deployment procedures as soon as possible. It will save you time.

Verify you have proper availability and scalability based on your application needs.

Prefer scripting to automate and speed up development and upkeep.

Use templates everywhere you can, especially for creating new microservices.

Have a common way to do exception handling and logging.

Have a good configuration plan when deploying your Microservices.

You also need team members who are not afraid to do many different things with multiple technologies: people who can learn anything, adapt, and develop with any tool, technology or language.

With these approaches and this checklist you should be able to manage a microservice architecture with only a handful of people. For maintenance even one person can be enough, but for continuous development two or three people would be a good amount.

Sharing functionality (Common code base)

When it comes to code, I prefer the “golden” rule of programming: don’t repeat yourself. With microservices, however, you will end up with some duplication.

The wise thing to do with microservices is to know when not to duplicate, and to have common access to shareable code and functionality. Why do this?

  • Developers end up writing similar code that is used again and again in multiple microservices.
  • These common pieces of code and functionality end up having the same kinds of problems and bugs, which have to be corrected in every place.
  • These problems and bugs can cause security and performance issues.
  • The code becomes hard to understand: even when the logic and functionality are the same, the implementations end up being slightly or vastly different.
  • And lastly, all of the above combined cause you to spend time and money that you may not have.

The next question to ask is:

What should you share? The main rule is that it must be common: the code should not be specific to a certain microservice or domain.

Again it all depends on your project but here are a few things:

  • Logging: I highly recommend a unified logging output format that is easy to index and analyze.
  • Access Logs
  • Security
    • User authorization, but not authentication or registration. Registration is better suited to a separate microservice, as it is its own domain.
    • Encryption
    • JSON Web Token related security code and processing
    • API Key
    • Basic Auth
  • Metrics
  • HTTP Client class for HTTP requests. Create an instance of this class with different parameters to share common functionality for logging, metrics and security.
  • Code to use and access Cloud based resources from AWS, Azure
    • CloudWatch
    • SQL Database
    • AppInsights
    • SQS
    • ServiceBus
    • Redis
    • etc…
  • Email Client
  • Web-related base classes, for example for controllers
  • Validations and rules
  • Exception and error handling
  • Metrics logic
  • Configuration and settings logic
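To illustrate the shared HTTP client idea from the list above, here is a minimal Python sketch. `SharedHttpClient` is a hypothetical class of your own common library, not an existing API, and the header names are assumptions. Each instance is created with different parameters (service name, base URL, API key) while the common logging, timing metrics and security headers live in one place. The transport is injectable so the class can be tested without a network.

```python
import logging
import time
import urllib.request
import uuid
from typing import Callable, List, Optional, Tuple

logger = logging.getLogger("shared.http")

# A transport sends a prepared request and returns (status_code, body).
Transport = Callable[[urllib.request.Request], Tuple[int, bytes]]


def urllib_transport(request: urllib.request.Request) -> Tuple[int, bytes]:
    """Default transport using the standard library (raises HTTPError on 4xx/5xx)."""
    with urllib.request.urlopen(request, timeout=5.0) as response:
        return response.status, response.read()


class SharedHttpClient:
    """Hypothetical common HTTP client: one instance per target service."""

    def __init__(self, service_name: str, base_url: str,
                 api_key: Optional[str] = None,
                 transport: Transport = urllib_transport):
        self.service_name = service_name
        self.base_url = base_url.rstrip("/")
        self.api_key = api_key
        self.transport = transport
        self.durations_ms: List[float] = []  # stand-in for a real metrics sink

    def get(self, path: str, correlation_id: Optional[str] = None) -> Tuple[int, bytes]:
        request = urllib.request.Request(self.base_url + path, method="GET")
        # Common headers every service sends (urllib stores names via str.capitalize())
        request.add_header("X-Correlation-ID", correlation_id or str(uuid.uuid4()))
        request.add_header("User-Agent", self.service_name)
        if self.api_key:
            request.add_header("X-API-Key", self.api_key)  # shared security

        started = time.perf_counter()
        status, body = self.transport(request)
        elapsed_ms = (time.perf_counter() - started) * 1000
        self.durations_ms.append(elapsed_ms)  # shared metrics
        logger.info("GET %s -> %s in %.1f ms", request.full_url, status, elapsed_ms)  # shared logging
        return status, body
```

A CRM client and an email-service client would then just be two instances with different base URLs and keys, both producing identical logs and metrics.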

How should you distribute your shared functionality and code? Well it all depends on your project but here are a few ways:

  • One library to rule them all :D. Create one library which all projects need to use. Notice: This might become a problem later on when your common library code amount grows. You will end up with functionality that you may not need in a particular Microservice.
  • Create multiple libraries that are used on a need basis, so that each microservice uses only the functionality it needs.
  • Create Web APIs or similar services that you call to perform a certain action. This might work for things like logging or caching, but not for all functionality. Also notice that you will lose speed to network latency if you outsource common functionality to a service that runs independently from the code that needs it.
  • A combination of all of the above.

Dependency injection

Use your preferred dependency injection library to manage your classes and dependencies.

When using DI, I recommend grouping classes into “packages” of functionality by feature, domain, logic, data source etc. By doing this you can target specific parts of your code without “contaminating” the project with unneeded code, even if you have a large library such as a common library.

For example, you could package a set of classes that provide the functionality to communicate with a CRM: to get, modify and add data.

This would include your model classes, a CRM client class, some logic that operates on the models to clean them up, etc.

Once you have identified these packages, structure your code so that you can add them to your project with the least amount of code.

Also consider creating logic that automatically tells a developer which configurations are missing once a set of functionality is added. The easiest way to achieve this is with compile-time and/or runtime checks.
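A minimal Python sketch of such a package, assuming a tiny hand-rolled container (a Spring Boot project would more likely use `@Configuration` classes). `FeaturePackage`, `Container`, `CrmClient` and the configuration keys are all hypothetical. Each package lists its providers and the configuration keys it needs, so adding it to a project is one call and missing settings are reported before anything is created.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class FeaturePackage:
    """A named group of providers plus the configuration keys it needs."""
    name: str
    providers: Dict[str, Callable[[Dict[str, str]], Any]]
    required_config: List[str] = field(default_factory=list)


class Container:
    def __init__(self, config: Dict[str, str]):
        self.config = config
        self.packages: List[FeaturePackage] = []
        self._instances: Dict[str, Any] = {}

    def add(self, package: FeaturePackage) -> "Container":
        self.packages.append(package)
        return self

    def missing_config(self) -> Dict[str, List[str]]:
        """Map each package name to its required keys that are absent or empty."""
        missing = {}
        for package in self.packages:
            keys = [k for k in package.required_config if not self.config.get(k)]
            if keys:
                missing[package.name] = keys
        return missing

    def start(self) -> None:
        missing = self.missing_config()
        if missing:
            # Fail fast and tell the developer exactly what to add
            details = "; ".join(f"{p}: {', '.join(k)}" for p, k in missing.items())
            raise RuntimeError(f"Missing configuration -> {details}")
        for package in self.packages:
            for name, factory in package.providers.items():
                self._instances[name] = factory(self.config)

    def get(self, name: str) -> Any:
        return self._instances[name]


class CrmClient:  # hypothetical CRM client from the example above
    def __init__(self, base_url: str, api_key: str):
        self.base_url, self.api_key = base_url, api_key


CRM_PACKAGE = FeaturePackage(
    name="crm",
    providers={"crm_client": lambda c: CrmClient(c["crm.url"], c["crm.api_key"])},
    required_config=["crm.url", "crm.api_key"],
)
```

With this shape, a microservice that needs the CRM calls `Container(config).add(CRM_PACKAGE).start()`, and a service that doesn’t need it never loads those classes.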

See my previous article on this matter for a more detailed description:

https://lionadi.wordpress.com/2019/10/01/spring-boot-bean-management-and-speeding-development/

Continuous Integration/Continuous Delivery (CI/CD)

There are many ways of doing CI and CD, but the main point is to ADD it and automate as much as possible. This is especially important with microservices and small team sizes.

It will speed things up and keep things working.

Here are a few things to take into consideration:

  1. Create unit tests that are run during your pipelines
  2. Create API or Service level tests that verify that things work with mock or real life data. You can do this by mocking external dependencies or using them for real if available.
  3. Add performance tests and stability tests to your pipelines if possible to verify that things run smoothly.
  4. Think of using the same tool for creating your API or service tests when developing and when running the same tests in a pipeline. You can just reuse the same tests and be sure that what you test manually is the same that should work in production. For example: https://www.postman.com/ and https://github.com/postmanlabs/newman
  5. Script as much as possible and parametrize your scripts for reuse. Identify which scripts can be used and shared to avoid doing things twice.
  6. Use semantic versioning https://semver.org/
  7. Have a deployment plan on how you are going to use branches to deploy to different environments (https://www.atlassian.com/git/tutorials/comparing-workflows/gitflow-workflow), for example:
    1. You can use release branches to deploy to different environments based on pipeline steps
    2. Or have a specific branch for each environment, so that merging into that branch triggers the build and deployment to that environment
  8. Use automated build, test and deployment for dev environment once things are merged to your development branch.
  9. Use manual steps for deployments to other environments. This prevents testers from receiving bad builds in QA, and production from crashing on bugs that were not caught.
  10. If you do decide to automate everything all the way to production, make sure you have good safeguards so that things don’t blow up.
  11. And lastly; nothing is eternal. Experiment and re-iterate often and especially if you notice problems.
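Semantic versioning uses MAJOR.MINOR.PATCH: incompatible API changes bump MAJOR, backward-compatible features bump MINOR, backward-compatible fixes bump PATCH, and the lower parts reset to zero. A small helper like this hypothetical `bump_version` (it ignores pre-release and build metadata) is the kind of thing a parametrized pipeline script could call:

```python
import re

SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def bump_version(version: str, change: str) -> str:
    """Return the next version for a 'major', 'minor' or 'patch' change."""
    match = SEMVER.match(version)
    if not match:
        raise ValueError(f"Not a MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    if change == "major":
        return f"{major + 1}.0.0"  # breaking change: reset minor and patch
    if change == "minor":
        return f"{major}.{minor + 1}.0"  # new feature: reset patch
    if change == "patch":
        return f"{major}.{minor}.{patch + 1}"  # bug fix only
    raise ValueError(f"Unknown change type: {change!r}")
```

For example, `bump_version("1.4.2", "minor")` returns `"1.5.0"`.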

Common tools to CI/CD

https://www.sonatype.com/product-nexus-repository

https://www.ansible.com/

https://www.rudder.io/

https://www.saltstack.com/

https://puppet.com/try-puppet/puppet-enterprise/

https://cfengine.com/

https://about.gitlab.com/

https://www.jenkins.io/

https://codenvy.com/

https://www.postman.com/

https://www.sonarqube.org/

High Availability

The main point in high availability is that your solution will continue to work as well as possible or as normal even if some parts of it fail.

Here are the three main points:

  • Redundancy—ensuring that any elements critical to system operations have additional, redundant components that can take over in case of failure.
  • Monitoring—collecting data from a running system and detecting when a component fails or stops responding.
  • Failover—a mechanism that can switch automatically from the currently active component to a redundant component, if monitoring shows a failure of the active component.
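The three points work together: redundancy provides the spare components, monitoring detects the failure, and failover switches over. A minimal client-side sketch, assuming each redundant instance is represented as a callable; `FailoverGroup` is hypothetical, and in practice a load balancer or orchestrator usually does this for you:

```python
from typing import Callable, List


class FailoverGroup:
    """Calls the active instance and fails over to the next one on failure."""

    def __init__(self, instances: List[Callable[[str], str]]):
        if not instances:
            raise ValueError("at least one instance is required")
        self.instances = instances  # redundancy: several equivalent instances
        self.active = 0
        self.failures: List[int] = []  # monitoring: which instances failed

    def call(self, request: str) -> str:
        last_error = None
        for _ in range(len(self.instances)):
            try:
                return self.instances[self.active](request)
            except Exception as error:  # any failure counts as "not responding"
                last_error = error
                self.failures.append(self.active)
                # failover: switch to the next redundant instance
                self.active = (self.active + 1) % len(self.instances)
        raise RuntimeError("all instances failed") from last_error
```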

Technical components enabling high availability

  • Data backup and recovery—a system that automatically backs up data to a secondary location, and recovers back to the source.
  • Load balancing—a load balancer manages traffic, routing it between more than one system that can serve that traffic.
  • Clustering—a cluster contains several nodes that serve a similar purpose, and users typically access and view the entire cluster as one unit. Each node in the cluster can potentially failover to another node if failure occurs. By setting up replication within the cluster, you can create redundancy between cluster nodes.

Things that help in high availability

  • Make your application stateless
  • Use messaging/events to ensure that business-critical functionality is performed at some point in time. This is especially true for any write, update or delete operations.
  • Avoid heavy coupling between services if possible; if you have to couple them, use a lightweight messaging system. Direct HTTP calls between microservices tend to be the most troublesome form of communication.
  • Have good health checks that are fast to respond when requested. This can be divided into two categories:
    • Liveness: Checks if the microservice is alive, that is, if it’s able to accept requests and respond.
    • Readiness: Checks if the microservice’s dependencies (Database, queue services, etc.) are themselves ready, so the microservice can do what it’s supposed to do.
  • Use a “circuit breaker” so that calls to an unresponsive service fail fast instead of piling up, and resume once the service has recovered
  • Make sure that you have enough physical resources (CPU, memory, disk space etc.) to run your solution and your architecture
  • Make sure you have enough request threads supported in your web server and web application
  • Verify how large HTTP requests your web server and application are allowed to receive; the header size is usually the limit that will fail your application.
  • Test your solution broadly with stress and load tests to identify problems. Attach a profiler during these tests to see how your application performs, what bottlenecks there are in your code, what hogs resources etc.
  • Keep your microservices image sizes to the minimum for optimal run in production and optimal deployment. Don’t add things that you don’t use, it will slow your application down and deployment will suffer; all of this will lead to more needed physical resources and more money needed.
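A minimal Python sketch of two items from the list above: liveness/readiness checks and a circuit breaker. In the usual form of the pattern, a circuit breaker stops calling a dependency after repeated failures, fails fast for a cool-down period, and then lets a trial call through to see whether the dependency has recovered. The functions return plain dicts you would expose on endpoints such as `/health/live` and `/health/ready` (hypothetical paths), and the thresholds are placeholders; production code would typically use an existing library instead.

```python
import time
from typing import Callable, Dict


def liveness() -> Dict[str, str]:
    # Liveness: the process is up and can answer; keep this trivial and fast.
    return {"status": "UP"}


def readiness(dependency_checks: Dict[str, Callable[[], bool]]) -> Dict[str, object]:
    # Readiness: every dependency (database, queue, ...) must answer.
    results = {}
    for name, check in dependency_checks.items():
        try:
            results[name] = "UP" if check() else "DOWN"
        except Exception:
            results[name] = "DOWN"
    ready = all(state == "UP" for state in results.values())
    return {"status": "UP" if ready else "DOWN", "dependencies": results}


class CircuitOpenError(Exception):
    pass


class CircuitBreaker:
    """CLOSED: calls pass through. OPEN: calls fail fast.
    HALF_OPEN: after reset_timeout, one trial call decides whether to close again."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, func: Callable[[], object]) -> object:
        if self.state == "OPEN":
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("dependency unavailable, failing fast")
            self.state = "HALF_OPEN"  # let one trial request through
        try:
            result = func()
        except Exception:
            self._record_failure()
            raise
        self.state = "CLOSED"  # success closes the circuit again
        self.failures = 0
        return result

    def _record_failure(self) -> None:
        self.failures += 1
        if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
            self.state = "OPEN"
            self.opened_at = self.clock()
```

Keeping liveness trivial matters: if it also checked the database, an outage there would make the orchestrator restart healthy instances for no benefit.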

Messaging and Event driven architecture

I will be covering this topic in an upcoming post but until then here are a few pointers.

Because microservices can be quickly scaled up and down based on need (so an instance may disappear in the middle of an operation), I very highly recommend using messaging for business-critical operations and logic.

The most important ones, I would say, are writing, updating and deleting data.

I would also recommend using messaging for all long-running operations.

Notice: One of the most important things, in my view, is to log and monitor the success of messages being sent, processed and finished, with a trail back to the original request, so that logs and metrics can be connected to give a whole picture when troubleshooting.

I have covered this in my previous logging post: https://lionadi.wordpress.com/2019/12/03/lessons-learned-from-building-microservices-part-1-logging/
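A minimal sketch of that trail, assuming an in-process `queue.Queue` standing in for a real broker (SQS, Service Bus, etc.) and a hypothetical message envelope. The correlation ID of the original request travels inside the message, and each stage (sent, processing, finished or failed) is logged with it so the logs can be joined later.

```python
import json
import queue
import uuid
from typing import Callable, Dict, List

message_log: List[Dict[str, str]] = []  # stands in for your structured log output


def log_stage(message: Dict, stage: str) -> None:
    message_log.append({"correlation_id": message["correlation_id"],
                        "message_id": message["message_id"], "stage": stage})


def send(broker: "queue.Queue[str]", correlation_id: str, payload: Dict) -> str:
    message = {"message_id": str(uuid.uuid4()),
               "correlation_id": correlation_id,  # copied from the original request
               "payload": payload}
    broker.put(json.dumps(message))
    log_stage(message, "sent")
    return message["message_id"]


def consume_one(broker: "queue.Queue[str]", handler: Callable[[Dict], None]) -> None:
    message = json.loads(broker.get())
    log_stage(message, "processing")
    try:
        handler(message["payload"])
        log_stage(message, "finished")
    except Exception:
        log_stage(message, "failed")  # a real broker would retry or dead-letter it
        raise
```

Searching the logs for one correlation ID then shows the HTTP request and every message stage it caused.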

Security

Generally security is an important aspect of any application and has many different topics and details to cover.

I covered security extensively in my previous post in this series on Microservices, go check it out: https://lionadi.wordpress.com/2020/03/23/lessons-learned-from-building-microservices-part-2-security/

Templates and scripting

To speed up development, keep things consistent and avoid duplicated errors and unnecessary fixes, use templates where possible. This is especially true for Microservices.

What templates could you have?

  • Templates for deploying Cloud resources like ARM for Azure or CloudFormation for AWS.
  • Backend application templates
  • Frontend application templates
  • CI/CD templates
  • Kubernetes templates
  • and so on…

Anything you know you will end up having multiple copies of is a good candidate for standardizing and templating.

For applications (frontend or backend), I also recommend making sure that an application duplicated from your template repository is up and running as soon as you start it, without extra setup.

Script as much as possible and make the scripts reusable.

Parametrize all of the variables you can in your scripts and templates.
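As an example of a parametrized template, here is a sketch using Python’s `string.Template`. Its `substitute()` raises `KeyError` for any placeholder without a value, so a forgotten parameter fails loudly instead of producing a broken manifest. The Kubernetes snippet and the parameter names are illustrative only.

```python
from string import Template

DEPLOYMENT_TEMPLATE = Template("""\
apiVersion: apps/v1
kind: Deployment
metadata:
  name: $service_name
spec:
  replicas: $replicas
  selector:
    matchLabels:
      app: $service_name
  template:
    metadata:
      labels:
        app: $service_name
    spec:
      containers:
        - name: $service_name
          image: $registry/$service_name:$version
""")


def render_deployment(**params: str) -> str:
    # substitute() raises KeyError naming the first missing parameter
    return DEPLOYMENT_TEMPLATE.substitute(**params)
```

A deployment script would then call `render_deployment(service_name="orders", replicas="2", registry=..., version=...)` with values from pipeline variables, reusing the same template for every microservice.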

Here are a few things you would need for a backend application template:

  • Security such as authentication and authorization.
  • Logging and metrics
  • Configuration and settings logic
  • Access Logs
  • Exception handling and errors
  • Validations

Logging and Monitoring (+metrics)

Again, as with security, this is a large topic. I’ve also written about it in a previous post in the series and recommend checking it out:

https://lionadi.wordpress.com/2019/12/03/lessons-learned-from-building-microservices-part-1-logging/

Configuration pattern

For microservice configurations I recommend the following pattern: in the configuration files for your deployment environments (DEV, QA, PROD etc.), leave the configuration/setting values empty. The configurations/settings are still present in your configuration/settings files, but their values are empty.

Next, you need to make sure that your code reports empty configuration values when your application is started. You can achieve this by creating a common way to retrieve configuration/setting values that can analyze which of the needed and loaded configurations are present.

This way when your docker image is started and the application inside the image starts running and retrieving configurations, you should be able to see what is missing in your environment.

This is mostly because you don’t want your environment-specific configurations in your git repository, especially the secrets. You will end up setting these values in your actual QA, PROD etc. environments through some mechanism. If you forget to add a setting/configuration to that mechanism, your docker image may crash and you may end up searching for the problem for a long time; even with proper logging the cause may not be immediately clear.

I’ve written a previous post on this matter which covers things at the code level:

https://lionadi.wordpress.com/2019/10/01/spring-boot-bean-management-and-speeding-development/
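A minimal Python sketch of the pattern, assuming settings are a flat dict loaded from the environment’s configuration file with empty values, and that the deployment mechanism injects the real values as environment variables. The key-to-variable naming convention (`db.password` to `DB_PASSWORD`) is an assumption. On startup, every still-empty key is reported at once instead of the application crashing on the first one it happens to use.

```python
import os
from typing import Dict, List, Mapping, Optional


def env_var_name(key: str) -> str:
    # "db.password" -> "DB_PASSWORD" (naming convention is an assumption)
    return key.upper().replace(".", "_").replace("-", "_")


def load_settings(file_settings: Dict[str, str],
                  environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Fill the (empty) file values from environment variables."""
    environ = os.environ if environ is None else environ
    settings = dict(file_settings)
    for key in settings:
        override = environ.get(env_var_name(key))
        if override:
            settings[key] = override
    return settings


def missing_settings(settings: Dict[str, str]) -> List[str]:
    return sorted(key for key, value in settings.items() if not value)


def check_on_startup(settings: Dict[str, str]) -> None:
    missing = missing_settings(settings)
    if missing:
        # One clear message listing everything the environment still lacks
        raise SystemExit("Missing configuration values: " + ", ".join(missing)
                         + " (expected environment variables: "
                         + ", ".join(env_var_name(k) for k in missing) + ")")
```

When the container starts in QA with one secret forgotten, the first log line names exactly which variable to add.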

Exception handling and Errors

The main points with exceptions and errors:

  • Global exception handling
  • Make sure you do not “leak” exceptions to your clients
  • Use a standardized error response
  • Log things properly
  • And take into consideration security issues with errors

Again, for details on logging and security check my previous posts:

https://lionadi.wordpress.com/2019/12/03/lessons-learned-from-building-microservices-part-1-logging/

https://lionadi.wordpress.com/2020/03/23/lessons-learned-from-building-microservices-part-2-security/

For error responses, you have two choices:

  1. Make up your own
  2. Or use an existing system

I would say avoid making your own if possible but it all depends on your application and architecture.

First, consider existing ones for reference:

https://www.hl7.org/fhir/operationoutcome.html

https://developers.google.com/search-ads/v2/standard-error-responses

https://developers.facebook.com/docs/graph-api/using-graph-api/error-handling/

There is also an official standard which you can use and which may be supported by your preferred framework or library: https://www.rfc-editor.org/rfc/rfc7807.html (RFC 7807 has since been superseded by RFC 9457, which remains compatible with it).

Based on RFC 7807 (https://www.rfc-editor.org/rfc/rfc7807.html), error responses (“problem details”) work as follows:

  • Error responses use standard HTTP status codes, typically in the 4xx or 5xx range, to convey the general category of the error.
  • Error responses use the media type application/problem+json or application/problem+xml, depending on the serialization format.
  • A problem details object can have the following members (all of them are optional):
    • detail (string) – A human-readable explanation specific to this occurrence of the error.
    • type (string) – A URI reference identifying the problem type; when dereferenced, it should provide human-readable documentation. If it is not present, “about:blank” is assumed.
    • title (string) – A short, human-readable summary of the problem type; it should not change from occurrence to occurrence of the same type, except for localization.
    • status (number) – Conveys the HTTP status code, so that all information is in one place, and also to correct for changes in the status code caused by proxy servers. The status member, if present, is only advisory, as generators MUST use the same status code in the actual HTTP response to ensure that generic HTTP software that does not understand this format still behaves correctly.
    • instance (string) – A URI reference that identifies the specific occurrence of the problem; it may or may not yield further information if dereferenced, and could for example point to the log entry for that specific response.
  • Problem types can extend the object with additional members of their own, like “balance” and “accounts” in the first example below.

RFC 7807 example error responses:

HTTP/1.1 403 Forbidden
Content-Type: application/problem+json
Content-Language: en

{
  "type": "https://example.com/invalid-account",
  "title": "Your account is invalid.",
  "detail": "Your account is invalid, your account is not confirmed.",
  "instance": "/account/34122323/data/abc",
  "balance": 30,
  "accounts": ["/account/34122323", "/account/8786875"]
}

HTTP/1.1 400 Bad Request
Content-Type: application/problem+json
Content-Language: en

{
  "type": "https://example.net/validation-error",
  "title": "Your request parameters didn't validate.",
  "invalid-params": [
    {
      "name": "age",
      "reason": "must be a positive integer"
    },
    {
      "name": "color",
      "reason": "must be 'green', 'red' or 'blue'"
    }
  ]
}
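Tying the points together, here is a framework-agnostic Python sketch of a global exception handler that produces RFC 7807 responses. The exception classes and `type` URLs are hypothetical; in a real service you would register something like this as your framework’s global error handler. Known, client-safe errors carry their own message, while anything unexpected is logged in full on the server and the client only sees a generic message plus an `instance` identifier, so internal details never leak.

```python
import json
import logging
import uuid
from typing import Dict, Tuple

logger = logging.getLogger("errors")


class ValidationFailed(Exception):
    def __init__(self, invalid_params):
        super().__init__("Your request parameters didn't validate.")
        self.invalid_params = invalid_params


class NotFound(Exception):
    pass


def to_problem(error: Exception) -> Tuple[int, Dict[str, str], str]:
    """Return (status, headers, body) for any exception raised by a request handler."""
    error_id = str(uuid.uuid4())
    if isinstance(error, ValidationFailed):
        status, problem = 400, {"type": "https://example.net/validation-error",
                                "title": str(error),
                                "invalid-params": error.invalid_params}
    elif isinstance(error, NotFound):
        status, problem = 404, {"type": "https://example.net/not-found",
                                "title": "Resource not found.",
                                "detail": str(error)}
    else:
        # Unexpected error: log everything internally, reveal nothing to the client
        logger.error("Unhandled error %s", error_id, exc_info=error)
        status, problem = 500, {"type": "about:blank",
                                "title": "Internal Server Error",
                                "detail": "An unexpected error occurred."}
    problem["status"] = status  # must match the actual HTTP status code
    problem["instance"] = f"urn:uuid:{error_id}"  # ties the response to the log entry
    headers = {"Content-Type": "application/problem+json"}
    return status, headers, json.dumps(problem)
```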

Performance and Testing

Testing

To make sure that your solution and architecture work and perform, I recommend extensive testing. Familiarize yourself with the testing pyramid, which holds the following test procedures:

  • Unit tests:
    • Tests of small units of code, each preferably testing one specific thing in your code
    • The tests make sure things work as intended
    • Unit tests will outnumber all other tests
    • Your unit tests should run very fast
    • Mock things used in your tested functionality: replace a real thing with a fake version
    • Stub things; set up test data that is then returned and tests are verified against
    • You end up leaving out external dependencies for better isolation and faster tests.
    • Test structure:
      • Set up the test data
      • Call your method under test
      • Assert that the expected results are returned
  • Integration tests:
    • Here you test your code with external dependencies
    • Replace your real-life dependencies with test doubles that behave like them and return the same kind of data
    • You can run them locally by spinning up the dependencies using technologies like docker images
    • You can run them as part of your pipeline by creating and starting a specific cluster that holds the test double instances
    • Example database integration test:
      • start a database
      • connect your application to the database
      • trigger a function within your code that writes data to the database
      • check that the expected data has been written to the database by reading the data from the database
    • Example REST API test:
      • start your application
      • start an instance of the separate service (or a test double with the same interface)
      • trigger a function within your code that reads from the separate service’s API
      • check that your application can parse the response correctly
  • Contract tests
    • Tests that verify how two separate entities communicate and function with each other based on a commonly predefined contract (provider/publisher and consumer/subscriber). Common ways for entities to communicate:
      • REST and JSON via HTTPS
      • RPC using something like gRPC
      • Events in an event-driven architecture using queues
    • Your tests should cover both the publisher and the consumer logic and data
  • UI Tests:
    • UI tests test that the user interface of your application works correctly.
    • User input should trigger the right actions, data should be presented to the user
    • The UI state should change as expected.
    • UI tests do not need to be performed end-to-end; the backend could be stubbed
  • End-to-End testing:
    • These tests are covering the whole spectrum of your application, UI, to backend, to database/external services etc.
    • These tests verify that your applications work as intended; you can use tools such as Selenium with the WebDriver Protocol.
    • Problems with end-to-end tests
      • End-to-end tests require a lot of maintenance; even the slightest change somewhere will affect the end result in the UI.
      • Failures are common, and the reason may be unclear
      • Browser issues
      • Timing issues
      • Animation issues
      • Popup dialogs
      • Performance and long wait times for a test to be verified; long run times
    • Consider keeping end-to-end tests to the bare minimum due to the problems described above; test the main and most critical functionality
  • Acceptance testing:
    • Making sure that your application works correctly from a user’s perspective, not just from a technical perspective.
    • These tests should describe what the user sees, experiences and gets as an end result.
    • Usually done through the user interface
  • Exploratory testing:
    • Manual testing by human beings that try to find out creative ways to destroy the application or unexpected ways an end user might use the application which might cause problems.
    • Based on these findings, you can automate the discovered cases further down the testing pyramid as unit, integration or UI tests.
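The unit test structure described above (set up, call, assert) can be sketched with Python’s built-in `unittest`. Here `unittest.mock.Mock` plays both roles: a stub returning canned data (`discount_for`) and a mock whose interaction is verified (`record`). `PriceService` and its dependencies are hypothetical.

```python
import unittest
from unittest.mock import Mock


class PriceService:
    """Code under test: applies a customer's discount to a product price."""

    def __init__(self, customer_repository, audit):
        self.customers = customer_repository
        self.audit = audit

    def price_for(self, customer_id: str, base_price: float) -> float:
        discount = self.customers.discount_for(customer_id)
        price = round(base_price * (1 - discount), 2)
        self.audit.record(customer_id, price)
        return price


class PriceServiceTest(unittest.TestCase):
    def test_applies_customer_discount(self):
        # Set up: stub the repository's data, mock the audit dependency
        repository = Mock()
        repository.discount_for.return_value = 0.25
        audit = Mock()
        service = PriceService(repository, audit)

        # Call the method under test
        price = service.price_for("customer-1", 100.0)

        # Assert on the result and on the interaction
        self.assertEqual(price, 75.0)
        repository.discount_for.assert_called_once_with("customer-1")
        audit.record.assert_called_once_with("customer-1", 75.0)


if __name__ == "__main__":
    unittest.main()
```

No database or network is involved, so a test like this runs in milliseconds and fits in every pipeline run.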

All of the automated tests can be integrated into your integration and deployment pipelines, and you should consider doing so for as many of them as possible.

Performance

For performance, the only good way to get an idea of how your solution and architecture perform is to push them until they break, and to see how they behave under sustained load over a long duration.

Two test types are good for this:

  • Stress testing: Trying to break things by constantly scaling up the load until your application stops working entirely. Then you analyze your findings based on logs, metrics, test tool results etc.
  • Load testing: A sustained test where you keep on making the same requests as you would expect in real life to get an idea how things work in the long run; these tests can go on from a few hours to a few days.

The main idea is that you see problems in your code like:

  • Memory leaks
  • CPU spikes
  • Resource hogging pieces of code
  • Slow pieces of code
  • Network problems
  • External dependency problems
  • etc

One of my favorite tools for this is JMeter https://jmeter.apache.org/.

And to get the most out of these tests I recommend attaching a code profiler to your solutions and see what happens during these tests.

There is a HUGE difference between how your code behaves when you manually test it under a profiler and how it behaves when thousands or millions of requests are performed. Some problems only become evident when code is called thousands of times, especially memory allocations and releases.

And lastly: cover at least the most important and critical sections of your solutions, and keep adding new tests when possible or when problem areas are discovered.

These tests can also be added as part of the pipelines.

Topology tests

Run your performance tests while simulating possible errors or downtime in your architecture.

  • Simulate slow start times for servers.
  • Simulate slow response times from servers.
  • Simulate servers going down, specific ones or at random

Test how your system works under missing resources and problems.
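One way to simulate slow responses and failing servers in your own test harness is a fault-injecting wrapper around the calls between services. This sketch is hypothetical (`FaultInjector` is not a library class). It uses a seeded `random.Random` so a failing run can be reproduced, and an injectable `sleep` so tests can record delays instead of waiting.

```python
import random
import time
from typing import Callable, Optional


class FaultInjector:
    """Wraps a call and randomly makes it fail or respond slowly."""

    def __init__(self, failure_rate: float = 0.1, slow_rate: float = 0.2,
                 delay_seconds: float = 2.0, seed: Optional[int] = None,
                 sleep: Callable[[float], None] = time.sleep):
        self.failure_rate = failure_rate
        self.slow_rate = slow_rate
        self.delay_seconds = delay_seconds
        self.random = random.Random(seed)  # seeded for reproducible test runs
        self.sleep = sleep

    def wrap(self, call: Callable[..., object]) -> Callable[..., object]:
        def faulty(*args, **kwargs):
            roll = self.random.random()  # uniform in [0, 1)
            if roll < self.failure_rate:
                raise ConnectionError("injected fault: server down")
            if roll < self.failure_rate + self.slow_rate:
                self.sleep(self.delay_seconds)  # injected slow response
            return call(*args, **kwargs)
        return faulty
```

Running your load tests against wrapped dependencies shows whether timeouts, retries and circuit breakers behave as intended when parts of the system misbehave.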

Test how expensive your system is

When you are creating tests consider testing the financial impact of your overall system and architecture. By doing different levels of load tests and stress tests you should be able to get a view on what kind of costs you will end up with.

This is especially important with cloud resources, where what you pay is related to what you consume.
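A back-of-the-envelope sketch of turning load-test measurements into a monthly cost estimate. The formula (instances × hourly price × hours, plus a per-request charge) is a simplification, and all prices and figures are placeholders, not real cloud pricing; substitute your provider’s rates and the numbers measured in your own tests.

```python
import math


def monthly_cost(requests_per_second: float,
                 requests_per_instance: float,
                 instance_hourly_price: float,
                 price_per_million_requests: float,
                 hours_per_month: float = 730.0) -> float:
    """Estimate the monthly cost of serving a sustained load (placeholder model)."""
    # Instances needed, using the per-instance capacity measured in load tests
    instances = max(1, math.ceil(requests_per_second / requests_per_instance))
    compute = instances * instance_hourly_price * hours_per_month
    requests_per_month = requests_per_second * 3600 * hours_per_month
    per_request = requests_per_month / 1_000_000 * price_per_million_requests
    return round(compute + per_request, 2)
```

With the placeholder inputs `monthly_cost(50, 20, 0.10, 0.40)`, three instances are needed and the estimate comes to about 271.56 per month; rerunning it with the load levels from your stress tests shows how costs grow with traffic.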

Lessons learned from building microservices – Part 2: Security

In this blog post I will go through some of the things I have learned regarding security when it comes to microservices. It is not a comprehensive guide, and things change constantly, so keep learning and investigating.

The best advice here is to avoid re-inventing the wheel: avoid making your own security solutions. Someone else with more resources and time has already done it before you. Think of the libraries provided by .NET Core or Java; these have been developed and tested for years. Encryption libraries are a good example of this.

Topics in this post are the following:

  • JSON Web Tokens
  • Monitoring, Logging and Audit trailing
  • Identity and access management
  • Encryption
  • Requests and data validations
  • Error handling
  • CORS & CSP & CSRF
  • OWASP (Open Web Application Security Project)
  • Configurations
  • Quality
  • Security Audit
  • Logs
  • Architecture

JSON Web Tokens

Basically, they are compact, secured JSON objects used to transfer data between two or more entities, depending on usage.

https://jwt.io/introduction/

https://cheatsheetseries.owasp.org/cheatsheets/JSON_Web_Token_Cheat_Sheet_for_Java.html

The most common usage is to use them with authentication and authorization.

Another use is when you want a person to take some action, but the action happens at a later moment. This is common during registration, when the person needs to be verified.

At some point during the registration process you need to verify the user, so you can generate a token with the needed metadata and send him/her an email. Later the user clicks a link in the received email containing the token as a parameter. Once the data gets to your application, you can open and validate the token and finish the registration.

There are many other uses; these are just examples. Any time you need to pass or send data over the internet, and that data needs to be secured and is not long-lived, you could consider using a token.

Security things to do with tokens

Summary: Always validate your tokens

  • Set the audience
  • Set the issuer
  • Set an expiration time
    • You don’t want your tokens in the outside world to live forever, meaning that they should not be used after a certain amount of time.
  • Sign the token to detect tampering attempts
    • Notice: In this scenario the token will not hide the data in the payload; it will only verify that the token hasn’t been tampered with. To hide the data, you need to combine it with encryption.
  • Encrypt the token to hide the data in the token
    • Be aware of different encryption methods and when to use them. Generally, symmetric encryption is preferred for data that is not in transit, like data that resides in a database. Asymmetric encryption is good for data that is moving rather than stationary, such as data moving through the internet between parties that do not share a secret key.

Monitoring, Logging and Audit Trailing

From a security perspective I consider logging a very important thing to do. When talking about this category the following things are important to your microservice (or any other application also).

  • Being able to trace activity in your application.
    • Who is operating
    • What is being done
    • How long things take and not just your application requests but also any external resources
    • Errors/warnings and successes
  • Being able to tell if possible attacks happen that aim to cause damage or steal something valuable from you or your clients
  • Being able to tell the health of your solutions
  • Have a monitoring tool that can aggregate and display detailed information about your solutions
  • Being able to create alerts that inform you and your team of possible problems
  • Consider automatic actions to avoid issues if certain things are triggered, like possible attack attempts.
  • Consider having a plan on what to do when things seem to break or go bad based on gathered data, alerts, monitoring tools etc. Having an idea what will happen next will make things easier and help avoid public problems.

For more details on logging check out my previous blog entry on these series: https://lionadi.wordpress.com/2019/12/03/lessons-learned-from-building-microservices-part-1-logging/

Identity and access management

Application users

First things first: as with the earlier example of encryption libraries, I would recommend using a ready-made solution, especially if you plan a cloud-based solution or an app with thousands of users or more.

Consider AWS Cognito or Azure AD B2C.

The reason is that they provide all the security you need, and in some cases more, which takes a huge potential security risk off your shoulders. There are many details to take into consideration if you go the way of manually creating an identity solution with authentication and authorization.

These ready solutions allow you to customize many details of how you use the authentication and authorization tokens in your app. You can add custom attributes, let people create accounts with their social media accounts, and get support for MFA, mobile users etc.

Also, require re-authentication for sensitive features.

Proxy

Does the above mean that you can't create a proxy service with custom logic and logging when users authenticate against Cognito or AD B2C? The answer is no, but consider whether you really need one.

Possible situations where you might need an identity proxy are:

  • You need to verify that the authenticated user is allowed to sign in. The account may not be disabled but might still require a manual step to be performed somewhere first
  • A custom registration flow with custom business logic; for example, a person can't register if his/her data is not in a certain state, or, depending on the state of the data, the registration will look, behave and end differently for different users.
  • Custom security logging; for maximum traceability and analysis. You might want to create custom logging and use proper tools to analyze what each person is doing. Especially things related to registration are critical and it is very common that people forget passwords, don’t know how to reset their password, problems logging in might occur etc. In all these cases logging saves hours if not days of troubleshooting.

Admin users

For admin users there are many good best practices to follow, and I recommend looking over them for your particular needs and technologies. Here are a few links on the matter for AWS and Azure:

https://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html

https://docs.microsoft.com/en-us/azure/security/fundamentals/identity-management-best-practices

Here is a quick list on the top things at the moment:

  • Require MFA
  • Limit the number of Super Admins
  • Enforce a limited session lifetime (Reducing the time a malicious 3rd party can take advantage of an open active session)
  • Enable user notifications for new sign-ons, password resets, account changes etc.
  • Consider limiting access to specific locations/IPs or to certain date and time ranges, and require SSL/TLS
  • Use strong password policies (although some of these may make people “lazy” when changing passwords, so that they pick weak or bad passwords)
    • Lockout
    • Password history
    • Password age
    • Minimum length
  • Create individual user accounts, do not share accounts
  • Grant least privilege
  • Do not share access keys
  • Remove unnecessary credentials
  • Monitor admin activity and send alerts of things that are suspicious

Encryption

Encryption comes in many forms, but there are two main methods: symmetric and asymmetric.

Notice: Strong recommendation: use existing, common libraries for encryption all the way. Do not reinvent the wheel.

Symmetric

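You should prefer symmetric encryption for stationary data, such as data that resides in databases. Make sure each encryption uses a unique, random initialization vector (IV), so that the same plaintext never produces the same ciphertext and an attacker can't guess values by comparing them. Salts belong to hashing rather than encryption: passwords should not be encrypted at all but hashed with a salted, deliberately slow algorithm. A salt is added to the hashing process to force the uniqueness of the hashes, increase their complexity without increasing user requirements, and mitigate password attacks like rainbow tables.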

Probably the most recommended symmetric encryption is AES.
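Here is a minimal sketch of AES encryption for stored data, assuming AES-256 in GCM mode from the standard `javax.crypto` API. GCM both encrypts and authenticates, so a modified value fails to decrypt. The class name `AesGcm` and the choice to store the IV in front of the ciphertext are assumptions for this example; where the key itself is kept (for example a key vault) is out of scope here.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical helper: AES-256-GCM with a fresh random IV for every encryption.
final class AesGcm {
    private static final int IV_BYTES = 12;   // 96-bit IV, the usual size for GCM
    private static final int TAG_BITS = 128;  // authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    static SecretKey newKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES not available", e);
        }
    }

    // Returns IV followed by ciphertext and tag, so the IV is stored with the data.
    static byte[] encrypt(SecretKey key, String plaintext) {
        byte[] iv = new byte[IV_BYTES];
        RANDOM.nextBytes(iv); // never reuse an IV with the same key
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ciphertext = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.allocate(IV_BYTES + ciphertext.length).put(iv).put(ciphertext).array();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Encryption failed", e);
        }
    }

    // Throws if the stored value was modified, because GCM authenticates the data.
    static String decrypt(SecretKey key, byte[] stored) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, stored, 0, IV_BYTES));
            byte[] plaintext = cipher.doFinal(stored, IV_BYTES, stored.length - IV_BYTES);
            return new String(plaintext, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Decryption failed or data was tampered with", e);
        }
    }
}
```

Encrypting the same text twice gives two different byte arrays, which is exactly what the random IV is for.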

More info: https://cheatsheetseries.owasp.org/cheatsheets/Password_Storage_Cheat_Sheet.html
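For passwords, the salt belongs in a slow hash rather than in encryption. Here is a sketch using PBKDF2-HMAC-SHA256 from the JDK, assuming a random 16-byte salt per password that is stored alongside the hash. `PasswordHasher` and the `iterations:salt:hash` storage format are hypothetical. The iteration count follows my reading of the OWASP cheat sheet linked above at the time of writing, so check its current recommendation; the cheat sheet also lists Argon2id, scrypt and bcrypt, and PBKDF2 is used here only because it ships with the JDK.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

// Hypothetical helper: salted, deliberately slow password hashing with PBKDF2-HMAC-SHA256.
final class PasswordHasher {
    private static final int ITERATIONS = 600_000; // re-check against the current OWASP guidance
    private static final int SALT_BYTES = 16;
    private static final int HASH_BITS = 256;
    private static final SecureRandom RANDOM = new SecureRandom();

    private static byte[] pbkdf2(char[] password, byte[] salt, int iterations) {
        PBEKeySpec spec = new PBEKeySpec(password, salt, iterations, HASH_BITS);
        try {
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256").generateSecret(spec).getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("PBKDF2 not available", e);
        } finally {
            spec.clearPassword(); // wipe the spec's internal copy of the password
        }
    }

    // A new random salt per password makes identical passwords hash differently.
    static String hash(char[] password) {
        byte[] salt = new byte[SALT_BYTES];
        RANDOM.nextBytes(salt);
        Base64.Encoder b64 = Base64.getEncoder();
        return ITERATIONS + ":" + b64.encodeToString(salt) + ":"
                + b64.encodeToString(pbkdf2(password, salt, ITERATIONS));
    }

    // Recomputes the hash with the stored salt and iteration count.
    static boolean verify(char[] password, String stored) {
        String[] parts = stored.split(":");
        byte[] salt = Base64.getDecoder().decode(parts[1]);
        byte[] expected = Base64.getDecoder().decode(parts[2]);
        byte[] actual = pbkdf2(password, salt, Integer.parseInt(parts[0]));
        return MessageDigest.isEqual(expected, actual); // constant-time comparison
    }
}
```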

Asymmetric

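Asymmetric encryption is best suited for data in transit, data that is moving from one place to another over the web, and in my opinion also for data that leaves your secure environment for another location. One of the most common asymmetric algorithms is RSA, which is used very broadly, for example in the HTTPS (TLS) handshake when you view a site. In practice, asymmetric encryption is mostly used to exchange or protect keys, and the bulk of the data is then encrypted symmetrically, because asymmetric algorithms are slow and can only encrypt small payloads.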
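Here is a minimal sketch of asymmetric encryption with RSA and OAEP padding, using the standard `java.security` and `javax.crypto` APIs. It also shows why RSA usually protects keys rather than bulk data: with a 2048-bit key and OAEP with SHA-256, the plaintext can be at most 190 bytes. `RsaOaep` is a hypothetical helper name.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import javax.crypto.Cipher;

// Hypothetical helper: RSA-2048 with OAEP padding (never use RSA without padding).
final class RsaOaep {
    private static final String TRANSFORMATION = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";

    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("RSA not available", e);
        }
    }

    // Anyone holding the public key can encrypt (at most 190 bytes with this key size and padding)...
    static byte[] encrypt(PublicKey publicKey, String message) {
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(Cipher.ENCRYPT_MODE, publicKey);
            return cipher.doFinal(message.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Encryption failed", e);
        }
    }

    // ...but only the holder of the private key can decrypt.
    static String decrypt(PrivateKey privateKey, byte[] ciphertext) {
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(Cipher.DECRYPT_MODE, privateKey);
            return new String(cipher.doFinal(ciphertext), StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Decryption failed", e);
        }
    }
}
```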

Requests and data validations

Generally, authentication and authorization are based on user sign-in and user roles. While this is a good option, it has drawbacks, which I will discuss later.

So here I will discuss a way of achieving security through individual permissions for each action a person tries to perform. The proper name for this is Permission Based Access Control.

Notice: Still take your project's needs into consideration. As always, some ways of doing things may be suited for vastly different purposes. I would recommend Permission Based Access Control if you have many users, in the thousands or many times more, or if your users' permissions need to change dynamically.

Other options are:

  • Role-Based Access Control (RBAC)
  • Discretionary Access Control (DAC)
  • Mandatory Access Control (MAC)

The main focus is on individual permissions that are defined in a policy/access map. These maps can then be assigned on the fly, if necessary, to users and/or groups. If you were to choose role-based access control instead, you would likely narrow down very much what a person can do in a system, and as your application grows it may become very rigid: everyone under a role must comply with that role exactly in the places where you apply it.

This will most likely force you to apply multiple roles to a location or user, which adds security options/accesses beyond what a particular functionality requires. You may be opening up your application to security vulnerabilities by granting too much access.

So I would suggest starting from the idea of individual access management based on permissions. The data in your system(s) should be able to tell you which permission/access maps a user can use.

In this situation I am talking about the authorization of a user. When a request happens, the code checks the user's data and determines which permissions he/she has. Your permission/access maps are usually created manually and shared in your system in a secure manner, so that they can only be read, and not modified, by any entity that reads them.

This permission/access map can be used in two very important situations; that is to control what a person can do and what a person can see, so below are our two main requirements:

  1. Test whether the person can use the requested functionality
  2. If the first step has passed: test what the user can see. A person may see only parts of the data or none at all

Notice: Steps 1 and 2 above are not the same thing. Step 1 is usually done at the controller level, where your request starts to be processed. Step 2 is something you should be doing at the data level, for example in a service that operates on a data source.

Steps for security validations:

  • Find out whether the user can access the system
  • Based on the authentication find out who the user is
  • Gather user related data to generate a permission/access map
    • User ID(s)
    • Permission/access map(s); this should be determined based on persons data in the system. A person can have multiple access maps.
    • For each role, have a list of read/create/write/delete for each “category” of importance/bounded context/models etc. This depends on your application and the size of the application and what you are trying to achieve.
  • Use this permission/access map to determine which requests the person can access and which data he/she can see

Taking this approach you are able to:

  • be as loose as you want
  • as rigid as you want
  • exactly where you want.
{
  "AccessCategory": {
    "USER": [
      "READ",
      "UPDATE"
    ],
    "ORDERS": [
      "READ",
      "UPDATE",
      "CREATE"
    ]
  },
  "id": "DEFAULT"
}

In the controller, check which operations you want the person to be able to perform. Have a generated “access/permission map” that knows what the person can do based on their data and state, and generate it frequently, preferably on each request.

API Request check example at controller level:

hasUserRights(EnumSet.of(AccessCategory.USER, AccessCategory.ORDERS), EnumSet.of(Permission.READ, Permission.UPDATE));

The above function will go through the access/permission map defined above in the JSON data and see if the requested categories have the requested permissions.
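The post doesn't show `hasUserRights` itself, so here is one possible implementation, assuming the access/permission map from the JSON above has been loaded into an `EnumMap` of categories to permission sets. The `AccessMap` class, its `grant` method and the `DELETE` permission are my own assumptions, not the author's code. A request passes only if every requested category contains every requested permission.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Categories and permissions mirroring the JSON access map (DELETE is an assumed extra).
enum AccessCategory { USER, ORDERS }
enum Permission { READ, CREATE, UPDATE, DELETE }

// Hypothetical in-memory form of one or more merged access maps for a single user.
final class AccessMap {
    private final Map<AccessCategory, EnumSet<Permission>> grants = new EnumMap<>(AccessCategory.class);

    // Merging grants lets a person hold several access maps at once.
    AccessMap grant(AccessCategory category, Set<Permission> permissions) {
        grants.computeIfAbsent(category, c -> EnumSet.noneOf(Permission.class)).addAll(permissions);
        return this;
    }

    // True only if every requested category grants every requested permission.
    boolean hasUserRights(Set<AccessCategory> categories, Set<Permission> permissions) {
        if (categories.isEmpty()) {
            return false; // an empty check is denied rather than vacuously allowed
        }
        for (AccessCategory category : categories) {
            EnumSet<Permission> granted = grants.get(category);
            if (granted == null || !granted.containsAll(permissions)) {
                return false;
            }
        }
        return true;
    }
}
```

With the DEFAULT map above, a check for READ and UPDATE on USER and ORDERS passes, while a check for CREATE on USER fails.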

Data request: does the person have the right to view all of the requested data? If only part of it, show only that part, or nothing.

So when you read the access/permission map you need to associate that map to the data that the users can view. This connection can be done inside the code based on the access categories.

Then, when a user requests data, you need internal business logic that determines whether the user can view the requested data.

Your access/permission map by itself can’t tell your code how the code should behave, you have to associate the business logic by which to filter out data or deny data access.

I would recommend having a user access service that is responsible for generating the permissions and performing the main logic for the security checks. This way you can ask the service to generate a user-specific access service for any user just by providing a user id, and then use it to make the security checks.
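Here is a sketch of such a user access service, assuming a lookup function that returns a user's access maps from your own data store. It covers both steps described earlier: `can` is the controller-level check and `visibleOrders` is a data-level filter. All names (`UserAccessService`, `UserAccess`, `Order`) and the rule that users only see their own orders are hypothetical business logic for illustration.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

enum Category { USER, ORDERS }
enum Perm { READ, CREATE, UPDATE, DELETE }

// Hypothetical domain object.
final class Order {
    final String id;
    final String ownerId;
    Order(String id, String ownerId) { this.id = id; this.ownerId = ownerId; }
}

// Access object for one user, generated per request.
final class UserAccess {
    private final String userId;
    private final Map<Category, EnumSet<Perm>> grants;

    UserAccess(String userId, Map<Category, EnumSet<Perm>> grants) {
        this.userId = userId;
        this.grants = grants;
    }

    // Step 1, controller level: may the user use this functionality at all?
    boolean can(Category category, Perm permission) {
        EnumSet<Perm> granted = grants.get(category);
        return granted != null && granted.contains(permission);
    }

    // Step 2, data level: which orders may the user see?
    // Assumed business rule: only the user's own orders.
    List<Order> visibleOrders(List<Order> orders) {
        if (!can(Category.ORDERS, Perm.READ)) {
            return List.of();
        }
        return orders.stream()
                .filter(order -> order.ownerId.equals(userId))
                .collect(Collectors.toList());
    }
}

// Builds a UserAccess from the user's current data; call it on every request.
final class UserAccessService {
    private final Function<String, Map<Category, EnumSet<Perm>>> accessMapsForUser;

    UserAccessService(Function<String, Map<Category, EnumSet<Perm>>> accessMapsForUser) {
        this.accessMapsForUser = accessMapsForUser;
    }

    UserAccess forUser(String userId) {
        Map<Category, EnumSet<Perm>> copy = new EnumMap<>(Category.class);
        copy.putAll(accessMapsForUser.apply(userId)); // copy the lookup result into our own map
        return new UserAccess(userId, copy);
    }
}
```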

A good example on this would be AWS access permissions and policies:

https://docs.aws.amazon.com/IAM/latest/UserGuide/access_controlling.html

Or Azure:

https://docs.microsoft.com/en-us/azure/governance/policy/tutorials/create-custom-policy-definition

Error handling

It is important that you do not “leak” or give away any exceptions to the outside world.

I would recommend that you have a global way of catching all of your exceptions and replacing the response of your application with a client friendly message that tells the client what possibly went wrong but does not give away sensitive information that can be used against your application or their users. Always return a generic error: https://cipher.com/blog/a-complete-guide-to-the-phases-of-penetration-testing/

Remember to log your error properly.

Also, for your responses to the outside world, consider putting your client-friendly error message in the response body as custom JSON with data that might help your client app respond properly to the end user. This might be:

  • An error id
  • Possible translation error id for fetching the appropriate error message
  • Error source ID, like a database, 3rd party API, CRM etc., but be careful not to give away this info too carelessly. Think how this info can be used against you.
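Here is a minimal sketch of the global handling idea in plain Java: the full exception is logged server-side together with a generated error id, while the client only gets the id and a translation key. In Spring Boot this kind of logic would typically live in a `@ControllerAdvice` class; the `GlobalErrorHandler` and `ErrorResponse` names, the status mapping and the translation keys here are assumptions. The error source id is left out on purpose, following the caution above.

```java
import java.util.UUID;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical client-facing error body: no stack trace, no exception message.
final class ErrorResponse {
    final int status;
    final String errorId;        // lets support find the matching server-side log entry
    final String translationKey; // lets the client show a localized, friendly message

    ErrorResponse(int status, String errorId, String translationKey) {
        this.status = status;
        this.errorId = errorId;
        this.translationKey = translationKey;
    }

    String toJson() {
        return "{\"errorId\":\"" + errorId + "\",\"translationKey\":\"" + translationKey + "\"}";
    }
}

// Hypothetical single place that turns any exception into a safe response.
final class GlobalErrorHandler {
    private static final Logger LOG = Logger.getLogger(GlobalErrorHandler.class.getName());

    static ErrorResponse handle(Throwable error) {
        String errorId = UUID.randomUUID().toString();
        // Full details go to the server-side log only, tagged with the error id.
        LOG.log(Level.SEVERE, "Request failed, errorId=" + errorId, error);
        if (error instanceof IllegalArgumentException) {
            return new ErrorResponse(400, errorId, "error.request.invalid");
        }
        return new ErrorResponse(500, errorId, "error.generic"); // everything else looks the same
    }
}
```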

For any error response, think carefully about how the things you send to the outside world might be used against you.

This is especially true for authentication, authorization and registration. Depending on what you are doing, you need to mask as much as possible in your responses when something goes wrong, even to the point of sending a 200 HTTP status code in error situations. https://cheatsheetseries.owasp.org/cheatsheets/Authentication_Cheat_Sheet.html#authentication-and-error-messages

CORS & CSP & CSRF

The following security measures are a combination of procedures and steps you need to take on both the client and the server side. I won't go into the details of implementing them or into in-depth knowledge about them. There are many ways to implement these measures in your preferred technology stack. The end result should be the same, but how you get there depends on your technology choices: in Azure App Service, enabling CORS can be as simple as pressing a button, but for a microservice running in Kubernetes things change drastically. Just be aware of these measures and seek out examples of how to implement them.

Important: Don't pick just one of them; use them in combination for maximum security.

Cross-Origin Resource Sharing (CORS): With this security measure you can specify who can communicate with your server, which HTTP methods are OK and which headers are allowed. It applies to requests that originate from a different origin (domain, protocol, or port) than the server's own.

This has to be implemented in your server configuration and/or code. For requests that are not "simple", the browser first sends a preflight OPTIONS request to the server describing what it wants to do and where the request comes from. The server then answers with what it allows, and the browser either continues or stops the request there.

https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS
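To show what the server side of a preflight check decides, here is a framework-free sketch, assuming an allowlist of origins, methods and lower-case header names. `CorsPolicy` is a hypothetical class; in practice your web framework, API gateway or cloud service usually provides this as configuration. The header names are the standard CORS response headers.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical allowlist-based answer to a CORS preflight (OPTIONS) request.
final class CorsPolicy {
    private final Set<String> allowedOrigins;
    private final Set<String> allowedMethods;
    private final Set<String> allowedHeaders; // stored in lower case

    CorsPolicy(Set<String> allowedOrigins, Set<String> allowedMethods, Set<String> allowedHeaders) {
        this.allowedOrigins = allowedOrigins;
        this.allowedMethods = allowedMethods;
        this.allowedHeaders = allowedHeaders;
    }

    // Returns the headers to send back, or an empty map when the request is not allowed.
    // The browser enforces the result; non-browser clients are not restricted by CORS.
    Map<String, String> preflight(String origin, String requestedMethod, Set<String> requestedHeaders) {
        Map<String, String> response = new LinkedHashMap<>();
        if (origin == null || requestedMethod == null
                || !allowedOrigins.contains(origin) || !allowedMethods.contains(requestedMethod)) {
            return response;
        }
        for (String header : requestedHeaders) {
            if (!allowedHeaders.contains(header.toLowerCase(Locale.ROOT))) {
                return response; // header names are case-insensitive
            }
        }
        response.put("Access-Control-Allow-Origin", origin); // echo the matched origin, not "*"
        response.put("Access-Control-Allow-Methods", String.join(", ", allowedMethods));
        response.put("Access-Control-Allow-Headers", String.join(", ", allowedHeaders));
        response.put("Vary", "Origin"); // caches must not reuse this answer for other origins
        return response;
    }
}
```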

Content Security Policy (CSP):

With this measure you define which resources your client applications may use and from which sources. This includes fonts, media, images, JavaScript, objects etc.

Notice that for dynamic scripts/content you need a nonce value for that content. The nonce needs to be generated on the server, with a new value each time the web application is loaded. If you assign a static nonce value, you leave an attack opportunity in your application to execute things you do not intend.

https://developer.mozilla.org/en-US/docs/Web/HTTP/CSP

https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Security-Policy

https://cheatsheetseries.owasp.org/cheatsheets/Content_Security_Policy_Cheat_Sheet.html
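Here is a minimal sketch of generating a fresh nonce per response and building the `Content-Security-Policy` header value from it. The directives chosen here (`default-src 'self'`, `object-src 'none'`) and the `CspNonce` class are assumptions; tailor the policy to your own application.

```java
import java.security.SecureRandom;
import java.util.Base64;

// Hypothetical helper: one fresh nonce per HTTP response, never a constant.
final class CspNonce {
    private static final SecureRandom RANDOM = new SecureRandom();

    static String newNonce() {
        byte[] bytes = new byte[16]; // 128 bits of randomness
        RANDOM.nextBytes(bytes);
        return Base64.getEncoder().encodeToString(bytes);
    }

    // Value for the Content-Security-Policy response header. The same nonce must be
    // written into each allowed tag of this response, e.g. <script nonce="...">.
    static String headerValue(String nonce) {
        return "default-src 'self'; script-src 'self' 'nonce-" + nonce + "'; object-src 'none'";
    }
}
```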

Cross-Site Request Forgery (CSRF):

By this I mean unwanted actions performed through your users' browsers on their behalf, typically by another site abusing an authenticated session. For more details I recommend the OWASP sources:

https://cheatsheetseries.owasp.org/cheatsheets/Cross-Site_Request_Forgery_Prevention_Cheat_Sheet.html

https://cheatsheetseries.owasp.org/cheatsheets/Cross_Site_Scripting_Prevention_Cheat_Sheet.html

OWASP (Open Web Application Security Project)

OWASP is a great resource for information related to web application security. If you want to know more, I very strongly recommend looking at their material. Here is some of it, which I consider a must to know, or at least to have an idea of and come back to.

A good thorough cheat sheet: https://cheatsheetseries.owasp.org/

Top Ten security issues project: https://owasp.org/www-project-top-ten/

Top API security issues: https://github.com/OWASP/API-Security

Top Serverless application issues: https://github.com/OWASP/Serverless-Top-10-Project

A tool to check for security holes and vulnerabilities within your 3rd party libraries and dependencies:

https://owasp.org/www-project-dependency-check/

https://jeremylong.github.io/DependencyCheck/

Configurations

For configurations, the most important thing is one that I think all developers have done at least once by accident: pushing production credentials into git. So avoid this :).

But other than that here are a few tips:

  • Only add configurations in your configuration files for local development
  • For any other environment, keep the values in your environment-specific configuration file empty. What I mean is that your configuration keys are there but their values are empty. You want this to make sure that once your test, qa or prod configuration files are loaded, the values stay empty unless an outside source sets them, which is the next step.
  • In your non-local environments, load your application with the desired environment configuration, like test, qa or prod, and fill in the empty values from a secure secrets store. For example, use Secrets in Kubernetes, Key Vault in Azure, or Secrets Manager or the Key Management Service in AWS.
  • At this point you should also have a piece of code that can determine whether a configuration key is not set, and thus empty. If so, throw an exception and stop the application. This usually happens during application start-up. I have a post with sample code for this: https://lionadi.wordpress.com/2019/10/01/spring-boot-bean-management-and-speeding-development
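A framework-free sketch of the fail-fast check from the last bullet, assuming the configuration has already been loaded into a `Map` (the linked post shows a Spring Boot version). `RequiredConfig` and the key names in the test are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical start-up guard: stop the application if any required value is missing or empty.
final class RequiredConfig {
    static void validate(Map<String, String> config, List<String> requiredKeys) {
        List<String> missing = requiredKeys.stream()
                .filter(key -> {
                    String value = config.get(key);
                    return value == null || value.isBlank();
                })
                .collect(Collectors.toList());
        if (!missing.isEmpty()) {
            // Report key names only, never values, so secrets don't end up in logs.
            throw new IllegalStateException("Missing configuration values for keys: " + missing);
        }
    }
}
```

Calling this once at start-up, before any component uses the configuration, means a prod deployment with an unset secret fails immediately instead of failing later on the first request.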

The steps here will improve both your security and quality of your code which I think go hand in hand.

Quality

Quality is important for security because if you take the time and interest to create good code that can live for years, it is likely that your code will also be secure, or at least more secure.

Simple things like good coding practices, common tools and shared ways of doing things within your team can reduce the number of errors, and with them the number of security problems.

Here are a few tools that can help improve your code quality and workflows:

https://www.sonarqube.org/

https://www.sonarlint.org/

https://www.sonatype.com/product-nexus-repository

I will write more about quality in my next post in this series and link it here.

Security Audit

Lastly, have someone do a security audit on your application and, if possible, the entire ecosystem. Have them try to hack into your application and your ecosystem, and have them create a threat analysis with you.

If you can't afford someone, think about learning the basics yourself. This will also improve your code quality and the things you automatically take into consideration when you work on your code.

Logs

The important thing is that you have logs about your system that reveal possible security problems or threats. I went into logging details in the previous part.

Architecture

When designing your Microservices architecture be aware of every detail and entity within your design.

  • Be aware of the traffic between your containers.
  • Be aware of how data is encrypted between your containers.
  • Be aware of access to your resources within your architecture. Can resource X access resources Y? Are the given privileges too much? etc.
  • Only open ports and routes to your resources that are truly needed.
  • Don’t store sensitive information in places that are not secure, prefer ready made products like KeyVault in Azure or Key Management Service in AWS.
  • Use system identities between resources in Cloud environments, they are more secure than manually handled security accounts.
  • Prefer ready made solutions than creating/reinventing the wheel, if possible. Usually a good popular product has a large team and resources to keep things secure and up to date.
  • Have audit trails of what happens in your architecture: who does what, how, etc.
  • Give the least amount of privileges to people within your architecture, only what is needed for that person or group of people to do their job.
  • Set expiration dates to secrets and privileges to resources, where applicable.

Lessons learned from building microservices – Part 1: Logging

This is part of a series of posts discussing things I learned while working with microservices. The things I write here are not absolute truths; they were the best solutions for my team and me at the time we used these methods. You might choose to do things differently, and I highly recommend finding out for yourself the best practices and approaches that work for you and your project.

I also assume that you have a wide range of pre-existing knowledge on building microservices, API, programming languages, programming, cloud providers etc.

I recommend looking at the OWASP cheat sheet to get even a more in depth view: https://cheatsheetseries.owasp.org/cheatsheets/Logging_Cheat_Sheet.html

UPDATE – 17.3.2020: I’ve improved this post based on the OWASP logging cheat sheet

Notice: In the examples below I will omit “boilerplate” code to save space.

Introduction

Logging usually means the records created by a piece of software: your operating system, a web application, a mobile app, a load balancer, databases, mail servers and so on. Logs are created by many different types of sources.

Their importance comes from their ability to let you understand what is happening in your system or application. They should show you that everything is alright, and if it is not, you should be able to determine what is wrong.

Base requirements for logging

General requirements for logging are:

  • Identifying security incidents
  • Providing information about problems and unusual conditions
  • Business process monitoring
  • Audit trails
  • Performance monitoring

Which events to log:

  • Input validation failures
  • Output validation failures
  • Authentication successes and failures
  • Authorization (access control) failures
  • Session management failures
  • Application errors and system events
  • Application and related systems start-ups and shut-downs, and logging initialization
  • Use of higher-risk functionality (user management, critical system changes etc)

Things to exclude:

  • Application source code
  • Access tokens
  • Sensitive personal data and some forms of personally identifiable information
  • Authentication passwords
  • Database connection strings
  • Encryption keys and other master secrets
  • Bank account or payment card holder data
  • Data of a higher security classification than the logging system is allowed to store
  • Information a user has opted out of collection, or not consented

In some of these cases, instead of excluding the data completely, you can obscure, remove or alter the sensitive parts to keep partial benefits without exposing all of the sensitive data to malicious people:

  • File paths
  • Database connection strings
  • Internal network names and addresses
  • Non sensitive personal data

Still, be very careful with this information, especially with user related data.

As important as it is to log and have a good view of what is happening in your system and application, it is also a fine art to understand when not to log things.

Too much logging makes it hard to find the relevant, critical information you need. With too little logging, you risk not being able to understand your problem properly.

So, there is a fine balance between logging too much or too little.

A possible solution is to log more verbosely during development, while in production your application logs only what the developers have determined to be important, so that someone can troubleshoot a problem in production without having too much or too little logging. This is also a process that needs refactoring during the lifetime of the application.

This leads us to a requirement of logs: logs should be structured and easily indexed, filtered and searched.

Logging audience

When you are logging, I recommend considering who you are logging for.

You need to ask yourself: Why add logging to an application?

Someone someday will read that log entry and to that person the log entry should make sense and help that person. So, when you log things, think of your audience and the following things:

  • What is the content of the message
  • Context of the message
  • Category
  • Log Level

All of these can be quite different depending on who is looking at your logs. As a developer, you can easily understand quite complex logs, but a non-developer most likely would not be able to make much sense of complex log entries. So, adapt your language to the intended target audience; you can even dedicate separate categories for this.

Also, think if the log entries can be visualized. For example, metrics logs should have categories, dates and numbers which can be translated into charts that show how long things last or succeed.

Write meaningful log messages

When writing log entries, avoid writing them in a way that requires in-depth knowledge of the application internals or code logic, even if you, or whoever will read the logs, are a developer.

There are a few reasons to write log messages that do not depend on knowing the application code or the technicalities behind your application:

  • The log messages will most likely be read by someone who is not a technical person, and even if they are technical, you may need to prove something about your application to a non-technical person.
  • Even if you are the only developer working on your application, will you remember all your logic and the meaning of your log entries a year or two from now? If you must go to your code to check what the heck a log entry means, then it was not meaningful enough. Yes, you will have to go back to the code anyway if you have problems, but if you have to do this frequently, you definitely need to refactor your logging logic and the log content in your application.
  • If you have multiple developers and they analyze a problem, they may not understand what is going on, because they were not a part of the initial solution and have no understanding of what a log entry relates to. They must find out what is going on from the code.

Logging is about the four Ws:

  • When
  • Where
  • Who
  • What

Add context to your log messages

By context I mean that your log message should tell what is going on by giving all the details needed to understand what is happening.

So, this is not OK:

“An order was placed”

If you were to read that one, you would ask: “What order? Who placed the order? When did this happen?”

A much more detailed and helpful log message would be:

“Order 234123-A175 was placed by user 9849 at 29.3.2019 13:39”

This message will allow someone to get that order from a system, look at what was ordered and by whom and at what time.
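One way to keep such messages both readable and searchable is to log the readable message plus the same details as separate structured fields, one JSON object per line. This hand-rolled sketch only shows the shape: `StructuredLog` is hypothetical, the timestamp is assumed to be UTC, and in a real project a logging library with a JSON encoder would do this for you.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: one JSON object per log line, so each field can be indexed and searched.
final class StructuredLog {
    static String entry(String level, String message, Map<String, Object> context, Instant createdAt) {
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("timestamp", createdAt.toString()); // ISO-8601 in UTC
        fields.put("level", level);
        fields.put("message", message);                // still readable on its own
        fields.putAll(context);                        // the same details as searchable fields
        StringBuilder json = new StringBuilder("{");
        for (Map.Entry<String, Object> field : fields.entrySet()) {
            if (json.length() > 1) {
                json.append(',');
            }
            json.append('"').append(escape(field.getKey())).append("\":");
            Object value = field.getValue();
            if (value instanceof Number || value instanceof Boolean) {
                json.append(value);
            } else {
                json.append('"').append(escape(String.valueOf(value))).append('"');
            }
        }
        return json.append('}').toString();
    }

    // Minimal escaping of quotes, backslashes and line breaks, so input cannot forge extra log lines.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n").replace("\r", "\\r");
    }
}
```

For the order example, with `orderId` and `userId` as context and a UTC timestamp, this produces `{"timestamp":"2019-03-29T13:39:00Z","level":"INFO","message":"Order 234123-A175 was placed by user 9849","orderId":"234123-A175","userId":"9849"}`.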

Log at the proper level

When you create a log entry your log entry should have an associated level of severity and importance. The common levels that are used are the following:

  • TRACE: The most verbose and finest-grained log level. It produces A LOT of log entries and is used to track very difficult problems. Never use it in production; if you have to use it in production, you have a design problem in your application.
  • DEBUG: This is mostly used for debugging purposes during development. At this level you want to log additional and extra information about the workings of your application that help you track down problems. This could be enabled in production if necessary, but only temporarily and to troubleshoot an issue.
  • INFO: Actions that are user-driven or system specific like scheduled operations.
  • NOTICE: Notable events that are not considered an error.
  • WARN: Events that could potentially become an error or might pose a security risk.
  • ERROR: Error conditions that might still allow the application to continue running.
  • FATAL: This should not happen a lot in your application but if it does it usually terminates your program and you need to know why.

Service instances

In a microservice architecture the most important thing is to be able to see what each microservice instance is doing. This means in the case of kubernetes each pod, or each container with docker etc.

So if you have a service named Customer and you have three instances of this service you would want to know what each service is doing when logging. So here is a check list of things to consider:

  • You need to know what each service instance is doing, because each instance will process logic and each instance will have its own output based on what it is doing or requested to do
  • Each log entry should identify which service instance wrote it by providing a unique service instance id
  • Each log entry should identify which application version the service instance is using
  • Each log entry should tell in which environment the service instance is operating in, example: development, test, qa, prod
  • If possible each log entry should tell where the service instance is like IP address or host-name
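Here is a sketch of collecting the instance details from the checklist once at start-up, assuming they are provided through environment variables. The variable names `APP_VERSION` and `APP_ENV` are assumptions, as is relying on `HOSTNAME`, which in many container setups holds the container or pod name; `InstanceInfo` is a hypothetical class.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical: per-instance fields, collected once at start-up and added to every log entry.
final class InstanceInfo {
    final String instanceId;
    final String appVersion;
    final String environment;
    final String host;

    // Pass System.getenv() in real code; the variable names used here are assumptions.
    InstanceInfo(Map<String, String> env) {
        this.host = env.getOrDefault("HOSTNAME", "unknown-host");
        // A random suffix distinguishes restarts of a container that keeps the same host name.
        this.instanceId = host + "-" + UUID.randomUUID().toString().substring(0, 8);
        this.appVersion = env.getOrDefault("APP_VERSION", "unknown");
        this.environment = env.getOrDefault("APP_ENV", "unknown");
    }

    Map<String, String> asLogFields() {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("instanceId", instanceId);
        fields.put("appVersion", appVersion);
        fields.put("environment", environment);
        fields.put("host", host);
        return fields;
    }
}
```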

Monitoring

First, I would recommend having an understanding of where your logs will end up and how you are going to analyze them.

The simplest form would be a log file where you would push your log entries and then using a common text editor or development editor to look at the entries. This works fine if your application is very small or you are dealing with a script. The log entries amount will likely be small, and they won’t be stored for a long period of time.

But if you know your application or system will produce thousands, hundreds of thousands or even millions of log entries each day, and you need to store them for a longer period of time, then you need a good monitoring tool that can read log entries robustly. You also need a good place to store your log entries.

What you would need normally is something that would:

  • Receive and process a log entry, then transform it and send it to a data store
  • At the data store you would need a tool that will index the data.
  • Then you would need to be able to search and analyze your indexed log entries

A very common tech stack for storing and analyzing log entries is Elasticsearch, Logstash and Kibana. You would use Logstash to process a log entry, transform it and send it to a data store like Elasticsearch, where the data is indexed, searched and analyzed. Finally, you would use Kibana, a UI on top of Elasticsearch, to do the searching and analysis of logs visually.

Log types

Next I’ll cover the different logging types you might need and that will make your life easier.

General logging details

Before we cover the different types of logs you might need, we need some common data with each log entry. This data will help us in different ways depending on the solution you are making. In my example these data are related to an API backend, but you might find them useful in other types of solutions too.

So consider adding these logging fields to other logs as metadata.

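import java.time.Instant;

// Common metadata attached to every log entry (constructors and getters omitted)
public class LogData {

    private String requestId;      // groups all entries of one API request
    private String userId;         // the user who made the request, if known
    private String environmentId;  // e.g. DEV, TEST, PROD
    private String appName;
    private String appVersion;
    private Instant createdAt;     // preferably UTC

}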
Field | Sample | Description
requestId | 6f88dcd0-f628-44f1-850e-962a4ba086e3 | A value that represents a single request to your API. Apply it to all log entries of that request so that you can group them. Should be unique.
userId | 9ff4016d-d4e6-429f-bca8-6503b9d629e1 | Same as the request id, but identifies the user, if any, who made the API request. Should be unique.
environmentId | DEV, TEST, PROD | Tells the person looking at a log entry which environment it came from. This is important when all log entries are pushed into one location and not physically separated.
appName | Your Cool API | Same as the environment id, but for the app name.
appVersion | 2.1.7 | Same as the environment id, but for the app version.
createdAt | 02/08/2019 12:37:59 | When the log entry was created. This helps a lot when tracking the progress of the application logic across environments while troubleshooting. Preferably in UTC.

As you can see, these baseline details give us a pretty good view of where things are happening, who is doing them and when. I can’t stress enough how important these details are!
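One way to make sure every log entry in a request carries the same requestId and userId is to keep them in a request-scoped context. The sketch below assumes a thread-per-request server and uses a plain ThreadLocal; the class RequestContext, its methods and the key=value line format are hypothetical names chosen for illustration. Logging libraries such as SLF4J offer the same idea through their MDC (Mapped Diagnostic Context) class.

```java
import java.time.Instant;
import java.util.UUID;

// Hypothetical request-scoped holder for the shared log metadata.
// Assumes one thread handles one request from start to finish.
final class RequestContext {
    private static final ThreadLocal<String> REQUEST_ID = new ThreadLocal<>();
    private static final ThreadLocal<String> USER_ID = new ThreadLocal<>();

    private RequestContext() {}

    // Call at the start of a request; returns the new, unique request id.
    static String begin(String userId) {
        String requestId = UUID.randomUUID().toString();
        REQUEST_ID.set(requestId);
        USER_ID.set(userId);
        return requestId;
    }

    static String requestId() { return REQUEST_ID.get(); }

    static String userId() { return USER_ID.get(); }

    // Call when the request ends so pooled threads do not carry old values.
    static void end() {
        REQUEST_ID.remove();
        USER_ID.remove();
    }

    // Builds a log line prefixed with the baseline metadata.
    // Environment, app name and version would normally come from configuration.
    static String format(String environmentId, String appName, String appVersion, String message) {
        return String.format("createdAt=%s requestId=%s userId=%s env=%s app=%s version=%s msg=%s",
                Instant.now(), requestId(), userId(), environmentId, appName, appVersion, message);
    }
}
```

In a real service you would call begin() in a request filter and end() in a finally block, so the context is always cleared even when the request fails.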

General log entry

This is simply the baseline log entry with an added message field, and perhaps a title field. That’s it.

This is what you would need at a bare minimum to find out what is going on.
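As a sketch, a general log entry can reuse the baseline metadata by composition rather than copying its fields, with the message required and the title optional. The records LogMetadata and GeneralLogEntry are hypothetical names; LogMetadata simply mirrors the LogData fields above.

```java
import java.time.Instant;
import java.util.Objects;

// Hypothetical baseline metadata, mirroring the LogData fields above.
record LogMetadata(String requestId, String userId, String environmentId,
                   String appName, String appVersion, Instant createdAt) {}

// A general log entry: the baseline metadata plus a message and an optional title.
record GeneralLogEntry(LogMetadata metadata, String title, String message) {
    GeneralLogEntry {
        Objects.requireNonNull(metadata, "metadata is required");
        Objects.requireNonNull(message, "message is required");
        // The title is optional; store an empty string instead of null.
        title = (title == null) ? "" : title;
    }
}
```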

Access log

Access logs are a great way to keep track of your API requests and the responses sent back to clients. They give the server a record of every request it has processed. I won’t go deeper into them; there are plenty of detailed descriptions available, which I recommend going through. Here are a couple:

https://httpd.apache.org/docs/2.4/logs.html#accesslog

https://en.wikipedia.org/wiki/Server_log

Here is some sample code:

public class AccessLog {
    private String clientIP;     // IP address of the calling client
    private String userId;       // same user id as in LogData
    private String timestamp;    // when the request occurred
    private String method;       // HTTP method: GET, POST, ...
    private String requestURL;   // full URL of the request
    private String protocol;     // e.g. HTTP/1.1
    private int statusCode;      // HTTP status code of the response
    private int payloadSize;     // size of the payload returned to the client
    private String browserAgent; // value of the User-Agent request header
    private String requestId;    // same request id as in LogData
}

| Field | Sample | Description |
|---|---|---|
| clientIP | 127.0.0.1 | The IP address of the client that made the request to your API. |
| userId | aa10318a-a9b7-4452-9616-0856a206da75 | Preferably the same user id that is used in the LogData class above. |
| timestamp | 02/08/2019 12:37:59 | When the request occurred, in a date time format of your choice. |
| method | GET, POST, PUT etc. | The HTTP method of the request. |
| requestURL | https://localhost:9000/api/customer/info | The URL of the request. |
| protocol | HTTP/1.1 | The protocol and version used for the request. |
| statusCode | 200, 201, 401, 500 etc. | The HTTP status code of the response. |
| payloadSize | 2345 | The size of the payload returned to the client, typically in bytes. |
| browserAgent | Mozilla/4.08 [en] (Win98; I ;Nav) | “The User-Agent request header contains a characteristic string that allows the network protocol peers to identify the application type, operating system, software vendor or software version of the requesting software user agent.” (https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/User-Agent) |
| requestId | | This should be the same request id used in the LogData class earlier. |
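To write access log entries in a format that existing tools already understand, you could render them as lines in the Apache Combined Log Format (`%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"`, described in the Apache documentation linked above). This is a minimal sketch: AccessLogFormatter and its parameters are hypothetical, the remote identity (`%l`) is always written as "-", and the referer is written as "-" because the AccessLog class above doesn't capture it.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper that renders access log data as an Apache Combined Log Format line:
// %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
final class AccessLogFormatter {
    // Apache-style time, e.g. 02/Aug/2019:12:37:59 +0000
    private static final DateTimeFormatter CLF_TIME =
            DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);

    private AccessLogFormatter() {}

    static String format(String clientIP, String userId, ZonedDateTime timestamp,
                         String method, String requestTarget, String protocol,
                         int statusCode, int payloadSize, String userAgent) {
        String user = (userId == null || userId.isEmpty()) ? "-" : userId;
        // CLF writes "-" instead of 0 when no body bytes were sent
        String size = (payloadSize == 0) ? "-" : Integer.toString(payloadSize);
        String agent = (userAgent == null) ? "-" : userAgent;
        return String.format("%s - %s [%s] \"%s %s %s\" %d %s \"-\" \"%s\"",
                clientIP, user, CLF_TIME.format(timestamp),
                method, requestTarget, protocol, statusCode, size, agent);
    }
}
```

Note that this sketch does not escape quotes inside the user agent or request line, which Apache itself does; a production version should.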

Message Queue Log

This is related to a decoupling pattern between two or more entities: you push a message to a storage location, and someone or something reads and processes it. This is a simplified description, of course.

This is a sample log that you could use with events/message queues. Depending on which message queue you use and how it is configured, you will most likely have only minimal information about a message pushed to a queue.

From a troubleshooting and traceability point of view, I would recommend passing additional metadata along with the message that describes the context in which it was originally created.

Let’s take an API request as an example. What I did was add an additional

This is a fairly complex topic, but the main point is that, depending on what kind of message queue or event queue technology and applications you use, you might not get a very detailed view of who did what, and when.

An example: a client application invokes your API, and the request has to perform an asynchronous save to a CRM. You have to make sure that the save completes and is retried if something goes wrong. That is fine, but what if things keep failing and nothing has happened even after several attempts? A common practice is to move the message to a dead-letter queue for troubleshooting and later processing.
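The retry-then-dead-letter flow described above can be sketched with in-memory queues. This is only an illustration under simplifying assumptions: a real broker usually handles redelivery and dead-lettering through its own configuration, and the names QueuedMessage, RetryingConsumer and maxAttempts are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;

// Hypothetical message carrying a payload and the request id it originated from.
record QueuedMessage(String requestId, String payload) {}

// Tries each message up to maxAttempts times; messages that still fail
// are moved to a dead-letter queue instead of being lost.
final class RetryingConsumer {
    private final Deque<QueuedMessage> deadLetters = new ArrayDeque<>();
    private final int maxAttempts;

    RetryingConsumer(int maxAttempts) {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        this.maxAttempts = maxAttempts;
    }

    // The handler returns true on success (e.g. the CRM save completed).
    void drain(Deque<QueuedMessage> queue, Predicate<QueuedMessage> handler) {
        QueuedMessage msg;
        while ((msg = queue.poll()) != null) {
            boolean done = false;
            for (int attempt = 1; attempt <= maxAttempts && !done; attempt++) {
                try {
                    done = handler.test(msg);
                } catch (RuntimeException e) {
                    done = false; // treat an exception as a failed attempt
                }
            }
            if (!done) {
                // Log with the original request id so the failure can be traced back.
                System.out.println("Moving message to DLQ, requestId=" + msg.requestId());
                deadLetters.add(msg);
            }
        }
    }

    Deque<QueuedMessage> deadLetters() { return deadLetters; }
}
```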

To find out what the problem was, you need detailed information, and by default messages in queues carry few details. So I would recommend adding additional data to each message in a queue, so that when the receiving end gets it, you can log it and associate it with the original API request. Later, when using analysis tools, you can get a history of the events that happened, for example by using the requestId/correlationId.

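import java.time.Instant;

public class MessageQueueLog {
    private String sourceHostname;      // host that created the message
    private String sourceAppName;       // app that created the message
    private String sourceAppVersion;
    private String sourceEnvironmentId;
    private String sourceRequestId;     // request id of the originating API request
    private String sourceUserId;        // user who made the originating request
    private String message;             // serialized (JSON) payload
    private String messageType;         // e.g. UPDATE_USER; tells the receiver what to do
    private Instant createdAt;          // when the message was created, in UTC
}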

| Field | Sample | Description |
|---|---|---|
| sourceHostname | | The hostname of the machine or container that created the message. |
| sourceAppName | | Same as appName in the LogData example earlier. |
| sourceAppVersion | | Same as appVersion in the LogData example earlier. |
| sourceEnvironmentId | | Same as environmentId in the LogData example earlier. |
| sourceRequestId | | Same as requestId in the LogData example earlier. |
| sourceUserId | | Same as userId in the LogData example earlier. |
| message | JSON data | JSON data representing a serialized object that holds the data the receiving end needs. |
| messageType | UPDATE_USER, DELETE_USER | A simple, unique, static ID for the message. It tells the receiving end what to do with the data in the message field. |
| createdAt | 02/08/2019 12:37:59 | When the message queue entry was created. Preferably in UTC. |
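On the sending side, the source fields can be filled from the request being processed, so the receiving end can log the message with the original requestId. The sketch below is a minimal, hypothetical version of that idea: the record MessageEnvelope, its wrap factory and the log line format are assumptions, and serializing the envelope to JSON for the actual queue is left out.

```java
import java.time.Instant;

// Hypothetical envelope mirroring the MessageQueueLog fields above.
record MessageEnvelope(String sourceHostname, String sourceAppName, String sourceAppVersion,
                       String sourceEnvironmentId, String sourceRequestId, String sourceUserId,
                       String message, String messageType, Instant createdAt) {

    // Sending side: wrap the payload with the metadata of the request that produced it.
    static MessageEnvelope wrap(String hostname, String appName, String appVersion,
                                String environmentId, String requestId, String userId,
                                String jsonPayload, String messageType) {
        return new MessageEnvelope(hostname, appName, appVersion, environmentId,
                requestId, userId, jsonPayload, messageType, Instant.now());
    }

    // Receiving side: a log line that ties the message back to the original request.
    String receivedLogLine(String receiverAppName) {
        return String.format(
                "receiver=%s messageType=%s sourceApp=%s sourceEnv=%s sourceRequestId=%s sourceUserId=%s createdAt=%s",
                receiverAppName, messageType, sourceAppName, sourceEnvironmentId,
                sourceRequestId, sourceUserId, createdAt);
    }
}
```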

Metrics log

With metrics logs, the idea is to track the performance and success of the parts of your application you care about. A common thing to track is how requests from your own code to external sources are performing. This allows you to set up alerts and troubleshoot problems with external sources, especially when combined with an access log and a metrics log of how long your request took in total to finish.

Depending on what kind of tools you use, you might get automatic metrics for your application, such as CPU usage, memory usage and data usage. Here I will focus on metrics logs that you would produce manually from your application.

So you could track the following metrics:

  • Calls to external sources such as a database, API or service
  • Your request’s total processing time, from start to returning a response
  • Important sections of your code
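public class MetricsLog {

    // Example source types from the table below; extend as needed
    public enum MetricsLogTypes { API, DATABASE, CODE }

    private String title;                        // short name, e.g. "User Database"
    private String body;                         // what was done, e.g. "Update user"
    private String additional;                   // any additional data
    private String url;                          // request URL if an external API was called
    private int statusCode;                      // HTTP status code returned by the external source
    private Double payloadSize;                  // size of the returned data
    private Long receivedResponseAtMillis = 0L;  // epoch millis when the response arrived
    private Long sentRequestAtMillis = 0L;       // epoch millis when the request was sent
    private MetricsLogTypes logType;
    private double elapsedTimeInSeconds = 0;
    private double elapsedTimeInMS = 0;
    private String category;                     // used to group related metrics
}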

| Field | Sample | Description |
|---|---|---|
| title | User Database | A short title for what is being measured. |
| body | Update user | A description of the operation being measured. |
| additional | Some additional data | Any additional data. |
| url | http://localhost:9200/api/car/types | If this is an API request to an external service, log the request URL. |
| statusCode | 200, 401, 500 etc. | The HTTP status code returned by the external source. |
| payloadSize | 234567 | The size of the returned data. |
| receivedResponseAtMillis | 1575364455 | When the response was received, for example as UNIX epoch time in milliseconds. |
| sentRequestAtMillis | 1575363455 | When the request was sent, for example as UNIX epoch time in milliseconds. |
| logType | API, DATABASE, CODE etc. | The type of source being measured: an external API, a database, a section of your own code or some other type you wish to use. |
| elapsedTimeInSeconds | 1 | How long it took to receive the response, in seconds. |
| elapsedTimeInMS | 1000 | How long it took to receive the response, in milliseconds. |
| category | Category1/2/3 etc. | Can be used to group different metrics together. |
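Here is a minimal sketch of how the timing fields could be filled in around an external call. It assumes the call can be wrapped in a Supplier; the names MetricsTimer and Measured are hypothetical. The wall-clock timestamps come from System.currentTimeMillis(), while the elapsed time is measured with System.nanoTime(), which the JDK documents as intended for measuring elapsed time.

```java
import java.util.function.Supplier;

// Hypothetical result of a timed call, holding the MetricsLog timing fields.
record Measured<T>(T result, long sentRequestAtMillis, long receivedResponseAtMillis,
                   double elapsedTimeInMS, double elapsedTimeInSeconds) {}

final class MetricsTimer {
    private MetricsTimer() {}

    // Runs the call and records when it started, when it finished and how long it took.
    static <T> Measured<T> time(Supplier<T> call) {
        long sentAt = System.currentTimeMillis();   // wall-clock start (epoch millis)
        long start = System.nanoTime();             // monotonic start for the duration
        T result = call.get();
        long elapsedNanos = System.nanoTime() - start;
        long receivedAt = System.currentTimeMillis();
        double elapsedMs = elapsedNanos / 1_000_000.0;
        return new Measured<>(result, sentAt, receivedAt, elapsedMs, elapsedMs / 1000.0);
    }
}
```

If the call throws, this sketch records nothing; a fuller version would catch the failure, record the timing and the error status, and rethrow.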

Security Logs

I would also consider creating a separate security log, which the logging indexer identifies and stores under its own index pattern or category.

This speeds up troubleshooting of security-related events, such as someone signing in, signing out or registering.

A security log provides an audit trail. It allows you to record, track and investigate security-related operations that happen in your system. This is hard to get right, since you must record enough information to troubleshoot while keeping secrets and sensitive information hidden.

Start by using the built-in features of the technology you are using, such as Azure AD or Amazon Cognito, and then complement them with security logs that you write manually from your application, as you would with your other logs.

For each recorded event, the record in the security log should include at least the following:

  • Date and time of event.
  • User identification including associated terminal, port, network address, or communication device etc.
  • Type of event.
  • Names of resources accessed.
  • Success or failure of the event.
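The minimum fields listed above map naturally onto a small record. This sketch is hypothetical (the SecurityLogEvent name, the event type values and the masking rule are assumptions); its main point is that sensitive values such as an email address can be masked before they reach the log while the event stays traceable.

```java
import java.time.Instant;

// Hypothetical security log event holding the minimum fields listed above.
record SecurityLogEvent(Instant occurredAt, String userId, String networkAddress,
                        String eventType, String resource, boolean success) {

    // Keeps only the first character of the local part of an email address,
    // so the log shows which account was involved without the full address.
    static String maskEmail(String email) {
        int at = email.indexOf('@');
        if (at <= 1) return "***" + (at >= 0 ? email.substring(at) : "");
        return email.charAt(0) + "***" + email.substring(at);
    }

    // One line per event, tagged so the indexer can route it to its own category.
    String toLogLine() {
        return String.format("SECURITY occurredAt=%s userId=%s address=%s event=%s resource=%s outcome=%s",
                occurredAt, userId, networkAddress, eventType, resource, success ? "SUCCESS" : "FAILURE");
    }
}
```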

For security logging, you can combine the general logging details with just a title and a body, the bare minimum. The idea is to log an event that is related to a security issue and, if possible, separate it into its own index pattern/category.

Aggregated log entry

This is an example of a main log class that contains all the log entry data and details we want for a system.

Possible use cases include streaming logs to Amazon CloudWatch or perhaps to Elasticsearch.

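import java.time.LocalDateTime;
import java.util.Map;

public class CloudLog {
    private LocalDateTime timeStamp;      // has no time zone, so always store UTC
    private String logger;                // logger name, e.g. the class that logged
    private Map<String, Object> metadata; // key-value pairs, e.g. the LogData fields
    private String message;               // main message of the entry
    private String level;                 // DEBUG, INFO, ERROR, ...
}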

| Field | Description |
|---|---|
| timeStamp | A timestamp of when the log entry was created. |
| logger | The name of the logger entity. |
| metadata | A map of key-value pairs that can be serialized into JSON for indexing. |
| message | The main message of the log entry. |
| level | Severity level of the log entry: DEBUG, INFO, ERROR, etc. |
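Before streaming to CloudWatch or Elasticsearch, an entry like this is usually serialized to one JSON object per line. In practice you would use a JSON library such as Jackson; the dependency-free sketch below only shows the idea. CloudLogJson is a hypothetical name, it escapes just the characters JSON requires, and it writes every metadata value as a string (the timeStamp would be passed as, for example, `timeStamp.toString()`).

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical, dependency-free serializer for a CloudLog-style entry.
final class CloudLogJson {
    private CloudLogJson() {}

    // Quotes a string, escaping quotes, backslashes and control characters as JSON requires.
    static String quote(String s) {
        if (s == null) return "null";
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
                }
            }
        }
        return sb.append('"').toString();
    }

    // Writes one JSON object per log entry; metadata values are written as strings.
    static String toJson(String timeStamp, String logger, String level, String message,
                         Map<String, ?> metadata) {
        StringJoiner meta = new StringJoiner(",", "{", "}");
        metadata.forEach((k, v) -> meta.add(quote(k) + ":" + quote(v == null ? null : v.toString())));
        return "{\"timeStamp\":" + quote(timeStamp) + ",\"logger\":" + quote(logger)
                + ",\"level\":" + quote(level) + ",\"message\":" + quote(message)
                + ",\"metadata\":" + meta + "}";
    }
}
```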