Lessons learned from building microservices – Part 1: Logging

This is a part in a series of posts discussing things learned while I worked with micro-services. The things I write here are not absolute truths and should be considered as the best solutions at the time I and my team used these methods. You might chose to do things differently and I recommend highly to find out for yourself the best practices and approaches that work for you and your project.

I also assume that you have a wide range of pre-existing knowledge on building microservices, API, programming languages, programming, cloud providers etc.

I recommend looking at the OWASP cheat sheet to get even a more in depth view: https://cheatsheetseries.owasp.org/cheatsheets/Logging_Cheat_Sheet.html

UPDATE – 17.3.2020: I’ve improved this post based on the OWASP logging cheat sheet

Notice: In the examples below I will omit “boilerplate” code to save space.

Introduction

By logging usually is meant created records by a piece of software at your operating system level, a web application, a mobile app, a load balancer, databases, mail servers and so on. Logs are created by many different types of sources.

Their importance comes from their ability to allow you to understand what is happening in your system or application. They should show you that that everything is alright and if they are not you should be able to determine what is not alright.

Base requirements for logging

General requirements for logging are:

  • Identifying security incidents
  • Providing information about problems and unusual conditions
  • Business process monitoring
  • Audit trails
  • Performance monitoring

Which events to log:

  • Input validation failures
  • Output validation failures
  • Authentication successes and failures
  • Authorization (access control) failures
  • Session management failures
  • Application errors and system events
  • Application and related systems start-ups and shut-downs, and logging initialization
  • Use of higher-risk functionality (user management, critical system changes etc)

Things to exclude:

  • Application source code
  • Access tokens
  • Sensitive personal data and some forms of personally identifiable information
  • Authentication passwords
  • Database connection strings
  • Encryption keys and other master secrets
  • Bank account or payment card holder data
  • Data of a higher security classification than the logging system is allowed to store
  • Information a user has opted out of collection, or not consented

In some of the cases to exclude you can obscure/remove/alter sensitive data to provide partial benefits without exposing all of the sensitive data to malicious people:

  • File paths
  • Database connection strings
  • Internal network names and addresses
  • Non sensitive personal data

Still, be very careful with this information, especially with user related data.

Now as much it is important to log and have a good view on what is happening in your system and application, it is also a fine art to understand when not to log things.

Having too much log will make it hard to find out the relevant critical information you need. Having too little logging you risk not being able to understand your problem properly.

So, there is a fine balance between logging too much or too little.

A possible solution to this issue is to have more verbose logging during development and when deploying to production your application will only log what is determined important by the developers so that someone will be able to troubleshoot a problem in production without having too much or too little logging. This is also a process that need refactoring during the lifetime of the application.

This leads us to a requirement of logs: logs should be structured and easily indexed, filtered and searched.

Logging audience

When you are logging, I recommend considering for who are you logging for?

You need to ask yourself: Why add logging to an application?

Someone someday will read that log entry and to that person the log entry should make sense and help that person. So, when you log things, think of your audience and the following things:

  • What is the content of the message
  • Context of the message
  • Category
  • Log Level

All of these can be quite different depending on who is looking at your logs. As a developer, you can easily understand quite complex logs but as a non-developer you mostly likely would not be able to make much sense of complex log entries. So, adapt your language to the intended target audience, you can even dedicate separate categories for this.

Also, think if the log entries can be visualized. For example, metrics logs should have categories, dates and numbers which can be translated into charts that show how long things last or succeed.

Write meaningful log messages

When writing log entries avoid writing them so that you need to have in depth knowledge of the application internals or code logic, even if you are a developer or someone who will look at logs that will be a developer.

There are a few reasons to write log messages that are not depended on knowing the application code or the technicalities behind your application:

  • The log messages will most likely be read by someone else that is not a technical person and even if they are not you may need to prove something in your application to a non-technical person.
  • Even if you are the only developer who is working on your application, will you remember all your logic and meaning of log entries a year or two from now? If you must go to your code and check on what the heck this log entry means, then your log entry was not meaningful enough. Yes, you do have to go back to the code anyway if you have problems but if you have to do this frequently then you definitely need to refactor your logging logic and the log content in your application.
  • If you have multiple developer and they do an analysis of a problem they may not understand what is going on. This is because they might not have any correlation or understanding of a log entry because they have not been apart of the initial solution. They must find out what is going on from the code.

Logging is about the four W:

  • When
  • Where
  • Who
  • What

Add context to your log messages

By context I mean that you log message should usually tell what is going on by giving away all the needed details to understand what is happening.

So, this is not OK:

“An order was placed”

If you where to read that one, you would ask: “What order? Who placed the order? When did this happen?”

A much more detailed and helpful log message would be:

“Order 234123-A175 was placed by user 9849 at 29.3.2019 13:39”

This message will allow someone to get that order from a system, look at what was ordered and by whom and at what time.

Log at the proper level

When you create a log entry your log entry should have an associated level of severity and importance. The common levels that are used are the following:

  • TRACE: The most verbose logging, will produce A LOT of log entries and is used to track very difficult problems. Never use it in production, if you have to them in production you have a design problem in your application. It is the finest grained log level.
  • DEBUG: This is mostly used for debugging purposes during development. At this level you want to log additional and extra information about the workings of your application that help you track down problems. This could be enabled in production if necessary, but only temporarily and to troubleshoot an issue.
  • INFO: Actions that are user-driven or system specific like scheduled operations.
  • NOTICE: Notable events that are not considered an error.
  • WARN: Events that could potentially become an error or pose might a security risk.
  • ERROR: Error conditions that might still allow the application to continue running.
  • FATAL: This should not happen a lot in your application but if it does it usually terminates your program and you need to know why.

Service instances

In a microservice architecture the most important thing is to be able to see what each microservice instance is doing. This means in the case of kubernetes each pod, or each container with docker etc.

So if you have a service named Customer and you have three instances of this service you would want to know what each service is doing when logging. So here is a check list of things to consider:

  • You need to know what each service instance is doing because each instance will process logic and each instance will have it’s own output based on what it is doing or requested to do
  • Each log entry should be able to identify which service instance was that performed the log entry by providing a unique service instance id
  • Each log entry should identify which application version the service instance is using
  • Each log entry should tell in which environment the service instance is operating in, example: development, test, qa, prod
  • If possible each log entry should tell where the service instance is like IP address or host-name

Monitoring

First thing I would recommend is to have an understating where your logs will end up and how you are going to analyze them.

The simplest form would be a log file where you would push your log entries and then using a common text editor or development editor to look at the entries. This works fine if your application is very small or you are dealing with a script. The log entries amount will likely be small, and they won’t be stored for a long period of time.

But, if you know your application or system will produce thousands, hundred of thousands or even millions of log entries each day and you need to store them for a longer period of time then you need a good monitoring tool that than read robustly log entries. You also need a good place to store your log entries.

What you would need normally is something that would:

  • Receive and process a log entry, them transform it and send it to a data store
  • At the data store you would need a tool that will index the data.
  • Then you would need to be able to search and analyze your indexed log entries

A very common tech stack for storing log entries and analysis would be ElasticSearch, Logstash and Kibana. You would use Logstash to process a log entry, transform it and send it to a data store like Elasticsearch where you would index, search and analyze data with. Finally you would use Kibana which is a UI on top of Elasticsearch to visually do the searching and analysis of logs.

Log types

Next I’ll cover the different logging types you might need and that will make your life easier.

General logging details

Before we cover the different types of logs which you might need first we need to have some common data witch each log entry. This data will help us in different way depending on the solution you are making. In my example here these data are related to an API backend but you might find them useful in some other types of solutions.

So consider adding these logging fields to other logs as metadata.

public class LogData
 {

    private String requestId;
    private String userId;
    private String environmentId;
    private String appName;
    private String appVersion;
    private Instant createdAt;

}
FieldSampleDescription
requestId6f88dcd0-f628-44f1-850e-962a4ba086e3This is a value that should represent a request to your API. This request id should be applied to all log entries to be able to group all log entries from a request. Should be unique.
userId9ff4016d-d4e6-429f-bca8-6503b9d629e1Same as with the request id but a user id that represents a possible user that made the API request. Should be unique.
environmentIdDEV, TEST, PRODThis should tell a person looking at a log entry from which environment the log entry came for. This is important in cases where all log entries are pushed into one location and not separated physically.
appNameYour Cool APISame as with the environment id but concerns the app name.
appVersion2.1.7Same as with the environment id but concerns the app version.
createdAt02/08/2019 12:37:59This should represent when the log entry has been created. This will help very much in tracking the progress of the application logic in all environment in case of troubleshooting. Preferable in UTC time.

As you can see with this base line details, we get a pretty good view on where things are happening, who is doing things and when. I can’t stress enough how important these details are!

General log entry

Well this is the base line log entry with an added message field and perhaps a title field. That’s it.

This is what you would need at a bare minimum to find out what is going on.

Access log

Access logs are a great way to keep track of your API requests and their response to a client. It’s a way to the server to keep records of all requests processed by the server. I won’t go deeper into them, there are plenty of detail descriptions available which I recommend going through, here is one:

https://httpd.apache.org/docs/2.4/logs.html#accesslog

https://en.wikipedia.org/wiki/Server_log

Here is some sample code:

public class AccessLog {
    private String clientIP;
    private String userId;
    private String timestamp;
    private String method;
    private String requestURL;
    private String protocol;
    private int statusCode;
    private int payloadSize;
    private String borwserAgent;
    private String requestId;
}
FieldSampleDescription
clientIP127.0.0.1The IP address of the client that made the request to you API.
userIdaa10318a-a9b7-4452-9616-0856a206da75Preferably this should be the same user id that was used in the LogData class above
timestamp02/08/2019 12:37:59A date time format of your choice when the request occured.
methodGET, POST, PUT etc.HTTP Method of the request.
requestURLhttps://localhost:9000/api/customer/infoThe URL of the request
protocolHTTP/1.1The protocol used to communicate with the API request.
statusCode200, 201, 401, 500 etc.HTTP status code of the request response.
payloadSize2345The size of the payload returned to the client.
borwserAgentMozilla/4.08 [en] (Win98; I ;Nav)“The User-Agent request header contains a characteristic string that allows the network protocol peers to identify the application type, operating system, software vendor or software version of the requesting software user agent.” – https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/User-Agent
requestIdThis should the the same request id used in the LogData class earlier.

Message Queue Log

This is related to a decoupling pattern between two or more entities. You push a message to a storage location and someone or something reads and processes it, this is a simplified description of course.

This is a sample log which you could use with events/message queues. Depending on what message queue you use and what kind of configurations, you would most likely have minimal information about the message pushed to a queue.

From a troubleshooting point of view and being able to track things I would recommend passing with the message additional metadata related to the message original situation.

Lets take as an example an API request. What I did was add an additional

This is a bit of complex thing to go into but the main focus here is that depending on what king of message queue or event queue technology and applications you use, you might not get a very detailed view on who, when and what happened.

An example: You have an API that a client application invokes, this request has to do an asynchronous save to a CRM, you have to make sure that this is completed and re-tried if things go bad. This is fine but what if things go bad and even after several attempts nothing has happened. A common practice is that the message is going to go to a dead letter queue, for troubleshooting and future processing.

Now to be able to find out what the problem was you need detailed information and by default messages in queues have little details. So I would recommend adding additional data to the message in a queue so that when the receiving end gets it you can log and associate that message to our previous API request. Then later when using analysis tools you can get a history of the events that has happened, for example using the requestId/correlationId.

public class MessageQueueLog {
    private String sourceHostname;
    private String sourceAppName;
    private String sourceAppVersion;
    private String sourceEnvironmentId;
    private String sourceRequestId;
    private String sourceUserId;
    private String message;
    private String messageType;
}
FieldSampleDescription
sourceHostnameLook at the LogData example earlier.
sourceAppNameLook at the LogData example earlier.
sourceAppVersionLook at the LogData example earlier.
sourceEnvironmentIdLook at the LogData example earlier.
sourceRequestIdLook at the LogData example earlier.
sourceUserIdLook at the LogData example earlier.
messageJSON data JSON data representing a serialized object that hold important data to be used the the receiving end.
messageTypeUPDATE_USER, DELETE_USERA simple unique static ID for the message. This ID will tell the receiving end what it needs to do with the data in the message field.
createdAt02/08/2019 12:37:59This should represent when the message queue entry was created. Preferable in UTC time.

Metrics log

With metrics logs the idea is to be able to track desired performance and successes in your application. A common thing that you might like to track would be how external request from your own code is performing. This will allow you set up alerts and troubleshoot problem with external sources, especially if combined with an access log you can see and a metrics log of how long you request totally took to finish.

But depending on what kind of tools you use you might get automatic But depending on what kind of tools you use, you might get automatic metrics for your application; like CPU usage, memory usage, data usage etc. Here I will focus on metrics logs you would produce manually from your application.

So you could track the following metrics:

  • External source like database, API, service etc.
  • You request total processing time from start to end to return a response
  • Some important section of your code
public class MetricsLog {

    private String title;
    private String body;
    private String additional;
    private String url;
    private int statusCode;
    private Double payloadSize;
    private Long receivedResponseAtMillis = 0L;
    private Long sentRequestAtMillis = 0L;
    private MetricsLogTypes logType;
    private double elapsedTimeInSeconds = 0;
    private double elapsedTimeInMS = 0;
    private String category;
}
FieldSampleDescription
titleUser Database
bodyUpdate user
additionalSome additional data
urlhttp://localhost:9200/api/car/typesIf this is a API request to an external service you should log the request URL.
statusCode200, 401, 500 etc.The HTTP status code returned by the external source.
payloadSize234567The size of the returned data.
receivedResponseAtMillis1575364455When the response was received, this could be in UNIX epoch time.
sentRequestAtMillis1575363455When the request was received, this could be in UNIX epoch time.
logTypeAPI, DATABASE, CODE etc.The HTTP status code returned by the external source or some other code that you wish to use.
elapsedTimeInSeconds1Calculate and write how long it took for the response to be received.
elapsedTimeInMS1000Calculate and write how long it took for the response to be received.
categoryCategory1/2/3 etc.This could be used to group different metrics together.

Security Logs

I would also consider creating a separate security log that would be logged and identified by the logging indexer to it’s own pattern or category etc.

This is to speed up troubleshooting related to security issues like when someone signs in, signs out, registers etc.

A security log provides tools to establish an audit trail. It allows you to record, track and investigate security related operations that happen in your system. This is a hard thing to do it right since you must have enough information to troubleshoot but keep secrets and sensitive information hidden.

Start by using default features of the technology you are using like Azure AD or Cognito and then go into manually logging security logs to complement them which you would do normally from your application.

For each recorded event, the record in the security log includes at least the following:

  • Date and time of event.
  • User identification including associated terminal, port, network address, or communication device etc.
  • Type of event.
  • Names of resources accessed.
  • Success or failure of the event.

For the security logging you can combine the General Security Logging with just a Title and Body. The bare minimum. The idea is to log an event that is related to a security issue and if possible separate it in it’s own index pattern/category.

Aggregated log entry

This is an example where you would have a main log class that will contain our desired log entry data and details for a system.

Possible use cases is when streaming to Cloudwatch or to perhaps Elasticsearch.

public class CloudLog {
    private LocalDateTime timeStamp;
    private String logger;
    private Map<String, Object> metadata;
    private String message;
    private String level;
}
FieldDescription
timeStampA timestamp when the log entry was created.
loggerThe logger entity name.
metadataA map full of key value pair, full of data which can be serialized into JSON for indexing.
messageThe main message to the log entry
levelSeverity level of the log entry, DEBUG, INFO, ERROR, etc.

3 thoughts on “Lessons learned from building microservices – Part 1: Logging

  1. Pingback: Lessons learned from building microservices – Part 2: Security – For the love of challenges :)

  2. Pingback: Lessons learned from building Microservices – Part 3: Patterns, Architecture, Implementation And Testing – For the love of challenges :)

  3. Pingback: Software Development fallacies every software developer should know – For the love of challenges :)

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.