A Few Thoughts about Logging

In this post, I would like to share my thoughts about logging. I have more than 6 years of work experience as a software developer. My views on logging may not be 100 percent accurate because I do not have enough experience; in my view, enough experience means 20 to 30 years. Still, I would like to share what I think.

I believe in rules. Without rules, a software team will not see the light at the end of the tunnel. I do not like stress, and I do not like having to work from home every day. I have a family; I am not a single person. I have to attend my kid's birthday party. Rules prescribe common guidelines that everybody agrees on, without any doubt, before a single line of code is written. As a result, there are no questions or confusion after the final release of the product, and no more stress.

Why are rules required? Rules are nothing but requirements, common structures, or common guidelines from a technical point of view. Each software developer in a team thinks in a different way. In this world, no two people think the same way, except supermen, who use telepathic power to exchange their complex thoughts. In a real team-based environment, that is not possible. Some people may not agree with me, probably because in their team all the thinking and initial coding is done by one person, and the rest of the team members are just responsible for debugging and adding new functionality on top of the established foundation. In that case, the team does not see the importance of rules.

Anyhow, a log is an important record of the interaction between the user and the application. It can be used to store user activity (audit messages), to find the cause of problematic events, or to alert the system administrator about potential security anomalies. It is the responsibility of management and the team to define the logging requirements and the logging pattern or style: what information should be logged and what should not. After the software is released, it is too late to discuss logging. The discussion will be fruitless because developers will need to edit the log statements, which may produce an even more confusing log.

Now, I am going to give you two funny scenarios to show why logging requirements are needed. My dear readers, I prefer message-driven humour; it is my style. If you do not like humour in a technical blog, I am sorry, and please accept my apology.

Scenario of super team:
A team was formed with 5 donkeys and the superman. There were no logging requirements, and actually no rules at all. Therefore, each donkey created its own logging requirements and logging statements. As a result, the logging statements and structure were all different, and one donkey added fewer logs than the others. The project was released. A few days later, the superman tried to execute the code and found that each donkey's logging structure, style, and log level for particular messages were different. The superman became as calm as a severe tornado preparing to hit the coast. He dictated each and every log statement, its style, structure, logging level, and more to the donkeys. The donkeys followed in fear because they had to keep the superman happy to stay in the job, pay their bills, and support their families, even though they detested him with a passion. The next morning, the monkeys came and said that they did not understand the log after a client reported a bug. The monkeys do not like to read documentation, and they think the log is the way to understand the business logic from A to Z, so they wanted a detailed log. The monkeys were happy, and the superman was happy, after butchering the code by writing an expressive log. A few days later, the customer reported to the company CEO that they hated using the software because the log statements exposed sensitive data and contained too much information to pinpoint the real issue.

Scenario of sailor team:
Now, think about the following scenario: a team is formed with 5 sailors and the captain of the ship. The captain discussed with the sailors, management, and support groups to define the rules, which contained the logging requirements for the upcoming project. The sailors knew in which direction they had to sail the ship. Each sailor's logging structure, style, and logging levels were identical. The sailors felt like professionals: they learned something, implemented it, and their hard work bore fruit without any arguments, and no monkeys were jumping on the bed.

Now, think about the two fictitious examples above. Which team do you want to join?

I have created an anonymous poll to learn what other software developers think about logging requirements. It is totally up to you whether you participate or not.


So what kind of information do we need about logging before writing the code? Here I list only very high-level points; they could be more detailed, depending on the project requirements:
    1. What should be logged?
    2. What should not be logged?
    3. What is the logging pattern?
    4. Will the log be descriptive, or short and crisp?
    5. Will the log be stored on the local machine?
    6. Will the application execute in a distributed environment? If yes, how will the logs be aggregated from the distributed system to a central location, later or in real time?
    7. How will the log be visualized for the support group from the central location?

A few tips about logging:

1) Follow a pattern for log statements. For instance,
    [datetime] - [trackerId] - [operationId] - [package structure] - [message]

  • datetime is the current date and time.
  • trackerId could be a user id, login id or authorization id.
  • operationId is the unique id for the particular event, request or operation. It could be the name of the operation. A trackerId can be the same for multiple operations/requests, but the operationId uniquely identifies the particular operation of the request. The operationId is very useful in a multi-module system to identify the specific request/operation of the application. For example, Machine A receives a request to update user profile information and performs it. A few minutes later, the same user updates the profile again, and this time Machine B performs the request. So, two machines contain the same operation log. After merging the logs, we can easily say that this particular user worked with the profile module.
  • package structure gives a finer-grained indication of which source file is responsible for the particular user action.
  • message should not contain an endless number of characters to help the support group. It should be as minimal as possible. Such as,
          "Request has not been submitted successfully."
   
          It should not be a message like the following:
          "Request has not been submitted successfully because user 
           does not have enough funds and the funds should be greater than 
           1000 dollars. Please check database ID_FUND_INFO 
           table for more details."

A well-trained support group will not need this kind of log to identify why a request failed. They should know the business environment and the database tables.
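
The pattern above can be sketched as a small helper. This is only a minimal sketch under my own assumptions: the class name LogLine, the method name format, and the timestamp layout are hypothetical, and I assume the square brackets of the pattern are written literally into the log line.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper (not from any logging library): builds one log line as
// [datetime] - [trackerId] - [operationId] - [package structure] - [message]
final class LogLine {

    // Assumed timestamp layout; the team's rules should fix the real one.
    private static final DateTimeFormatter TIMESTAMP =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    private LogLine() {
    }

    static String format(LocalDateTime when, String trackerId, String operationId,
                         Class<?> source, String message) {
        // source.getName() gives the fully qualified class name,
        // which stands in for the "package structure" field.
        return String.format("[%s] - [%s] - [%s] - [%s] - [%s]",
                when.format(TIMESTAMP), trackerId, operationId,
                source.getName(), message);
    }
}
```

A call such as LogLine.format(LocalDateTime.now(), "user-42", "updateProfile", RequestHandler.class, "Request has not been submitted successfully.") keeps every statement in the same shape, no matter which developer writes it.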

2) A log should not describe the steps of the logic; the logic should be documented in dedicated documentation.

Example of fictitious logging:

  public class RequestHandler{

     ....
      
     // LOG is assumed to be an SLF4J/Log4j-style logger declared in the elided part.
     public void submit(){

        if(clientId!=null){
          LOG.debug("Client Id is not null, "
                  + "so we can go to the next step to check "
                  + "request parameter");
        }else{
          LOG.debug("Client Id is null, "
                  + "so we cannot go to the next step to check "
                  + "request parameter");
          return;
        }

        if(requestParameter!=null){
          LOG.debug("requestParameter is not null, "
                  + "so we can query against the "
                  + "ID_FUNDING_INFO "
                  + "to verify user eligibility");
        }else{
          LOG.debug("requestParameter is null, "
                  + "so we cannot query against the "
                  + "ID_FUNDING_INFO "
                  + "to verify user eligibility");
          return;
        }

        ....

     }
     ....

  }

The above code describes the steps of the logic in the log statements. That is not good at all.

I think the above log could be written in the following way:

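  public class RequestHandler{

     ....
      
     // LOG is assumed to be an SLF4J/Log4j-style logger declared in the elided part.
     public void submit(){

        // Log the request context once, following the agreed pattern.
        LOG.debug("trackerId: 1 - operationId: RequestHandler - "
                + "ClientId: " + clientId + " - "
                + "RequestParameter: " + requestParameter);

        if(clientId==null || requestParameter == null){
           // One short failure message; no description of the logic.
           LOG.debug("trackerId: 1 - operationId: RequestHandler - "
                   + "ClientId: " + clientId + " - "
                   + "Request submission failed");
           return;
        }

        ....

     }
     ....

  }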

3) An audit/event log describes user activity; an audit log is not a diagnostic log. It should be treated differently.
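
One possible way to treat the two differently is to give each its own named logger, so that each can later be routed to its own file, level and retention policy. This is only a sketch using java.util.logging from the JDK; the class ProfileService, the method updateProfile and the logger names "audit" and "diagnostic" are my assumptions for illustration.

```java
import java.util.logging.Logger;

// Hypothetical service that keeps audit and diagnostic logging apart.
final class ProfileService {

    // Assumed logger names; a real project would define them in its logging rules.
    static final Logger AUDIT = Logger.getLogger("audit");
    static final Logger DIAGNOSTIC = Logger.getLogger("diagnostic");

    void updateProfile(String userId, boolean saved) {
        // Audit log: records what the user did, whether or not it succeeded.
        AUDIT.info("user " + userId + " requested a profile update");

        if (!saved) {
            // Diagnostic log: only written when support needs to investigate.
            DIAGNOSTIC.warning("profile update failed for user " + userId);
        }
    }
}
```

With separate loggers, a handler attached to "audit" never sees the diagnostic messages, and the reverse also holds.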

4) You cannot log everything in order to diagnose the application behaviour, as already mentioned in listing 1.

5) In a distributed system, a log aggregator must crawl the machines, merge their logs, and produce a single file, so that the support group can visualize it and diagnose the application easily. We may write our own distributed log crawler or use an open source tool, such as:
  1) https://cwiki.apache.org/confluence/display/FLUME/Home
  2) http://wiki.apache.org/hadoop/Chukwa
  3) https://www.elastic.co/products/logstash
  4) https://kafka.apache.org/
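
To make the merge step concrete, here is a toy sketch of what an aggregator does after collecting the files: it puts the lines from all machines into one list ordered by time. It does not use any of the tools above. The class LogMerger is hypothetical, and I assume every line starts with a 19-character timestamp in the form yyyy-MM-ddTHH:mm:ss, which sorts correctly as plain text.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical toy aggregator: merges per-machine log lines into one timeline.
final class LogMerger {

    private static final int TIMESTAMP_LENGTH = 19; // e.g. 2020-01-01T10:00:00

    private LogMerger() {
    }

    static List<String> merge(List<List<String>> perMachineLogs) {
        List<String> merged = new ArrayList<>();
        for (List<String> lines : perMachineLogs) {
            merged.addAll(lines);
        }
        // List.sort is stable, so lines with equal timestamps keep their input order.
        merged.sort(Comparator.comparing((String line) -> timestampOf(line)));
        return merged;
    }

    // Lines shorter than a timestamp are compared as a whole instead of failing.
    private static String timestampOf(String line) {
        return line.length() >= TIMESTAMP_LENGTH
                ? line.substring(0, TIMESTAMP_LENGTH)
                : line;
    }
}
```

In the profile example from tip 1, merging the logs of Machine A and Machine B this way shows both updates of the same user in the order they happened.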

6) A log can be single-line, in a specific format, or in a structured format (e.g. JSON), so that a log aggregator can easily crawl and merge the logs from multiple machines.
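
As an illustration of the structured option, the sketch below writes one log entry as a single-line JSON object using only the JDK. The class JsonLogLine and the field names used in the usage note are my assumptions; in a real project a JSON library or the logging framework's own JSON layout would normally do this job.

```java
import java.util.Map;

// Hypothetical helper: renders log fields as one single-line JSON object.
final class JsonLogLine {

    private JsonLogLine() {
    }

    static String format(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            first = false;
            sb.append('"').append(escape(e.getKey())).append("\":\"")
              .append(escape(e.getValue())).append('"');
        }
        return sb.append('}').toString();
    }

    // Escapes quotes, backslashes and control characters, so the entry
    // always stays on one line and remains valid JSON.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }
}
```

Passing a LinkedHashMap with the keys datetime, trackerId, operationId, source and message keeps the fields in the same order as the pattern from tip 1, and an aggregator can parse each line without guessing where one field ends.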

In my view, logging requirements are necessary before writing a single line of code. Rules reduce the stress level after the software is released.
