When managing Kubernetes clusters on Amazon Elastic Kubernetes Service (EKS), handling log data efficiently is crucial for keeping costs down and the signal clear. EKS containers generate a large volume of logs, and sending all of it downstream is both expensive and noisy. This blog post explores how to reduce Kubernetes logging costs and noise by customizing FluentBit outputs for EKS users sending logs to CloudWatch.

Introduction to EKS and Logging Challenges

Amazon Elastic Kubernetes Service (EKS) is a managed service that makes it easy to run Kubernetes on AWS without needing to install and operate your own Kubernetes control plane. While EKS simplifies many aspects of managing a Kubernetes environment, handling the volume of logs generated by containers can be a significant challenge. These logs are essential for monitoring and troubleshooting but can create unnecessary noise and incur high costs if not managed properly.

Understanding FluentBit in Kubernetes

FluentBit is an open-source log processor and forwarder. It collects logs, metrics, and other telemetry from different sources, processes the data, and sends it to one or more destinations. It is lightweight and fast, which makes it a common choice for logging pipelines. In a Kubernetes context, FluentBit can be used to aggregate container logs and forward them to various destinations, including Amazon CloudWatch.

The Cost of Unfiltered Logs

Unfiltered logs can quickly become a cost issue, especially in large-scale Kubernetes deployments. CloudWatch Logs bills for the volume of data ingested and stored, and Logs Insights queries are billed by the data they scan. Every log line you ship is therefore paid for at least twice, once on ingestion and again for as long as it is retained. Without a log management strategy, you can end up paying for a lot of log data that adds no value to your monitoring or troubleshooting efforts.

Steps to Filter Logs in FluentBit for EKS

To optimize costs and reduce noise, it's essential to filter out unnecessary logs. Here's how you can set up FluentBit to filter logs before they are sent to CloudWatch:

  1. Install FluentBit: Ensure that FluentBit is installed and running in your EKS cluster. You can use the AWS for Fluent Bit container image, which is optimized for AWS.

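  2. Configure Log Filtering: Modify the FluentBit ConfigMap to include filters. The kubernetes filter enriches each record with pod metadata such as the namespace, pod, and container names; it does not drop anything by itself. A grep filter placed after it can then exclude records by namespace, pod, or container, or by a pattern in the log line.

    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_Tag_Prefix     kube.var.log.containers.

    # Drop records from one namespace; this must come after the kubernetes filter
    [FILTER]
        Name                grep
        Match               kube.*
        Exclude             $kubernetes['namespace_name'] ^unwanted-namespace$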
    
  3. Test Your Configuration: After configuring the filters, test them to ensure they're working as expected. This step is crucial to avoid accidentally filtering out important logs.

  4. Monitor and Adjust: Continuously monitor the performance of your log filtering strategy. Over time, you might need to adjust the filters as the logging needs of your applications evolve.
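
As a minimal sketch of steps 2 and 3, the fragment below adds a grep filter that drops records by log pattern and a temporary stdout output for checking the result. It assumes the kube.* tag from step 2 and that the raw line is stored under the log key; the /healthz pattern and the application-log.conf key name are placeholders, not part of the original setup.

```yaml
application-log.conf: |
  # Drop records whose raw log line matches a pattern (placeholder: health checks)
  [FILTER]
      Name     grep
      Match    kube.*
      Exclude  log /healthz

  # Temporary: print the records that survive filtering, for inspection only
  [OUTPUT]
      Name     stdout
      Match    kube.*
```

With the stdout output in place, the surviving records appear in the FluentBit pod's own log, so you can confirm that only the intended lines were dropped. Remove the stdout block once you are satisfied, and keep FluentBit's own container log excluded from the input so this output is not read back in.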

Best Practices for Log Management

  • Prioritize Logs: Determine which logs are essential for your operations and which are not. Prioritize logging that adds value to your troubleshooting and monitoring efforts.

  • Regular Review: Regularly review your log filtering rules to ensure they align with your current needs.

  • Educate Your Team: Ensure that your team understands the importance of efficient logging practices and how they impact costs.

  • Leverage Log Analytics: Use log analytics tools to gain insights from your logs. Efficiently filtered logs can provide more value when analyzed correctly.
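
One way to act on the first practice is to drop low-priority records by level. The sketch below assumes applications emit JSON logs with a level field and that the kubernetes filter runs with Merge_Log On and Merge_Log_Key log_processed, as in the full code example; the field name and the level values are assumptions about your applications.

```yaml
application-log.conf: |
  # Must come after the kubernetes filter, which parses the JSON into log_processed.
  # Assumed field: "level"; assumed values: debug/trace in lower or upper case.
  [FILTER]
      Name     grep
      Match    application.*
      Exclude  $log_processed['level'] ^(debug|DEBUG|trace|TRACE)$
```

Because this is an Exclude rule, records that have no level field are kept rather than dropped.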

Conclusion

Effective log management in Kubernetes, particularly for EKS users, is essential for maintaining cost efficiency and operational clarity. By customizing FluentBit outputs to filter out unnecessary log data before it reaches CloudWatch, organizations can significantly reduce costs and minimize noise. This strategy not only optimizes resource usage but also ensures that the logs you do keep are more relevant and valuable for your operational needs.

Full code example

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: amazon-cloudwatch
  labels:
    k8s-app: fluent-bit
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush                     5
        Log_Level                 info
        Daemon                    off
        Parsers_File              parsers.conf
        HTTP_Server               ${HTTP_SERVER}
        HTTP_Listen               0.0.0.0
        HTTP_Port                 ${HTTP_PORT}
        storage.path              /var/fluent-bit/state/flb-storage/
        storage.sync              normal
        storage.checksum          off
        storage.backlog.mem_limit 5M

    @INCLUDE application-log.conf

  application-log.conf: |
    [INPUT]
        Name                tail
        Tag                 application.*
        Exclude_Path        /var/log/containers/cloudwatch-agent*, /var/log/containers/fluent-bit*, /var/log/containers/aws-node*, /var/log/containers/kube-proxy*
        Path                /var/log/containers/*
        Docker_Mode         On
        Docker_Mode_Flush   5
        Docker_Mode_Parser  container_firstline
        Parser              docker
        DB                  /var/fluent-bit/state/flb_container.db
        Mem_Buf_Limit       50MB
        Skip_Long_Lines     On
        Refresh_Interval    10
        Rotate_Wait         30
        storage.type        filesystem
        Read_from_Head      ${READ_FROM_HEAD}

    [FILTER]
        Name                kubernetes
        Match               application.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_Tag_Prefix     application.var.log.containers.
        Merge_Log           On
        Merge_Log_Key       log_processed
        K8S-Logging.Parser  On
        K8S-Logging.Exclude On
        Labels              Off
        Annotations         Off

    [OUTPUT]
        Name                cloudwatch_logs
        Match               application.*
        region              ${AWS_REGION}
        log_group_name      /aws/containerinsights/${CLUSTER_NAME}/application
        log_stream_prefix   ${HOST_NAME}-
        auto_create_group   true
        extra_user_agent    container-insights

  parsers.conf: |
    [PARSER]
        Name                docker
        Format              json
        Time_Key            time
        Time_Format         %Y-%m-%dT%H:%M:%S.%LZ

    [PARSER]
        Name                syslog
        Format              regex
        Regex               ^(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$
        Time_Key            time
        Time_Format         %b %d %H:%M:%S

    [PARSER]
        Name                container_firstline
        Format              regex
        Regex               (?<log>(?<="log":")\S(?!\.).*?)(?<!\\)".*(?<stream>(?<="stream":").*?)".*(?<time>\d{4}-\d{1,2}-\d{1,2}T\d{2}:\d{2}:\d{2}\.\w*).*(?=})
        Time_Key            time
        Time_Format         %Y-%m-%dT%H:%M:%S.%LZ

    [PARSER]
        Name                cwagent_firstline
        Format              regex
        Regex               (?<log>(?<="log":")\d{4}[\/-]\d{1,2}[\/-]\d{1,2}[ T]\d{2}:\d{2}:\d{2}(?!\.).*?)(?<!\\)".*(?<stream>(?<="stream":").*?)".*(?<time>\d{4}-\d{1,2}-\d{1,2}T\d{2}:\d{2}:\d{2}\.\w*).*(?=})
        Time_Key            time
        Time_Format         %Y-%m-%dT%H:%M:%S.%LZ

Take note of the [INPUT] section in application-log.conf. Notice the following line:

Path    /var/log/containers/*

This standard setup uses a wildcard pattern to match every container log file under /var/log/containers/, a host directory mounted into the FluentBit agent. In the next section we modify the Path field to choose which containers are logged; the Exclude_Path property can be used in the same way to exclude namespaces or pods.
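
The kubernetes filter in the full example also sets K8S-Logging.Exclude On. With that option, a pod can opt out of logging through the fluentbit.io/exclude annotation, without any change to the FluentBit ConfigMap. A minimal sketch, in which the Deployment name, labels, and image are hypothetical:

```yaml
# Hypothetical Deployment; only the annotation on the pod template matters here.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: noisy-app
spec:
  selector:
    matchLabels:
      app: noisy-app
  template:
    metadata:
      labels:
        app: noisy-app
      annotations:
        # Honored by the kubernetes filter when K8S-Logging.Exclude is On
        fluentbit.io/exclude: "true"
    spec:
      containers:
        - name: noisy-app
          image: example.com/noisy-app:latest   # placeholder image
```

The log file is still tailed, but the kubernetes filter drops its records before they reach the CloudWatch output.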

Excluding containers

The easiest way to exclude everything and include only the containers you want is to change the Path property in your input configuration to a comma-separated list of patterns for the log files that should enter the logging pipeline. Files under /var/log/containers/ are named <pod>_<namespace>_<container>-<container-id>.log, so the patterns need wildcards around the container name. Here is an example:

application-log.conf: |
  [INPUT]
      Name                tail
      Tag                 application.*
      Exclude_Path        /var/log/containers/cloudwatch-agent*, /var/log/containers/fluent-bit*, /var/log/containers/aws-node*, /var/log/containers/kube-proxy*
      Path                /var/log/containers/*_my-container-*.log, /var/log/containers/*_my-other-container-*.log
      Docker_Mode         On
      Docker_Mode_Flush   5
      Docker_Mode_Parser  container_firstline
      Parser              docker
      DB                  /var/fluent-bit/state/flb_container.db
      Mem_Buf_Limit       50MB
      Skip_Long_Lines     On
      Refresh_Interval    10
      Rotate_Wait         30
      storage.type        filesystem
      Read_from_Head      ${READ_FROM_HEAD}

Note that the Path field now matches only the containers we want. Log files from every other container are never read, so they never reach CloudWatch.
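
The opposite approach keeps the wildcard Path and extends Exclude_Path instead. Container log file names under /var/log/containers/ follow the form <pod>_<namespace>_<container>-<container-id>.log, so a whole namespace can be excluded with a pattern on the middle segment. A minimal sketch, where noisy-namespace is a placeholder and the remaining [INPUT] keys from the full example are omitted for brevity:

```yaml
application-log.conf: |
  # The last Exclude_Path entry skips every container in the placeholder namespace
  [INPUT]
      Name          tail
      Tag           application.*
      Path          /var/log/containers/*
      Exclude_Path  /var/log/containers/cloudwatch-agent*, /var/log/containers/fluent-bit*, /var/log/containers/aws-node*, /var/log/containers/kube-proxy*, /var/log/containers/*_noisy-namespace_*.log
```

Excluded files are skipped at the input, so they cost nothing in filtering or in CloudWatch ingestion.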