Setting up the ELK Stack on MALT

In the previous post we showed how to get started with the TICK stack using the MALT principles.  In this post we will look at another central principle of the MALT stack: the L, which stands for logging.  Logging is a critical need for any system; without logs it is hard to diagnose past events and understand key failures.  Logging, however, is not just application logs.  It also includes hit logs and access logs, which are essential for associating traffic with key application log events.  The metadata and context provided by both aid in diagnosing key problems.

There are several key platforms for viewing, aggregating, and managing logs beyond the flat text files often created.  In this post we will talk about and show how to set up the popular open source Elastic Stack, more commonly referred to as the ELK stack.  The company Elastic provides a suite of products, with the most popular being Elasticsearch and Kibana.  Elasticsearch, the E in the stack, is the main data store and query interface.  Log messages and their associated context are stored and indexed within Elasticsearch for fast retrieval and aggregation.  Kibana, the K in the stack, is the main visual interface.  Kibana queries Elasticsearch and portrays the results graphically, from a pure log viewer to custom dashboards built on the data.  The last portion of the stack is the L, which is Logstash, a collector that controls the flow of log data to and from servers.  Elastic provides several other similar products (such as the Beats shippers) as well.

The TICK stack and ELK stack follow very similar paradigms.  Logstash and Telegraf are both collectors used to gather and filter information from servers and publish it to varying stores.  Elasticsearch and InfluxDB are the data stores that index and serve the data efficiently.  Chronograf (as well as Grafana) and Kibana are the visual tools used to aggregate and display the data.

There is another similar acronym in the logging community known as ELK(K).  This stack adds Kafka to the mix, which is the main principle of MALT.  The remainder of this article will show how to set up an ELK(K) stack to collect various types of logs.

Step 1 : Setup Elasticsearch

The first step is getting an instance of Elasticsearch set up to use in collecting and aggregating our logs.  Elasticsearch acts as the underlying data store for log data.

Elastic provides a good set of Docker images to get started quickly.  For more information, see their excellent documentation.

mkdir elasticsearch
docker run -p 9200:9200 -v `pwd`/elasticsearch:/usr/share/elasticsearch/data elasticsearch:5.5.1

This will create an instance of Elasticsearch running locally.  To verify, request the base endpoint at http://localhost:9200/, as shown below.
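A quick check with curl should return a small JSON document describing the node (response trimmed here; your node name, cluster UUID, and build details will differ):

curl http://localhost:9200/
{
  "name" : "node-1",
  "cluster_name" : "elasticsearch",
  "version" : { "number" : "5.5.1" },
  "tagline" : "You Know, for Search"
}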

To set up Elasticsearch with Docker Compose use the following:

  elasticsearch:
    image: elasticsearch:5.5.1
    ports:
      - "9200:9200"
    volumes:
      - ./elasticsearch:/usr/share/elasticsearch/data

Note that this configuration creates a single Elasticsearch server.  In production you will want a cluster of at least three nodes so that a master can be elected safely and data can be replicated across the cluster.
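As a rough sketch of what that takes in Elasticsearch 5.x, each node's elasticsearch.yml would point at its peers (the cluster and host names below are hypothetical):

cluster.name: logging
node.name: es-node-1
discovery.zen.ping.unicast.hosts: ["es-node-1", "es-node-2", "es-node-3"]
discovery.zen.minimum_master_nodes: 2    # quorum for three master-eligible nodes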

Step 2 : Setup Kafka

Rather than repeating everything from the previous post on setting up Kafka, reference Step 2 of Setting up the TICK Stack on MALT. The cluster configuration for metrics streaming and log streaming is exactly the same. In fact, it is recommended to just utilize a single cluster for both systems.
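If your brokers are not configured to auto-create topics, you can create the log topic up front from one of the Kafka hosts.  A sketch, assuming the zookeeper service name from the previous post's compose file:

/opt/kafka/bin/kafka-topics.sh --create --zookeeper zookeeper:2181 --topic logs_my-app --partitions 3 --replication-factor 3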

Step 3 : Setup Logstash Publisher

The Logstash publisher will be used to consume logs off Kafka and push them into Elasticsearch.  This allows all logs to be centralized.  The Logstash publisher is just one consumer.  We could also easily set up other publishers to push logs to longer-term retention or archiving stores, such as S3 (a sketch of that follows).  At the end, I'll also demonstrate how to connect into Kafka to create a live tail of the application logs.
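For example, an archiving publisher could keep the same Kafka input and swap in Logstash's s3 output plugin.  A minimal sketch, assuming the bucket name and region are placeholders you would fill in and that the plugin can find AWS credentials:

output {
    s3 {
        region => "us-east-1"        # assumption: your AWS region
        bucket => "my-log-archive"   # assumption: your bucket name
        prefix => "logs/"
        codec  => "json_lines"
    }
}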

Once again, Elastic provides excellent Logstash Docker images and documentation.  Before we create the Docker container, we must create our configuration file for Logstash to use.  This will use a single input plugin for Kafka and a single output plugin for Elasticsearch.

input {
    kafka {
        bootstrap_servers => "kafka1:9092,kafka2:9092,kafka3:9092"
        topics_pattern => "logs_.+"
        codec => "json"
    }
}
filter {
    # no transformation needed; the agents send structured JSON
}
output {
    stdout { codec => rubydebug }  # debugging only; remove in production
    elasticsearch {
        hosts    => [ 'elasticsearch' ]
        user     => 'elastic'
        password => 'changeme'
    }
}
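One detail worth calling out: unless overridden, the elasticsearch output writes to daily logstash-YYYY.MM.dd indices, which is why we will use the logstash-* index pattern in Kibana later.  To make this explicit, you could set the index option yourself:

    elasticsearch {
        hosts    => [ 'elasticsearch' ]
        user     => 'elastic'
        password => 'changeme'
        index    => "logstash-%{+YYYY.MM.dd}"  # the default, shown explicitly
    }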

With this logstash.conf file, we can start the publisher with Docker.

docker run -v `pwd`/logstash.conf:/config-dir/logstash.conf logstash:5.5.1 logstash -f /config-dir/logstash.conf

Note that this will fail as written, since the logstash.conf file above references the Kafka hosts by their Docker Compose names.  To resolve this, simply change those names to the IP address of the Docker host, using ports 9092-9094 (the ports exposed on the localhost), as sketched below.
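For example, the kafka input's bootstrap_servers could be rewritten like this (the IP below is a placeholder for your Docker host's address; note the brokers must also advertise listeners reachable from the container):

        bootstrap_servers => "192.168.99.100:9092,192.168.99.100:9093,192.168.99.100:9094"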

To run within Docker Compose, use the following configuration:

  logstash_publisher:
    image: logstash:5.5.1
    volumes:
      - ./logstash-publisher.conf:/config-dir/logstash.conf
    command: "logstash -f /config-dir/logstash.conf"
    depends_on:
      - 'kafka1'
      - 'kafka2'
      - 'kafka3'
      - 'elasticsearch'

Step 4 : Setup Logstash Agent

This step will show one simple way of creating a Logstash agent that will stream logs into Kafka.  Note there are multiple ways of pushing data into Logstash.  This is a very rudimentary solution and I would encourage looking at better, more in-depth options, which I will demonstrate in a later post.  One of the preferred ways is creating a base Docker image that runs a Logstash server listening over TCP and having applications push directly to Logstash (e.g. Logback for Java).  This frees the application from requiring knowledge of Kafka and allows each application to maintain its own configuration; a rough sketch of that listener follows.
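A minimal sketch of that TCP approach, assuming the port is an agreed-upon convention and that applications send one JSON object per line (including an application field for topic routing):

input {
    tcp {
        port  => 5000          # assumption: an agreed-upon port for application logs
        codec => "json_lines"  # assumption: apps send newline-delimited JSON
    }
}
output {
    kafka {
        bootstrap_servers => "kafka1:9092,kafka2:9092,kafka3:9092"
        topic_id => "logs_%{application}"  # assumes the app includes an application field
        codec => "json"
    }
}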

In this example I am going to create a simple agent that tails a particular file and pushes new lines to Kafka.

input {
    file {
        path => "/var/log/my-app.log"
    }
}
filter {
    grok {
        match => [ "message", "%{TIMESTAMP_ISO8601:timestamp} %{GREEDYDATA:text}" ]
        add_field => [ "application", "my-app" ]
    }
    date {
        match => [ "timestamp", "ISO8601" ]
    }
}
output { 
    stdout { codec => rubydebug }  # debugging
    kafka { 
        bootstrap_servers => "kafka1:9092,kafka2:9092,kafka3:9092"
        topic_id => "logs_%{application}" 
        message_key => "%{host}" 
        codec => "json" 
    } 
}

The grok filter is used to split the message into actual metadata in order to create structured JSON.  The goal is to only push structured data through Kafka so that the publishers can be simple and push that structure directly into indexed fields in Elasticsearch.  Further, applying the grok filter within each application's agent keeps the publisher free of complicated per-application logic.
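Concretely, a log line such as 2017-08-15T10:00:00+0000 Testing 1 2 3 comes out of this pipeline as an event shaped roughly like the following (trimmed; Logstash adds other fields such as @version and path):

{
  "@timestamp": "2017-08-15T10:00:00.000Z",
  "host": "my-host",
  "application": "my-app",
  "timestamp": "2017-08-15T10:00:00+0000",
  "text": "Testing 1 2 3",
  "message": "2017-08-15T10:00:00+0000 Testing 1 2 3"
}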

The Kafka output also contains the topic to push to.  If you recall from the previous section on the Logstash publisher, we used the pattern logs_.+ in order to monitor any topic beginning with logs_.  Each application uses a separate topic for its logs to better separate traffic and scale the cluster.  Further, the message_key uses the name of the host so that partitions can be used to scale the Kafka cluster effectively.  This also helps ensure each host maintains its message order.
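Once messages are flowing, you can sanity-check the topic layout from one of the brokers (again assuming the zookeeper service name from the previous post):

/opt/kafka/bin/kafka-topics.sh --describe --topic logs_my-app --zookeeper zookeeper:2181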

Use the following to add this agent to the Docker Compose file.

  logstash_consumer:
    image: logstash:5.5.1
    volumes:
      - ./logstash-consumer.conf:/config-dir/logstash.conf
      - ./my-app.log:/var/log/my-app.log
    command: "logstash -f /config-dir/logstash.conf"
    depends_on:
      - 'kafka1'
      - 'kafka2'
      - 'kafka3'

Step 5 : Setup Kibana

Now that we have logs being collected from our system, routed through Kafka, and stored in Elasticsearch, we can set up a Kibana dashboard to showcase the data.

Elastic provides an out-of-the-box Docker image that makes it easy to create the Kibana instance.

docker run -p 5601:5601 kibana:5.5.1

This will create a Kibana instance running on port 5601 at http://localhost:5601.
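To confirm it is up, Kibana 5.x exposes a status endpoint you can curl.  Note that the stock image expects to reach Elasticsearch at http://elasticsearch:9200, so either run it on the same Docker network or point the ELASTICSEARCH_URL environment variable at your instance:

curl http://localhost:5601/api/status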

To plug this into Docker Compose use the following setup:

  kibana:
    image: kibana:5.5.1
    ports: 
      - 5601:5601
    depends_on:
      - 'elasticsearch'

Step 6 : Put It All Together

Now that we have all our required components in Docker Compose, simply run docker-compose up and the entire stack should start.  For more information, see the GitHub repository: https://github.com/malt-stack/elk-stack-with-docker
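In practice I run the stack detached and then verify all the services came up (the service names will match your compose file):

docker-compose up -d
docker-compose ps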

Once running, send a test message to the log file so that Logstash sends data into Elasticsearch:

echo `date +%Y-%m-%dT%H:%M:%S%z` Testing 1 2 3 >> my-app.log

Now open your browser to http://localhost:5601

This will launch Kibana, which opens on the Settings page so you can set up the index pattern.

Use logstash-* as the index name and select @timestamp as the time filter field.  Then click Create.

Now click Discover to show the latest logs.  This should show the Testing 1 2 3 message from above.  Send some more messages to verify connectivity.

echo `date +%Y-%m-%dT%H:%M:%S%z` Application working >> my-app.log

Make sure to refresh Kibana or turn Auto-Refresh on.  Now that data is flowing into Kibana, we can set up all sorts of dashboards, searches, and more.

While Kibana is a good tool for searching and creating dashboards, it is not a good live tail.  To create a simple live tail, use the Kafka console consumer tool.  Note that due to the network connectivity inside Docker, the easiest way to test this is to execute bash within one of the Kafka hosts:

docker exec -it elkstackonmalt_kafka1_1 bash

Now you can use the tool to listen for incoming messages:

/opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server kafka1:9092,kafka2:9092,kafka3:9092 --topic logs_my-app --from-beginning

Using the handy jq utility (assuming it is installed wherever you run the consumer), you can make use of the structured JSON in the stream, for example:

/opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server kafka1:9092,kafka2:9092,kafka3:9092 --topic logs_my-app --from-beginning | jq -r '.["@timestamp"] + " " + .["text"]'

Hopefully this helps you get started streaming logs through Kafka.  In a real production system, you'd also want to collect hit logs, such as reading the Tomcat or Apache access logs, or even load balancer logs via syslog or S3.  In future posts, I'll show other ways of not only consuming logs from applications but also publishing logs to other stores such as S3, Splunk, etc.  I'll also show how to tie alerts into Elasticsearch using ElastAlert.
