
Accuracy of Elasticsearch facets

Introduction

Elasticsearch and Kibana are wonderful tools, but as with all tools, you need to know their limits. This article explains why you must be careful when reading their data and how to improve the situation by using an existing Elasticsearch feature.

The Problem

It all started with the analysis of an SSH brute-force attack coming from Vietnam. The attack was interesting because of the announced SSH client, "PuTTY-Local: Mar 19 2005 07:19:17", which really looks like a genuine PuTTY software version, whereas most attacks don't spoof their software version and reveal what they are using.

The Kibana dashboard was showing all the information needed to get a good idea of the attacks:

Screenshot from 2014-12-03 16:36:47

But when looking at the least used and most used passwords, something looked really strange:

Screenshot from 2014-12-02 08:59:41

For example, webmaster appears in both panels with different counts, which is not logical.

After adding a filter on this value, the result was a bit surprising:

Screenshot from 2014-12-02 09:08:59

Looking at the details of the events, it was obvious that this last result was correct. This SSH brute-force attack tried 10 different logins and always used the same dictionary of 23 passwords.

Toward a solution

So the panels showing the top passwords and the least seen passwords display incorrect data in some circumstances. They were set up in Kibana using the terms type.

This corresponds to a facets query in Elasticsearch. Here is the content of the query, with the filter removed for readability:

{
  "facets": {
    "terms": {
      "terms": {
        "field": "password.raw",
        "size": 10,
        "order": "count",
        "exclude": []
      }
    }
  }
}

So we have a simple request and it is not returning the correct data. The explanation of this problem can be found in Elasticsearch Issue #1305.

Adrien Grand explains there that an algorithm returning possibly inaccurate values was chosen to avoid a search that would be too memory intensive and network intensive. The default algorithm is mostly inaccurate when there are more distinct values than the number of values requested.
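To make the source of the inaccuracy more concrete, here is a minimal Python sketch of a sharded top-terms count. It is a simplified model and not Elasticsearch's actual implementation: the assumption is that each shard reports only its own top shard_size terms and that the partial lists are then merged. The function names, the two shards and the counts are all hypothetical.

```python
from collections import Counter


def sharded_top_terms(shards, size, shard_size=None):
    """Approximate top terms: each shard reports only its top `shard_size` terms."""
    if shard_size is None:
        shard_size = size  # assumed default: shards return as many terms as requested
    merged = Counter()
    for shard in shards:
        # Terms outside a shard's local top list are dropped before the merge.
        for term, count in Counter(shard).most_common(shard_size):
            merged[term] += count
    return merged.most_common(size)


def exact_top_terms(shards, size):
    """Reference result: count every term on every shard, then take the top."""
    total = Counter()
    for shard in shards:
        total.update(shard)
    return total.most_common(size)


# Hypothetical data: two shards holding password attempts.
shards = [
    ["admin"] * 6 + ["webmaster"] * 4 + ["root"] * 1,
    ["root"] * 5 + ["webmaster"] * 4 + ["admin"] * 1,
]

approx = sharded_top_terms(shards, size=2)
exact = exact_top_terms(shards, size=2)
wider = sharded_top_terms(shards, size=2, shard_size=3)

print(approx)  # admin is undercounted: its single hit on the second shard was dropped
print(exact)
print(wider)   # with shard_size covering all distinct terms, the merge is exact
```

In this toy model, a term that falls outside the local top list of one shard loses that shard's hits in the merged result, which is one way the same value can end up with different counts depending on the query.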

We can confirm that behavior in our case by asking for 30 values (out of the 23 different passwords we have):

Screenshot from 2014-12-02 09:30:30

The result is correct this time.

If we continue reading Adrien Grand's comment on the issue, we see that a shard_size parameter has been introduced to improve the accuracy of the algorithm.

So we can use this parameter to improve the accuracy of the queries. Patching this in Kibana is trivial:

diff --git a/src/vendor/elasticjs/elastic.js b/src/vendor/elasticjs/elastic.js
index ba9c8ee..8daa72a 100644
--- a/src/vendor/elasticjs/elastic.js
+++ b/src/vendor/elasticjs/elastic.js
@@ -3085,6 +3085,7 @@
}

facet[name].terms.size = facetSize;
+ facet[name].terms.shard_size = 10 * facetSize;
return this;
},

Here we simply choose a shard_size far larger than the number of elements requested in the query. We could also have used the special value 0 (or Integer.MAX_VALUE) for shard_size to get a perfect result. But in our test setup, Elasticsearch failed to honor the request with this parameter. Furthermore, the result was already correct:

Screenshot from 2014-12-02 10:10:10

This patch has been proposed to Elasticsearch as PR 2106.
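For readers who build such requests outside Kibana, the effect of the patch can be sketched in Python as follows. This is only an illustration: the helper name terms_facet and its shard_factor argument are hypothetical, and the body simply mirrors the query shown earlier with shard_size set to 10 times the requested size, as in the patch.

```python
import json


def terms_facet(field, size, shard_factor=10):
    """Build a terms facet body whose shard_size is a multiple of size.

    Hypothetical helper: it mirrors the Kibana patch, where
    shard_size = 10 * facetSize.
    """
    return {
        "facets": {
            "terms": {
                "terms": {
                    "field": field,
                    "size": size,
                    # Ask each shard for more terms than the final result needs.
                    "shard_size": shard_factor * size,
                    "order": "count",
                    "exclude": [],
                }
            }
        }
    }


body = terms_facet("password.raw", 10)
print(json.dumps(body, indent=2))
```

A multiplier is a compromise: a larger shard_size improves accuracy but makes each shard return more data.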

It was a small patch, but it fixed our dashboard: the values in the terms panels are now correct:

Screenshot from 2014-12-03 16:44:57

Eric Leblond

Éric Leblond is the co-founder and chief technology officer (CTO) at Stamus Networks. He sits on the board of directors at the Open Information Security Foundation (OISF). Éric has more than 15 years of experience as co-founder and technologist of cybersecurity software companies and is an active member of the security and open-source communities. He has worked on the development of Suricata, the open-source network threat detection engine, since 2009 and is part of the Netfilter Core team, responsible for the Linux kernel's firewall layer. Éric is a respected expert and speaker on all things network security. Éric resides in Escalles, France.
