Collecting NetFlow, cheap and cheerful
In short: the author assembled a NetFlow/sFlow collector from GoFlow, Kafka, ClickHouse, Grafana and a bit of duct-tape code in Go.
NetFlow/IPFIX and sFlow come to the rescue: they generate rich traffic information directly on the network equipment. What remains is to store it somewhere and process it somehow.
Of the available NetFlow collectors, the following were considered:
flow-tools: I did not like storing data in files (queries are slow, especially ad hoc ones during incident response) or in MySQL (a table with billions of rows there looks like a rather bleak idea);
Elasticsearch + Logstash + Kibana: a very resource-hungry stack, up to 6 cores of an elderly 2.2 GHz CPU to ingest 5000 flows per second. However, Kibana lets you apply arbitrary filters in the browser, which is valuable;
vflow: I did not like the output format (JSON that cannot be loaded into that same Elasticsearch without modification);
boxed solutions: I did not like either the high price or the small difference from the option finally chosen.
The option finally chosen was described in Louis Poinsignon's presentation at RIPE 75. The general scheme of a simple collector is as follows:
GoFlow parses NetFlow/sFlow packets and puts them into a local Kafka in protobuf format. A self-written "shovel", goflow2ch, takes messages out of Kafka and moves them into ClickHouse in batches for better performance. The scheme does not address high availability, but each component has either built-in or reasonably simple external means of providing it.
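The batching step can be illustrated with a minimal sketch. This is not the code of goflow2ch: the FlowRow type, the Batcher type and the write callback are hypothetical names, and the callback merely stands in for a bulk INSERT into ClickHouse. Reading from Kafka is left out entirely.

```go
package main

import "fmt"

// FlowRow is a hypothetical, heavily reduced flow record; the real
// protobuf message produced by GoFlow carries many more fields.
type FlowRow struct {
	SrcAddr string
	DstAddr string
	Bytes   uint64
	Packets uint64
}

// Batcher buffers rows and passes them to write in batches.
// write is an assumption: it stands in for one bulk INSERT.
type Batcher struct {
	maxRows int
	write   func([]FlowRow) error
	buf     []FlowRow
}

func NewBatcher(maxRows int, write func([]FlowRow) error) *Batcher {
	return &Batcher{maxRows: maxRows, write: write}
}

// Add buffers one row and flushes once maxRows rows have accumulated.
func (b *Batcher) Add(row FlowRow) error {
	b.buf = append(b.buf, row)
	if len(b.buf) >= b.maxRows {
		return b.Flush()
	}
	return nil
}

// Flush writes out whatever is buffered. On error the rows stay in
// the buffer so that the caller can retry.
func (b *Batcher) Flush() error {
	if len(b.buf) == 0 {
		return nil
	}
	if err := b.write(b.buf); err != nil {
		return err
	}
	b.buf = nil
	return nil
}

func main() {
	var sizes []int
	b := NewBatcher(3, func(rows []FlowRow) error {
		sizes = append(sizes, len(rows)) // pretend this is the INSERT
		return nil
	})
	for i := 0; i < 7; i++ {
		if err := b.Add(FlowRow{Bytes: uint64(i)}); err != nil {
			panic(err)
		}
	}
	if err := b.Flush(); err != nil { // flush the incomplete last batch
		panic(err)
	}
	fmt.Println(sizes) // prints [3 3 1]
}
```

In a real "shovel" one would probably also call Flush on a timer, so that a quiet network does not leave rows stuck in the buffer, and commit Kafka offsets only after a successful write.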
Tests showed that parsing and saving the same 5000 flows per second costs about a quarter of a CPU core, and that disk usage averages 11-14 bytes per slightly truncated flow record.
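To get a feeling for what these figures mean per day, here is a small back-of-the-envelope calculation. It assumes, purely for illustration, that the rate of 5000 flows per second holds around the clock; the function name bytesPerDay is made up for this sketch.

```go
package main

import "fmt"

// bytesPerDay estimates the disk usage for one day of flows at a
// constant rate (the constant rate is an assumption of this sketch).
func bytesPerDay(flowsPerSecond, bytesPerFlow uint64) uint64 {
	const secondsPerDay = 24 * 60 * 60
	return flowsPerSecond * secondsPerDay * bytesPerFlow
}

func main() {
	// 11 and 14 bytes per flow are the bounds quoted in the text.
	for _, perFlow := range []uint64{11, 14} {
		b := bytesPerDay(5000, perFlow)
		fmt.Printf("%d bytes/flow: %.2f GB per day\n", perFlow, float64(b)/1e9)
	}
}
```

At these numbers a day of flows takes roughly 4.75 to 6.05 GB (decimal gigabytes).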
To display the information, either Tabix, a web UI for ClickHouse, or the ClickHouse plug-in for Grafana is used.
The advantages of the scheme:
the ability to ask arbitrary questions about the state of the network in an SQL dialect;
modest resource requirements and horizontal scalability. Old or slow processors and magnetic hard drives will do;
if necessary, it can grow into a full-fledged data pipeline for analyzing network events, including in real time with Kafka Streams, Flink or similar tools;
the ability to replace the storage with any other with minimal effort.
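As an illustration of "arbitrary questions in SQL", here is a sketch of a helper that builds a "top talkers" query. The table name flows and the columns SrcAddr, Bytes and TimeReceived are hypothetical; the actual schema depends on how the data is laid out in ClickHouse.

```go
package main

import "fmt"

// topTalkersQuery builds a ClickHouse query for the sources that sent
// the most bytes during the last `minutes` minutes. The table `flows`
// and its columns are hypothetical names, and TimeReceived is assumed
// to be a DateTime column.
func topTalkersQuery(minutes, limit int) string {
	return fmt.Sprintf(
		"SELECT SrcAddr, sum(Bytes) AS total_bytes "+
			"FROM flows "+
			"WHERE TimeReceived >= now() - INTERVAL %d MINUTE "+
			"GROUP BY SrcAddr "+
			"ORDER BY total_bytes DESC "+
			"LIMIT %d",
		minutes, limit)
}

func main() {
	fmt.Println(topTalkersQuery(10, 5))
}
```

The resulting string can be pasted into Tabix or used as a Grafana panel query; only integers are interpolated, so no quoting is needed.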
The drawbacks are also considerable:
to ask questions you need to know SQL and its ClickHouse dialect well; there are no ready-made reports or graphs;
a lot of new moving parts: Kafka, ZooKeeper and ClickHouse. The first two are written in Java, which some people reject on religious grounds. For me personally this was not a problem, since all of this was already in use in the organization anyway;
you have to write code: either a "shovel" that moves data from Kafka to ClickHouse, or an adapter for writing directly from GoFlow.
A few practical notes:
be sure to configure size-based rotation of the data in both Kafka and ClickHouse, and then check that it really works. Kafka has a limit on log size, and ClickHouse has partitioning by an arbitrary key. A new partition every hour and removal of stale partitions every 10 minutes work well for operational monitoring and take a script of just a few lines;
the "shovel" benefits from using consumer groups, which let you add more "shovels" for scaling and fault tolerance;
Kafka lets you avoid losing data when a "shovel" or ClickHouse goes down (for example, from a heavy query and/or badly limited resources), but it is of course better to configure the database carefully;
if you collect sFlow, remember that some switches change the packet sampling rate on the fly by default, and that the rate is specified per flow.
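The note about sFlow sampling can be made concrete with a short sketch: since the rate may differ from record to record, each record has to be scaled by its own rate rather than by one global constant. The SampledFlow type and its field names are hypothetical, and treating a rate of 0 as "unsampled" is an assumption of this example.

```go
package main

import "fmt"

// SampledFlow is a hypothetical record: the sampled counters plus the
// sampling rate that was in effect when the record was exported.
type SampledFlow struct {
	Bytes        uint64
	Packets      uint64
	SamplingRate uint64 // 1 packet in N was sampled; 0 means unknown
}

// EstimatedBytes scales the sampled byte count by the record's own rate.
func (f SampledFlow) EstimatedBytes() uint64 {
	rate := f.SamplingRate
	if rate == 0 {
		rate = 1 // assumption: treat an unknown rate as unsampled
	}
	return f.Bytes * rate
}

// totalEstimatedBytes sums the per-record estimates.
func totalEstimatedBytes(flows []SampledFlow) uint64 {
	var total uint64
	for _, f := range flows {
		total += f.EstimatedBytes()
	}
	return total
}

func main() {
	flows := []SampledFlow{
		{Bytes: 1500, Packets: 1, SamplingRate: 1024},
		{Bytes: 1500, Packets: 1, SamplingRate: 2048}, // the switch changed the rate
	}
	fmt.Println(totalEstimatedBytes(flows)) // prints 4608000
}
```

The same scaling can be done at query time instead, as long as the sampling rate is stored alongside every row.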
As a result, a tool for monitoring the network, both in near real time and in historical perspective, was built from open-source components and duct tape. Clunky as it is, it has already helped cut the time to resolve several incidents severalfold.
Please do not ask for the source code of the "shovel": it is surrounded by a certain atmosphere, secrecy included. If you do not want to write the code yourself, there is a roughly equivalent, inexpensive boxed solution at about $100/month for 10 Gbit of traffic; you can ask about it either in the comments or in a private message to me.