Grafana

This is a follow-up post to my PV and Power Recording series. Before I continue with the next step, I need to take a side-step and write about something completely different that becomes vital later in the series. It is a recap of some old notes that I tried to scrape together into a post. It has lingered in my to-do list for one and a half years, and I thought it would be a good addition to make the series work as a whole.

Grafana, network statistics and TCP TIME_WAIT

I started collecting statistical data with Grafana in 2018. At first it was just some data with no real purpose, recorded for fun. Ever since, however, I have been fascinated by the data Linux provides. The kernel exposes a wide variety of information to work with. I had the idea to record the traffic statistics of my router. For a long time this was not possible, but since I installed OpenWrt on my router this has changed. With shell access I can make use of the data from the kernel.

Most of this data can be found within the /proc/net/dev file. Here is an example:

Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
  tun0: 4468821252 36751158    0    0    0     0          0         0 2089659707 23613660    0 6768    0     0       0          0
    lo: 18444634266 83603776    0    0    0     0          0         0 18444634266 83603776    0    0    0     0       0          0
 ens10: 2314024641 23239927    0    0    0     0          0         0 6064828263 37434243    0    0    0     0       0          0
  eth0: 8557466066 29895469    0    0    0     0          0         0 7514729065 28464243    0    0    0     0       0          0

For a different project, I had already installed Grafana. So, how do I get the data from my router to Grafana?

bash all the way

My first solution was fairly primitive. You're going to see how it developed over the course of time. To start, we log into the router and run cat on /proc/net/dev, trimming the output down to the interesting fields:

cat /proc/net/dev | grep wlp3s0 | awk '{print $2,$10}'

With that cat command you get the number of bytes received and transmitted on the interface wlp3s0. Wrap it in ssh from your laptop, and we have the data!1 What you get back are simple byte counters at that moment in time. This is a bit odd, because network traffic is usually measured in bits. Anyway, the next step is to format the value.
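
To make the "wrap it in ssh" part concrete, here is roughly what that looked like; the user and hostname are placeholders, not necessarily my actual setup:

ssh root@openwrt "cat /proc/net/dev" | grep wlp3s0 | awk '{print $2,$10}'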

Before I continue, I should mention that I no longer use InfluxDB. Instead, I'm using Graphite. Unlike InfluxDB, it seemed simpler for getting data into the storage. I also started running it as a Docker container, which provides everything you need:

docker run -d \
 --name graphite \
 --restart=always \
 -p 80:80 \
 -p 2003-2004:2003-2004 \
 -p 2023-2024:2023-2024 \
 -p 8125:8125/udp \
 -p 8126:8126 \
 graphiteapp/graphite-statsd

And to put some data into the service, we run:

echo "foo.bar 1 `date +%s`" | nc localhost 2003

To bring the data from /proc/net/dev into that format, we execute:

echo openwrt.network.wlp3s0.receive $(cat /proc/net/dev | grep wlp3s0 | awk '{print $2}') $(date +%s)
openwrt.network.wlp3s0.receive 10858659523 1618172800
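
The line above only prints the metric. To actually ship it, the same string can be piped into Graphite's plaintext port, just like the foo.bar example earlier (assuming Graphite is reachable on localhost:2003):

echo "openwrt.network.wlp3s0.receive $(cat /proc/net/dev | grep wlp3s0 | awk '{print $2}') $(date +%s)" | nc localhost 2003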

Back when I did this the first time, I used a script that looped over all the interfaces in that file, not just one. I've lost that part, but for this post I think this is enough.
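
For completeness, a rough reconstruction of such a loop could look like the following. This is a sketch, not the original script; the router hostname and the Graphite address are assumptions:

ssh root@openwrt cat /proc/net/dev | tail -n +3 | while read -r line; do
  # interface name is everything before the colon
  iface=$(echo "$line" | awk -F: '{print $1}' | tr -d ' ')
  # after the colon: field 1 is receive bytes, field 9 is transmit bytes
  rx=$(echo "$line" | awk -F: '{print $2}' | awk '{print $1}')
  tx=$(echo "$line" | awk -F: '{print $2}' | awk '{print $9}')
  ts=$(date +%s)
  echo "openwrt.network.$iface.receive $rx $ts"
  echo "openwrt.network.$iface.transmit $tx $ts"
done | nc localhost 2003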

Downsides of the bash approach

That approach had several disadvantages: each request took fairly long, because every connection required a new SSH handshake. This could be worked around with a persistent SSH connection, but the strain it put on the CPU was still considerable. The reason is that this is an OpenWrt router, and its MIPS CPU does not have much processing power to spare.
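
For reference, such a persistent connection can be set up with SSH connection multiplexing. This is only a sketch of the idea; the hostname is a placeholder:

# the first call opens a master connection that stays alive for 10 minutes
ssh -o ControlMaster=auto -o ControlPath=~/.ssh/cm-%r@%h:%p -o ControlPersist=10m \
    root@openwrt "cat /proc/net/dev"
# later calls with the same options reuse the connection and skip the handshake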

Also, the result was printed on my local system and had to be forwarded to the Graphite service afterwards. It was not fault tolerant at all…

In the next post we’re going to see some other problems too.

So far,
akendo


  1. Please forgive me for this terrible hack. Back then, in 2017, this seemed like a good idea. ↩︎