Before talking about Grafana and what it is, we must understand what telemetry is. Telemetry is the automatic recording and transmission of data from remote or inaccessible sources to a system in a different location for monitoring and analysis.
We usually need to get information from parts of the infrastructure or of an application that we cannot normally access. For example, once a version has been deployed to production, we cannot debug it there.
The code is effectively inaccessible in production, which makes it difficult to understand which part of the code is slow or has problems.
Grafana is an open-source tool, originally released under the Apache 2.0 license and relicensed under AGPLv3 in 2021. It is used to visualize time series data: it collects series of data and generates a graphical overview of the situation. Grafana reads the information from data sources and graphs it.
It is difficult to monitor systems while they are running. So what we do with telemetry is place small snippets of code in our applications and websites that send data to a data source (the place where telemetry data is stored).
While the application or website is running in production, these snippets send small pieces of data to the data source: how long a command takes to connect to the database or to save files to disk, that is, how many milliseconds each operation takes. This is telemetry data; once it is stored in a data source, we can work on it and analyze it.
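As a minimal sketch of such a snippet, assuming Python and using a plain in-memory list as a hypothetical stand-in for a real data source, the code below times an operation and records the result together with a timestamp. The names `measure`, `telemetry_store`, and `connect_to_database` are illustrative only; a real application would send each point to its data source instead of appending it to a list.

```python
import time
from contextlib import contextmanager
from datetime import datetime, timezone

# Hypothetical in-memory stand-in for a real data source
# (for example, a time series database); each entry is one telemetry point.
telemetry_store: list[dict] = []


@contextmanager
def measure(operation: str):
    """Time the wrapped block and record its duration in milliseconds,
    even if the block raises an exception."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        telemetry_store.append({
            "timestamp": datetime.now(timezone.utc),  # when the sample was taken
            "operation": operation,
            "duration_ms": elapsed_ms,
        })


def connect_to_database():
    """Hypothetical placeholder for a real connection call; sleeps to simulate latency."""
    time.sleep(0.05)


with measure("db_connect"):
    connect_to_database()

print(telemetry_store[-1]["operation"], round(telemetry_store[-1]["duration_ms"]), "ms")
```

Using a context manager keeps the measurement code out of the business logic: any block wrapped in `with measure(...)` produces one timestamped point.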
An example of telemetry data is the average time it takes to connect to a database over time. We can collect this data to check whether the database connection is what slows down the application.
If it is, we can make improvements there.
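To show what "average time over time" can mean in practice, here is a small sketch that groups hypothetical connection-time samples by hour and averages each group. The sample values and the hourly window are made-up assumptions; a time series database or a Grafana query would normally perform this aggregation.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical samples: (timestamp, database connection time in milliseconds).
samples = [
    (datetime(2020, 1, 1, 10, 5), 40.0),
    (datetime(2020, 1, 1, 10, 40), 60.0),
    (datetime(2020, 1, 1, 11, 10), 120.0),
    (datetime(2020, 1, 1, 11, 50), 140.0),
]


def average_per_hour(points):
    """Group samples by the hour they fall in and return {hour_start: mean duration in ms}."""
    buckets = defaultdict(list)
    for ts, duration_ms in points:
        hour_start = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour_start].append(duration_ms)
    return {hour: mean(values) for hour, values in sorted(buckets.items())}


for hour, avg in average_per_hour(samples).items():
    print(hour.strftime("%H:%M"), avg)
```

With these made-up numbers the hourly average rises from 50 ms to 130 ms, the kind of trend that would make the database connection a suspect.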
Another example: after refactoring the code and putting it back into production, we can compare the data from before the refactoring with the data from after it to see how much performance improved.
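A before/after comparison can be as simple as comparing the mean duration of the same operation in both periods. The sketch below assumes hypothetical measurements and uses the mean only for simplicity; in practice, percentiles are often a more robust comparison because a few slow requests can skew an average.

```python
from statistics import mean

# Hypothetical durations (ms) of the same operation, recorded
# before and after the refactored code went to production.
before_ms = [220.0, 240.0, 230.0]
after_ms = [150.0, 160.0, 170.0]


def improvement_percent(before, after):
    """Percentage reduction of the mean duration (positive means faster)."""
    old, new = mean(before), mean(after)
    return (old - new) / old * 100


print(f"{improvement_percent(before_ms, after_ms):.1f}% faster")
```

With these sample values the mean drops from 230 ms to 160 ms, so the function reports roughly a 30% improvement.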
Present: The first challenge is that organizations today rely more and more on metric data because systems are now larger.
Systems are now usually built on microservices architectures, and the infrastructure is moving to the cloud; in the near future, practically all services will be aiming for a cloud solution. In this setting, extracting metric data is the basis for performance analysis and for indicators of any kind.
The second big challenge is that companies must collect telemetry data from different data sources and merge it for the data to make sense.
Generally, looking at one set of data in isolation won’t help you much unless you combine it with data from other parts of the business.
For example, take the calls a web service receives from customers. You would collect these requests together with, say, the number of hours spent on some kind of operation within your site or application. Then you can look at both and ask: “How can we improve the application to reduce the number of calls we receive from customers? How can we reduce the number of complaints?”
Suppose we collect data from our code in the production system to answer the question: how many milliseconds does a process take to run? We also want to know whether the infrastructure is what causes this slowness. Imagine that the application runs on AWS and its performance needs to be measured.
Is it the network? Do we need to scale? Do we monitor the infrastructure?
Clearly, you must merge the data collected from the infrastructure with the data you get from your application. In this case, the metric data resides in two different data sources: one fed by your application and one provided by AWS.
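The merge step can be sketched as joining two series on their timestamps. In this minimal example, the two dictionaries are hypothetical stand-ins for data fetched from the application's data source and from AWS; real samples rarely share exact timestamps, so they would usually be aligned to common intervals before joining.

```python
from datetime import datetime

# Hypothetical series from two different data sources, keyed by timestamp:
# application response times (ms) and infrastructure CPU usage (%).
app_latency_ms = {
    datetime(2020, 1, 1, 10, 0): 120.0,
    datetime(2020, 1, 1, 10, 1): 480.0,
    datetime(2020, 1, 1, 10, 2): 130.0,
}
infra_cpu_percent = {
    datetime(2020, 1, 1, 10, 0): 35.0,
    datetime(2020, 1, 1, 10, 1): 97.0,
    datetime(2020, 1, 1, 10, 3): 40.0,
}


def merge_on_timestamp(left, right):
    """Inner-join two series: keep only timestamps present in both, in time order."""
    common = sorted(left.keys() & right.keys())
    return [(ts, left[ts], right[ts]) for ts in common]


for ts, latency, cpu in merge_on_timestamp(app_latency_ms, infra_cpu_percent):
    print(ts.strftime("%H:%M"), f"latency={latency}ms", f"cpu={cpu}%")
```

Only 10:00 and 10:01 appear in both series. In this made-up data, the latency spike at 10:01 coincides with high CPU usage, which would point toward the infrastructure rather than the code.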
Grafana can be used in the cloud without installing it, or you can install it on your own servers.
Officially supported data sources include, among many others:
- AWS CloudWatch
- Azure Monitor
- Google Stackdriver
- Microsoft SQL Server (MSSQL)
Grafana displays time series data. Time series data is a type of telemetry data in which every value must have a date or time attached, for example, 01/01/2020 23:21. Without these timestamps the data cannot be visualized, since visualizing here means seeing the data over time and graphing it for later analysis. Grafana is very good at fetching data from different data sources and putting it on a dashboard so you can understand what is going on.
You can also define alerts: for example, when the number of requests to a web service is excessive, or when the number of exceptions in the code exceeds a certain threshold, you can define rules that generate alerts. The alerts can then be sent to a wide range of channels.
For example, alerts can be sent by email.
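At its core, an alert rule compares a metric against a threshold and notifies a channel when the condition holds. The sketch below illustrates only that idea; `AlertRule`, `notify`, and the metric value are hypothetical, not Grafana's alerting API, which evaluates queries on a schedule and delivers notifications through its own configured channels.

```python
from dataclasses import dataclass


@dataclass
class AlertRule:
    """Hypothetical rule: fire when the latest value exceeds a threshold."""
    name: str
    threshold: float

    def evaluate(self, latest_value: float) -> bool:
        # The condition holds only when the value is strictly above the threshold.
        return latest_value > self.threshold


def notify(channel: str, message: str) -> str:
    """Stand-in for sending to a channel such as email; here it only formats the text."""
    return f"[{channel}] {message}"


exceptions_rule = AlertRule(name="too_many_exceptions", threshold=100)
exceptions_last_5_min = 142  # made-up metric value

if exceptions_rule.evaluate(exceptions_last_5_min):
    print(notify("email", f"{exceptions_rule.name}: {exceptions_last_5_min} > {exceptions_rule.threshold}"))
```

Keeping the rule separate from the notification function mirrors the split the text describes: the same rule can feed email or any other channel.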
Grafana allows you to query, visualize, alert on and understand your metrics no matter where they are stored. Create, explore, and share dashboards with your team and foster a data-driven culture:
Visualize: Fast and flexible client-side graphs with a multitude of options. Panel plugins for many different ways to visualize metrics and logs.
Dynamic Dashboards: Create dynamic & reusable dashboards with template variables that appear as dropdowns at the top of the dashboard.
Explore Metrics: Explore your data through ad-hoc queries and dynamic drilldown. Split view and compare different time ranges, queries and data sources side by side.
Explore Logs: Experience the magic of switching from metrics to logs with preserved label filters. Quickly search through all your logs or stream them live.
Alerting: Visually define alert rules for your most important metrics. Grafana will continuously evaluate and send notifications to systems like Slack, PagerDuty, VictorOps, OpsGenie.
Mixed Data Sources: Mix different data sources in the same graph! You can specify a data source on a per-query basis. This works for even custom data sources.