Scaling telemetry monitoring with InfluxDB

User expectations for software applications keep rising. Services are now expected to be highly reliable and to perform well around the clock. Any downtime frustrates users and hurts your business in the long term.

A key component in improving reliability is monitoring your application. While setting up basic monitoring is easy, scaling it efficiently as traffic to your service grows is a major challenge. You also want visibility into every important metric for your service, and you need to be able to query and analyze the data you collect in real time, on demand, so that it is useful and actionable.

In short, there’s a big difference between the problems you run into when throwing together something for a side project or small-scale system and those you face when deploying telemetry monitoring at scale in a production environment.

One team at Cisco experimented with InfluxDB to create an example of a scalable telemetry monitoring architecture that other companies with large-scale production environments could draw on, without having to start from scratch. This setup allowed Cisco to scale up its telemetry data ingestion to 3TB per day (or around 16GB per minute). At the core of this architecture are Cisco IOS-XR and InfluxDB.

Cisco telemetry monitoring architecture overview

There are three main components in Cisco’s telemetry architecture. The first is the Cisco hardware running IOS-XR, which produces the telemetry data. The second is the collector agent, which takes in that data and sends it to the third component, InfluxDB, for storage.
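
To make the collector's role concrete, here is a minimal sketch of the step in which a decoded telemetry sample is turned into InfluxDB line protocol, the text format InfluxDB accepts for writes. It is an illustration under assumptions, not Cisco's actual collector: the function names, the measurement name, the tag and field names, and the sample values are all hypothetical, and the sketch does not handle special float values such as NaN.

```python
from typing import Mapping, Union

FieldValue = Union[bool, int, float, str]


def escape_tag(text: str) -> str:
    """Escape commas, equals signs, and spaces in tag keys, tag values, and field keys."""
    return text.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def format_field(value: FieldValue) -> str:
    """Render one field value using line protocol type conventions."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry a trailing "i"
    if isinstance(value, float):
        return repr(value)
    # Strings are double-quoted, with backslashes and quotes escaped.
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line_protocol(
    measurement: str,
    tags: Mapping[str, str],
    fields: Mapping[str, FieldValue],
    timestamp_ns: int,
) -> str:
    """Build one line: measurement,tag=value,... field=value,... timestamp."""
    if not fields:
        raise ValueError("line protocol requires at least one field")
    name = measurement.replace(",", "\\,").replace(" ", "\\ ")
    # Tags are sorted by key here so the output is deterministic.
    tag_part = "".join(
        f",{escape_tag(key)}={escape_tag(val)}" for key, val in sorted(tags.items())
    )
    field_part = ",".join(
        f"{escape_tag(key)}={format_field(val)}" for key, val in fields.items()
    )
    return f"{name}{tag_part} {field_part} {timestamp_ns}"


if __name__ == "__main__":
    # Hypothetical sample as a collector might see it after decoding.
    print(
        to_line_protocol(
            "interface_counters",
            {"router": "asr9k-01", "interface": "HundredGigE0/0/0/1"},
            {"bytes_received": 1024, "in_errors": 0},
            1700000000000000000,
        )
    )
```

Run as a script, this prints interface_counters,interface=HundredGigE0/0/0/1,router=asr9k-01 bytes_received=1024i,in_errors=0i 1700000000000000000. A real collector would batch many such lines into each write request rather than sending them one at a time.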

Cisco IOS-XR

IOS-XR is the operating system Cisco uses for its high-end, carrier-grade routers, such as the CRS series, 12000 series, and ASR 9000 series. Compared to other network operating systems, IOS-XR provides improved availability, better scalability for large hardware configurations, the ability to install upgrades or patches while the router remains in service, and numerous other features not available in competing network operating systems.

