
Observability & Operations

Monitoring, telemetry, and feedback-driven systems for faster incident triage

Overview

Familiar with monitoring and telemetry patterns using Prometheus and Grafana, enabling feedback-driven systems and faster incident triage. This experience comes from both professional telecom operations at Huawei and personal exploration of modern observability stacks.

Prometheus, Grafana, SNMP, Python, Docker, Unix/Linux

Impact

~20% fewer critical faults
40+ runbooks standardized

Observability Stack

📊

Prometheus Metrics Collection

Infrastructure and application metrics collection with custom exporters. Time-series data used for capacity planning and anomaly alerting.
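To illustrate what a custom exporter involves, here is a minimal sketch using only the Python standard library. It renders a gauge in the Prometheus text exposition format and serves it on /metrics. The metric name link_utilization_ratio, the device and interface labels, the sample value, and port 9200 are all hypothetical; a production exporter would more likely use the official client library and poll real devices.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_gauge(name, help_text, samples):
    """Render one gauge in the Prometheus text exposition format.

    `samples` maps a tuple of (label, value) pairs to a numeric reading.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in sorted(samples.items()):
        label_str = ",".join(f'{k}="{v}"' for k, v in labels)
        series = f"{name}{{{label_str}}}" if label_str else name
        lines.append(f"{series} {value}")
    return "\n".join(lines) + "\n"


def read_link_utilization():
    # Hypothetical data source: a real exporter would poll devices here.
    return {(("device", "edge-router-1"), ("interface", "ge-0/0/1")): 0.42}


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_gauge(
            "link_utilization_ratio",
            "Fraction of link capacity in use.",
            read_link_utilization(),
        ).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=9200):
    # Blocks forever; Prometheus would scrape http://<host>:<port>/metrics.
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling serve() starts the endpoint. Keeping the rendering in a pure function makes the output format easy to check without running a server.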

📈

Grafana Dashboards

Real-time visualization dashboards for system health, service performance, and pipeline status. Custom panels for team-specific operational views.

🔔

Alert Management

Threshold-based and pattern-based alerting to catch degradation before it becomes an outage. Focus on actionable alerts, not noise.
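One common way to keep threshold alerts actionable is to require that a breach is sustained before firing, so a single spike does not page anyone. The sketch below shows that idea in plain Python; the class name, the 0.9 threshold, and the three-sample window are illustrative assumptions, not values from a real deployment, and it covers only the threshold side of the alerting described above.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SustainedThresholdAlert:
    """Fire only after `required` consecutive samples exceed `threshold`."""

    threshold: float
    required: int
    _recent: deque = field(init=False, repr=False)

    def __post_init__(self):
        # Keep only the most recent `required` breach flags.
        self._recent = deque(maxlen=self.required)

    def observe(self, value: float) -> bool:
        """Record one sample and report whether the alert should fire."""
        self._recent.append(value > self.threshold)
        return len(self._recent) == self.required and all(self._recent)


# Hypothetical utilization samples: one isolated spike, then a sustained breach.
alert = SustainedThresholdAlert(threshold=0.9, required=3)
fired = [alert.observe(v) for v in [0.95, 0.5, 0.95, 0.96, 0.97, 0.2]]
```

Here the alert fires only on the fifth sample, after three breaches in a row, and clears as soon as a sample drops back under the threshold.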

📋

Runbook Automation

Standardized incident response procedures for telecom operations. Reduced diagnosis time and ensured consistent handling of recurring issues.

At Huawei Ltd (Sep 2022 – Sep 2023)

Managed MPLS/VPN network operations with a proactive monitoring approach. Developed standard operating procedures and runbooks that contributed to ~20% fewer critical faults. Performed root cause analysis using SNMP polling and diagnostic scripts.
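As an illustration of the kind of calculation a diagnostic script built on SNMP polling performs, the sketch below turns two readings of an interface octet counter (for example ifInOctets, a 32-bit counter) into a link utilization percentage. The function names and sample numbers are hypothetical, and the SNMP fetch itself is left out; this is not the tooling used at Huawei.

```python
COUNTER32_MODULUS = 2 ** 32


def counter_delta(previous, current, modulus=COUNTER32_MODULUS):
    """Difference between two counter readings, allowing for a single wrap."""
    if current >= previous:
        return current - previous
    # The counter rolled over between polls.
    return current + modulus - previous


def utilization_percent(prev_octets, curr_octets, interval_s, link_bps):
    """Average utilization over the polling interval, as a percentage."""
    if interval_s <= 0 or link_bps <= 0:
        raise ValueError("interval_s and link_bps must be positive")
    bits = counter_delta(prev_octets, curr_octets) * 8
    return 100.0 * bits / (interval_s * link_bps)


# Hypothetical poll: 37.5 MB received in 300 s on a 10 Mbit/s link.
usage = utilization_percent(0, 37_500_000, 300, 10_000_000)
```

A device reboot also resets the counter and looks the same as a wrap, so a real script would typically cross-check the device uptime before trusting the delta.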

At KICS, UET (Oct 2018 – Apr 2019)

Early exposure to network automation support: developed troubleshooting utilities and contributed to runbook standardization for the institute's networking infrastructure.

Lesson Learned

Working in telecom operations taught me that observability is not optional; it is the foundation of reliable systems. You cannot improve what you cannot measure, and you cannot fix what you cannot see. This principle now guides how I build data pipelines and automation at Infineon.