Goals

Help users understand test behavior and analyze performance results using the metrics generated by VSPERF, and provide an alert-management solution that sends alert notifications to VSPERF.

Tasks

Week

Activity

Week 1 - Week 3


  • Understanding Prometheus, Alert Manager, and Grafana
  • Understanding Collectd, Collectd-Exporter, cAdvisor
  • Deployment of Monitoring Stack (containers)

Week 4 - Week 6


  • Creating, Configuring and Testing Alerts - BM
  • Creating, Configuring and Testing Alerts - OS
  • Alert Notification and Handling - BM
  • Alert Notification and Handling - OS
  • HA Deployment of Monitoring Stack
Week 7 - Week 9
  • Automated deployment using Ansible
  • Enhance the Alert solution for K8S data
  • Custom Dashboards for metrics visualization
Week 10 - Week 12
  • Client side automated deployment using Ansible
  • Release Complete Monitoring Solution
  • Custom Analytics - Causation, Trend/Pattern

Deliverables

  • Client-Side Ansible playbook
    • Deploy and Configure agents (collectd)
  • Server-Side Ansible playbooks
    • Deploy K8S Cluster
    • Deploy and configure PAG stack
  • Alerting Configuration
  • Jupyter Notebooks
    • Metrics Analysis
  • Visualization and alert management in OPNFV-Airship
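
The notebooks themselves are not reproduced on this page. As a rough illustration of the trend analysis listed under the tasks, here is a minimal Python sketch, assuming a metric is available as a list of (timestamp in seconds, value) pairs. The function name linear_trend and the sample values are hypothetical and are not taken from the project's notebooks.

```python
def linear_trend(samples):
    """Return the least-squares slope of (timestamp, value) samples.

    The result is in value units per second. A positive slope means the
    metric is growing over the sampled window.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    # Covariance of time and value, and variance of time (both unnormalized).
    num = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("timestamps must not all be equal")
    return num / den


# Hypothetical memory-usage samples (percent), one per minute.
MEMORY_SAMPLES = [(0, 10.0), (60, 16.0), (120, 22.0)]

if __name__ == "__main__":
    print(linear_trend(MEMORY_SAMPLES))
```

A slope like this could be compared against a threshold to flag a steadily growing metric before it reaches an alert limit; the actual analysis in the project may use a different method.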

Evaluation Criteria

  • 1st Evaluation (end of week 3): Understanding Prometheus, Alert Manager, and Grafana; understanding Collectd, Collectd-Exporter, and cAdvisor; deployment of the monitoring stack

  • 2nd Evaluation (end of week 6): Creating, configuring, and testing alerts - BM & OS; alert notification and handling - BM & OS; start of alert visualization

  • 3rd Evaluation (end of week 9): HA deployment of the monitoring solution; completion of alert visualization; enhancing the alert solution for K8S data

  • Final Evaluation (end of week 12): Custom analytics and release of the complete monitoring solution

Deliverables not Completed

  • Visualization and alert management in OPNFV airship (OS)
    • Unfortunately, the OPNFV-Airship deployments were not stable, and the LMA components of Airship crashed constantly.
    • The OPNFV-Airship team could not fix the issues.
    • Whenever the OPNFV-Airship team is ready and their deployments are stable, I would be more than willing to contribute.

Recommendation for Future Work

  • Complete the planned OPNFV-Airship (OpenStack) monitoring part
  • Closed Loop Automation for the complete logs and metrics analysis system

Code examples
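
The project's code is not reproduced on this page. As a stand-in for the alert notification and handling step, here is a minimal Python sketch, assuming Alertmanager delivers alerts through a webhook receiver and that the JSON body has already been parsed into a dict. The function name summarize_alerts, the alert name HighMemoryUsage, and the label values are hypothetical and are not taken from the project.

```python
def summarize_alerts(payload):
    """Return one summary line per firing alert in a webhook payload.

    The payload is expected to carry an "alerts" list whose entries have
    "status", "labels", and "annotations" fields. Resolved alerts are skipped.
    """
    lines = []
    for alert in payload.get("alerts", []):
        if alert.get("status") != "firing":
            continue
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        name = labels.get("alertname", "unknown")
        instance = labels.get("instance", "unknown")
        severity = labels.get("severity", "none")
        line = f"[{severity}] {name} on {instance}"
        summary = annotations.get("summary", "")
        if summary:
            line += f": {summary}"
        lines.append(line)
    return lines


# Hypothetical payload with one firing and one resolved alert.
SAMPLE_PAYLOAD = {
    "status": "firing",
    "alerts": [
        {
            "status": "firing",
            "labels": {
                "alertname": "HighMemoryUsage",
                "instance": "node1:9103",
                "severity": "warning",
            },
            "annotations": {"summary": "Memory usage above 90%"},
        },
        {
            "status": "resolved",
            "labels": {"alertname": "HighMemoryUsage", "instance": "node2:9103"},
            "annotations": {},
        },
    ],
}

if __name__ == "__main__":
    for line in summarize_alerts(SAMPLE_PAYLOAD):
        print(line)
```

A receiver built this way could forward the summary lines to whatever consumes the notifications; how VSPERF actually receives them is not described on this page.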

Results

  • HA setup for the PAG stack

  • Grafana Dashboard: OVS Stats (Avg. RX values Panel) 

  • Grafana Dashboard: Memory Panel

Insights Gained

  • Logging and monitoring are really important
  • Management tools and the functioning of an open-source organization
  • Code and contributions are recognized and will be used by many
  • People are there to help you

Presentation Slides


