PERFORMANCE FORECASTING FOR WEB SERVICES IN CLOUD ENVIRONMENT

Clément Cassé

Abstract

Cloud Computing has changed how software is developed and deployed. Cloud applications are now designed as rapidly evolving distributed systems hosted in third-party data centres and potentially scattered around the globe. This paradigm shift has also had a considerable impact on how software is monitored: Cloud applications have grown to hundreds of services, and state-of-the-art monitoring tools quickly ran into scaling issues. In addition, monitoring tools now have to address failures specific to distributed systems, such as partial failures, configuration inconsistencies, networking bottlenecks and noisy neighbours. In this thesis, we present an approach based on a new source of telemetry that has been gaining ground in Cloud application monitoring. Leveraging the recent OpenTelemetry standard, we present a system that converts “distributed tracing” data into a hierarchical property graph. Such a model makes it possible to highlight the actual topology of a Cloud application, such as the physical distribution of its workloads across multiple data centres. The goal of this model is to expose the behaviour of Cloud providers to the developers who maintain and optimize their applications. We then show how this model can be used to address two prominent distributed systems challenges: detecting inefficient communications and anticipating hot points in a network of services. We tackle both problems with a graph-theory approach. Inefficient compositions of services are detected by computing the Flow Hierarchy index; a Proof of Concept is presented, based on a real OpenTelemetry instrumentation of a zonal Kubernetes cluster. In the last part, we address hot point detection in a network of services from the perspective of graph centrality analysis. This work is supported by a simulation program that has been instrumented with OpenTelemetry in order to emit tracing data. These traces were converted into a hierarchical property graph, and a study of centrality algorithms made it possible to identify choke points. Both approaches presented in this thesis comply with state-of-the-art Cloud application monitoring. They propose a new use of Distributed Tracing, not only for investigation and debugging but also for automatic detection and reaction across a full system.
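As an illustration of the Flow Hierarchy idea mentioned above, the following is a minimal sketch, not the implementation used in the thesis. It assumes the common definition of flow hierarchy as the fraction of edges that lie on no directed cycle, and it reduces each span to three hypothetical fields (`span_id`, `parent_id`, `service`) rather than the full OpenTelemetry data model. A flat service graph stands in for the hierarchical property graph, and the value returned for an empty graph is a convention chosen here.

```python
from collections import defaultdict


def service_graph(spans):
    """Collapse spans into directed caller -> callee edges between services.

    Each span is a dict with hypothetical keys: span_id, parent_id, service.
    """
    by_id = {s["span_id"]: s for s in spans}
    edges = set()
    for s in spans:
        parent = by_id.get(s.get("parent_id"))
        if parent is not None and parent["service"] != s["service"]:
            edges.add((parent["service"], s["service"]))
    return edges


def strongly_connected_components(nodes, edges):
    """Kosaraju's algorithm; returns a dict mapping node -> component root."""
    fwd, rev = defaultdict(list), defaultdict(list)
    for u, v in edges:
        fwd[u].append(v)
        rev[v].append(u)
    # First pass: iterative DFS recording nodes in order of completion.
    order, seen = [], set()
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(fwd[start]))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(fwd[nxt])))
                    break
            else:
                order.append(node)
                stack.pop()
    # Second pass: walk the reversed graph in decreasing completion order.
    comp = {}
    for root in reversed(order):
        if root in comp:
            continue
        comp[root] = root
        todo = [root]
        while todo:
            node = todo.pop()
            for prev in rev[node]:
                if prev not in comp:
                    comp[prev] = root
                    todo.append(prev)
    return comp


def flow_hierarchy(edges):
    """Fraction of edges on no directed cycle (1.0 means fully hierarchical)."""
    edges = set(edges)
    if not edges:
        return 1.0  # convention chosen for this sketch
    nodes = sorted({n for edge in edges for n in edge})
    comp = strongly_connected_components(nodes, edges)
    # An edge lies on a cycle exactly when both ends share a component.
    acyclic = sum(1 for u, v in edges if comp[u] != comp[v])
    return acyclic / len(edges)
```

Under these assumptions, a value close to 1.0 suggests that calls flow one way through the services, while a lower value points to call cycles such as two services calling each other within the same request.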

Cloud Computing has transformed the way we develop and deploy software. Today, Cloud applications are designed as constantly evolving distributed systems, hosted in data centres and potentially even spread across the world. This paradigm shift has also had a considerable impact on how software is monitored: Cloud applications can comprise several hundred services, and monitoring tools quickly ran into scalability problems. Moreover, these monitoring tools must now also handle the failures and outages inherent to distributed systems, such as partial failures, inconsistent configurations, bottlenecks or even resource hogging by neighbouring workloads. In this thesis, we present an approach based on a new source of telemetry that has developed in the field of Cloud application monitoring. Building on the recent OpenTelemetry standard, we present a system that converts “distributed tracing” data into a hierarchical property graph. With this model, it becomes possible to highlight the topology of applications, including across several data centres. The goal of this model is therefore to expose the behaviour of Cloud service providers to the developers who maintain and optimize their applications. We then present the use of this model to address some of the major challenges of distributed systems: detecting inefficient communications between services and anticipating bottlenecks. We approach both problems with graph theory. Inefficient composition of services is detected by computing the flow hierarchy index. A Proof-of-Concept platform representing a zonal Kubernetes cluster with OpenTelemetry instrumentation is used to create and detect inefficient service compositions. In the last part, we address the detection of bottlenecks in a network of services through centrality analysis of the preceding hierarchical graph. This work relies on a simulation program that has also been instrumented with OpenTelemetry in order to emit tracing data. These traces were converted into a hierarchical property graph, and a study of centrality algorithms made it possible to identify the choke points. Both approaches presented in this thesis use and build on the state of the art in Cloud application monitoring. They propose a new use of “distributed tracing” data, not only for investigation and debugging but also for automatic detection and reaction on a real system.
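To make the centrality analysis more concrete, here is a minimal sketch under stated assumptions. The abstract does not say which centrality algorithms were studied, so betweenness centrality (computed with Brandes' algorithm) is used purely as one plausible example, and the flat service-to-service edge list is a hypothetical stand-in for the hierarchical property graph.

```python
from collections import defaultdict, deque


def betweenness_centrality(edges):
    """Unnormalised betweenness for an unweighted directed graph (Brandes)."""
    adj = defaultdict(list)
    nodes = set()
    for u, v in set(edges):  # drop duplicate edges so paths are not double counted
        adj[u].append(v)
        nodes.update((u, v))
    score = dict.fromkeys(nodes, 0.0)
    for source in nodes:
        # Breadth-first search counting shortest paths from `source`.
        dist = {source: 0}
        paths = defaultdict(float)
        paths[source] = 1.0
        preds = defaultdict(list)
        visited = []
        queue = deque([source])
        while queue:
            node = queue.popleft()
            visited.append(node)
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
                if dist[nxt] == dist[node] + 1:
                    paths[nxt] += paths[node]
                    preds[nxt].append(node)
        # Back-propagate pair dependencies, farthest nodes first.
        delta = defaultdict(float)
        for node in reversed(visited):
            for prev in preds[node]:
                delta[prev] += paths[prev] / paths[node] * (1.0 + delta[node])
            if node != source:
                score[node] += delta[node]
    return score


def choke_point(edges):
    """Return the service with the highest betweenness score."""
    score = betweenness_centrality(edges)
    return max(score, key=score.get)
```

With a hypothetical edge list in which `web` and `mobile` both call `api`, and `api` calls `db` and `cache`, every shortest path between a client and a backend passes through `api`, so `choke_point` returns `api`. Other centrality measures may rank services differently, which is presumably why a comparative study is needed.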

