SLO-aware IaC Automation Framework for Dynamic Cloud Deployment

Document Type

Master Thesis

License

CC-BY-NC-ND

Abstract

According to O’Reilly’s 2021 report [1], over 90% of companies worldwide use cloud computing, which underlines its critical role in the IT industry. To help developers manage cloud infrastructure, the Infrastructure as Code (IaC) paradigm has emerged [2], allowing infrastructure to be defined and maintained through code. Designing infrastructure, however, requires extensive expertise, and in most business scenarios it must adhere to constraints known as Service Level Objectives (SLOs). These SLOs impose limits on Service Level Indicators (SLIs) such as CPU usage, memory consumption, and uptime. This thesis explores two frameworks that aim to automate IaC creation while meeting defined SLOs, leveraging Large Language Models (LLMs) and statistical prediction methods. The first framework uses manually defined SLOs to guide the LLM in adjusting CPU and memory allocations, with the goal of reaching a target performance while predicting potential SLO violations. The second framework uses statistical methods to derive SLOs from observed metrics, which are then used to iteratively refine the IaC through an LLM in pursuit of the desired performance level. Both frameworks were evaluated against a baseline. In no case did adjusting the infrastructure for SLO compliance yield performance matching that of the baseline. The first framework, which relies on manually defined SLOs, still produced several SLO violations after three LLM adjustments, and its best result reached only 22% of the target throughput (131 RPS vs. 600 RPS). The second framework, based on metric-driven SLOs, achieved up to 79% of the target throughput (476 RPS vs. 600 RPS) without violating any SLOs after three LLM-guided code adjustments. However, this improvement came at the cost of higher average response times and a significant rise in failed requests. In addition, prompt design was observed to have a strong impact on the quality of the generated IaC.
When specific SLOs are provided for individual services, the LLM tends to overemphasize those services while neglecting the others. Information that initially appears helpful can quickly overwhelm the LLM and degrade output quality. Nevertheless, the findings suggest that, building on these insights, both frameworks can be refined further to yield better results.