
Introduction, Conclusion: improvements according to feedback from Max

master
Jack Henschel 2 years ago
parent 4514668cff
commit 84fac4d036
  1. include/01-introduction.md (6 changes)
  2. include/06-conclusion.md (31 changes)
  3. thesis.tex (15 changes)

@@ -5,18 +5,18 @@
Cloud computing is a computing paradigm that allows users to access compute, storage and network resources on-demand over the Internet.
Resources can be allocated and released whenever needed, thus *elasticity* is one of the most prominent features of cloud computing.
Around the same time that this computing paradigm became widely used, the *microservices* software architecture also became popular.
In fact, these two trends are correlated, since cloud infrastructure facilitates the development of distributed microservice architectures.
To take full advantage of the elasticity in cloud computing, *autoscaling* (sometimes also referred to as *adaptive scaling*) needs to be implemented.
It is a technique to automatically scale the application (and the services it is composed of) based on the current demand.
In general, *scaling* refers to acquiring and releasing resources while maintaining a certain application performance level, such as response time or throughput.
Scaling an application can be achieved in two ways: *horizontal scaling* and *vertical scaling*.
Horizontal scaling, also referred to as *scaling out*, refers to creating more instances of the same service.
The workload is then distributed across all instances (*load balancing*), resulting in a lower workload per instance.
An example is adjusting the number of web servers according to the number of visitors to the website.
Vertical scaling, also referred to as *scaling up*, refers to giving more resources (compute, memory, network, storage) to a particular instance of the service.
By giving more resources to one or multiple instances, they are able to handle more workload.
An example for this is providing more memory resources to a database instance so that it can fit more data into memory, instead of having to load the data from disk.
Due to the generally higher cost of cloud infrastructure, it is vital to take advantage of its elasticity and implement autoscaling.
This autoscaling needs to provision and release cloud resources without human intervention.
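
As an informal illustration of reactive horizontal scaling, the short Go sketch below computes a desired replica count from a measured and a target utilization; it resembles the proportional rule used by common horizontal autoscalers, but the function and parameter names are illustrative assumptions rather than code from any existing component.

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas applies a proportional scaling rule: scale the current
// replica count by the ratio of measured to target utilization, bounded by
// minReplicas and maxReplicas. All names here are illustrative.
func desiredReplicas(current int, measured, target float64, minReplicas, maxReplicas int) int {
	d := int(math.Ceil(float64(current) * measured / target))
	if d < minReplicas {
		d = minReplicas
	}
	if d > maxReplicas {
		d = maxReplicas
	}
	return d
}

func main() {
	// 3 web servers at 90% average utilization with a 60% target: scale out to 5 replicas.
	fmt.Println(desiredReplicas(3, 0.9, 0.6, 1, 10))
}
```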

@@ -1,45 +1,45 @@
# Summary and Future Work {#conclusion}
This thesis tackled the questions of how to effectively dimension cloud-native applications and how to assess their performance.
While modern cloud platforms allow developers to allocate arbitrary amounts of resources, operating a production-grade service on Kubernetes requires deep insights into the performance behavior of the application \cite{ReadyRainViewSPEC_2016}.
Once this question had been answered, we moved on to the challenge of scaling our application based on the current workload, i.e., *autoscaling*.
In Chapter \ref{background} we explored the background of Kubernetes and the concepts it is based on: cloud computing, containers and microservices.
We found that there is a significant correlation between the rise in popularity of these technologies, and that Kubernetes is the logical conclusion of this development.
Additionally, the chapter also gave a brief introduction to the Kubernetes concepts relevant for scaling.
Chapter \ref{autoscaling} gave an overview of the available literature on the subject of autoscaling applications in the cloud.
While there have been numerous articles and surveys about VM- and container-based autoscaling, only recently have researchers started investigating Kubernetes specifically.
Then, the chapter provided a deep-dive on the algorithms and technical architecture of publicly available autoscaling components for Kubernetes (HPA, VPA, CA, KEDA).
Finally, a survey of research proposals for novel Kubernetes autoscalers was conducted and the proposals were evaluated qualitatively.
The research makes it clear that proactive autoscaling (i.e., scaling not only based on current load, but based on future predicted load) is beneficial for aggressive scaling.
However, this leads to more complex algorithms (which require more time to train and potentially large amounts of data) as well as system behavior that is more opaque to cluster operators.
Thus, these two aspects need to be balanced.
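
As a minimal sketch of what such a prediction step can look like (and not a reproduction of any surveyed algorithm), the Go snippet below extrapolates recent load samples one step ahead; its forecast could then feed a proportional scaling rule instead of the current measurement.

```go
package main

import "fmt"

// forecastLoad extrapolates the most recent load trend one step ahead.
// It is a deliberately simple stand-in for the prediction models
// (time-series analysis, machine learning, etc.) proposed in the literature.
func forecastLoad(samples []float64) float64 {
	n := len(samples)
	if n == 0 {
		return 0
	}
	if n == 1 {
		return samples[0]
	}
	predicted := samples[n-1] + (samples[n-1] - samples[n-2])
	if predicted < 0 {
		predicted = 0
	}
	return predicted
}

func main() {
	// Requests per second have been rising over the last three intervals,
	// so the forecast suggests scaling out before the load actually arrives.
	fmt.Println(forecastLoad([]float64{100, 140, 180})) // prints 220
}
```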
No conclusion has been reached about whether a service should be scaled based on low-level (e.g., CPU and memory utilization) or high-level metrics (e.g., response time).
<!-- While several articles found that high-level metrics work much better, Rzadca et al.\ \cite{AutopilotWorkloadAutoscalingGoogle_2020} advise against directly optimizing application metrics based on their experience at Google. -->
Ultimately, the choice of scaling metrics depends on the development context and application usage scenario.
For this reason, we outline the steps necessary to expose and identify metrics relevant for scaling an application running on Kubernetes in Chapter \ref{implementation}.
Unfortunately, none of the reviewed articles has a publicly available implementation.
This is problematic because it prevents us from evaluating the technical soundness of the implementation and its integration into Kubernetes.
In the end, not only are the underlying algorithms important when setting up a production-grade system, but also how operators need to configure and interact with it.
We believe there is a gap between industry and academia here:
researchers want to test and evaluate their novel algorithms for autoscaling without having to worry about the integration with other Kubernetes components, which can be a challenging task.
Industry practitioners value reliability and interoperability: Kubernetes has been designed from the ground up with a strong focus on reusable APIs, thus a new autoscaling component should also use -- as well as expose -- these APIs.
We propose a modular autoscaling component for Kubernetes that combines these two interests.
The autoscaling component should be implemented in Go (like all other Kubernetes components), fetch metrics from the Metrics API or an external monitoring system (e.g., Prometheus), and be configured with Kubernetes CRD objects.
This will satisfy the requirements of cluster operators.
Then, the autoscaling component passes the metrics and configuration on to a WebAssembly sandbox, which runs the core autoscaling algorithm and returns scaling results.
Afterwards, the autoscaling component will perform the actions described by the results of the algorithm (e.g., increase the number of replicas to N).
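
A hedged sketch of how this control loop could be structured in Go is given below; the type and interface names (MetricsSnapshot, ScalingAlgorithm, and so on) are our own illustrative assumptions, and the WebAssembly runtime is hidden behind an interface instead of being called directly.

```go
package autoscaler

import (
	"context"
	"time"
)

// MetricsSnapshot bundles the inputs passed to the sandboxed algorithm.
// In a real controller these values would come from the Kubernetes Metrics
// API or an external monitoring system such as Prometheus.
type MetricsSnapshot struct {
	QueueLength     int
	CPUUtilization  float64
	CurrentReplicas int
}

// ScalingDecision is the output returned by the algorithm.
type ScalingDecision struct {
	DesiredReplicas int
}

// ScalingAlgorithm abstracts the WebAssembly sandbox: the host passes plain
// input data in and receives a decision back, so the module never needs to
// talk to the Kubernetes API itself.
type ScalingAlgorithm interface {
	Recommend(ctx context.Context, m MetricsSnapshot) (ScalingDecision, error)
}

// reconcile is one iteration of the controller loop: fetch metrics, ask the
// sandboxed algorithm for a decision, and apply it to the target workload.
func reconcile(ctx context.Context, algo ScalingAlgorithm,
	fetch func(context.Context) (MetricsSnapshot, error),
	scale func(context.Context, int) error) error {
	m, err := fetch(ctx)
	if err != nil {
		return err
	}
	d, err := algo.Recommend(ctx, m)
	if err != nil {
		return err
	}
	return scale(ctx, d.DesiredReplicas)
}

// Run drives the loop on a fixed interval until the context is cancelled.
func Run(ctx context.Context, algo ScalingAlgorithm,
	fetch func(context.Context) (MetricsSnapshot, error),
	scale func(context.Context, int) error) {
	ticker := time.NewTicker(30 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			_ = reconcile(ctx, algo, fetch, scale) // a real controller would log errors here
		}
	}
}
```

Keeping the metric fetching and the scaling action behind callbacks means the sandboxed algorithm only ever sees plain input and output values, which is exactly the separation of concerns described above.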
WebAssembly^[<https://webassembly.org/>] is a portable binary instruction format which can be used as a compilation target for many programming languages.
Its main features are speed, memory safety and debuggability \cite{DifferentialFuzzingWebAssembly_2020}.
Running the core scaling algorithm inside a WebAssembly sandbox has several advantages for researchers:
* the algorithm can be implemented in any programming language (Python, JavaScript, Rust etc.) and then compiled to WebAssembly bytecode;
* the algorithm runs at near-native speed (faster than interpreted languages);
* the sandbox provides simple interfaces for input and output data (i.e., no need to interact with the complex and evolving Kubernetes API).
Cluster operators gain the following advantages from the WebAssembly sandbox:
@@ -53,13 +53,13 @@ Grafana was used as a visualization layer to get an intuition for the behavior a
The chapter then provided a reference point for which kinds of metrics should be collected by the monitoring system to give operators a complete picture of the application behavior:
low-level metrics (e.g., CPU and memory usage), high-level metrics (e.g., response time), platform-level metrics (e.g., Kubernetes pods), service-level metrics (e.g., message queue status) and application-level metrics (e.g., number of users).
Finally, we described how to identify metrics relevant for scaling and how to configure Kubernetes autoscaling components (HPA, VPA, KEDA) based on these metrics.
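
As an illustration of exposing a service-level metric, the following hedged Go snippet publishes a hypothetical queue-backlog gauge with the official Prometheus Go client; the metric name and the placeholder value are assumptions for illustration and are not taken from our target application.

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// queueBacklog is a hypothetical service-level metric: the number of jobs
// currently waiting in the work queue. Once Prometheus scrapes this endpoint,
// an autoscaler (e.g., KEDA, or the HPA via a metrics adapter) can act on it.
var queueBacklog = prometheus.NewGauge(prometheus.GaugeOpts{
	Name: "app_queue_backlog_jobs",
	Help: "Number of jobs waiting in the processing queue.",
})

func main() {
	prometheus.MustRegister(queueBacklog)

	// In a real service this value would be updated from the queue itself;
	// here it is set to an arbitrary placeholder.
	queueBacklog.Set(42)

	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```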
While the implementation we have shown is specific to the target application, the principles and methodologies can be applied to any cloud-native application.
Since we provided detailed documentation about our setup, industry professionals and researchers are able to replicate similar setups in their own environments.
In Chapter \ref{evaluation} we performed a quantitative evaluation of various autoscaling policies.
Our findings showed that the target application is able to achieve maximum performance with the autoscaling policies, while exhibiting only minor variance in performance.
At the same time, we were able to realize significant cost-savings due to downscaling during times of low load in our benchmark.
Despite the benchmark results being specific to our target application, other researchers and professionals can reuse the same benchmarking procedures for any queue-based cloud application.
Furthermore, the scaling optimizations we have discussed (delayed scale-down, overscaling etc.) are applicable to any system leveraging autoscaling.
In particular, the criteria for evaluating the performance (time-to-completion) and cost (replica seconds) dimensions are valuable for anyone carrying out performance-and-cost optimizations with container-based infrastructure.
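
To make the cost criterion concrete, the hedged Go sketch below shows one way to compute replica seconds by integrating the replica count over a recorded scaling timeline; the data representation is an assumption for illustration rather than the exact procedure used in our benchmarks.

```go
package main

import (
	"fmt"
	"time"
)

// replicaSample records how many replicas were running at a given instant.
type replicaSample struct {
	at       time.Time
	replicas int
}

// replicaSeconds integrates the replica count over time: each interval
// between two samples contributes (replicas during the interval) * (interval length).
func replicaSeconds(timeline []replicaSample) float64 {
	total := 0.0
	for i := 1; i < len(timeline); i++ {
		dt := timeline[i].at.Sub(timeline[i-1].at).Seconds()
		total += float64(timeline[i-1].replicas) * dt
	}
	return total
}

func main() {
	start := time.Now()
	// 2 replicas for 60 s, then 5 replicas for 30 s: 120 + 150 = 270 replica seconds.
	// The final sample only marks the end of the observation window.
	timeline := []replicaSample{
		{start, 2},
		{start.Add(60 * time.Second), 5},
		{start.Add(90 * time.Second), 0},
	}
	fmt.Println(replicaSeconds(timeline))
}
```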
@@ -72,3 +72,6 @@ Subsequently, such an architecture has the potential for even more elasticity.
Mohanty et al.\ \cite{EvaluationOpenSourceServerless_2018} published a survey about open-source event-driven platforms, which could be used as a basis for this future work.
More recently, the SPEC research group conducted a general review of use-cases for serverless architectures \cite{ReviewServerlessUseCases_2020}.
<!-- For this comparison, the same performance and cost criteria should be used. -->
Overall, this thesis provided foundational and relevant knowledge on the topic of autoscaling for researchers and industry practitioners alike.
While not all software architectures and deployment models were discussed in our work (in particular stateful applications), the reader should have gained a good intuition for tackling the challenging task of dimensioning, optimizing and scaling their cloud-native applications.

@@ -445,10 +445,11 @@ TODO TODO TODO TODO TODO TODO TODO TODO TODO
%%
\begin{abstractpage}[english]
Cloud computing and OS-level virtualization in the form of containers have seen major adoption over the last decade.
Due to this, several container orchestration platforms were developed, with Kubernetes gaining a majority of the market share.
Applications running on Kubernetes are often developed with a microservices architecture.
This means that applications are split into loosely coupled services which can be distributed across many servers.
The distributed nature of this architecture poses significant challenges for the observability of application performance.
We investigate how such a cloud-native application can be monitored and dimensioned to ensure smooth operation.
Based on this fundamental work, we explore the topic of autoscaling for performance and cost optimization in Kubernetes.
Autoscaling refers to automatically adjusting the amount of allocated resources based on the application load.
@@ -472,15 +473,15 @@ TODO
\mysection{Preface}
First of all, I want to thank the SECCLO consortium for giving me the opportunity to participate in this exciting degree programme and study at two excellent universities.
During the course of this programme I was able to explore two countries, meet new friends and in general broaden my intellectual and cultural horizon.
Special thanks to \textbf{Eija Kujanpää}, \textbf{Laura Mursu}, \textbf{Anne Kiviharju} and \textbf{Gwenaëlle Le Stir} for their administrative support during this time.
I would like to thank \textbf{Mario Di Francesco}, my main academic supervisor at Aalto University, for his continuous, comprehensive and honest feedback about my thesis.
Also thanks to \textbf{Raja Appuswamy} for being my academic supervisor at EURECOM.
\textbf{Yacine Khettab}, my thesis instructor, gave helpful guidance while starting my work at Ericsson and proofread several drafts.
I am thankful to \textbf{Adam Peltoniemi}, my manager, who hired me at Ericsson and gave me enough time to work freely on my thesis.
I am grateful for the help of my colleagues at Ericsson, who provided inputs and guidance for setting up a test environment for experiments:
\textbf{Gábor Kapitány}, \textbf{Olli Salonen}, \textbf{Tomi Poutanen}, \textbf{Bálint Csatári} and \textbf{Jussi Tuomela}.
Finally, I would like to thank my editor, GNU Emacs, for allowing me to seamlessly develop, write and edit all aspects of the work covered in this thesis.
