Improving Cloud Service Resilience using Brownout-Aware Load-Balancing

Research output: Chapter in Book/Report/Conference proceeding › Paper in conference proceeding

Abstract

We focus on improving the resilience of cloud services (e.g., e-commerce websites) when correlated or cascading failures lead to a shortage of computing capacity. We study how to extend the classical cloud service architecture, composed of a load-balancer and replicas, with a recently proposed self-adaptive paradigm called brownout. Services that follow this paradigm are able to reduce their capacity requirements by degrading user experience (e.g., disabling recommendations).
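To make the brownout idea concrete, the following is a minimal sketch of a replica that serves optional content (such as recommendations) only with some probability and lowers that probability when it is overloaded. The class name, the `dimmer` variable, the setpoint, the gain, and the simple integral-style update are illustrative assumptions; they are not taken from the paper or from its lighttpd implementation.

```python
import random


class BrownoutReplica:
    """Toy replica: optional content is served with probability `dimmer`.

    All names and numeric values are hypothetical, chosen for illustration.
    """

    def __init__(self, setpoint_s=1.0, gain=0.1, rng=None):
        self.setpoint_s = setpoint_s  # assumed target response time, seconds
        self.gain = gain              # assumed controller gain
        self.dimmer = 1.0             # 1.0 = always serve optional content
        self.rng = rng or random.Random(0)

    def serve_optional(self):
        """Decide whether this request gets the optional content."""
        return self.rng.random() < self.dimmer

    def update(self, measured_response_time_s):
        """Lower the dimmer when responses are slow, raise it when fast."""
        error = self.setpoint_s - measured_response_time_s
        # Clamp to [0, 1] so the dimmer remains a valid probability.
        self.dimmer = min(1.0, max(0.0, self.dimmer + self.gain * error))
        return self.dimmer
```

Because each replica steers its own response time toward the setpoint in this way, the response times seen from outside the replica say little about how loaded it actually is.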
Combining resilience with the brownout paradigm is to date an open practical problem. The issue is to ensure that replica self-adaptivity does not confuse the load-balancing algorithm, causing it to overload replicas that are already struggling with a capacity shortage. For example, load-balancing strategies based on response times cannot decide which replicas to select, since the response times are already controlled by the brownout paradigm.
In this paper, we propose two novel brownout-aware load-balancing algorithms. To test their practical applicability, we extended the popular lighttpd web server and load-balancer, thus obtaining a production-ready implementation. Experimental evaluation shows that the approach enables cloud services to remain responsive despite cascading failures. Moreover, when compared to Shortest Queue First (SQF), believed to be near-optimal in the non-adaptive case, our algorithms improve user experience by 5%, with high statistical significance, while preserving response time predictability.
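For reference, the SQF baseline mentioned above can be sketched as follows: each new request goes to the replica with the fewest requests currently in flight. This is a generic illustration of the SQF policy only; the class and method names are hypothetical, and it does not reproduce the brownout-aware algorithms proposed in the paper, which the abstract does not describe.

```python
class ShortestQueueFirst:
    """Minimal SQF dispatcher (illustrative names, not the lighttpd code)."""

    def __init__(self, replica_names):
        # Number of requests currently being processed by each replica.
        self.in_flight = {name: 0 for name in replica_names}

    def pick(self):
        """Choose the replica with the fewest in-flight requests.

        Ties are broken by the order in which replicas were registered.
        """
        name = min(self.in_flight, key=self.in_flight.get)
        self.in_flight[name] += 1
        return name

    def done(self, name):
        """Record that a request sent to `name` has completed."""
        self.in_flight[name] -= 1


lb = ShortestQueueFirst(["replica-a", "replica-b"])
first = lb.pick()    # both queues are empty, so the first registered wins
second = lb.pick()   # replica-a now has one request, so replica-b is chosen
lb.done(second)      # replica-b finishes its request
third = lb.pick()    # replica-b has the shorter queue again
```

SQF looks only at queue lengths, so it has no view of how much each replica has already degraded its content.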

Details

Research areas

Subject classification (UKÄ) – MANDATORY

  • Control Engineering
Original language: English
Title of host publication: [Host publication title missing]
Publisher: IEEE - Institute of Electrical and Electronics Engineers Inc.
Pages: 31-40
Number of pages: 10
Status: Published - 2014
Publication category: Research
Peer-reviewed: Yes
Event: 33rd IEEE International Symposium on Reliable Distributed Systems - Nara, Japan
Duration: 2014 Oct 7 → …

Conference

Conference: 33rd IEEE International Symposium on Reliable Distributed Systems
Country: Japan
City: Nara
Period: 2014/10/07 → …

Related research output

Dürango, J., 2016 Jun 14, Department of Automatic Control, Lund Institute of Technology, Lund University. 111 p.

Research output: Thesis › Licentiate Thesis


Related projects

Alessandro Vittorio Papadopoulos, Anders Robertsson, Johan Åkesson, Karl-Erik Årzén, Martina Maggio, Manfred Dellkrantz, William Tärneberg, Zheng Li, Jonas Dürango & Maria Kihl

2013/01/01 → 2016/12/31

Project: Research
