Real-Time Anomaly Detection Using Distributed Tracing in Microservice Cloud Applications

Mahsa Raeiszadeh, Amin Ebrahimzadeh, Ahsan Saleem, Roch Glitho, Johan Eker, Raquel Mini

Research output: Chapter in Book/Report/Conference proceeding › Paper in conference proceeding › Peer-reviewed


Abstract

Distributed tracing plays a vital role in microservice infrastructure, and learning-based trace analysis has been used to detect anomalies in such systems. However, existing learning-based approaches to trace anomaly detection have limitations. Some assume that trace patterns can be learned solely from normal executions, while others depend on anomaly injection to generate traces labeled as normal or anomalous. In practice, anomalies may also occur during normal execution, and the wide variety of anomalies that arise cannot be captured through anomaly injection alone. To address these issues, we propose a Trace-Driven Anomaly Detection (TDAD) approach based on a Span Causal Graph (SCG) representation, which trains a model using a Graph Neural Network (GNN) and Positive and Unlabeled (PU) learning. This technique allows the model parameters to be optimized by estimating the underlying data distribution. As a result, TDAD can be trained effectively using a small number of labeled anomalous traces along with a relatively large number of unlabeled traces. Our evaluation shows that TDAD outperforms existing unsupervised trace-based anomaly detection methods by 11.9% in F1-score, and a supervised learning-based benchmark by 12x in detection time.
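
To make the PU learning idea concrete, the following is a minimal sketch of a PU training objective. The abstract does not say which PU estimator TDAD uses; this example assumes the non-negative PU risk estimator with a sigmoid loss, purely as an illustration. The function names, the scores, and the class prior `prior` (the assumed fraction of anomalies among unlabeled traces) are all hypothetical. In practice the scores would come from a model such as a GNN applied to each trace graph.

```python
import math


def sigmoid_loss(score: float, label: int) -> float:
    """Sigmoid surrogate loss: near 0 when sign(score) agrees with label (+1/-1)."""
    return 1.0 / (1.0 + math.exp(label * score))


def mean(values) -> float:
    values = list(values)
    return sum(values) / len(values)


def nn_pu_risk(pos_scores, unl_scores, prior: float) -> float:
    """Non-negative PU risk estimate (illustrative, not the paper's code).

    pos_scores: model scores for the labeled anomalous (positive) traces
    unl_scores: model scores for the unlabeled traces
    prior:      assumed fraction of anomalies in the unlabeled set (hypothetical)
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")

    # Risk of labeled positives when treated as positive, and as negative.
    pos_as_pos = mean(sigmoid_loss(s, +1) for s in pos_scores)
    pos_as_neg = mean(sigmoid_loss(s, -1) for s in pos_scores)
    # Risk of unlabeled traces when treated as negative.
    unl_as_neg = mean(sigmoid_loss(s, -1) for s in unl_scores)

    # The unlabeled set mixes normal and anomalous traces, so the share
    # attributed to positives is subtracted; clamping at zero keeps the
    # estimated negative-class risk from going below zero.
    negative_risk = unl_as_neg - prior * pos_as_neg
    return prior * pos_as_pos + max(0.0, negative_risk)


# Two labeled anomalies and four unlabeled traces (one is likely anomalous).
labeled_anomalies = [3.0, 2.5]
unlabeled = [-3.0, -2.0, -2.5, 2.8]

risk_good = nn_pu_risk(labeled_anomalies, unlabeled, prior=0.25)
# The same scores with flipped signs stand in for a poor model.
risk_bad = nn_pu_risk([-s for s in labeled_anomalies],
                      [-s for s in unlabeled], prior=0.25)
print(risk_good < risk_bad)
```

A model whose scores separate the labeled anomalies from most unlabeled traces receives a lower estimated risk than one with inverted scores, which is what lets training proceed without any traces labeled as normal.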
Original language: Swedish
Title of host publication: Proceeding of IEEE CloudNet 2023
Status: Published - 1 Nov 2023
Externally published: Yes

Subject classification (UKÄ)

  • Control Engineering
