Compass
Abstract
Compass is a platform for diagnosing compute engines and schedulers in the big data ecosystem. It aims to make
troubleshooting more efficient and to reduce the complexity of tuning. It automatically collects logs and metrics,
and applies heuristic rules to identify problems and offer tuning advice. For logs, ChatGPT is additionally used to
provide diagnostic suggestions. Logs are automatically aggregated into templates with the Drain algorithm; these
templates can then be reviewed or adjusted manually, which improves the automation of diagnosis and optimization.
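To illustrate what aggregating logs into templates means, here is a minimal sketch in the spirit of Drain. It is not Compass's implementation and it simplifies the algorithm: lines are bucketed by token count and first token instead of a full fixed-depth parse tree. The names `TemplateMiner`, `mask`, and `similarity`, the digit-masking rule, and the `0.5` threshold are all assumptions made for this example.

```python
from collections import defaultdict

WILDCARD = "<*>"


def mask(token):
    """Replace tokens containing digits with a wildcard (assumed preprocessing rule)."""
    return WILDCARD if any(ch.isdigit() for ch in token) else token


def similarity(template, tokens):
    """Fraction of positions where the template and the token list agree exactly."""
    same = sum(1 for a, b in zip(template, tokens) if a == b)
    return same / len(template)


class TemplateMiner:
    """Simplified Drain-style miner (hypothetical helper, not a Compass API)."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        # (token count, first token) -> list of [template tokens, match count]
        self.buckets = defaultdict(list)

    def add(self, line):
        """Assign a log line to a template and return that template as a string."""
        tokens = [mask(t) for t in line.split()]
        if not tokens:
            return None
        groups = self.buckets[(len(tokens), tokens[0])]
        best, best_sim = None, 0.0
        for group in groups:
            sim = similarity(group[0], tokens)
            if sim >= self.threshold and sim > best_sim:
                best, best_sim = group, sim
        if best is None:
            # No sufficiently similar template yet: start a new one.
            groups.append([tokens, 1])
            return " ".join(tokens)
        # Merge: positions that differ become wildcards.
        best[0] = [a if a == b else WILDCARD for a, b in zip(best[0], tokens)]
        best[1] += 1
        return " ".join(best[0])

    def templates(self):
        """Return a mapping of template string to number of matched lines."""
        return {
            " ".join(template): count
            for groups in self.buckets.values()
            for template, count in groups
        }


if __name__ == "__main__":
    miner = TemplateMiner()
    for log_line in [
        "Lost executor 12 on host-a: Container killed",
        "Lost executor 7 on host-b: Container killed",
        "Job finished in 35 seconds",
    ]:
        miner.add(log_line)
    for template, count in miner.templates().items():
        print(count, template)
```

With templates like these, one representative per template, rather than every raw line, could be passed to a reviewer or to ChatGPT, which is one plausible reading of how template aggregation reduces cost.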
Features
- Non-invasive, timely diagnosis, with no need to modify the original platform code.
- Compatible with multiple versions of different components, such as Spark 2.4+, Flink 1.2+, Hadoop 2.4+, DolphinScheduler 2.x+, Airflow, etc.
- Diagnoses various scheduling job issues, such as failure, abnormal elapsed time, abnormal baseline, etc.
- Diagnoses various engine task issues, such as data skew, big table scan, memory waste, long tail task, etc.
- Captures log exceptions and offers advice or solutions.
- Supports ChatGPT to diagnose abnormal logs and provide solutions; uses the Drain algorithm to aggregate logs into templates, saving costs.
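As an illustration of what a heuristic rule for engine task issues might look like, the sketch below flags data skew and long tail tasks by comparing the largest task in a stage with the median. The text does not describe Compass's actual rules, so the function names, the ratios (10x and 5x), and the floors (64 MiB and 60 seconds) are all hypothetical values chosen for the example.

```python
from statistics import median

MIB = 1024 * 1024


def exceeds_median_ratio(values, ratio, floor):
    """True if the largest value is at least `floor` and at least `ratio` times the median."""
    if not values:
        return False
    largest = max(values)
    if largest < floor:
        # Ignore stages where even the largest task is small in absolute terms.
        return False
    return largest >= ratio * median(values)


def diagnose_stage(input_bytes, durations_s):
    """Return a list of findings for one stage, given per-task input sizes and durations."""
    findings = []
    # Assumed rule: one task reads far more data than the typical task.
    if exceeds_median_ratio(input_bytes, ratio=10.0, floor=64 * MIB):
        findings.append("data skew")
    # Assumed rule: one task runs far longer than the typical task.
    if exceeds_median_ratio(durations_s, ratio=5.0, floor=60.0):
        findings.append("long tail task")
    return findings


if __name__ == "__main__":
    skewed_inputs = [10 * MIB] * 9 + [1024 * MIB]
    skewed_durations = [30.0] * 9 + [400.0]
    print(diagnose_stage(skewed_inputs, skewed_durations))
```

The absolute floor keeps the rule from firing on tiny stages where a large ratio is harmless; in practice such thresholds would need tuning per cluster and workload.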
Feature Support
- ChatGPT
- Spark
- Flink
- MapReduce
- Trino
- Spark Tez
- Airflow
- DolphinScheduler
- Azkaban
- Oozie
- Debezium (synchronizes PostgreSQL data to PostgreSQL)