dianping
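As a foundational component for server-side projects, CAT provides clients in multiple languages, including Java, C/C++, Node.js, Python, and Go. It is deeply integrated into Meituan-Dianping's infrastructure middleware (MVC framework, RPC framework, database framework, caching framework, message queue, configuration system, and more), giving Meituan-Dianping's business lines rich system performance metrics, health status, real-time alerting, and other insights.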
fail2ban
Daemon to ban hosts that cause multiple authentication errors
bcicen
Top-like interface for container metrics
influxdata
Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
VictoriaMetrics
VictoriaMetrics: fast, cost-effective monitoring solution and time series database
winsiderss
A free, powerful, multi-purpose tool that helps you monitor system resources, debug software and detect malware. Brought to you by Winsider Seminars & Solutions, Inc. @ https://windows-internals.com
sqshq
Tool for shell commands execution, visualization and alerting. Configured with a simple YAML file.
thanos-io
Highly available Prometheus setup with long-term storage capabilities. A CNCF Incubating project.
pinpoint-apm
APM (Application Performance Management) tool for large-scale distributed systems.
ClementTsang
Yet another cross-platform graphical process/system monitor.
dastergon
A curated list of Site Reliability and Production Engineering resources.
ccfos
Nightingale is to monitoring and alerting what Grafana is to visualization.
keephq
The open-source AIOps and alert management platform
grafana
Continuous Profiling Platform. Debug performance issues down to a single line of code
TwiN
Automated developer-oriented status page with alerting and incident support
giampaolo
Cross-platform lib for process and system monitoring in Python
Syllo
GPU & Accelerator process monitoring for AMD, Apple, Huawei, Intel, NVIDIA and Qualcomm
SigmaHQ
Main Sigma Rule Repository
healthchecks
Open-source cron job and background task monitoring service, written in Python & Django
prometheus-operator
Prometheus Operator creates/configures/manages Prometheus clusters atop Kubernetes
upgundecha
A curated collection of publicly available resources on how technology and tech-savvy organizations around the world practice Site Reliability Engineering (SRE)
hyperdxio
Resolve production issues, fast. An open source observability platform unifying session replays, logs, metrics, traces and errors powered by ClickHouse and OpenTelemetry.
highlight
highlight.io: The open source, full-stack monitoring platform. Error monitoring, session replay, logging, distributed tracing, and more.
openstatusHQ
🫖 Status page with uptime monitoring & API monitoring as code 🫖