# V2 Docker Monitoring Stack

## Overview

The monitoring solution for V2 Docker is built on the Prometheus stack and provides detailed insight into the performance and health of all services.

## Components

### 1. **Prometheus** (Port 9090)

- Central metrics collection
- Scrape jobs configured for all services
- 30-day data retention
- Alert rules for critical events

### 2. **Grafana** (Port 3000)

- Metrics visualization
- Preconfigured dashboards
- Alerting integration
- Default login: admin/admin (change it on first login)

### 3. **Alertmanager** (Port 9093)

- Alert routing and grouping
- Email notifications
- Webhook integration
- Alert silencing and inhibition

### 4. **Exporters**

- **PostgreSQL Exporter**: database metrics
- **Redis Exporter**: cache metrics
- **Node Exporter**: system metrics
- **Nginx Exporter**: proxy metrics

## Installation

### 1. Start the monitoring stack

```bash
cd monitoring
docker-compose -f docker-compose.monitoring.yml up -d
```

### 2. Verify the services

```bash
docker-compose -f docker-compose.monitoring.yml ps
```
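
`ps` only shows that the containers are running. To confirm that each component also answers HTTP requests, you can poll its health endpoint. `/-/ready` (Prometheus, Alertmanager) and `/api/health` (Grafana) are the components' standard health endpoints. The sketch assumes the ports listed above are published on `localhost`. The `ENDPOINTS` table and the `check_all` helper are illustrative names, not part of the stack.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict

# Assumption: default ports published on localhost (see "Components").
ENDPOINTS: Dict[str, str] = {
    "prometheus": "http://localhost:9090/-/ready",
    "alertmanager": "http://localhost:9093/-/ready",
    "grafana": "http://localhost:3000/api/health",
}


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code of a GET request, or 0 if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return 0  # connection refused, DNS failure, timeout, ...


def check_all(endpoints: Dict[str, str],
              fetch: Callable[[str], int] = http_status) -> Dict[str, bool]:
    """Map each service name to True if its health endpoint returned 200."""
    return {name: fetch(url) == 200 for name, url in endpoints.items()}


if __name__ == "__main__":
    for name, ok in check_all(ENDPOINTS).items():
        print(f"{name:13} {'OK' if ok else 'NOT READY'}")
```

The `fetch` parameter lets you swap in a fake function so the logic can be tested without a running stack.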

### 3. Access Grafana

1. Open https://monitoring.v2-docker.com (or http://localhost:3000)
2. Log in with admin/admin
3. Set a new password
4. Open the "License Server Overview" dashboard

## Configuration

### Environment variables

Create a `.env` file in the `monitoring` directory:

```env
# Grafana
GRAFANA_USER=admin
GRAFANA_PASSWORD=secure-password

# PostgreSQL connection
POSTGRES_PASSWORD=your-postgres-password

# Alertmanager SMTP
SMTP_USERNAME=alerts@yourdomain.com
SMTP_PASSWORD=smtp-password

# Webhook URLs
WEBHOOK_CRITICAL=https://your-webhook-url/critical
WEBHOOK_SECURITY=https://your-webhook-url/security
```
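
The stack reads these values only at start-up, so it can help to check the file for missing keys or leftover example values before running `docker-compose up`. The minimal sketch below uses only the standard library. `REQUIRED_KEYS` mirrors the example above, and the placeholder list is an assumption you should adapt. It only understands simple `KEY=value` lines, not the full quoting rules of Docker Compose.

```python
from pathlib import Path
from typing import Dict, List

# Keys taken from the example .env above.
REQUIRED_KEYS = {
    "GRAFANA_USER", "GRAFANA_PASSWORD", "POSTGRES_PASSWORD",
    "SMTP_USERNAME", "SMTP_PASSWORD", "WEBHOOK_CRITICAL", "WEBHOOK_SECURITY",
}
# Assumption: example values that must never reach production.
PLACEHOLDERS = {"secure-password", "your-postgres-password", "smtp-password"}


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def find_problems(values: Dict[str, str]) -> List[str]:
    """Report required keys that are missing, empty or still placeholders."""
    problems = []
    for key in sorted(REQUIRED_KEYS):
        value = values.get(key, "")
        if not value:
            problems.append(f"{key}: missing or empty")
        elif value in PLACEHOLDERS or "your-" in value or "yourdomain" in value:
            problems.append(f"{key}: still a placeholder value")
    return problems


if __name__ == "__main__":
    for problem in find_problems(parse_env(Path(".env").read_text())):
        print(problem)
```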

### Alert configuration

Alerts are defined in `prometheus/rules/license-server-alerts.yml`:

- **HighLicenseValidationErrorRate**: error rate > 5%
- **PossibleLicenseAbuse**: suspicious activity
- **LicenseServerDown**: service unreachable
- **HighLicenseValidationLatency**: response time > 500 ms
- **DatabaseConnectionPoolExhausted**: DB connection usage > 90%
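
You can check which of these rules are currently pending or firing without opening the UI. The Prometheus HTTP API endpoint `/api/v1/rules` returns every rule group together with each alerting rule's `state` (`inactive`, `pending` or `firing`). The sketch assumes Prometheus is reachable without authentication at `http://localhost:9090`. `alert_states` and `firing_alerts` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Dict, List, Tuple

PROMETHEUS = "http://localhost:9090"  # assumption: local, unauthenticated


def alert_states(rules_response: Dict) -> List[Tuple[str, str]]:
    """Extract (alert name, state) pairs from a /api/v1/rules response."""
    pairs = []
    for group in rules_response["data"]["groups"]:
        for rule in group["rules"]:
            if rule.get("type") == "alerting":  # skip recording rules
                pairs.append((rule["name"], rule["state"]))
    return pairs


def firing_alerts(rules_response: Dict) -> List[str]:
    """Names of alerting rules that are currently pending or firing."""
    return [name for name, state in alert_states(rules_response)
            if state in ("pending", "firing")]


if __name__ == "__main__":
    with urllib.request.urlopen(f"{PROMETHEUS}/api/v1/rules?type=alert") as resp:
        data = json.load(resp)
    for name, state in alert_states(data):
        print(f"{state:8} {name}")
```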

### Adding new alerts

1. Edit `prometheus/rules/license-server-alerts.yml`.
2. Add a new alert rule to the `rules:` list of an existing group:

```yaml
- alert: YourAlertName
  expr: your_prometheus_query > threshold
  for: 5m
  labels:
    severity: warning
    service: your-service
  annotations:
    summary: "Alert summary"
    description: "Detailed description"
```

3. Reload Prometheus (this endpoint only works if Prometheus was started with `--web.enable-lifecycle`):

```bash
curl -X POST http://localhost:9090/-/reload
```

## Dashboards

### License Server Overview

Shows the key metrics:

- Active licenses
- Validations per second
- Error rate
- Response time percentiles
- Anomaly detection
- Top 10 most active licenses
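
Panels such as "Validations per second" and "Top 10 most active licenses" are typically built from PromQL over `license_validation_total`. You can also pull the same numbers from the `/api/v1/query` endpoint, for example for a report. This sketch assumes the counter carries a per-license label called `license_id`. That label name is hypothetical, so check the actual labels on the license server's `/metrics` output first. Prometheus at `http://localhost:9090` is also an assumption.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List, Tuple

PROMETHEUS = "http://localhost:9090"  # assumption

# Hypothetical label "license_id"; adjust to the real metric labels.
TOP10_QUERY = "topk(10, sum by (license_id) (rate(license_validation_total[5m])))"


def parse_vector(response: Dict, label: str) -> List[Tuple[str, float]]:
    """Turn an instant-query vector into (label value, number) pairs, largest first."""
    if response.get("status") != "success":
        raise RuntimeError(response.get("error", "query failed"))
    rows = [(s["metric"].get(label, ""), float(s["value"][1]))
            for s in response["data"]["result"]]
    return sorted(rows, key=lambda row: row[1], reverse=True)


def instant_query(promql: str) -> Dict:
    """Run an instant query against the Prometheus HTTP API."""
    url = f"{PROMETHEUS}/api/v1/query?" + urllib.parse.urlencode({"query": promql})
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


if __name__ == "__main__":
    for license_id, per_second in parse_vector(instant_query(TOP10_QUERY), "license_id"):
        print(f"{license_id}: {per_second:.2f} validations/s")
```

The explicit sort is there because the API does not guarantee that `topk` results arrive in descending order.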

### Creating new dashboards

1. Log in to Grafana
2. Create → Dashboard
3. Add a panel
4. Enter a Prometheus query
5. Save the dashboard
6. Export it as JSON for backup

## Metrics

### License server metrics

- `license_validation_total`: number of validations
- `license_validation_duration_seconds`: validation duration
- `active_licenses_total`: active licenses
- `anomaly_detections_total`: detected anomalies
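
The "Response time percentiles" panel is presumably derived from `license_validation_duration_seconds`. This assumes the metric is a Prometheus histogram; a summary would expose quantiles directly. For a histogram, percentiles are estimated with `histogram_quantile()`, which interpolates linearly inside the cumulative `_bucket` counters. The sketch below reproduces that interpolation on invented bucket counts to show what such a panel computes. It simplifies Prometheus's handling of edge cases.

```python
import math
from typing import List, Tuple

# Equivalent PromQL, assuming a histogram metric:
#   histogram_quantile(0.95,
#     sum by (le) (rate(license_validation_duration_seconds_bucket[5m])))


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate quantile q (0 < q < 1) from cumulative (upper_bound, count) buckets.

    Simplified from Prometheus' interpolation: the last bucket must be +Inf,
    bucket bounds are assumed positive, and observations are assumed to be
    spread evenly within each bucket.
    """
    if not 0 < q < 1:
        raise ValueError("q must be between 0 and 1 (exclusive)")
    buckets = sorted(buckets)
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("the last bucket must have upper bound +Inf")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations in the window
    rank = q * total
    for i, (upper, count) in enumerate(buckets):
        if count >= rank:
            break
    if math.isinf(upper):
        # Rank falls into the +Inf bucket: report the highest finite bound.
        return buckets[-2][0] if len(buckets) > 1 else math.nan
    lower, below = (0.0, 0.0) if i == 0 else buckets[i - 1]
    return lower + (upper - lower) * (rank - below) / (count - below)


if __name__ == "__main__":
    # Invented cumulative counts: (upper bound in seconds, observations <= bound).
    example = [(0.1, 50), (0.25, 80), (0.5, 95), (1.0, 99), (math.inf, 100)]
    for q in (0.5, 0.9, 0.95, 0.99):
        print(f"q={q}: {histogram_quantile(q, example):.3f} s")
```

A percentile can never be more precise than the bucket layout allows. If the 500 ms latency threshold matters, a bucket boundary at 0.5 s keeps that comparison accurate.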

### System metrics

- `node_cpu_seconds_total`: CPU time per core and mode (CPU usage is derived with `rate()`)
- `node_memory_MemAvailable_bytes`: available memory
- `node_filesystem_avail_bytes`: available disk space
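
`node_cpu_seconds_total` is a counter of seconds spent in each mode, not a percentage. Utilization therefore has to be computed from the change between two samples. In PromQL this is commonly written as `1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))`. The sketch below does the same arithmetic by hand for two scrapes of a single CPU. The sample numbers are invented for illustration.

```python
from typing import Dict

# Cumulative seconds per mode for ONE cpu, i.e. the values of
# node_cpu_seconds_total{cpu="0", mode=...} at two scrape times.
CpuSample = Dict[str, float]


def cpu_utilization(earlier: CpuSample, later: CpuSample) -> float:
    """Fraction of time (0..1) the CPU was busy (not idle) between two scrapes.

    Counter resets (e.g. after a reboot) are not handled in this sketch.
    """
    deltas = {mode: later[mode] - earlier[mode] for mode in later}
    elapsed = sum(deltas.values())
    if elapsed <= 0:
        raise ValueError("the second sample must be later than the first")
    return 1.0 - deltas["idle"] / elapsed


if __name__ == "__main__":
    # Invented values: 30 s of CPU time elapsed, 21 s of it idle.
    t0 = {"idle": 1000.0, "user": 200.0, "system": 100.0}
    t1 = {"idle": 1021.0, "user": 206.0, "system": 103.0}
    print(f"CPU busy: {cpu_utilization(t0, t1):.0%}")
```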

### Database metrics

- `pg_stat_database_numbackends`: active DB connections
- `pg_stat_database_tup_fetched`: fetched tuples
- `pg_stat_database_conflicts`: queries cancelled due to recovery conflicts (standby servers only)
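
The **DatabaseConnectionPoolExhausted** alert compares open connections with the server limit. Assuming postgres_exporter also exposes `pg_settings_max_connections`, one way to express this is `sum(pg_stat_database_numbackends) / max(pg_settings_max_connections) > 0.9`. The actual expression in the rule file may differ. The sketch below shows the same calculation on invented values.

```python
from typing import Dict

THRESHOLD = 0.9  # "> 90%" from the alert list above


def connection_usage(numbackends_per_db: Dict[str, float], max_connections: float) -> float:
    """Share of max_connections in use, summed over all databases."""
    if max_connections <= 0:
        raise ValueError("max_connections must be positive")
    return sum(numbackends_per_db.values()) / max_connections


def pool_exhausted(numbackends_per_db: Dict[str, float], max_connections: float) -> bool:
    """True when usage exceeds THRESHOLD, mirroring the alert condition."""
    return connection_usage(numbackends_per_db, max_connections) > THRESHOLD


if __name__ == "__main__":
    # Invented values for two databases on a server with max_connections=100.
    backends = {"licenses": 80, "postgres": 12}
    print(f"{connection_usage(backends, 100):.0%} in use, "
          f"alert: {pool_exhausted(backends, 100)}")
```

Summing across databases matters because `max_connections` is a server-wide limit, while `numbackends` is reported per database.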

## Troubleshooting

### Prometheus cannot reach a service

1. Check the network:

   ```bash
   docker network inspect v2_internal_net
   ```

2. Test name resolution and reachability from inside the Prometheus container:

   ```bash
   docker exec prometheus wget -O- http://license-server:8443/metrics
   ```

### No data in Grafana

1. Check the data source:
   - Settings → Data Sources → Prometheus
   - Test connection
2. Check the Prometheus targets:
   - http://localhost:9090/targets
   - All targets should be "UP"
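
The Targets page is also available as JSON from `/api/v1/targets`, which is convenient for scripts. Each active target reports its `health` (`up`, `down` or `unknown`), `scrapeUrl` and `lastError`. The minimal sketch below prints only the unhealthy targets and assumes Prometheus is at `http://localhost:9090`.

```python
import json
import urllib.request
from typing import Dict, List

PROMETHEUS = "http://localhost:9090"  # assumption


def unhealthy_targets(targets_response: Dict) -> List[Dict[str, str]]:
    """Return job, scrape URL and last error for every target that is not 'up'."""
    problems = []
    for target in targets_response["data"]["activeTargets"]:
        if target.get("health") != "up":
            problems.append({
                "job": target.get("labels", {}).get("job", "?"),
                "url": target.get("scrapeUrl", "?"),
                "error": target.get("lastError", ""),
            })
    return problems


if __name__ == "__main__":
    with urllib.request.urlopen(f"{PROMETHEUS}/api/v1/targets?state=active") as resp:
        data = json.load(resp)
    for t in unhealthy_targets(data):
        print(f"[{t['job']}] {t['url']}: {t['error'] or 'no error reported'}")
```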

### Alerts are not being sent

1. Check the Alertmanager logs:

   ```bash
   docker logs alertmanager
   ```

2. Verify the SMTP configuration
3. Test the webhook URLs
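
To test SMTP and webhooks end to end, you can post a synthetic alert directly to Alertmanager's `/api/v2/alerts` endpoint. If routing and receivers are configured correctly, it should be delivered like a real alert, after the matching route's `group_wait`. The label values below (`TestAlert`, `severity="warning"`, `service="monitoring-test"`) are placeholders; choose labels that match the route you want to exercise. The sketch assumes Alertmanager is reachable at `http://localhost:9093`.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone
from typing import Dict, List

ALERTMANAGER = "http://localhost:9093"  # assumption


def build_test_alert(minutes: int = 5) -> List[Dict]:
    """A single synthetic alert that resolves itself after `minutes`."""
    now = datetime.now(timezone.utc)
    return [{
        "labels": {"alertname": "TestAlert", "severity": "warning",
                   "service": "monitoring-test"},
        "annotations": {"summary": "Test alert", "description": "Manual delivery test"},
        "startsAt": now.isoformat(),
        "endsAt": (now + timedelta(minutes=minutes)).isoformat(),
    }]


def send(alerts: List[Dict]) -> int:
    """POST the alerts to Alertmanager and return the HTTP status."""
    req = urllib.request.Request(
        f"{ALERTMANAGER}/api/v2/alerts",
        data=json.dumps(alerts).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status  # 200 means Alertmanager accepted the alert


if __name__ == "__main__":
    print(send(build_test_alert()))
```

A 200 response only means the alert was accepted. Whether it was delivered shows up in the Alertmanager logs and at the receiver.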

## Maintenance

### Backup

1. Prometheus data:

   ```bash
   # Write the archive outside /prometheus so it does not include itself
   docker exec prometheus tar czf /tmp/prometheus-backup.tar.gz /prometheus
   docker cp prometheus:/tmp/prometheus-backup.tar.gz ./backups/
   ```

   For a consistent copy of a running instance, the TSDB snapshot API (`POST /api/v1/admin/tsdb/snapshot`, only available with `--web.enable-admin-api`) is the safer option.

2. Grafana dashboards:
   - Export via the UI as JSON
   - Save to `grafana/dashboards/`
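
Exporting through the UI works for occasional backups. For regular backups, the Grafana HTTP API can do the same: `GET /api/search?type=dash-db` lists dashboards, and `GET /api/dashboards/uid/<uid>` returns `{"dashboard": ..., "meta": ...}`. The minimal sketch below assumes Grafana at `http://localhost:3000` and a service account token in the hypothetical `GRAFANA_TOKEN` environment variable. It drops the numeric `id` because that value is specific to the source instance.

```python
import json
import os
import re
import urllib.request
from pathlib import Path
from typing import Any, Dict

GRAFANA = "http://localhost:3000"  # assumption
OUT_DIR = Path("grafana/dashboards")


def api_get(path: str) -> Any:
    """GET a Grafana API path using a bearer token from GRAFANA_TOKEN."""
    req = urllib.request.Request(
        GRAFANA + path,
        headers={"Authorization": f"Bearer {os.environ['GRAFANA_TOKEN']}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def prepare_for_export(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the dashboard model and drop the instance-specific numeric id."""
    dashboard = dict(payload["dashboard"])
    dashboard.pop("id", None)
    return dashboard


def file_name(title: str, uid: str) -> str:
    """Filesystem-safe name such as 'license-server-overview_abc123.json'."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{slug or 'dashboard'}_{uid}.json"


if __name__ == "__main__":
    OUT_DIR.mkdir(parents=True, exist_ok=True)
    for hit in api_get("/api/search?type=dash-db"):
        payload = api_get(f"/api/dashboards/uid/{hit['uid']}")
        target = OUT_DIR / file_name(hit["title"], hit["uid"])
        target.write_text(json.dumps(prepare_for_export(payload), indent=2))
        print(f"saved {target}")
```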

### Updates

1. Update the images:

   ```bash
   docker-compose -f docker-compose.monitoring.yml pull
   docker-compose -f docker-compose.monitoring.yml up -d
   ```

2. Reload the configuration:

   ```bash
   # Prometheus (requires --web.enable-lifecycle)
   curl -X POST http://localhost:9090/-/reload

   # Alertmanager
   curl -X POST http://localhost:9093/-/reload
   ```

## Performance optimization

### Adjusting retention

In `docker-compose.monitoring.yml`:

```yaml
command:
  # ...keep the other existing flags unchanged...
  - '--storage.tsdb.retention.time=15d'  # lower this to use less disk space
```
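
You can estimate how much disk a given retention needs with the rule of thumb from the Prometheus storage documentation: retention time in seconds × ingested samples per second × bytes per sample. A sample takes roughly 1 to 2 bytes after compression. Ingested samples per second are approximately the number of active series divided by the scrape interval, so the scrape interval also affects storage. This back-of-the-envelope sketch uses an invented series count; on a running server, `prometheus_tsdb_head_series` gives the real figure.

```python
def samples_per_second(active_series: int, scrape_interval_s: float) -> float:
    """Each active series receives one new sample per scrape."""
    return active_series / scrape_interval_s


def retention_disk_bytes(retention_days: float, active_series: int,
                         scrape_interval_s: float, bytes_per_sample: float = 2.0) -> float:
    """Rule of thumb: retention seconds * samples/s * bytes per sample.

    bytes_per_sample=2 is the pessimistic end of the documented 1-2 bytes.
    """
    retention_s = retention_days * 24 * 3600
    return retention_s * samples_per_second(active_series, scrape_interval_s) * bytes_per_sample


if __name__ == "__main__":
    # Hypothetical figure: 50,000 active series scraped every 15 s.
    for days in (30, 15):
        gib = retention_disk_bytes(days, 50_000, 15) / 2**30
        print(f"{days:>2} days: ~{gib:.1f} GiB")
```

Halving the retention halves the estimate, and so does doubling the scrape interval.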

### Scrape intervals

In `prometheus/prometheus.yml`:

```yaml
global:
  scrape_interval: 30s  # increase to reduce load
```

### Resource limits

Adjust the limits in `docker-compose.monitoring.yml` to match your environment.

## Security

1. **Grafana**: change the default password immediately
2. **Prometheus**: no public access (internal only)
3. **Alertmanager**: keep webhook URLs secret
4. **Exporters**: reachable only on the internal network

## Integration

### In a CI/CD pipeline

```bash
# Send deployment metrics to the Pushgateway.
# The text format requires every line, including the last, to end with "\n",
# which echo adds and --data-binary preserves (curl -d would strip it).
echo 'deployment_status{version="1.2.3",environment="production"} 1' \
  | curl --data-binary @- http://prometheus-pushgateway:9091/metrics/job/deployment
```
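
If a pipeline step is written in Python rather than shell, you can do the same push with the standard library. The main pitfalls are escaping label values (backslash, double quote and line feed must be escaped in the text format) and the mandatory trailing line feed. The sketch reuses the Pushgateway URL from the example above. `escape_label_value`, `exposition_line` and `push` are hypothetical helper names.

```python
import urllib.request
from typing import Dict

PUSHGATEWAY = "http://prometheus-pushgateway:9091"  # from the curl example above


def escape_label_value(value: str) -> str:
    """Escape backslash, double quote and line feed as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def exposition_line(name: str, labels: Dict[str, str], value: float) -> str:
    """One sample in text exposition format, terminated by a line feed."""
    pairs = ",".join(f'{k}="{escape_label_value(v)}"' for k, v in labels.items())
    return f"{name}{{{pairs}}} {value}\n"


def push(job: str, body: str) -> int:
    """POST the metrics body to the Pushgateway under the given job name."""
    req = urllib.request.Request(
        f"{PUSHGATEWAY}/metrics/job/{job}", data=body.encode(), method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.status


if __name__ == "__main__":
    line = exposition_line("deployment_status",
                           {"version": "1.2.3", "environment": "production"}, 1)
    print(push("deployment", line))
```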

### Custom metrics

In your application:

```python
from prometheus_client import Counter, start_http_server

# Expose /metrics over HTTP (port 8000 is an example; add it as a scrape target)
start_http_server(8000)

custom_metric = Counter('my_custom_total', 'Description')
custom_metric.inc()  # increment by 1
```

## Support

If you run into problems:

1. Check the logs: `docker-compose -f docker-compose.monitoring.yml logs [service]`
2. Prometheus documentation: https://prometheus.io/docs/
3. Grafana documentation: https://grafana.com/docs/