add files
parent
15f0542a36
commit
e5962af62d
1648
TSM/Dokumentation.md
Normal file
File diff suppressed because it is too large
21
TSM/Licence
Normal file
@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 Marius Gielnik

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
338
TSM/QUICKSTART.md
Normal file
@ -0,0 +1,338 @@
# TSM Backup Monitoring - Quick Start Guide

## 🚀 Quick Start (about 15 minutes)

This guide walks you through the basic installation of the TSM Backup Monitoring plugin for CheckMK.

---

## Prerequisites

- ✅ CheckMK 2.3.0p40 or later
- ✅ Python 3.6+ on the monitored hosts
- ✅ TSM CSV export available
- ✅ Root access to the monitored hosts
- ✅ Site user access on the CheckMK server

---

## Step 1: Set up the CSV export (5 min)

### Option A: NFS mount (recommended)

```bash
# On the monitored host, as root
mkdir -p /mnt/CMK_TSM
echo "tsm-server:/exports/backup-stats /mnt/CMK_TSM nfs defaults,ro 0 0" >> /etc/fstab
mount -a

# Test
ls -lh /mnt/CMK_TSM/*.CSV
```

### Option B: rsync via cron

```bash
# On the monitored host, as root
mkdir -p /mnt/CMK_TSM

# Edit the crontab
crontab -e
# Add this line to the crontab:
*/15 * * * * rsync -az tsm-server:/path/to/*.CSV /mnt/CMK_TSM/

# Manual test
rsync -az tsm-server:/path/to/*.CSV /mnt/CMK_TSM/
ls -lh /mnt/CMK_TSM/
```

**Verify the CSV format:**
```bash
head -n 2 /mnt/CMK_TSM/*.CSV
# Expected output:
# 2026-01-12 08:00:00,FIELD,SERVER_MSSQL,DAILY_FULL,Completed
# 2026-01-12 09:15:00,FIELD,DATABASE_HANA,HOURLY_LOG,Completed
```

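The format check above can also be scripted. The following is a minimal sketch, assuming the five-column layout shown in the expected output (timestamp, arbitrary field, node name, schedule, status). `check_csv_rows` is a hypothetical helper for illustration and is not part of the plugin.

```python
import csv
import io
from datetime import datetime


def check_csv_rows(lines):
    """Return (row_number, reason) tuples for rows that do not match the layout."""
    problems = []
    for rowno, row in enumerate(csv.reader(lines), start=1):
        if len(row) < 5:
            problems.append((rowno, "fewer than 5 columns"))
            continue
        try:
            # Column 1 must be a timestamp like 2026-01-12 08:00:00
            datetime.strptime(row[0].strip(), "%Y-%m-%d %H:%M:%S")
        except ValueError:
            problems.append((rowno, "bad timestamp"))
            continue
        # Column 3 is the node name, column 5 the status
        if not row[2].strip() or not row[4].strip():
            problems.append((rowno, "empty node name or status"))
    return problems


sample = io.StringIO(
    "2026-01-12 08:00:00,FIELD,SERVER_MSSQL,DAILY_FULL,Completed\n"
    "12.01.2026 09:15,FIELD,DATABASE_HANA,HOURLY_LOG,Completed\n"
    "2026-01-12 10:00:00,FIELD,SERVER_MSSQL\n"
)
print(check_csv_rows(sample))
```

An empty result means every row has a parseable timestamp, a node name, and a status. To check a real export, pass an open file object instead of the `StringIO` sample.
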
---

## Step 2: Install the agent plugin (2 min)

```bash
# On the monitored host, as root
cd /usr/lib/check_mk_agent/plugins

# Copy the plugin (adjust to your path)
scp user@your-server:tsm_backups_agent.py ./tsm_backups

# Or use wget (if the file is available on a web server)
wget https://your-repo/tsm_backups_agent.py -O tsm_backups

# Make it executable
chmod +x tsm_backups

# Test
./tsm_backups
```

**Expected output:**
```
<<<tsm_backups:sep(0)>>>
{"SERVER_MSSQL": {"statuses": ["Completed"], "schedules": ["DAILY_FULL"], "last": 1736693420, "count": 1}}
```

**❌ "Empty output" error?**
- Check: does `/mnt/CMK_TSM/` exist?
- Check: are CSV files present? (`ls /mnt/CMK_TSM/*.CSV`)
- Check: is Python 3 installed? (`python3 --version`)

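To illustrate how the agent section can be consumed, here is a minimal sketch that parses the JSON line from the expected output above and computes the backup age in seconds. `parse_section` and `backup_age` are hypothetical names chosen for this example. The actual check plugin is not shown in this guide and may work differently.

```python
import json
import time


def parse_section(lines):
    """Parse the agent section body (a single JSON document) into a dict."""
    text = "".join(lines).strip()
    return json.loads(text) if text else {}


def backup_age(node_data, now=None):
    """Seconds since the last backup, or None if no timestamp is known."""
    last = node_data.get("last")
    if last is None:
        return None
    now = time.time() if now is None else now
    return max(0, int(now - last))


section = parse_section([
    '{"SERVER_MSSQL": {"statuses": ["Completed"], "schedules": ["DAILY_FULL"], '
    '"last": 1736693420, "count": 1}}'
])
for node, data in section.items():
    # A fixed "now" three hours after the last backup keeps the output stable
    print(node, data["count"], backup_age(data, now=1736693420 + 3 * 3600))
```

With the fixed reference time used here, the loop prints `SERVER_MSSQL 1 10800`, that is, one job and an age of three hours.
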
---

## Step 3: Install the check plugin (3 min)

```bash
# On the CheckMK server, as site user
OM=/omd/sites/$(omd sites --bare | head -1)
cd $OM

# Create the plugin directory
mkdir -p local/lib/python3/cmk_addons/plugins/tsm/agent_based

# Copy the plugin
cp /path/to/tsm_backups.py local/lib/python3/cmk_addons/plugins/tsm/agent_based/

# Set permissions
chmod 644 local/lib/python3/cmk_addons/plugins/tsm/agent_based/tsm_backups.py

# Regenerate the configuration and restart the monitoring core
cmk -R

# Test
cmk -vv hostname | grep "TSM Backup"
```

**Expected output:**
```
[agent] Received agent data: <<<tsm_backups:sep(0)>>> ...
TSM Backup SERVER_MSSQL: OK - Type=MSSQL (database), Level=FULL, Freq=daily, Status=Completed, Last=3h 15m, Jobs=1
```

---

## Step 4: Service discovery (2 min)

### Option A: Web UI

1. Go to `Setup > Hosts`
2. Select your host
3. Click `Service Discovery`
4. Click `Full Scan`
5. Wait for the results
6. Click `Accept all` for the new services
7. Click `Activate on selected sites`

### Option B: Command line

```bash
# Single host
cmk -II hostname

# All hosts
cmk -II --all

# Activate the changes (reload the core)
cmk -O
```

**Expected services:**
```
TSM Backup SERVER_MSSQL
TSM Backup DATABASE_HANA
TSM Backup FILESERVER_FILE
...
```

---

## Step 5: Verification (2 min)

### 1. Check the services

```bash
# Status of all TSM services
cmk -n hostname | grep "TSM Backup"
```

**Expected result:**
```
TSM Backup SERVER_MSSQL OK - Type=MSSQL (database), Level=FULL, Freq=daily, Status=Completed, Last=3h 15m, Jobs=1
TSM Backup DATABASE_HANA OK - Type=HANA (database), Level=FULL, Freq=daily, Status=Completed, Last=5h 20m, Jobs=1
```

### 2. Check the labels

Web UI: `Monitor > Services > <select a TSM service> > Service labels`

**Expected labels:**
- `backup_type: mssql`
- `backup_category: database`
- `backup_system: tsm`
- `frequency: daily`
- `backup_level: full`
- `error_handling: strict`

### 3. Check the metrics

Web UI: `Monitor > Services > <select a TSM service> > Service Metrics`

**Expected metrics:**
- `backup_age`: seconds since the last backup
- `backup_jobs`: number of jobs

---

## Troubleshooting quick fixes

### Problem: No services found

```bash
# 1. Check the agent output
check_mk_agent | grep -A 5 "<<<tsm_backups"

# 2. Check the CSV files
ls -lh /mnt/CMK_TSM/*.CSV
head /mnt/CMK_TSM/*.CSV

# 3. Test the plugin manually
/usr/lib/check_mk_agent/plugins/tsm_backups

# 4. Rerun the discovery
cmk -II hostname
```

---

### Problem: Services stay UNKNOWN

```bash
# 1. Run the check manually
cmk -nv --debug hostname | grep "TSM Backup"

# 2. Regenerate the configuration and restart the core
cmk -R

# 3. Repeat the discovery
cmk -II hostname
```

---

### Problem: Wrong backup types

**Check the node naming scheme:**

✅ **Correct:**
```
SERVER_MSSQL
DATABASE_HANA_01
FILESERVER_FILE
VM_HYPERV_123
```

❌ **Incorrect:**
```
MSSQL           # Too short
SERVER          # No recognizable type
SERVER_12345    # Type unclear (number only)
```

**Solution:** Rename the nodes in TSM or extend the plugin

---

## Next steps

After a successful installation:

### 1. Create custom views

Create a view for all database backups:
```
Setup > General > Custom views > Create new view

Filter: Service labels: backup_category = database
```

### 2. Configure notifications

Create a notification rule for critical backup failures:
```
Setup > Notifications > Add rule

Conditions:
- Service state: CRIT
- Service labels: error_handling = strict

Contact: dba-team
```

### 3. Adjust thresholds (optional)

If the default thresholds do not fit:
```bash
# Edit the plugin
vim $OM/local/lib/python3/cmk_addons/plugins/tsm/agent_based/tsm_backups.py

# Search for THRESHOLDS = {
# Adjust the values, e.g.:
# "mssql": {"warn": 30 * 3600, "crit": 50 * 3600},

# Regenerate the configuration and restart the core
cmk -R
cmk -II --all
```

---

## Help & support

- 📖 **README.md:** Detailed feature overview and usage
- 📚 **DOCUMENTATION.md:** Technical details, API reference, advanced configuration
- 🐛 **Issues:** GitHub Issues or the CheckMK forum
- 📧 **Contact:** [your-email]@example.com

---

## Cheat sheet

```bash
# === Agent side (monitored host) ===
# Test the plugin
/usr/lib/check_mk_agent/plugins/tsm_backups

# Check the CSV files
ls -lh /mnt/CMK_TSM/*.CSV

# === CheckMK side (server) ===
# Regenerate the configuration and restart the core
cmk -R

# Service discovery
cmk -II hostname

# Check the service status
cmk -n hostname | grep "TSM Backup"

# Debug mode
cmk -vv --debug hostname | grep -A 20 "TSM Backup"

# Reload the core to activate changes
cmk -O
```

---

**Installation time:** ~15 minutes
**Difficulty:** Medium
**Last updated:** 2026-01-12
**Version:** 4.1
533
TSM/README.md
Normal file
@ -0,0 +1,533 @@
# TSM Backup Monitoring for CheckMK

A complete CheckMK plugin for monitoring IBM Spectrum Protect (TSM) backups, with automatic backup type detection, comprehensive labels, and flexible thresholds.

## 📋 Table of Contents

- [Features](#features)
- [Requirements](#requirements)
- [Installation](#installation)
- [Configuration](#configuration)
- [Usage](#usage)
- [Architecture](#architecture)
- [Troubleshooting](#troubleshooting)
- [Changelog](#changelog)

---

## ✨ Features

### Core features
- **Comprehensive backup type detection**: support for 18+ backup types
- **Flexible labels**: extensive service labels for advanced filtering and reporting
- **Type-specific thresholds**: individual warning and critical thresholds per backup type
- **Redundancy support**: normalization of RRZ*/NFRZ* node names
- **Aggregated monitoring**: per-node aggregation of multiple backup jobs
- **Type-aware error handling**: tolerant vs. strict behavior depending on the backup type

### Supported backup types

The values in parentheses are the warning/critical thresholds for the backup age.

#### Databases (strict monitoring)
- MSSQL (26h/48h)
- SAP HANA (26h/48h)
- Oracle (26h/48h)
- DB2 (26h/48h)
- MySQL/MariaDB (26h/48h)
- PostgreSQL (26h/48h)
- Sybase (26h/48h)
- MongoDB (26h/48h)

#### File systems (tolerant monitoring)
- FILE (36h/72h)
- SCALE (36h/72h)
- DM (36h/72h)
- Datacenter (36h/72h)

#### Virtualization (tolerant monitoring)
- Virtual/VMware (36h/72h)
- Hyper-V (36h/72h)

#### Applications
- Mail/Exchange (26h/48h)
- Transaction logs (4h/8h)

---

## 🔧 Requirements

### CheckMK server
- CheckMK version **2.3.0p40** or later
- Python 3.8+
- Access to `/omd/sites/<site>/local/lib/python3/cmk_addons/`

### Monitored hosts
- Python 3.6+
- Read access to the TSM CSV export directory
- CheckMK agent installed

### TSM server
- CSV export of the backup statistics
- Export format: `YYYY-MM-DD HH:MM:SS,<field>,NODE_NAME,SCHEDULE,STATUS`

---

## 📦 Installation

### Step 1: Install the agent plugin

The agent plugin must be installed on **every host** that is to monitor TSM backups.

```bash
# As root on the monitored host
cd /usr/lib/check_mk_agent/plugins

# Download or copy the plugin
wget https://your-repo/tsm_backups_agent.py -O tsm_backups
# OR
scp user@server:tsm_backups_agent.py /usr/lib/check_mk_agent/plugins/tsm_backups

# Make it executable
chmod +x /usr/lib/check_mk_agent/plugins/tsm_backups

# Test
./tsm_backups
```

**Expected output:**
```
<<<tsm_backups:sep(0)>>>
{"SERVER_MSSQL": {"statuses": ["Completed"], "schedules": ["DAILY_FULL"], "last": 1736693420, "count": 1}, ...}
```

### Step 2: Prepare the CSV directory

```bash
# Create the CSV directory
mkdir -p /mnt/CMK_TSM
chmod 755 /mnt/CMK_TSM

# Provide the TSM CSV files
# Option A: NFS mount from the TSM server
mount -t nfs tsm-server:/exports/backup-stats /mnt/CMK_TSM

# Option B: Periodic SCP/rsync
# Crontab entry:
*/15 * * * * rsync -az tsm-server:/path/to/*.CSV /mnt/CMK_TSM/
```

**Expected CSV structure:**
```
/mnt/CMK_TSM/
├── TSM_BACKUP_SCHED_24H.CSV
├── TSM_DB_SCHED_24H.CSV
└── TSM_FILE_SCHED_24H.CSV
```

### Step 3: Install the check plugin

The check plugin is installed on the **CheckMK server**.

```bash
# As site user
OM=/omd/sites/monitoring
cd $OM

# Create the plugin directory
mkdir -p local/lib/python3/cmk_addons/plugins/tsm/agent_based

# Copy the plugin
cp tsm_backups.py local/lib/python3/cmk_addons/plugins/tsm/agent_based/

# Set permissions
chmod 644 local/lib/python3/cmk_addons/plugins/tsm/agent_based/tsm_backups.py

# Regenerate the configuration and restart the core
cmk -R
```

### Step 4: Service discovery

```bash
# Service discovery for a single host
cmk -II hostname

# Bulk discovery for all hosts
cmk -II --all

# Web UI: Setup > Hosts > <Host> > Service Discovery > Full Scan
```

---

## ⚙️ Configuration

### Adjusting thresholds

The thresholds can be changed directly in the check plugin:

**File:** `local/lib/python3/cmk_addons/plugins/tsm/agent_based/tsm_backups.py`

```python
THRESHOLDS = {
    "log": {"warn": 4 * 3600, "crit": 8 * 3600},       # 4h/8h
    "mssql": {"warn": 26 * 3600, "crit": 48 * 3600},   # 26h/48h
    # ... further types ...
    "default": {"warn": 26 * 3600, "crit": 48 * 3600},
}
```

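How such a table might be applied: a minimal sketch, assuming the check compares the backup age in seconds against the `warn`/`crit` pair of the detected type, falls back to `default` for unknown types, and treats reaching a limit as exceeding it. `age_state` is a hypothetical helper; the plugin's real evaluation code is not part of this README.

```python
THRESHOLDS = {
    "log": {"warn": 4 * 3600, "crit": 8 * 3600},
    "mssql": {"warn": 26 * 3600, "crit": 48 * 3600},
    "default": {"warn": 26 * 3600, "crit": 48 * 3600},
}


def age_state(backup_type, age_seconds):
    """Map a backup age to 'OK', 'WARN' or 'CRIT' using type-specific limits."""
    # Unknown types fall back to the "default" entry (assumption)
    limits = THRESHOLDS.get(backup_type, THRESHOLDS["default"])
    if age_seconds >= limits["crit"]:
        return "CRIT"
    if age_seconds >= limits["warn"]:
        return "WARN"
    return "OK"


print(age_state("log", 5 * 3600))       # 5h-old log backup
print(age_state("mssql", 5 * 3600))     # 5h-old MSSQL backup
print(age_state("unknown", 50 * 3600))  # unknown type uses "default"
```

The same five-hour age is a warning for transaction logs but still fine for a daily MSSQL backup, which is the reason for keeping thresholds per type.
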
### Adding new backup types

**Scenario:** A new type "SAPASE" is to be supported.

**Step 1: Add the type to the known_types list**

```python
# In the extract_backup_type() function
known_types = [
    'MSSQL', 'HANA', 'FILE', 'ORACLE', 'DB2', 'SCALE', 'DM',
    'DATACENTER', 'VIRTUAL', 'MAIL', 'MYSQL', 'POSTGRES',
    'MARIADB', 'EXCHANGE', 'VMWARE', 'HYPERV', 'SYBASE', 'MONGODB',
    'SAPASE',  # NEW
]
```

**Step 2: Define thresholds (optional)**

```python
THRESHOLDS = {
    # ... existing entries ...
    "sapase": {"warn": 26 * 3600, "crit": 48 * 3600},
}
```

**Step 3: Assign a category (optional)**

```python
DATABASE_TYPES = {
    'mssql', 'hana', 'db2', 'oracle', 'mysql',
    'postgres', 'mariadb', 'sybase', 'mongodb',
    'sapase',  # NEW
}
```

**After making changes:**
```bash
cmk -R
cmk -II --all
```

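For orientation, type detection from a node name could look roughly like the sketch below. It assumes the node name is split on underscores, needs at least two segments, and takes the first segment that appears in the known types. This only illustrates the naming convention; it is not the plugin's actual `extract_backup_type()` implementation, which is not reproduced in this README. The name `guess_backup_type` is hypothetical.

```python
KNOWN_TYPES = {
    "MSSQL", "HANA", "FILE", "ORACLE", "DB2", "SCALE", "DM",
    "DATACENTER", "VIRTUAL", "MAIL", "MYSQL", "POSTGRES",
    "MARIADB", "EXCHANGE", "VMWARE", "HYPERV", "SYBASE", "MONGODB",
    "SAPASE",
}


def guess_backup_type(node_name):
    """Return the lowercase backup type found in the node name, or 'unknown'."""
    segments = node_name.upper().split("_")
    # Assumption: a bare type name without a host part is not a valid node name
    if len(segments) < 2:
        return "unknown"
    for segment in segments:
        if segment in KNOWN_TYPES:
            return segment.lower()
    return "unknown"


for name in ("SERVER_MSSQL", "DATABASE_HANA_01", "VM_HYPERV_123", "SERVER_12345"):
    print(name, "->", guess_backup_type(name))
```

Under these assumptions, adding `SAPASE` to the set is enough for a node such as `ERP_SAPASE_01` to be classified as `sapase`.
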
### Configuring tolerant error handling

Backup types with tolerant behavior (Failed → WARNING instead of CRITICAL):

```python
TOLERANT_TYPES = {
    'file', 'virtual', 'scale', 'dm', 'datacenter',
    'vmware', 'hyperv', 'mail', 'exchange',
    'custom_tolerant_type'  # NEW
}
```

### Changing the CSV directory

In the agent plugin (`/usr/lib/check_mk_agent/plugins/tsm_backups`):

```python
CSV_DIR = Path("/mnt/CMK_TSM")  # Adjust here
```

---

## 🚀 Usage

### Using service labels

Every TSM backup service automatically receives the following labels:

| Label | Values | Description |
|-------|--------|-------------|
| `backup_type` | `mssql`, `hana`, `file`, ... | Detected backup type |
| `backup_category` | `database`, `virtualization`, `filesystem`, `application`, `other` | Category |
| `backup_system` | `tsm` | Backup system |
| `frequency` | `hourly`, `daily`, `weekly`, `monthly` | Backup frequency |
| `backup_level` | `log`, `full`, `incremental`, `differential` | Backup level |
| `error_handling` | `tolerant`, `strict` | Error handling |
| `node_name` | Original node name | TSM node |

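The `frequency` and `backup_level` values appear to follow the schedule name (for example `DAILY_FULL` or `HOURLY_LOG` in the sample CSV rows). The sketch below shows one way such a mapping could work. It is an assumption for illustration only: the keyword list, the `INCR`/`DIFF` abbreviations, and the function name `schedule_labels` are hypothetical and not taken from the plugin.

```python
FREQUENCIES = ("hourly", "daily", "weekly", "monthly")
# Keyword found in the schedule name -> label value (abbreviations are assumed)
LEVELS = {"log": "log", "full": "full", "incr": "incremental", "diff": "differential"}


def schedule_labels(schedule):
    """Derive frequency and backup_level labels from a schedule name."""
    name = schedule.lower()
    labels = {}
    for freq in FREQUENCIES:
        if freq in name:
            labels["frequency"] = freq
            break
    for keyword, level in LEVELS.items():
        if keyword in name:
            labels["backup_level"] = level
            break
    return labels


print(schedule_labels("DAILY_FULL"))
print(schedule_labels("HOURLY_LOG"))
```

Schedule names that contain none of the keywords simply yield no label in this sketch; how the real plugin handles that case is not documented here.
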
### Examples: label-based filtering

#### Creating views

**CheckMK GUI:**
`Setup > General > Custom views > Create new view`

**Filter examples:**
- **All database backups:**
  `Service labels: backup_category = database`

- **All failed strict backups:**
  `State: CRIT` + `Service labels: error_handling = strict`

- **All MSSQL backups with daily frequency:**
  `Service labels: backup_type = mssql AND frequency = daily`

#### Business Intelligence (BI)

```python
# BI aggregation: are all DB backups OK?
{
    "type": "bi_aggregation",
    "title": "Database Backups",
    "filter": {
        "service_labels": {
            "backup_category": "database"
        }
    }
}
```

### Status evaluation logic

| Condition | Error handling | Result |
|-----------|----------------|--------|
| ≥ 1x Completed | - | OK ✅ |
| Only Pending/Started (<2h) | - | OK ✅ |
| Only Pending/Started (>2h) | - | WARN ⚠️ |
| Failed/Missed | Tolerant | WARN ⚠️ |
| Failed/Missed | Strict | CRIT 🔴 |
| Age > threshold | - | WARN/CRIT ⚠️🔴 |

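The first five rows of the table can be read as a small decision function. The sketch below is one possible reading, assuming the rows are evaluated top to bottom, that the two-hour limit refers to the age of the newest record, and that anything not covered by the table yields `UNKNOWN`. The function name `evaluate_status` is hypothetical, and the age threshold row is left out because it depends on the type-specific limits.

```python
TOLERANT_TYPES = {
    "file", "virtual", "scale", "dm", "datacenter",
    "vmware", "hyperv", "mail", "exchange",
}


def evaluate_status(statuses, backup_type, age_seconds):
    """Return 'OK', 'WARN', 'CRIT' or 'UNKNOWN' for the aggregated job statuses."""
    seen = {s.lower() for s in statuses}
    if "completed" in seen:
        return "OK"  # at least one completed job
    if seen and seen <= {"pending", "started"}:
        # Only running or queued jobs: fine for up to two hours
        return "OK" if age_seconds < 2 * 3600 else "WARN"
    if seen & {"failed", "missed"}:
        return "WARN" if backup_type in TOLERANT_TYPES else "CRIT"
    return "UNKNOWN"  # assumption: statuses the table does not cover


print(evaluate_status(["Failed", "Completed"], "mssql", 3600))
print(evaluate_status(["Started"], "file", 3 * 3600))
print(evaluate_status(["Failed"], "mssql", 3600))
print(evaluate_status(["Missed"], "file", 3600))
```

Because a single `Completed` entry wins over any failures in the same aggregation window, redundant nodes that share one service do not raise an alarm as long as one of them finished its backup.
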
---

## 🏗️ Architecture

### Component overview

```
┌──────────────────────────────────────────────────┐
│ TSM server                                       │
│  ┌──────────────────────────────────────────┐    │
│  │ Backup jobs → CSV export                 │    │
│  │ (via TSM queries or export scripts)      │    │
│  └────────────────┬─────────────────────────┘    │
└───────────────────┼──────────────────────────────┘
                    │ CSV files
                    │ (e.g. via NFS, SCP, rsync)
                    ▼
┌──────────────────────────────────────────────────┐
│ Monitored host                                   │
│  ┌──────────────────────────────────────────┐    │
│  │ /mnt/CMK_TSM/*.CSV                       │    │
│  └────────────────┬─────────────────────────┘    │
│                   │                              │
│  ┌────────────────▼─────────────────────────┐    │
│  │ Agent plugin: tsm_backups                │    │
│  │ - Reads CSV files                        │    │
│  │ - Normalizes node names                  │    │
│  │ - Aggregates per node                    │    │
│  │ - Emits JSON                             │    │
│  └────────────────┬─────────────────────────┘    │
└───────────────────┼──────────────────────────────┘
                    │ JSON via CheckMK agent
                    ▼
┌──────────────────────────────────────────────────┐
│ CheckMK server                                   │
│  ┌──────────────────────────────────────────┐    │
│  │ Check plugin: tsm_backups                │    │
│  │ - Parses JSON                            │    │
│  │ - Creates services with labels           │    │
│  │ - Evaluates status                       │    │
│  │ - Checks thresholds                      │    │
│  │ - Produces metrics                       │    │
│  └────────────────┬─────────────────────────┘    │
│                   │                              │
│  ┌────────────────▼─────────────────────────┐    │
│  │ CheckMK services                         │    │
│  │ - TSM Backup SERVER_MSSQL                │    │
│  │ - TSM Backup VM_HYPERV_01                │    │
│  │ - ...                                    │    │
│  └──────────────────────────────────────────┘    │
└──────────────────────────────────────────────────┘
```

### Data flow

1. **TSM server**: exports backup statistics as CSV
2. **Host**: the agent plugin reads the CSV files, then aggregates and normalizes the data
3. **CheckMK**: the check plugin receives the JSON, creates services, and evaluates the status
4. **Output**: services with labels, metrics, and status

### Node normalization

**Problem:** Redundant TSM server nodes
**Example:**
- `RRZ01_MYSERVER_MSSQL`
- `RRZ02_MYSERVER_MSSQL`
- `NFRZ01_MYSERVER_MSSQL`

**Solution:** Normalization removes the `RRZ*/NFRZ*` prefixes
**Result:** A single service `MYSERVER_MSSQL` aggregates all nodes

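The normalization rule can be sketched in a few lines. This version assumes that the site marker is always a complete underscore-separated segment such as `RRZ01` or `NFRZ01`, as in the example above. It is a segment-based illustration rather than the agent plugin's own regular expressions, which differ in detail.

```python
import re


def normalize_node_name(node):
    """Strip RRZ<n>/NFRZ<n> site markers so redundant nodes share one name."""
    # Assumption: the marker is a whole underscore-separated segment
    parts = [p for p in node.split("_") if not re.fullmatch(r"(RRZ|NFRZ)\d+", p)]
    return "_".join(parts)


for node in ("RRZ01_MYSERVER_MSSQL", "RRZ02_MYSERVER_MSSQL", "NFRZ01_MYSERVER_MSSQL"):
    print(node, "->", normalize_node_name(node))
```

All three inputs map to `MYSERVER_MSSQL`, so their jobs end up in the same aggregated service.
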
---

## 🔍 Troubleshooting

### Problem: No services found

**Diagnosis:**
```bash
# Check the agent output
check_mk_agent | grep -A 20 "<<<tsm_backups"

# Should return JSON:
# <<<tsm_backups:sep(0)>>>
# {"NODE1": {...}, "NODE2": {...}}
```

**Solutions:**
- Does the CSV directory `/mnt/CMK_TSM` exist?
- Are CSV files present? (`ls -lh /mnt/CMK_TSM/*.CSV`)
- Is the agent plugin executable? (`ls -l /usr/lib/check_mk_agent/plugins/tsm_backups`)
- Test manually: `/usr/lib/check_mk_agent/plugins/tsm_backups`

### Problem: Services stay UNKNOWN

**Diagnosis:**
```bash
# Test the check plugin
cmk -nv --debug hostname | grep "TSM Backup"
```

**Solutions:**
- Is the check plugin installed correctly?
- Regenerate the configuration and restart the core: `cmk -R`
- Rerun the discovery: `cmk -II hostname`

### Problem: Wrong backup types detected

**Check the node naming convention:**
```
✅ CORRECT:
- SERVER_MSSQL
- DATABASE_HANA_01
- FILESERVER_FILE
- VM_HYPERV_123

❌ INCORRECT:
- MSSQL (too short)
- SERVER (no type)
- SERVER_12345 (type unclear)
```

**Solution:** Adjust the node naming scheme or extend `extract_backup_type()`

### Problem: CSV files are not read

**Check the CSV format:**
```bash
head -n 5 /mnt/CMK_TSM/TSM_BACKUP_SCHED_24H.CSV
```

**Expected format:**
```
2026-01-12 08:00:00,FIELD,SERVER_MSSQL,DAILY_FULL,Completed
2026-01-12 09:15:00,FIELD,DATABASE_HANA,HOURLY_LOG,Completed
```

**Columns:**
1. Timestamp (`YYYY-MM-DD HH:MM:SS`)
2. Arbitrary field (ignored by the agent plugin)
3. **Node name**
4. **Schedule name**
5. **Status**

### Analyzing logs

```bash
# CheckMK log
tail -f /omd/sites/monitoring/var/log/cmc.log | grep tsm

# Debug the agent plugin
/usr/lib/check_mk_agent/plugins/tsm_backups 2>&1 | tee /tmp/tsm_debug.log
```

---

## 📊 Metrics

### Generated metrics

| Metric | Description | Unit | Thresholds |
|--------|-------------|------|------------|
| `backup_age` | Time since the last backup | Seconds | Type-specific |
| `backup_jobs` | Number of backup jobs | Count | - |

### Grafana integration

**Example query (InfluxDB):**
```sql
SELECT mean("backup_age")
FROM "tsm_backups"
WHERE "backup_type" = 'mssql'
AND time > now() - 7d
GROUP BY time(1h), "node_name"
```

---

## 🔄 Changelog

### Version 5.0 (2026-01-12)
- ✨ **Dynamic backup type detection**: a fixed list is no longer required
- ✨ **Backup categories**: additional label `backup_category`
- ✨ **Extended comments**: complete docstrings
- ✨ **New types**: PostgreSQL, MariaDB, MongoDB
- 🐛 **Bugfix**: ServiceLabel API for CheckMK 2.3.0p40

### Version 4.0 (2026-01-10)
- ✨ ServiceLabel API compatibility with CheckMK 2.3.0p40
- 📝 Extended documentation

### Version 3.0 (2025-12-15)
- ✨ Node normalization for redundancy
- ✨ Aggregation per logical node
- 🔧 Type-specific thresholds

### Version 2.0 (2025-11-20)
- ✨ Tolerant error handling
- ✨ Service labels

### Version 1.0 (2025-11-01)
- 🎉 Initial release

---

## 📝 License

MIT License - see the LICENSE file

## 👤 Author

**Marius Gielnik**
IT Product Owner - CheckMK Monitoring
GC-Gruppe (Cordes und Gräfe KG)

## 🤝 Support

- **Issues:** GitHub Issues
- **Questions:** CheckMK Community Forum
- **Email:** [your-email]@example.com

---

## 📚 Further reading

- [CheckMK Plugin Development](https://docs.checkmk.com/latest/de/devel_check_plugins.html)
- [IBM Spectrum Protect Documentation](https://www.ibm.com/docs/en/spectrum-protect)
- [CheckMK Labels](https://docs.checkmk.com/latest/de/labels.html)

---

**Last updated:** 2026-01-12
**Version:** 4.1
45
TSM/isntall.txt
Normal file
@ -0,0 +1,45 @@
TSM BACKUP MONITORING FOR CHECKMK
=================================

Installation in 3 steps
-----------------------

1. AGENT PLUGIN (on every monitored host)
   Copy:        tsm_backups_agent.py
   To:          /usr/lib/check_mk_agent/plugins/tsm_backups
   Permissions: chmod +x /usr/lib/check_mk_agent/plugins/tsm_backups

2. CHECK PLUGIN (on the CheckMK server)
   Copy: tsm_backups_check.py
   To:   /omd/sites/<site>/local/lib/python3/cmk_addons/plugins/tsm/agent_based/tsm_backups.py
   Then: cmk -R

3. SERVICE DISCOVERY
   cmk -II <hostname>

Detailed guides
---------------

→ QUICKSTART.md - Quick start guide
→ README.md - Feature overview and usage
→ DOCUMENTATION.md - Technical details and API reference

Files
-----

tsm_backups_agent.py - Agent plugin (on monitored hosts)
tsm_backups_check.py - Check plugin (on the CheckMK server)
README.md - Main documentation
QUICKSTART.md - Quick start guide
DOCUMENTATION.md - Technical documentation
CHANGELOG.md - Version history
LICENSE - MIT license

Support
-------

Author: Marius Gielnik
Version: 5.0.0
Date: 2026-01-12

For questions: see README.md or DOCUMENTATION.md
0
TSM/tsm_backup_check.py
Normal file
128
TSM/tsm_backups.py
Normal file
@ -0,0 +1,128 @@
#!/usr/bin/env python3
"""
TSM Backup Status Agent Plugin for CheckMK
- Reads CSV files from /mnt/CMK_TSM
- Normalizes RRZ*/NFRZ* node names for redundancy
- Aggregates backups per logical node
- Outputs data as CheckMK agent section

Installation: /usr/lib/check_mk_agent/plugins/tsm_backups
Permissions: chmod +x tsm_backups

Author: Marius Gielnik
Version: 1.0 - Agent Plugin for CheckMK 2.3+
"""
import csv
import json
import re
from collections import defaultdict
from datetime import datetime
from pathlib import Path

CSV_DIR = Path("/mnt/CMK_TSM")


class TSMParser:
    def __init__(self):
        self.backups = []

    def normalize_node_name(self, node):
        """Normalize node names so redundant nodes map to one logical node."""
        # Drop a site marker (RRZ01, NFRZ01, RZ01, ...) that precedes an underscore
        normalized = re.sub(r'(RRZ|NFRZ|RZ)\d+(_)', r'\2', node)
        # Drop a site marker at the very end of the name
        normalized = re.sub(r'(RRZ|NFRZ|RZ)\d+$', '', normalized)
        # Collapse doubled underscores and trim stray ones left by the removal,
        # so that RRZ01_MYSERVER_MSSQL becomes MYSERVER_MSSQL
        return re.sub(r'_+', '_', normalized).strip('_')

    def is_valid_node(self, node, status):
        """Check whether a node entry should be processed."""
        if not node or len(node) < 3 or not status:
            return False
        if "MAINTENANCE" in node:
            return False
        return True

    def parse_csv(self, csv_file):
        """Parse one CSV file and collect its backup records."""
        try:
            with open(csv_file, 'r', encoding='utf-8') as f:
                reader = csv.reader(f)
                for row in reader:
                    # Expected columns: time, field, node, schedule, status
                    if not row or len(row) < 5:
                        continue

                    time_str = row[0].strip()
                    node = row[2].strip()
                    schedule = row[3].strip()
                    status = row[4].strip()

                    if not self.is_valid_node(node, status):
                        continue

                    normalized_node = self.normalize_node_name(node)

                    self.backups.append({
                        "time": time_str,
                        "node": normalized_node,
                        "status": status,
                        "schedule": schedule,
                    })
        except Exception:
            # Skip unreadable files so one bad file does not break the section
            pass

    def aggregate(self):
        """Aggregate backups per logical node."""
        nodes = defaultdict(lambda: {
            "statuses": [],
            "schedules": [],
            "last": None,
            "count": 0,
        })

        for b in self.backups:
            node = b["node"]
            nodes[node]["count"] += 1
            nodes[node]["statuses"].append(b["status"])
            nodes[node]["schedules"].append(b["schedule"])

            try:
                t = datetime.strptime(b["time"], "%Y-%m-%d %H:%M:%S")
                if not nodes[node]["last"] or t > nodes[node]["last"]:
                    nodes[node]["last"] = t
            except ValueError:
                # Rows with an unparseable timestamp still count as jobs
                pass

        # Convert datetime to timestamp for JSON serialization
        result = {}
        for node, data in nodes.items():
            result[node] = {
                "statuses": data["statuses"],
                "schedules": data["schedules"],
                "last": int(data["last"].timestamp()) if data["last"] else None,
                "count": data["count"]
            }
        return result


def main():
    if not CSV_DIR.exists():
        # Output empty section if directory doesn't exist
        print("<<<tsm_backups:sep(0)>>>")
        print(json.dumps({}))
        return

    # Prefer the 24h schedule exports; fall back to any CSV file
    csv_files = list(CSV_DIR.glob("*_SCHED_24H.CSV"))
    if not csv_files:
        csv_files = list(CSV_DIR.glob("*.CSV")) + list(CSV_DIR.glob("*.csv"))

    parser = TSMParser()
    for csv_file in csv_files:
        parser.parse_csv(csv_file)

    nodes = parser.aggregate()

    # Output CheckMK agent section
    print("<<<tsm_backups:sep(0)>>>")
    print(json.dumps(nodes))


if __name__ == "__main__":
    main()