Compare commits

...

70 commits

Author SHA1 Message Date
52f5debe44 Release notes: manage X research accounts in the admin portal 2026-05-22 14:41:16 +02:00
claude-dev
8c75a70655 feat(x-scraper): manage X research accounts in the admin portal
New sub-tab "X-Recherche-Konten" under Quellen (sources): view, add, renew
cookies for, activate/deactivate and remove the X login accounts that the
Monitor uses to scrape X (twscrape account pool), plus a reset for
locks.

- new router x_scraper.py, manages the twscrape pool through its API
- X_ACCOUNTS_DB_PATH in config.py
- twscrape as a dependency (pinned to git main)
- sub-tab, table and two modals in dashboard.html, logic in x-scraper.js

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 11:22:35 +00:00
6bfff67c2f Promote develop → main (2026-05-22 11:13 UTC) 2026-05-22 13:13:41 +02:00
746b1bcd81 Release notes: internal improvements 2026-05-22 13:13:39 +02:00
7ec153ca49 Release notes: internal improvements 2026-05-22 13:13:27 +02:00
claude-dev
a27fe44b0b Revert "feat(sources): X account management in the admin portal"
This reverts commit bd476edb13.
2026-05-22 11:12:28 +00:00
6c623a8ae5 Promote develop → main (2026-05-22 11:09 UTC) 2026-05-22 13:09:25 +02:00
240222cb2a Release notes: manage X accounts directly in the admin portal 2026-05-22 13:09:23 +02:00
claude-dev
bd476edb13 feat(sources): X account management in the admin portal
New sub-tab "X-Accounts" under Quellen (sources): view, add, edit and remove
the X accounts integrated as research sources.
Writes source_type=x_account to the shared sources table, where the
Monitor picks them up per situation (Lage).

- x_account in the source_type pattern of GlobalSourceCreate/Update
- primary_language in Create/Update plus INSERT (keyword matching)
- x_account type and x category in source_meta.py
- sub-tab, table and modal in dashboard.html, logic in sources.js

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 11:06:07 +00:00
ed38d68db7 Promote develop → main (2026-05-22 09:37 UTC) 2026-05-22 11:37:54 +02:00
c7d6d2eedf Release notes: new translation feature in the dashboard 2026-05-22 11:37:47 +02:00
031bd9e114 feat(translation): manual translation button in the dashboard
Foreign-language articles without a German version can now be translated
manually from the admin dashboard. Background: automatic translation in
the Monitor was disabled (TRANSLATOR_ENABLED=false) after a very large
run had blocked the refresh worker.

- translation_agent.py: admin-side adaptation of the Monitor translator
  (Haiku batches), imports switched to shared.agents.claude_client
- routers/translation.py: endpoints /api/translation/status, /run and
  /cancel. The run executes as a decoupled background task, blocks
  no request and can be cancelled at any time
- dashboard card with progress bar, up-front effort estimate and
  cancel button
- test_imports.py: new router added to the smoke test

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 10:07:37 +02:00
c316c67294 Category stimmungsbild (#9) 2026-05-22 00:29:02 +02:00
430641b128 feat(sources): new category 'stimmungsbild' for forum sources
The single source of truth (source_meta.SOURCE_CATEGORIES) gains the entry
"Forum / Stimmungsbild". The frontend loads it via /api/sources/meta
and shows it in the filter dropdowns.

Background: jp_demo uses 5ch (Phase 2), Hatena Bookmark and Note Trending
as anonymous forum sources for a dedicated sentiment tile in the Monitor.
In the DB these sources get category='stimmungsbild' + media_type='forum',
so that they are excluded from the fact check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 00:28:04 +02:00
7c558b7cb4 Promote develop → main (2026-05-21 15:49 UTC) 2026-05-21 17:49:35 +02:00
Claude Code
c62be998d5 fix(config): CLAUDE_PATH default set to /usr/local/bin/claude
The previous default /home/claude-dev/.claude/local/claude does not exist on
the live server. As a result, every call_claude in the admin portal dies with
FileNotFoundError [Errno 2]. Specifically affected: bulk_classify (all source
classifications failed silently or aborted after the first source).

On live this was temporarily overridden via the CLAUDE_PATH env variable in
/home/claude-dev/AegisSight-Monitor-Verwaltung/.env. This commit moves the fix
into the code so that the default also works without the .env override (and
the .env line can be removed again later).

Monitor config.py:30 has a different default (/usr/bin/claude); it is
not touched in this commit because the Monitor works today. Track it
separately in case drift shows up there as well.
2026-05-20 19:53:03 +00:00
5d1d72bf3d Promote develop → main (2026-05-17 19:19 UTC) 2026-05-17 21:19:44 +02:00
d0b71d82e4 Release notes: 83 new sources for military, police technology & weapons 2026-05-17 21:19:42 +02:00
claude-dev
c64675b266 feat(scripts): bulk seed for 83 military, police-technology and weapons sources
The new idempotent script scripts/seed_military_sources.{json,py} sets up 85
international defense sources (RSS/web/Telegram), categorized with
topic tags in the notes field: [militaertechnik], [waffen-international],
[polizei-technik]. Languages EN/DE/FR/RU/FA/PL, country_code set manually.

First run on the staging DB: 83 new (IDs 384-466), 2 duplicates (rybar,
osintdefender already present). A URL check prevents duplicates, so the
same script runs unchanged against the live DB:

  venv/bin/python scripts/seed_military_sources.py \
    --db /home/claude-dev/osint-data/osint.db

Sections: 31 international equipment trade publications (Janes, TWZ, Defense
News, Naval News, Army Recognition, Aviation Week ...), 8 German
(ESuT, Soldat & Technik, hartpunkt, Augen geradeaus ...), 5 French
(Opex360, Mer et Marine ...), 5 Russian (Topwar, TASS, RIA, bmpd, Zvezda),
4 Ukrainian/Polish (Defense Express, Militarnyi, Defence24), 2
Israeli, 3 Iranian, 3 Chinese/Asian, 8 OSINT trackers (ORYX,
WarSpotting, CIT, 5 Telegram), 5 police technology (Behoerden-Spiegel, pvt,
Police Magazine ...) and 11 weapons specialists (Small Arms Survey, SIPRI,
Conflict Armament Research, ARES, Calibre Obscura, ICRC ...).

Plan: ~/.claude/plans/gleaming-inventing-fern.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 11:18:07 +00:00
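The URL check that makes the seed script above idempotent could look roughly like the following sketch. It is not the script's actual code: the function name seed_sources and the column set (name, url, source_type, notes) are assumptions for illustration. Only the idea of skipping entries whose URL already exists comes from the commit message.

```python
import sqlite3


def seed_sources(conn, entries):
    """Insert entries whose URL is not yet present; return (created, skipped).

    Hypothetical helper: table layout and field names are assumed.
    """
    created = skipped = 0
    for entry in entries:
        exists = conn.execute(
            "SELECT 1 FROM sources WHERE url = ?", (entry["url"],)
        ).fetchone()
        if exists:
            skipped += 1  # duplicate URL, leave the existing row untouched
            continue
        conn.execute(
            "INSERT INTO sources (name, url, source_type, notes) "
            "VALUES (?, ?, ?, ?)",
            (entry["name"], entry["url"], entry["source_type"],
             entry.get("notes", "")),
        )
        created += 1
    conn.commit()
    return created, skipped
```

Running such a function a second time against the same database creates nothing new, which is why the same script can be pointed at staging and live without changes.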
1d9ce20b68 Promote develop → main (2026-05-17 00:40 UTC) 2026-05-17 02:40:40 +02:00
claude-dev
9843ff0015 fix(pdf-upload): closeModal calls without quotes -> cancel did not work
The quotation marks had been lost during insertion
(onclick="closeModal(modalPdfUpload)" instead of the quoted form closeModal('modalPdfUpload'))
-> the browser threw a ReferenceError and the click had no effect.
Using &#39; (HTML entity apostrophe) makes it unambiguous inside the attribute value.
2026-05-16 23:43:41 +00:00
claude-dev
27afce7c9e feat(sources): PDF upload as new source type pdf_document
- POST /api/sources/global/upload-pdf: multipart file upload,
  50 MB limit, SHA256 dedup, stores the PDF under <dirname(DB)>/pdfs/{sha}.pdf,
  creates the source with processed_at=NULL (the Monitor processes it asynchronously)
- pattern in GlobalSourceUpdate extended with pdf_document (2x)
- dashboard.html: button + modal in the base sources (Grundquellen) sub-tab
- sources.js: openPdfUploadModal + setupPdfUploadForm + FormData submit
- app.js: API.upload(path, formData) helper for multipart
- requirements.txt: pypdf (validation optional)
2026-05-16 23:21:56 +00:00
claude-dev
d3e5fa7079 feat(orgs): pipeline language as org setting in the admin portal
- OrgCreate / OrgUpdate / OrgResponse extended with output_language (de | en).
- routers/organizations.py persists the language after create/update
  via shared.services.org_settings.set_org_setting.
- _enrich_org reads output_language from organization_settings (default de).
- Frontend: dropdown in the "Neue Organisation" modal and in the org edit form,
  pre-filled from org.output_language. Cache buster on app.js bumped.

Phase 7 of 8 (eng_demo / org language).
2026-05-13 21:18:07 +00:00
claude-dev
521633bde9 feat(shared): org_settings helper (copy from Monitor)
Helper taken over from AegisSight-Monitor/src/services/org_settings.py.
Used in Phase 7 by the admin org router to set output_language
when creating/editing an org.

Phase 1 of 8 (eng_demo / org language).
2026-05-13 20:46:10 +00:00
claude-dev
015255237a feat(klassifikation): source classification moved from the Monitor to the admin portal
The service modules (source_classifier, external_reputation) now live in shared/services/, and the endpoints under /api/sources/classification/* are here instead of in the Monitor:
- classification/{stats,queue,bulk-classify,bulk-approve}
- {id}/classification/{approve,reject,reclassify}
- external-reputation/sync

modalSource gains a classification section (politics, media type, reliability, state-affiliated, country, 12 alignment chips). New sub-tab Klassifikation with review queue, pending counter and bulk actions. Auth via get_current_admin, audit logging.

Companion refactor: the Monitor loses the classification UI/endpoints in a separate change.
2026-05-09 21:27:55 +00:00
b56b7eeda2 Merge pull request 'Promote: sync strategy/stale-source order' (#5) from develop into main 2026-05-09 17:44:30 +02:00
Claude
2f7d967ce2 sync(shared): order of strategy escalation/stale sources from Monitor 5f053a3 2026-05-09 15:43:44 +00:00
1d9751ef1a Promote develop -> main 2026-05-09 17:27:03 +02:00
Claude
5e08d06784 feat(quellen-health): strategy escalation, "Loesung suchen" for warnings, trend delta
Three related improvements to the source health area:

1. shared/services/source_suggester.py:
   - synced with Monitor commit 49c5572.
   - New function generate_strategy_escalation_suggestions: creates
     deactivate suggestions for sources with fetch_strategy=googlebot|
     paywall whose reachability check still reports error.

2. source-health.js: "Loesung suchen" (find a fix) button extended.
   Previously only for status=error AND check_type=reachability. Now also
   for status=warning AND check_type=feed_validity (e.g. "Feed
   erreichbar aber leer", feed reachable but empty). The backend endpoint
   /api/sources/health/search-fix is called in both cases, and Claude
   searches for a better URL for the source.

3. source-health.js: trend delta in the counter.
   Reads healthHistoryCache[1] (second-to-last run) and compares it with
   the current errors/warnings/ok. Shows e.g. "3 Fehler (+2)" in red or
   "143 Warnungen (-15)" in green. For rising ok counts the plus is
   green, for rising errors the plus is red. If the second-to-last
   run is not available (initial run): no delta.

Cache buster for source-health.js bumped to 20260509l.
2026-05-09 15:26:24 +00:00
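The coloring rule for the trend delta in the commit above (a plus is good for ok counts, bad for errors and warnings, no delta without a previous run) can be expressed compactly. The sketch below is a Python illustration of that rule, not the JavaScript in source-health.js. The function name format_delta and the choice to omit the delta when nothing changed are assumptions.

```python
def format_delta(label, current, previous, more_is_good):
    """Return (text, color) for one counter.

    previous is None on the initial run; color is None when no delta is shown.
    Hypothetical helper: hiding a zero delta is an assumption.
    """
    if previous is None:
        return f"{current} {label}", None
    delta = current - previous
    if delta == 0:
        return f"{current} {label}", None
    # A rise is good only for counters where more is better (ok counts).
    good = (delta > 0) == more_is_good
    return f"{current} {label} ({delta:+d})", "green" if good else "red"
```

For example, format_delta("Fehler", 3, 1, more_is_good=False) yields the text "3 Fehler (+2)" with the color "red".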
6a02e06887 Promote develop -> main 2026-05-09 17:21:07 +02:00
Claude
719c67df3e sync(shared): stale-source heuristic taken over from the Monitor repo
Mirror of AegisSight-Monitor commit d973dc7. Identical file
(pre-commit hook checks drift against Monitor master = 0).

The new function generate_stale_deactivation_suggestions is called at the
end of the manual health-check run started from the admin portal UI
(/api/sources/health/run-stream). Stale sources
thus land in the Vorschlaege (suggestions) tab as deactivate_source suggestions
and can be accepted with one click.
2026-05-09 15:09:45 +00:00
42a0647cd7 Promote develop → main (2026-05-09 15:05 UTC) 2026-05-09 17:05:22 +02:00
Claude (cleanup)
38a13c0b64 ux(quellen-health): more compact run history table
- Total column removed (= errors+warnings+ok, redundant).
- Column widths set explicitly via colgroup: 200/160/110/130/110px,
  so that the values no longer stick to the right edge of an empty area.
- Header labels + values of the counter columns centered
  (instead of right-aligned on evenly distributed columns).
- Run ID shortened to 12 characters, smaller font-size, full
  value in the title tooltip.
- Column label changed from "Zeitpunkt (Run-Ende)" -> "Zeitpunkt"
  (the parenthetical explanation was footnote material).

Cache buster for source-health.js bumped to 20260509k.
2026-05-09 14:44:20 +00:00
Claude (cleanup)
3a838809c6 fix(navigation): exclude #healthSubTabs from the global top-tab handler
The global setupNavTabs in app.js caught nav-tab clicks from ALL
nav-tabs except #orgDetailTabs and #sourceSubTabs. The new
#healthSubTabs (from the last commit) was not in the :not()
list and therefore triggered the top-level handler, which looked up getElementById("sec-suggestions")
and got null -> crash at classList.add("active").

Fix: :not(#healthSubTabs) added in all three places
(setupNavTabs, setupNavTabs click handler, openSection helper at line 408).
Cache buster for app.js bumped 20260509d -> 20260509j.
2026-05-09 14:38:36 +00:00
Claude (cleanup)
f1680c9f4f ux(quellen-health): sub-tabs Vorschläge / Health-Status / Verlauf, Lucide icons instead of emojis
Splits the source health section into three separate sub-tabs so that
the user sees only the area relevant to the task at hand and does not
have to scroll through the whole page.

dashboard.html:
- Inside <div id=sub-source-health>: new nav-tabs healthSubTabs
  with three buttons (Vorschläge / Health-Status / Verlauf).
- Three pane containers ht-suggestions / ht-checks / ht-verlauf,
  each controlled via inline-style display.

source-health.js:
- setupHealthSubTabs(): click handler for switching tabs
  (toggles .active on the buttons + display none/block on the panes).
- renderHealthDashboard now splits into three innerHTML calls,
  one per pane:
    paneSuggestions <- open suggestions
    paneChecks      <- counter + filter + table + load more
    paneVerlauf     <- completed suggestions + run history
- The tab label "Vorschlaege" is enriched with a counter (e.g.
  "Vorschlaege (24 offen)") when any are open.
- LUCIDE_ICONS constant with inline SVG for check, x, search,
  refresh. Emojis and HTML entities (&check; &times; ) replaced.
  Inline SVG instead of a CDN library, so there is no external dependency.

Cache buster for source-health.js bumped to 20260509i.
2026-05-09 14:26:10 +00:00
Claude (cleanup)
5191962ce0 ux(quellen-health): slimming down - description shortened, history collapsed, narrower health table, icon buttons
Four UX levers combined, all frontend only:

1. Suggestions table: description as a single line with ellipsis;
   full text in the title tooltip. With 24 open suggestions this saves
   ~25 screen heights.

2. History card: collapsed by default via a <details> element.
   The header shows only "Verlauf (N erledigte Vorschlaege - klick zum
   Aufklappen)". A click expands the table.

3. Health table: the Domain and Sprache (language) columns removed from the
   table, both shown as a tooltip on the source name. The table now has 6
   columns instead of 8, is narrower and easier to read.

4. Action columns: text buttons ("Annehmen", "Ablehnen", "Lösung
   suchen") replaced by compact icon buttons (✓ ✗ 🔍).
   Same function, tooltip via the title attribute.

Cache buster for source-health.js bumped to 20260509h.
2026-05-09 14:18:04 +00:00
Claude (cleanup)
b6926df84d cleanup(sources): remove redundant /health/run endpoint
The frontend calls /health/run-stream exclusively. The legacy endpoint
/health/run was a simple synchronous counterpart without progress display
and was no longer called anywhere (verified via grep -r in the repo).

Step 2 of the source health cleanup. Pure code cleanup,
no change in UX or backend behavior.
2026-05-09 14:03:51 +00:00
2594b0339f Promote develop -> main 2026-05-09 15:47:23 +02:00
claude-dev
e8bb2495ee ux(quellen-health): default "Nur Probleme", finer counter breakdown, filter hint with pagination
Step 1 of the source health cleanup. Three UX improvements, no changes to data:

1. Default filter "Nur Probleme" (problems only: errors + warnings, without OK).
   - New status filter value "issues" as a virtual frontend construct.
   - applyHealthFilter treats "issues" as status != ok.
   - The default in healthFilters is now "issues". On clicking the tab the
     user immediately sees the 146 critical entries instead of the 281
     green OK rows.

2. Counter broken down by check_type.
   - Backend (/api/sources/health): additional field "breakdown"
     with the GROUP BY (check_type, status) aggregation.
   - The frontend renders the fine-grained breakdown per status row,
     e.g. "143 Warnungen (112 Aktualität, 27 Feed-Validität, 3 Duplikat,
     1 Erreichbarkeit)".
   - Helps the admin see right away where the problem lies.

3. Filter hint with pagination + empty results.
   - If the current filter finds no match among the 100 loaded items
     AND has_more=true, the frontend shows a
     hint link "Alle X Health-Checks laden und Filter erneut
     anwenden" (load all X health checks and reapply the filter).
   - Solves the edge case where e.g. the filter "Nur OK" appeared empty
     on the default 100 (errors first).

Cache buster for source-health.js bumped to 20260509g.
2026-05-09 13:24:44 +00:00
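Two pieces of the commit above lend themselves to a short illustration: the virtual "issues" filter (status != ok) and the per-status breakdown by check_type. The sketch below uses Python for both, although the filter lives in the JavaScript frontend. The function names, the "all" filter value and the dict layout of a check are assumptions.

```python
from collections import Counter


def apply_status_filter(checks, status_filter):
    """Filter health checks; "issues" is a virtual value meaning status != ok."""
    if status_filter == "issues":
        return [c for c in checks if c["status"] != "ok"]
    if status_filter in ("", "all"):  # assumed spelling of "no filter"
        return list(checks)
    return [c for c in checks if c["status"] == status_filter]


def breakdown(checks):
    """Count checks per status and check_type, like GROUP BY (check_type, status)."""
    counts = Counter((c["status"], c["check_type"]) for c in checks)
    result = {}
    for (status, check_type), n in counts.items():
        result.setdefault(status, {})[check_type] = n
    return result
```

With such a breakdown the frontend can print one line per status and list the check types behind it, as in the "143 Warnungen (...)" example.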
claude-dev
50749323f8 fix(quellen-health): missing sub-section + render container
The tab button "Quellen-Health" linked to a sub-section that
did not exist in the DOM at all:

- <button data-subtab="source-health"> already existed
- <div id="sub-source-health"> was missing entirely
- <div id="healthContent"> (render anchor for source-health.js) was missing
  as well

Consequences:
1. The sources.js click handler crashed with
   "Cannot read properties of null (reading classList)" when trying
   to set the sub-section to .active
2. loadHealthData() did run (via a separate listener in
   source-health.js) and the backend call went through, but
   renderHealthDashboard found no #healthContent and aborted silently
   (if (!container) return). The user never saw any content.

Fix: sub-section <div id="sub-source-health"><div id="healthContent">
inserted between sub-tenant-sources and the audit section. In addition,
the </div> that was missing for sec-sources anyway is now properly closed.

This resolves the perceived "Quellen Health lädt minutenlang" (loads for minutes):
on click the tab is activated correctly, the render lands in
#healthContent and, thanks to the pagination + cache from the last
two commits, is visible immediately.
2026-05-09 13:15:16 +00:00
claude-dev
657683d491 perf(sources): source health pagination (default 100, plus load more/all)
The real bottleneck was the DOM render of 519 table rows, not
the backend (45ms). The backend slimming and the cache from the last commit
sped up bandwidth use and repeated clicks, but the first
click stayed slow because all 519 items were still rendered in a single
innerHTML pass.

Solution: server-side pagination.

Backend (/api/sources/health):
- New query params: limit (default 100, max 5000), offset (default 0)
- Counters errors/warnings/ok/total_checks come from a separate GROUP BY
  aggregate query over the ENTIRE data set, not over the page.
- New field all_orgs in the response: all tenants with health checks,
  so that the filter dropdown has the full org list in pagination
  mode too.
- New fields limit, offset, has_more.

Frontend (source-health.js):
- healthLoadLimit (default 100), raised by 200 via loadMoreHealth()
  or set to everything via loadAllHealth().
- The cache key now also includes the current limit, so that
  loading more is not served from the old cache.
- The org list comes from healthData.all_orgs instead of from the loaded
  page items; otherwise it would be incomplete after pagination.
- Footer with two buttons ("+200 laden", "Alle N weiteren laden")
  below the table, visible only when has_more=true.
- Counter display: "X / Y angezeigt (von Z insgesamt)".

Cache buster for source-health.js bumped to 20260509f.
2026-05-09 12:54:35 +00:00
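The response shape described in the commit above (a page of items plus counters over the entire data set and a has_more flag) can be sketched on an in-memory list. The real endpoint runs SQL queries. The function name paginate_health and the "checks" key are assumptions, while limit, offset, has_more and the counter names come from the commit message.

```python
def paginate_health(all_checks, limit=100, offset=0, max_limit=5000):
    """Return one page plus counters computed over the full list."""
    limit = max(1, min(limit, max_limit))  # clamp to the documented maximum
    offset = max(0, offset)
    page = all_checks[offset:offset + limit]
    return {
        "checks": page,  # hypothetical key for the page items
        # Counters deliberately ignore the page and cover everything.
        "errors": sum(c["status"] == "error" for c in all_checks),
        "warnings": sum(c["status"] == "warning" for c in all_checks),
        "ok": sum(c["status"] == "ok" for c in all_checks),
        "total_checks": len(all_checks),
        "limit": limit,
        "offset": offset,
        "has_more": offset + len(page) < len(all_checks),
    }
```

Because the counters cover the full list, the header numbers stay stable while the user loads more rows.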
claude-dev
f6af21e6cb perf(sources): source health tab faster (payload slimming + 60s cache)
The "Quellen Health" tab loads considerably faster:

1. /api/sources/health: SELECT reduced to only the fields actually
   rendered in the frontend. Removed: h.id, s.url, s.source_type, s.category,
   s.bias, h.details, h.checked_at. The response size drops from ~198 KB
   to roughly half (at 519 health checks) with no loss in the UI.

2. source-health.js: 60-second in-memory cache for loadHealthData.
   Switching back and forth between tabs is now instant instead of a full
   reload + render of the 519 table rows every time.
   On mutations (accepting/rejecting a suggestion, run-stream finished,
   search-fix) loadHealthData(true) bypasses the cache
   so that fresh data is shown.

3. dashboard.html: cache buster for source-health.js bumped to 20260509e.
2026-05-09 12:33:30 +00:00
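The caching behavior from the commit above (60-second lifetime, forced reload after mutations) follows a common pattern that can be sketched in a few lines. The class below is a Python illustration, not the JavaScript in source-health.js. TtlCache and its parameters are hypothetical names, and the injectable clock exists only to make the behavior testable.

```python
import time


class TtlCache:
    """Tiny in-memory cache with a time-to-live and a force-reload switch."""

    def __init__(self, fetch, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (stored_at, value)

    def load(self, key, force=False):
        now = self._clock()
        entry = self._entries.get(key)
        if not force and entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh, skip the fetch
        value = self._fetch(key)
        self._entries[key] = (now, value)
        return value
```

Keying the entries (for example by the current page limit) keeps a request for more rows from being answered with a smaller cached page.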
claude-dev
1b25d8ba12 sync: paywall strategy without removepaywall for the feed URL 2026-05-09 05:02:22 +00:00
claude-dev
d8f4e0d303 sync: removepaywall.com correction from Monitor (singular) 2026-05-09 05:00:14 +00:00
claude-dev
7f729443cb Phase 18 (admin portal): fetch_strategy in CRUD + edit modal
- migrations/2026-05-09e_fetch_strategy.py NEW: ALTER TABLE sources ADD COLUMN
  fetch_strategy. Pre-flagging for FT/WSJ/NZZ etc. (paywall) and Rheinische
  Post/Verfassungsschutz (googlebot).
- shared/services/source_health.py: synced from the Monitor (Phase 18 code with
  retry logic + strategies default/googlebot/paywall/skip).
- routers/sources.py: GlobalSourceCreate/Update extended with fetch_strategy
  (pattern validation), SOURCE_UPDATE_COLUMNS + INSERT extended.
- dashboard.html: the edit modal now has a sourceFetchStrategy dropdown.
- sources.js: loads + sends fetch_strategy.

Cache buster 20260509c -> 20260509d.
2026-05-09 04:57:01 +00:00
claude-dev
bff934d673 Phase 17: health tab filters + org column + history view + URL scheme fix
Backend:
- shared/services/source_health.py: a URL without an https:// prefix is normalized
  before httpx.get() is called (bug fix: t.me/kanal made httpx crash with
  ValueError; in sync with Monitor fix 1ee6c4d).
- routers/sources.py /health: query extended with tenant_id, category,
  language, bias, org_name (LEFT JOIN organizations), so the frontend can now
  show tenant info per issue.
- routers/sources.py /health/history NEW: last N runs aggregated from
  source_health_history (run_id, archived_at, errors/warnings/ok).

Frontend (source-health.js):
- healthFilters state: status / check_type / org.
- applyHealthFilter() narrows the display.
- Filter bar with 3 dropdowns + counter "X / Y Ergebnisse".
- Table extended: org column ("global" or org name), language column.
- New history view: last 10 runs as a table (timestamp, run ID, counts).

Cache buster bumped to 20260509c.
2026-05-09 04:47:05 +00:00
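The URL scheme fix in the commit above amounts to prefixing scheme-less URLs before they reach the HTTP client. A minimal sketch, assuming a helper named normalize_url (the actual function name in source_health.py is not given in the commit message):

```python
def normalize_url(url):
    """Prefix https:// when the URL has no http(s) scheme yet."""
    url = url.strip()
    if url.lower().startswith(("http://", "https://")):
        return url
    # lstrip("/") also covers protocol-relative input such as //t.me/kanal
    return "https://" + url.lstrip("/")
```

With this, a stored value such as t.me/kanal becomes https://t.me/kanal before the request is made.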
claude-dev
07a426561c sync_shared: LOCKED_FILES entry removed after Phase 16
After Phase 16 (Monitor source_health.py brought to the Phase 2 state) all
4 shared/ files are identical again between the Monitor and the admin portal.
The lock on source_health.py was only needed for the period in which
the admin portal already had the history logic and the Monitor did not yet.
2026-05-09 04:43:20 +00:00
claude-dev
b5bb27785a cache-buster: 20260509 -> 20260509b (make Phase 15 visible) 2026-05-09 04:35:33 +00:00
claude-dev
c86b2a0056 Phase 15: language + bias as columns, filters, edit form
Until now the DB fields sources.language and sources.bias were maintained
(254/275 sources with bias, 254 with language) but not visible in the admin
portal. The admin could neither filter nor edit them.

Backend (routers/sources.py)
- GlobalSourceCreate + GlobalSourceUpdate Pydantic models: language +
  bias added as Optional[str] (max 100 / 500 characters).
- SOURCE_UPDATE_COLUMNS: language + bias added.
- INSERT in create_global_source: also writes language + bias.
- New endpoint GET /api/sources/global/languages: distinct language values
  for the frontend filter dropdown.

Frontend HTML (dashboard.html)
- Base sources filter bar: language dropdown added.
- Base sources table header: 2 new columns Sprache (sortable) + Bias.
- modalSource: 2 new fields language (with datalist suggestions) + bias.
- Customer sources filter bar: language dropdown.
- Customer sources table header: Sprache (sortable) + Bias.

Frontend JS (sources.js)
- loadGlobalSources loads /languages in parallel with /global + /global/stats,
  populates both language dropdowns + the datalist in the edit modal.
- renderGlobalSources: cols 11 -> 13, language+bias cells
  (bias with tooltip for long texts).
- filterGlobalSources: language filter, bias included in search.
- editGlobalSource: loads language + bias.
- Form submit: language + bias sent along.
- renderTenantSources: cols 8 -> 10, language+bias cells.
- tenantFilters extended with language, applyTenantFilterAndSort checks it.

The cache buster ?v=20260509 (today) stays; the date only changes tomorrow.
2026-05-09 04:35:08 +00:00
claude-dev
ff83f64aa6 Phase 14a: integration tests (FastAPI TestClient, without DB)
tests/test_api_smoke.py:
  - 43 parametrized auth coverage tests: every protected endpoint
    must return 401 or 403 without an Authorization header (not 200, not 500).
    Prevents someone from accidentally writing an endpoint without
    get_current_admin.
  - 2 tests for public auth endpoints (/magic-link, /verify):
    only check that the response is NOT 401/403.
  - 2 static route tests (/, /dashboard) must return 200.
  - TestClient(raise_server_exceptions=False) so that DB problems do not
    abort the tests.

tests/test_api_meta.py:
  - Integration tests for /api/sources/meta with dependency_overrides
    (mock get_current_admin). DB-free, so the real endpoint logic
    is tested end to end.
  - 5 tests: schema present, required fields, special situation topics,
    all 5 source types.

Total: 80 tests, 0.63s. Invocation:
  PYTHONPATH=src ./venv/bin/python -m pytest tests/ -v

Phase 14b (real DB schema setup with aiosqlite in-memory) follows separately;
it needs a schema bootstrap, which is a much larger effort for CRUD tests.
2026-05-09 04:25:39 +00:00
claude-dev
9d16aba5f9 CLAUDE.md: cache buster rule documented (bump ?v= on JS/CSS changes) 2026-05-09 04:13:36 +00:00
claude-dev
4bebe9168a Cache buster ?v=20260509 on JS+CSS - force browser reload after every JS change
Live symptom: the user saw an empty audit table although the backend returned
22 entries. Cause: the browser had cached an old audit.js (from before Phase 5/8b)
in which the audit render logic was different or missing.

Without a cache buster the browser currently caches the JS aggressively. With
?v=YYYYMMDD the browser loads the new version on every bump.

For the next frontend patch in this admin portal: bump the cache buster to the
new date so that all browsers reload.
2026-05-09 04:10:12 +00:00
claude-dev
00cd81f177 Phase 12: test suite (30 pytest tests) + CLAUDE.md updated
tests/:
  conftest.py        - minimal env vars + sys.path setup
  test_auth.py       - magic token + JWT round trip (4 tests)
  test_audit.py      - diff() + _to_json() helpers (8 tests)
  test_models.py     - Pydantic validation (7 tests)
  test_source_meta.py - single source of truth consistency (7 tests)
  test_imports.py    - all backend modules importable (4 tests)

requirements-dev.txt: pytest, ftfy, pyflakes

The tests are pure unit tests (no DB access, no HTTP server),
run in <0.5s and provide an immediate safety net for syntax/import bugs.

Invocation: PYTHONPATH=src ./venv/bin/python -m pytest tests/ -v

CLAUDE.md extended with:
- section Tests (framework, path, execution)
- section Phasen-Historie (all 12 phases of the 2026-05-09 cleanup
  with a short explanation)
2026-05-09 03:55:30 +00:00
claude-dev
9000750df2 Phase 9: code hygiene - fix all pyflakes issues
15 pyflakes warnings removed:
- src/audit.py: HTTPException (imported in the router instead of the helper, unused here)
- src/routers/auth.py: status (FastAPI status unused)
- src/routers/audit.py: HTTPException (unused)
- src/routers/users.py: MAGIC_LINK_EXPIRE_MINUTES (unused)
- src/routers/sources.py: row_to_dict, _extract_domain, _detect_category,
  urlparse, status (all unused - status.HTTP_* is not called anywhere)
- src/routers/sources.py: 2x f-string without placeholder (URL aktualisiert,
  Verbindung fehlgeschlagen) changed to plain strings
- src/routers/sources.py: except httpx.ConnectError as e -> e unused, removed
- src/database.py: os unused
- src/models.py: EmailStr unused

Audit coverage checked: all write endpoints in users.py call
_toggle_field(), which makes the log_action calls. No audit gaps.
All other routers (organizations/licenses/dashboard/token_usage)
already had clean audit coverage.

Mojibake diagnosis across all src/*.py: 0 hits.
2026-05-09 03:49:53 +00:00
claude-dev
52a18fd9ec Phase 8a+8b: pre-commit hook for shared/ drift + audit UI resource_id filter
Phase 8a (hook):
- scripts/git-hooks/pre-commit: for commits with src/shared/ changes, checks
  the drift state via sync_shared.py --check and prints a warning
  (does NOT block; the user decides whether to go back).
- scripts/install-hooks.sh: copies hooks from scripts/git-hooks/ to
  .git/hooks/ (idempotent, skips the user's own hooks).

Phase 8b (audit UI):
- dashboard.html: resource ID input field next to the other audit filters.
- audit.js: filter lists extended, params extended with resource_id
  (the backend has had the filter since Phase 5).
- This makes the audit trail of a single resource filterable in the audit log
  tab as well (previously only via direct URL or the source audit modal).
2026-05-09 03:40:00 +00:00
claude-dev
6b1cc975c0 Phase 7: sync_shared.py - mojibake fail-safe + docs
- has_mojibake_markers heuristic: detects double/triple-encoded UTF-8
  (typical Latin-1-view sequences such as ä ö ¤ Æ).
- fix_mojibake raises RuntimeError if ftfy is missing AND mojibake was
  detected; this prevents mojibake from being re-imported by the sync.
- main() catches the RuntimeError and exits with code 2 and a clear error message.
- CLAUDE.md: ftfy prerequisite + fail-safe explanation added.
2026-05-09 03:28:22 +00:00
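The fail-safe from the commit above can be sketched as follows. The function names has_mojibake_markers and fix_mojibake are taken from the commit message, but their bodies here are assumptions. In particular, the marker list (German umlauts and ß as they appear when UTF-8 bytes are read as cp1252) is an illustrative guess, not the script's real heuristic.

```python
# Assumed marker set: how "äöüßÄÖÜ" look when UTF-8 bytes are misread as cp1252.
MOJIBAKE_MARKERS = tuple(
    ch.encode("utf-8").decode("cp1252") for ch in "äöüßÄÖÜ"
)


def has_mojibake_markers(text):
    """Heuristic: True if any typical double-encoding sequence occurs."""
    return any(marker in text for marker in MOJIBAKE_MARKERS)


def fix_mojibake(text):
    """Repair mojibake with ftfy; refuse to continue if ftfy is unavailable."""
    if not has_mojibake_markers(text):
        return text  # clean input needs no ftfy at all
    try:
        import ftfy
    except ImportError as exc:
        raise RuntimeError(
            "mojibake detected but ftfy is not installed; refusing to sync"
        ) from exc
    return ftfy.fix_text(text)
```

Raising instead of passing the text through unchanged is the point of the fail-safe: a sync on a machine without ftfy stops rather than copying broken text.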
claude-dev
a5f2c1d59e Phase 7: scripts/sync_shared.py + lock mechanism
scripts/sync_shared.py: keeps src/shared/ in sync with the Monitor repo
- --check: drift diagnosis without writing (exit 1 on auto-sync drift, 0 on
  LOCKED-only drift = informational)
- --apply: writes the drift, skips LOCKED_FILES
- Mojibake protection via ftfy (some Monitor originals still contain
  double-encoded UTF-8, which is fixed during the sync)
- Import patch: from agents. -> from shared.agents. (etc.) so that modules
  inside src/shared/ find their siblings correctly

LOCKED_FILES (not auto-syncable):
- src/shared/services/source_health.py (Phase 2 fork: tenant_id filter removed,
  history archiving, config constants - would make no sense in the Monitor)

Background: Phase 1 created src/shared/ as a 1:1 copy from the Monitor.
Phase 2 extended source_health.py specifically for the admin portal.
A blind sync would overwrite the Phase 2 changes; the lock mechanism
prevents that but still reports the drift for information.

CLAUDE.md: section Shared-Module-Sync with workflow docs.
2026-05-09 03:26:44 +00:00
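The interplay of import patching, drift detection and LOCKED_FILES described in the commit above can be sketched on in-memory file contents. This is an illustration, not the code of sync_shared.py: plan_sync, patch_imports and check_exit_code are hypothetical names, the sibling package list is an assumption (the commit only says "etc."), and paths are given relative to src/shared/.

```python
import re

LOCKED_FILES = {"services/source_health.py"}  # relative to src/shared/
SIBLING_PACKAGES = ("agents", "services")     # assumed package list

_IMPORT_RE = re.compile(
    r"^([ \t]*from[ \t]+)(%s)\." % "|".join(SIBLING_PACKAGES), re.MULTILINE
)


def patch_imports(source):
    """Rewrite 'from agents.x' to 'from shared.agents.x' (same for siblings)."""
    return _IMPORT_RE.sub(r"\1shared.\2.", source)


def plan_sync(monitor_files, local_files):
    """Compare patched Monitor files with local ones.

    Returns (to_write, locked_drift): files an --apply would write, and
    locked files that differ but must not be overwritten.
    """
    to_write, locked_drift = {}, []
    for rel_path, text in monitor_files.items():
        wanted = patch_imports(text)
        if local_files.get(rel_path) == wanted:
            continue
        if rel_path in LOCKED_FILES:
            locked_drift.append(rel_path)
        else:
            to_write[rel_path] = wanted
    return to_write, locked_drift


def check_exit_code(to_write, locked_drift):
    """--check semantics: 1 on auto-sync drift, 0 if only locked files drift."""
    return 1 if to_write else 0
```

Locked drift is reported but never turns the exit code non-zero, which matches the "informational" wording in the commit message.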
claude-dev
e9ff2bac02 Phase 6: usage view per base source
Backend
- /api/sources/global returns articles_7d, articles_30d and
  tenant_excluded_count per source (one aggregated query with CTEs, no N+1).
- Match logic for articles: LOWER(articles.source) = LOWER(sources.name)
  - articles.source_url is the article URL, NOT the feed URL, so it does not
    match sources.url. Matching on the source name yields meaningful hits.
- tenant_excluded_count counts distinct organization_ids from
  user_excluded_domains (via LOWER(domain) match).

Frontend
- dashboard.html: two new sortable columns Aktivitaet (7d/30d) +
  Sperren in the base sources table.
- style.css: .activity-cell + .exclude-badge styles (with a zero variant
  for a calm look when there is no activity/exclusion).
- sources.js:
  - cols 9 -> 11
  - Render: 7d value bold, 30d value muted, tooltip 7 Tage / 30 Tage
  - Sort logic: NUMERIC_FIELDS extended with articles_7d/articles_30d/tenant_excluded_count
    (numeric compare instead of localeCompare)
2026-05-09 03:23:42 +00:00
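
The aggregated query described in this commit might look roughly like the sketch below. It is not the query from routers/sources.py: the minimal schema, the column name collected_at and the helper names are assumptions for illustration; only the LOWER(...) match rules and the three output fields come from the commit message.

```python
import sqlite3

# Hypothetical minimal schema: only the columns the query touches.
SCHEMA = """
CREATE TABLE sources (id INTEGER PRIMARY KEY, name TEXT, domain TEXT);
CREATE TABLE articles (id INTEGER PRIMARY KEY, source TEXT, collected_at TIMESTAMP);
CREATE TABLE user_excluded_domains (organization_id INTEGER, domain TEXT);
"""

# One pass over articles and one over exclusions, joined once (no N+1).
USAGE_SQL = """
WITH article_counts AS (
    SELECT LOWER(source) AS name_key,
           SUM(collected_at >= datetime('now', '-7 days'))  AS articles_7d,
           SUM(collected_at >= datetime('now', '-30 days')) AS articles_30d
    FROM articles
    GROUP BY LOWER(source)
),
exclusions AS (
    SELECT LOWER(domain) AS domain_key,
           COUNT(DISTINCT organization_id) AS tenant_excluded_count
    FROM user_excluded_domains
    GROUP BY LOWER(domain)
)
SELECT s.id, s.name,
       COALESCE(a.articles_7d, 0),
       COALESCE(a.articles_30d, 0),
       COALESCE(e.tenant_excluded_count, 0)
FROM sources s
LEFT JOIN article_counts a ON a.name_key = LOWER(s.name)
LEFT JOIN exclusions e ON e.domain_key = LOWER(s.domain)
ORDER BY s.id
"""


def usage_per_source(conn: sqlite3.Connection) -> list[tuple]:
    """Return (id, name, articles_7d, articles_30d, tenant_excluded_count) rows."""
    return conn.execute(USAGE_SQL).fetchall()


def demo_connection() -> sqlite3.Connection:
    """In-memory database with a few made-up rows for trying the query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO sources VALUES (1, 'Reuters', 'reuters.com'), "
        "(2, 'Quiet Feed', 'quiet.example')"
    )
    conn.execute(
        "INSERT INTO articles (source, collected_at) VALUES "
        "('reuters', datetime('now', '-1 day')), "
        "('REUTERS', datetime('now', '-10 days')), "
        "('Reuters', datetime('now', '-40 days'))"
    )
    conn.execute(
        "INSERT INTO user_excluded_domains VALUES "
        "(1, 'Reuters.com'), (2, 'reuters.com'), (2, 'reuters.com')"
    )
    return conn
```

Grouping inside the CTEs keeps the work at one scan per table, and the COALESCE calls give sources without articles or exclusions a zero instead of NULL.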
claude-dev
6b70a7195e Phase 5: audit trail per source (expandable modal)
Backend
- routers/audit.py: GET /api/audit-log now accepts resource_id as a filter
  (in addition to resource_type, action, admin_id, from_ts/to_ts).

Frontend
- dashboard.html: modalAudit (modal) for the audit trail of a resource.
- style.css: audit-entry styles (action badge color-coded per action type,
  diff as a <details> block with JSON pre).
- sources.js:
  - showSourceAudit(id, name) opens the modal, loads /audit-log?resource_type=source&resource_id=...
  - renderAuditEntries: per entry an action badge + meta (ts/admin/ip) +
    an optional expandable diff (before/after JSON)
  - formatDateTime helper
  - Audit button in the actions column of the base-sources table
2026-05-09 03:19:32 +00:00
claude-dev
2001815e19 Phase 4: extend admin overview (stats bar + inline health badge + last hit)
Backend
- routers/sources.py:
  - GET /api/sources/global/stats NEW: aggregated counters
    by type, total articles, health tally (errors/warnings/ok)
  - GET /api/sources/global returns health_status per source
    (worst case error > warning > ok, NULL if never checked)

Frontend
- dashboard.html sub-global-sources: stats-bar container at the top.
  The table header gets two new columns: Letzter Treffer (last hit) + Health.
- style.css: .sources-stats-bar (analogous to the Monitor style),
  .health-badge with variants error/warning/ok/unknown.
- sources.js:
  - loadGlobalSources loads /global + /global/stats in parallel
  - renderGlobalStats: renders the stats bar with total sources,
    counts per type (from META), total articles, health counters
  - renderGlobalSources: 9 columns instead of 7, last hit + health badge,
    typeLabel instead of direct TYPE_LABELS access
2026-05-09 03:12:30 +00:00
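
The worst-case rule for health_status (error > warning > ok, NULL if never checked) can be sketched in a few lines. The function name and the SEVERITY table are hypothetical; the actual endpoint may compute this in SQL instead.

```python
# Hypothetical severity ranking: a higher number wins.
SEVERITY = {"ok": 0, "warning": 1, "error": 2}


def worst_health_status(statuses):
    """Collapse all check results of one source into a single status.

    Returns None (NULL in the API) when the source was never checked.
    Unknown status strings are ignored rather than guessed at.
    """
    known = [s for s in statuses if s in SEVERITY]
    if not known:
        return None
    return max(known, key=SEVERITY.__getitem__)
```

For example, a source with one failing and two passing checks is reported as error, which is what the inline health badge then displays.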
claude-dev
9350e4538a Phase 3c: customer-sources tab with filter + sort + bulk promote
Backend
- routers/sources.py: POST /api/sources/tenant/bulk-promote NEW
  Takes a list of source_ids and promotes each one individually to a base source.
  Returns {promoted, skipped[{id,name,reason}], failed[{id,error}]}.
  Skips sources that are already base sources or whose URL already
  exists as a base source.

Frontend
- dashboard.html sub-tenant-sources: action bar extended by
  3 filter selects (type, category, org) and a bulk-promote button.
  The table gets a checkbox column + sortable columns (sort icons).
- sources.js: tenant tab completely refactored
  - State: tenantFilters, tenantSort, tenantSelected (Set)
  - applyTenantFilterAndSort: central render path with all filters + sort
  - populateTenantFilters: org list from the data, type/category from META
  - toggleTenantSelect / toggleTenantSelectAll: selection logic
  - bulkPromoteSelected: showConfirm -> POST -> toast with the result
  - renderTenantSources: checkbox column, dynamic typeLabel/categoryLabel
  - Counter now shows N filtered / total
2026-05-09 03:07:55 +00:00
claude-dev
eda60f9299 Phase 3b: categories/types from the backend (/api/sources/meta)
- src/source_meta.py NEW: SOURCE_CATEGORIES + SOURCE_TYPES as the
  single source of truth (list of {key, label}). category_label/type_label
  lookup functions; get_meta() returns the whole set.
- src/routers/sources.py: GET /api/sources/meta added (admin auth,
  returns categories + types)
- src/static/js/app.js: window.META + loadMeta() + categoryLabel/typeLabel +
  populateSelect helper. Meta is loaded on DOMContentLoaded and fills the
  global CATEGORY_LABELS and TYPE_LABELS.
- src/static/js/sources.js: hardcoded const CATEGORY_LABELS and TYPE_LABELS
  removed; they are now set globally by loadMeta() in app.js.
  loadGlobalSources() calls populateSelect() for the filter dropdowns.
- src/static/js/source-health.js: same hardcoded lists removed.
- src/static/dashboard.html: <option> lists for globalFilterCategory and
  globalFilterType removed (only the default Alle remains). JS fills them dynamically.

Result: a new category only requires a change in source_meta.py,
no more triple maintenance in HTML + sources.js + source-health.js.
2026-05-09 03:05:16 +00:00
claude-dev
5a87168416 Phase 3a frontend hygiene: toast instead of alert/confirm
- src/static/css/style.css: toast styles (.toast-container, .toast,
  variants info/success/warning/error, animations)
- src/static/dashboard.html: <div id=toastContainer> before </body>,
  the cancel button in the confirm modal gets id=confirmCancelBtn
- src/static/js/app.js:
  - showToast(msg, type) new: top left, auto-close after 3.5s (error: 6s)
  - showConfirm(title, text, callback?) now supports Promise<boolean>
    (backwards compatible: a legacy callback is still called on OK)
  - Cancel/close hooks on modalConfirm resolve the Promise with false
- all 18 alert() calls in app.js / source-health.js / sources.js replaced by
  showToast(msg, type) (type error/success/warning/info depending on context)
- 2 confirm() calls in source-health.js replaced by await showConfirm()
2026-05-09 03:02:32 +00:00
claude-dev
ca4422ccd1 Phase 2: health check tenant-capable + history
- migrations/2026-05-09d_source_health_history.py NEW: source_health_history table
  (append-only history of the health-check runs with run_id and archived_at)
- shared/services/source_health.py:
  - tenant_id IS NULL filter removed -> tenant sources are checked too
  - Mojibake (triple-encoded UTF-8) fixed via ftfy
  - DELETE FROM source_health_checks: the current state is first archived with a
    run_id (uuid4) in source_health_history -> no more data loss
  - User-Agent + timeout from config.HEALTH_CHECK_* instead of hardcoded
- routers/sources.py /health/run-stream: same changes as above
- config.py: HEALTH_CHECK_USER_AGENT + HEALTH_CHECK_TIMEOUT_S added
2026-05-09 02:56:49 +00:00
claude-dev
650f8b0342 Phase 1: backend hygiene for sources
- src/shared/ new: source_rules, services/source_health, services/source_suggester,
  agents/claude_client as local copies from the Monitor repo (instead of a sys.path hack
  pointing at /home/claude-dev/AegisSight-Monitor/src; 5 sys.path.insert calls removed)
- src/routers/sources.py: imports switched to shared., header reordered
  (docstring first, sys/os removed), Mojibake (triple-encoded UTF-8) fixed via ftfy
- src/shared/services/source_suggester.py: Mojibake (double-encoded UTF-8) fixed via ftfy
- migrations/2026-05-09c_source_health_schema.py NEW: source_health_checks +
  source_suggestions tables with indexes (idempotent), extracted from 3 inline DDL blocks
  in routers/sources.py (/health/run, /health/run-stream, /health/search-fix)
- src/config.py: CLAUDE_MODEL_MEDIUM and CLAUDE_MODEL_STANDARD added
  (previously only CLAUDE_MODEL_FAST; claude_client.py needs all three)
- requirements.txt: httpx + feedparser listed explicitly (already present in the venv, now documented)
2026-05-09 02:47:13 +00:00
claude-dev
7c741062a9 Auth: switch the Verwaltung portal to magic link (password login removed)
Backend:
- src/routers/auth.py NEW: POST /api/auth/magic-link + POST /api/auth/verify
- src/auth.py: verify_password/hash_password removed, generate_magic_token added
- src/main.py: old login endpoint + brute-force logic removed, new auth router wired in
- src/config.py: ALLOWED_EMAIL + PORTAL_MAGIC_LINK_* added
- src/models.py: LoginRequest removed, MagicLinkRequest etc. added
- src/email_utils/templates.py: portal_magic_link_email template

Frontend:
- src/static/index.html: email input instead of password, token-verify logic for ?token= from the URL

Database migration (migrations/2026-05-09_portal_magic_link.py):
- portal_magic_links + portal_magic_link_attempts new
- portal_login_attempts dropped
- portal_admins.email column added, password_hash cleared

Whitelist info@aegis-sight.de, rate limit 5 per 15 min, anti-enumeration via a generic response.
2026-05-09 02:21:40 +00:00
claude-dev
e6fdc5cfa0 test: trigger via dedicated staging subdomain 2026-05-09 02:02:04 +00:00
claude-dev
98b8780248 test: trigger staging webhook (Phase 0h) 2026-05-09 01:42:38 +00:00
claude-dev
e52202b087 CLAUDE.md: full project documentation with overview, structure, rules and staging plan 2026-05-09 01:33:47 +00:00
claude-dev
670a6617a7 Migration: parallel translate batches + busy_timeout/WAL
- asyncio.Semaphore(4) + as_completed: 4 workers in parallel instead of sequential
- Per-batch commit: completed batches are not lost on abort
- sqlite3 timeout=60 + PRAGMA busy_timeout=60000 + journal_mode=WAL: no crash while the live service holds the write lock
- Better progress logs (every 20 batches)
2026-05-09 01:32:51 +00:00
57 changed files with 8374 additions and 503 deletions

3
.gitignore vendored

@@ -3,3 +3,6 @@ __pycache__/
 .env
 logs/
 .venv/
+venv/
+data/
+*.bak-*

265
CLAUDE.md Normal file

@@ -0,0 +1,265 @@
# AegisSight-Monitor-Verwaltung
> Admin portal for tenants, licenses, users, base sources and token usage of the AegisSight-Monitor
## Overview
```yaml
projekt: AegisSight-Monitor-Verwaltung
url: https://monitor-verwaltung.aegis-sight.de
server: ssh monitor (46.225.141.13, User: claude-dev)
pfad: /home/claude-dev/AegisSight-Monitor-Verwaltung
quellcode: /home/claude-dev/AegisSight-Monitor-Verwaltung/src/
datenbank: /mnt/gitea/osint-data/osint.db (SQLite WAL, shared with AegisSight-Monitor)
gitea: https://gitea-undso.aegis-sight.de/AegisSight/AegisSight-Monitor-Verwaltung
service: verwaltungsportal.service (systemd, Port 8892, Nginx Reverse Proxy)
venv: /home/claude-dev/.venvs/verwaltung/ (Python 3.12)
```
## Technology Stack
```yaml
backend:
  framework: FastAPI + Uvicorn
  datenbank: SQLite WAL (aiosqlite, async) - shared with AegisSight-Monitor
  auth: password login (bcrypt, JWT HS256, 8 hours)
  brute_force_schutz: block after 5 failed attempts within 15 minutes, cleanup after 24 hours
  audit: every mutation via log_action -> portal_audit table
  email: aiosmtplib (smtp.ionos.de:587 TLS) - for magic-link invitations to the Monitor
frontend:
  typ: Vanilla JS (no framework, no build step)
  design: AegisSight Dark Theme (same look as the Monitor)
  fonts: Poppins (headings), Inter (body)
```
## Project Structure
```yaml
src/:
  main.py: "FastAPI app, login + brute-force logic, lifespan, static routes"
  config.py: "Configuration (DB path, JWT, SMTP, source-discovery constants)"
  auth.py: "Password hash (bcrypt), create/verify JWT, get_current_admin dependency"
  database.py: "DB connection pool, schema helpers"
  models.py: "Pydantic request/response schemas"
  audit.py: "log_action, get_client_ip, row_to_dict, /api/audit router"
  routers/:
    organizations.py: "CRUD tenants (organizations + org settings + token budget)"
    licenses.py: "CRUD licenses (org licenses, expiry, user limit, modules)"
    users.py: "CRUD users per org, magic-link invitation to info@aegis-sight.de"
    sources.py: "Base sources, tenant-sources overview, discovery, health check, AI suggestions"
    dashboard.py: "Aggregate endpoints for the overview tab"
    token_usage.py: "Token usage per org/month, budget control"
    audit.py: "Audit-log query, filters"
  email_utils/:
    sender.py: "Async SMTP delivery"
    templates.py: "HTML templates (magic link for new users)"
  static/:
    index.html: "Login (password)"
    dashboard.html: "Main dashboard with tabs (Dashboard, Orgs, Licenses, Sources, Audit)"
    favicon.svg: "AegisSight logo"
    css/: "Stylesheets (dark theme)"
    js/:
      app.js: "Main logic, login, tab switching, dashboard rendering"
      sources.js: "Base sources + customer sources management"
      source-health.js: "Source health & AI suggestions"
      audit.js: "Audit-log tab"
migrations/:
  einmal_migrationen: "Backfill scripts (DE translations, umlauts, HTML strip etc.)"
```
## Database Tables (relevant to the portal)
```yaml
kern: "organizations, licenses, users, magic_links, portal_admins"
quellen: "sources (shared with the Monitor), source_health_checks, source_suggestions"
verbrauch: "token_usage_monthly"
audit: "portal_audit"
```
## Related Projects
```yaml
monitor:
  pfad: /home/claude-dev/AegisSight-Monitor
  url: https://monitor.aegis-sight.de
  service: aegis-monitor.service (Port 8891)
  geteilte_db: yes
  geteilte_module:
    - source_rules: "Domain detection, RSS discovery, Claude feed evaluation"
    - services/source_health: "Health-check logic"
    - services/source_suggester: "AI source suggestions"
    - agents/claude_client: "Shared Claude CLI client"
  hinweis: "The Verwaltung portal imports these modules; the sys.path hacks are to be replaced step by step with its own copies in src/shared/"
```
## Rules
```yaml
regeln:
  - "Every change MUST be committed and pushed to Gitea immediately"
  - "Real umlauts (ü, ä, ö, ß), never transliterations (ue, ae, oe, ss) - this also applies to code comments, logs and UI texts"
  - "Do not commit passwords or secrets to the code (.env is not in the repo)"
  - "Restart the service after backend changes: sudo systemctl restart verwaltungsportal"
  - "Frontend changes (HTML/JS/CSS) need no restart"
  - "Do not commit backup files (.bak); delete them before pushing"
  - "Code fixes always go through develop -> Staging -> Promote, never directly to main"
  - "Direct patches to the live DB only after prior announcement"
```
## Changelog Workflow
For EVERY change to this application, two things must happen:
1. **TaskMate knowledge base** (category: "Changelog Verwaltung", category_id=34)
2. **Git commit + push to Gitea**
See AegisSight-Monitor/CLAUDE.md for a complete example of the TaskMate call.
## Staging Environment
Being set up as part of the cleanup plan (Phase 0). Planned key data:
```yaml
staging:
  url: https://staging.monitor-verwaltung.aegis-sight.de
  server: 46.225.141.13 (same host as live)
  pfad: /home/claude-dev/AegisSight-Monitor-Verwaltung-staging
  branch: develop
  port: "18892 (live: 8892)"
  service: aegis-verwaltung-staging.service
  venv: /home/claude-dev/AegisSight-Monitor-Verwaltung-staging/venv (own venv)
  zugriff: magic-link login to info@aegis-sight.de (cookie 30 days, upstream auth service)
  datenbank:
    plan: own SQLite copy of the live DB in ~/AegisSight-Monitor-Verwaltung-staging/data/osint.db
    drift: intended - changes in staging do not affect live
    abstimmung: a shared DB with Monitor staging is possible, to be decided during setup
  auth_service:
    pfad: /opt/aegis-verwaltung-staging-auth
    service: aegis-verwaltung-staging-auth.service
    port: 127.0.0.1:8098 (the Monitor staging auth already sits on 8095)
    cookie_domain: staging.monitor-verwaltung.aegis-sight.de
    cookie_name: aegis_verwaltung_staging_auth
```
### Workflow develop -> Staging -> Live (plan)
1. **Make the change in develop**:
   ```bash
   cd ~/AegisSight-Monitor-Verwaltung
   git checkout develop
   # make the change
   git add . && git commit -m '...' && git push origin develop
   ```
2. **Auto-deploy** (planned, Phase 0f): Gitea webhook -> aegis-staging-deploy.service -> pulls develop into the staging directory -> restarts aegis-verwaltung-staging
3. **Check on https://staging.monitor-verwaltung.aegis-sight.de**
4. **Promote to live** via https://deploy.aegis-sight.de (Phase 0g)
   -> Gitea PR develop->main automerge -> live listener pulls main -> systemctl restart verwaltungsportal
## Shared-Module-Sync (src/shared/)
```yaml
shared:
  pfad: src/shared/
  inhalt: source_rules + services/source_health + services/source_suggester + agents/claude_client
  herkunft: local copy from AegisSight-Monitor/src/
  drift_lösung: scripts/sync_shared.py
  workflow:
    pruefen: "./venv/bin/python scripts/sync_shared.py --check"
    anwenden: "./venv/bin/python scripts/sync_shared.py --apply"
  locked_files:
    src/shared/services/source_health.py:
      grund: "Verwaltung fork: tenant_id filter removed + history + config constants"
      hinweis: "Auto-sync does NOT write. Drift is reported; decide manually."
  voraussetzung:
    ftfy installieren: "pip install ftfy (in the repo's venv)"
    grund: "The sync script automatically fixes Mojibake from the Monitor originals."
    fail_safe: "Without ftfy the script aborts when Mojibake is detected - this protects against re-importing Mojibake."
  beim_drift:
    nicht_locked: "just run --apply, then commit"
    locked: "inspect the diff and consider whether the Monitor change makes sense in the Verwaltung fork"
```
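
The --check behaviour with locked files can be illustrated with a small sketch. This is not scripts/sync_shared.py: it compares file contents held in plain dicts and leaves out the import patching and ftfy repair the real script performs before comparing; the function names and the relative-path convention are assumptions.

```python
# Paths are relative to src/shared/ in this sketch (an assumption).
LOCKED_FILES = {"services/source_health.py"}


def classify_drift(monitor_files, shared_files, locked=LOCKED_FILES):
    """Split drifted files into (auto_syncable, locked).

    monitor_files / shared_files map relative path -> file content.
    A file drifts when the local copy is missing or differs.
    """
    auto_syncable, locked_drift = [], []
    for rel_path, upstream in sorted(monitor_files.items()):
        if shared_files.get(rel_path) == upstream:
            continue  # in sync
        if rel_path in locked:
            locked_drift.append(rel_path)
        else:
            auto_syncable.append(rel_path)
    return auto_syncable, locked_drift


def check_exit_code(auto_syncable, locked_drift):
    """Exit 1 on auto-sync drift, 0 when only locked files drift."""
    return 1 if auto_syncable else 0
```

Keeping locked drift out of the exit code lets a pre-commit hook fail only on drift that --apply could actually resolve, while the locked files are still listed for manual review.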
## Tests
```yaml
tests:
  framework: pytest
  pfad: tests/
  ausfuehren: "PYTHONPATH=src ./venv/bin/python -m pytest tests/ -v"
  install: "./venv/bin/pip install -r requirements-dev.txt"
  abdeckung:
    test_auth.py: magic token + JWT round trip
    test_audit.py: diff() + _to_json() helpers
    test_models.py: Pydantic validation (MagicLink, Org, License, User)
    test_source_meta.py: single source of truth consistency
    test_imports.py: all backend modules importable (syntax safety net)
  philosophie:
    - pure unit tests, no DB access, no HTTP server
    - fast (<1 second for the whole set)
    - should run locally before every commit
```
## Phase History (cleanup effort 2026-05-09)
```yaml
phasen:
  P0: Verwaltung staging with develop branch + auto-deploy + promote UI
  P0i: Login auth fully switched to magic link (password removed)
  P1: Backend hygiene for sources (sys.path hack gone, Mojibake fixed, DDL extracted)
  P2: Health check tenant-capable + source_health_history (history is kept)
  P3a: Toast system instead of alert/confirm
  P3b: GET /api/sources/meta - single source of truth for categories/types
  P3c: Customer-sources tab filter + sort + bulk promote
  P4: Stats bar + inline health badge + last-hit column
  P5: Audit trail per source (expandable modal)
  P6: "Usage view: activity 7d/30d + tenant exclusions"
  P7: scripts/sync_shared.py + lock mechanism + Mojibake fail-safe
  P8a: Pre-commit hook for src/shared/ drift
  P8b: Audit-log UI extended by a resource_id filter
  P8c: Monitor repo Mojibake fixed (source_suggester + source_health)
  P9: Code hygiene - all pyflakes issues resolved
  P10: "Bug 2 Buckelwal diagnosis: proper nouns from incident titles as mandatory keywords"
  P11: Backup rotation via cron (KEEP=5 most recent .bak files)
  P12: Test suite (pytest, 30 tests) + docs
```
## Cache Buster for Frontend Changes
```yaml
cache_buster:
  hintergrund: |
    src/static/dashboard.html and index.html load JS+CSS with a version suffix
    ?v=YYYYMMDD. Without the bump the browser caches old JS aggressively.
    Symptom: the user sees the old UI although live is deployed up to date.
  regel: |
    For EVERY change to .js or .css under src/static/, bump ?v=YYYYMMDD
    to the current date. Once per day is enough - multiple
    changes on the same day share the version.
  betroffene_files:
    - src/static/dashboard.html (4x JS, 1x CSS)
    - src/static/index.html (1x CSS)
  schnelle_aktualisierung: |
    sed -i 's/?v=2026[0-9]\{4\}/?v='$(date +%Y%m%d)'/g' src/static/dashboard.html src/static/index.html
  testen: "Ctrl+Shift+R in the browser (hard reload) shows the new version immediately."
```
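
For environments without sed, the same bump could be done in Python. The sketch below is an assumption-laden equivalent, not part of the repo: the function name is hypothetical, and unlike the sed one-liner (which only matches 2026 dates) it replaces any eight-digit ?v= suffix.

```python
import re
from datetime import date

# Matches the version suffix "?v=YYYYMMDD" on asset URLs.
VERSION_RE = re.compile(r"\?v=\d{8}")


def bump_cache_buster(html: str, today: date) -> str:
    """Return html with every ?v=YYYYMMDD suffix set to today's date."""
    return VERSION_RE.sub(f"?v={today:%Y%m%d}", html)
```

Applied to the contents of dashboard.html and index.html, this yields the text to write back; passing the date in explicitly keeps the function easy to test.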

46
RELEASES.json Normal file

@@ -0,0 +1,46 @@
[
{
"version": "2026-05-22T12:41Z",
"date": "2026-05-22",
"title": "X-Recherche-Konten im Verwaltungsportal verwalten",
"items": [
"Recherche-Konten für X (ehemals Twitter) können jetzt direkt im Verwaltungsportal hinzugefügt, bearbeitet und entfernt werden."
]
},
{
"version": "2026-05-22T11:13Z",
"date": "2026-05-22",
"title": "Interne Verbesserungen",
"items": []
},
{
"version": "2026-05-22T11:13Z",
"date": "2026-05-22",
"title": "Interne Verbesserungen",
"items": []
},
{
"version": "2026-05-22T11:09Z",
"date": "2026-05-22",
"title": "X-Konten direkt im Verwaltungsportal verwalten",
"items": [
"X-Konten können jetzt zentral über das Verwaltungsportal angelegt und verwaltet werden."
]
},
{
"version": "2026-05-22T09:37Z",
"date": "2026-05-22",
"title": "Neue Übersetzungsfunktion im Dashboard",
"items": [
"Texte können jetzt im Dashboard per Klick manuell übersetzt werden."
]
},
{
"version": "2026-05-17T19:19Z",
"date": "2026-05-17",
"title": "83 neue Quellen für Militär, Polizei-Technik & Waffen",
"items": [
"83 neue Quellen aus den Bereichen Militär, Polizei-Technik und Waffen sind jetzt verfügbar."
]
}
]


@@ -0,0 +1,91 @@
"""Migration 2026-05-09: magic-link auth for the Verwaltung portal.

Creates two tables:
- portal_magic_links: token store (email, token, expiry, used_at)
- portal_magic_link_attempts: brute-force/rate-limit tracking (IP, email, ts)

Additionally:
- portal_login_attempts is dropped (old password-login table, obsolete)
- portal_admins.password_hash is set to '' (columns are kept for the audit trail)

Usage:
    DB_PATH=/home/claude-dev/osint-data/osint.db python3 migrations/2026-05-09_portal_magic_link.py
    DB_PATH=/home/claude-dev/AegisSight-Monitor-staging/data/osint.db python3 migrations/2026-05-09_portal_magic_link.py
"""
import os
import sqlite3
import sys


def main(db_path: str) -> int:
    if not os.path.exists(db_path):
        print(f"FEHLER: DB nicht gefunden: {db_path}", file=sys.stderr)
        return 1

    conn = sqlite3.connect(db_path, timeout=60)
    conn.execute("PRAGMA busy_timeout = 60000")
    conn.execute("PRAGMA journal_mode = WAL")
    print(f"Migration auf {db_path}")

    # 1. Create the magic-link tables
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS portal_magic_links (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            email TEXT NOT NULL,
            token TEXT UNIQUE NOT NULL,
            expires_at TIMESTAMP NOT NULL,
            used_at TIMESTAMP,
            ip_address TEXT,
            created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        );
        CREATE INDEX IF NOT EXISTS idx_portal_magic_links_token ON portal_magic_links(token);
        CREATE INDEX IF NOT EXISTS idx_portal_magic_links_email ON portal_magic_links(email);
        CREATE TABLE IF NOT EXISTS portal_magic_link_attempts (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            ip TEXT NOT NULL,
            email TEXT NOT NULL,
            ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        );
        CREATE INDEX IF NOT EXISTS idx_portal_magic_link_attempts_lookup
            ON portal_magic_link_attempts(email, ip, ts);
    """)
    print(" + portal_magic_links angelegt (oder vorhanden)")
    print(" + portal_magic_link_attempts angelegt (oder vorhanden)")

    # 2. Drop the old brute-force table of the password login
    cur = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' AND name='portal_login_attempts'"
    )
    if cur.fetchone():
        conn.execute("DROP TABLE portal_login_attempts")
        print(" - portal_login_attempts gedroppt (Passwort-Login obsolet)")
    else:
        print(" = portal_login_attempts war bereits weg")

    # 3. Add the portal_admins.email column (if not present yet) for a future multi-admin extension
    cols = [c[1] for c in conn.execute("PRAGMA table_info(portal_admins)")]
    if "email" not in cols:
        conn.execute("ALTER TABLE portal_admins ADD COLUMN email TEXT")
        print(" + portal_admins.email Spalte hinzugefügt")
    else:
        print(" = portal_admins.email war bereits da")

    # 4. Set password_hash to the empty string (column stays for audit, but is unused)
    cur = conn.execute("SELECT COUNT(*) FROM portal_admins WHERE password_hash != ''")
    if cur.fetchone()[0] > 0:
        conn.execute("UPDATE portal_admins SET password_hash = ''")
        print(" ~ portal_admins.password_hash geleert (Auth ab jetzt nur per Magic-Link)")
    else:
        print(" = portal_admins.password_hash war bereits leer")

    conn.commit()
    conn.close()
    print("Migration abgeschlossen.")
    return 0


if __name__ == "__main__":
    db_path = os.environ.get("DB_PATH", "/home/claude-dev/osint-data/osint.db")
    sys.exit(main(db_path))
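
How portal_magic_link_attempts could back the stated rate limit (5 per 15 minutes) is sketched below. This is not the code in src/routers/auth.py: the function names are hypothetical, and counting per (email, ip) pair is an assumption suggested only by the (email, ip, ts) index.

```python
import sqlite3

MAX_ATTEMPTS = 5      # from the commit message: 5 per 15 minutes
WINDOW_MINUTES = 15


def new_demo_db() -> sqlite3.Connection:
    """In-memory DB with the attempts table from the migration."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE portal_magic_link_attempts ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, ip TEXT NOT NULL, "
        "email TEXT NOT NULL, ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def record_attempt(conn: sqlite3.Connection, email: str, ip: str) -> None:
    """Store one magic-link request; ts defaults to the current UTC time."""
    conn.execute(
        "INSERT INTO portal_magic_link_attempts (ip, email) VALUES (?, ?)",
        (ip, email),
    )


def is_rate_limited(conn: sqlite3.Connection, email: str, ip: str) -> bool:
    """True once the pair has MAX_ATTEMPTS requests inside the window."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM portal_magic_link_attempts "
        "WHERE email = ? AND ip = ? AND ts >= datetime('now', ?)",
        (email, ip, f"-{WINDOW_MINUTES} minutes"),
    ).fetchone()
    return count >= MAX_ATTEMPTS
```

Because the window is evaluated in SQL against ts, old rows simply stop counting; a periodic cleanup is then only a storage concern, not a correctness one.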


@@ -0,0 +1,65 @@
"""Migration 2026-05-09c: source_health_checks and source_suggestions schema.

Until now this DDL lived inline in routers/sources.py (in /health/run, /health/run-stream,
/health/search-fix). Phase 1 moves it here so that the endpoints no longer execute DDL.

Usage:
    DB_PATH=/home/claude-dev/osint-data/osint.db python3 migrations/2026-05-09c_source_health_schema.py
    DB_PATH=/home/claude-dev/AegisSight-Monitor-staging/data/osint.db python3 migrations/2026-05-09c_source_health_schema.py
"""
import os
import sqlite3
import sys


def main(db_path: str) -> int:
    if not os.path.exists(db_path):
        print(f"FEHLER: DB nicht gefunden: {db_path}", file=sys.stderr)
        return 1

    conn = sqlite3.connect(db_path, timeout=60)
    conn.execute("PRAGMA busy_timeout = 60000")
    conn.execute("PRAGMA journal_mode = WAL")
    print(f"Migration auf {db_path}")

    conn.executescript("""
        CREATE TABLE IF NOT EXISTS source_health_checks (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            source_id INTEGER NOT NULL REFERENCES sources(id) ON DELETE CASCADE,
            check_type TEXT NOT NULL,
            status TEXT NOT NULL,
            message TEXT,
            details TEXT,
            checked_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        );
        CREATE INDEX IF NOT EXISTS idx_source_health_checks_source ON source_health_checks(source_id);
        CREATE INDEX IF NOT EXISTS idx_source_health_checks_status ON source_health_checks(status);
        CREATE TABLE IF NOT EXISTS source_suggestions (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            suggestion_type TEXT NOT NULL,
            title TEXT NOT NULL,
            description TEXT,
            source_id INTEGER REFERENCES sources(id) ON DELETE SET NULL,
            suggested_data TEXT,
            priority TEXT DEFAULT 'medium',
            status TEXT DEFAULT 'pending',
            reviewed_at TIMESTAMP,
            created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        );
        CREATE INDEX IF NOT EXISTS idx_source_suggestions_status ON source_suggestions(status);
        CREATE INDEX IF NOT EXISTS idx_source_suggestions_source ON source_suggestions(source_id);
    """)
    print(" + source_health_checks + Indizes (idempotent)")
    print(" + source_suggestions + Indizes (idempotent)")

    conn.commit()
    conn.close()
    print("Migration abgeschlossen.")
    return 0


if __name__ == "__main__":
    db_path = os.environ.get("DB_PATH", "/home/claude-dev/osint-data/osint.db")
    sys.exit(main(db_path))


@@ -0,0 +1,57 @@
"""Migration 2026-05-09d: source_health_history (history of the health checks).

Until now the source_health_checks table was emptied before every health-check run
(DELETE FROM source_health_checks). That destroyed the history: no
trend, no way to compare across runs.

This migration creates a pure append-only table source_health_history.
Before every health-check run the current state of source_health_checks
is archived here (with run_id and archived_at).

Usage:
    DB_PATH=/home/claude-dev/osint-data/osint.db python3 migrations/2026-05-09d_source_health_history.py
    DB_PATH=/home/claude-dev/AegisSight-Monitor-staging/data/osint.db python3 migrations/2026-05-09d_source_health_history.py
"""
import os
import sqlite3
import sys


def main(db_path: str) -> int:
    if not os.path.exists(db_path):
        print(f"FEHLER: DB nicht gefunden: {db_path}", file=sys.stderr)
        return 1

    conn = sqlite3.connect(db_path, timeout=60)
    conn.execute("PRAGMA busy_timeout = 60000")
    conn.execute("PRAGMA journal_mode = WAL")
    print(f"Migration auf {db_path}")

    conn.executescript("""
        CREATE TABLE IF NOT EXISTS source_health_history (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            run_id TEXT NOT NULL,
            source_id INTEGER NOT NULL,
            check_type TEXT NOT NULL,
            status TEXT NOT NULL,
            message TEXT,
            details TEXT,
            checked_at TIMESTAMP,
            archived_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        );
        CREATE INDEX IF NOT EXISTS idx_source_health_history_run ON source_health_history(run_id);
        CREATE INDEX IF NOT EXISTS idx_source_health_history_source ON source_health_history(source_id, archived_at DESC);
        CREATE INDEX IF NOT EXISTS idx_source_health_history_status ON source_health_history(status, archived_at DESC);
    """)
    print(" + source_health_history + Indizes (idempotent)")

    conn.commit()
    conn.close()
    print("Migration abgeschlossen.")
    return 0


if __name__ == "__main__":
    db_path = os.environ.get("DB_PATH", "/home/claude-dev/osint-data/osint.db")
    sys.exit(main(db_path))
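
The archive-then-delete step described in the docstring could be written as below. It is a sketch under assumptions, not the code in shared/services/source_health.py: the function and helper names are hypothetical, and the demo schema omits the foreign key and indexes.

```python
import sqlite3
import uuid


def archive_and_clear_health_checks(conn: sqlite3.Connection) -> str:
    """Copy the current checks into the history table, then clear them.

    Both statements run in one transaction, so a failure leaves
    source_health_checks untouched. Returns the run_id used.
    """
    run_id = str(uuid.uuid4())
    with conn:  # commits on success, rolls back on an exception
        conn.execute(
            "INSERT INTO source_health_history "
            "(run_id, source_id, check_type, status, message, details, checked_at) "
            "SELECT ?, source_id, check_type, status, message, details, checked_at "
            "FROM source_health_checks",
            (run_id,),
        )
        conn.execute("DELETE FROM source_health_checks")
    return run_id


def demo_connection() -> sqlite3.Connection:
    """In-memory DB with simplified tables and two made-up check rows."""
    conn = sqlite3.connect(":memory:")
    cols = ("source_id INTEGER NOT NULL, check_type TEXT NOT NULL, "
            "status TEXT NOT NULL, message TEXT, details TEXT, checked_at TIMESTAMP")
    conn.execute(f"CREATE TABLE source_health_checks (id INTEGER PRIMARY KEY, {cols})")
    conn.execute(
        f"CREATE TABLE source_health_history (id INTEGER PRIMARY KEY, "
        f"run_id TEXT NOT NULL, {cols}, "
        f"archived_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP)"
    )
    conn.execute(
        "INSERT INTO source_health_checks (source_id, check_type, status) "
        "VALUES (1, 'reachability', 'ok'), (2, 'reachability', 'error')"
    )
    return conn
```

Using one run_id per archive call is what makes the idx_source_health_history_run index useful: all rows of a past run can be fetched together for comparison.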


@@ -0,0 +1,77 @@
"""Migration 2026-05-09e: sources.fetch_strategy.

New field that controls how the health check / RSS parser should fetch a source:
    default: normal User-Agent (AegisSight-HealthCheck), on 403/429 retry with Googlebot.
    googlebot: use the Googlebot UA directly (for sites that are SEO-friendly).
    paywall: on 403 a second request via removepaywalls.com (for Spiegel+/SZ+/FT etc.).
    skip: skip the health check (known unreachable sources).

Usage:
    DB_PATH=/home/claude-dev/osint-data/osint.db python3 migrations/2026-05-09e_fetch_strategy.py
    DB_PATH=/home/claude-dev/AegisSight-Monitor-staging/data/osint.db python3 migrations/2026-05-09e_fetch_strategy.py
"""
import os
import sqlite3
import sys


def main(db_path: str) -> int:
    if not os.path.exists(db_path):
        print(f"FEHLER: DB nicht gefunden: {db_path}", file=sys.stderr)
        return 1

    conn = sqlite3.connect(db_path, timeout=60)
    conn.execute("PRAGMA busy_timeout = 60000")
    conn.execute("PRAGMA journal_mode = WAL")
    print(f"Migration auf {db_path}")

    cols = [c[1] for c in conn.execute("PRAGMA table_info(sources)")]
    if "fetch_strategy" in cols:
        print(" = sources.fetch_strategy war bereits da")
    else:
        conn.execute(
            "ALTER TABLE sources ADD COLUMN fetch_strategy TEXT DEFAULT 'default'"
        )
        print(" + sources.fetch_strategy hinzugefügt (Default 'default')")

    # Pre-flag known paywall domains
    paywall_domains = (
        "ft.com",
        "wsj.com",
        "nzz.ch",
        "handelsblatt.com",
        "wiwo.de",
    )
    for dom in paywall_domains:
        conn.execute(
            "UPDATE sources SET fetch_strategy = 'paywall' "
            "WHERE LOWER(domain) = ? AND COALESCE(fetch_strategy, 'default') = 'default'",
            (dom,),
        )
    print(" ~ paywall-Strategie für bekannte Domains gesetzt (FT, WSJ, NZZ, Handelsblatt, WiWo)")

    # Known bot-blocking domains: try Googlebot
    bot_block_domains = (
        "rheinische-post.de",
        "rp-online.de",
        "verfassungsschutz.de",
    )
    for dom in bot_block_domains:
        conn.execute(
            "UPDATE sources SET fetch_strategy = 'googlebot' "
            "WHERE LOWER(domain) LIKE ? AND COALESCE(fetch_strategy, 'default') = 'default'",
            (f"%{dom}",),
        )
    print(" ~ googlebot-Strategie für bekannte Bot-Block-Domains")

    conn.commit()
    conn.close()
    print("Migration abgeschlossen.")
    return 0


if __name__ == "__main__":
    db_path = os.environ.get("DB_PATH", "/home/claude-dev/osint-data/osint.db")
    sys.exit(main(db_path))
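
The four strategies from the docstring can be read as a small decision table. The sketch below encodes that reading with hypothetical function names and attempt labels; it is not the health-check implementation, and it assumes that the paywall strategy starts with a normal request before falling back.

```python
def first_attempt(strategy: str):
    """Label of the first request to make, or None to skip the source."""
    if strategy == "skip":
        return None
    if strategy == "googlebot":
        return "googlebot"
    # "default" and "paywall" both start with the normal User-Agent
    # (assumption for "paywall"; the docstring only names its fallback).
    return "default"


def fallback_attempt(strategy: str, status_code: int):
    """Label of the retry after a failed first request, or None."""
    if strategy == "default" and status_code in (403, 429):
        return "googlebot"
    if strategy == "paywall" and status_code == 403:
        return "paywall_proxy"  # second request via removepaywalls.com
    return None
```

Returning labels instead of performing HTTP calls keeps the policy separate from the fetching code, so the table can be checked without network access.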


@@ -23,19 +23,23 @@ from datetime import datetime
 # Paths for imports (live repo preferred, staging fallback)
 sys.path.insert(0, "/home/claude-dev/AegisSight-Monitor/src")
 try:
-    from agents.translator import translate_articles
+    from agents.translator import translate_articles_batch, DEFAULT_BATCH_SIZE
     from agents.claude_client import UsageAccumulator
     from services.post_refresh_qc import normalize_german_umlauts
 except ImportError:
     sys.path.insert(0, "/home/claude-dev/AegisSight-Monitor-staging/src")
-    from agents.translator import translate_articles
+    from agents.translator import translate_articles_batch, DEFAULT_BATCH_SIZE
     from agents.claude_client import UsageAccumulator
     from services.post_refresh_qc import normalize_german_umlauts


 async def main_async(db_path: str, dry_run: bool, limit: int | None) -> int:
-    db = sqlite3.connect(db_path)
+    db = sqlite3.connect(db_path, timeout=60)
     db.row_factory = sqlite3.Row
+    # The live service regularly holds the write lock. Instead of crashing
+    # immediately we wait up to 60 seconds for the lock.
+    db.execute("PRAGMA busy_timeout = 60000")
+    db.execute("PRAGMA journal_mode = WAL")

     sql = """SELECT id, incident_id, headline, content_original, language
              FROM articles
@@ -60,42 +64,76 @@ async def main_async(db_path: str, dry_run: bool, limit: int | None) -> int:
         return 0
     usage = UsageAccumulator()
-    translations = await translate_articles(rows, output_lang="de",
-                                            usage_accumulator=usage)
-    print(f"Uebersetzt: {len(translations)} von {len(rows)}")
+    total = len(rows)
+    batch_size = DEFAULT_BATCH_SIZE
+    PARALLEL_WORKERS = 4
     updated = 0
-    for t in translations:
-        hd = t.get("headline_de")
-        cd = t.get("content_de")
-        if hd:
-            hd, _ = normalize_german_umlauts(hd)
-        if cd:
-            cd, _ = normalize_german_umlauts(cd)
-        if hd or cd:
-            db.execute(
-                "UPDATE articles SET headline_de = COALESCE(?, headline_de), "
-                "content_de = COALESCE(?, content_de) WHERE id = ?",
-                (hd, cd, t["id"]),
+    translated = 0
+    sample_translations = []
+    completed_count = 0
+    print(f"Starte parallele Verarbeitung: Batches a {batch_size}, {PARALLEL_WORKERS} Worker parallel...", flush=True)
+
+    # Prepare batches
+    batches = [rows[i:i + batch_size] for i in range(0, total, batch_size)]
+    semaphore = asyncio.Semaphore(PARALLEL_WORKERS)
+
+    async def process_batch(batch):
+        async with semaphore:
+            return await translate_articles_batch(batch)
+    # Create tasks and process them in completion order
+    tasks = [asyncio.create_task(process_batch(b)) for b in batches]
+    n_batches = len(batches)
+    for completed_task in asyncio.as_completed(tasks):
+        try:
+            translations, batch_usage = await completed_task
+        except Exception as e:
+            print(f" Batch-Fehler: {e}", flush=True)
+            continue
+        usage.add(batch_usage)
+        translated += len(translations)
+        for t in translations:
+            hd = t.get("headline_de")
+            cd = t.get("content_de")
+            if hd:
+                hd, _ = normalize_german_umlauts(hd)
+            if cd:
+                cd, _ = normalize_german_umlauts(cd)
+            if hd or cd:
+                db.execute(
+                    "UPDATE articles SET headline_de = COALESCE(?, headline_de), "
+                    "content_de = COALESCE(?, content_de) WHERE id = ?",
+                    (hd, cd, t["id"]),
+                )
+                updated += 1
+                if len(sample_translations) < 3:
+                    sample_translations.append(t["id"])
+        db.commit()  # Per-batch commit -> completed batches survive an abort
+        completed_count += 1
+        if completed_count % 20 == 0 or completed_count == n_batches:
+            print(
+                f"[{completed_count}/{n_batches} Batches | Updates={updated}/{total} | Cost=${usage.total_cost_usd:.2f}]",
+                flush=True,
             )
-            updated += 1
-    db.commit()
     print()
     print(f"=== Stats ===")
-    print(f" Updates: {updated}")
-    print(f" Calls: {usage.call_count}")
-    print(f" Input-Tokens: {usage.input_tokens:,}")
-    print(f" Output-Tokens: {usage.output_tokens:,}")
-    print(f" Cost gesamt: ${usage.total_cost_usd:.4f}")
+    print(f" Total betrachtet: {total}")
+    print(f" Translator OK: {translated}")
+    print(f" DB-Updates: {updated}")
+    print(f" Calls: {usage.call_count}")
+    print(f" Input-Tokens: {usage.input_tokens:,}")
print(f" Output-Tokens: {usage.output_tokens:,}")
print(f" Cost gesamt: ${usage.total_cost_usd:.4f}")
print() print()
print("=== Stichprobe (3 frische Uebersetzungen) ===") print("=== Stichprobe (3 frische Uebersetzungen) ===")
sample_ids = [t["id"] for t in translations[:3]] if sample_translations:
if sample_ids: placeholders = ",".join("?" * len(sample_translations))
placeholders = ",".join("?" * len(sample_ids))
for r in db.execute( for r in db.execute(
f"SELECT id, headline, headline_de FROM articles WHERE id IN ({placeholders})", f"SELECT id, headline, headline_de FROM articles WHERE id IN ({placeholders})",
sample_ids, sample_translations,
): ):
d = dict(r) d = dict(r)
print(f" [{d['id']}]") print(f" [{d['id']}]")
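Review note: the rewritten loop bounds concurrency with a semaphore and consumes results in completion order. The following is a minimal, self-contained sketch of that pattern, not the project's code: `fake_translate_batch`, `run_batches` and `demo` are hypothetical names, and the stand-in coroutine fails on purpose for one batch. Unlike the hunk above, where `continue` skips `completed_count += 1` for a failed batch, this sketch counts failed batches toward progress as well, so a "last batch" condition is still reached when something raises. That is a possible variation, not what the commit does.

```python
import asyncio


async def fake_translate_batch(batch: list[int]) -> list[dict]:
    """Hypothetical stand-in for the real translator call."""
    await asyncio.sleep(0)  # yield to the event loop, as a network call would
    if 13 in batch:
        raise RuntimeError("simulated batch failure")
    return [{"id": i, "headline_de": f"titel-{i}"} for i in batch]


async def run_batches(rows: list[int], batch_size: int, workers: int) -> dict:
    # Split the rows into fixed-size batches.
    batches = [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]
    semaphore = asyncio.Semaphore(workers)

    async def process(batch: list[int]) -> list[dict]:
        # At most `workers` batches are inside this block at any time.
        async with semaphore:
            return await fake_translate_batch(batch)

    tasks = [asyncio.create_task(process(b)) for b in batches]
    stats = {"batches": len(batches), "done": 0, "failed": 0, "translated": 0}
    for finished in asyncio.as_completed(tasks):
        try:
            translations = await finished
            stats["translated"] += len(translations)
        except Exception:
            stats["failed"] += 1
        finally:
            # Count every finished batch, successful or not.
            stats["done"] += 1
    return stats


def demo() -> dict:
    return asyncio.run(run_batches(list(range(20)), batch_size=5, workers=2))
```

With 20 rows and a batch size of 5, `demo()` runs four batches, one of which fails in the stand-in translator.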

requirements-dev.txt (regular file)

@@ -0,0 +1,4 @@
# Dev/test dependencies (not needed for the production venv).
pytest>=8.0
ftfy>=6.0 # for the mojibake repair in scripts/sync_shared.py
pyflakes>=3.0 # for code checks


@@ -5,3 +5,9 @@ passlib[bcrypt]
 aiosqlite
 python-multipart
 aiosmtplib
+httpx>=0.28
+feedparser>=6.0
+# PDF upload validation
+pypdf>=5.0
+# X scraper account management (twscrape account pool)
+twscrape @ git+https://github.com/vladkens/twscrape.git@206f0942fe41149da28530399f7c772ec00be17a

scripts/git-hooks/pre-commit (executable file)

@@ -0,0 +1,44 @@
#!/usr/bin/env bash
# AegisSight-Verwaltung Pre-Commit-Hook.
#
# On every change under src/shared/, checks the drift state against the
# Monitor repo. Prints a warning but does NOT block the commit;
# the user decides whether to back out.
#
# Installation: scripts/install-hooks.sh

REPO_ROOT="$(git rev-parse --show-toplevel)"
cd "$REPO_ROOT" || exit 0

# Only check when shared/ is being changed
if ! git diff --cached --name-only | grep -q '^src/shared/'; then
    exit 0
fi

# Find the venv Python
PY=""
for cand in "venv/bin/python3" "venv/bin/python" "/home/claude-dev/.venvs/verwaltung/bin/python3"; do
    if [ -x "$cand" ]; then PY="$cand"; break; fi
done

if [ -z "$PY" ] || [ ! -f "scripts/sync_shared.py" ]; then
    # Tool not available: let the commit through silently
    exit 0
fi

if ! out=$("$PY" scripts/sync_shared.py --check 2>&1); then
    # --check exit 1 = auto-syncable drift present
    echo ""
    echo "================================================================"
    echo " WARNUNG: src/shared/ Drift gegen Monitor-Repo erkannt"
    echo "================================================================"
    echo "$out" | head -30
    echo ""
    echo " Mehr Details: ./venv/bin/python scripts/sync_shared.py --check"
    echo " Aufloesen: ./venv/bin/python scripts/sync_shared.py --apply"
    echo " (oder bewusst forken: LOCKED_FILES in scripts/sync_shared.py)"
    echo ""
    echo " Commit laeuft trotzdem durch - du entscheidest."
    echo "================================================================"
fi

exit 0
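Review note: the hook prints the drift banner for any non-zero exit of `sync_shared.py --check`, while that script's docstring lists separate exit codes (0 in sync, 1 drift, 2 source not found or configuration problem). As a hedged sketch of how a warn-only hook could tell those cases apart, the decision logic might be modelled as below. All function names are hypothetical; only the `src/shared/` prefix and the three exit codes come from the diff.

```python
def touches_shared(staged_paths: list[str]) -> bool:
    """True if any staged path lies under src/shared/ (mirrors the grep in the hook)."""
    return any(p.startswith("src/shared/") for p in staged_paths)


def classify_check(returncode: int) -> str:
    """Map the documented sync_shared.py exit codes to a label."""
    if returncode == 0:
        return "in_sync"
    if returncode == 1:
        return "drift"
    return "tool_problem"  # 2 = source not found / configuration problem


def hook_decision(staged_paths: list[str], check_returncode: int) -> tuple[int, str]:
    """Warn-only hook: the exit code is always 0, only the message differs."""
    if not touches_shared(staged_paths):
        return 0, "skipped"
    return 0, classify_check(check_returncode)
```

Keeping the exit code at 0 in every branch preserves the hook's stated intent of never blocking a commit.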

scripts/install-hooks.sh (executable file)

@@ -0,0 +1,40 @@
#!/usr/bin/env bash
# Installs the Git hooks from scripts/git-hooks/ into the local repo configuration.
#
# Usage: ./scripts/install-hooks.sh
#
# Idempotent: already installed hooks are only overwritten if they come
# from scripts/git-hooks/ (marker check), never user-owned ones.
set -euo pipefail

REPO_ROOT="$(git rev-parse --show-toplevel)"
SRC_DIR="$REPO_ROOT/scripts/git-hooks"
DST_DIR="$REPO_ROOT/.git/hooks"

if [ ! -d "$SRC_DIR" ]; then
    echo "FEHLER: $SRC_DIR nicht gefunden" >&2
    exit 1
fi

mkdir -p "$DST_DIR"
count=0
for hook in "$SRC_DIR"/*; do
    [ -f "$hook" ] || continue
    name="$(basename "$hook")"
    target="$DST_DIR/$name"
    # If the target exists and contains no AegisSight marker: skip
    if [ -f "$target" ] && ! grep -q "AegisSight-Verwaltung Pre-Commit-Hook\|AegisSight-Verwaltung Hook" "$target" 2>/dev/null; then
        echo " ! $name uebersprungen (existiert, kein AegisSight-Marker)"
        continue
    fi
    cp "$hook" "$target"
    chmod +x "$target"
    echo " + $name installiert"
    count=$((count + 1))
done

echo ""
echo "$count Hook(s) installiert nach $DST_DIR"
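Review note: the installer's idempotency rests on the marker check: an existing hook is overwritten only if it contains one of the AegisSight marker strings. A minimal Python sketch of the same rule follows, for illustration only. `may_overwrite` and `install_hook` are hypothetical names; the two marker strings are the ones used in the `grep` pattern above.

```python
import shutil
from pathlib import Path

# Marker strings taken from the grep pattern in install-hooks.sh.
MARKERS = ("AegisSight-Verwaltung Pre-Commit-Hook", "AegisSight-Verwaltung Hook")


def may_overwrite(target: Path) -> bool:
    """A missing target may be written; an existing one only if it carries a marker."""
    if not target.is_file():
        return True
    text = target.read_text(encoding="utf-8", errors="replace")
    return any(marker in text for marker in MARKERS)


def install_hook(src: Path, dst_dir: Path) -> bool:
    """Copy src into dst_dir unless a user-owned hook of the same name exists."""
    target = dst_dir / src.name
    if not may_overwrite(target):
        return False
    dst_dir.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, target)
    target.chmod(target.stat().st_mode | 0o111)  # equivalent of chmod +x
    return True
```

Running `install_hook` twice with the same source yields the same result, because the copy installed on the first run contains the marker and is therefore eligible for overwriting.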


@@ -0,0 +1,104 @@
{
"_meta": {
"purpose": "Bulk-Seed fuer Militaer-, Polizei-Technik und internationale Waffen-Quellen",
"created": "2026-05-17",
"plan": "C:\\Users\\hendr\\.claude\\plans\\gleaming-inventing-fern.md"
},
"sources": [
{"name": "Janes OSINT Insights", "url": "https://www.janes.com/osint-insights/defence-news", "domain": "janes.com", "source_type": "web_source", "language": "en", "country_code": "GB", "fetch_strategy": "paywall", "notes": "[militaertechnik] Goldstandard fuer Equipment-Specs und Defense-OSINT, Vollartikel paywalled"},
{"name": "The War Zone (TWZ)", "url": "https://www.twz.com/feed", "domain": "twz.com", "source_type": "rss_feed", "language": "en", "country_code": "US", "fetch_strategy": "default", "notes": "[militaertechnik] Air/Land/Sea/Space/Cyber, sehr tiefe Equipment-Analysen, Tyler Rogoway"},
{"name": "Defense News", "url": "https://www.defensenews.com/arc/outboundfeeds/rss/?outputType=xml", "domain": "defensenews.com", "source_type": "rss_feed", "language": "en", "country_code": "US", "fetch_strategy": "default", "notes": "[militaertechnik] Industriepolitik, Beschaffung, Programme"},
{"name": "Breaking Defense", "url": "https://breakingdefense.com/full-rss-feed/", "domain": "breakingdefense.com", "source_type": "rss_feed", "language": "en", "country_code": "US", "fetch_strategy": "default", "notes": "[militaertechnik] Tech, Programme, Pentagon-Politik"},
{"name": "Naval News", "url": "https://www.navalnews.com/feed/", "domain": "navalnews.com", "source_type": "rss_feed", "language": "en", "country_code": "FR", "fetch_strategy": "default", "notes": "[militaertechnik] Marine global, Schiffstechnik, Werften, U-Boote"},
{"name": "Army Recognition", "url": "https://www.armyrecognition.com/news/army-news/feed/rss", "domain": "armyrecognition.com", "source_type": "rss_feed", "language": "en", "country_code": "BE", "fetch_strategy": "default", "notes": "[militaertechnik] Equipment-Specs Heer, sehr fahrzeugfokussiert, breite Datenbank"},
{"name": "Navy Recognition", "url": "https://www.navyrecognition.com/index.php?option=com_acymailing&ctrl=fronturl&task=rss", "domain": "navyrecognition.com", "source_type": "rss_feed", "language": "en", "country_code": "BE", "fetch_strategy": "default", "notes": "[militaertechnik] Equipment-Specs Marine, Schwesterportal Army Recognition"},
{"name": "Air Recognition", "url": "https://www.airrecognition.com/index.php?option=com_acymailing&ctrl=fronturl&task=rss", "domain": "airrecognition.com", "source_type": "rss_feed", "language": "en", "country_code": "BE", "fetch_strategy": "default", "notes": "[militaertechnik] Equipment-Specs Luftwaffe, Schwesterportal Army Recognition"},
{"name": "Aviation Week Defense", "url": "https://aviationweek.com/awn-rss/feed", "domain": "aviationweek.com", "source_type": "rss_feed", "language": "en", "country_code": "US", "fetch_strategy": "default", "notes": "[militaertechnik] Luftfahrt und Defense, seit 1916, Industrie-Insider"},
{"name": "Air & Space Forces Magazine", "url": "https://www.airandspaceforces.com/feed/", "domain": "airandspaceforces.com", "source_type": "rss_feed", "language": "en", "country_code": "US", "fetch_strategy": "default", "notes": "[militaertechnik] USAF-Schwerpunkt, Programme + Doktrin"},
{"name": "Shephard Media", "url": "https://www.shephardmedia.com/news/feed/", "domain": "shephardmedia.com", "source_type": "rss_feed", "language": "en", "country_code": "GB", "fetch_strategy": "default", "notes": "[militaertechnik] Defense News, Analyse + Daten, Land/Air/Sea/Training"},
{"name": "EDR Magazine (European Defence Review)", "url": "https://www.edrmagazine.eu/feed", "domain": "edrmagazine.eu", "source_type": "rss_feed", "language": "en", "country_code": "FR", "fetch_strategy": "default", "notes": "[militaertechnik] Europaeische Defense-Perspektive, Englisch"},
{"name": "The Defense Post", "url": "https://thedefensepost.com/feed/", "domain": "thedefensepost.com", "source_type": "rss_feed", "language": "en", "country_code": "US", "fetch_strategy": "default", "notes": "[militaertechnik] Globaler Defense-Nachrichten-Mix"},
{"name": "Defense Brief", "url": "https://defbrief.com/feed/", "domain": "defbrief.com", "source_type": "rss_feed", "language": "en", "country_code": "MK", "fetch_strategy": "default", "notes": "[militaertechnik] Defense-News-Aggregator"},
{"name": "Defense Update", "url": "https://defense-update.com/feed", "domain": "defense-update.com", "source_type": "rss_feed", "language": "en", "country_code": "IL", "fetch_strategy": "default", "notes": "[militaertechnik] Israel/US-Equipment-Tiefe, Tamir Eshel"},
{"name": "Naval Technology", "url": "https://www.naval-technology.com/feed/", "domain": "naval-technology.com", "source_type": "rss_feed", "language": "en", "country_code": "GB", "fetch_strategy": "default", "notes": "[militaertechnik] Industrieperspektive Marine"},
{"name": "Army Technology", "url": "https://www.army-technology.com/feed/", "domain": "army-technology.com", "source_type": "rss_feed", "language": "en", "country_code": "GB", "fetch_strategy": "default", "notes": "[militaertechnik] Industrieperspektive Heer"},
{"name": "Airforce Technology", "url": "https://www.airforce-technology.com/feed/", "domain": "airforce-technology.com", "source_type": "rss_feed", "language": "en", "country_code": "GB", "fetch_strategy": "default", "notes": "[militaertechnik] Industrieperspektive Luftwaffe"},
{"name": "The Aviationist", "url": "https://theaviationist.com/feed/", "domain": "theaviationist.com", "source_type": "rss_feed", "language": "en", "country_code": "IT", "fetch_strategy": "default", "notes": "[militaertechnik] Militaerluftfahrt-Specials, David Cenciotti"},
{"name": "C4ISRNET", "url": "https://www.c4isrnet.com/arc/outboundfeeds/rss/?outputType=xml", "domain": "c4isrnet.com", "source_type": "rss_feed", "language": "en", "country_code": "US", "fetch_strategy": "default", "notes": "[militaertechnik] Aufklaerung, Cyber, EW, Netze"},
{"name": "DefenseScoop", "url": "https://defensescoop.com/feed/", "domain": "defensescoop.com", "source_type": "rss_feed", "language": "en", "country_code": "US", "fetch_strategy": "default", "notes": "[militaertechnik] Pentagon-IT, Cyber, KI"},
{"name": "Federation of American Scientists", "url": "https://fas.org/feed/", "domain": "fas.org", "source_type": "rss_feed", "language": "en", "country_code": "US", "fetch_strategy": "default", "notes": "[militaertechnik, waffen-international] Nuklear, Strategic Security, Project on Government Secrecy"},
{"name": "Military Times", "url": "https://www.militarytimes.com/arc/outboundfeeds/rss/?outputType=xml", "domain": "militarytimes.com", "source_type": "rss_feed", "language": "en", "country_code": "US", "fetch_strategy": "default", "notes": "[militaertechnik] US-Streitkraefte-Alltag, Beschaffung, Truppe"},
{"name": "Stars and Stripes", "url": "https://www.stripes.com/rss", "domain": "stripes.com", "source_type": "rss_feed", "language": "en", "country_code": "US", "fetch_strategy": "default", "notes": "[militaertechnik] US-Forces Worldwide"},
{"name": "Defense One", "url": "https://www.defenseone.com/rss/all/", "domain": "defenseone.com", "source_type": "rss_feed", "language": "en", "country_code": "US", "fetch_strategy": "default", "notes": "[militaertechnik] Defense-Politik + Tech"},
{"name": "Inside Defense", "url": "https://insidedefense.com", "domain": "insidedefense.com", "source_type": "web_source", "language": "en", "country_code": "US", "fetch_strategy": "paywall", "notes": "[militaertechnik] US-Pentagon-Insider, komplett paywalled"},
{"name": "RealClearDefense", "url": "https://www.realcleardefense.com/index.xml", "domain": "realcleardefense.com", "source_type": "rss_feed", "language": "en", "country_code": "US", "fetch_strategy": "default", "notes": "[militaertechnik] Aggregator + Kommentare"},
{"name": "War on the Rocks", "url": "https://warontherocks.com/feed/", "domain": "warontherocks.com", "source_type": "rss_feed", "language": "en", "country_code": "US", "fetch_strategy": "default", "notes": "[militaertechnik] Strategie-Essays, hochwertige Analyse"},
{"name": "RUSI Commentary", "url": "https://www.rusi.org/rss/commentary", "domain": "rusi.org", "source_type": "rss_feed", "language": "en", "country_code": "GB", "fetch_strategy": "default", "notes": "[militaertechnik] Royal United Services Institute, Strategie"},
{"name": "CSIS Defense & Security", "url": "https://www.csis.org/rss.xml", "domain": "csis.org", "source_type": "rss_feed", "language": "en", "country_code": "US", "fetch_strategy": "default", "notes": "[militaertechnik] Center for Strategic and International Studies"},
{"name": "Soldier Systems Daily", "url": "https://soldiersystems.net/feed/", "domain": "soldiersystems.net", "source_type": "rss_feed", "language": "en", "country_code": "US", "fetch_strategy": "default", "notes": "[militaertechnik] Tactical Gear und Ausruestung, extrem detailreich"},
{"name": "ESuT - Europaeische Sicherheit & Technik", "url": "https://esut.de/feed/", "domain": "esut.de", "source_type": "rss_feed", "language": "de", "country_code": "DE", "fetch_strategy": "default", "notes": "[militaertechnik] Heer/Luft/Marine, Mittler Report, sehr Equipment-orientiert"},
{"name": "Soldat & Technik", "url": "https://soldat-und-technik.de/feed/", "domain": "soldat-und-technik.de", "source_type": "rss_feed", "language": "de", "country_code": "DE", "fetch_strategy": "default", "notes": "[militaertechnik] Infanterie-Ausruestung, Mittler Report"},
{"name": "hartpunkt", "url": "https://www.hartpunkt.de/feed/", "domain": "hartpunkt.de", "source_type": "rss_feed", "language": "de", "country_code": "DE", "fetch_strategy": "default", "notes": "[militaertechnik] Ruestung und Sicherheitspolitik, unabhaengig"},
{"name": "Augen geradeaus!", "url": "https://augengeradeaus.net/feed/", "domain": "augengeradeaus.net", "source_type": "rss_feed", "language": "de", "country_code": "DE", "fetch_strategy": "default", "notes": "[militaertechnik] Thomas Wiegold, Bundeswehr-Insider"},
{"name": "Bundeswehr-Journal", "url": "https://www.bundeswehr-journal.de/feed/", "domain": "bundeswehr-journal.de", "source_type": "rss_feed", "language": "de", "country_code": "DE", "fetch_strategy": "default", "notes": "[militaertechnik] Bundeswehr-Themen"},
{"name": "Strategie & Technik (Mittler Report)", "url": "https://mittler-report.de/feed/", "domain": "mittler-report.de", "source_type": "rss_feed", "language": "de", "country_code": "DE", "fetch_strategy": "default", "notes": "[militaertechnik] Mittler-Verlag-Hauptfeed, Fachartikel"},
{"name": "cpm Defence Network", "url": "https://www.cpm-defence.de/feed/", "domain": "cpm-defence.de", "source_type": "rss_feed", "language": "de", "country_code": "DE", "fetch_strategy": "default", "notes": "[militaertechnik] Deutsche Ruestungsbranche"},
{"name": "Bundeswehr (offiziell)", "url": "https://www.bundeswehr.de/de/rss", "domain": "bundeswehr.de", "source_type": "rss_feed", "language": "de", "country_code": "DE", "fetch_strategy": "default", "notes": "[militaertechnik] Offizielle BMVg/BW-Meldungen"},
{"name": "Opex360 (Zone Militaire)", "url": "https://www.opex360.com/feed/", "domain": "opex360.com", "source_type": "rss_feed", "language": "fr", "country_code": "FR", "fetch_strategy": "default", "notes": "[militaertechnik] Sehr aktiv, Equipment + Operations FR"},
{"name": "Mer et Marine", "url": "https://www.meretmarine.com/fr/rss.xml", "domain": "meretmarine.com", "source_type": "rss_feed", "language": "fr", "country_code": "FR", "fetch_strategy": "default", "notes": "[militaertechnik] Marine + Schiffbau FR"},
{"name": "FOB - Forces Operations Blog", "url": "https://www.forcesoperations.com/feed/", "domain": "forcesoperations.com", "source_type": "rss_feed", "language": "fr", "country_code": "FR", "fetch_strategy": "default", "notes": "[militaertechnik] Spezialeinheiten und Ausruestung FR"},
{"name": "Lignes de Defense", "url": "https://lignesdedefense.blogs.ouest-france.fr/index.rdf", "domain": "lignesdedefense.blogs.ouest-france.fr", "source_type": "rss_feed", "language": "fr", "country_code": "FR", "fetch_strategy": "default", "notes": "[militaertechnik] Blog Ouest-France, Defense FR"},
{"name": "Air & Cosmos Defense", "url": "https://air-cosmos.com/category/defense/feed", "domain": "air-cosmos.com", "source_type": "rss_feed", "language": "fr", "country_code": "FR", "fetch_strategy": "default", "notes": "[militaertechnik] Luftfahrt + Defense FR"},
{"name": "Topwar / Voyennoye Obozreniye (EN)", "url": "https://en.topwar.ru/rss.xml", "domain": "topwar.ru", "source_type": "rss_feed", "language": "en", "country_code": "RU", "fetch_strategy": "default", "notes": "[militaertechnik] Pro-russisch (MBFC: Right Biased, Propaganda). Wert: Sicht auf eigene Technik"},
{"name": "TASS Defense", "url": "https://tass.com/rss/v2.xml?sections=MjQ%3D", "domain": "tass.com", "source_type": "rss_feed", "language": "en", "country_code": "RU", "fetch_strategy": "default", "notes": "[militaertechnik] Russische Staatsagentur, Defense-Section"},
{"name": "RIA Novosti Army (RU)", "url": "https://ria.ru/export/rss2/army/index.xml", "domain": "ria.ru", "source_type": "rss_feed", "language": "ru", "country_code": "RU", "fetch_strategy": "default", "notes": "[militaertechnik] Russische Staatsagentur, Army-Section"},
{"name": "bmpd (LiveJournal)", "url": "https://bmpd.livejournal.com/data/rss", "domain": "bmpd.livejournal.com", "source_type": "rss_feed", "language": "ru", "country_code": "RU", "fetch_strategy": "default", "notes": "[militaertechnik] Blog des CAST (Centre for Analysis of Strategies and Technologies)"},
{"name": "Zvezda TV", "url": "https://tvzvezda.ru/news.rss", "domain": "tvzvezda.ru", "source_type": "rss_feed", "language": "ru", "country_code": "RU", "fetch_strategy": "default", "notes": "[militaertechnik] TV-Sender des russischen Verteidigungsministeriums"},
{"name": "Defense Express (UA, EN)", "url": "https://en.defence-ua.com/rss/", "domain": "defence-ua.com", "source_type": "rss_feed", "language": "en", "country_code": "UA", "fetch_strategy": "default", "notes": "[militaertechnik] Ukrainische Industrie + Technik EN"},
{"name": "Militarnyi (EN)", "url": "https://militarnyi.com/en/feed/", "domain": "militarnyi.com", "source_type": "rss_feed", "language": "en", "country_code": "UA", "fetch_strategy": "default", "notes": "[militaertechnik] Ukrainisches Defense-Portal EN"},
{"name": "Defence24 (PL)", "url": "https://defence24.pl/rss", "domain": "defence24.pl", "source_type": "rss_feed", "language": "pl", "country_code": "PL", "fetch_strategy": "default", "notes": "[militaertechnik] Polens groesstes Defense-Portal PL"},
{"name": "Defence24.com (EN)", "url": "https://defence24.com/feed", "domain": "defence24.com", "source_type": "rss_feed", "language": "en", "country_code": "PL", "fetch_strategy": "default", "notes": "[militaertechnik] Englische Ausgabe Defence24"},
{"name": "Israel Defense (EN)", "url": "https://www.israeldefense.co.il/en/rss.xml", "domain": "israeldefense.co.il", "source_type": "rss_feed", "language": "en", "country_code": "IL", "fetch_strategy": "default", "notes": "[militaertechnik] Israelische Industrie + IDF"},
{"name": "IDF Spokesperson Website", "url": "https://www.idf.il/en/mini-sites/idf-spokesperson/", "domain": "idf.il", "source_type": "web_source", "language": "en", "country_code": "IL", "fetch_strategy": "default", "notes": "[militaertechnik] Offizielle IDF-Meldungen (Telegram-Kanal haben wir bereits)"},
{"name": "Mehr News Defense (FA)", "url": "https://www.mehrnews.com/rss/tp/12", "domain": "mehrnews.com", "source_type": "rss_feed", "language": "fa", "country_code": "IR", "fetch_strategy": "default", "notes": "[militaertechnik] Halbstaatliche iranische Agentur, Defense-Section"},
{"name": "Tasnim News Defense (FA)", "url": "https://www.tasnimnews.com/de/rss/feed/0/8/6/1/1", "domain": "tasnimnews.com", "source_type": "rss_feed", "language": "fa", "country_code": "IR", "fetch_strategy": "default", "notes": "[militaertechnik] IRGC-nah, Defense-Section"},
{"name": "Fars News Defense (FA)", "url": "https://www.farsnews.ir/rss?cat=8", "domain": "farsnews.ir", "source_type": "rss_feed", "language": "fa", "country_code": "IR", "fetch_strategy": "default", "notes": "[militaertechnik] IRGC-nah, Defense-Section"},
{"name": "China Military Online (EN)", "url": "http://eng.chinamil.com.cn/", "domain": "chinamil.com.cn", "source_type": "web_source", "language": "en", "country_code": "CN", "fetch_strategy": "default", "notes": "[militaertechnik] Offizielles PLA-Organ"},
{"name": "Global Times Military (EN)", "url": "https://www.globaltimes.cn/rss/military.xml", "domain": "globaltimes.cn", "source_type": "rss_feed", "language": "en", "country_code": "CN", "fetch_strategy": "default", "notes": "[militaertechnik] Chinesisches Staatsmedium, Military-Section"},
{"name": "The Diplomat - Security", "url": "https://thediplomat.com/category/security/feed/", "domain": "thediplomat.com", "source_type": "rss_feed", "language": "en", "country_code": "JP", "fetch_strategy": "default", "notes": "[militaertechnik] Asien-Pazifik-Sicherheitsanalyse, in Tokio sitzend"},
{"name": "ORYX (Spioenkop)", "url": "https://www.oryxspioenkop.com/feeds/posts/default", "domain": "oryxspioenkop.com", "source_type": "rss_feed", "language": "en", "country_code": "NL", "fetch_strategy": "default", "notes": "[militaertechnik, waffen-international] Visually confirmed losses, Equipment-DB Ukraine-Krieg"},
{"name": "WarSpotting", "url": "https://warspotting.net/", "domain": "warspotting.net", "source_type": "web_source", "language": "en", "country_code": "NL", "fetch_strategy": "default", "notes": "[militaertechnik, waffen-international] ORYX-Nachfolger fuer Ukraine, OSINT-Verluste"},
{"name": "Conflict Intelligence Team (CIT)", "url": "https://citeam.org/feed/", "domain": "citeam.org", "source_type": "rss_feed", "language": "en", "country_code": "RU", "fetch_strategy": "default", "notes": "[militaertechnik] Russisches Exil-OSINT-Kollektiv"},
{"name": "Telegram @rybar", "url": "t.me/rybar", "domain": "t.me", "source_type": "telegram_channel", "language": "ru", "country_code": "RU", "fetch_strategy": "default", "notes": "[militaertechnik] Grosser russischer Mil-OSINT-Kanal"},
{"name": "Telegram @osintdefender", "url": "t.me/osintdefender", "domain": "t.me", "source_type": "telegram_channel", "language": "en", "country_code": "US", "fetch_strategy": "default", "notes": "[militaertechnik] Pro-westliches Equipment-Tracking"},
{"name": "Telegram @CovertCabal", "url": "t.me/CovertCabal", "domain": "t.me", "source_type": "telegram_channel", "language": "en", "country_code": "US", "fetch_strategy": "default", "notes": "[militaertechnik] Sat-Bild-OSINT"},
{"name": "Telegram @Tendar", "url": "t.me/Tendar", "domain": "t.me", "source_type": "telegram_channel", "language": "en", "country_code": "DE", "fetch_strategy": "default", "notes": "[militaertechnik] UA-Konflikt-Analyse"},
{"name": "Telegram @Osint613", "url": "t.me/Osint613", "domain": "t.me", "source_type": "telegram_channel", "language": "en", "country_code": "IL", "fetch_strategy": "default", "notes": "[militaertechnik] Nahost-OSINT"},
{"name": "Behoerden-Spiegel", "url": "https://www.behoerden-spiegel.de/feed/", "domain": "behoerden-spiegel.de", "source_type": "rss_feed", "language": "de", "country_code": "DE", "fetch_strategy": "default", "notes": "[polizei-technik] DE-BOS-Magazin, Polizei, Fuehrungstechnik"},
{"name": "pvt Polizei Verkehr + Technik", "url": "https://www.pvtweb.de/feed/", "domain": "pvtweb.de", "source_type": "rss_feed", "language": "de", "country_code": "DE", "fetch_strategy": "default", "notes": "[polizei-technik] DE-Polizeitechnik-Fachzeitschrift"},
{"name": "Police Magazine (US)", "url": "https://www.policemag.com/rss", "domain": "policemag.com", "source_type": "rss_feed", "language": "en", "country_code": "US", "fetch_strategy": "default", "notes": "[polizei-technik] US-Polizei + Ausruestung"},
{"name": "Police1.com", "url": "https://www.police1.com/rss/feed", "domain": "police1.com", "source_type": "rss_feed", "language": "en", "country_code": "US", "fetch_strategy": "default", "notes": "[polizei-technik] US-Polizei-Industrie"},
{"name": "Officer.com", "url": "https://www.officer.com/rss", "domain": "officer.com", "source_type": "rss_feed", "language": "en", "country_code": "US", "fetch_strategy": "default", "notes": "[polizei-technik] US-Polizei + Equipment"},
{"name": "Small Arms Survey", "url": "https://www.smallarmssurvey.org/rss.xml", "domain": "smallarmssurvey.org", "source_type": "rss_feed", "language": "en", "country_code": "CH", "fetch_strategy": "default", "notes": "[waffen-international] Genfer Forschungsinstitut, Goldstandard Kleinwaffen, Working Papers + Issue Briefs"},
{"name": "SIPRI Publications", "url": "https://www.sipri.org/rss/publications.xml", "domain": "sipri.org", "source_type": "rss_feed", "language": "en", "country_code": "SE", "fetch_strategy": "default", "notes": "[waffen-international] Stockholm International Peace Research, Waffenexporte, Militaerausgaben, SALW"},
{"name": "Conflict Armament Research", "url": "https://www.conflictarm.com/feed/", "domain": "conflictarm.com", "source_type": "rss_feed", "language": "en", "country_code": "GB", "fetch_strategy": "default", "notes": "[waffen-international] Field-Tracking von Waffen in Konfliktzonen, Lieferketten-Forensik"},
{"name": "Armament Research Services (ARES)", "url": "https://armamentresearch.com/feed/", "domain": "armamentresearch.com", "source_type": "rss_feed", "language": "en", "country_code": "AU", "fetch_strategy": "default", "notes": "[militaertechnik, waffen-international] Munitions- und Waffen-Identifikation, sehr Equipment-tief"},
{"name": "Calibre Obscura (Substack)", "url": "https://calibreobscura.substack.com/feed", "domain": "calibreobscura.substack.com", "source_type": "rss_feed", "language": "en", "country_code": "US", "fetch_strategy": "default", "notes": "[waffen-international] OSINT-Spezialist Kleinwaffen Nahost"},
{"name": "Arms Control Association", "url": "https://www.armscontrol.org/rss.xml", "domain": "armscontrol.org", "source_type": "rss_feed", "language": "en", "country_code": "US", "fetch_strategy": "default", "notes": "[waffen-international] US-Think-Tank, Ruestungskontrolle + Proliferation"},
{"name": "Arms Control Wonk", "url": "https://www.armscontrolwonk.com/feed/", "domain": "armscontrolwonk.com", "source_type": "rss_feed", "language": "en", "country_code": "US", "fetch_strategy": "default", "notes": "[waffen-international] Nuklear- und Raketen-Spezialisten-Blog"},
{"name": "Action on Armed Violence (AOAV)", "url": "https://aoav.org.uk/feed/", "domain": "aoav.org.uk", "source_type": "rss_feed", "language": "en", "country_code": "GB", "fetch_strategy": "default", "notes": "[waffen-international] Explosive Waffen in besiedelten Gebieten, Opferzahlen"},
{"name": "BICC Bonn", "url": "https://www.bicc.de/feed", "domain": "bicc.de", "source_type": "rss_feed", "language": "de", "country_code": "DE", "fetch_strategy": "default", "notes": "[waffen-international] Bonn International Centre for Conflict Studies, Konflikt + Konversion"},
{"name": "Stimson Center", "url": "https://www.stimson.org/feed/", "domain": "stimson.org", "source_type": "rss_feed", "language": "en", "country_code": "US", "fetch_strategy": "default", "notes": "[waffen-international] US-Think-Tank, konventionelle + nukleare Ruestung"},
{"name": "ICRC Law and Policy Blog", "url": "https://blogs.icrc.org/law-and-policy/feed/", "domain": "icrc.org", "source_type": "rss_feed", "language": "en", "country_code": "CH", "fetch_strategy": "default", "notes": "[waffen-international] Voelkerrechtliche Sicht auf Waffenwirkung"}
]
}
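Review note: the seed file is consumed entry by entry, and the seeder recognises existing sources by their `url`. A quick lint pass before seeding could catch entries without a URL and URLs that occur more than once in the file. The sketch below is an assumption about what such a check might look like: `lint_seed` and `REQUIRED_KEYS` are hypothetical and not part of the commit.

```python
from collections import Counter

# Assumed minimum set of keys; the seeder itself only insists on "url".
REQUIRED_KEYS = ("name", "url", "source_type")


def lint_seed(seed: dict) -> list[str]:
    """Return human-readable problems found in a seed document."""
    problems: list[str] = []
    sources = seed.get("sources", [])
    for idx, entry in enumerate(sources):
        missing = [key for key in REQUIRED_KEYS if not entry.get(key)]
        if missing:
            problems.append(f"entry {idx}: missing {', '.join(missing)}")
    # The seeder deduplicates by URL, so repeated URLs deserve a warning.
    counts = Counter(e["url"] for e in sources if e.get("url"))
    for url, n in sorted(counts.items()):
        if n > 1:
            problems.append(f"duplicate url ({n}x): {url}")
    return problems
```

An empty list means the file passed these two checks; it says nothing about whether the feeds are reachable.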

scripts/seed_military_sources.py (executable file)

@@ -0,0 +1,116 @@
#!/usr/bin/env python3
"""Bulk-Seed fuer Militaer- und Polizei-Technik-Quellen + internationale Waffen-Spezialisten.
Liest scripts/seed_military_sources.json und legt jede Quelle idempotent in der
Ziel-DB an (Default: Verwaltungs-Staging-DB). Bestehende Quellen werden anhand
der URL erkannt und uebersprungen.
Beispiel:
.venv/bin/python scripts/seed_military_sources.py
.venv/bin/python scripts/seed_military_sources.py --db /home/claude-dev/osint-data/osint.db
"""
from __future__ import annotations
import argparse
import json
import sqlite3
import sys
from pathlib import Path
DEFAULT_DB = "/home/claude-dev/AegisSight-Monitor-staging/data/osint.db"
SEED_FILE = Path(__file__).with_suffix(".json")
INSERT_SQL = """
INSERT INTO sources (
name, url, domain, source_type, category, status, notes,
language, country_code, fetch_strategy, added_by, tenant_id
) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, 'seed_military', NULL)
"""
EXISTS_SQL = "SELECT id FROM sources WHERE url = ? AND tenant_id IS NULL"
def main() -> int:
ap = argparse.ArgumentParser(description=__doc__)
ap.add_argument("--db", default=DEFAULT_DB, help="Pfad zur Ziel-SQLite-DB")
ap.add_argument("--seed", default=str(SEED_FILE), help="Pfad zur Seed-JSON")
ap.add_argument("--dry-run", action="store_true", help="Nur loggen, nichts schreiben")
args = ap.parse_args()
seed_path = Path(args.seed)
if not seed_path.is_file():
print(f"FEHLER: Seed-Datei nicht gefunden: {seed_path}", file=sys.stderr)
return 2
with seed_path.open("r", encoding="utf-8") as fh:
seed = json.load(fh)
sources = seed.get("sources", [])
if not sources:
print("FEHLER: Seed-Datei enthaelt keine sources", file=sys.stderr)
return 2
print(f"DB: {args.db}")
print(f"Seed: {seed_path} ({len(sources)} Eintraege)")
print(f"Dry-Run: {args.dry_run}")
print()
con = sqlite3.connect(args.db)
con.row_factory = sqlite3.Row
cur = con.cursor()
created: list[tuple[int, str]] = []
skipped: list[tuple[int, str]] = []
for entry in sources:
url = entry.get("url")
name = entry.get("name", "?")
if not url:
skipped.append((-1, f"{name}: ohne url"))
continue
row = cur.execute(EXISTS_SQL, (url,)).fetchone()
if row is not None:
skipped.append((row["id"], f"{name}: existiert bereits (id={row['id']})"))
continue
params = (
name,
url,
entry.get("domain"),
entry.get("source_type", "rss_feed"),
entry.get("category", "fachmedien"),
entry.get("status", "active"),
entry.get("notes"),
entry.get("language"),
entry.get("country_code"),
entry.get("fetch_strategy", "default"),
)
if args.dry_run:
created.append((-1, name))
continue
cur.execute(INSERT_SQL, params)
created.append((cur.lastrowid, name))
if not args.dry_run:
con.commit()
con.close()
print(f"Angelegt: {len(created)}")
print(f"Uebersprungen:{len(skipped)}")
print()
if created:
print("--- Neue IDs ---")
for src_id, name in created:
print(f" {src_id:>5} {name}")
if skipped:
print()
print("--- Uebersprungen ---")
for src_id, msg in skipped:
print(f" {src_id:>5} {msg}")
return 0
if __name__ == "__main__":
sys.exit(main())
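The idempotency of the seed script rests on the existence check by URL before each insert. A minimal sketch of that pattern against an in-memory database follows; the reduced `sources` schema and the helper name `seed_sources` are assumptions made for illustration and are not part of the script.

```python
import sqlite3


def seed_sources(con: sqlite3.Connection, entries: list[dict]) -> tuple[int, int]:
    """Insert entries whose URL is not present yet; return (created, skipped)."""
    created = skipped = 0
    for entry in entries:
        url = entry.get("url")
        if not url:
            # Entries without a URL cannot be deduplicated, so they are skipped.
            skipped += 1
            continue
        row = con.execute(
            "SELECT id FROM sources WHERE url = ? AND tenant_id IS NULL", (url,)
        ).fetchone()
        if row is not None:
            skipped += 1
            continue
        con.execute(
            "INSERT INTO sources (name, url, tenant_id) VALUES (?, ?, NULL)",
            (entry.get("name", "?"), url),
        )
        created += 1
    con.commit()
    return created, skipped


# Hypothetical, reduced schema; the real sources table has more columns.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE sources (id INTEGER PRIMARY KEY, name TEXT, url TEXT, tenant_id INTEGER)"
)
demo = [{"name": "A", "url": "https://example.org/a.xml"}, {"name": "B"}]
first = seed_sources(con, demo)
second = seed_sources(con, demo)
```

Running the same seed twice leaves the table unchanged on the second pass, which is the property the script relies on when it is re-run against an already seeded database.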

scripts/sync_shared.py (executable file)

@@ -0,0 +1,217 @@
#!/usr/bin/env python3
"""Sync src/shared/ aus dem Monitor-Repo + Drift-Check.
Background:
src/shared/ contains local copies from the Monitor repo
(source_rules, services/source_health, services/source_suggester,
agents/claude_client). When the Monitor changes these modules,
the Verwaltung copies drift apart; this script keeps both
in sync.
Modes:
python scripts/sync_shared.py --check # drift diagnosis only, no writes
python scripts/sync_shared.py --apply # show drift and apply it
python scripts/sync_shared.py --apply --quiet # for CI/hook invocations
Mojibake protection:
If double-encoded UTF-8 is detected while copying (some Monitor texts
still contain it), ftfy is called to repair it.
Exit codes:
0 in sync (or sync succeeded)
1 drift found (--check mode) or error during apply
2 source not found / configuration problem
"""
from __future__ import annotations
import argparse
import difflib
import sys
from pathlib import Path
# Hard-wired source mappings: (path relative to the Monitor repo, destination relative to the Verwaltung repo)
SHARED_FILES = [
("src/source_rules.py", "src/shared/source_rules.py"),
("src/services/source_health.py", "src/shared/services/source_health.py"),
("src/services/source_suggester.py", "src/shared/services/source_suggester.py"),
("src/agents/claude_client.py", "src/shared/agents/claude_client.py"),
]
# Files with Verwaltung-specific adjustments that must NOT be auto-synced.
# Drift is reported (for information), but --apply does not write them.
# If a Monitor change should be picked up as well: merge it manually into
# the Verwaltung fork, then review or remove the entry.
LOCKED_FILES: dict[str, str] = {
# Currently no forks. If a shared/ file should deliberately deviate from the
# Monitor version in the future, add it here together with the reason.
}
DEFAULT_MONITOR = Path("/home/claude-dev/AegisSight-Monitor")
DEFAULT_VERWALTUNG = Path(__file__).resolve().parent.parent
def has_mojibake_markers(text: str) -> bool:
"""Heuristik: Doppel/Triple-Encoded UTF-8 erkennen.
Echte deutsche Umlaute kommen als "ü" / "ä" / "ö" / "ß" - Single-Byte-Zeichen
aus latin-1-Sicht ("Ã", "Â", "Æ") sind ein starkes Mojibake-Indiz.
"""
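# The markers must be the mis-decoded sequences, not the umlauts themselves;
# testing for a real umlaut would flag every correct German text as mojibake.
return any(seq in text for seq in ("Ã¤", "Ã¶", "Ã¼", "ÃŸ", "Ã„", "Ã–", "Ãœ", "Â¤", "Æ"))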
def fix_mojibake(text: str) -> tuple[str, bool]:
"""Repariert Doppel-Encoded UTF-8 falls vorhanden. Gibt (text, fixed?) zurück.
Raises RuntimeError wenn Mojibake erkannt wird aber ftfy nicht installiert ist
(dann würde ein Sync den Mojibake unrepariert ins Verwaltungs-Repo schreiben -
dagegen schützt fail-fast).
"""
try:
import ftfy # type: ignore
except ImportError:
if has_mojibake_markers(text):
raise RuntimeError(
"Monitor-Source enthält Mojibake (Doppel-Encoded UTF-8) und ftfy "
"ist nicht installiert. Sync würde Mojibake ins Verwaltungs-Repo "
"schreiben.\n Lösung: pip install ftfy (im venv des Repos)"
)
return text, False
fixed = ftfy.fix_text(text)
return fixed, fixed != text
def patch_imports_for_shared(text: str) -> str:
"""Patcht 'from agents.' -> 'from shared.agents.' damit Module innerhalb
von src/shared/ ihre Geschwister-Module korrekt finden.
'from config import ...' bleibt unverändert (config.py liegt in beiden
Apps in src/ Root).
"""
lines = text.splitlines(keepends=True)
out = []
for line in lines:
stripped = line.lstrip()
indent = line[: len(line) - len(stripped)]
if stripped.startswith("from agents."):
line = indent + "from shared.agents." + stripped[len("from agents."):]
elif stripped.startswith("from services."):
line = indent + "from shared.services." + stripped[len("from services."):]
out.append(line)
return "".join(out)
def diff_summary(old: str, new: str, label_old: str, label_new: str, max_lines: int = 30) -> str:
diff = list(difflib.unified_diff(
old.splitlines(keepends=True),
new.splitlines(keepends=True),
fromfile=label_old,
tofile=label_new,
n=2,
))
if not diff:
return ""
if len(diff) <= max_lines:
return "".join(diff)
return "".join(diff[:max_lines]) + f"\n ... ({len(diff) - max_lines} weitere Zeilen abgeschnitten)\n"
def main() -> int:
parser = argparse.ArgumentParser(description="Sync src/shared/ aus Monitor-Repo")
parser.add_argument("--check", action="store_true", help="nur prüfen, nichts schreiben")
parser.add_argument("--apply", action="store_true", help="Sync ausführen")
parser.add_argument("--quiet", action="store_true", help="weniger Output (für Hooks/CI)")
parser.add_argument("--monitor", type=Path, default=DEFAULT_MONITOR, help="Pfad zum Monitor-Repo")
parser.add_argument("--verwaltung", type=Path, default=DEFAULT_VERWALTUNG, help="Pfad zum Verwaltungs-Repo")
args = parser.parse_args()
if not args.check and not args.apply:
parser.error("entweder --check oder --apply angeben")
if not args.monitor.exists():
print(f"FEHLER: Monitor-Pfad nicht gefunden: {args.monitor}", file=sys.stderr)
return 2
drift_count = 0
applied_count = 0
locked_drift_count = 0
for monitor_rel, verwaltung_rel in SHARED_FILES:
src_path = args.monitor / monitor_rel
dst_path = args.verwaltung / verwaltung_rel
is_locked = verwaltung_rel in LOCKED_FILES
if not src_path.exists():
print(f"FEHLER: Monitor-Quelle fehlt: {src_path}", file=sys.stderr)
return 2
src_text = src_path.read_text(encoding="utf-8")
try:
src_text, fixed_mojibake = fix_mojibake(src_text)
except RuntimeError as e:
print(f"FEHLER beim Verarbeiten von {monitor_rel}:\n {e}", file=sys.stderr)
return 2
src_text = patch_imports_for_shared(src_text)
existing = dst_path.read_text(encoding="utf-8") if dst_path.exists() else ""
if existing == src_text:
if not args.quiet:
print(f" = {verwaltung_rel}: in sync")
continue
if is_locked:
locked_drift_count += 1
if not args.quiet:
print(f" L LOCKED-DRIFT: {verwaltung_rel}")
print(f" Grund: {LOCKED_FILES[verwaltung_rel]}")
print(f" -> NICHT ueberschrieben. Manuell pruefen ob Monitor-Aenderung")
print(f" in den Verwaltungs-Fork eingearbeitet werden muss.")
continue
drift_count += 1
diff = diff_summary(existing, src_text, dst_path.name + " (Verwaltung)", src_path.name + " (Monitor)")
if not args.quiet:
print(f" ! DRIFT: {verwaltung_rel}")
if fixed_mojibake:
print(f" (Mojibake im Monitor-Original gefixt)")
if diff:
for line in diff.splitlines()[:20]:
print(f" {line}")
if args.apply:
dst_path.parent.mkdir(parents=True, exist_ok=True)
dst_path.write_text(src_text, encoding="utf-8")
applied_count += 1
if args.check:
if drift_count == 0 and locked_drift_count == 0:
if not args.quiet:
print(f"OK: src/shared/ ist in sync mit {args.monitor}")
return 0
msg_parts = []
if drift_count:
msg_parts.append(f"{drift_count} Datei(en) drift'en, mit --apply synchronisieren")
if locked_drift_count:
msg_parts.append(f"{locked_drift_count} LOCKED-Datei(en) drift'en (manuell pruefen)")
if not args.quiet:
print("\n" + ", ".join(msg_parts) + ".", file=sys.stderr)
# Exit-Code: 1 wenn auto-sync ausstehend, 0 wenn nur LOCKED-Drift (informativ)
return 1 if drift_count else 0
# apply
if applied_count == 0 and locked_drift_count == 0:
if not args.quiet:
print(f"OK: src/shared/ war schon in sync, nichts zu tun.")
return 0
if not args.quiet:
if applied_count:
print(f"\n{applied_count} Datei(en) synchronisiert.")
print("Vergiss nicht: git diff src/shared/ → git add → commit")
if locked_drift_count:
print(f"\nHINWEIS: {locked_drift_count} LOCKED-Datei(en) NICHT geschrieben.")
print("Manuell pruefen ob Monitor-Aenderung in den Verwaltungs-Fork uebernommen werden muss.")
return 0
if __name__ == "__main__":
sys.exit(main())
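The mojibake heuristic in this script looks for characters such as "Ã" and "Â" because those are what UTF-8 bytes turn into when they are decoded as latin-1. The following standard-library sketch reproduces that effect and reverses it for the simple single-layer case; the helper names are illustrative, and ftfy (which the script actually uses) handles many more cases than this round trip.

```python
def double_encode(text: str) -> str:
    """Simulate the damage: read UTF-8 bytes as if they were latin-1."""
    return text.encode("utf-8").decode("latin-1")


def looks_like_mojibake(text: str) -> bool:
    """Rough check for the lead characters of mis-decoded UTF-8 sequences."""
    return any(marker in text for marker in ("Ã", "Â"))


def undo_single_layer(text: str) -> str:
    """Reverse exactly one layer of latin-1 mis-decoding."""
    return text.encode("latin-1").decode("utf-8")


original = "Prüfung"
broken = double_encode(original)      # the umlaut becomes two characters
repaired = undo_single_layer(broken)
```

The check is only a heuristic: "Ã" and "Â" are legitimate letters in other languages, so a positive result on non-German text would need a closer look.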


@@ -1,7 +1,11 @@
"""Passwort-basierte Authentifizierung fuer das Verwaltungsportal.""" """Magic-Link-Authentifizierung für das Verwaltungsportal.
JWT für Session, Magic-Link an info@aegis-sight.de zur Anmeldung.
Passwort-Login wurde mit Migration 2026-05-09 entfernt.
"""
import secrets
from datetime import datetime, timedelta, timezone from datetime import datetime, timedelta, timezone
from jose import jwt, JWTError from jose import jwt, JWTError
import bcrypt as _bcrypt
from fastapi import Depends, HTTPException, status from fastapi import Depends, HTTPException, status
from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials
from config import JWT_SECRET, JWT_ALGORITHM, JWT_EXPIRE_HOURS from config import JWT_SECRET, JWT_ALGORITHM, JWT_EXPIRE_HOURS
@@ -12,20 +16,19 @@ JWT_ISSUER = "aegissight-portal"
JWT_AUDIENCE = "aegissight-portal" JWT_AUDIENCE = "aegissight-portal"
def hash_password(password: str) -> str: def generate_magic_token() -> str:
return _bcrypt.hashpw(password.encode("utf-8"), _bcrypt.gensalt()).decode("utf-8") """Erzeugt einen URL-sicheren Token (43 Zeichen) für den Magic-Link."""
return secrets.token_urlsafe(32)
def verify_password(password: str, password_hash: str) -> bool: def create_token(admin_id: int, email: str, username: str = "") -> str:
return _bcrypt.checkpw(password.encode("utf-8"), password_hash.encode("utf-8")) """JWT-Session-Token nach erfolgreichem Magic-Link-Verify."""
def create_token(admin_id: int, username: str) -> str:
now = datetime.now(timezone.utc) now = datetime.now(timezone.utc)
expire = now + timedelta(hours=JWT_EXPIRE_HOURS) expire = now + timedelta(hours=JWT_EXPIRE_HOURS)
payload = { payload = {
"sub": str(admin_id), "sub": str(admin_id),
"username": username, "email": email,
"username": username or email.split("@")[0],
"role": "portal_admin", "role": "portal_admin",
"iss": JWT_ISSUER, "iss": JWT_ISSUER,
"aud": JWT_AUDIENCE, "aud": JWT_AUDIENCE,
@@ -47,7 +50,7 @@ def decode_token(token: str) -> dict:
except JWTError: except JWTError:
raise HTTPException( raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED, status_code=status.HTTP_401_UNAUTHORIZED,
detail="Token ungueltig oder abgelaufen", detail="Token ungültig oder abgelaufen",
) )
@@ -57,5 +60,6 @@ async def get_current_admin(
payload = decode_token(credentials.credentials) payload = decode_token(credentials.credentials)
return { return {
"id": int(payload["sub"]), "id": int(payload["sub"]),
"username": payload["username"], "email": payload.get("email", ""),
"username": payload.get("username", ""),
} }
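The new docstring states that the magic-link token is 43 characters long. A short sketch of where that number comes from, assuming the same `secrets.token_urlsafe(32)` call as in the diff; the constant `TOKEN_BYTES` and the alphabet check are added here for illustration only.

```python
import math
import secrets
import string

TOKEN_BYTES = 32  # 256 bits of randomness, as in the diff


def generate_magic_token() -> str:
    """URL-safe token; Base64 padding is stripped by token_urlsafe."""
    return secrets.token_urlsafe(TOKEN_BYTES)


# Base64 carries 6 bits per character, so 256 bits need ceil(256 / 6) characters.
expected_length = math.ceil(TOKEN_BYTES * 8 / 6)
token = generate_magic_token()
url_safe_alphabet = set(string.ascii_letters + string.digits + "-_")
```

Because the token only uses letters, digits, "-" and "_", it can be placed in a query string such as `?token=...` without further escaping.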


@@ -8,6 +8,10 @@ STATIC_DIR = os.path.join(os.path.dirname(os.path.abspath(__file__)), "static")
# Gemeinsame Datenbank (gleiche wie OSINT-Monitor) # Gemeinsame Datenbank (gleiche wie OSINT-Monitor)
DB_PATH = os.environ.get("DB_PATH", "/mnt/gitea/osint-data/osint.db") DB_PATH = os.environ.get("DB_PATH", "/mnt/gitea/osint-data/osint.db")
# twscrape-Account-Store: die X-Login-Konten, mit denen der Monitor bei X
# recherchiert. Geteilt mit dem Monitor (gleicher Pfad-Default).
X_ACCOUNTS_DB_PATH = os.environ.get("X_ACCOUNTS_DB_PATH", "/home/claude-dev/.x-scraper/accounts.db")
# JWT (eigener Secret fuer Verwaltungsportal) # JWT (eigener Secret fuer Verwaltungsportal)
JWT_SECRET = os.environ.get("PORTAL_JWT_SECRET") JWT_SECRET = os.environ.get("PORTAL_JWT_SECRET")
if not JWT_SECRET: if not JWT_SECRET:
@@ -27,12 +31,31 @@ SMTP_FROM_EMAIL = os.environ.get("SMTP_FROM_EMAIL", "noreply@aegis-sight.de")
SMTP_FROM_NAME = os.environ.get("SMTP_FROM_NAME", "AegisSight Verwaltung") SMTP_FROM_NAME = os.environ.get("SMTP_FROM_NAME", "AegisSight Verwaltung")
SMTP_USE_TLS = os.environ.get("SMTP_USE_TLS", "true").lower() == "true" SMTP_USE_TLS = os.environ.get("SMTP_USE_TLS", "true").lower() == "true"
# Magic Link Base URL (fuer OSINT-Monitor Einladungen) # Magic Link Base URL (fuer Einladungen Richtung OSINT-Monitor, NICHT Portal-Login)
MAGIC_LINK_BASE_URL = os.environ.get("MAGIC_LINK_BASE_URL", "https://monitor.aegis-sight.de") MAGIC_LINK_BASE_URL = os.environ.get("MAGIC_LINK_BASE_URL", "https://monitor.aegis-sight.de")
MAGIC_LINK_EXPIRE_MINUTES = 10 MAGIC_LINK_EXPIRE_MINUTES = 10
# Magic-Link-Auth fuer das Verwaltungsportal SELBST
# (frueher Passwort-Login, ab 2026-05-09 nur noch Magic-Link)
ALLOWED_EMAIL = os.environ.get("PORTAL_ALLOWED_EMAIL", "info@aegis-sight.de")
PORTAL_MAGIC_LINK_BASE_URL = os.environ.get(
"PORTAL_MAGIC_LINK_BASE_URL", "https://monitor-verwaltung.aegis-sight.de"
)
PORTAL_MAGIC_LINK_EXPIRE_MINUTES = int(
os.environ.get("PORTAL_MAGIC_LINK_EXPIRE_MINUTES", "10")
)
# Source Discovery (geteilte Config mit OSINT-Monitor) # Source Discovery (geteilte Config mit OSINT-Monitor)
CLAUDE_PATH = os.environ.get("CLAUDE_PATH", "/home/claude-dev/.claude/local/claude") CLAUDE_PATH = os.environ.get("CLAUDE_PATH", "/usr/local/bin/claude")
CLAUDE_TIMEOUT = 300 CLAUDE_TIMEOUT = 300
MAX_FEEDS_PER_DOMAIN = 3 MAX_FEEDS_PER_DOMAIN = 3
CLAUDE_MODEL_FAST = "claude-haiku-4-5-20251001" CLAUDE_MODEL_FAST = "claude-haiku-4-5-20251001"
CLAUDE_MODEL_MEDIUM = "claude-sonnet-4-6"
CLAUDE_MODEL_STANDARD = "claude-opus-4-7"
# Health-Check (genutzt von shared/services/source_health.py + routers/sources.py)
HEALTH_CHECK_USER_AGENT = os.environ.get(
"HEALTH_CHECK_USER_AGENT",
"Mozilla/5.0 (compatible; AegisSight-HealthCheck/1.0)",
)
HEALTH_CHECK_TIMEOUT_S = float(os.environ.get("HEALTH_CHECK_TIMEOUT_S", "15.0"))
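The configuration values above all follow the same pattern: read a string from the environment, fall back to a default, then convert. A minimal sketch of that pattern with two hypothetical helpers (`env_bool` and `env_int` do not exist in config.py); the environment is passed in as a mapping so the behaviour can be checked without touching the real process environment.

```python
import os
from typing import Mapping, Optional


def env_bool(name: str, default: str, env: Optional[Mapping[str, str]] = None) -> bool:
    """True only for the literal string 'true' (case-insensitive), as with SMTP_USE_TLS."""
    env = os.environ if env is None else env
    return env.get(name, default).lower() == "true"


def env_int(name: str, default: str, env: Optional[Mapping[str, str]] = None) -> int:
    """Integer setting with a string default, as with PORTAL_MAGIC_LINK_EXPIRE_MINUTES."""
    env = os.environ if env is None else env
    return int(env.get(name, default))


fake_env = {"SMTP_USE_TLS": "True", "PORTAL_MAGIC_LINK_EXPIRE_MINUTES": "30"}
use_tls = env_bool("SMTP_USE_TLS", "true", fake_env)
expire_override = env_int("PORTAL_MAGIC_LINK_EXPIRE_MINUTES", "10", fake_env)
expire_default = env_int("PORTAL_MAGIC_LINK_EXPIRE_MINUTES", "10", {})
```

Note that with this comparison values such as "1" or "yes" count as false, and a non-numeric value for an integer setting raises `ValueError` at import time.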


@@ -1,6 +1,5 @@
"""Datenbankverbindung (geteilte DB mit OSINT-Monitor).""" """Datenbankverbindung (geteilte DB mit OSINT-Monitor)."""
import aiosqlite import aiosqlite
import os
from config import DB_PATH from config import DB_PATH


@@ -29,7 +29,41 @@ def invite_email(username: str, org_name: str, code: str, link: str) -> tuple[st
<a href="{link}" style="display: inline-block; background: #f0b429; color: #0f172a; padding: 12px 32px; border-radius: 6px; text-decoration: none; font-weight: 600;">Einladung annehmen</a> <a href="{link}" style="display: inline-block; background: #f0b429; color: #0f172a; padding: 12px 32px; border-radius: 6px; text-decoration: none; font-weight: 600;">Einladung annehmen</a>
</div> </div>
<p style="color: #94a3b8; font-size: 13px; margin: 0;">Dieser Link ist 10 Minuten gueltig.</p> <p style="color: #94a3b8; font-size: 13px; margin: 0;">Dieser Link ist 10 Minuten gültig.</p>
</div>
</body>
</html>"""
return subject, html
def portal_magic_link_email(link: str, expire_minutes: int) -> tuple[str, str]:
"""Erzeugt Login-E-Mail mit Magic-Link für das Verwaltungsportal.
Args:
link: Login-URL inkl. Token
expire_minutes: Gültigkeitsdauer in Minuten
Returns:
(subject, html_body)
"""
subject = "AegisSight Verwaltung - Anmeldung"
html = f"""<!DOCTYPE html>
<html>
<head><meta charset="UTF-8"></head>
<body style="font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif; background: #0f172a; color: #e2e8f0; padding: 40px 20px;">
<div style="max-width: 480px; margin: 0 auto; background: #1e293b; border-radius: 12px; padding: 32px; border: 1px solid #334155;">
<h1 style="color: #f0b429; font-size: 20px; margin: 0 0 24px 0;">AegisSight Verwaltung</h1>
<p style="margin: 0 0 24px 0;">Klicken Sie auf den Button, um sich am Verwaltungsportal anzumelden:</p>
<div style="text-align: center; margin: 0 0 24px 0;">
<a href="{link}" style="display: inline-block; background: #f0b429; color: #0f172a; padding: 14px 40px; border-radius: 6px; text-decoration: none; font-weight: 600; font-size: 16px;">Jetzt anmelden</a>
</div>
<p style="color: #94a3b8; font-size: 13px; margin: 0 0 12px 0;">Oder kopieren Sie diesen Link in Ihren Browser:</p>
<p style="color: #64748b; font-size: 11px; word-break: break-all; margin: 0 0 24px 0;">{link}</p>
<p style="color: #94a3b8; font-size: 13px; margin: 0;">Dieser Link ist {expire_minutes} Minuten gültig. Falls Sie diese Anmeldung nicht angefordert haben, ignorieren Sie diese E-Mail.</p>
</div> </div>
</body> </body>
</html>""" </html>"""


@@ -1,19 +1,17 @@
"""Verwaltungsportal - FastAPI Anwendung.""" """Verwaltungsportal - FastAPI Anwendung.
Auth: Magic-Link (analog Monitor). Passwort-Login wurde mit Migration
2026-05-09 entfernt. Erlaubte Email-Adresse(n) sind in config.ALLOWED_EMAIL.
"""
import logging import logging
from contextlib import asynccontextmanager from contextlib import asynccontextmanager
from fastapi import FastAPI, Depends, HTTPException, status, Request from fastapi import FastAPI
from fastapi.staticfiles import StaticFiles from fastapi.staticfiles import StaticFiles
from fastapi.responses import FileResponse from fastapi.responses import FileResponse
from config import STATIC_DIR, PORT from config import STATIC_DIR, PORT
from database import db_dependency from routers import auth, organizations, licenses, users, dashboard, sources, token_usage, audit, translation, x_scraper
from auth import verify_password, create_token
from models import LoginRequest, TokenResponse
from routers import organizations, licenses, users, dashboard, sources, token_usage, audit
from audit import log_action, get_client_ip
import aiosqlite
logging.basicConfig( logging.basicConfig(
level=logging.INFO, level=logging.INFO,
@@ -21,11 +19,6 @@ logging.basicConfig(
) )
logger = logging.getLogger("verwaltung") logger = logging.getLogger("verwaltung")
# Brute-Force-Schutz
MAX_FAILED_ATTEMPTS = 5
BLOCK_WINDOW_MINUTES = 15
PURGE_AFTER_HOURS = 24
@asynccontextmanager @asynccontextmanager
async def lifespan(app: FastAPI): async def lifespan(app: FastAPI):
@@ -36,11 +29,12 @@ async def lifespan(app: FastAPI):
app = FastAPI( app = FastAPI(
title="AegisSight Verwaltungsportal", title="AegisSight Verwaltungsportal",
version="1.0.0", version="2.0.0",
lifespan=lifespan, lifespan=lifespan,
) )
# --- Routen --- # --- Routen ---
app.include_router(auth.router)
app.include_router(organizations.router) app.include_router(organizations.router)
app.include_router(licenses.router) app.include_router(licenses.router)
app.include_router(users.router) app.include_router(users.router)
@@ -48,86 +42,8 @@ app.include_router(dashboard.router)
app.include_router(sources.router) app.include_router(sources.router)
app.include_router(token_usage.router) app.include_router(token_usage.router)
app.include_router(audit.router) app.include_router(audit.router)
app.include_router(translation.router)
app.include_router(x_scraper.router)
# --- Login ---
@app.post("/api/auth/login", response_model=TokenResponse)
async def login(
data: LoginRequest,
request: Request,
db: aiosqlite.Connection = Depends(db_dependency),
):
ip = get_client_ip(request)
username = data.username.strip()
# Alte Login-Versuche purgen (LRU-Style, einmal pro Anfrage)
await db.execute(
f"DELETE FROM portal_login_attempts WHERE ts < datetime('now', '-{PURGE_AFTER_HOURS} hours')"
)
# Brute-Force-Check: Anzahl Fehlversuche fuer (ip, username) im Zeitfenster
cursor = await db.execute(
f"""SELECT COUNT(*) AS cnt FROM portal_login_attempts
WHERE ip = ? AND username = ? AND success = 0
AND ts > datetime('now', '-{BLOCK_WINDOW_MINUTES} minutes')""",
(ip, username),
)
failed_count = (await cursor.fetchone())["cnt"]
if failed_count >= MAX_FAILED_ATTEMPTS:
await log_action(
db, admin=None, ip=ip, action="login_blocked",
resource_type="auth",
after={"username": username, "failed_attempts": failed_count},
)
await db.commit()
raise HTTPException(
status_code=status.HTTP_429_TOO_MANY_REQUESTS,
detail=f"Zu viele Fehlversuche. Bitte {BLOCK_WINDOW_MINUTES} Minuten warten.",
headers={"Retry-After": str(BLOCK_WINDOW_MINUTES * 60)},
)
# Auth-Pruefung
cursor = await db.execute(
"SELECT id, username, password_hash FROM portal_admins WHERE username = ?",
(username,),
)
admin = await cursor.fetchone()
auth_ok = bool(admin and verify_password(data.password, admin["password_hash"]))
# Versuch in Tabelle eintragen (fuer Brute-Force-Tracking)
await db.execute(
"INSERT INTO portal_login_attempts (ip, username, success) VALUES (?, ?, ?)",
(ip, username, 1 if auth_ok else 0),
)
await db.commit()
if not auth_ok:
admin_dict = (
{"id": admin["id"], "username": admin["username"]} if admin else None
)
await log_action(
db, admin=admin_dict, ip=ip, action="login_failed",
resource_type="auth",
after={"username": username},
)
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Ungueltige Zugangsdaten",
)
# Erfolg
await log_action(
db,
admin={"id": admin["id"], "username": admin["username"]},
ip=ip,
action="login_success",
resource_type="auth",
)
token = create_token(admin["id"], admin["username"])
return TokenResponse(access_token=token, username=admin["username"])
# --- Statische Dateien --- # --- Statische Dateien ---
app.mount("/static", StaticFiles(directory=STATIC_DIR), name="static") app.mount("/static", StaticFiles(directory=STATIC_DIR), name="static")


@@ -1,27 +1,37 @@
"""Pydantic Models fuer das Verwaltungsportal.""" """Pydantic Models für das Verwaltungsportal."""
from pydantic import BaseModel, Field from pydantic import BaseModel, Field
from typing import Optional from typing import Optional
class LoginRequest(BaseModel): class MagicLinkRequest(BaseModel):
username: str email: str = Field(min_length=3, max_length=200)
password: str
class MagicLinkResponse(BaseModel):
message: str
class VerifyTokenRequest(BaseModel):
token: str = Field(min_length=10, max_length=200)
class TokenResponse(BaseModel): class TokenResponse(BaseModel):
access_token: str access_token: str
token_type: str = "bearer" token_type: str = "bearer"
username: str username: str
email: str = ""
class OrgCreate(BaseModel): class OrgCreate(BaseModel):
name: str = Field(min_length=1, max_length=200) name: str = Field(min_length=1, max_length=200)
slug: str = Field(min_length=1, max_length=100, pattern="^[a-z0-9-]+$") slug: str = Field(min_length=1, max_length=100, pattern="^[a-z0-9-]+$")
output_language: str = Field(default="de", pattern="^(de|en)$")
class OrgUpdate(BaseModel): class OrgUpdate(BaseModel):
name: Optional[str] = Field(default=None, max_length=200) name: Optional[str] = Field(default=None, max_length=200)
is_active: Optional[bool] = None is_active: Optional[bool] = None
output_language: Optional[str] = Field(default=None, pattern="^(de|en)$")
class OrgResponse(BaseModel): class OrgResponse(BaseModel):
@@ -35,6 +45,7 @@ class OrgResponse(BaseModel):
created_at: str created_at: str
globe_access: bool = False globe_access: bool = False
network_access: bool = False network_access: bool = False
output_language: str = "de"
class LicenseCreate(BaseModel): class LicenseCreate(BaseModel):


@@ -2,7 +2,7 @@
import json import json
import logging import logging
from typing import Optional from typing import Optional
from fastapi import APIRouter, Depends, HTTPException from fastapi import APIRouter, Depends
from auth import get_current_admin from auth import get_current_admin
from database import db_dependency from database import db_dependency
import aiosqlite import aiosqlite
@@ -24,6 +24,7 @@ def _parse_json(s: Optional[str]):
async def list_audit( async def list_audit(
action: Optional[str] = None, action: Optional[str] = None,
resource_type: Optional[str] = None, resource_type: Optional[str] = None,
resource_id: Optional[int] = None,
admin_id: Optional[int] = None, admin_id: Optional[int] = None,
from_ts: Optional[str] = None, from_ts: Optional[str] = None,
to_ts: Optional[str] = None, to_ts: Optional[str] = None,
@@ -46,6 +47,9 @@ async def list_audit(
if resource_type: if resource_type:
where.append("resource_type = ?") where.append("resource_type = ?")
params.append(resource_type) params.append(resource_type)
if resource_id is not None:
where.append("resource_id = ?")
params.append(resource_id)
if admin_id is not None: if admin_id is not None:
where.append("admin_id = ?") where.append("admin_id = ?")
params.append(admin_id) params.append(admin_id)
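The audit listing builds its WHERE clause from whichever filters are set. The sketch below isolates that pattern; `build_audit_filter` is a hypothetical helper, not a function from the router, and it covers only three of the filters. It also shows why the new `resource_id` filter is tested with `is not None` rather than for truthiness.

```python
from typing import Optional


def build_audit_filter(
    resource_type: Optional[str] = None,
    resource_id: Optional[int] = None,
    admin_id: Optional[int] = None,
) -> tuple[str, list]:
    """Return (where_sql, params) using placeholders only, never string interpolation."""
    where: list[str] = []
    params: list = []
    if resource_type:
        where.append("resource_type = ?")
        params.append(resource_type)
    if resource_id is not None:
        # 0 is a valid id, so a plain truthiness test would drop the filter.
        where.append("resource_id = ?")
        params.append(resource_id)
    if admin_id is not None:
        where.append("admin_id = ?")
        params.append(admin_id)
    return (" AND ".join(where) if where else "1=1"), params


no_filter = build_audit_filter()
zero_id = build_audit_filter(resource_id=0)
combined = build_audit_filter(resource_type="source", resource_id=7)
```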

src/routers/auth.py (regular file)

@@ -0,0 +1,191 @@
"""Magic-Link-Authentifizierung."""
import logging
from datetime import datetime, timedelta, timezone
from fastapi import APIRouter, Depends, HTTPException, Request
import aiosqlite
from auth import generate_magic_token, create_token
from config import (
ALLOWED_EMAIL,
PORTAL_MAGIC_LINK_BASE_URL,
PORTAL_MAGIC_LINK_EXPIRE_MINUTES,
)
from database import db_dependency
from email_utils.sender import send_email
from email_utils.templates import portal_magic_link_email
from models import MagicLinkRequest, MagicLinkResponse, TokenResponse, VerifyTokenRequest
from audit import log_action, get_client_ip
logger = logging.getLogger("verwaltung.auth")
router = APIRouter(prefix="/api/auth", tags=["auth"])
# Rate-Limit: max N Magic-Link-Anfragen pro Email/IP-Kombination im Zeitfenster
RATE_LIMIT_PER_WINDOW = 5
RATE_LIMIT_WINDOW_MINUTES = 15
ATTEMPTS_PURGE_AFTER_HOURS = 24
# Generische Antwort - keine Rückschlüsse auf gültige Emails (Anti-Enumeration)
GENERIC_RESPONSE = MagicLinkResponse(
message="Wenn die E-Mail-Adresse berechtigt ist, wurde ein Login-Link gesendet."
)
@router.post("/magic-link", response_model=MagicLinkResponse)
async def request_magic_link(
data: MagicLinkRequest,
request: Request,
db: aiosqlite.Connection = Depends(db_dependency),
):
"""Magic-Link anfordern. Sendet E-Mail mit zeitlich begrenztem Login-Link."""
email = data.email.lower().strip()
ip = get_client_ip(request)
# Alte Versuche purgen
await db.execute(
f"DELETE FROM portal_magic_link_attempts "
f"WHERE ts < datetime('now', '-{ATTEMPTS_PURGE_AFTER_HOURS} hours')"
)
# Rate-Limit prüfen
cur = await db.execute(
f"""SELECT COUNT(*) AS cnt FROM portal_magic_link_attempts
WHERE email = ? AND ip = ?
AND ts > datetime('now', '-{RATE_LIMIT_WINDOW_MINUTES} minutes')""",
(email, ip),
)
attempts = (await cur.fetchone())["cnt"]
# Versuch immer eintragen (auch wenn rate-limited oder Email nicht erlaubt)
await db.execute(
"INSERT INTO portal_magic_link_attempts (ip, email) VALUES (?, ?)",
(ip, email),
)
await db.commit()
if attempts >= RATE_LIMIT_PER_WINDOW:
logger.warning(f"Rate-Limit erreicht für {email} von {ip}: {attempts} Versuche")
return GENERIC_RESPONSE
# Whitelist-Check (still gegen Enumeration)
if email != ALLOWED_EMAIL.lower():
logger.info(f"Magic-Link-Anfrage für nicht erlaubte Email: {email} von {ip}")
return GENERIC_RESPONSE
# Token erzeugen
token = generate_magic_token()
expires_at = (
datetime.now(timezone.utc) + timedelta(minutes=PORTAL_MAGIC_LINK_EXPIRE_MINUTES)
).strftime("%Y-%m-%d %H:%M:%S")
# Vorige unbenutzte Tokens für diese Email entwerten (mehrfaches Anfordern)
await db.execute(
"UPDATE portal_magic_links SET used_at = CURRENT_TIMESTAMP "
"WHERE email = ? AND used_at IS NULL",
(email,),
)
await db.execute(
"""INSERT INTO portal_magic_links (email, token, expires_at, ip_address)
VALUES (?, ?, ?, ?)""",
(email, token, expires_at, ip),
)
await db.commit()
# E-Mail versenden
link = f"{PORTAL_MAGIC_LINK_BASE_URL}/?token={token}"
subject, html = portal_magic_link_email(link, PORTAL_MAGIC_LINK_EXPIRE_MINUTES)
sent = await send_email(email, subject, html)
if not sent:
logger.error(f"E-Mail-Versand fehlgeschlagen für {email}")
# We still return the generic response so that attackers cannot
# distinguish an SMTP failure from "email not allowed"
return GENERIC_RESPONSE
@router.post("/verify", response_model=TokenResponse)
async def verify_magic_link(
data: VerifyTokenRequest,
request: Request,
db: aiosqlite.Connection = Depends(db_dependency),
):
"""Magic-Link-Token verifizieren, JWT-Session zurückgeben."""
ip = get_client_ip(request)
cur = await db.execute(
"""SELECT id, email, expires_at, used_at
FROM portal_magic_links
WHERE token = ?""",
(data.token,),
)
ml = await cur.fetchone()
if not ml:
raise HTTPException(status_code=400, detail="Ungültiger Login-Link")
if ml["used_at"] is not None:
raise HTTPException(
status_code=400, detail="Login-Link bereits verwendet. Bitte neuen anfordern."
)
expires = datetime.fromisoformat(ml["expires_at"])
if expires.tzinfo is None:
expires = expires.replace(tzinfo=timezone.utc)
if datetime.now(timezone.utc) > expires:
raise HTTPException(
status_code=400, detail="Login-Link abgelaufen. Bitte neuen anfordern."
)
email = ml["email"]
if email.lower() != ALLOWED_EMAIL.lower():
# Defense in depth: should never happen, because the request endpoint already checks the whitelist
raise HTTPException(status_code=403, detail="Nicht berechtigt")
# Admin-Datensatz holen oder anlegen
cur = await db.execute(
"SELECT id, username, email FROM portal_admins WHERE LOWER(email) = ?",
(email.lower(),),
)
admin = await cur.fetchone()
if not admin:
# On the first successful login with this email, create an admin record
# if none exists yet (e.g. after the migration). Username = local part of the email.
username = email.split("@")[0]
cur = await db.execute(
"""INSERT INTO portal_admins (username, password_hash, email)
VALUES (?, '', ?)""",
(username, email),
)
admin_id = cur.lastrowid
admin_username = username
await db.commit()
logger.info(f"Neuer portal_admin angelegt für {email} (id={admin_id})")
else:
admin_id = admin["id"]
admin_username = admin["username"]
# Token als verwendet markieren
await db.execute(
"UPDATE portal_magic_links SET used_at = CURRENT_TIMESTAMP WHERE id = ?",
(ml["id"],),
)
await db.commit()
# Audit
await log_action(
db,
admin={"id": admin_id, "username": admin_username},
ip=ip,
action="login_success",
resource_type="auth",
after={"email": email, "method": "magic_link"},
)
await db.commit()
jwt_token = create_token(admin_id, email, admin_username)
return TokenResponse(
access_token=jwt_token,
username=admin_username,
email=email,
)
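The verify endpoint stores `expires_at` as a naive "YYYY-MM-DD HH:MM:SS" string and treats it as UTC when reading it back. A small sketch of that round trip, with the current time passed in so the check is deterministic; `is_expired` is an illustrative helper name, not a function from the router.

```python
from datetime import datetime, timedelta, timezone


def is_expired(expires_at: str, now: datetime) -> bool:
    """Parse the stored timestamp, assume UTC if it is naive, compare with now."""
    expires = datetime.fromisoformat(expires_at)
    if expires.tzinfo is None:
        expires = expires.replace(tzinfo=timezone.utc)
    return now > expires


issued = datetime(2026, 5, 22, 12, 0, tzinfo=timezone.utc)
# Same storage format as in request_magic_link: no timezone suffix.
stored = (issued + timedelta(minutes=10)).strftime("%Y-%m-%d %H:%M:%S")

still_valid = is_expired(stored, issued + timedelta(minutes=5))
too_late = is_expired(stored, issued + timedelta(minutes=11))
```

Attaching UTC explicitly matters: comparing a naive datetime with an aware one raises `TypeError`, so the `replace(tzinfo=...)` step is what makes the comparison possible.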


@@ -25,6 +25,15 @@ async def _enrich_org(db: aiosqlite.Connection, row: aiosqlite.Row) -> dict:
lic = await cursor.fetchone() lic = await cursor.fetchone()
org["license_status"] = lic["status"] if lic else "none" org["license_status"] = lic["status"] if lic else "none"
org["license_type"] = lic["license_type"] if lic else "" org["license_type"] = lic["license_type"] if lic else ""
# output_language aus organization_settings (Default 'de')
cursor = await db.execute(
"SELECT value FROM organization_settings WHERE organization_id = ? AND key = 'output_language'",
(org["id"],),
)
lang_row = await cursor.fetchone()
org["output_language"] = lang_row["value"] if lang_row else "de"
return org return org
@@ -57,6 +66,10 @@ async def create_organization(
org_id = cursor.lastrowid org_id = cursor.lastrowid
await db.commit() await db.commit()
# output_language als organization_settings-Eintrag persistieren
from shared.services.org_settings import set_org_setting
await set_org_setting(db, org_id, "output_language", data.output_language)
cursor = await db.execute("SELECT * FROM organizations WHERE id = ?", (org_id,)) cursor = await db.execute("SELECT * FROM organizations WHERE id = ?", (org_id,))
new_row_obj = await cursor.fetchone() new_row_obj = await cursor.fetchone()
await log_action( await log_action(
@@ -105,6 +118,11 @@ async def update_organization(
await db.execute(f"UPDATE organizations SET {set_clause} WHERE id = ?", values)
await db.commit()
# Set output_language separately via organization_settings
if data.output_language is not None:
from shared.services.org_settings import set_org_setting
await set_org_setting(db, org_id, "output_language", data.output_language)
after = await row_to_dict(db, "organizations", org_id)
await log_action(
db, admin, get_client_ip(request),

File diff suppressed because it is too large

src/routers/translation.py (regular file, 222 lines)

@@ -0,0 +1,222 @@
"""Manual article translation.
Triggers the Haiku translation of foreign-language articles that do not yet
have a German version. Since 2026-05-22 the translator no longer runs
automatically in the monitor (TRANSLATOR_ENABLED=false), because one very
large run blocked the refresh worker. This endpoint is the deliberate manual
replacement: it runs as a decoupled background task, blocks no request, and
can be cancelled at any time.
"""
import asyncio
import logging
from datetime import datetime, timezone
from fastapi import APIRouter, Depends, HTTPException, Request
from auth import get_current_admin
from audit import log_action, get_client_ip
from database import get_db
from translation_agent import translate_articles_batch
logger = logging.getLogger("verwaltung.translation")
router = APIRouter(prefix="/api/translation", tags=["Translation"])
# Batch size as in the translator agent (determined by the Haiku output limit).
_BATCH_SIZE = 5
# Rough estimates from production logs (Haiku, 5 articles per batch):
# about 17 s and about $0.03 per batch.
_SECONDS_PER_ARTICLE = 3.5
_COST_PER_ARTICLE = 0.006
# Articles without a German version: foreign-language (language set and != de)
# and headline_de OR content_de missing.
_PENDING_WHERE = (
"language IS NOT NULL AND LOWER(language) != 'de' "
"AND (headline_de IS NULL OR headline_de = '' "
"OR content_de IS NULL OR content_de = '')"
)
# Module-global job state. By design only ONE translation job runs at a
# time, which keeps Claude load and DB write load predictable.
_job: dict = {
"running": False,
"started_at": None,
"finished_at": None,
"total": 0,
"done": 0,
"translated": 0,
"failed_batches": 0,
"cancelled": False,
"error": None,
"started_by": None,
}
_job_lock = asyncio.Lock()
_cancel_event = asyncio.Event()
# Keep a reference to the running task so the garbage collector does not
# collect it prematurely.
_job_task: asyncio.Task | None = None
def _now_iso() -> str:
return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
async def _count_pending(db) -> int:
cursor = await db.execute(
f"SELECT COUNT(*) FROM articles WHERE {_PENDING_WHERE}"
)
row = await cursor.fetchone()
return row[0] if row else 0
async def _run_translation_job(started_by: str):
"""Background task: translates all pending articles batch by batch.
Writes back to the DB after each batch and updates the progress so the
frontend can follow along live. Stops between batches as soon as
_cancel_event is set.
"""
db = await get_db()
try:
# Generous lock timeout, because the monitor may write to the same
# shared DB in parallel (WAL allows only one writer).
await db.execute("PRAGMA busy_timeout=30000")
cursor = await db.execute(
f"SELECT id, headline, content_original, language "
f"FROM articles WHERE {_PENDING_WHERE} ORDER BY id DESC"
)
articles = [dict(r) for r in await cursor.fetchall()]
_job["total"] = len(articles)
logger.info(
"Übersetzungs-Job gestartet von %s: %d Artikel",
started_by, len(articles),
)
for i in range(0, len(articles), _BATCH_SIZE):
if _cancel_event.is_set():
_job["cancelled"] = True
logger.info(
"Übersetzungs-Job abgebrochen bei %d/%d",
_job["done"], _job["total"],
)
break
batch = articles[i : i + _BATCH_SIZE]
try:
translations, _usage = await translate_articles_batch(batch, "de")
except Exception as e: # pragma: no cover - defensive
_job["failed_batches"] += 1
logger.error("Übersetzungs-Batch fehlgeschlagen: %s", e)
_job["done"] = min(i + _BATCH_SIZE, len(articles))
continue
for t in translations:
hd = t.get("headline_de")
cd = t.get("content_de")
if hd or cd:
await db.execute(
"UPDATE articles SET "
"headline_de = COALESCE(?, headline_de), "
"content_de = COALESCE(?, content_de) WHERE id = ?",
(hd, cd, t["id"]),
)
_job["translated"] += 1
await db.commit()
_job["done"] = min(i + _BATCH_SIZE, len(articles))
logger.info(
"Übersetzungs-Job beendet: %d/%d übersetzt, %d Batch-Fehler, abgebrochen=%s",
_job["translated"], _job["total"], _job["failed_batches"],
_job["cancelled"],
)
except Exception as e:
_job["error"] = str(e)
logger.error(
"Übersetzungs-Job mit Fehler beendet: %s", e, exc_info=True
)
finally:
_job["running"] = False
_job["finished_at"] = _now_iso()
await db.close()
@router.get("/status")
async def translation_status(admin=Depends(get_current_admin)):
"""Current job status plus the number of articles not yet translated."""
db = await get_db()
try:
pending = await _count_pending(db)
finally:
await db.close()
snap = dict(_job)
snap["pending"] = pending
snap["estimate"] = {
"seconds": round(pending * _SECONDS_PER_ARTICLE),
"cost_usd": round(pending * _COST_PER_ARTICLE, 2),
}
return snap
@router.post("/run")
async def translation_run(request: Request, admin=Depends(get_current_admin)):
"""Starts the translation of all pending articles as a background task."""
global _job_task
async with _job_lock:
if _job["running"]:
raise HTTPException(
status_code=409, detail="Es läuft bereits eine Übersetzung."
)
db = await get_db()
try:
pending = await _count_pending(db)
if pending == 0:
return {"status": "nothing_to_do", "pending": 0}
await log_action(
db, admin, get_client_ip(request), "translation.run",
resource_type="articles", after={"pending": pending},
)
finally:
await db.close()
started_by = (
admin.get("email") or admin.get("username") or str(admin.get("id"))
)
# Reset the job state and start the task decoupled.
_cancel_event.clear()
_job.update({
"running": True,
"started_at": _now_iso(),
"finished_at": None,
"total": pending,
"done": 0,
"translated": 0,
"failed_batches": 0,
"cancelled": False,
"error": None,
"started_by": started_by,
})
_job_task = asyncio.create_task(_run_translation_job(started_by))
logger.info(
"Übersetzung manuell gestartet von %s (%d Artikel)", started_by, pending
)
return {"status": "started", "pending": pending}
@router.post("/cancel")
async def translation_cancel(request: Request, admin=Depends(get_current_admin)):
"""Cancels a running translation job after the current batch."""
if not _job["running"]:
raise HTTPException(
status_code=409, detail="Es läuft keine Übersetzung."
)
_cancel_event.set()
db = await get_db()
try:
await log_action(
db, admin, get_client_ip(request), "translation.cancel",
resource_type="articles",
)
finally:
await db.close()
return {"status": "cancelling"}
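The cancellation behaviour of the router above (a flag checked only between batches, so a batch in flight always finishes) can be illustrated in isolation. The following is a minimal sketch, not the project's code: `fake_translate`, `run_job`, and the `cancel_after` parameter are hypothetical names introduced here, and the translator call is replaced by a stub that makes no API request.

```python
import asyncio

BATCH_SIZE = 5  # mirrors _BATCH_SIZE in the router


async def fake_translate(batch):
    """Hypothetical stand-in for translate_articles_batch; makes no API call."""
    await asyncio.sleep(0)  # yield to the event loop, as a real call would
    return [{"id": a["id"], "headline_de": "DE: " + a["headline"]} for a in batch]


async def run_job(articles, cancel_event, progress, cancel_after=None):
    """Translate batch by batch; the cancel flag is only checked between batches."""
    for i in range(0, len(articles), BATCH_SIZE):
        if cancel_event.is_set():
            progress["cancelled"] = True
            break
        translations = await fake_translate(articles[i : i + BATCH_SIZE])
        progress["translated"] += len(translations)
        progress["done"] = min(i + BATCH_SIZE, len(articles))
        # Simulate a cancel request arriving while the job is running.
        if cancel_after is not None and progress["done"] >= cancel_after:
            cancel_event.set()
    return progress


async def _demo():
    articles = [{"id": n, "headline": f"h{n}"} for n in range(12)]
    progress = {"done": 0, "translated": 0, "cancelled": False}
    return await run_job(articles, asyncio.Event(), progress, cancel_after=5)


def demo():
    return asyncio.run(_demo())
```

With twelve articles and a cancel after the first batch, the second loop iteration sees the flag and stops, so the remaining seven articles stay pending for a later run.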


@@ -7,7 +7,7 @@ from models import UserCreate, UserResponse
from auth import get_current_admin
from database import db_dependency
from audit import log_action, get_client_ip, row_to_dict
-from config import MAGIC_LINK_BASE_URL, MAGIC_LINK_EXPIRE_MINUTES
+from config import MAGIC_LINK_BASE_URL
import aiosqlite
router = APIRouter(prefix="/api/users", tags=["users"])

src/routers/x_scraper.py (regular file, 224 lines)

@@ -0,0 +1,224 @@
"""X scraper accounts: management of the twscrape account pool.
These are the X login accounts the monitor uses to research (scrape) on X.
They live in the twscrape account store (config.X_ACCOUNTS_DB_PATH), not in
the administration database. twscrape is imported lazily so that the portal
also starts without twscrape installed.
"""
import logging
import os
from datetime import datetime, timezone
from typing import Optional
import aiosqlite
from fastapi import APIRouter, Depends, HTTPException, Request
from pydantic import BaseModel, Field
from auth import get_current_admin
from audit import log_action, get_client_ip
from config import X_ACCOUNTS_DB_PATH
from database import db_dependency
logger = logging.getLogger("verwaltung.x_scraper")
router = APIRouter(prefix="/api/x-scraper", tags=["x-scraper"])
def _get_pool():
"""Open the twscrape AccountsPool. Raises HTTPException if twscrape is unavailable."""
try:
os.makedirs(os.path.dirname(X_ACCOUNTS_DB_PATH), exist_ok=True)
except Exception:
pass
try:
from twscrape import API
except ImportError:
raise HTTPException(status_code=503, detail="twscrape ist nicht installiert")
return API(X_ACCOUNTS_DB_PATH).pool
def _summary(acc) -> dict:
"""Reduce an account object to a displayable dict, without secrets."""
now = datetime.now(timezone.utc)
locked = False
locked_until = None
for ts in (acc.locks or {}).values():
if ts and ts > now:
locked = True
if locked_until is None or ts > locked_until:
locked_until = ts
return {
"username": acc.username,
"email": acc.email if acc.email and acc.email != "_" else None,
"active": bool(acc.active),
"locked": locked,
"locked_until": locked_until.isoformat() if locked_until else None,
"has_cookies": bool(acc.cookies),
"total_requests": sum((acc.stats or {}).values()),
"last_used": acc.last_used.isoformat() if acc.last_used else None,
"error_msg": acc.error_msg or None,
}
class XScraperCreate(BaseModel):
username: str = Field(min_length=1, max_length=100)
password: str = Field(default="", max_length=200)
email: str = Field(default="", max_length=200)
email_password: str = Field(default="", max_length=200)
cookies: str = Field(min_length=1, max_length=4000)
class XScraperCookies(BaseModel):
cookies: str = Field(min_length=1, max_length=4000)
class XScraperActive(BaseModel):
active: bool
@router.get("/accounts")
async def list_accounts(admin: dict = Depends(get_current_admin)):
"""List all X scraper accounts (without passwords/cookies)."""
pool = _get_pool()
try:
accounts = await pool.get_all()
except Exception as e:
logger.error("X-Scraper get_all fehlgeschlagen: %s", e)
raise HTTPException(status_code=500, detail="Konten konnten nicht geladen werden")
return [_summary(a) for a in accounts]
@router.post("/accounts", status_code=201)
async def add_account(
data: XScraperCreate,
request: Request,
admin: dict = Depends(get_current_admin),
db: aiosqlite.Connection = Depends(db_dependency),
):
"""Create a new X scraper account."""
pool = _get_pool()
username = data.username.strip().lstrip("@")
if not username:
raise HTTPException(status_code=422, detail="Benutzername ist erforderlich")
if await pool.get_account(username) is not None:
raise HTTPException(status_code=409, detail=f"Konto '{username}' existiert bereits")
try:
await pool.add_account(
username=username,
password=data.password or "_",
email=data.email or "_",
email_password=data.email_password or "_",
cookies=data.cookies.strip(),
)
except Exception as e:
logger.error("X-Scraper add_account fehlgeschlagen: %s", e)
raise HTTPException(status_code=500, detail="Konto konnte nicht angelegt werden")
acc = await pool.get_account(username)
if acc is None:
raise HTTPException(status_code=500, detail="Konto wurde nicht gespeichert, bitte Cookies pruefen")
await log_action(
db, admin, get_client_ip(request), action="create",
resource_type="x_scraper_account", after={"username": username, "email": data.email},
)
return _summary(acc)
@router.post("/accounts/{username}/cookies")
async def refresh_cookies(
username: str,
data: XScraperCookies,
request: Request,
admin: dict = Depends(get_current_admin),
db: aiosqlite.Connection = Depends(db_dependency),
):
"""Renew the cookies of an existing account (refresh the login)."""
pool = _get_pool()
acc = await pool.get_account(username)
if acc is None:
raise HTTPException(status_code=404, detail="Konto nicht gefunden")
# twscrape has no update method, so the account is recreated with fresh cookies.
pw, em, emp = acc.password, acc.email, acc.email_password
try:
await pool.delete_accounts([username])
await pool.add_account(
username=username, password=pw, email=em,
email_password=emp, cookies=data.cookies.strip(),
)
except Exception as e:
logger.error("X-Scraper Cookie-Refresh fehlgeschlagen: %s", e)
raise HTTPException(status_code=500, detail="Cookies konnten nicht erneuert werden")
acc = await pool.get_account(username)
if acc is None:
raise HTTPException(status_code=500, detail="Konto nach Cookie-Refresh nicht gefunden")
await log_action(
db, admin, get_client_ip(request), action="update",
resource_type="x_scraper_account", after={"username": username, "change": "cookies"},
)
return _summary(acc)
@router.post("/accounts/{username}/active")
async def set_active(
username: str,
data: XScraperActive,
request: Request,
admin: dict = Depends(get_current_admin),
db: aiosqlite.Connection = Depends(db_dependency),
):
"""Switch an account to active or inactive."""
pool = _get_pool()
if await pool.get_account(username) is None:
raise HTTPException(status_code=404, detail="Konto nicht gefunden")
try:
await pool.set_active(username, data.active)
except Exception as e:
logger.error("X-Scraper set_active fehlgeschlagen: %s", e)
raise HTTPException(status_code=500, detail="Status konnte nicht geaendert werden")
await log_action(
db, admin, get_client_ip(request), action="update",
resource_type="x_scraper_account", after={"username": username, "active": data.active},
)
acc = await pool.get_account(username)
return _summary(acc)
@router.delete("/accounts/{username}", status_code=204)
async def delete_account(
username: str,
request: Request,
admin: dict = Depends(get_current_admin),
db: aiosqlite.Connection = Depends(db_dependency),
):
"""Remove an X scraper account."""
pool = _get_pool()
if await pool.get_account(username) is None:
raise HTTPException(status_code=404, detail="Konto nicht gefunden")
try:
await pool.delete_accounts([username])
except Exception as e:
logger.error("X-Scraper delete fehlgeschlagen: %s", e)
raise HTTPException(status_code=500, detail="Konto konnte nicht entfernt werden")
await log_action(
db, admin, get_client_ip(request), action="delete",
resource_type="x_scraper_account", before={"username": username},
)
@router.post("/reset-locks")
async def reset_locks(
request: Request,
admin: dict = Depends(get_current_admin),
db: aiosqlite.Connection = Depends(db_dependency),
):
"""Reset all temporary locks on the accounts."""
pool = _get_pool()
try:
await pool.reset_locks()
except Exception as e:
logger.error("X-Scraper reset_locks fehlgeschlagen: %s", e)
raise HTTPException(status_code=500, detail="Sperren konnten nicht zurueckgesetzt werden")
await log_action(
db, admin, get_client_ip(request), action="update",
resource_type="x_scraper_account", after={"change": "reset_locks"},
)
return {"status": "ok"}
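The lock handling in `_summary` above reduces a per-queue mapping of lock timestamps to two display values: whether any lock is still in the future, and the latest such timestamp. A small sketch of that reduction follows, assuming only what the router code reads from the account object (`locks` as a dict of timezone-aware datetimes). `FakeAccount` and `lock_state` are hypothetical names for illustration and are not part of twscrape.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class FakeAccount:
    """Hypothetical stand-in for an account object; only the fields used here."""
    username: str
    locks: dict = field(default_factory=dict)


def lock_state(acc, now):
    """Return (locked, locked_until) based on locks that are still in the future."""
    future = [ts for ts in (acc.locks or {}).values() if ts and ts > now]
    if not future:
        return False, None
    # The account counts as locked until its latest pending lock expires.
    return True, max(future)


now = datetime(2026, 5, 22, 12, 0, tzinfo=timezone.utc)
acc = FakeAccount(
    username="example",
    locks={
        "queue_a": now + timedelta(minutes=10),
        "queue_b": now - timedelta(minutes=5),  # already expired, ignored
        "queue_c": now + timedelta(hours=1),
    },
)
state = lock_state(acc, now)
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the reduction deterministic and easy to check.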

src/shared/__init__.py (regular file, 0 lines)


@@ -0,0 +1,209 @@
"""Shared Claude CLI client with usage tracking."""
import asyncio
import contextvars
import json
import logging
from dataclasses import dataclass
from config import CLAUDE_PATH, CLAUDE_TIMEOUT, CLAUDE_MODEL_FAST, CLAUDE_MODEL_STANDARD
# ContextVar for the cancel event: set by the orchestrator;
# call_claude checks it automatically, so it need not be passed through.
_cancel_event_var: contextvars.ContextVar[asyncio.Event | None] = contextvars.ContextVar("_cancel_event_var", default=None)
logger = logging.getLogger("osint.claude_client")
class ClaudeCliError(RuntimeError):
"""Structured error from the Claude CLI with a category.
error_type:
- "rate_limit": Anthropic rate limit or overload (transient, suitable for retry)
- "auth_error": account problem (organization has no Claude access,
token expired/invalid); a retry is pointless, admin action is required
- "timeout": Claude CLI timeout (transient)
- "cli_error": other CLI error (unspecific, default)
"""
def __init__(self, error_type: str, message: str):
self.error_type = error_type
self.message = message
super().__init__(f"Claude CLI [{error_type}]: {message}")
def _classify_cli_error(combined_output: str) -> str:
"""Maps an error output to an error_type category."""
txt = combined_output.lower()
rate_limit_keywords = ["hit your limit", "rate limit", "resets", "rate_limit", "overloaded"]
auth_error_keywords = ["does not have access", "login again", "contact your administrator"]
if any(kw in txt for kw in rate_limit_keywords):
return "rate_limit"
if any(kw in txt for kw in auth_error_keywords):
return "auth_error"
return "cli_error"
@dataclass
class ClaudeUsage:
"""Token usage of a single Claude CLI call."""
input_tokens: int = 0
output_tokens: int = 0
cache_creation_tokens: int = 0
cache_read_tokens: int = 0
cost_usd: float = 0.0
duration_ms: int = 0
@dataclass
class UsageAccumulator:
"""Accumulates usage across multiple Claude calls of one refresh."""
input_tokens: int = 0
output_tokens: int = 0
cache_creation_tokens: int = 0
cache_read_tokens: int = 0
total_cost_usd: float = 0.0
call_count: int = 0
def add(self, usage: ClaudeUsage):
self.input_tokens += usage.input_tokens
self.output_tokens += usage.output_tokens
self.cache_creation_tokens += usage.cache_creation_tokens
self.cache_read_tokens += usage.cache_read_tokens
self.total_cost_usd += usage.cost_usd
self.call_count += 1
def _sanitize_mdash(text: str) -> str:
"""Replaces em and en dashes with ' - ' (reduces a typical AI-writing tell)."""
return text.replace("\u2014", " - ").replace("\u2013", " - ")
async def call_claude(prompt: str, tools: str | None = "WebSearch,WebFetch", model: str | None = None, raw_text: bool = False, timeout: float | None = None) -> tuple[str, ClaudeUsage]:
"""Calls the Claude CLI. Returns (result_text, usage).
The prompt is passed via stdin to avoid OS ARG_MAX limits.
Args:
prompt: The prompt for Claude
tools: Comma-separated allowed tools (None = no tools, --max-turns 1)
model: Optional model (e.g. CLAUDE_MODEL_FAST for Haiku). None = CLAUDE_MODEL_STANDARD (Opus 4.7).
raw_text: If True, the JSON-only system prompt is not appended.
timeout: Override in seconds. None = fall back to the global CLAUDE_TIMEOUT (1800s).
"""
effective_model = model or CLAUDE_MODEL_STANDARD
effective_timeout = timeout if timeout is not None else CLAUDE_TIMEOUT
cmd = [CLAUDE_PATH, "-p", "-", "--output-format", "json", "--model", effective_model]
if tools:
cmd.extend(["--allowedTools", tools])
else:
cmd.extend(["--max-turns", "1", "--allowedTools", ""])
if not raw_text:
cmd.extend(["--append-system-prompt",
"CRITICAL: You are a JSON-only output agent. "
"Output EXCLUSIVELY a single valid JSON object. "
"No explanatory text, no markdown fences, no continuation of previous responses. "
"Start your response with { and end with }."])
process = await asyncio.create_subprocess_exec(
*cmd, stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE,
stdin=asyncio.subprocess.PIPE,
env={
"PATH": "/usr/local/bin:/usr/bin:/bin",
"HOME": "/home/claude-dev",
"LANG": "C.UTF-8",
"LC_ALL": "C.UTF-8",
},
)
try:
cancel_event = _cancel_event_var.get(None)
if cancel_event:
# Cancel-aware: Monitor cancel_event while process runs
communicate_task = asyncio.create_task(
process.communicate(input=prompt.encode("utf-8"))
)
cancel_wait_task = asyncio.create_task(cancel_event.wait())
timeout_task = asyncio.create_task(asyncio.sleep(effective_timeout))
done, pending = await asyncio.wait(
[communicate_task, cancel_wait_task, timeout_task],
return_when=asyncio.FIRST_COMPLETED,
)
for p in pending:
p.cancel()
if communicate_task in done:
stdout, stderr = communicate_task.result()
elif cancel_wait_task in done:
process.kill()
await process.wait()
raise asyncio.CancelledError("Cancel angefordert")
else:
process.kill()
await process.wait()
raise TimeoutError(f"Claude CLI Timeout nach {effective_timeout}s")
else:
stdout, stderr = await asyncio.wait_for(
process.communicate(input=prompt.encode("utf-8")), timeout=effective_timeout
)
except asyncio.TimeoutError:
process.kill()
raise TimeoutError(f"Claude CLI Timeout nach {effective_timeout}s")
if process.returncode != 0:
error_msg = stderr.decode("utf-8", errors="replace").strip()
stdout_msg = stdout.decode("utf-8", errors="replace").strip()
# Rate-limit/auth errors sometimes arrive as JSON on stdout rather than on stderr
combined_output = f"{error_msg} {stdout_msg}"
error_type = _classify_cli_error(combined_output)
if error_type == "rate_limit":
logger.warning(f"Claude CLI Rate-Limit (Exit {process.returncode}): {stdout_msg or error_msg}")
elif error_type == "auth_error":
logger.error(f"Claude CLI Auth-Fehler (Exit {process.returncode}): {stdout_msg or error_msg}")
else:
logger.error(f"Claude CLI Fehler (Exit {process.returncode}): {error_msg}")
if stdout_msg:
logger.error(f"Claude CLI stdout bei Fehler: {stdout_msg[:500]}")
raise ClaudeCliError(error_type, stdout_msg or error_msg)
raw = stdout.decode("utf-8", errors="replace").strip()
usage = ClaudeUsage()
result_text = raw
try:
data = json.loads(raw)
# The CLI can return returncode=0 and still set is_error=true
# (e.g. "Your organization does not have access to Claude")
if data.get("is_error"):
error_text = str(data.get("result", ""))
error_type = _classify_cli_error(error_text)
if error_type == "rate_limit":
logger.warning(f"Claude CLI Rate-Limit (is_error): {error_text}")
elif error_type == "auth_error":
logger.error(f"Claude CLI Auth-Fehler (is_error): {error_text}")
else:
logger.error(f"Claude CLI Fehler (is_error): {error_text}")
raise ClaudeCliError(error_type, error_text)
result_text = data.get("result", raw)
u = data.get("usage", {})
usage = ClaudeUsage(
input_tokens=u.get("input_tokens", 0),
output_tokens=u.get("output_tokens", 0),
cache_creation_tokens=u.get("cache_creation_input_tokens", 0),
cache_read_tokens=u.get("cache_read_input_tokens", 0),
cost_usd=data.get("total_cost_usd", 0.0),
duration_ms=data.get("duration_ms", 0),
)
model_info = f" [{model}]" if model else ""
logger.info(
f"Claude{model_info}: {usage.input_tokens} in / {usage.output_tokens} out / "
f"cache {usage.cache_creation_tokens}+{usage.cache_read_tokens} / "
f"${usage.cost_usd:.4f} / {usage.duration_ms}ms"
)
except json.JSONDecodeError:
logger.warning("Claude CLI Antwort kein gültiges JSON, nutze raw output")
result_text = _sanitize_mdash(result_text)
return result_text, usage
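The client above classifies failures in two places: on a non-zero exit code, and on a zero exit code whose JSON payload carries `is_error`. The second path is easy to miss, so here is a minimal, self-contained sketch of it. The keyword lists are copied from the code above; `classify` and `parse_cli_json` are hypothetical helper names for this illustration, and the JSON shape (`is_error`, `result`) is assumed from what the client reads, not from separate CLI documentation.

```python
import json
from typing import Optional, Tuple

RATE_LIMIT_KEYWORDS = ["hit your limit", "rate limit", "resets", "rate_limit", "overloaded"]
AUTH_ERROR_KEYWORDS = ["does not have access", "login again", "contact your administrator"]


def classify(output: str) -> str:
    """Map an error text to a category; rate limits are checked before auth errors."""
    txt = output.lower()
    if any(kw in txt for kw in RATE_LIMIT_KEYWORDS):
        return "rate_limit"
    if any(kw in txt for kw in AUTH_ERROR_KEYWORDS):
        return "auth_error"
    return "cli_error"


def parse_cli_json(raw: str) -> Tuple[str, Optional[str]]:
    """Return (text, error_type); error_type is None when the call succeeded."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # Not JSON: fall back to the raw output, as the client does.
        return raw, None
    if data.get("is_error"):
        text = str(data.get("result", ""))
        return text, classify(text)
    return data.get("result", raw), None
```

Returning the category instead of raising keeps the sketch short; the client above raises `ClaudeCliError` at the same point so that callers can decide whether a retry makes sense.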


@@ -0,0 +1,282 @@
"""External reputation data for sources.
Synchronizes domain lists from public reputation/fact-check databases
and writes the matches into the sources columns:
- IFCN signatories (recognized fact-checkers) -> ifcn_signatory
- EUvsDisinfo (pro-Kremlin disinformation, Zenodo CSV) -> eu_disinfo_listed,
eu_disinfo_case_count, eu_disinfo_last_seen
Afterwards apply_reputation_overrides() applies override rules to the
reliability column:
- ifcn_signatory=1 -> reliability='sehr_hoch'
- eu_disinfo_case_count >= 5 -> reliability='sehr_niedrig'
- eu_disinfo_case_count >= 1 -> reliability one step down, and never better than 'niedrig'
"""
import csv
import io
import logging
from collections import defaultdict
from urllib.parse import urlparse
import aiosqlite
import httpx
logger = logging.getLogger("osint.external_reputation")
IFCN_LIST_URL = "https://raw.githubusercontent.com/IFCN/verified-signatories/main/list"
EU_DISINFO_CSV_URL = "https://zenodo.org/records/10514307/files/euvsdisinfo_base.csv?download=1"
HTTP_TIMEOUT = httpx.Timeout(60.0, connect=10.0)
# Generic platform domains that must NOT be flagged as a source
# (EUvsDisinfo aggregates anonymous Telegram/Twitter posts under platform domains).
PLATFORM_DOMAINS = {
"t.me", "telegram.me", "telegram.org",
"twitter.com", "x.com", "mobile.twitter.com",
"youtube.com", "youtu.be", "m.youtube.com",
"facebook.com", "fb.com", "m.facebook.com",
"instagram.com", "tiktok.com", "vk.com", "ok.ru",
"rumble.com", "bitchute.com", "odysee.com",
"reddit.com", "old.reddit.com",
"wordpress.com", "blogspot.com", "medium.com",
"substack.com", "wixsite.com",
}
# Reliability scale in rank order (bad -> good)
RELIABILITY_ORDER = ["sehr_niedrig", "niedrig", "gemischt", "hoch", "sehr_hoch"]
def _normalize_domain(raw: str | None) -> str | None:
"""Normalizes a domain: lowercase, without www., without scheme/path."""
if not raw:
return None
raw = raw.strip().lower()
if not raw:
return None
# In case a full URL was passed
if "://" in raw:
try:
raw = urlparse(raw).netloc or raw
except ValueError:
pass
# Strip path, query, and fragment
raw = raw.split("/")[0].split("?")[0].split("#")[0]
if raw.startswith("www."):
raw = raw[4:]
return raw or None
async def _fetch_text(url: str) -> str:
"""Loads text from a URL. Raises httpx.HTTPStatusError on an error response (via raise_for_status())."""
async with httpx.AsyncClient(timeout=HTTP_TIMEOUT, follow_redirects=True) as client:
resp = await client.get(url)
resp.raise_for_status()
return resp.text
async def sync_ifcn_signatories(db: aiosqlite.Connection) -> dict:
"""Loads the IFCN domain list and matches it against sources.domain.
Sets ifcn_signatory=1 where the domain appears in the list, otherwise 0.
"""
text = await _fetch_text(IFCN_LIST_URL)
domains: set[str] = set()
for line in text.splitlines():
d = _normalize_domain(line)
if d:
domains.add(d)
logger.info("IFCN-Liste geladen: %d Domains", len(domains))
# Load the current sources that have a domain
cursor = await db.execute(
"SELECT id, domain FROM sources WHERE domain IS NOT NULL AND domain != ''"
)
sources = [dict(r) for r in await cursor.fetchall()]
matched_ids: list[int] = []
unmatched_ids: list[int] = []
for s in sources:
nd = _normalize_domain(s["domain"])
if nd and nd not in PLATFORM_DOMAINS and nd in domains:
matched_ids.append(s["id"])
else:
unmatched_ids.append(s["id"])
# Bulk update in two statements
if matched_ids:
placeholders = ",".join("?" for _ in matched_ids)
await db.execute(
f"UPDATE sources SET ifcn_signatory = 1 WHERE id IN ({placeholders})",
matched_ids,
)
if unmatched_ids:
placeholders = ",".join("?" for _ in unmatched_ids)
await db.execute(
f"UPDATE sources SET ifcn_signatory = 0 WHERE id IN ({placeholders})",
unmatched_ids,
)
await db.commit()
logger.info("IFCN-Sync: %d Quellen als Faktenchecker markiert (von %d)",
len(matched_ids), len(sources))
return {
"list_size": len(domains),
"sources_checked": len(sources),
"matched": len(matched_ids),
}
async def sync_eu_disinfo(db: aiosqlite.Connection) -> dict:
"""Loads the EUvsDisinfo CSV from Zenodo, aggregates per domain, writes to sources.
- eu_disinfo_listed: 1 if the domain was debunked as 'disinformation' at least once
- eu_disinfo_case_count: number of disinformation cases
- eu_disinfo_last_seen: latest debunk_date
"""
text = await _fetch_text(EU_DISINFO_CSV_URL)
reader = csv.DictReader(io.StringIO(text))
# Aggregate per domain (only class='disinformation')
counts: dict[str, int] = defaultdict(int)
last_seen: dict[str, str] = {}
total_rows = 0
for row in reader:
total_rows += 1
if (row.get("class") or "").strip().lower() != "disinformation":
continue
d = _normalize_domain(row.get("article_domain"))
if not d:
continue
counts[d] += 1
debunk_date = (row.get("debunk_date") or "").strip()
if debunk_date:
prev = last_seen.get(d)
if not prev or debunk_date > prev:
last_seen[d] = debunk_date
logger.info("EUvsDisinfo-CSV: %d Zeilen, %d Domains mit Desinformation",
total_rows, len(counts))
# Load the sources and match them
cursor = await db.execute(
"SELECT id, domain FROM sources WHERE domain IS NOT NULL AND domain != ''"
)
sources = [dict(r) for r in await cursor.fetchall()]
matched = 0
for s in sources:
nd = _normalize_domain(s["domain"])
if nd and nd not in PLATFORM_DOMAINS and nd in counts:
await db.execute(
"""UPDATE sources SET
eu_disinfo_listed = 1,
eu_disinfo_case_count = ?,
eu_disinfo_last_seen = ?
WHERE id = ?""",
(counts[nd], last_seen.get(nd), s["id"]),
)
matched += 1
else:
await db.execute(
"""UPDATE sources SET
eu_disinfo_listed = 0,
eu_disinfo_case_count = 0,
eu_disinfo_last_seen = NULL
WHERE id = ?""",
(s["id"],),
)
await db.commit()
logger.info("EUvsDisinfo-Sync: %d Quellen als Desinformations-Quelle markiert (von %d)",
matched, len(sources))
return {
"rows_in_csv": total_rows,
"domains_with_disinfo_in_csv": len(counts),
"sources_checked": len(sources),
"matched": matched,
}
def _override_reliability(current: str | None, ifcn: bool, eu_count: int) -> str | None:
"""Applies the override rules to a reliability level.
Returns the new level (or None if unchanged).
"""
cur = current or "na"
# IFCN wins: certified fact-checker -> sehr_hoch (always)
if ifcn:
return "sehr_hoch" if cur != "sehr_hoch" else None
# EUvsDisinfo: downgrade
if eu_count >= 5:
return "sehr_niedrig" if cur != "sehr_niedrig" else None
if eu_count >= 1:
# One step down, and never better than 'niedrig' afterwards
if cur == "na":
return "niedrig"
if cur in RELIABILITY_ORDER:
idx = RELIABILITY_ORDER.index(cur)
new_idx = max(0, idx - 1)
new = RELIABILITY_ORDER[new_idx]
# Cap the result at 'niedrig' when eu_count >= 1
if RELIABILITY_ORDER.index(new) > RELIABILITY_ORDER.index("niedrig"):
new = "niedrig"
return new if new != cur else None
return None
async def apply_reputation_overrides(db: aiosqlite.Connection, source_id: int | None = None) -> dict:
"""Applies the reliability override rules.
If source_id is given, only for that source; otherwise for all sources.
"""
if source_id is not None:
cursor = await db.execute(
"SELECT id, reliability, ifcn_signatory, eu_disinfo_case_count "
"FROM sources WHERE id = ?",
(source_id,),
)
else:
cursor = await db.execute(
"SELECT id, reliability, ifcn_signatory, eu_disinfo_case_count FROM sources"
)
sources = [dict(r) for r in await cursor.fetchall()]
changed = 0
for s in sources:
new = _override_reliability(
s.get("reliability"),
bool(s.get("ifcn_signatory")),
int(s.get("eu_disinfo_case_count") or 0),
)
if new is not None:
await db.execute(
"UPDATE sources SET reliability = ? WHERE id = ?",
(new, s["id"]),
)
changed += 1
await db.commit()
logger.info("Reliability-Override: %d Quellen angepasst (von %d gepruefte)",
changed, len(sources))
return {"checked": len(sources), "changed": changed}
async def sync_all(db: aiosqlite.Connection) -> dict:
"""Full sync: IFCN + EUvsDisinfo + reliability override.
Sets external_data_synced_at for all sources that have a domain.
"""
ifcn_result = await sync_ifcn_signatories(db)
eu_result = await sync_eu_disinfo(db)
override_result = await apply_reputation_overrides(db)
await db.execute(
"UPDATE sources SET external_data_synced_at = CURRENT_TIMESTAMP "
"WHERE domain IS NOT NULL AND domain != ''"
)
await db.commit()
return {
"ifcn": ifcn_result,
"eu_disinfo": eu_result,
"override": override_result,
}
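The override rules in this module interact in a way that is easiest to see on concrete inputs: with one to four EUvsDisinfo cases, a rating is lowered by one step and then capped at 'niedrig', so 'hoch' and 'sehr_hoch' both end at 'niedrig'. The sketch below re-states the rule as a standalone function for experimentation; `override` is a hypothetical name, and the clamp via `min`/`max` is an equivalent rewrite of the two-step logic above rather than the module's own code.

```python
from typing import Optional

RELIABILITY_ORDER = ["sehr_niedrig", "niedrig", "gemischt", "hoch", "sehr_hoch"]
CAP_INDEX = RELIABILITY_ORDER.index("niedrig")


def override(current: Optional[str], ifcn: bool, eu_count: int) -> Optional[str]:
    """Return the new reliability level, or None when nothing changes."""
    cur = current or "na"
    if ifcn:
        # Certified fact-checkers always win.
        return "sehr_hoch" if cur != "sehr_hoch" else None
    if eu_count >= 5:
        return "sehr_niedrig" if cur != "sehr_niedrig" else None
    if eu_count >= 1:
        if cur == "na":
            return "niedrig"
        if cur in RELIABILITY_ORDER:
            idx = RELIABILITY_ORDER.index(cur)
            # One step down, but never above 'niedrig' and never below index 0.
            new = RELIABILITY_ORDER[min(max(0, idx - 1), CAP_INDEX)]
            return new if new != cur else None
    return None
```

A value outside the scale (anything other than the five levels or an empty rating) is left untouched by the one-to-four-case rule, which matches the fall-through in the original function.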


@@ -0,0 +1,104 @@
"""Organization settings helper.
Key-value store per organization. Currently used for output_language ('de'|'en').
Can be extended later (default model, Telegram toggle, theme, ...).
Cache: in-memory with a 60 s TTL per (tenant_id, key). Invalidated by
set_org_setting().
"""
import logging
import time
from typing import Optional
import aiosqlite
logger = logging.getLogger("osint.org_settings")
_CACHE: dict[tuple[int, str], tuple[float, Optional[str]]] = {}
_TTL_SECONDS = 60.0
def _cache_get(tenant_id: int, key: str) -> tuple[bool, Optional[str]]:
"""(hit, value). hit=True means the cache had an entry; value may still be None."""
entry = _CACHE.get((tenant_id, key))
if entry is None:
return (False, None)
expires_at, value = entry
if time.monotonic() > expires_at:
_CACHE.pop((tenant_id, key), None)
return (False, None)
return (True, value)
def _cache_put(tenant_id: int, key: str, value: Optional[str]) -> None:
_CACHE[(tenant_id, key)] = (time.monotonic() + _TTL_SECONDS, value)
def _cache_invalidate(tenant_id: int, key: str) -> None:
_CACHE.pop((tenant_id, key), None)
async def get_org_setting(
db: aiosqlite.Connection,
tenant_id: int,
key: str,
default: Optional[str] = None,
) -> Optional[str]:
"""Reads an org setting. Falls back to default."""
if tenant_id is None:
return default
hit, cached = _cache_get(tenant_id, key)
if hit:
return cached if cached is not None else default
cursor = await db.execute(
"SELECT value FROM organization_settings WHERE organization_id = ? AND key = ?",
(tenant_id, key),
)
row = await cursor.fetchone()
value = row["value"] if row else None
_cache_put(tenant_id, key, value)
return value if value is not None else default
async def set_org_setting(
db: aiosqlite.Connection,
tenant_id: int,
key: str,
value: str,
) -> None:
"""Set an org setting (upsert)."""
await db.execute(
"""INSERT INTO organization_settings (organization_id, key, value, updated_at)
VALUES (?, ?, ?, CURRENT_TIMESTAMP)
ON CONFLICT(organization_id, key) DO UPDATE SET
value = excluded.value,
updated_at = CURRENT_TIMESTAMP""",
(tenant_id, key, value),
)
await db.commit()
_cache_invalidate(tenant_id, key)
logger.info("Org %s Setting %s='%s' gespeichert", tenant_id, key, value)
# Known languages and their display names for prompts
LANGUAGE_DISPLAY_NAMES = {
"de": "Deutsch",
"en": "English",
}
async def get_org_language(
db: aiosqlite.Connection,
tenant_id: int,
) -> str:
"""Return the org's two-letter ISO 639-1 language code (default 'de')."""
value = await get_org_setting(db, tenant_id, "output_language", default="de")
if value not in LANGUAGE_DISPLAY_NAMES:
logger.warning("Unbekannte output_language '%s' fuer Org %s -- fallback 'de'", value, tenant_id)
return "de"
return value
def language_display(lang_iso: str) -> str:
"""ISO code -> display name for prompts ('de' -> 'Deutsch')."""
return LANGUAGE_DISPLAY_NAMES.get(lang_iso, lang_iso)
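
The cache in this module stores `None` as a legitimate value ("this org has no such setting"), which is why `_cache_get` returns a `(hit, value)` pair instead of just the value. The following is a minimal, standalone sketch of that pattern, not the module's actual code: the `TTLCache` class and its injectable `clock` parameter are assumptions made here so that expiry can be exercised without sleeping.

```python
import time
from typing import Callable, Optional

CacheKey = tuple[int, str]  # (tenant_id, key)


class TTLCache:
    """Hypothetical in-memory TTL cache that can also remember "no value" (None)."""

    def __init__(
        self,
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests do not have to sleep
        self._entries: dict[CacheKey, tuple[float, Optional[str]]] = {}

    def get(self, tenant_id: int, key: str) -> tuple[bool, Optional[str]]:
        """Return (hit, value); a hit with value None means "known to be unset"."""
        entry = self._entries.get((tenant_id, key))
        if entry is None:
            return (False, None)
        expires_at, value = entry
        if self._clock() > expires_at:
            # Expired entries are dropped lazily on read.
            self._entries.pop((tenant_id, key), None)
            return (False, None)
        return (True, value)

    def put(self, tenant_id: int, key: str, value: Optional[str]) -> None:
        self._entries[(tenant_id, key)] = (self._clock() + self._ttl, value)

    def invalidate(self, tenant_id: int, key: str) -> None:
        self._entries.pop((tenant_id, key), None)


if __name__ == "__main__":
    fake_now = [0.0]
    cache = TTLCache(ttl_seconds=60.0, clock=lambda: fake_now[0])
    cache.put(1, "output_language", None)
    print(cache.get(1, "output_language"))  # hit, although the value is None
    fake_now[0] = 61.0
    print(cache.get(1, "output_language"))  # miss after the TTL has passed
```

Caching the negative result avoids a database round trip per request for organizations that never set the key. The price is that a value written by another process can stay invisible for up to one TTL.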

@@ -0,0 +1,295 @@
"""Classifies sources via Claude (Haiku) along 4 axes plus state_affiliated and country.
Writes proposals into the proposed_* columns of sources and sets
classification_source='llm_pending'. Approval happens through separate endpoints
that copy proposed_* into the real columns.
"""
import asyncio
import json
import logging
import re
import aiosqlite
from shared.agents.claude_client import call_claude
from config import CLAUDE_MODEL_FAST
logger = logging.getLogger("osint.source_classifier")
POLITICAL_VALUES = {
"links_extrem", "links", "mitte_links", "liberal", "mitte",
"konservativ", "mitte_rechts", "rechts", "rechts_extrem", "na",
}
MEDIA_TYPE_VALUES = {
"tageszeitung", "wochenzeitung", "magazin", "tv_sender", "radio",
"oeffentlich_rechtlich", "nachrichtenagentur", "online_only", "blog",
"telegram_kanal", "telegram_bot", "podcast", "social_media", "imageboard",
"think_tank", "ngo", "behoerde", "staatsmedium", "fachmedium", "sonstige",
}
RELIABILITY_VALUES = {"sehr_hoch", "hoch", "gemischt", "niedrig", "sehr_niedrig", "na"}
ALIGNMENT_VALUES = {
"prorussisch", "proiranisch", "prowestlich", "proukrainisch",
"prochinesisch", "projapanisch", "proisraelisch", "propalaestinensisch",
"protuerkisch", "panarabisch", "neutral", "sonstige",
}
def _build_prompt(src: dict, sample_articles: list[dict]) -> str:
sample_text = ""
if sample_articles:
lines = []
for i, art in enumerate(sample_articles[:5], 1):
headline = (art.get("headline") or art.get("headline_de") or "").strip()
if headline:
lines.append(f"{i}. {headline[:200]}")
if lines:
sample_text = "\nLetzte Artikel/Headlines:\n" + "\n".join(lines)
return f"""Du bist ein OSINT-Analyst und klassifizierst Nachrichten- und Medienquellen fuer ein Lagebild-Monitoring-System (DACH-Raum).
QUELLE:
Name: {src.get('name')}
URL: {src.get('url') or '-'}
Domain: {src.get('domain') or '-'}
Quellentyp: {src.get('source_type')}
Bisherige Kategorie: {src.get('category')}
Sprache: {src.get('language') or 'unbekannt'}
Bisherige Notiz (Freitext): {src.get('bias') or '-'}{sample_text}
AUFGABE: Klassifiziere die Quelle nach folgenden Achsen.
1. political_orientation:
- links_extrem (z.B. linksunten.indymedia)
- links (klar links, z.B. junge Welt, taz)
- mitte_links (linksliberal/sozialdemokratisch, z.B. SZ, Spiegel)
- liberal (wirtschafts-/grünliberal, z.B. NZZ, Zeit)
- mitte (politisch neutral, Agentur, z.B. dpa, Reuters, tagesschau)
- konservativ (buergerlich-konservativ, z.B. FAZ, Welt)
- mitte_rechts (rechts-buergerlich, z.B. Tichys Einblick, Achgut)
- rechts (klar rechts, z.B. Junge Freiheit, EpochTimes)
- rechts_extrem (z.B. Compact, PI-News)
- na (nicht klassifizierbar: Behoerde, Fachmedium, Think Tank ohne klare politische Linie)
2. media_type (genau einer):
tageszeitung, wochenzeitung, magazin, tv_sender, radio, oeffentlich_rechtlich,
nachrichtenagentur, online_only, blog, telegram_kanal, telegram_bot, podcast,
social_media, imageboard, think_tank, ngo, behoerde, staatsmedium, fachmedium, sonstige
3. reliability:
- sehr_hoch (etablierte Qualitaet, Faktencheck: tagesschau, dpa, FAZ, Reuters)
- hoch (serioes mit gelegentlichen Schwaechen: taz, Welt, BILD bei harten News)
- gemischt (Mix Meinung/Einseitigkeit: Tichys Einblick, Achgut, Boulevard)
- niedrig (haeufig irrefuehrend, schwache Quellenarbeit: Junge Freiheit, EpochTimes)
- sehr_niedrig (bekannt fuer Desinformation/Verschwoerung: Compact, RT, Sputnik, PI-News)
- na (nicht bewertbar)
4. alignments (Mehrfach, leeres Array wenn keine ausgepraegte Naehe):
prorussisch, proiranisch, prowestlich, proukrainisch, prochinesisch, projapanisch,
proisraelisch, propalaestinensisch, protuerkisch, panarabisch, neutral, sonstige
5. state_affiliated (true/false): true wenn vom Staat finanziert/kontrolliert
(RT, Sputnik, CGTN, PressTV, Xinhua, TRT). Public Service Broadcaster
wie ARD/ZDF/BBC sind NICHT state_affiliated.
6. country_code (ISO 3166-1 alpha-2): Heimatland (DE, AT, CH, RU, US, ...). null wenn unklar.
7. confidence (0.0-1.0): 0.85+ fuer bekannte Outlets, 0.5-0.85 fuer mittelbekannt, <0.5 fuer unsicher.
8. reasoning (1-2 Saetze): Kurze Begruendung der Hauptklassifikationen.
WICHTIG:
- Antworte AUSSCHLIESSLICH mit einem JSON-Objekt, kein Text drumherum.
- Nutze ausschliesslich die genannten enum-Werte (snake_case).
- Bei Unklarheit lieber `na` und niedrige confidence.
JSON-Schema:
{{
"political_orientation": "...",
"media_type": "...",
"reliability": "...",
"alignments": ["..."],
"state_affiliated": false,
"country_code": "DE",
"confidence": 0.9,
"reasoning": "..."
}}"""
async def _load_sample_articles(db: aiosqlite.Connection, name: str, domain: str | None, limit: int = 5) -> list[dict]:
"""Load the latest headlines of a source (matched by name, else by domain)."""
rows: list = []
if name:
cursor = await db.execute(
"SELECT headline, headline_de FROM articles WHERE source = ? ORDER BY collected_at DESC LIMIT ?",
(name, limit),
)
rows = await cursor.fetchall()
if not rows and domain:
cursor = await db.execute(
"SELECT headline, headline_de FROM articles WHERE source_url LIKE ? ORDER BY collected_at DESC LIMIT ?",
(f"%{domain}%", limit),
)
rows = await cursor.fetchall()
return [dict(r) for r in rows]
def _validate(parsed: dict) -> dict:
"""Validate and normalize an LLM response against the enums."""
pol = parsed.get("political_orientation", "na")
if pol not in POLITICAL_VALUES:
pol = "na"
mt = parsed.get("media_type", "sonstige")
if mt not in MEDIA_TYPE_VALUES:
mt = "sonstige"
rel = parsed.get("reliability", "na")
if rel not in RELIABILITY_VALUES:
rel = "na"
aligns_raw = parsed.get("alignments") or []
if not isinstance(aligns_raw, list):
aligns_raw = []
aligns = sorted({a for a in aligns_raw if isinstance(a, str) and a in ALIGNMENT_VALUES})
sa = parsed.get("state_affiliated") is True  # bool("false") would be True, so accept only a real JSON true
cc = parsed.get("country_code")
if isinstance(cc, str) and len(cc) == 2 and cc.isalpha():
cc = cc.upper()
else:
cc = None
try:
confidence = float(parsed.get("confidence", 0.5))
confidence = max(0.0, min(1.0, confidence))
except (TypeError, ValueError):
confidence = 0.5
reasoning = str(parsed.get("reasoning", ""))[:1000]
return {
"political_orientation": pol,
"media_type": mt,
"reliability": rel,
"alignments": aligns,
"state_affiliated": sa,
"country_code": cc,
"confidence": confidence,
"reasoning": reasoning,
}
async def classify_source(
db: aiosqlite.Connection,
source_id: int,
sample_limit: int = 5,
model: str = CLAUDE_MODEL_FAST,
) -> dict:
"""Classify a single source and write the proposals to the proposed_* columns."""
cursor = await db.execute(
"SELECT id, name, url, domain, source_type, category, language, bias, "
"classification_source FROM sources WHERE id = ?",
(source_id,),
)
row = await cursor.fetchone()
if not row:
raise ValueError(f"Quelle {source_id} nicht gefunden")
src = dict(row)
sample = await _load_sample_articles(db, src["name"], src.get("domain"), sample_limit)
prompt = _build_prompt(src, sample)
response, usage = await call_claude(prompt, tools=None, model=model)
json_match = re.search(r"\{.*\}", response, re.DOTALL)
if not json_match:
raise ValueError(f"Keine JSON-Antwort von Claude fuer source_id={source_id}: {response[:200]}")
parsed = json.loads(json_match.group(0))
result = _validate(parsed)
# Only set classification_source to 'llm_pending' if it is not already manual/approved
new_src = "CASE WHEN classification_source IN ('manual','llm_approved') THEN classification_source ELSE 'llm_pending' END"
await db.execute(
f"""UPDATE sources SET
proposed_political_orientation = ?,
proposed_media_type = ?,
proposed_reliability = ?,
proposed_state_affiliated = ?,
proposed_country_code = ?,
proposed_alignments_json = ?,
proposed_confidence = ?,
proposed_reasoning = ?,
proposed_at = CURRENT_TIMESTAMP,
classification_source = {new_src}
WHERE id = ?""",
(
result["political_orientation"],
result["media_type"],
result["reliability"],
1 if result["state_affiliated"] else 0,
result["country_code"],
json.dumps(result["alignments"], ensure_ascii=False),
result["confidence"],
result["reasoning"],
source_id,
),
)
await db.commit()
logger.info(
"Klassifiziert source_id=%s '%s' -> %s/%s/%s conf=%.2f ($%.4f)",
source_id, src["name"], result["political_orientation"],
result["media_type"], result["reliability"], result["confidence"],
usage.cost_usd,
)
result["source_id"] = source_id
result["usage"] = {
"cost_usd": usage.cost_usd,
"input_tokens": usage.input_tokens,
"output_tokens": usage.output_tokens,
}
return result
async def bulk_classify(
db: aiosqlite.Connection,
limit: int = 50,
only_unclassified: bool = True,
model: str = CLAUDE_MODEL_FAST,
) -> dict:
"""Classify sources that are not yet classified (sequentially).
Args:
limit: Maximum number of sources per call.
only_unclassified: If True, only classification_source='legacy'.
If False, also re-classify 'llm_pending'.
"""
if only_unclassified:
where = "classification_source = 'legacy'"
else:
where = "classification_source IN ('legacy', 'llm_pending')"
cursor = await db.execute(
f"SELECT id FROM sources WHERE {where} AND status = 'active' "
f"AND source_type != 'excluded' ORDER BY id LIMIT ?",
(limit,),
)
ids = [row["id"] for row in await cursor.fetchall()]
total_cost = 0.0
success = 0
errors: list[dict] = []
for sid in ids:
try:
r = await classify_source(db, sid, model=model)
total_cost += r["usage"]["cost_usd"]
success += 1
except asyncio.CancelledError:
raise
except Exception as e:
logger.error("Klassifikation source_id=%s fehlgeschlagen: %s", sid, e, exc_info=True)
errors.append({"source_id": sid, "error": str(e)})
logger.info(
"Bulk-Klassifikation fertig: %d/%d erfolgreich, $%.4f Kosten, %d Fehler",
success, len(ids), total_cost, len(errors),
)
return {
"processed": len(ids),
"success": success,
"errors": errors,
"total_cost_usd": total_cost,
}
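
The classifier never trusts the model output directly: it cuts the JSON object out of the free-text response and then clamps every field to a known enum value. A reduced sketch of that two-step approach follows. The function names `extract_json_object` and `normalize` are illustrative assumptions, and the enum sets are shortened on purpose; the module's own `_validate` covers more fields.

```python
import json
import re

# Shortened enum sets, for illustration only.
RELIABILITY_VALUES = {"sehr_hoch", "hoch", "gemischt", "niedrig", "sehr_niedrig", "na"}
ALIGNMENT_VALUES = {"prorussisch", "prowestlich", "neutral", "sonstige"}


def extract_json_object(response: str) -> dict:
    """Cut the outermost {...} span out of a model response and parse it."""
    match = re.search(r"\{.*\}", response, re.DOTALL)
    if not match:
        raise ValueError("no JSON object found in response")
    parsed = json.loads(match.group(0))  # may raise json.JSONDecodeError
    if not isinstance(parsed, dict):
        raise ValueError("response JSON is not an object")
    return parsed


def normalize(parsed: dict) -> dict:
    """Replace unknown enum values with safe defaults instead of failing."""
    reliability = parsed.get("reliability", "na")
    if reliability not in RELIABILITY_VALUES:
        reliability = "na"

    raw = parsed.get("alignments") or []
    if not isinstance(raw, list):
        raw = []
    # The set removes duplicates, sorted() gives a stable order for storage.
    alignments = sorted({a for a in raw if isinstance(a, str) and a in ALIGNMENT_VALUES})

    country = parsed.get("country_code")
    if isinstance(country, str) and len(country) == 2 and country.isalpha():
        country = country.upper()
    else:
        country = None

    return {"reliability": reliability, "alignments": alignments, "country_code": country}


if __name__ == "__main__":
    reply = (
        'Sure: {"reliability": "super", '
        '"alignments": ["neutral", "bogus", "neutral"], "country_code": "de"}'
    )
    print(normalize(extract_json_object(reply)))
```

Falling back to `na` or `None` keeps a single odd answer from aborting a bulk run, and the approval step still gives a human the last word before proposals reach the real columns.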

@@ -0,0 +1,361 @@
"""Source health-check engine: checks reachability, feed validity, and duplicates."""
import asyncio
import logging
import json
import uuid
from urllib.parse import urlparse
import httpx
import feedparser
import aiosqlite
try:
from config import HEALTH_CHECK_USER_AGENT, HEALTH_CHECK_TIMEOUT_S
except ImportError:
HEALTH_CHECK_USER_AGENT = "Mozilla/5.0 (compatible; AegisSight-HealthCheck/1.0)"
HEALTH_CHECK_TIMEOUT_S = 15.0
# Phase 18: alternative user agents to get past bot blocking
USER_AGENT_GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
USER_AGENT_BROWSER = (
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
"(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)
REMOVEPAYWALLS_PREFIX = "https://www.removepaywall.com/search?url="
# HTTP status codes that justify a retry with a different UA
RETRY_ON_STATUS = {403, 406, 429}
logger = logging.getLogger("osint.source_health")
async def run_health_checks(db: aiosqlite.Connection) -> dict:
"""Run health checks for all active sources (global and tenant-specific)."""
logger.info("Starte Quellen-Health-Check...")
# Load all active sources (global AND tenant-specific)
cursor = await db.execute(
"SELECT id, name, url, domain, source_type, article_count, last_seen_at, "
"COALESCE(fetch_strategy, 'default') AS fetch_strategy "
"FROM sources WHERE status = 'active' "
)
sources = [dict(row) for row in await cursor.fetchall()]
# Archive the previous state into the history table, then start fresh
run_id = uuid.uuid4().hex[:12]
await db.execute(
"INSERT INTO source_health_history "
"(run_id, source_id, check_type, status, message, details, checked_at) "
"SELECT ?, source_id, check_type, status, message, details, checked_at "
"FROM source_health_checks",
(run_id,),
)
await db.execute("DELETE FROM source_health_checks")
await db.commit()
logger.info(f"Health-Check Run {run_id}: vorigen Stand archiviert")
checks_done = 0
issues_found = 0
# 1. Reachability + feed validity (only sources with a URL)
sources_with_url = [s for s in sources if s["url"]]
async with httpx.AsyncClient(
timeout=HEALTH_CHECK_TIMEOUT_S,
follow_redirects=True,
headers={"User-Agent": HEALTH_CHECK_USER_AGENT},
) as client:
for i in range(0, len(sources_with_url), 5):
batch = sources_with_url[i:i + 5]
tasks = [_check_source_reachability(client, s) for s in batch]
results = await asyncio.gather(*tasks, return_exceptions=True)
for source, result in zip(batch, results):
if isinstance(result, Exception):
await _save_check(
db, source["id"], "reachability", "error",
f"Prüfung fehlgeschlagen: {result}",
)
issues_found += 1
else:
for check in result:
await _save_check(
db, source["id"], check["type"], check["status"],
check["message"], check.get("details"),
)
if check["status"] != "ok":
issues_found += 1
checks_done += 1
# 2. Stale sources (no article for more than 30 days)
for source in sources:
if source["source_type"] in ("excluded", "web_source"):
continue
stale_check = _check_stale(source)
if stale_check:
await _save_check(
db, source["id"], stale_check["type"],
stale_check["status"], stale_check["message"],
)
if stale_check["status"] != "ok":
issues_found += 1
# 3. Detect duplicates
duplicates = _find_duplicates(sources)
for dup in duplicates:
await _save_check(
db, dup["source_id"], "duplicate", "warning",
dup["message"], json.dumps(dup.get("details", {})),
)
issues_found += 1
await db.commit()
logger.info(
f"Health-Check abgeschlossen: {checks_done} Quellen geprüft, "
f"{issues_found} Probleme gefunden"
)
return {"checked": checks_done, "issues": issues_found}
async def _check_source_reachability(
client: httpx.AsyncClient, source: dict,
) -> list[dict]:
"""Check reachability and feed validity of a source.
Phase 18: each source has a fetch_strategy ('default' | 'googlebot' | 'paywall' | 'skip').
With 'default', a blocked request (403/406/429) is retried with the Googlebot UA.
With 'googlebot', the Googlebot UA is used from the start.
With 'paywall', the feed URL is fetched directly with a browser UA; removepaywall.com
is only used by the researcher pipeline for article content.
With 'skip', no check is run.
"""
checks = []
url = source["url"]
strategy = source.get("fetch_strategy") or "default"
# 'skip' -> no check (known unreachable sources, e.g. login-only)
if strategy == "skip":
checks.append({
"type": "reachability", "status": "ok",
"message": "Health-Check uebersprungen (fetch_strategy=skip)",
})
return checks
# Ensure the URL has a scheme
if url and not url.startswith(("http://", "https://")):
url = "https://" + url.lstrip("/")
# Choose the initial UA
initial_ua = HEALTH_CHECK_USER_AGENT
initial_url = url
if strategy == "googlebot":
initial_ua = USER_AGENT_GOOGLEBOT
elif strategy == "paywall":
# Paywall sources: fetch the feed URL directly, but with a browser UA (tries to get past bot detection).
# removepaywall.com is meant for article URLs, NOT for RSS feed validity checks
# (it returns HTML instead of XML). The researcher pipeline uses removepaywall for content.
initial_ua = USER_AGENT_BROWSER
try:
resp = await client.get(initial_url, headers={"User-Agent": initial_ua})
# Paywall sources: a 403/406/429 is expected (bot detection), so mark it as warning instead of error
if strategy == "paywall" and resp.status_code in RETRY_ON_STATUS:
checks.append({
"type": "reachability", "status": "warning",
"message": f"Paywall-Quelle, Direkt-Zugang HTTP {resp.status_code} (Researcher-Pipeline nutzt removepaywall.com fuer Inhalte)",
})
return checks  # skip the feed validity check (the paywall serves no RSS)
# Bot-block retry only for strategy='default'
if (
strategy == "default"
and resp.status_code in RETRY_ON_STATUS
):
first_status = resp.status_code  # keep the original status for the warning message
retry = await client.get(url, headers={"User-Agent": USER_AGENT_GOOGLEBOT})
if retry.status_code < 400:
resp = retry  # the retry helped
checks.append({
"type": "reachability", "status": "warning",
"message": f"Erreichbar nur mit Googlebot-UA (Standard-UA bekam HTTP {first_status})",
})
if resp.status_code >= 400:
checks.append({
"type": "reachability",
"status": "error",
"message": f"HTTP {resp.status_code} - nicht erreichbar",
"details": json.dumps({"status_code": resp.status_code, "url": url}),
})
return checks
if resp.status_code >= 300:
checks.append({
"type": "reachability",
"status": "warning",
"message": f"HTTP {resp.status_code} - Weiterleitung",
"details": json.dumps({
"status_code": resp.status_code,
"final_url": str(resp.url),
}),
})
else:
checks.append({
"type": "reachability",
"status": "ok",
"message": "Erreichbar",
})
# Feed validity only for RSS feeds
if source["source_type"] == "rss_feed":
text = resp.text[:20000]
if "<rss" not in text and "<feed" not in text and "<channel" not in text:
checks.append({
"type": "feed_validity",
"status": "error",
"message": "Kein gültiger RSS/Atom-Feed",
})
else:
feed = await asyncio.to_thread(feedparser.parse, text)
if feed.get("bozo") and not feed.entries:
checks.append({
"type": "feed_validity",
"status": "error",
"message": "Feed fehlerhaft (bozo)",
"details": json.dumps({
"bozo_exception": str(feed.get("bozo_exception", "")),
}),
})
elif not feed.entries:
checks.append({
"type": "feed_validity",
"status": "warning",
"message": "Feed erreichbar aber leer",
})
else:
checks.append({
"type": "feed_validity",
"status": "ok",
"message": f"Feed gültig ({len(feed.entries)} Einträge)",
})
except httpx.TimeoutException:
checks.append({
"type": "reachability",
"status": "error",
"message": f"Timeout ({HEALTH_CHECK_TIMEOUT_S}s)",
})
except httpx.ConnectError as e:
checks.append({
"type": "reachability",
"status": "error",
"message": f"Verbindung fehlgeschlagen: {e}",
})
except Exception as e:
checks.append({
"type": "reachability",
"status": "error",
"message": f"{type(e).__name__}: {e}",
})
return checks
def _check_stale(source: dict) -> dict | None:
"""Check whether a source is stale (no articles for more than 30 days)."""
if source["source_type"] == "excluded":
return None
article_count = source.get("article_count") or 0
last_seen = source.get("last_seen_at")
if article_count == 0:
return {
"type": "stale",
"status": "warning",
"message": "Noch nie Artikel geliefert",
}
if last_seen:
try:
from datetime import datetime
last_dt = datetime.fromisoformat(last_seen)
now = datetime.now()
age_days = (now - last_dt).days
if age_days > 30:
return {
"type": "stale",
"status": "warning",
"message": f"Letzter Artikel vor {age_days} Tagen",
}
except (ValueError, TypeError):
pass
return None
def _find_duplicates(sources: list[dict]) -> list[dict]:
"""Find duplicate sources (same URL)."""
duplicates = []
url_map = {}
for s in sources:
if not s["url"]:
continue
url_norm = s["url"].lower().rstrip("/")
if url_norm in url_map:
existing = url_map[url_norm]
duplicates.append({
"source_id": s["id"],
"message": f"Doppelte URL wie '{existing['name']}' (ID {existing['id']})",
"details": {"duplicate_of": existing["id"], "type": "url"},
})
else:
url_map[url_norm] = s
return duplicates
async def _save_check(
db: aiosqlite.Connection, source_id: int, check_type: str,
status: str, message: str, details: str | None = None,
):
"""Store one health-check result."""
await db.execute(
"INSERT INTO source_health_checks "
"(source_id, check_type, status, message, details) "
"VALUES (?, ?, ?, ?, ?)",
(source_id, check_type, status, message, details),
)
async def get_health_summary(db: aiosqlite.Connection) -> dict:
"""Return a summary of the most recent health-check results."""
cursor = await db.execute("""
SELECT
h.id, h.source_id, s.name, s.domain, s.url, s.source_type,
h.check_type, h.status, h.message, h.details, h.checked_at
FROM source_health_checks h
JOIN sources s ON s.id = h.source_id
ORDER BY
CASE h.status WHEN 'error' THEN 0 WHEN 'warning' THEN 1 ELSE 2 END,
s.name
""")
checks = [dict(row) for row in await cursor.fetchall()]
error_count = sum(1 for c in checks if c["status"] == "error")
warning_count = sum(1 for c in checks if c["status"] == "warning")
ok_count = sum(1 for c in checks if c["status"] == "ok")
cursor = await db.execute(
"SELECT MAX(checked_at) as last_check FROM source_health_checks"
)
row = await cursor.fetchone()
last_check = row["last_check"] if row else None
return {
"last_check": last_check,
"total_checks": len(checks),
"errors": error_count,
"warnings": warning_count,
"ok": ok_count,
"checks": checks,
}
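
The 'default' strategy in the reachability check boils down to one rule: if the first request is answered with a bot-block status, try once more with a crawler user agent and report the outcome as a warning rather than an error. The sketch below isolates that rule from httpx so it can run without network access. The `fetch` callable, `DEFAULT_UA`, and the English result messages are assumptions made for the example, not part of the module.

```python
import asyncio
from typing import Awaitable, Callable

RETRY_ON_STATUS = {403, 406, 429}
DEFAULT_UA = "Mozilla/5.0 (compatible; ExampleHealthCheck/1.0)"  # hypothetical
FALLBACK_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# Hypothetical transport: (url, user_agent) -> HTTP status code.
Fetch = Callable[[str, str], Awaitable[int]]


def _plain_result(status_code: int) -> dict:
    """Map a status code to a reachability result without any retry logic."""
    if status_code >= 400:
        return {"status": "error", "message": f"HTTP {status_code} - not reachable"}
    return {"status": "ok", "message": "reachable"}


async def check_with_fallback(fetch: Fetch, url: str) -> dict:
    """Fetch once; on a bot-block status, retry once with the fallback UA."""
    first_status = await fetch(url, DEFAULT_UA)
    if first_status not in RETRY_ON_STATUS:
        return _plain_result(first_status)

    retry_status = await fetch(url, FALLBACK_UA)
    if retry_status < 400:
        # Remember the status the default UA received, so the message is useful.
        return {
            "status": "warning",
            "message": f"reachable only with fallback UA (default UA got HTTP {first_status})",
        }
    # The retry did not help: report the original failure.
    return _plain_result(first_status)


if __name__ == "__main__":
    async def blocks_default_ua(url: str, user_agent: str) -> int:
        return 200 if "Googlebot" in user_agent else 403

    print(asyncio.run(check_with_fallback(blocks_default_ua, "https://example.org/feed")))
```

Passing the transport in as a parameter is a testing convenience chosen for this sketch; in the module the same decision is made inline around `client.get`.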

@@ -0,0 +1,461 @@
"""AI-assisted source suggestions via Haiku plus a deterministic heuristic for dead sources (Karteileichen)."""
import json
import logging
import re
import aiosqlite
from agents.claude_client import call_claude
from config import CLAUDE_MODEL_FAST
logger = logging.getLogger("osint.source_suggester")
# Threshold for "silent since": a source that has not delivered an article for
# more than this many days counts as a dead-source candidate.
STALE_DEACTIVATE_THRESHOLD_DAYS = 60
async def generate_stale_deactivation_suggestions(
db: aiosqlite.Connection,
days_threshold: int = STALE_DEACTIVATE_THRESHOLD_DAYS,
) -> int:
"""Create deactivate_source suggestions for dead sources.
A dead source is an active source that has either never delivered an article
(article_count = 0) or has been silent for more than days_threshold days
(last_seen_at older than the threshold). Pure SQL heuristic, no AI call.
Duplicate avoidance: if a pending deactivate suggestion already exists for
the same source_id, no new one is created.
Returns: number of newly created suggestions.
"""
cursor = await db.execute(
f"""
SELECT id, name, url, domain, article_count, last_seen_at
FROM sources
WHERE status = 'active'
AND (
COALESCE(article_count, 0) = 0
OR (last_seen_at IS NOT NULL
AND last_seen_at < datetime('now', '-{int(days_threshold)} days'))
)
"""
)
candidates = [dict(row) for row in await cursor.fetchall()]
if not candidates:
return 0
cursor = await db.execute(
"SELECT DISTINCT source_id FROM source_suggestions "
"WHERE status = 'pending' AND suggestion_type = 'deactivate_source' "
"AND source_id IS NOT NULL"
)
already_pending = {row["source_id"] for row in await cursor.fetchall()}
created = 0
for c in candidates:
sid = c["id"]
if sid in already_pending:
continue
if (c["article_count"] or 0) == 0:
reason = "Hat seit Anlage noch nie einen Artikel geliefert."
else:
reason = (
f"Letzter Artikel vor mehr als {days_threshold} Tagen "
f"(last_seen_at={c['last_seen_at']})."
)
title = f"{c['name']} (ID {sid}) - Karteileiche, deaktivieren?"
description = (
f"Quelle: {c['name']} | URL: {c['url']} | Domain: {c['domain'] or '-'}\n"
f"Begründung: {reason}\n"
f"article_count={c['article_count'] or 0}, "
f"last_seen_at={c['last_seen_at'] or 'NULL'}\n"
"Hinweis: Quelle wurde automatisch als inaktiv erkannt. "
"Bitte vor Annahme prüfen, ob sie wirklich nicht mehr gebraucht wird."
)
suggested_data = json.dumps(
{"action": "deactivate", "source_id": sid}, ensure_ascii=False
)
await db.execute(
"INSERT INTO source_suggestions "
"(suggestion_type, title, description, source_id, suggested_data, "
" priority, status) VALUES "
"('deactivate_source', ?, ?, ?, ?, 'medium', 'pending')",
(title, description, sid, suggested_data),
)
created += 1
if created > 0:
await db.commit()
logger.info(
"Karteileichen-Heuristik: %d neue deactivate-Vorschläge erstellt "
"(%d Kandidaten, %d bereits pending)",
created, len(candidates), len(already_pending),
)
else:
logger.info(
"Karteileichen-Heuristik: keine neuen Vorschläge "
"(%d Kandidaten, alle bereits pending)",
len(candidates),
)
return created
async def generate_strategy_escalation_suggestions(db: aiosqlite.Connection) -> int:
"""Create deactivate_source suggestions for sources whose fetch_strategy
has already been escalated (googlebot or paywall) but whose reachability check
still reports an error.
Example: Rheinische Post has fetch_strategy=googlebot but still gets HTTP 403.
-> The strategy does not work and the source is effectively unreachable. Suggestion: deactivate.
Duplicate avoidance as in the dead-source heuristic: only if no pending
deactivate suggestion exists yet for the source_id.
Returns: number of newly created suggestions.
"""
cursor = await db.execute(
"""
SELECT s.id, s.name, s.url, s.domain, s.fetch_strategy, h.message
FROM sources s
JOIN source_health_checks h ON h.source_id = s.id
WHERE s.status = 'active'
AND s.fetch_strategy IN ('googlebot', 'paywall')
AND h.check_type = 'reachability'
AND h.status = 'error'
"""
)
candidates = [dict(row) for row in await cursor.fetchall()]
if not candidates:
return 0
cursor = await db.execute(
"SELECT DISTINCT source_id FROM source_suggestions "
"WHERE status = 'pending' AND suggestion_type = 'deactivate_source' "
"AND source_id IS NOT NULL"
)
already_pending = {row["source_id"] for row in await cursor.fetchall()}
created = 0
for c in candidates:
sid = c["id"]
if sid in already_pending:
continue
title = f"{c['name']} (ID {sid}) - Strategie greift nicht"
description = (
f"Quelle: {c['name']} | URL: {c['url']} | Domain: {c['domain'] or '-'}\n"
f"fetch_strategy='{c['fetch_strategy']}' wurde bereits zur Eskalation gesetzt, "
f"liefert beim Health-Check aber weiter einen Fehler:\n"
f" {c['message']}\n"
"Vorschlag: deaktivieren oder fetch_strategy='skip' setzen, damit die Quelle "
"den Health-Check nicht weiter verfälscht.\n"
"Hinweis: Quelle wurde automatisch erkannt. Bitte vor Annahme prüfen."
)
suggested_data = json.dumps(
{"action": "deactivate", "source_id": sid,
"reason": "fetch_strategy_failed", "current_strategy": c["fetch_strategy"]},
ensure_ascii=False,
)
await db.execute(
"INSERT INTO source_suggestions "
"(suggestion_type, title, description, source_id, suggested_data, "
" priority, status) VALUES "
"('deactivate_source', ?, ?, ?, ?, 'high', 'pending')",
(title, description, sid, suggested_data),
)
created += 1
if created > 0:
await db.commit()
logger.info(
"Strategie-Eskalations-Heuristik: %d neue deactivate-Vorschläge "
"(%d Kandidaten, %d bereits pending)",
created, len(candidates), len(already_pending),
)
return created
async def generate_suggestions(db: aiosqlite.Connection) -> int:
"""Generate source suggestions based on health checks and gap analysis.
Three stages, run in this order (specific -> generic -> AI):
1. Deterministic: the strategy-escalation heuristic (fetch_strategy=googlebot
or paywall, but reachability still error) creates deactivate_source
suggestions with priority 'high'. Most specific diagnosis: "workaround
does not work". Runs FIRST so that these sources are not claimed by the
more generic dead-source stage.
2. Deterministic: the dead-source heuristic (article_count=0 or silent for >60 days)
immediately creates deactivate_source suggestions for all remaining dead
sources, without an AI call.
3. AI-based: Haiku looks at the source collection and the health problems
and suggests further improvements (add_source, deactivate_source,
fix_url, ...).
Returns the total number of newly created suggestions across all stages.
"""
strategy_count = await generate_strategy_escalation_suggestions(db)
stale_count = await generate_stale_deactivation_suggestions(db)
logger.info("Starte Quellen-Vorschläge via Haiku...")
# 1. Load current sources
cursor = await db.execute(
"SELECT id, name, url, domain, source_type, category, status, "
"article_count, last_seen_at "
"FROM sources WHERE tenant_id IS NULL ORDER BY category, name"
)
sources = [dict(row) for row in await cursor.fetchall()]
# 2. Load health-check problems
cursor = await db.execute("""
SELECT h.source_id, s.name, s.domain, s.url,
h.check_type, h.status, h.message
FROM source_health_checks h
JOIN sources s ON s.id = h.source_id
WHERE h.status IN ('error', 'warning')
""")
issues = [dict(row) for row in await cursor.fetchall()]
# 3. Remove old pending suggestions (older than 30 days)
await db.execute(
"DELETE FROM source_suggestions "
"WHERE status = 'pending' AND created_at < datetime('now', '-30 days')"
)
# 4. Source summary for Haiku
categories = {}
for s in sources:
cat = s["category"]
if cat not in categories:
categories[cat] = []
categories[cat].append(s)
source_summary = ""
for cat, cat_sources in sorted(categories.items()):
active = [
s for s in cat_sources
if s["status"] == "active" and s["source_type"] != "excluded"
]
source_summary += f"\n{cat} ({len(active)} aktiv): "
source_summary += ", ".join(s["name"] for s in active[:10])
if len(active) > 10:
source_summary += f" ... (+{len(active) - 10} weitere)"
issues_summary = ""
if issues:
issues_summary = "\n\nProbleme gefunden:\n"
for issue in issues[:20]:
issues_summary += (
f"- [source_id={issue['source_id']}] {issue['name']} ({issue['domain']}): "
f"{issue['check_type']} = {issue['status']} - {issue['message']}\n"
)
prompt = f"""Du bist ein OSINT-Analyst und verwaltest die Quellensammlung eines Lagebildmonitors für Sicherheitsbehörden.
Aktuelle Quellensammlung:{source_summary}{issues_summary}
Aufgabe: Analysiere die Quellensammlung und schlage Verbesserungen vor.
Beachte:
1. Bei Problemen (nicht erreichbar, leere Feeds): Schlage "deactivate_source" vor und setze "source_id" auf die ID aus [source_id=X] in der Problemliste
2. Fehlende wichtige OSINT-Quellen: Schlage "add_source" mit konkreter RSS-Feed-URL vor
3. Fokus auf deutschsprachige + wichtige internationale Nachrichtenquellen
4. Nur Quellen vorschlagen, die NICHT bereits vorhanden sind
5. Maximal 5 Vorschläge
Antworte NUR mit einem JSON-Array. Jedes Element:
{{
"type": "add_source|deactivate_source|fix_url|remove_source",
"title": "Kurzer Titel",
"description": "Begründung",
"priority": "low|medium|high",
"source_id": null,
"data": {{
"name": "Anzeigename",
"url": "https://...",
"domain": "example.de",
"category": "international|nachrichtenagentur|qualitaetszeitung|behoerde|fachmedien|think-tank|regional|sonstige"
}}
}}
Nur das JSON-Array, kein anderer Text."""
try:
response, usage = await call_claude(
prompt, tools=None, model=CLAUDE_MODEL_FAST,
)
json_match = re.search(r'\[.*\]', response, re.DOTALL)
if not json_match:
logger.warning("Keine Vorschläge von Haiku erhalten (kein JSON)")
return 0
suggestions = json.loads(json_match.group(0))
count = 0
for suggestion in suggestions[:5]:
stype = suggestion.get("type", "add_source")
title = suggestion.get("title", "")
desc = suggestion.get("description", "")
priority = suggestion.get("priority", "medium")
source_id = suggestion.get("source_id")
data = json.dumps(
suggestion.get("data", {}), ensure_ascii=False,
)
# Validate source_id (must exist or be None)
if source_id is not None:
cursor = await db.execute(
"SELECT id FROM sources WHERE id = ?", (source_id,),
)
if not await cursor.fetchone():
source_id = None
# Duplicate check: same type + same source_id, or same domain, already pending?
if source_id is not None:
cursor = await db.execute(
"SELECT id FROM source_suggestions "
"WHERE suggestion_type = ? AND source_id = ? AND status = 'pending'",
(stype, source_id),
)
else:
# For add_source without a source_id: check the domain from suggested_data
check_domain = suggestion.get('data', {}).get('domain', '')
if check_domain:
cursor = await db.execute(
"SELECT id FROM source_suggestions "
"WHERE suggestion_type = ? AND suggested_data LIKE ? AND status = 'pending'",
(stype, f'%{check_domain}%'),
)
else:
cursor = await db.execute(
"SELECT id FROM source_suggestions "
"WHERE title = ? AND status = 'pending'",
(title,),
)
if await cursor.fetchone():
continue
await db.execute(
"INSERT INTO source_suggestions "
"(suggestion_type, title, description, source_id, "
"suggested_data, priority, status) "
"VALUES (?, ?, ?, ?, ?, ?, 'pending')",
(stype, title, desc, source_id, data, priority),
)
count += 1
await db.commit()
logger.info(
f"Quellen-Vorschläge: {count} neue Vorschläge generiert via Haiku "
f"(+{stale_count} Karteileichen, +{strategy_count} Strategie-Eskalation) "
f"(Haiku: {usage.input_tokens} in / {usage.output_tokens} out / "
f"${usage.cost_usd:.4f})"
)
return count + stale_count + strategy_count
except Exception as e:
logger.error(f"Fehler bei Quellen-Vorschlägen: {e}", exc_info=True)
return stale_count + strategy_count
async def apply_suggestion(
db: aiosqlite.Connection, suggestion_id: int, accept: bool,
) -> dict:
"""Wendet einen Vorschlag an oder lehnt ihn ab."""
cursor = await db.execute(
"SELECT * FROM source_suggestions WHERE id = ?", (suggestion_id,),
)
suggestion = await cursor.fetchone()
if not suggestion:
raise ValueError("Vorschlag nicht gefunden")
suggestion = dict(suggestion)
if suggestion["status"] != "pending":
raise ValueError(f"Vorschlag bereits {suggestion['status']}")
new_status = "accepted" if accept else "rejected"
result = {"status": new_status, "action": None}
if accept:
stype = suggestion["suggestion_type"]
data = (
json.loads(suggestion["suggested_data"])
if suggestion["suggested_data"]
else {}
)
if stype == "add_source":
name = data.get("name", "Unbenannt")
url = data.get("url")
domain = data.get("domain", "")
category = data.get("category", "sonstige")
source_type = "rss_feed" if url and any(
x in (url or "").lower()
for x in ("rss", "feed", "xml", "atom")
) else "web_source"
if url:
cursor = await db.execute(
"SELECT id FROM sources WHERE url = ? AND tenant_id IS NULL",
(url,),
)
if await cursor.fetchone():
result["action"] = "übersprungen (URL bereits vorhanden)"
new_status = "rejected"
else:
await db.execute(
"INSERT INTO sources "
"(name, url, domain, source_type, category, status, "
"added_by, tenant_id) "
"VALUES (?, ?, ?, ?, ?, 'active', 'haiku-vorschlag', NULL)",
(name, url, domain, source_type, category),
)
result["action"] = f"Quelle '{name}' angelegt"
else:
result["action"] = "übersprungen (keine URL)"
new_status = "rejected"
elif stype == "deactivate_source":
source_id = suggestion["source_id"]
if source_id:
await db.execute(
"UPDATE sources SET status = 'inactive' WHERE id = ?",
(source_id,),
)
result["action"] = "Quelle deaktiviert"
else:
result["action"] = "übersprungen (keine source_id)"
elif stype == "remove_source":
source_id = suggestion["source_id"]
if source_id:
await db.execute(
"DELETE FROM sources WHERE id = ?", (source_id,),
)
result["action"] = "Quelle gelöscht"
else:
result["action"] = "übersprungen (keine source_id)"
elif stype == "fix_url":
source_id = suggestion["source_id"]
new_url = data.get("url")
if source_id and new_url:
await db.execute(
"UPDATE sources SET url = ? WHERE id = ?",
(new_url, source_id),
)
result["action"] = f"URL aktualisiert auf {new_url}"
else:
result["action"] = "übersprungen (keine source_id oder URL)"
await db.execute(
"UPDATE source_suggestions SET status = ?, reviewed_at = CURRENT_TIMESTAMP "
"WHERE id = ?",
(new_status, suggestion_id),
)
await db.commit()
result["status"] = new_status
return result
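
Note on the JSON handling above: the suggestion generator pulls the array out of the model response with a greedy regex and feeds it straight to `json.loads`, then calls `.get()` on each element. A minimal sketch of a more defensive variant follows, assuming a hypothetical helper named `extract_json_array` (not part of this diff) that returns an empty list on any parse problem and drops non-object elements.

```python
import json
import re


def extract_json_array(response: str) -> list[dict]:
    """Return the dict elements of the first JSON array found in `response`.

    Hypothetical helper for illustration; the name is an assumption.
    """
    # Greedy match from the first "[" to the last "]", as in the diff above
    match = re.search(r"\[.*\]", response, re.DOTALL)
    if not match:
        return []
    try:
        parsed = json.loads(match.group(0))
    except json.JSONDecodeError:
        return []
    if not isinstance(parsed, list):
        return []
    # Keep only objects so callers can use .get() without type checks
    return [item for item in parsed if isinstance(item, dict)]


if __name__ == "__main__":
    reply = 'Sure: [{"type": "add_source", "title": "A"}, 42] Done.'
    print(extract_json_array(reply))
    print(extract_json_array("no suggestions today"))
```

With such a helper, a malformed element would be skipped instead of aborting the whole batch through the outer `except Exception`.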

742
src/shared/source_rules.py Normal file

@@ -0,0 +1,742 @@
"""Dynamische Quellen-Regeln aus der Datenbank."""
import logging
import re
import json
import asyncio
from urllib.parse import urlparse
import httpx
import feedparser
import hashlib
from config import CLAUDE_PATH, CLAUDE_TIMEOUT, MAX_FEEDS_PER_DOMAIN
logger = logging.getLogger("osint.source_rules")
# Domain -> Kategorie Mapping für Auto-Erkennung
DOMAIN_CATEGORY_MAP = {
# Nachrichtenagenturen
"reuters.com": "nachrichtenagentur",
"apnews.com": "nachrichtenagentur",
"dpa.com": "nachrichtenagentur",
"afp.com": "nachrichtenagentur",
# Öffentlich-Rechtlich
"tagesschau.de": "oeffentlich-rechtlich",
"zdf.de": "oeffentlich-rechtlich",
"dw.com": "oeffentlich-rechtlich",
"br.de": "oeffentlich-rechtlich",
"ndr.de": "oeffentlich-rechtlich",
"wdr.de": "oeffentlich-rechtlich",
"mdr.de": "oeffentlich-rechtlich",
"swr.de": "oeffentlich-rechtlich",
"hr.de": "oeffentlich-rechtlich",
"rbb24.de": "oeffentlich-rechtlich",
"ard.de": "oeffentlich-rechtlich",
"orf.at": "oeffentlich-rechtlich",
"srf.ch": "oeffentlich-rechtlich",
# Qualitätszeitungen
"spiegel.de": "qualitaetszeitung",
"zeit.de": "qualitaetszeitung",
"faz.net": "qualitaetszeitung",
"sueddeutsche.de": "qualitaetszeitung",
"nzz.ch": "qualitaetszeitung",
"welt.de": "qualitaetszeitung",
"tagesspiegel.de": "qualitaetszeitung",
"fr.de": "qualitaetszeitung",
"stern.de": "qualitaetszeitung",
"focus.de": "qualitaetszeitung",
# Behörden
"bmi.bund.de": "behoerde",
"europol.europa.eu": "behoerde",
"bka.de": "behoerde",
"bsi.bund.de": "behoerde",
"verfassungsschutz.de": "behoerde",
"bpb.de": "behoerde",
# Fachmedien
"netzpolitik.org": "fachmedien",
"handelsblatt.com": "fachmedien",
"heise.de": "fachmedien",
"golem.de": "fachmedien",
"t3n.de": "fachmedien",
"wiwo.de": "fachmedien",
# Think Tanks
"swp-berlin.org": "think-tank",
"iiss.org": "think-tank",
"brookings.edu": "think-tank",
"rand.org": "think-tank",
"dgap.org": "think-tank",
"chathamhouse.org": "think-tank",
# International
"bbc.co.uk": "international",
"bbc.com": "international",
"aljazeera.com": "international",
"france24.com": "international",
"cnn.com": "international",
"theguardian.com": "international",
"nytimes.com": "international",
"washingtonpost.com": "international",
"lemonde.fr": "international",
"elpais.com": "international",
# Regional
"berliner-zeitung.de": "regional",
"hamburger-abendblatt.de": "regional",
"stuttgarter-zeitung.de": "regional",
"ksta.de": "regional",
"rp-online.de": "regional",
"merkur.de": "regional",
# Telegram
"t.me": "telegram",
}
# Bekannte Feed-Pfade zum Durchprobieren
_FEED_PATHS = ["/feed", "/rss", "/rss.xml", "/atom.xml", "/feed.xml", "/index.xml", "/feed/rss", "/feeds/posts/default"]
# Erweiterte nachrichtenspezifische Feed-Pfade für Multi-Discovery
_NEWS_FEED_PATHS = [
"/world/rss", "/world/rss.xml", "/world/feed",
"/politics/rss", "/politics/rss.xml", "/politics/feed",
"/business/rss", "/business/rss.xml", "/business/feed",
"/technology/rss", "/technology/rss.xml", "/technology/feed",
"/environment/rss", "/environment/rss.xml", "/environment/feed",
"/science/rss", "/science/rss.xml", "/science/feed",
"/europe/rss", "/europe/rss.xml", "/europe/feed",
"/security/rss", "/security/rss.xml", "/security/feed",
"/international/rss", "/international/rss.xml", "/international/feed",
"/economy/rss", "/economy/rss.xml", "/economy/feed",
"/defence/rss", "/defence/rss.xml", "/defence/feed",
"/middle-east/rss", "/middle-east/rss.xml",
"/asia/rss", "/asia/rss.xml",
"/africa/rss", "/africa/rss.xml",
"/americas/rss", "/americas/rss.xml",
"/uk-news/rss", "/us-news/rss",
"/commentisfree/rss", "/opinion/rss",
"/law/rss", "/media/rss",
"/global-development/rss",
"/news/feed", "/news/rss", "/news/rss.xml",
"/politik/rss", "/politik/rss.xml",
"/wirtschaft/rss", "/wirtschaft/rss.xml",
"/panorama/rss", "/panorama/rss.xml",
"/wissen/rss", "/wissen/rss.xml",
"/ausland/rss", "/ausland/rss.xml",
"/inland/rss", "/inland/rss.xml",
"/netzwelt/rss", "/netzwelt/rss.xml",
"/kultur/rss", "/kultur/rss.xml",
]
# Bekannte Feed-Subdomains für Portale die Feeds auf separater Domain hosten
_DOMAIN_FEED_URLS = {
"bbc.com": [
"https://feeds.bbci.co.uk/news/rss.xml",
"https://feeds.bbci.co.uk/news/world/rss.xml",
"https://feeds.bbci.co.uk/news/business/rss.xml",
"https://feeds.bbci.co.uk/news/politics/rss.xml",
"https://feeds.bbci.co.uk/news/technology/rss.xml",
"https://feeds.bbci.co.uk/news/science_and_environment/rss.xml",
"https://feeds.bbci.co.uk/news/health/rss.xml",
"https://feeds.bbci.co.uk/news/education/rss.xml",
"https://feeds.bbci.co.uk/news/world/middle_east/rss.xml",
"https://feeds.bbci.co.uk/news/world/europe/rss.xml",
"https://feeds.bbci.co.uk/news/world/africa/rss.xml",
"https://feeds.bbci.co.uk/news/world/asia/rss.xml",
"https://feeds.bbci.co.uk/news/world/us_and_canada/rss.xml",
"https://feeds.bbci.co.uk/news/world/latin_america/rss.xml",
"https://feeds.bbci.co.uk/news/entertainment_and_arts/rss.xml",
],
"bbc.co.uk": "bbc.com", # Alias
"reuters.com": [
"https://www.reutersagency.com/feed/",
],
"aljazeera.com": [
"https://www.aljazeera.com/xml/rss/all.xml",
],
}
def _get_extra_feed_urls(domain: str) -> list[str]:
    """Return known feed URLs for domains that host feeds on a separate subdomain."""
    entry = _DOMAIN_FEED_URLS.get(domain)
    if isinstance(entry, str):
        # Alias: the value names another domain whose entry should be used
        entry = _DOMAIN_FEED_URLS.get(entry)
    if isinstance(entry, list):
        return entry
    return []


def _normalize_url(url: str) -> str:
    """Normalize a URL (prepend https:// if the scheme is missing)."""
    url = url.strip()
    if not url.startswith(("http://", "https://")):
        url = "https://" + url
    return url
# Subdomain → kanonische Domain Zuordnung
_DOMAIN_ALIASES = {
"feeds.bbci.co.uk": "bbc.com",
"rss.sueddeutsche.de": "sueddeutsche.de",
"on.orf.at": "orf.at",
"rss.orf.at": "orf.at",
"rss.dw.com": "dw.com",
"newsfeed.zeit.de": "zeit.de",
"reutersagency.com": "reuters.com",
"edition.cnn.com": "cnn.com",
"rsshub.app": "apnews.com",
}
def _extract_domain(url: str) -> str:
    """Extract the domain from a URL (without www., with alias normalization)."""
    parsed = urlparse(url)
    domain = parsed.hostname or ""
    if domain.startswith("www."):
        domain = domain[4:]
    return _DOMAIN_ALIASES.get(domain, domain)


def _detect_category(domain: str) -> str:
    """Detect the category based on the domain."""
    if domain in DOMAIN_CATEGORY_MAP:
        return DOMAIN_CATEGORY_MAP[domain]
    # Subdomain match, e.g. feeds.reuters.com -> reuters.com
    parts = domain.split(".")
    if len(parts) > 2:
        parent = ".".join(parts[-2:])
        if parent in DOMAIN_CATEGORY_MAP:
            return DOMAIN_CATEGORY_MAP[parent]
    return "sonstige"
# Bekannte Domain → Anzeigename Zuordnungen
DOMAIN_DISPLAY_NAMES = {
"tagesschau.de": "tagesschau",
"zdf.de": "ZDF heute",
"spiegel.de": "Spiegel",
"zeit.de": "Zeit",
"newsfeed.zeit.de": "Zeit",
"faz.net": "FAZ",
"sueddeutsche.de": "Süddeutsche Zeitung",
"rss.sueddeutsche.de": "Süddeutsche Zeitung",
"nzz.ch": "NZZ",
"dw.com": "Deutsche Welle",
"rss.dw.com": "Deutsche Welle",
"reuters.com": "Reuters",
"reutersagency.com": "Reuters",
"rsshub.app": "RSSHub",
"apnews.com": "AP News",
"bbc.com": "BBC",
"bbc.co.uk": "BBC",
"feeds.bbci.co.uk": "BBC",
"aljazeera.com": "Al Jazeera",
"france24.com": "France24",
"theguardian.com": "The Guardian",
"nytimes.com": "New York Times",
"washingtonpost.com": "Washington Post",
"cnn.com": "CNN",
"bmi.bund.de": "BMI",
"europol.europa.eu": "Europol",
"handelsblatt.com": "Handelsblatt",
"wiwo.de": "WirtschaftsWoche",
"heise.de": "Heise Online",
"golem.de": "Golem",
"netzpolitik.org": "netzpolitik.org",
"t3n.de": "t3n",
"welt.de": "Welt",
"tagesspiegel.de": "Tagesspiegel",
"stern.de": "Stern",
"focus.de": "Focus",
"n-tv.de": "n-tv",
"bild.de": "BILD",
"tarnkappe.info": "Tarnkappe",
"bleepingcomputer.com": "BleepingComputer",
"techcrunch.com": "TechCrunch",
"theverge.com": "The Verge",
"wired.com": "WIRED",
"tomshardware.com": "Tom's Hardware",
"finanzen.net": "Finanzen.net",
"404media.co": "404 Media",
"medium.com": "Medium",
"swp-berlin.org": "SWP Berlin",
"dgap.org": "DGAP",
"brookings.edu": "Brookings",
"rand.org": "RAND",
"lemonde.fr": "Le Monde",
"elpais.com": "El País",
"orf.at": "ORF",
"srf.ch": "SRF",
"br.de": "BR",
"ndr.de": "NDR",
"wdr.de": "WDR",
"mdr.de": "MDR",
"swr.de": "SWR",
"hr.de": "hr",
"rbb24.de": "rbb24",
"fr.de": "Frankfurter Rundschau",
"rp-online.de": "Rheinische Post",
"ksta.de": "Kölner Stadt-Anzeiger",
"berliner-zeitung.de": "Berliner Zeitung",
"stuttgarter-zeitung.de": "Stuttgarter Zeitung",
"hamburger-abendblatt.de": "Hamburger Abendblatt",
"merkur.de": "Münchner Merkur",
"bsi.bund.de": "BSI",
"bpb.de": "bpb",
"bka.de": "BKA",
"verfassungsschutz.de": "Verfassungsschutz",
"bashinho.de": "Bashinho",
}
def domain_to_display_name(domain: str) -> str:
    """Convert a domain into a human-readable display name.

    Checks the known mapping first, then derives a sensible name
    from the domain (second-level label, title-cased).
    """
    if domain in DOMAIN_DISPLAY_NAMES:
        return DOMAIN_DISPLAY_NAMES[domain]
    # Subdomain match: feeds.reuters.com -> reuters.com
    parts = domain.split(".")
    if len(parts) > 2:
        parent = ".".join(parts[-2:])
        if parent in DOMAIN_DISPLAY_NAMES:
            return DOMAIN_DISPLAY_NAMES[parent]
    # Fallback: extract the domain core and title-case it,
    # e.g. "example-news.de" -> "Example News"
    core = parts[-2] if len(parts) >= 2 else parts[0]
    return core.replace("-", " ").title()
def _compute_content_hash(entries: list) -> str:
    """Compute a fingerprint from the first 5 entry titles of a feed."""
    titles = [e.get("title", "") for e in entries[:5]]
    # No usable titles: return no hash, so that feeds without titles
    # are not all treated as duplicates of each other
    if not any(titles):
        return ""
    combined = "|".join(titles).strip()
    return hashlib.sha256(combined.encode("utf-8")).hexdigest()[:16]
async def _validate_feed(client: httpx.AsyncClient, url: str) -> dict | None:
    """Check whether a URL is a valid RSS/Atom feed. Returns feed info or None."""
    try:
        resp = await client.get(url)
        if resp.status_code != 200:
            return None
        text = resp.text[:10000]  # Only inspect the beginning of the response
        # Must look like feed markup
        if "<rss" not in text and "<feed" not in text and "<channel" not in text:
            return None
        feed = await asyncio.to_thread(feedparser.parse, text)
        if feed.get("bozo") and not feed.entries:
            return None
        if feed.feed.get("title") or feed.entries:
            content_hash = _compute_content_hash(feed.entries)
            return {
                "url": str(resp.url),  # Final URL after redirects
                "title": feed.feed.get("title", ""),
                "content_hash": content_hash,
            }
    except Exception as e:
        # Any network or parse error means "not a usable feed"
        logger.debug(f"Feed validation failed for {url}: {e}")
    return None
async def discover_source(url: str) -> dict:
"""Erkennt RSS-Feed, Name, Domain und Kategorie einer URL automatisch.
Returns:
dict mit: name, domain, rss_url, category, source_type
"""
url = _normalize_url(url)
domain = _extract_domain(url)
category = _detect_category(domain)
result = {
"name": domain_to_display_name(domain),
"domain": domain,
"rss_url": None,
"category": category,
"source_type": "web_source",
}
async with httpx.AsyncClient(
timeout=12.0,
follow_redirects=True,
headers={"User-Agent": "Mozilla/5.0 (compatible; OSINT-Monitor/1.0)"},
) as client:
# 1. Seite abrufen und nach RSS-Links suchen
page_title = None
try:
resp = await client.get(url)
if resp.status_code == 200:
html = resp.text[:50000]
# <title> extrahieren
title_match = re.search(r"<title[^>]*>([^<]+)</title>", html, re.IGNORECASE)
if title_match:
page_title = title_match.group(1).strip()
# Collect the href of every RSS/Atom <link> tag. The tag regex below matches
# either attribute order, so no separate href-first pattern is needed.
feed_urls = []
for tag in re.finditer(
r'<link[^>]+type=["\']application/(?:rss|atom)\+xml["\'][^>]*>',
html,
re.IGNORECASE,
):
href_match = re.search(r'href=["\']([^"\']+)["\']', tag.group(0))
if href_match:
href = href_match.group(1)
# Relative URLs auflösen
if href.startswith("/"):
parsed = urlparse(url)
href = f"{parsed.scheme}://{parsed.netloc}{href}"
elif not href.startswith("http"):
href = url.rstrip("/") + "/" + href
feed_urls.append(href)
# Gefundene Feed-URLs validieren
for feed_url in feed_urls:
feed_info = await _validate_feed(client, feed_url)
if feed_info:
result["rss_url"] = feed_info["url"]
result["source_type"] = "rss_feed"
if feed_info["title"]:
result["name"] = feed_info["title"]
elif page_title:
result["name"] = page_title
return result
except Exception as e:
logger.debug(f"Fehler beim Abrufen von {url}: {e}")
# 2. Bekannte Feed-Pfade durchprobieren
parsed = urlparse(url)
base_url = f"{parsed.scheme}://{parsed.netloc}"
for path in _FEED_PATHS:
feed_url = base_url + path
feed_info = await _validate_feed(client, feed_url)
if feed_info:
result["rss_url"] = feed_info["url"]
result["source_type"] = "rss_feed"
if feed_info["title"]:
result["name"] = feed_info["title"]
elif page_title:
result["name"] = page_title
return result
# Kein Feed gefunden — Name aus Seitentitel
if page_title:
result["name"] = page_title
return result
async def discover_all_feeds(url: str) -> dict:
"""Findet ALLE RSS/Atom-Feeds einer Domain.
Returns:
dict mit: domain, category, page_title, feeds: [{"url", "title"}, ...]
"""
url = _normalize_url(url)
domain = _extract_domain(url)
category = _detect_category(domain)
result = {
"domain": domain,
"category": category,
"page_title": None,
"feeds": [],
}
seen_urls = set()
seen_content_hashes = set()
async with httpx.AsyncClient(
timeout=15.0,
follow_redirects=True,
headers={"User-Agent": "Mozilla/5.0 (compatible; OSINT-Monitor/1.0)"},
) as client:
# 1. HTML-Seite abrufen und ALLE RSS-Link-Tags sammeln
candidate_urls = []
try:
resp = await client.get(url)
if resp.status_code == 200:
html = resp.text[:100000]
title_match = re.search(r"<title[^>]*>([^<]+)</title>", html, re.IGNORECASE)
if title_match:
result["page_title"] = title_match.group(1).strip()
parsed = urlparse(url)
base = f"{parsed.scheme}://{parsed.netloc}"
for tag in re.finditer(
r'<link[^>]+type=["\']application/(?:rss|atom)\+xml["\'][^>]*>',
html,
re.IGNORECASE,
):
href_match = re.search(r'href=["\']([^"\']+)["\']', tag.group(0))
if href_match:
href = href_match.group(1)
if href.startswith("/"):
href = base + href
elif not href.startswith("http"):
href = url.rstrip("/") + "/" + href
candidate_urls.append(href)
except Exception as e:
logger.debug(f"Fehler beim Abrufen von {url}: {e}")
# 2. Bekannte Feed-Pfade hinzufügen (Standard + Nachrichten-spezifisch)
parsed = urlparse(url)
base_url = f"{parsed.scheme}://{parsed.netloc}"
for path in _FEED_PATHS + _NEWS_FEED_PATHS:
candidate_urls.append(base_url + path)
# 2b. Bekannte Feed-URLs für Domains mit separater Feed-Subdomain (z.B. BBC)
extra_urls = _get_extra_feed_urls(domain)
candidate_urls.extend(extra_urls)
# 3. Alle Kandidaten parallel validieren (in Batches von 10)
async def _validate_and_collect(feed_url: str):
try:
return await _validate_feed(client, feed_url)
except Exception:
return None
for i in range(0, len(candidate_urls), 10):
batch = candidate_urls[i:i + 10]
results = await asyncio.gather(*[_validate_and_collect(u) for u in batch])
for feed_info in results:
if not feed_info:
continue
if feed_info["url"] in seen_urls:
continue
# Content-Hash Duplikat-Erkennung (gleicher Inhalt = WordPress-Redirect etc.)
content_hash = feed_info.get("content_hash", "")
if content_hash and content_hash in seen_content_hashes:
logger.debug(f"Content-Hash Duplikat übersprungen: {feed_info['url']}")
continue
seen_urls.add(feed_info["url"])
if content_hash:
seen_content_hashes.add(content_hash)
result["feeds"].append(feed_info)
logger.info(f"discover_all_feeds({domain}): {len(result['feeds'])} Feeds gefunden")
return result
async def evaluate_feeds_with_claude(domain: str, feeds: list[dict]) -> list[dict]:
"""Lässt Claude die OSINT-Relevanz der Feeds bewerten.
Args:
domain: Domain-Name
feeds: Liste von {"url", "title"} Dicts
Returns:
Liste von {"url", "title", "name"} Dicts (nur relevante Feeds)
"""
if not feeds:
return []
feed_list = "\n".join(
f" {i+1}. {f['title'] or f['url']} | {f['url']}"
for i, f in enumerate(feeds)
)
prompt = f"""Du bist ein OSINT-Analyst. Bewerte diese RSS-Feeds der Domain "{domain}" nach OSINT-Relevanz.
OSINT-relevante Themen: Politik, Sicherheit, Wirtschaft, Internationale Beziehungen, Verteidigung, Konflikte, Terrorismus, Cybersecurity, Umweltkatastrophen, Technologie, Wissenschaft, Nachrichten allgemein.
NICHT relevant: Sport, Lifestyle, Rezepte, Unterhaltung, Reisen, Mode, Kultur/Kunst, Wetter, Kreuzworträtsel, Podcasts (allgemein), Leserbriefe, Kommentare/Meinung.
Feeds:
{feed_list}
Antworte AUSSCHLIESSLICH mit einem JSON-Array. Jedes Element:
{{"index": <1-basiert>, "relevant": true/false, "name": "<Anzeigename für OSINT-Monitor, z.B. 'Guardian World' oder 'Spiegel Politik'>"}}
Nur das JSON-Array, kein anderer Text."""
try:
cmd = [
CLAUDE_PATH,
"-p", prompt,
"--output-format", "text",
]
process = await asyncio.create_subprocess_exec(
*cmd,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
env={"PATH": "/usr/local/bin:/usr/bin:/bin", "HOME": "/home/claude-dev"},
)
try:
stdout, stderr = await asyncio.wait_for(
process.communicate(), timeout=min(CLAUDE_TIMEOUT, 120)
)
except asyncio.TimeoutError:
process.kill()
await process.wait()  # Reap the killed child so it does not linger as a zombie
logger.warning(f"Claude evaluation timed out for {domain}, using fallback")
return _fallback_all_feeds(domain, feeds)
if process.returncode != 0:
logger.warning(f"Claude-Bewertung fehlgeschlagen für {domain}, nutze Fallback")
return _fallback_all_feeds(domain, feeds)
response = stdout.decode("utf-8", errors="replace").strip()
# JSON aus Antwort extrahieren (Claude gibt manchmal Markdown-Blöcke zurück)
json_match = re.search(r'\[.*\]', response, re.DOTALL)
if not json_match:
logger.warning(f"Kein JSON in Claude-Antwort für {domain}, nutze Fallback")
return _fallback_all_feeds(domain, feeds)
evaluations = json.loads(json_match.group(0))
relevant = []
for ev in evaluations:
idx = ev.get("index", 0) - 1
if ev.get("relevant") and 0 <= idx < len(feeds):
feed = feeds[idx]
relevant.append({
"url": feed["url"],
"title": feed["title"],
"name": ev.get("name", feed["title"] or domain),
})
logger.info(f"Claude-Bewertung für {domain}: {len(relevant)}/{len(feeds)} relevant")
return relevant
except json.JSONDecodeError:
logger.warning(f"JSON-Parse-Fehler bei Claude-Antwort für {domain}, nutze Fallback")
return _fallback_all_feeds(domain, feeds)
except Exception as e:
logger.warning(f"Claude-Bewertung Fehler für {domain}: {e}, nutze Fallback")
return _fallback_all_feeds(domain, feeds)
def _fallback_all_feeds(domain: str, feeds: list[dict]) -> list[dict]:
"""Fallback: Alle Feeds übernehmen mit Feed-Titel als Name."""
return [
{
"url": f["url"],
"title": f["title"],
"name": f["title"] or domain,
}
for f in feeds
]
async def get_feeds_with_metadata(tenant_id: int | None = None, source_type: str = "rss_feed") -> list[dict]:
    """Active feeds of a given type with metadata for Claude selection (global + org-specific).

    source_type: "rss_feed" (default) or "podcast_feed". Keeps RSS and podcast sources
    in separate pipelines so that the RSS hot path stays unchanged.
    """
    from database import get_db
    db = await get_db()
    try:
        if tenant_id:
            cursor = await db.execute(
                "SELECT name, url, domain, category, notes, COALESCE(article_count, 0) AS article_count FROM sources "
                "WHERE source_type = ? AND status = 'active' "
                "AND (tenant_id IS NULL OR tenant_id = ?)",
                (source_type, tenant_id),
            )
        else:
            cursor = await db.execute(
                "SELECT name, url, domain, category, notes, COALESCE(article_count, 0) AS article_count FROM sources "
                "WHERE source_type = ? AND status = 'active'",
                (source_type,),
            )
        return [dict(row) for row in await cursor.fetchall()]
    except Exception as e:
        logger.error(f"Failed to load feed metadata ({source_type}): {e}")
        return []
    finally:
        await db.close()
async def get_user_excluded_domains(user_id: int) -> list[str]:
"""Laedt die vom User ausgeschlossenen Domains."""
from database import get_db
db = await get_db()
try:
cursor = await db.execute(
"SELECT domain FROM user_excluded_domains WHERE user_id = ?",
(user_id,),
)
return [row[0] for row in await cursor.fetchall()]
except Exception as e:
logger.warning(f"Fehler beim Laden der User-Ausschluesse: {e}")
return []
finally:
await db.close()
async def get_source_rules(tenant_id: int = None) -> dict:
"""Liest Quellen-Konfiguration aus DB (global + org-spezifisch).
Returns:
dict mit:
- excluded_domains: Liste ausgeschlossener Domains
- rss_feeds: Dict mit Kategorien deutsch/international/behoerden
"""
from database import get_db
db = await get_db()
try:
if tenant_id:
cursor = await db.execute(
"SELECT * FROM sources WHERE status = 'active' AND (tenant_id IS NULL OR tenant_id = ?)",
(tenant_id,),
)
else:
cursor = await db.execute(
"SELECT * FROM sources WHERE status = 'active'"
)
sources = [dict(row) for row in await cursor.fetchall()]
excluded_domains = []
rss_feeds = {"deutsch": [], "international": [], "behoerden": []}
for source in sources:
if source["source_type"] == "excluded":
excluded_domains.append(source["domain"] or source["name"])
elif source["source_type"] == "rss_feed" and source["url"]:
feed_entry = {"name": source["name"], "url": source["url"]}
cat = source["category"]
if cat == "behoerde":
rss_feeds["behoerden"].append(feed_entry)
elif cat == "international":
rss_feeds["international"].append(feed_entry)
else:
# Alle anderen Kategorien → deutsch
rss_feeds["deutsch"].append(feed_entry)
return {
"excluded_domains": excluded_domains,
"rss_feeds": rss_feeds,
}
except Exception as e:
logger.error(f"Fehler beim Laden der Quellen-Regeln: {e}")
# Fallback auf config.py
from config import RSS_FEEDS, EXCLUDED_SOURCES
return {
"excluded_domains": list(EXCLUDED_SOURCES),
"rss_feeds": dict(RSS_FEEDS),
}
finally:
await db.close()
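
Note on the relative-URL handling in `discover_source` and `discover_all_feeds`: both resolve `href` values by hand (a leading `/` is joined to scheme and host, anything else not starting with `http` is appended to the page URL). A minimal sketch of an alternative using the standard library's `urllib.parse.urljoin` follows; the helper name `resolve_feed_href` is hypothetical and not part of this diff. It also covers protocol-relative links such as `//feeds.example.de/all.xml`, which the manual branch would join onto the page host.

```python
from urllib.parse import urljoin


def resolve_feed_href(page_url: str, href: str) -> str:
    """Resolve a <link href="..."> value against the page it was found on.

    Hypothetical helper for illustration; the name is an assumption.
    """
    # urljoin handles absolute, root-relative, path-relative and
    # protocol-relative references according to the usual URL rules
    return urljoin(page_url, href)


if __name__ == "__main__":
    page = "https://example.de/news/"
    for href in ("/feed.xml", "rss.xml", "//feeds.example.de/all.xml", "https://other.org/feed"):
        print(href, "->", resolve_feed_href(page, href))
```

One behavioral difference to be aware of: for a page URL without a trailing slash, `urljoin` resolves a bare `rss.xml` against the parent path, as browsers do, whereas the manual code appends it below the page path.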

74
src/source_meta.py Normal file

@@ -0,0 +1,74 @@
"""Single source of truth for source categories and types.

Exported by the backend via GET /api/sources/meta.
The frontend (sources.js, source-health.js, dashboard.html) loads this
on init and uses it to populate filter dropdowns and label lookups.
"""
from typing import TypedDict


class CategoryEntry(TypedDict):
    key: str
    label: str


class TypeEntry(TypedDict):
    key: str
    label: str
SOURCE_CATEGORIES: list[CategoryEntry] = [
{"key": "nachrichtenagentur", "label": "Nachrichtenagentur"},
{"key": "oeffentlich-rechtlich", "label": "Öffentlich-Rechtlich"},
{"key": "qualitaetszeitung", "label": "Qualitätszeitung"},
{"key": "behoerde", "label": "Behörde"},
{"key": "fachmedien", "label": "Fachmedien"},
{"key": "think-tank", "label": "Think-Tank"},
{"key": "international", "label": "International"},
{"key": "regional", "label": "Regional"},
{"key": "boulevard", "label": "Boulevard"},
{"key": "stimmungsbild", "label": "Forum / Stimmungsbild"},
{"key": "sonstige", "label": "Sonstige"},
{"key": "cybercrime", "label": "Cybercrime / Hacktivismus"},
{"key": "cybercrime-leaks", "label": "Cybercrime / Leaks"},
{"key": "ukraine-russland-krieg", "label": "Ukraine-Russland-Krieg"},
{"key": "irankonflikt", "label": "Irankonflikt"},
{"key": "osint-international", "label": "OSINT International"},
{"key": "extremismus-deutschland", "label": "Extremismus Deutschland"},
{"key": "russische-staatspropaganda", "label": "Russische Staatspropaganda"},
{"key": "russische-opposition", "label": "Russische Opposition / Exilmedien"},
{"key": "syrien-nahost", "label": "Syrien / Nahost"},
]
SOURCE_TYPES: list[TypeEntry] = [
{"key": "rss_feed", "label": "RSS-Feed"},
{"key": "web_source", "label": "Webquelle"},
{"key": "telegram_channel", "label": "Telegram-Kanal"},
{"key": "podcast_feed", "label": "Podcast-Feed"},
{"key": "excluded", "label": "Ausgeschlossen"},
]
def get_meta() -> dict:
    """Complete meta information for frontend consumers."""
    return {
        "categories": SOURCE_CATEGORIES,
        "types": SOURCE_TYPES,
    }


def category_label(key: str) -> str:
    """Lookup: category key -> label. Fallback: the key itself."""
    for c in SOURCE_CATEGORIES:
        if c["key"] == key:
            return c["label"]
    return key


def type_label(key: str) -> str:
    """Lookup: type key -> label. Fallback: the key itself."""
    for t in SOURCE_TYPES:
        if t["key"] == key:
            return t["label"]
    return key
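
Note on the label lookups above: `category_label` and `type_label` scan their list on every call. That is fine at this size; if the lookups ever end up on a hot path, a dict index built once at import time would be a possible alternative. A minimal sketch, assuming a shortened `SOURCE_TYPES` list and a hypothetical module-level `_TYPE_LABELS` index (neither the index nor this variant is part of the diff):

```python
# Shortened copy of the list from source_meta.py, for illustration only
SOURCE_TYPES = [
    {"key": "rss_feed", "label": "RSS-Feed"},
    {"key": "web_source", "label": "Webquelle"},
]

# Hypothetical index: built once at import time, later lookups are O(1)
_TYPE_LABELS = {entry["key"]: entry["label"] for entry in SOURCE_TYPES}


def type_label(key: str) -> str:
    """Return the label for a type key, falling back to the key itself."""
    return _TYPE_LABELS.get(key, key)


if __name__ == "__main__":
    print(type_label("rss_feed"))
    print(type_label("unknown_type"))
```

The list stays the single source of truth for `get_meta()`; the dict is only a derived view and would need rebuilding if the list were mutated at runtime.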


@@ -790,3 +790,334 @@ tr:hover td {
.audit-diff .diff-new { color: #2ecc71; word-break: break-word; }
.token-budget-bar.over-limit { background: repeating-linear-gradient(45deg, #c0392b, #c0392b 6px, #962d22 6px, #962d22 12px); }
input[type="date"].filter-select { padding: 6px 10px; }
/* === Toast-Notifications (Phase 3) === */
.toast-container {
position: fixed;
top: 24px;
right: 24px;
z-index: 9999;
display: flex;
flex-direction: column;
gap: 10px;
max-width: 380px;
pointer-events: none;
}
.toast {
background: #1e293b;
border: 1px solid #334155;
border-left-width: 4px;
border-radius: 8px;
padding: 12px 16px;
color: #e2e8f0;
font-size: 14px;
line-height: 1.4;
box-shadow: 0 8px 24px rgba(0, 0, 0, 0.3);
pointer-events: auto;
animation: toast-in 0.18s ease-out;
}
.toast.toast-out {
animation: toast-out 0.18s ease-in forwards;
}
.toast-info { border-left-color: #3b82f6; }
.toast-success { border-left-color: #10b981; }
.toast-warning { border-left-color: #f59e0b; }
.toast-error { border-left-color: #ef4444; }
@keyframes toast-in {
from { opacity: 0; transform: translateX(20px); }
to { opacity: 1; transform: translateX(0); }
}
@keyframes toast-out {
from { opacity: 1; transform: translateX(0); }
to { opacity: 0; transform: translateX(20px); }
}
/* === Sources Stats-Bar (Phase 4) === */
.sources-stats-bar {
display: flex;
flex-wrap: wrap;
gap: 14px;
padding: 12px 16px;
background: rgba(255, 255, 255, 0.03);
border: 1px solid rgba(255, 255, 255, 0.08);
border-radius: 8px;
margin-bottom: 14px;
font-size: 13px;
}
.sources-stat-item {
display: inline-flex;
align-items: baseline;
gap: 6px;
color: #94a3b8;
}
.sources-stat-value {
color: #f0b429;
font-weight: 600;
font-size: 15px;
}
.sources-stat-item.health-error .sources-stat-value { color: #ef4444; }
.sources-stat-item.health-warning .sources-stat-value { color: #f59e0b; }
.sources-stat-item.health-ok .sources-stat-value { color: #10b981; }
/* Health-Badge inline in Tabellenzeile */
.health-badge {
display: inline-block;
padding: 2px 8px;
border-radius: 10px;
font-size: 11px;
font-weight: 600;
}
.health-badge-error { background: rgba(239, 68, 68, 0.15); color: #ef4444; }
.health-badge-warning { background: rgba(245, 158, 11, 0.15); color: #f59e0b; }
.health-badge-ok { background: rgba(16, 185, 129, 0.15); color: #10b981; }
.health-badge-unknown { background: rgba(148, 163, 184, 0.15); color: #94a3b8; }
/* === Audit-Spur (Phase 5) === */
.modal.modal-large {
max-width: 720px;
}
.audit-content {
max-height: 60vh;
overflow-y: auto;
}
.audit-entry {
border: 1px solid rgba(255, 255, 255, 0.08);
border-radius: 8px;
padding: 10px 12px;
margin-bottom: 8px;
background: rgba(255, 255, 255, 0.02);
}
.audit-entry-head {
display: flex;
align-items: center;
gap: 10px;
flex-wrap: wrap;
font-size: 13px;
}
.audit-entry-action {
display: inline-block;
padding: 2px 8px;
border-radius: 10px;
font-size: 11px;
font-weight: 600;
text-transform: uppercase;
letter-spacing: 0.5px;
}
.audit-action-create { background: rgba(16, 185, 129, 0.2); color: #10b981; }
.audit-action-update { background: rgba(59, 130, 246, 0.2); color: #3b82f6; }
.audit-action-delete { background: rgba(239, 68, 68, 0.2); color: #ef4444; }
.audit-action-login_success { background: rgba(16, 185, 129, 0.15); color: #10b981; }
.audit-action-login_failed { background: rgba(245, 158, 11, 0.15); color: #f59e0b; }
.audit-action-login_blocked { background: rgba(239, 68, 68, 0.2); color: #ef4444; }
.audit-entry-meta {
color: #94a3b8;
font-size: 12px;
}
.audit-entry-detail {
margin-top: 8px;
}
.audit-entry-detail summary {
cursor: pointer;
font-size: 12px;
color: #94a3b8;
user-select: none;
}
.audit-entry-detail pre {
margin: 6px 0 0 0;
padding: 8px 10px;
background: rgba(0, 0, 0, 0.3);
border-radius: 4px;
font-size: 11px;
line-height: 1.4;
overflow-x: auto;
color: #e2e8f0;
max-height: 240px;
}
/* === Verwendungs-Sicht (Phase 6) === */
.activity-cell {
font-variant-numeric: tabular-nums;
color: #94a3b8;
font-size: 12px;
}
.activity-cell strong {
color: #e2e8f0;
font-weight: 600;
}
.activity-cell.activity-zero {
color: #475569;
}
.exclude-badge {
display: inline-block;
padding: 2px 8px;
border-radius: 10px;
font-size: 11px;
font-weight: 600;
background: rgba(239, 68, 68, 0.15);
color: #ef4444;
}
.exclude-badge.exclude-zero {
background: transparent;
color: #475569;
font-weight: 400;
}
/* === Classification review === */
.sources-tab-badge {
display: inline-flex;
align-items: center;
justify-content: center;
min-width: 20px;
padding: 0 6px;
height: 18px;
border-radius: 9px;
background: var(--accent);
color: var(--bg-primary);
font-size: 10px;
font-weight: 700;
}
.review-toolbar {
display: flex;
align-items: center;
justify-content: space-between;
padding: 10px 14px;
background: var(--bg-secondary);
border: 1px solid var(--border);
border-radius: var(--radius);
margin-bottom: 12px;
flex-wrap: wrap;
gap: 12px;
}
.review-toolbar-info {
display: flex;
align-items: center;
gap: 16px;
font-size: 13px;
color: var(--text-primary);
}
.review-conf-filter {
display: inline-flex;
align-items: center;
gap: 6px;
font-size: 12px;
color: var(--text-secondary);
}
.review-toolbar-actions { display: flex; gap: 6px; }
.review-list { display: flex; flex-direction: column; gap: 8px; }
.review-card {
background: var(--bg-secondary);
border: 1px solid var(--border);
border-radius: var(--radius);
padding: 12px 14px;
}
.review-card-header {
display: flex;
justify-content: space-between;
align-items: flex-start;
gap: 12px;
margin-bottom: 10px;
}
.review-card-title {
display: flex;
flex-wrap: wrap;
align-items: center;
gap: 8px;
}
.review-card-name { font-weight: 600; font-size: 14px; color: var(--text-primary); }
.review-card-domain { font-size: 11px; color: var(--text-muted); }
.review-global-badge {
display: inline-flex;
align-items: center;
padding: 1px 6px;
border-radius: var(--radius);
background: #5e35b1;
color: #fff;
font-size: 9px;
font-weight: 600;
letter-spacing: 0.3px;
text-transform: uppercase;
}
.review-card-confidence {
display: inline-flex;
flex-direction: column;
align-items: center;
padding: 4px 10px;
border-radius: var(--radius);
min-width: 60px;
}
.review-card-confidence .conf-value { font-size: 14px; font-weight: 700; }
.review-card-confidence .conf-label { font-size: 9px; text-transform: uppercase; letter-spacing: 0.3px; opacity: 0.8; }
.review-card-confidence.conf-high { background: rgba(34,197,94,0.15); color: var(--success); }
.review-card-confidence.conf-medium { background: rgba(245,158,11,0.15); color: var(--warning); }
.review-card-confidence.conf-low { background: rgba(239,68,68,0.15); color: var(--danger); }
.review-card-diff {
display: grid;
grid-template-columns: 1fr;
gap: 4px;
font-size: 12px;
margin-bottom: 10px;
}
.review-diff-row {
display: grid;
grid-template-columns: 130px 1fr 24px 1fr;
align-items: center;
gap: 8px;
padding: 3px 6px;
border-radius: 3px;
}
.review-diff-row.changed { background: rgba(245,158,11,0.10); }
.review-diff-label { color: var(--text-secondary); font-weight: 500; }
.review-diff-current { color: var(--text-muted); }
.review-diff-arrow { text-align: center; color: var(--text-muted); font-weight: 600; }
.review-diff-proposed { color: var(--text-primary); font-weight: 500; }
.review-diff-row.changed .review-diff-proposed { color: var(--warning); font-weight: 600; }
.review-card-reasoning {
font-size: 12px;
color: var(--text-secondary);
background: var(--bg-tertiary);
padding: 8px 10px;
border-radius: var(--radius);
margin-bottom: 10px;
line-height: 1.5;
}
.review-card-actions { display: flex; gap: 6px; flex-wrap: wrap; }
/* Edit form: classification section */
.sources-classification-section {
margin-top: 14px;
padding-top: 14px;
border-top: 1px solid var(--border);
}
.sources-classification-header {
font-size: 12px;
font-weight: 600;
color: var(--text-secondary);
margin-bottom: 10px;
letter-spacing: 0.3px;
text-transform: uppercase;
}
.alignment-chips { display: flex; flex-wrap: wrap; gap: 6px; }
.alignment-chip {
display: inline-flex;
align-items: center;
padding: 4px 10px;
border-radius: 999px;
font-size: 11px;
font-weight: 500;
background: transparent;
color: var(--text-secondary);
border: 1px solid var(--border);
cursor: pointer;
transition: all 0.12s ease;
}
.alignment-chip:hover { background: var(--bg-tertiary); color: var(--text-primary); }
.alignment-chip.active {
background: var(--accent);
color: var(--bg-primary);
border-color: var(--accent);
}
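The `.review-card-confidence` rules above define three modifier classes (`conf-high`, `conf-medium`, `conf-low`), but the stylesheet does not say which confidence value maps to which class. A minimal sketch of such a mapping follows, assuming cut-offs of 0.85 and 0.7; the helper name `confidenceClass` and both thresholds are assumptions for illustration, not taken from the commit.

```javascript
// Hypothetical helper: maps a classifier confidence (0..1) to one of the
// CSS modifier classes defined in the stylesheet.
// The default thresholds (0.85 and 0.7) are assumptions, not from the source.
function confidenceClass(confidence, high = 0.85, medium = 0.7) {
  // Missing or non-numeric values fall back to the most cautious class.
  if (!Number.isFinite(confidence)) return "conf-low";
  if (confidence >= high) return "conf-high";
  if (confidence >= medium) return "conf-medium";
  return "conf-low";
}
```

A review card could then be rendered with `class="review-card-confidence " + confidenceClass(value)`.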


@@ -6,7 +6,7 @@
<title>AegisSight Monitor-Verwaltung</title> <title>AegisSight Monitor-Verwaltung</title>
<link rel="icon" type="image/svg+xml" href="/static/favicon.svg"> <link rel="icon" type="image/svg+xml" href="/static/favicon.svg">
<link rel="apple-touch-icon" href="/static/favicon.svg"> <link rel="apple-touch-icon" href="/static/favicon.svg">
<link rel="stylesheet" href="/static/css/style.css"> <link rel="stylesheet" href="/static/css/style.css?v=20260509d">
<style> <style>
.source-badge { display:inline-block; padding:2px 8px; border-radius:4px; font-size:12px; font-weight:600; } .source-badge { display:inline-block; padding:2px 8px; border-radius:4px; font-size:12px; font-weight:600; }
@@ -59,6 +59,32 @@
</div> </div>
</div> </div>
</div> </div>
<!-- Article translation -->
<div class="card" id="translationCard" style="margin-top:16px;">
<div class="card-header">
<h2>Artikel-Übersetzung</h2>
</div>
<div class="card-body">
<p class="text-muted" style="margin-top:0;">
Die automatische Übersetzung im Monitor ist deaktiviert. Hier lassen sich
fremdsprachige Artikel ohne deutsche Fassung manuell übersetzen.
</p>
<p id="translationInfo" style="margin:12px 0;">Status wird geladen…</p>
<div id="translationProgressWrap" style="display:none; margin:12px 0;">
<div style="background:rgba(128,128,128,0.25); border-radius:6px; height:14px; overflow:hidden;">
<div id="translationProgressBar" style="background:#1565c0; height:100%; width:0%; transition:width .3s;"></div>
</div>
<p class="text-muted" id="translationProgressText" style="margin:6px 0 0;"></p>
</div>
<div style="margin-top:12px; display:flex; gap:8px;">
<button class="btn btn-primary" id="translationRunBtn">Übersetzung starten</button>
<button class="btn btn-danger" id="translationCancelBtn" style="display:none;">Abbrechen</button>
</div>
</div>
</div>
</div> </div>
<!-- Organizations Section --> <!-- Organizations Section -->
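The translation card in this hunk contains a progress bar (`translationProgressBar`) whose `width` starts at `0%`, plus a text line (`translationProgressText`). The script that drives them is not part of this hunk. A minimal sketch of the calculation follows, assuming the backend reports a done/total pair; the function name `translationProgress` and the `done / total` text format are assumptions.

```javascript
// Hypothetical helper: turns a done/total pair into the values the
// progress bar and its caption need. Inputs are clamped so a stale or
// inconsistent status never yields a width above 100% or below 0%.
function translationProgress(done, total) {
  const safeTotal = Math.max(0, Number(total) || 0);
  const safeDone = Math.min(Math.max(0, Number(done) || 0), safeTotal);
  const percent = safeTotal === 0 ? 0 : Math.round((safeDone / safeTotal) * 100);
  return { width: `${percent}%`, text: `${safeDone} / ${safeTotal}` };
}
```

In the browser the result would be applied with something like `document.getElementById("translationProgressBar").style.width = translationProgress(21, 50).width;`.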
@@ -166,6 +192,14 @@
<option value="false">Deaktiviert</option> <option value="false">Deaktiviert</option>
</select> </select>
</div> </div>
<div class="form-group">
<label for="editOrgLanguage">Pipeline-Sprache</label>
<select id="editOrgLanguage">
<option value="de">Deutsch</option>
<option value="en">English</option>
</select>
<small class="text-secondary">Bestimmt die Ausgabesprache der KI (Lagebild, Faktencheck, Recherche) und der sichtbarsten UI-Elemente für alle Nutzer dieser Organisation.</small>
</div>
<div style="display: flex; gap: 8px; margin-top: 16px;"> <div style="display: flex; gap: 8px; margin-top: 16px;">
<button type="submit" class="btn btn-primary">Speichern</button> <button type="submit" class="btn btn-primary">Speichern</button>
<button type="button" class="btn btn-danger" id="deleteOrgBtn">Organisation löschen</button> <button type="button" class="btn btn-danger" id="deleteOrgBtn">Organisation löschen</button>
@@ -294,50 +328,34 @@
<button class="nav-tab active" data-subtab="global-sources">Grundquellen</button> <button class="nav-tab active" data-subtab="global-sources">Grundquellen</button>
<button class="nav-tab" data-subtab="tenant-sources">Kundenquellen</button> <button class="nav-tab" data-subtab="tenant-sources">Kundenquellen</button>
<button class="nav-tab" data-subtab="source-health">Quellen-Health</button> <button class="nav-tab" data-subtab="source-health">Quellen-Health</button>
<button class="nav-tab" data-subtab="classification-review">Klassifikation <span class="sources-tab-badge" id="classificationPendingBadge">0</span></button>
<button class="nav-tab" data-subtab="x-scraper">X-Recherche-Konten</button>
</div> </div>
<!-- Grundquellen --> <!-- Grundquellen -->
<div class="section active" id="sub-global-sources"> <div class="section active" id="sub-global-sources">
<div class="sources-stats-bar" id="globalStatsBar"></div>
<div class="action-bar"> <div class="action-bar">
<div style="display:flex;align-items:center;gap:12px;flex-wrap:wrap;"> <div style="display:flex;align-items:center;gap:12px;flex-wrap:wrap;">
<input type="text" class="search-input" id="globalSourceSearch" placeholder="Grundquelle suchen..."> <input type="text" class="search-input" id="globalSourceSearch" placeholder="Grundquelle suchen...">
<select class="filter-select" id="globalFilterType" onchange="filterGlobalSources()"> <select class="filter-select" id="globalFilterType" onchange="filterGlobalSources()">
<option value="">Alle Typen</option> <option value="">Alle Typen</option>
<option value="rss_feed">RSS-Feed</option>
<option value="web_source">Webquelle</option>
<option value="telegram_channel">Telegram-Kanal</option>
<option value="podcast_feed">Podcast-Feed</option>
</select> </select>
<select class="filter-select" id="globalFilterCategory" onchange="filterGlobalSources()"> <select class="filter-select" id="globalFilterCategory" onchange="filterGlobalSources()">
<option value="">Alle Kategorien</option> <option value="">Alle Kategorien</option>
<option value="nachrichtenagentur">Nachrichtenagentur</option>
<option value="oeffentlich-rechtlich">Öffentlich-Rechtlich</option>
<option value="qualitaetszeitung">Qualitätszeitung</option>
<option value="behoerde">Behörde</option>
<option value="fachmedien">Fachmedien</option>
<option value="think-tank">Think-Tank</option>
<option value="international">International</option>
<option value="regional">Regional</option>
<option value="boulevard">Boulevard</option>
<option value="sonstige">Sonstige</option>
<option value="cybercrime">Cybercrime / Hacktivismus</option>
<option value="cybercrime-leaks">Cybercrime / Leaks</option>
<option value="ukraine-russland-krieg">Ukraine-Russland-Krieg</option>
<option value="irankonflikt">Irankonflikt</option>
<option value="osint-international">OSINT International</option>
<option value="extremismus-deutschland">Extremismus Deutschland</option>
<option value="russische-staatspropaganda">Russische Staatspropaganda</option>
<option value="russische-opposition">Russische Opposition / Exilmedien</option>
<option value="syrien-nahost">Syrien / Nahost</option>
</select> </select>
<select class="filter-select" id="globalFilterStatus" onchange="filterGlobalSources()"> <select class="filter-select" id="globalFilterStatus" onchange="filterGlobalSources()">
<option value="">Alle Status</option> <option value="">Alle Status</option>
<option value="active">Aktiv</option> <option value="active">Aktiv</option>
<option value="inactive">Inaktiv</option> <option value="inactive">Inaktiv</option>
</select> </select>
<select class="filter-select" id="globalFilterLanguage" onchange="filterGlobalSources()">
<option value="">Alle Sprachen</option>
</select>
<span class="text-secondary" id="globalSourceCount"></span> <span class="text-secondary" id="globalSourceCount"></span>
</div> </div>
<button class="btn btn-secondary" id="discoverSourceBtn">Erkennen</button> <button class="btn btn-secondary" id="discoverSourceBtn">Erkennen</button>
<button class="btn btn-secondary" id="newPdfSourceBtn" style="margin-right:8px;">+ PDF hochladen</button>
<button class="btn btn-primary" id="newGlobalSourceBtn">+ Neue Grundquelle</button> <button class="btn btn-primary" id="newGlobalSourceBtn">+ Neue Grundquelle</button>
</div> </div>
<div class="card"> <div class="card">
@@ -350,6 +368,12 @@
<th class="sortable" data-sort="domain" onclick="sortGlobalSources('domain')">Domain <span class="sort-icon"></span></th> <th class="sortable" data-sort="domain" onclick="sortGlobalSources('domain')">Domain <span class="sort-icon"></span></th>
<th class="sortable" data-sort="source_type" onclick="sortGlobalSources('source_type')">Typ <span class="sort-icon"></span></th> <th class="sortable" data-sort="source_type" onclick="sortGlobalSources('source_type')">Typ <span class="sort-icon"></span></th>
<th class="sortable" data-sort="article_count" onclick="sortGlobalSources('article_count')">Artikel <span class="sort-icon"></span></th> <th class="sortable" data-sort="article_count" onclick="sortGlobalSources('article_count')">Artikel <span class="sort-icon"></span></th>
<th class="sortable" data-sort="articles_30d" onclick="sortGlobalSources('articles_30d')">Aktivität <span class="sort-icon"></span></th>
<th class="sortable" data-sort="tenant_excluded_count" onclick="sortGlobalSources('tenant_excluded_count')">Sperren <span class="sort-icon"></span></th>
<th class="sortable" data-sort="language" onclick="sortGlobalSources('language')">Sprache <span class="sort-icon"></span></th>
<th>Bias</th>
<th class="sortable" data-sort="last_seen_at" onclick="sortGlobalSources('last_seen_at')">Letzter Treffer <span class="sort-icon"></span></th>
<th class="sortable" data-sort="health_status" onclick="sortGlobalSources('health_status')">Health <span class="sort-icon"></span></th>
<th class="sortable" data-sort="status" onclick="sortGlobalSources('status')">Status <span class="sort-icon"></span></th> <th class="sortable" data-sort="status" onclick="sortGlobalSources('status')">Status <span class="sort-icon"></span></th>
<th>Aktionen</th> <th>Aktionen</th>
</tr> </tr>
@@ -363,21 +387,39 @@
<!-- Kundenquellen --> <!-- Kundenquellen -->
<div class="section" id="sub-tenant-sources"> <div class="section" id="sub-tenant-sources">
<div class="action-bar"> <div class="action-bar">
<div style="display:flex;align-items:center;gap:12px;"> <div style="display:flex;align-items:center;gap:12px;flex-wrap:wrap;">
<input type="text" class="search-input" id="tenantSourceSearch" placeholder="Kundenquelle suchen..."> <input type="text" class="search-input" id="tenantSourceSearch" placeholder="Kundenquelle suchen...">
<select class="filter-select" id="tenantFilterType" onchange="filterTenantSources()">
<option value="">Alle Typen</option>
</select>
<select class="filter-select" id="tenantFilterCategory" onchange="filterTenantSources()">
<option value="">Alle Kategorien</option>
</select>
<select class="filter-select" id="tenantFilterOrg" onchange="filterTenantSources()">
<option value="">Alle Organisationen</option>
</select>
<select class="filter-select" id="tenantFilterLanguage" onchange="filterTenantSources()">
<option value="">Alle Sprachen</option>
</select>
<span class="text-secondary" id="tenantSourceCount"></span> <span class="text-secondary" id="tenantSourceCount"></span>
</div> </div>
<button class="btn btn-primary" id="tenantBulkPromoteBtn" disabled onclick="bulkPromoteSelected()">
Ausgewählte übernehmen (0)
</button>
</div> </div>
<div class="card"> <div class="card">
<div class="table-wrap"> <div class="table-wrap">
<table> <table>
<thead> <thead>
<tr> <tr>
<th>Name</th> <th style="width:32px;"><input type="checkbox" id="tenantSelectAll" onchange="toggleTenantSelectAll(this.checked)"></th>
<th>Domain</th> <th class="sortable" data-sort="name" onclick="sortTenantSources('name')">Name <span class="sort-icon"></span></th>
<th>Typ</th> <th class="sortable" data-sort="domain" onclick="sortTenantSources('domain')">Domain <span class="sort-icon"></span></th>
<th>Kategorie</th> <th class="sortable" data-sort="source_type" onclick="sortTenantSources('source_type')">Typ <span class="sort-icon"></span></th>
<th>Organisation</th> <th class="sortable" data-sort="category" onclick="sortTenantSources('category')">Kategorie <span class="sort-icon"></span></th>
<th class="sortable" data-sort="org_name" onclick="sortTenantSources('org_name')">Organisation <span class="sort-icon"></span></th>
<th class="sortable" data-sort="language" onclick="sortTenantSources('language')">Sprache <span class="sort-icon"></span></th>
<th>Bias</th>
<th>Hinzugefügt von</th> <th>Hinzugefügt von</th>
<th>Aktionen</th> <th>Aktionen</th>
</tr> </tr>
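The tenant-source table gains a select-all checkbox (`tenantSelectAll`) and a bulk button (`tenantBulkPromoteBtn`) that starts out `disabled` with the label `Ausgewählte übernehmen (0)`. How the label and the disabled state follow the selection is not shown in this diff. One possible sketch follows; the helper name `bulkPromoteState` and the idea of tracking selected source IDs in a collection are assumptions.

```javascript
// Hypothetical helper: derives the bulk button's label and disabled flag
// from the currently selected source IDs (array or Set; duplicates ignored).
function bulkPromoteState(selectedIds) {
  const count = new Set(selectedIds).size;
  return {
    label: `Ausgewählte übernehmen (${count})`,
    disabled: count === 0,
  };
}
```

Calling it after every checkbox change keeps the counter in the label and the button state in sync from a single place.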
@@ -388,17 +430,80 @@
</div> </div>
</div> </div>
<!-- Quellen-Health --> <!-- Quellen-Health (sub-tab): three areas as sub-sub-tabs;
source-health.js renders each area into its own container. -->
<div class="section" id="sub-source-health"> <div class="section" id="sub-source-health">
<div class="action-bar"> <div class="nav-tabs" id="healthSubTabs" style="margin-top:0;">
<h2 style="font-size:16px;font-weight:600;">Quellen-Health & Vorschläge</h2> <button class="nav-tab active" data-healthtab="suggestions">Vorschläge</button>
<button class="btn btn-primary" id="runHealthCheckBtn" onclick="runHealthCheck()">Jetzt prüfen</button> <button class="nav-tab" data-healthtab="checks">Health-Status</button>
<button class="nav-tab" data-healthtab="verlauf">Verlauf</button>
</div> </div>
<div id="healthContent"> <div id="ht-suggestions" class="health-pane active"></div>
<div class="text-muted" style="padding:20px;">Tab auswählen um Health-Daten zu laden...</div> <div id="ht-checks" class="health-pane" style="display:none;"></div>
<div id="ht-verlauf" class="health-pane" style="display:none;"></div>
</div>
<!-- Classification review -->
<div class="section" id="sub-classification-review">
<div class="action-bar review-toolbar">
<div class="review-toolbar-info">
<span><strong id="reviewPendingCount">0</strong> Vorschläge ausstehend</span>
<label class="review-conf-filter">
Mindest-Konfidenz:
<select class="filter-select" id="reviewMinConfidence" onchange="loadClassificationQueue()">
<option value="0">alle</option>
<option value="0.5">0.5+</option>
<option value="0.7">0.7+</option>
<option value="0.85">0.85+</option>
<option value="0.9">0.9+</option>
</select>
</label>
</div>
<div class="review-toolbar-actions">
<button class="btn btn-secondary btn-small" onclick="triggerExternalReputationSync()" title="IFCN-Faktenchecker und EUvsDisinfo neu syncen">Externe Daten syncen</button>
<button class="btn btn-secondary btn-small" onclick="triggerBulkClassify()" title="LLM-Klassifikation für noch unklassifizierte Quellen starten">+ Klassifikation starten</button>
<button class="btn btn-primary btn-small" onclick="bulkApproveHighConfidence()" title="Alle Vorschläge ab 0.85 Konfidenz übernehmen">Alle ≥ 0.85 genehmigen</button>
</div>
</div>
<div class="card">
<div class="review-list" id="classificationReviewList">
<div class="text-muted" style="padding:24px;text-align:center;">Lade Review-Queue…</div>
</div>
</div> </div>
</div> </div>
</div>
<!-- X-Recherche-Konten (Sub-Tab) -->
<div class="section" id="sub-x-scraper">
<div class="action-bar">
<div style="display:flex;align-items:center;gap:12px;flex-wrap:wrap;">
<span class="text-secondary" id="xScraperCount"></span>
</div>
<div style="display:flex;gap:8px;">
<button class="btn btn-secondary" onclick="resetXScraperLocks()">Sperren zurücksetzen</button>
<button class="btn btn-primary" onclick="openXScraperAddModal()">+ Konto hinzufügen</button>
</div>
</div>
<div class="card">
<p class="text-secondary" style="padding:0 4px 12px;">X-Login-Konten, mit denen der Monitor bei X recherchiert. Mehr Konten bedeuten paralleleres, schnelleres Scrapen. Cookies laufen periodisch ab und müssen dann erneuert werden.</p>
<div class="table-wrap">
<table>
<thead>
<tr>
<th>Benutzername</th>
<th>E-Mail</th>
<th>Status</th>
<th>Anfragen</th>
<th>Letzte Nutzung</th>
<th>Aktionen</th>
</tr>
</thead>
<tbody id="xScraperTable"></tbody>
</table>
</div>
</div>
</div>
</div> <!-- /sec-sources -->
<!-- Audit-Log Section --> <!-- Audit-Log Section -->
<div class="section" id="sec-audit"> <div class="section" id="sec-audit">
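The X-Recherche-Konten sub-tab above depends on cookies that expire periodically, and the account forms in this diff ask for a string of the form `auth_token=...; ct0=...` containing at least `auth_token` and `ct0`. A minimal sketch of a client-side check before submitting follows; the function names and the idea of validating in the browser are assumptions, and the real validation may happen only on the server.

```javascript
// Cookie names the form hint names as the minimum.
const REQUIRED_X_COOKIES = ["auth_token", "ct0"];

// Hypothetical helper: parses "name=value; name2=value2" into an object.
// Only the first "=" separates name and value, so values may contain "=".
function parseCookieString(raw) {
  const cookies = {};
  for (const part of String(raw).split(";")) {
    const index = part.indexOf("=");
    if (index === -1) continue; // skip fragments without a value
    const name = part.slice(0, index).trim();
    const value = part.slice(index + 1).trim();
    if (name) cookies[name] = value;
  }
  return cookies;
}

// Hypothetical helper: returns the required cookie names that are
// missing or empty; an empty array means the string looks usable.
function missingXCookies(raw) {
  const cookies = parseCookieString(raw);
  return REQUIRED_X_COOKIES.filter((name) => !cookies[name]);
}
```

A form handler could show the returned names in the error element (`xScraperAddError` or `xScraperCookiesError`) instead of sending an incomplete string.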
@@ -410,6 +515,7 @@
<select class="filter-select" id="auditFilterResource"> <select class="filter-select" id="auditFilterResource">
<option value="">Alle Ressourcen</option> <option value="">Alle Ressourcen</option>
</select> </select>
<input type="number" class="filter-select" id="auditFilterResourceId" placeholder="Ressourcen-ID" min="1" style="width:130px;">
<select class="filter-select" id="auditFilterAdmin"> <select class="filter-select" id="auditFilterAdmin">
<option value="">Alle Admins</option> <option value="">Alle Admins</option>
</select> </select>
@@ -460,6 +566,14 @@
<label for="newOrgSlug">Slug (URL-freundlich)</label> <label for="newOrgSlug">Slug (URL-freundlich)</label>
<input type="text" id="newOrgSlug" required pattern="[a-z0-9-]+" placeholder="z.B. bundespolizei"> <input type="text" id="newOrgSlug" required pattern="[a-z0-9-]+" placeholder="z.B. bundespolizei">
</div> </div>
<div class="form-group">
<label for="newOrgLanguage">Pipeline-Sprache</label>
<select id="newOrgLanguage">
<option value="de" selected>Deutsch</option>
<option value="en">English</option>
</select>
<small class="text-secondary">Steuert die Ausgabesprache der KI-Pipeline (Lagebild, Faktencheck, Recherche) und die sichtbarsten UI-Strings im Monitor.</small>
</div>
<div id="newOrgError" class="error-msg" style="display:none"></div> <div id="newOrgError" class="error-msg" style="display:none"></div>
</div> </div>
<div class="modal-footer"> <div class="modal-footer">
@@ -586,6 +700,7 @@
<option value="telegram_channel">Telegram-Kanal</option> <option value="telegram_channel">Telegram-Kanal</option>
<option value="podcast_feed">Podcast-Feed</option> <option value="podcast_feed">Podcast-Feed</option>
<option value="excluded">Ausgeschlossen</option> <option value="excluded">Ausgeschlossen</option>
<option value="pdf_document" disabled>PDF-Dokument (nur Upload)</option>
</select> </select>
</div> </div>
<div class="form-group"> <div class="form-group">
@@ -613,17 +728,127 @@
</select> </select>
</div> </div>
</div> </div>
<div style="display:grid;grid-template-columns:1fr 1fr;gap:12px;">
<div class="form-group">
<label for="sourceStatus">Status</label>
<select id="sourceStatus">
<option value="active">Aktiv</option>
<option value="inactive">Inaktiv</option>
</select>
</div>
<div class="form-group">
<label for="sourceLanguage">Sprache</label>
<input type="text" id="sourceLanguage" list="languageSuggestions" placeholder="z.B. Deutsch, Englisch, Russisch">
<datalist id="languageSuggestions"></datalist>
</div>
</div>
<div class="form-group"> <div class="form-group">
<label for="sourceStatus">Status</label> <label for="sourceBias">Bias / Einordnung</label>
<select id="sourceStatus"> <input type="text" id="sourceBias" placeholder="z.B. Nachrichtenagentur, faktenbasiert-neutral" maxlength="500">
<option value="active">Aktiv</option> </div>
<option value="inactive">Inaktiv</option> <div class="form-group">
<label for="sourceFetchStrategy">Fetch-Strategie (Health-Check)</label>
<select id="sourceFetchStrategy">
<option value="default">Standard (UA + Retry mit Googlebot bei 403)</option>
<option value="googlebot">Googlebot (direkt - für SEO-freundliche Sites)</option>
<option value="paywall">Paywall (via removepaywalls.com - z.B. FT, Spiegel+)</option>
<option value="skip">Skip (Health-Check überspringen)</option>
</select> </select>
</div> </div>
<div class="form-group"> <div class="form-group">
<label for="sourceNotes">Notizen</label> <label for="sourceNotes">Notizen</label>
<input type="text" id="sourceNotes" placeholder="Optional"> <input type="text" id="sourceNotes" placeholder="Optional">
</div> </div>
<div class="sources-classification-section">
<div class="sources-classification-header">Einordnung (Klassifikation)</div>
<div style="display:grid;grid-template-columns:1fr 1fr 1fr;gap:12px;">
<div class="form-group">
<label for="sourcePolitical">Politische Ausrichtung</label>
<select id="sourcePolitical">
<option value="">— unverändert —</option>
<option value="na">Nicht eingeordnet</option>
<option value="links_extrem">Links (extrem)</option>
<option value="links">Links</option>
<option value="mitte_links">Mitte-Links</option>
<option value="liberal">Liberal</option>
<option value="mitte">Mitte</option>
<option value="konservativ">Konservativ</option>
<option value="mitte_rechts">Mitte-Rechts</option>
<option value="rechts">Rechts</option>
<option value="rechts_extrem">Rechts (extrem)</option>
</select>
</div>
<div class="form-group">
<label for="sourceMediaType">Medientyp</label>
<select id="sourceMediaType">
<option value="">— unverändert —</option>
<option value="sonstige">Sonstige</option>
<option value="tageszeitung">Tageszeitung</option>
<option value="wochenzeitung">Wochenzeitung</option>
<option value="magazin">Magazin</option>
<option value="tv_sender">TV-Sender</option>
<option value="radio">Radio</option>
<option value="oeffentlich_rechtlich">Öffentlich-Rechtlich</option>
<option value="nachrichtenagentur">Nachrichtenagentur</option>
<option value="online_only">Online-only</option>
<option value="blog">Blog</option>
<option value="telegram_kanal">Telegram-Kanal</option>
<option value="telegram_bot">Telegram-Bot</option>
<option value="podcast">Podcast</option>
<option value="social_media">Social Media</option>
<option value="imageboard">Imageboard</option>
<option value="think_tank">Think Tank</option>
<option value="ngo">NGO</option>
<option value="behoerde">Behörde</option>
<option value="staatsmedium">Staatsmedium</option>
<option value="fachmedium">Fachmedium</option>
</select>
</div>
<div class="form-group">
<label for="sourceReliability">Glaubwürdigkeit</label>
<select id="sourceReliability">
<option value="">— unverändert —</option>
<option value="na">Nicht eingeordnet</option>
<option value="sehr_hoch">Sehr hoch</option>
<option value="hoch">Hoch</option>
<option value="gemischt">Gemischt</option>
<option value="niedrig">Niedrig</option>
<option value="sehr_niedrig">Sehr niedrig</option>
</select>
</div>
</div>
<div style="display:grid;grid-template-columns:1fr 1fr;gap:12px;margin-top:8px;">
<div class="form-group">
<label for="sourceCountryCode">Land (ISO 3166)</label>
<input type="text" id="sourceCountryCode" maxlength="2" placeholder="z.B. DE, RU, US" style="text-transform:uppercase;">
</div>
<div class="form-group">
<label class="checkbox-label" style="display:flex;align-items:center;gap:8px;margin-top:24px;">
<input type="checkbox" id="sourceStateAffiliated">
<span>Staatsnah / staatlich kontrolliert</span>
</label>
</div>
</div>
<div class="form-group" style="margin-top:8px;">
<label>Geopolitische Nähe (Mehrfachauswahl)</label>
<div id="sourceAlignmentChips" class="alignment-chips" onclick="handleAlignmentChipClick(event)">
<button type="button" class="alignment-chip" data-alignment="prorussisch">prorussisch</button>
<button type="button" class="alignment-chip" data-alignment="proiranisch">proiranisch</button>
<button type="button" class="alignment-chip" data-alignment="prowestlich">prowestlich</button>
<button type="button" class="alignment-chip" data-alignment="proukrainisch">proukrainisch</button>
<button type="button" class="alignment-chip" data-alignment="prochinesisch">prochinesisch</button>
<button type="button" class="alignment-chip" data-alignment="projapanisch">projapanisch</button>
<button type="button" class="alignment-chip" data-alignment="proisraelisch">proisraelisch</button>
<button type="button" class="alignment-chip" data-alignment="propalaestinensisch">propalästinensisch</button>
<button type="button" class="alignment-chip" data-alignment="protuerkisch">protürkisch</button>
<button type="button" class="alignment-chip" data-alignment="panarabisch">panarabisch</button>
<button type="button" class="alignment-chip" data-alignment="neutral">neutral</button>
<button type="button" class="alignment-chip" data-alignment="sonstige">sonstige</button>
</div>
</div>
</div>
<div id="sourceError" class="error-msg" style="display:none"></div> <div id="sourceError" class="error-msg" style="display:none"></div>
</div> </div>
<div class="modal-footer"> <div class="modal-footer">
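The alignment chips in this hunk share one `onclick="handleAlignmentChipClick(event)"` on their container, and the stylesheet marks a selected chip with the `active` class. The handler body is not part of this diff. The following is a sketch of how such a delegated handler could work, assuming selection is stored only as the `active` class; `selectedAlignments` is a hypothetical companion helper.

```javascript
// Sketch of a delegated click handler: one listener on the container,
// the clicked chip is found via closest(). Not the commit's actual code.
function handleAlignmentChipClick(event) {
  const chip = event.target.closest(".alignment-chip");
  if (!chip) return; // click landed on the gap between chips
  chip.classList.toggle("active");
}

// Hypothetical helper: collects the data-alignment values of all
// active chips, e.g. to send them with the edit form.
function selectedAlignments(container) {
  return Array.from(
    container.querySelectorAll(".alignment-chip.active"),
    (chip) => chip.dataset.alignment
  );
}
```

Delegation keeps the markup free of per-button handlers, which matches the single `onclick` on `sourceAlignmentChips`.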
@@ -634,6 +859,59 @@
</div> </div>
</div> </div>
<!-- Modal: upload PDF -->
<div class="modal-overlay" id="modalPdfUpload">
<div class="modal">
<div class="modal-header">
<h3>PDF als Quelle hochladen</h3>
<button class="modal-close" onclick="closeModal('modalPdfUpload')">&times;</button>
</div>
<form id="pdfUploadForm" enctype="multipart/form-data">
<div class="modal-body">
<p class="text-secondary" style="margin-top:0;">
Die PDF wird gespeichert und vom Monitor automatisch verarbeitet:
Text extrahieren (OCR-Fallback für gescannte Dokumente),
Übersetzung nach Deutsch und Englisch.
</p>
<div class="form-group">
<label for="pdfFile">PDF-Datei (max. 50 MB)</label>
<input type="file" id="pdfFile" accept="application/pdf,.pdf" required>
</div>
<div class="form-group">
<label for="pdfName">Anzeige-Name (optional)</label>
<input type="text" id="pdfName" maxlength="200" placeholder="leer = Dateiname">
</div>
<div style="display:grid;grid-template-columns:1fr 1fr;gap:12px;">
<div class="form-group">
<label for="pdfCategory">Kategorie</label>
<select id="pdfCategory">
<option value="sonstige" selected>Sonstige</option>
<option value="behoerde">Behörde</option>
<option value="think-tank">Think-Tank</option>
<option value="fachmedien">Fachmedien</option>
<option value="international">International</option>
</select>
</div>
<div class="form-group">
<label for="pdfLanguage">Sprache (optional)</label>
<input type="text" id="pdfLanguage" list="languageSuggestions" placeholder="z.B. Deutsch, Englisch">
</div>
</div>
<div class="form-group">
<label for="pdfNotes">Notizen</label>
<input type="text" id="pdfNotes" placeholder="Optional">
</div>
<div id="pdfUploadError" class="error-msg" style="display:none"></div>
<div id="pdfUploadProgress" class="text-secondary" style="display:none;margin-top:8px;">Lädt hoch …</div>
</div>
<div class="modal-footer">
<button type="button" class="btn btn-secondary" onclick="closeModal('modalPdfUpload')">Abbrechen</button>
<button type="submit" class="btn btn-primary" id="pdfUploadSubmitBtn">Hochladen</button>
</div>
</form>
</div>
</div>
<!-- Modal: Discover Sources --> <!-- Modal: Discover Sources -->
<div class="modal-overlay" id="modalDiscover"> <div class="modal-overlay" id="modalDiscover">
<div class="modal" style="max-width:600px;"> <div class="modal" style="max-width:600px;">
@@ -661,6 +939,20 @@
</div> </div>
</div> </div>
<!-- Modal: audit trail of a resource (Phase 5) -->
<div class="modal-overlay" id="modalAudit">
<div class="modal modal-large">
<div class="modal-header">
<h3 id="auditTitle">Audit-Spur</h3>
<button class="modal-close" onclick="closeModal('modalAudit')">&times;</button>
</div>
<div class="modal-body">
<div id="auditContent" class="audit-content">Lade...</div>
</div>
</div>
</div>
<!-- Modal: Confirm --> <!-- Modal: Confirm -->
<div class="modal-overlay" id="modalConfirm"> <div class="modal-overlay" id="modalConfirm">
<div class="modal" style="max-width: 400px;"> <div class="modal" style="max-width: 400px;">
@@ -672,15 +964,85 @@
<p class="confirm-text" id="confirmText"></p> <p class="confirm-text" id="confirmText"></p>
</div> </div>
<div class="modal-footer"> <div class="modal-footer">
<button class="btn btn-secondary" onclick="closeModal('modalConfirm')">Abbrechen</button> <button class="btn btn-secondary" id="confirmCancelBtn" onclick="closeModal('modalConfirm')">Abbrechen</button>
<button class="btn btn-danger" id="confirmOkBtn">Bestätigen</button> <button class="btn btn-danger" id="confirmOkBtn">Bestätigen</button>
</div> </div>
</div> </div>
</div> </div>
<script src="/static/js/app.js"></script> <!-- Modal: X-Recherche-Konto hinzufügen -->
<script src="/static/js/sources.js"></script> <div class="modal-overlay" id="modalXScraperAdd">
<script src="/static/js/source-health.js"></script> <div class="modal">
<script src="/static/js/audit.js"></script> <div class="modal-header">
<h3>X-Recherche-Konto hinzufügen</h3>
<button class="modal-close" onclick="closeModal('modalXScraperAdd')">&times;</button>
</div>
<form id="xScraperAddForm">
<div class="modal-body">
<div class="form-group">
<label for="xsUsername">X-Benutzername</label>
<input type="text" id="xsUsername" required placeholder="Login-Handle des Kontos, ohne @">
</div>
<div class="form-group">
<label for="xsPassword">X-Passwort</label>
<input type="password" id="xsPassword" placeholder="optional">
</div>
<div class="form-group">
<label for="xsEmail">E-Mail</label>
<input type="text" id="xsEmail" placeholder="optional, z.B. konto@protonmail.com">
</div>
<div class="form-group">
<label for="xsEmailPassword">E-Mail-Passwort</label>
<input type="password" id="xsEmailPassword" placeholder="optional">
</div>
<div class="form-group">
<label for="xsCookies">Cookies</label>
<textarea id="xsCookies" rows="3" required placeholder="auth_token=...; ct0=..."></textarea>
<small class="text-secondary">Aus dem eingeloggten X-Browser exportiert, mindestens auth_token und ct0.</small>
</div>
<div id="xScraperAddError" class="error-msg" style="display:none"></div>
</div>
<div class="modal-footer">
<button type="button" class="btn btn-secondary" onclick="closeModal('modalXScraperAdd')">Abbrechen</button>
<button type="submit" class="btn btn-primary">Konto anlegen</button>
</div>
</form>
</div>
</div>
<!-- Modal: X-Recherche-Konto Cookies erneuern -->
<div class="modal-overlay" id="modalXScraperCookies">
<div class="modal">
<div class="modal-header">
<h3>Cookies erneuern</h3>
<button class="modal-close" onclick="closeModal('modalXScraperCookies')">&times;</button>
</div>
<form id="xScraperCookiesForm">
<div class="modal-body">
<div class="form-group">
<label for="xsCookiesUsername">Konto</label>
<input type="text" id="xsCookiesUsername" readonly>
</div>
<div class="form-group">
<label for="xsCookiesValue">Neue Cookies</label>
<textarea id="xsCookiesValue" rows="3" required placeholder="auth_token=...; ct0=..."></textarea>
<small class="text-secondary">Frisch aus dem eingeloggten X-Browser exportieren.</small>
</div>
<div id="xScraperCookiesError" class="error-msg" style="display:none"></div>
</div>
<div class="modal-footer">
<button type="button" class="btn btn-secondary" onclick="closeModal('modalXScraperCookies')">Abbrechen</button>
<button type="submit" class="btn btn-primary">Cookies setzen</button>
</div>
</form>
</div>
</div>
<script src="/static/js/app.js?v=20260522a"></script>
<script src="/static/js/sources.js?v=20260522x2"></script>
<script src="/static/js/x-scraper.js?v=20260522a"></script>
<script src="/static/js/source-health.js?v=20260509l"></script>
<script src="/static/js/audit.js?v=20260509d"></script>
<div id="toastContainer" class="toast-container" aria-live="polite" aria-atomic="true"></div>
</body> </body>
</html> </html>


@@ -3,10 +3,11 @@
<head> <head>
<meta charset="UTF-8"> <meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0"> <meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>AegisSight Monitor-Verwaltung - Login</title> <meta name="robots" content="noindex, nofollow">
<title>AegisSight Monitor-Verwaltung - Anmeldung</title>
<link rel="icon" type="image/svg+xml" href="/static/favicon.svg"> <link rel="icon" type="image/svg+xml" href="/static/favicon.svg">
<link rel="apple-touch-icon" href="/static/favicon.svg"> <link rel="apple-touch-icon" href="/static/favicon.svg">
<link rel="stylesheet" href="/static/css/style.css"> <link rel="stylesheet" href="/static/css/style.css?v=20260509d">
</head> </head>
<body class="login-page"> <body class="login-page">
<div class="login-container"> <div class="login-container">
@@ -16,68 +17,143 @@
<p class="subtitle">Monitor-Verwaltung</p> <p class="subtitle">Monitor-Verwaltung</p>
</div> </div>
<form id="loginForm" class="login-form"> <!-- Schritt 1: Email-Eingabe -->
<form id="magicForm" class="login-form">
<div class="form-group"> <div class="form-group">
<label for="username">Benutzername</label> <label for="email">E-Mail-Adresse</label>
<input type="text" id="username" name="username" required autocomplete="username" autofocus> <input type="email" id="email" name="email" required autocomplete="email" autofocus
</div> placeholder="info@aegis-sight.de">
<div class="form-group">
<label for="password">Passwort</label>
<input type="password" id="password" name="password" required autocomplete="current-password">
</div> </div>
<div id="loginError" class="error-msg" style="display:none"></div> <div id="loginError" class="error-msg" style="display:none"></div>
<button type="submit" class="btn btn-primary btn-full" id="loginBtn">Anmelden</button> <button type="submit" class="btn btn-primary btn-full" id="magicBtn">Login-Link anfordern</button>
<p class="form-hint" style="margin-top:14px;text-align:center;font-size:12px;color:#94a3b8;">
Wir senden dir einen einmaligen Login-Link per E-Mail.
</p>
</form> </form>
<!-- Schritt 2: Bestätigung nach Versand -->
<div id="sentInfo" style="display:none;text-align:center;">
<div style="font-size:42px;margin:8px 0 16px 0;">&#9993;</div>
<h2 style="font-size:17px;margin:0 0 8px 0;">E-Mail unterwegs</h2>
<p style="margin:0 0 18px 0;font-size:14px;color:#94a3b8;line-height:1.5;">
Wenn die Adresse berechtigt ist, hast du gleich einen Login-Link in deinem Posteingang.
Der Link ist 10 Minuten gültig.
</p>
<button type="button" class="btn btn-secondary btn-full" onclick="resetForm()">Andere E-Mail-Adresse</button>
</div>
<!-- Schritt 3: Verify (während Token-Prüfung) -->
<div id="verifying" style="display:none;text-align:center;">
<div class="spinner" style="margin:8px auto 16px;"></div>
<h2 style="font-size:17px;margin:0 0 8px 0;">Anmeldung wird geprüft...</h2>
<p style="margin:0;font-size:14px;color:#94a3b8;">Einen Moment bitte.</p>
</div>
</div> </div>
</div> </div>
<script> <style>
const form = document.getElementById('loginForm'); .spinner {
const errorEl = document.getElementById('loginError'); width: 36px; height: 36px;
const btn = document.getElementById('loginBtn'); border: 3px solid rgba(240,180,41,0.2);
border-top-color: #f0b429;
border-radius: 50%;
animation: spin 0.8s linear infinite;
}
@keyframes spin { to { transform: rotate(360deg); } }
.form-hint { font-size: 12px; color: #94a3b8; margin-top: 8px; }
</style>
<script>
const form = document.getElementById('magicForm');
const sentInfo = document.getElementById('sentInfo');
const verifying = document.getElementById('verifying');
const errorEl = document.getElementById('loginError');
const btn = document.getElementById('magicBtn');
function resetForm() {
sentInfo.style.display = 'none';
verifying.style.display = 'none';
form.style.display = '';
errorEl.style.display = 'none';
document.getElementById('email').value = '';
document.getElementById('email').focus();
}
function showError(msg) {
form.style.display = '';
sentInfo.style.display = 'none';
verifying.style.display = 'none';
errorEl.textContent = msg;
errorEl.style.display = 'block';
}
// --- Request magic link ---
form.addEventListener('submit', async (e) => { form.addEventListener('submit', async (e) => {
e.preventDefault(); e.preventDefault();
errorEl.style.display = 'none'; errorEl.style.display = 'none';
btn.disabled = true; btn.disabled = true;
btn.textContent = 'Anmeldung...'; btn.textContent = 'Sende...';
try { try {
const res = await fetch('/api/auth/login', { const res = await fetch('/api/auth/magic-link', {
method: 'POST', method: 'POST',
headers: { 'Content-Type': 'application/json' }, headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ body: JSON.stringify({ email: document.getElementById('email').value }),
username: document.getElementById('username').value,
password: document.getElementById('password').value,
}),
}); });
if (res.status === 429) {
const data = await res.json().catch(() => ({}));
throw new Error(data.detail || 'Zu viele Fehlversuche. Bitte 15 Minuten warten.');
}
if (!res.ok) { if (!res.ok) {
const data = await res.json(); const data = await res.json().catch(() => ({}));
throw new Error(data.detail || 'Anmeldung fehlgeschlagen'); throw new Error(data.detail || `Fehler ${res.status}`);
} }
// Success (or generic response): show the confirmation view
const data = await res.json(); form.style.display = 'none';
localStorage.setItem('token', data.access_token); sentInfo.style.display = '';
localStorage.setItem('username', data.username);
window.location.href = '/dashboard';
} catch (err) { } catch (err) {
errorEl.textContent = err.message; showError(err.message);
errorEl.style.display = 'block';
} finally { } finally {
btn.disabled = false; btn.disabled = false;
btn.textContent = 'Anmelden'; btn.textContent = 'Login-Link anfordern';
} }
}); });
// Redirect if already logged in // --- Verify token from URL (step 3) ---
if (localStorage.getItem('token')) { async function verifyTokenFromUrl() {
const params = new URLSearchParams(window.location.search);
const token = params.get('token');
if (!token) return;
form.style.display = 'none';
verifying.style.display = '';
try {
const res = await fetch('/api/auth/verify', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ token }),
});
if (!res.ok) {
const data = await res.json().catch(() => ({}));
throw new Error(data.detail || 'Login-Link ungültig');
}
const data = await res.json();
localStorage.setItem('token', data.access_token);
localStorage.setItem('username', data.username);
if (data.email) localStorage.setItem('email', data.email);
// Remove the token from the URL so it does not remain in the browser history
window.history.replaceState({}, '', '/');
window.location.href = '/dashboard';
} catch (err) {
showError(err.message);
// Remove the token from the URL on error
window.history.replaceState({}, '', '/');
}
}
// Already logged in? -> go straight to the dashboard
if (localStorage.getItem('token') && !window.location.search.includes('token=')) {
window.location.href = '/dashboard'; window.location.href = '/dashboard';
} else {
verifyTokenFromUrl();
} }
</script> </script>
</body> </body>
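The startup logic of the login page above can be summarized as a small decision function. This is a minimal sketch, not the page's actual code: `decideLoginStep` and its return values are hypothetical names, and it reads the `token` parameter with `URLSearchParams` instead of the substring check used in the diff, so its behavior differs for URLs such as `?xtoken=1`.

```javascript
// Hypothetical helper (not part of the diff): decides what the login page
// should do on load, given the stored session token and the query string.
function decideLoginStep(storedToken, search) {
  // URLSearchParams matches the exact parameter name "token".
  const urlToken = new URLSearchParams(search).get("token");
  if (urlToken) return { step: "verify", token: urlToken };
  if (storedToken) return { step: "dashboard" };
  return { step: "form" };
}

console.log(decideLoginStep(null, "?token=abc123"));
console.log(decideLoginStep("jwt", ""));
console.log(decideLoginStep(null, "?xtoken=1"));
```

A magic-link token in the URL takes precedence over a stored session here, which mirrors the diff: a fresh link is always verified, even if an older token is still in `localStorage`.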


@@ -26,6 +26,23 @@ const API = {
post(path, body) { return this.request(path, { method: "POST", body: JSON.stringify(body) }); }, post(path, body) { return this.request(path, { method: "POST", body: JSON.stringify(body) }); },
put(path, body) { return this.request(path, { method: "PUT", body: body ? JSON.stringify(body) : undefined }); }, put(path, body) { return this.request(path, { method: "PUT", body: body ? JSON.stringify(body) : undefined }); },
del(path) { return this.request(path, { method: "DELETE" }); }, del(path) { return this.request(path, { method: "DELETE" }); },
async upload(path, formData) {
const headers = {};
if (this.token) headers["Authorization"] = `Bearer ${this.token}`;
const res = await fetch(path, { method: "POST", headers, body: formData });
if (res.status === 401) {
localStorage.removeItem("token");
localStorage.removeItem("username");
window.location.href = "/";
return;
}
if (!res.ok) {
const data = await res.json().catch(() => ({}));
throw new Error(data.detail || `Fehler ${res.status}`);
}
return res.json();
},
}; };
// --- State --- // --- State ---
@@ -42,8 +59,10 @@ document.addEventListener("DOMContentLoaded", () => {
setupNavTabs(); setupNavTabs();
setupOrgDetailTabs(); setupOrgDetailTabs();
setupForms(); setupForms();
setupTranslation();
loadDashboard(); loadDashboard();
loadDashboardTokenStats(); loadDashboardTokenStats();
loadTranslationStatus();
loadOrgs(); loadOrgs();
}); });
@@ -55,14 +74,15 @@ function logout() {
// --- Navigation --- // --- Navigation ---
function setupNavTabs() { function setupNavTabs() {
document.querySelectorAll(".nav-tabs:not(#orgDetailTabs):not(#sourceSubTabs) .nav-tab").forEach(tab => { document.querySelectorAll(".nav-tabs:not(#orgDetailTabs):not(#sourceSubTabs):not(#healthSubTabs) .nav-tab").forEach(tab => {
tab.addEventListener("click", () => { tab.addEventListener("click", () => {
const section = tab.dataset.section; const section = tab.dataset.section;
document.querySelectorAll(".nav-tabs:not(#orgDetailTabs):not(#sourceSubTabs) .nav-tab").forEach(t => t.classList.remove("active")); document.querySelectorAll(".nav-tabs:not(#orgDetailTabs):not(#sourceSubTabs):not(#healthSubTabs) .nav-tab").forEach(t => t.classList.remove("active"));
tab.classList.add("active"); tab.classList.add("active");
document.querySelectorAll(".app-content > .section").forEach(s => s.classList.remove("active")); document.querySelectorAll(".app-content > .section").forEach(s => s.classList.remove("active"));
document.getElementById(`sec-${section}`).classList.add("active"); document.getElementById(`sec-${section}`).classList.add("active");
if (section === "dashboard") loadTranslationStatus();
if (section === "licenses") loadExpiringLicenses(); if (section === "licenses") loadExpiringLicenses();
if (section === "audit" && typeof loadAudit === "function") loadAudit(); if (section === "audit" && typeof loadAudit === "function") loadAudit();
}); });
@@ -213,6 +233,8 @@ async function openOrg(orgId) {
document.getElementById("editOrgName").value = org.name; document.getElementById("editOrgName").value = org.name;
document.getElementById("editOrgActive").value = org.is_active ? "true" : "false"; document.getElementById("editOrgActive").value = org.is_active ? "true" : "false";
const langEl = document.getElementById("editOrgLanguage");
if (langEl) langEl.value = org.output_language || "de";
loadOrgUsers(orgId); loadOrgUsers(orgId);
loadOrgLicenses(orgId); loadOrgLicenses(orgId);
@@ -268,7 +290,7 @@ async function changeRole(userId, role) {
try { try {
await API.put(`/api/users/${userId}/role?role=${role}`); await API.put(`/api/users/${userId}/role?role=${role}`);
} catch (err) { } catch (err) {
alert(err.message); showToast(err.message, "error");
if (currentOrgId) loadOrgUsers(currentOrgId); if (currentOrgId) loadOrgUsers(currentOrgId);
} }
} }
@@ -278,7 +300,7 @@ async function toggleUser(userId, activate) {
await API.put(`/api/users/${userId}/${activate ? "activate" : "deactivate"}`); await API.put(`/api/users/${userId}/${activate ? "activate" : "deactivate"}`);
if (currentOrgId) loadOrgUsers(currentOrgId); if (currentOrgId) loadOrgUsers(currentOrgId);
} catch (err) { } catch (err) {
alert(err.message); showToast(err.message, "error");
} }
} }
@@ -287,7 +309,7 @@ async function toggleGlobeAccess(userId) {
await API.put("/api/users/" + userId + "/globe-access"); await API.put("/api/users/" + userId + "/globe-access");
if (currentOrgId) loadOrgUsers(currentOrgId); if (currentOrgId) loadOrgUsers(currentOrgId);
} catch (err) { } catch (err) {
alert(err.message); showToast(err.message, "error");
if (currentOrgId) loadOrgUsers(currentOrgId); if (currentOrgId) loadOrgUsers(currentOrgId);
} }
} }
@@ -297,7 +319,7 @@ async function toggleNetworkAccess(userId) {
await API.put("/api/users/" + userId + "/network-access"); await API.put("/api/users/" + userId + "/network-access");
if (currentOrgId) loadOrgUsers(currentOrgId); if (currentOrgId) loadOrgUsers(currentOrgId);
} catch (err) { } catch (err) {
alert(err.message); showToast(err.message, "error");
if (currentOrgId) loadOrgUsers(currentOrgId); if (currentOrgId) loadOrgUsers(currentOrgId);
} }
} }
@@ -311,7 +333,7 @@ function confirmDeleteUser(userId, email) {
await API.del(`/api/users/${userId}`); await API.del(`/api/users/${userId}`);
if (currentOrgId) loadOrgUsers(currentOrgId); if (currentOrgId) loadOrgUsers(currentOrgId);
} catch (err) { } catch (err) {
alert(err.message); showToast(err.message, "error");
} }
} }
); );
@@ -353,7 +375,7 @@ async function extendLicense(licId) {
await API.put(`/api/licenses/${licId}/extend?days=${parseInt(days)}`); await API.put(`/api/licenses/${licId}/extend?days=${parseInt(days)}`);
if (currentOrgId) loadOrgLicenses(currentOrgId); if (currentOrgId) loadOrgLicenses(currentOrgId);
} catch (err) { } catch (err) {
alert(err.message); showToast(err.message, "error");
} }
} }
@@ -366,7 +388,7 @@ function confirmRevokeLicense(licId) {
await API.put(`/api/licenses/${licId}/revoke`); await API.put(`/api/licenses/${licId}/revoke`);
if (currentOrgId) loadOrgLicenses(currentOrgId); if (currentOrgId) loadOrgLicenses(currentOrgId);
} catch (err) { } catch (err) {
alert(err.message); showToast(err.message, "error");
} }
} }
); );
@@ -405,7 +427,7 @@ document.addEventListener("DOMContentLoaded", () => {
function switchToOrg(orgId) { function switchToOrg(orgId) {
// Switch to orgs tab and open detail // Switch to orgs tab and open detail
document.querySelectorAll(".nav-tabs:not(#orgDetailTabs):not(#sourceSubTabs) .nav-tab").forEach(t => t.classList.remove("active")); document.querySelectorAll(".nav-tabs:not(#orgDetailTabs):not(#sourceSubTabs):not(#healthSubTabs) .nav-tab").forEach(t => t.classList.remove("active"));
document.querySelector('.nav-tab[data-section="orgs"]').classList.add("active"); document.querySelector('.nav-tab[data-section="orgs"]').classList.add("active");
document.querySelectorAll(".app-content > .section").forEach(s => s.classList.remove("active")); document.querySelectorAll(".app-content > .section").forEach(s => s.classList.remove("active"));
document.getElementById("sec-orgs").classList.add("active"); document.getElementById("sec-orgs").classList.add("active");
@@ -424,6 +446,7 @@ function setupForms() {
await API.post("/api/orgs", { await API.post("/api/orgs", {
name: document.getElementById("newOrgName").value, name: document.getElementById("newOrgName").value,
slug: document.getElementById("newOrgSlug").value, slug: document.getElementById("newOrgSlug").value,
output_language: document.getElementById("newOrgLanguage").value || "de",
}); });
closeModal("modalNewOrg"); closeModal("modalNewOrg");
document.getElementById("newOrgForm").reset(); document.getElementById("newOrgForm").reset();
@@ -518,12 +541,13 @@ function setupForms() {
await API.put(`/api/orgs/${currentOrgId}`, { await API.put(`/api/orgs/${currentOrgId}`, {
name: document.getElementById("editOrgName").value, name: document.getElementById("editOrgName").value,
is_active: document.getElementById("editOrgActive").value === "true", is_active: document.getElementById("editOrgActive").value === "true",
output_language: document.getElementById("editOrgLanguage").value || "de",
}); });
openOrg(currentOrgId); openOrg(currentOrgId);
loadOrgs(); loadOrgs();
loadDashboard(); loadDashboard();
} catch (err) { } catch (err) {
alert(err.message); showToast(err.message, "error");
} }
}); });
@@ -541,7 +565,7 @@ function setupForms() {
loadOrgs(); loadOrgs();
loadDashboard(); loadDashboard();
} catch (err) { } catch (err) {
alert(err.message); showToast(err.message, "error");
} }
} }
); );
@@ -560,19 +584,56 @@ function closeModal(id) {
// Confirm dialog // Confirm dialog
let confirmCallback = null; let confirmCallback = null;
let confirmResolver = null;
function showConfirm(title, text, callback) { function showConfirm(title, text, callback) {
document.getElementById("confirmTitle").textContent = title; document.getElementById("confirmTitle").textContent = title;
document.getElementById("confirmText").textContent = text; document.getElementById("confirmText").textContent = text;
confirmCallback = callback; // Backward compat: the legacy callback is invoked on OK
confirmCallback = callback || null;
openModal("modalConfirm"); openModal("modalConfirm");
return new Promise((resolve) => {
if (confirmResolver) confirmResolver(false); // settle the previous resolver as cancelled
confirmResolver = resolve;
});
}
function showToast(msg, type) {
type = type || "info";
const c = document.getElementById("toastContainer");
if (!c) { console.log("[toast]", type, msg); return; }
const el = document.createElement("div");
el.className = "toast toast-" + type;
el.textContent = msg;
c.appendChild(el);
setTimeout(() => {
el.classList.add("toast-out");
setTimeout(() => el.remove(), 220);
}, type === "error" ? 6000 : 3500);
} }
document.addEventListener("DOMContentLoaded", () => { document.addEventListener("DOMContentLoaded", () => {
document.getElementById("confirmOkBtn").addEventListener("click", async () => { document.getElementById("confirmOkBtn").addEventListener("click", async () => {
closeModal("modalConfirm"); closeModal("modalConfirm");
if (confirmCallback) await confirmCallback(); const cb = confirmCallback;
const rs = confirmResolver;
confirmCallback = null; confirmCallback = null;
confirmResolver = null;
if (cb) {
try { await cb(); } catch (e) { showToast(e.message || String(e), "error"); }
}
if (rs) rs(true);
}); });
// Cancel/Close -> resolver(false)
function _confirmCancel() {
const rs = confirmResolver;
confirmCallback = null;
confirmResolver = null;
if (rs) rs(false);
}
document.getElementById("confirmCancelBtn")?.addEventListener("click", _confirmCancel);
document.querySelector("#modalConfirm .modal-close")?.addEventListener("click", _confirmCancel);
}); });
// --- Utilities --- // --- Utilities ---
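The reworked confirm dialog in the hunk above serves both legacy callback callers and new `await showConfirm(...)` callers. The following DOM-free sketch illustrates that pattern under stated assumptions: `createConfirm`, `show`, `ok`, and `cancel` are hypothetical names, and the modal handling and toast error reporting of the real code are left out.

```javascript
// DOM-free sketch of the confirm pattern: one pending dialog at a time,
// settled either by ok() or cancel(). All names here are hypothetical.
function createConfirm() {
  let callback = null;
  let resolver = null;

  function show(cb) {
    callback = cb || null;
    return new Promise((resolve) => {
      // A dialog that is still open is treated as cancelled.
      if (resolver) resolver(false);
      resolver = resolve;
    });
  }

  async function ok() {
    const cb = callback, rs = resolver;
    callback = null;
    resolver = null;
    if (cb) await cb(); // legacy callback callers
    if (rs) rs(true);   // promise callers
  }

  function cancel() {
    const rs = resolver;
    callback = null;
    resolver = null;
    if (rs) rs(false);
  }

  return { show, ok, cancel };
}

// Usage: promise style; the same dialog also accepts a callback in show().
const dialog = createConfirm();
const answered = dialog.show().then((confirmed) => {
  console.log("confirmed:", confirmed);
  return confirmed;
});
dialog.ok();
```

Clearing both references before invoking the callback keeps a slow or failing callback from leaving stale state behind for the next dialog.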
@@ -594,6 +655,151 @@ function formatDate(iso) {
} }
// ===== Artikel-Übersetzung =====
let translationPollTimer = null;
function setupTranslation() {
const runBtn = document.getElementById("translationRunBtn");
const cancelBtn = document.getElementById("translationCancelBtn");
if (runBtn) runBtn.addEventListener("click", startTranslation);
if (cancelBtn) cancelBtn.addEventListener("click", cancelTranslation);
}
function formatDuration(seconds) {
seconds = Math.max(0, Math.round(seconds || 0));
if (seconds < 60) return seconds + " Sek.";
const min = Math.round(seconds / 60);
if (min < 60) return min + " Min.";
const h = Math.floor(min / 60), m = min % 60;
return (h + " Std. " + (m ? m + " Min." : "")).trim();
}
function renderTranslation(st) {
const info = document.getElementById("translationInfo");
const wrap = document.getElementById("translationProgressWrap");
const bar = document.getElementById("translationProgressBar");
const ptext = document.getElementById("translationProgressText");
const runBtn = document.getElementById("translationRunBtn");
const cancelBtn = document.getElementById("translationCancelBtn");
if (!info || !runBtn) return;
if (st.running) {
runBtn.style.display = "none";
cancelBtn.style.display = "";
wrap.style.display = "";
const pct = st.total > 0 ? Math.round((st.done / st.total) * 100) : 0;
bar.style.width = pct + "%";
ptext.textContent = `${st.done} / ${st.total} verarbeitet, ${st.translated} übersetzt (${pct}%)`;
info.textContent = "Übersetzung läuft…";
return;
}
runBtn.style.display = "";
cancelBtn.style.display = "none";
wrap.style.display = "none";
let resultLine = "";
if (st.finished_at && (st.total > 0 || st.error)) {
if (st.error) {
resultLine = `Letzter Lauf mit Fehler beendet: ${st.error}. `;
} else if (st.cancelled) {
resultLine = `Letzter Lauf abgebrochen, ${st.translated} von ${st.total} Artikeln übersetzt. `;
} else {
resultLine = `Letzter Lauf abgeschlossen, ${st.translated} Artikel übersetzt. `;
}
}
if (st.pending > 0) {
const est = st.estimate || {};
info.textContent = resultLine +
`${st.pending} Artikel ohne deutsche Übersetzung. ` +
`Geschätzt: ${formatDuration(est.seconds)}, ca. $${est.cost_usd}.`;
runBtn.disabled = false;
} else {
info.textContent = resultLine + "Alle Artikel sind übersetzt.";
runBtn.disabled = true;
}
}
async function loadTranslationStatus() {
try {
const st = await API.get("/api/translation/status");
renderTranslation(st);
if (st.running && !translationPollTimer) {
translationPollTimer = setInterval(pollTranslation, 3000);
}
} catch (e) {
const info = document.getElementById("translationInfo");
if (info) info.textContent = "Status nicht abrufbar: " + (e.message || e);
}
}
async function pollTranslation() {
try {
const st = await API.get("/api/translation/status");
renderTranslation(st);
if (!st.running) {
clearInterval(translationPollTimer);
translationPollTimer = null;
if (st.error) {
showToast("Übersetzung mit Fehler beendet", "error");
} else if (st.cancelled) {
showToast(`Übersetzung abgebrochen, ${st.translated} übersetzt`, "info");
} else {
showToast(`Übersetzung fertig: ${st.translated} Artikel`, "success");
}
}
} catch (e) {
console.warn("Translation-Poll fehlgeschlagen:", e);
}
}
async function startTranslation() {
let st;
try {
st = await API.get("/api/translation/status");
} catch (e) {
showToast(e.message || "Status nicht abrufbar", "error");
return;
}
if (st.running) { showToast("Es läuft bereits eine Übersetzung", "info"); return; }
if (!st.pending) { showToast("Es gibt nichts zu übersetzen", "info"); return; }
const est = st.estimate || {};
const ok = await showConfirm(
"Übersetzung starten",
`${st.pending} Artikel werden ins Deutsche übersetzt. ` +
`Geschätzte Dauer: ${formatDuration(est.seconds)}, geschätzte Kosten: ca. $${est.cost_usd}. ` +
`Der Lauf kann jederzeit abgebrochen werden.`
);
if (!ok) return;
try {
const res = await API.post("/api/translation/run", {});
if (res && res.status === "started") {
showToast(`Übersetzung gestartet (${res.pending} Artikel)`, "success");
await loadTranslationStatus();
if (!translationPollTimer) {
translationPollTimer = setInterval(pollTranslation, 3000);
}
} else {
showToast("Es gibt nichts zu übersetzen", "info");
loadTranslationStatus();
}
} catch (e) {
showToast(e.message || "Start fehlgeschlagen", "error");
}
}
async function cancelTranslation() {
try {
await API.post("/api/translation/cancel", {});
showToast("Übersetzung wird abgebrochen…", "info");
} catch (e) {
showToast(e.message || "Abbruch fehlgeschlagen", "error");
}
}
// ===== Token-Nutzung ===== // ===== Token-Nutzung =====
async function loadOrgTokenUsage(orgId) { async function loadOrgTokenUsage(orgId) {
try { try {
@@ -796,3 +1002,48 @@ document.addEventListener('DOMContentLoaded', function() {
}); });
} }
}); });
// === Source-Meta (Kategorien + Typen aus dem Backend) ===
window.META = { categories: [], types: [] };
window.CATEGORY_LABELS = {};
window.TYPE_LABELS = {};
async function loadMeta() {
try {
const data = await API.get("/api/sources/meta");
window.META = data;
window.CATEGORY_LABELS = Object.fromEntries((data.categories || []).map(c => [c.key, c.label]));
window.TYPE_LABELS = Object.fromEntries((data.types || []).map(t => [t.key, t.label]));
return data;
} catch (err) {
console.warn("loadMeta:", err);
return null;
}
}
function categoryLabel(key) {
return window.CATEGORY_LABELS[key] || key || "";
}
function typeLabel(key) {
return window.TYPE_LABELS[key] || key || "";
}
function populateSelect(el, items, allLabel) {
if (!el) return;
const current = el.value;
el.innerHTML = '<option value="">' + (allLabel || "Alle") + '</option>';
items.forEach(it => {
const opt = document.createElement("option");
opt.value = it.key;
opt.textContent = it.label;
el.appendChild(opt);
});
if (current && items.some(it => it.key === current)) el.value = current;
}
document.addEventListener("DOMContentLoaded", () => {
// Load meta once on page load (asynchronous, non-blocking)
if (window.API && (localStorage.getItem("token") || window.location.pathname === "/dashboard")) {
loadMeta();
}
});


@@ -38,7 +38,7 @@ document.addEventListener("DOMContentLoaded", () => {
}); });
// Filter-Inputs verdrahten // Filter-Inputs verdrahten
["auditFilterAction", "auditFilterResource", "auditFilterAdmin", ["auditFilterAction", "auditFilterResource", "auditFilterResourceId", "auditFilterAdmin",
"auditFilterFrom", "auditFilterTo"].forEach((id) => { "auditFilterFrom", "auditFilterTo"].forEach((id) => {
const el = document.getElementById(id); const el = document.getElementById(id);
if (el) el.addEventListener("change", () => { auditCache.offset = 0; loadAudit(); }); if (el) el.addEventListener("change", () => { auditCache.offset = 0; loadAudit(); });
@@ -112,12 +112,14 @@ async function loadAudit() {
const params = new URLSearchParams(); const params = new URLSearchParams();
const action = document.getElementById("auditFilterAction")?.value; const action = document.getElementById("auditFilterAction")?.value;
const resource = document.getElementById("auditFilterResource")?.value; const resource = document.getElementById("auditFilterResource")?.value;
const resourceId = document.getElementById("auditFilterResourceId")?.value;
const adminId = document.getElementById("auditFilterAdmin")?.value; const adminId = document.getElementById("auditFilterAdmin")?.value;
const from = document.getElementById("auditFilterFrom")?.value; const from = document.getElementById("auditFilterFrom")?.value;
const to = document.getElementById("auditFilterTo")?.value; const to = document.getElementById("auditFilterTo")?.value;
if (action) params.append("action", action); if (action) params.append("action", action);
if (resource) params.append("resource_type", resource); if (resource) params.append("resource_type", resource);
if (resourceId) params.append("resource_id", resourceId);
if (adminId) params.append("admin_id", adminId); if (adminId) params.append("admin_id", adminId);
if (from) params.append("from_ts", from); if (from) params.append("from_ts", from);
if (to) params.append("to_ts", to); if (to) params.append("to_ts", to);


@@ -3,6 +3,21 @@
let healthData = null; let healthData = null;
let suggestionsCache = []; let suggestionsCache = [];
// The default filter shows only problems (errors + warnings); OK is mostly noise.
// "issues" is a virtual status value that only the frontend understands (see applyHealthFilter).
let healthFilters = { status: "issues", check_type: "", org: "all" };
let healthHistoryCache = [];
// 60-second cache so that switching tabs does not reload the full response every time.
// After mutations (accepting/rejecting a suggestion, run-stream, search-fix) the data is reloaded with force=true.
// The cache key includes the current limit so that "Mehr laden" is not served from a stale cache.
let healthDataCache = { health: null, suggestions: null, history: null, ts: 0, limit: 0 };
const HEALTH_CACHE_TTL_MS = 60000;
// Default pagination: 100 items are usually enough (errors + warnings come first, ok status last).
// Raised by loadMoreHealth() / loadAllHealth().
let healthLoadLimit = 100;
const CHECK_TYPE_LABELS = { const CHECK_TYPE_LABELS = {
reachability: "Erreichbarkeit", reachability: "Erreichbarkeit",
@@ -24,6 +39,15 @@ const PRIORITY_LABELS = {
low: "Niedrig", low: "Niedrig",
}; };
// Lucide icons as inline SVG constants (instead of a CDN dependency or emojis).
// 14x14; currentColor is inherited from the button style.
const LUCIDE_ICONS = {
check: '<svg xmlns="http://www.w3.org/2000/svg" width="14" height="14" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2.5" stroke-linecap="round" stroke-linejoin="round" style="vertical-align:-2px;"><polyline points="20 6 9 17 4 12"/></svg>',
x: '<svg xmlns="http://www.w3.org/2000/svg" width="14" height="14" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2.5" stroke-linecap="round" stroke-linejoin="round" style="vertical-align:-2px;"><line x1="18" y1="6" x2="6" y2="18"/><line x1="6" y1="6" x2="18" y2="18"/></svg>',
search:'<svg xmlns="http://www.w3.org/2000/svg" width="14" height="14" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2.5" stroke-linecap="round" stroke-linejoin="round" style="vertical-align:-2px;"><circle cx="11" cy="11" r="8"/><line x1="21" y1="21" x2="16.65" y2="16.65"/></svg>',
refresh:'<svg xmlns="http://www.w3.org/2000/svg" width="14" height="14" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" style="vertical-align:-2px;"><polyline points="23 4 23 10 17 10"/><polyline points="1 20 1 14 7 14"/><path d="M3.51 9a9 9 0 0 1 14.85-3.36L23 10M1 14l4.64 4.36A9 9 0 0 0 20.49 15"/></svg>',
};
// --- Init --- // --- Init ---
function setupHealthTab() { function setupHealthTab() {
const tab = document.querySelector('#sourceSubTabs .nav-tab[data-subtab="source-health"]'); const tab = document.querySelector('#sourceSubTabs .nav-tab[data-subtab="source-health"]');
@@ -32,17 +56,49 @@ function setupHealthTab() {
} }
} }
document.addEventListener("DOMContentLoaded", setupHealthTab); // Sub-sub-tabs within source health: suggestions / health status / history.
function setupHealthSubTabs() {
document.querySelectorAll("#healthSubTabs .nav-tab").forEach((tab) => {
tab.addEventListener("click", () => {
const which = tab.dataset.healthtab;
document.querySelectorAll("#healthSubTabs .nav-tab").forEach(t => t.classList.remove("active"));
tab.classList.add("active");
["suggestions", "checks", "verlauf"].forEach(name => {
const pane = document.getElementById("ht-" + name);
if (pane) pane.style.display = name === which ? "block" : "none";
});
});
});
}
document.addEventListener("DOMContentLoaded", () => {
setupHealthTab();
setupHealthSubTabs();
});
// --- Load health data --- // --- Load health data ---
async function loadHealthData() { async function loadHealthData(force = false) {
const now = Date.now();
if (!force
&& healthDataCache.health
&& healthDataCache.limit === healthLoadLimit
&& (now - healthDataCache.ts) < HEALTH_CACHE_TTL_MS) {
healthData = healthDataCache.health;
suggestionsCache = healthDataCache.suggestions;
healthHistoryCache = healthDataCache.history;
renderHealthDashboard();
return;
}
try { try {
const [health, suggestions] = await Promise.all([ const [health, suggestions, history] = await Promise.all([
API.get("/api/sources/health"), API.get("/api/sources/health?limit=" + healthLoadLimit),
API.get("/api/sources/suggestions"), API.get("/api/sources/suggestions"),
API.get("/api/sources/health/history?limit=10").catch(() => []),
]); ]);
healthData = health; healthData = health;
suggestionsCache = suggestions; suggestionsCache = suggestions;
healthHistoryCache = history || [];
healthDataCache = { health, suggestions, history: history || [], ts: Date.now(), limit: healthLoadLimit };
renderHealthDashboard(); renderHealthDashboard();
} catch (err) { } catch (err) {
console.error("Health-Daten laden fehlgeschlagen:", err); console.error("Health-Daten laden fehlgeschlagen:", err);
@@ -51,9 +107,41 @@ async function loadHealthData() {
} }
} }
// Pagination control: raise the limit and reload
function loadMoreHealth() {
healthLoadLimit += 200;
loadHealthData(true);
}
function loadAllHealth() {
healthLoadLimit = 100000;
loadHealthData(true);
}
function applyHealthFilter(checks) {
return checks.filter(c => {
// "issues" = combined filter for errors + warnings (default)
if (healthFilters.status === "issues" && c.status === "ok") return false;
if (healthFilters.status && healthFilters.status !== "issues" && c.status !== healthFilters.status) return false;
if (healthFilters.check_type && c.check_type !== healthFilters.check_type) return false;
if (healthFilters.org === "global" && c.tenant_id !== null) return false;
if (healthFilters.org !== "all" && healthFilters.org !== "global"
&& healthFilters.org && String(c.tenant_id) !== healthFilters.org) return false;
return true;
});
}
function setHealthFilter(field, value) {
healthFilters[field] = value;
renderHealthDashboard();
}
function renderHealthDashboard() { function renderHealthDashboard() {
const container = document.getElementById("healthContent"); // Three sub-panes (instead of one monolithic health section).
if (!container) return; const paneSuggestions = document.getElementById("ht-suggestions");
const paneChecks = document.getElementById("ht-checks");
const paneVerlauf = document.getElementById("ht-verlauf");
if (!paneSuggestions || !paneChecks || !paneVerlauf) return;
// Render suggestions // Render suggestions
const pendingSuggestions = suggestionsCache.filter((s) => s.status === "pending"); const pendingSuggestions = suggestionsCache.filter((s) => s.status === "pending");
@@ -85,13 +173,13 @@ function renderHealthDashboard() {
<tr> <tr>
<td><span class="badge badge-suggestion-${s.suggestion_type}">${SUGGESTION_TYPE_LABELS[s.suggestion_type] || s.suggestion_type}</span></td> <td><span class="badge badge-suggestion-${s.suggestion_type}">${SUGGESTION_TYPE_LABELS[s.suggestion_type] || s.suggestion_type}</span></td>
<td>${esc(s.title)}</td> <td>${esc(s.title)}</td>
<td class="text-secondary" style="max-width:300px;">${esc(s.description || "")}</td> <td class="text-secondary" style="max-width:300px; white-space:nowrap; overflow:hidden; text-overflow:ellipsis;" title="${esc(s.description || "")}">${esc(s.description || "")}</td>
<td><span class="badge badge-priority-${s.priority}">${PRIORITY_LABELS[s.priority] || s.priority}</span></td> <td><span class="badge badge-priority-${s.priority}">${PRIORITY_LABELS[s.priority] || s.priority}</span></td>
<td class="text-secondary">${formatDateTime(s.created_at)}</td> <td class="text-secondary">${formatDateTime(s.created_at)}</td>
<td style="white-space:nowrap;"> <td style="white-space:nowrap;">
${s.suggestion_type === "deactivate_source" && s.source_id ? `<button class="btn btn-secondary btn-small" data-source-id="${s.source_id}" data-source-name="${esc(s.title.split(':')[0] || s.title)}" onclick="searchFix(this)">Lösung suchen</button> ` : ""} ${s.suggestion_type === "deactivate_source" && s.source_id ? `<button class="btn btn-secondary btn-small" data-source-id="${s.source_id}" data-source-name="${esc(s.title.split(':')[0] || s.title)}" onclick="searchFix(this)" title="Lösung suchen">${LUCIDE_ICONS.search}</button> ` : ""}
<button class="btn btn-success btn-small" onclick="handleSuggestion(${s.id}, true)">Annehmen</button> <button class="btn btn-success btn-small" onclick="handleSuggestion(${s.id}, true)" title="Annehmen">${LUCIDE_ICONS.check}</button>
<button class="btn btn-danger btn-small" onclick="handleSuggestion(${s.id}, false)">Ablehnen</button> <button class="btn btn-danger btn-small" onclick="handleSuggestion(${s.id}, false)" title="Ablehnen">${LUCIDE_ICONS.x}</button>
</td> </td>
</tr>`, </tr>`,
) )
@@ -108,20 +196,23 @@ function renderHealthDashboard() {
</div>`; </div>`;
} }
// Past suggestions // Past suggestions - collapsed by default because they are purely historical.
let historyHtml = ""; let historyHtml = "";
if (recentSuggestions.length > 0) { if (recentSuggestions.length > 0) {
const shown = recentSuggestions.slice(0, 20);
historyHtml = ` historyHtml = `
<div class="card" style="margin-bottom:16px;"> <details class="card" style="margin-bottom:16px;">
<div class="card-header"><h2>Verlauf</h2></div> <summary style="cursor:pointer; padding:14px 18px; list-style:none;">
<div class="table-wrap"> <span style="color:var(--accent, #C8A851); font-weight:600; font-size:1.02rem;">Verlauf</span>
<span class="text-secondary" style="font-size:13px; margin-left:8px;">(${recentSuggestions.length} erledigte Vorschläge - klick zum Aufklappen)</span>
</summary>
<div class="table-wrap" style="border-top:1px solid var(--border, rgba(255,255,255,0.08));">
<table> <table>
<thead> <thead>
<tr><th>Typ</th><th>Titel</th><th>Status</th><th>Bearbeitet</th></tr> <tr><th>Typ</th><th>Titel</th><th>Status</th><th>Bearbeitet</th></tr>
</thead> </thead>
<tbody> <tbody>
${recentSuggestions ${shown
.slice(0, 20)
.map( .map(
(s) => ` (s) => `
<tr> <tr>
@@ -135,14 +226,56 @@ function renderHealthDashboard() {
</tbody> </tbody>
</table> </table>
</div> </div>
</div>`; </details>`;
} }
// Health check results // Health check results
let healthHtml = ""; let healthHtml = "";
if (healthData && healthData.checks && healthData.checks.length > 0) { if (healthData && healthData.checks && healthData.checks.length > 0) {
const issues = healthData.checks.filter((c) => c.status !== "ok"); // Apply filters
const okCount = healthData.checks.filter((c) => c.status === "ok").length; const allChecks = healthData.checks;
const filtered = applyHealthFilter(allChecks);
// Counters from the backend aggregate (over the full set, not just the loaded page)
const okCount = healthData.ok != null ? healthData.ok : healthData.checks.filter((c) => c.status === "ok").length;
const totalAll = healthData.total_checks != null ? healthData.total_checks : allChecks.length;
const hasMore = !!healthData.has_more;
// Org list from the backend (full list, even if the loaded page is smaller)
const orgs = (healthData.all_orgs || []).map(o => ({ id: String(o.id), name: o.name || ("Org " + o.id) }));
const checkTypes = Array.from(new Set(allChecks.map(c => c.check_type)));
// Counter breakdown from the backend (per check_type x status).
// Example: { reachability: {ok: 281, error: 3, warning: 1}, feed_validity: {...}, stale: {...}, duplicate: {...} }
const breakdown = healthData.breakdown || {};
function breakdownLine(statusKey, cssClass) {
const entries = Object.entries(breakdown)
.map(([ct, byStatus]) => [ct, byStatus[statusKey] || 0])
.filter(([_, n]) => n > 0)
.sort((a, b) => b[1] - a[1]);
if (entries.length === 0) return "";
const total = entries.reduce((s, [, n]) => s + n, 0);
const detail = entries.map(([ct, n]) => `${n} ${CHECK_TYPE_LABELS[ct] || ct}`).join(", ");
const label = statusKey === "error" ? "Fehler" : (statusKey === "warning" ? "Warnungen" : "OK");
return `<span class="${cssClass}" title="${esc(detail)}">${total} ${label}</span> <span class="text-secondary" style="font-size:11px;">(${esc(detail)})</span>`;
}
// Trend delta against the second-to-last run (healthHistoryCache[1]). Index 0 is
// typically the current state, index 1 the run archived before it.
// With fewer than 2 runs in the history: show no delta.
const prevRun = (healthHistoryCache && healthHistoryCache.length > 1) ? healthHistoryCache[1] : null;
function deltaBadge(currentValue, prevValue, badIsUp) {
if (prevValue == null) return "";
const d = currentValue - prevValue;
if (d === 0) return ` <span class="text-secondary" style="font-size:11px;" title="unverändert seit letztem Run">(±0)</span>`;
const sign = d > 0 ? "+" : "";
// badIsUp=true: increase = bad (red), decrease = good (green). Reversed for OK.
const cls = (badIsUp ? (d > 0) : (d < 0)) ? "text-danger" : "text-success";
return ` <span class="${cls}" style="font-size:11px;" title="seit letztem Run">(${sign}${d})</span>`;
}
const dErr = prevRun ? deltaBadge(healthData.errors, prevRun.errors, true) : "";
const dWarn = prevRun ? deltaBadge(healthData.warnings, prevRun.warnings, true) : "";
const dOk = prevRun ? deltaBadge(okCount, prevRun.ok, false) : "";
healthHtml = ` healthHtml = `
<div class="card"> <div class="card">
@@ -151,39 +284,94 @@ function renderHealthDashboard() {
<span class="text-secondary" style="font-size:13px;"> <span class="text-secondary" style="font-size:13px;">
Letzter Check: ${healthData.last_check ? formatDateTime(healthData.last_check) : "Noch nie"} Letzter Check: ${healthData.last_check ? formatDateTime(healthData.last_check) : "Noch nie"}
&nbsp;|&nbsp; &nbsp;|&nbsp;
<span class="text-danger">${healthData.errors} Fehler</span> &nbsp; ${breakdownLine("error", "text-danger") || `<span class="text-danger">0 Fehler</span>`}${dErr} &nbsp;
<span class="text-warning">${healthData.warnings} Warnungen</span> &nbsp; ${breakdownLine("warning", "text-warning") || `<span class="text-warning">0 Warnungen</span>`}${dWarn} &nbsp;
<span class="text-success">${okCount} OK</span> <span class="text-success">${okCount} OK</span>${dOk}
</span> </span>
</div>
<div class="action-bar" style="border-bottom:1px solid var(--border, rgba(255,255,255,0.08));">
<div style="display:flex;gap:10px;flex-wrap:wrap;align-items:center;">
<select class="filter-select" onchange="setHealthFilter('status', this.value)">
<option value="issues" ${healthFilters.status === "issues" ? "selected" : ""}>Nur Probleme (Default)</option>
<option value="" ${healthFilters.status === "" ? "selected" : ""}>Alle Status</option>
<option value="error" ${healthFilters.status === "error" ? "selected" : ""}>Nur Fehler</option>
<option value="warning" ${healthFilters.status === "warning" ? "selected" : ""}>Nur Warnungen</option>
<option value="ok" ${healthFilters.status === "ok" ? "selected" : ""}>Nur OK</option>
</select>
<select class="filter-select" onchange="setHealthFilter('check_type', this.value)">
<option value="" ${healthFilters.check_type === "" ? "selected" : ""}>Alle Typen</option>
${checkTypes.map(ct => `<option value="${esc(ct)}" ${healthFilters.check_type === ct ? "selected" : ""}>${esc(CHECK_TYPE_LABELS[ct] || ct)}</option>`).join("")}
</select>
<select class="filter-select" onchange="setHealthFilter('org', this.value)">
<option value="all" ${healthFilters.org === "all" ? "selected" : ""}>Alle Quellen</option>
<option value="global" ${healthFilters.org === "global" ? "selected" : ""}>Nur Grundquellen</option>
${orgs.map(o => `<option value="${esc(o.id)}" ${healthFilters.org === o.id ? "selected" : ""}>Org: ${esc(o.name)}</option>`).join("")}
</select>
<span class="text-secondary" style="font-size:13px;">
${filtered.length} / ${allChecks.length} angezeigt${totalAll > allChecks.length ? ` (von ${totalAll} insgesamt)` : ''}
</span>
</div>
</div>`; </div>`;
if (issues.length > 0) { if (filtered.length > 0) {
healthHtml += ` healthHtml += `
<div class="table-wrap"> <div class="table-wrap">
<table> <table>
<thead> <thead>
<tr><th>Quelle</th><th>Domain</th><th>Typ</th><th>Status</th><th>Details</th><th>Aktionen</th></tr> <tr><th>Quelle</th><th>Typ</th><th>Org</th><th>Status</th><th>Details</th><th>Aktion</th></tr>
</thead> </thead>
<tbody> <tbody>
${issues ${filtered
.map( .map(
(c) => ` (c) => {
// Domain + language go into the source name's tooltip instead of separate columns.
const tipParts = [];
if (c.domain) tipParts.push(c.domain);
if (c.language) tipParts.push(c.language);
const nameTip = tipParts.length ? ` title="${esc(tipParts.join(" · "))}"` : "";
return `
<tr> <tr>
<td>${esc(c.name)}</td> <td${nameTip}>${esc(c.name)}</td>
<td class="text-secondary">${esc(c.domain || "")}</td>
<td>${CHECK_TYPE_LABELS[c.check_type] || c.check_type}</td> <td>${CHECK_TYPE_LABELS[c.check_type] || c.check_type}</td>
<td><span class="badge badge-health-${c.status}">${c.status === "error" ? "Fehler" : "Warnung"}</span></td> <td class="text-secondary">${c.tenant_id == null ? '<span style="color:#94a3b8;">global</span>' : esc(c.org_name || ("Org " + c.tenant_id))}</td>
<td class="text-secondary" style="max-width:250px;">${esc(c.message)}</td> <td><span class="badge badge-health-${c.status}">${c.status === "error" ? "Fehler" : (c.status === "warning" ? "Warnung" : "OK")}</span></td>
<td>${c.status === "error" && c.check_type === "reachability" ? `<button class="btn btn-secondary btn-small" data-source-id="${c.source_id}" data-source-name="${esc(c.name)}" onclick="searchFix(this)">Lösung suchen</button>` : ""}</td> <td class="text-secondary" style="max-width:300px;" title="${esc(c.message || "")}">${esc(c.message || "")}</td>
</tr>`, <td>${(
(c.status === "error" && c.check_type === "reachability") ||
(c.status === "warning" && c.check_type === "feed_validity")
) ? `<button class="btn btn-secondary btn-small" data-source-id="${c.source_id}" data-source-name="${esc(c.name)}" onclick="searchFix(this)" title="Lösung suchen">${LUCIDE_ICONS.search}</button>` : ""}</td>
</tr>`;
}
) )
.join("")} .join("")}
</tbody> </tbody>
</table> </table>
</div>`; </div>`;
} else if (hasMore) {
// 0 matches in the loaded page, but there are still unloaded items.
// Note that the filter is only reliable once the full list is loaded.
healthHtml += `
<div class="card-body text-muted">
Keine Treffer in den geladenen ${allChecks.length} von ${totalAll} Items mit dem aktuellen Filter.
<a href="#" onclick="event.preventDefault(); loadAllHealth()" style="text-decoration:underline;">
Alle ${totalAll} Health-Checks laden
</a> und Filter erneut anwenden.
</div>`;
} else { } else {
healthHtml += '<div class="card-body text-success">Alle Quellen sind gesund.</div>'; healthHtml += '<div class="card-body text-muted">Keine Ergebnisse mit diesen Filtern.</div>';
} }
// Footer with load-more buttons if the backend reports has_more
if (hasMore) {
const remaining = Math.max(0, totalAll - allChecks.length);
healthHtml += `
<div class="card-body" style="display:flex;justify-content:center;gap:10px;align-items:center;border-top:1px solid var(--border, rgba(255,255,255,0.08));">
<span class="text-secondary" style="font-size:13px;">${allChecks.length} von ${totalAll} geladen</span>
<button class="btn btn-secondary btn-small" onclick="loadMoreHealth()">+200 laden</button>
<button class="btn btn-secondary btn-small" onclick="loadAllHealth()">Alle ${remaining} weiteren laden</button>
</div>`;
}
healthHtml += "</div>"; healthHtml += "</div>";
} else { } else {
healthHtml = ` healthHtml = `
@@ -193,7 +381,57 @@ function renderHealthDashboard() {
</div>`; </div>`;
} }
container.innerHTML = suggestionsHtml + historyHtml + healthHtml; // History view: latest runs. Compact: total dropped (= errors+warnings+ok),
// explicit column widths, centered numbers, run ID shortened and de-emphasized.
let runsHtml = "";
if (healthHistoryCache.length > 0) {
runsHtml = `
<div class="card" style="margin-bottom:16px;">
<div class="card-header"><h2>Verlauf der Health-Check-Runs</h2></div>
<div class="table-wrap">
<table style="table-layout:fixed; width:100%;">
<colgroup>
<col style="width:200px;">
<col style="width:160px;">
<col style="width:110px;">
<col style="width:130px;">
<col style="width:110px;">
</colgroup>
<thead>
<tr>
<th>Zeitpunkt</th>
<th>Run-ID</th>
<th style="text-align:center;">Fehler</th>
<th style="text-align:center;">Warnungen</th>
<th style="text-align:center;">OK</th>
</tr>
</thead>
<tbody>
${healthHistoryCache.map(r => `
<tr>
<td>${formatDateTime(r.archived_at)}</td>
<td class="text-secondary" style="font-size:12px;" title="${esc(r.run_id)}"><code>${esc(String(r.run_id || "").slice(0, 12))}</code></td>
<td class="text-danger" style="text-align:center;">${r.errors || 0}</td>
<td class="text-warning" style="text-align:center;">${r.warnings || 0}</td>
<td class="text-success" style="text-align:center;">${r.ok || 0}</td>
</tr>`).join("")}
</tbody>
</table>
</div>
</div>`;
}
// Instead of one monolithic render: three sub-panes, one per sub-tab.
paneSuggestions.innerHTML = suggestionsHtml;
paneChecks.innerHTML = healthHtml;
paneVerlauf.innerHTML = historyHtml + runsHtml;
// Add the count of open suggestions to the "Vorschläge" tab label.
const tabBtnSugg = document.querySelector('#healthSubTabs .nav-tab[data-healthtab="suggestions"]');
if (tabBtnSugg) {
const open = pendingSuggestions.length;
tabBtnSugg.textContent = open > 0 ? `Vorschläge (${open} offen)` : "Vorschläge";
}
} }
// --- Accept/reject suggestion --- // --- Accept/reject suggestion ---
@@ -202,18 +440,19 @@ async function handleSuggestion(id, accept) {
const suggestion = suggestionsCache.find((s) => s.id === id); const suggestion = suggestionsCache.find((s) => s.id === id);
if (!suggestion) return; if (!suggestion) return;
if (!confirm(`Vorschlag "${suggestion.title}" ${action}?`)) return; const ok = await showConfirm("Vorschlag " + (action === "annehmen" ? "annehmen" : "ablehnen"), `Soll "${suggestion.title}" ${action}?`);
if (!ok) return;
try { try {
const result = await API.put("/api/sources/suggestions/" + id, { accept }); const result = await API.put("/api/sources/suggestions/" + id, { accept });
if (result.action) { if (result.action) {
alert(`Ergebnis: ${result.action}`); showToast("Ergebnis: " + result.action, "success");
} }
loadHealthData(); loadHealthData(true);
// Also refresh the global sources list // Also refresh the global sources list
if (typeof loadGlobalSources === "function") loadGlobalSources(); if (typeof loadGlobalSources === "function") loadGlobalSources();
} catch (err) { } catch (err) {
alert("Fehler: " + err.message); showToast("Fehler: " + err.message, "error");
} }
} }
@@ -291,7 +530,7 @@ async function runHealthCheck() {
} }
} }
loadHealthData(); loadHealthData(true);
} catch (err) { } catch (err) {
progressEl.innerHTML = '<span class="text-danger">Fehler: ' + esc(err.message) + '</span>'; progressEl.innerHTML = '<span class="text-danger">Fehler: ' + esc(err.message) + '</span>';
} finally { } finally {
@@ -322,7 +561,8 @@ function formatDateTime(dateStr) {
async function searchFix(btn) { async function searchFix(btn) {
const sourceId = btn.dataset.sourceId; const sourceId = btn.dataset.sourceId;
const sourceName = btn.dataset.sourceName; const sourceName = btn.dataset.sourceName;
if (!confirm(`Sonnet mit WebSearch nach einer Lösung für "${sourceName}" suchen lassen?\n\nDas kann einige Minuten dauern.`)) return; const ok = await showConfirm("Lösung suchen", `Sonnet mit WebSearch nach einer Lösung für "${sourceName}" suchen lassen? Das kann einige Minuten dauern.`);
if (!ok) return;
btn.disabled = true; btn.disabled = true;
btn.textContent = "Sucht..."; btn.textContent = "Sucht...";
@@ -337,10 +577,10 @@ async function searchFix(btn) {
if (result.cost_usd) { if (result.cost_usd) {
msg += `\n\nKosten: $${result.cost_usd.toFixed(2)}`; msg += `\n\nKosten: $${result.cost_usd.toFixed(2)}`;
} }
alert(msg); showToast(msg, "info");
loadHealthData(); loadHealthData(true);
} catch (err) { } catch (err) {
alert("Fehler: " + err.message); showToast("Fehler: " + err.message, "error");
} finally { } finally {
btn.disabled = false; btn.disabled = false;
btn.textContent = "Lösung suchen"; btn.textContent = "Lösung suchen";

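The cache guard that `loadHealthData(force)` gains earlier in this diff can be read as a single freshness predicate. The following is a minimal sketch, not the project's code: `isHealthCacheFresh` is a hypothetical helper, and the 60-second TTL is an assumed value, because the diff does not show how `HEALTH_CACHE_TTL_MS` is defined.

```javascript
// Assumed value; the diff references HEALTH_CACHE_TTL_MS but does not show its definition.
const HEALTH_CACHE_TTL_MS = 60 * 1000;

// Hypothetical helper: true when the cached payload may be reused.
// Mirrors the three conditions in loadHealthData: data present,
// same page limit, and younger than the TTL.
function isHealthCacheFresh(cache, limit, nowMs, ttlMs = HEALTH_CACHE_TTL_MS) {
  return Boolean(cache.health)
    && cache.limit === limit
    && (nowMs - cache.ts) < ttlMs;
}

// Example cache entry as written after a successful load (shape taken from the diff).
const exampleCache = { health: { checks: [] }, suggestions: [], history: [], ts: 1000, limit: 200 };

const reuse = isHealthCacheFresh(exampleCache, 200, 1000 + 30 * 1000);        // within the TTL
const limitChanged = isHealthCacheFresh(exampleCache, 400, 1000 + 30 * 1000); // limit was raised
const expired = isHealthCacheFresh(exampleCache, 200, 1000 + 61 * 1000);      // older than the TTL
console.log(reuse, limitChanged, expired); // logs: true false false
```

In the diff, callers that change data (accepting a suggestion, running a health check, `loadMoreHealth`) pass `force = true`, which bypasses this check entirely.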

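The filter semantics of `applyHealthFilter`, introduced earlier in this diff, can be illustrated on sample data. This is a sketch under assumptions: `matchesHealthFilter` is a hypothetical stand-alone version that takes the filter object as a parameter instead of reading the module-level `healthFilters`, and the sample checks are invented for illustration.

```javascript
// Hypothetical stand-alone predicate with the same conditions as applyHealthFilter.
function matchesHealthFilter(check, filters) {
  // "issues" is the combined default filter: hide everything that is ok.
  if (filters.status === "issues" && check.status === "ok") return false;
  // Any other non-empty status must match exactly.
  if (filters.status && filters.status !== "issues" && check.status !== filters.status) return false;
  if (filters.check_type && check.check_type !== filters.check_type) return false;
  // "global" keeps only sources without a tenant.
  if (filters.org === "global" && check.tenant_id !== null) return false;
  // Any other org value is compared against the tenant id as a string.
  if (filters.org && filters.org !== "all" && filters.org !== "global"
      && String(check.tenant_id) !== filters.org) return false;
  return true;
}

// Invented sample data, not taken from the project.
const sampleChecks = [
  { name: "A", status: "ok", check_type: "reachability", tenant_id: null },
  { name: "B", status: "error", check_type: "reachability", tenant_id: 7 },
  { name: "C", status: "warning", check_type: "feed_validity", tenant_id: null },
];

const namesFor = (filters) =>
  sampleChecks.filter((c) => matchesHealthFilter(c, filters)).map((c) => c.name);

const issuesAll = namesFor({ status: "issues", check_type: "", org: "all" }); // B and C
const globalOnly = namesFor({ status: "", check_type: "", org: "global" });   // A and C
const orgSeven = namesFor({ status: "issues", check_type: "", org: "7" });    // B only
console.log(issuesAll, globalOnly, orgSeven);
```

One detail worth noting: the predicate compares with `!== null`, while the table cell in the same diff uses `c.tenant_id == null`. If the backend ever omits `tenant_id` instead of sending `null`, such a row would render as "global" but be hidden by the "global" filter.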
@@ -3,40 +3,17 @@
let globalSourcesCache = []; let globalSourcesCache = [];
let tenantSourcesCache = []; let tenantSourcesCache = [];
// Phase 3c: Tenant-Tab State
let tenantFilters = { search: "", type: "", category: "", org: "", language: "" };
let tenantSort = { field: "org_name", asc: true };
let tenantSelected = new Set();
let editingSourceId = null; let editingSourceId = null;
let globalSortField = "category"; let globalSortField = "category";
let globalSortAsc = true; let globalSortAsc = true;
const CATEGORY_LABELS = { // CATEGORY_LABELS is now global (from app.js loadMeta)
nachrichtenagentur: "Nachrichtenagentur", // TYPE_LABELS is now global (from app.js loadMeta)
"oeffentlich-rechtlich": "Öffentlich-Rechtlich",
qualitaetszeitung: "Qualitätszeitung",
behoerde: "Behörde",
fachmedien: "Fachmedien",
"think-tank": "Think-Tank",
international: "International",
regional: "Regional",
boulevard: "Boulevard",
sonstige: "Sonstige",
"cybercrime": "Cybercrime / Hacktivismus",
"cybercrime-leaks": "Cybercrime / Leaks",
"ukraine-russland-krieg": "Ukraine-Russland-Krieg",
"irankonflikt": "Irankonflikt",
"osint-international": "OSINT International",
"extremismus-deutschland": "Extremismus Deutschland",
"russische-staatspropaganda": "Russische Staatspropaganda",
"russische-opposition": "Russische Opposition / Exilmedien",
"syrien-nahost": "Syrien / Nahost",
};
const TYPE_LABELS = {
rss_feed: "RSS-Feed",
web_source: "Webquelle",
telegram_channel: "Telegram-Kanal",
podcast_feed: "Podcast-Feed",
excluded: "Ausgeschlossen",
};
// --- Init --- // --- Init ---
document.addEventListener("DOMContentLoaded", () => { document.addEventListener("DOMContentLoaded", () => {
setupSourceSubTabs(); setupSourceSubTabs();
@@ -60,6 +37,8 @@ function setupSourceSubTabs() {
if (subtab === "global-sources") loadGlobalSources(); if (subtab === "global-sources") loadGlobalSources();
else if (subtab === "tenant-sources") loadTenantSources(); else if (subtab === "tenant-sources") loadTenantSources();
else if (subtab === "source-health") loadHealthData(); else if (subtab === "source-health") loadHealthData();
else if (subtab === "classification-review") loadClassificationQueue();
else if (subtab === "x-scraper") loadXScraperAccounts();
}); });
}); });
} }
@@ -67,16 +46,122 @@ function setupSourceSubTabs() {
// --- Global sources --- // --- Global sources ---
async function loadGlobalSources() { async function loadGlobalSources() {
try { try {
globalSourcesCache = await API.get("/api/sources/global"); // Populate category/type dropdowns from META (idempotent)
if (window.META && window.META.categories && window.META.categories.length) {
populateSelect(document.getElementById("globalFilterCategory"), window.META.categories, "Alle Kategorien");
populateSelect(document.getElementById("globalFilterType"),
(window.META.types || []).filter(t => t.key !== "excluded"), "Alle Typen");
}
const [list, stats, languages] = await Promise.all([
API.get("/api/sources/global"),
API.get("/api/sources/global/stats"),
API.get("/api/sources/global/languages").catch(() => []),
]);
globalSourcesCache = list;
populateSelect(
document.getElementById("globalFilterLanguage"),
(languages || []).map(l => ({ key: l, label: l })),
"Alle Sprachen",
);
populateSelect(
document.getElementById("tenantFilterLanguage"),
(languages || []).map(l => ({ key: l, label: l })),
"Alle Sprachen",
);
// datalist for the edit modal
const dl = document.getElementById("languageSuggestions");
if (dl) {
dl.innerHTML = "";
(languages || []).forEach(l => {
const o = document.createElement("option");
o.value = l;
dl.appendChild(o);
});
}
renderGlobalStats(stats);
renderGlobalSources(globalSourcesCache); renderGlobalSources(globalSourcesCache);
} catch (err) { } catch (err) {
console.error("Grundquellen laden fehlgeschlagen:", err); console.error("Grundquellen laden fehlgeschlagen:", err);
} }
} }
async function showSourceAudit(sourceId, sourceName) {
document.getElementById("auditTitle").textContent = `Audit-Spur: ${sourceName}`;
document.getElementById("auditContent").innerHTML = '<div class="text-muted">Lade...</div>';
openModal("modalAudit");
try {
const res = await API.get(`/api/audit-log?resource_type=source&resource_id=${sourceId}&limit=50`);
renderAuditEntries(res.items || []);
} catch (err) {
document.getElementById("auditContent").innerHTML =
`<div class="text-danger">Audit konnte nicht geladen werden: ${esc(err.message || String(err))}</div>`;
}
}
function renderAuditEntries(items) {
const c = document.getElementById("auditContent");
if (!items.length) {
c.innerHTML = '<div class="text-muted">Keine Audit-Eintr&auml;ge f&uuml;r diese Quelle.</div>';
return;
}
c.innerHTML = items.map(e => {
const meta = `${formatDateTime(e.ts)} &middot; ${esc(e.admin_username || "-")} &middot; ${esc(e.ip || "-")}`;
const hasDiff = (e.before && Object.keys(e.before).length) || (e.after && Object.keys(e.after).length);
const diffPayload = JSON.stringify({ before: e.before, after: e.after }, null, 2);
return `
<div class="audit-entry">
<div class="audit-entry-head">
<span class="audit-entry-action audit-action-${esc(e.action)}">${esc(e.action)}</span>
<span class="audit-entry-meta">${meta}</span>
</div>
${hasDiff ? `<details class="audit-entry-detail">
<summary>Diff anzeigen</summary>
<pre>${esc(diffPayload)}</pre>
</details>` : ""}
</div>
`;
}).join("");
}
function formatDateTime(iso) {
if (!iso) return "-";
try {
const d = new Date(iso);
return d.toLocaleString("de-DE", {
day: "2-digit", month: "2-digit", year: "numeric",
hour: "2-digit", minute: "2-digit",
});
} catch { return iso; }
}
function renderGlobalStats(stats) {
const bar = document.getElementById("globalStatsBar");
if (!bar) return;
if (!stats || !stats.by_type) { bar.innerHTML = ""; return; }
const types = window.META && window.META.types ? window.META.types : [];
const parts = [];
parts.push(`<span class="sources-stat-item"><span class="sources-stat-value">${stats.total || 0}</span> Quellen gesamt</span>`);
for (const t of types) {
if (t.key === "excluded") continue;
const v = stats.by_type[t.key] || { count: 0, articles: 0 };
parts.push(`<span class="sources-stat-item"><span class="sources-stat-value">${v.count}</span> ${esc(t.label)}</span>`);
}
parts.push(`<span class="sources-stat-item"><span class="sources-stat-value">${stats.total_articles || 0}</span> Artikel</span>`);
const h = stats.health || { errors: 0, warnings: 0, ok: 0 };
if (h.errors) parts.push(`<span class="sources-stat-item health-error"><span class="sources-stat-value">${h.errors}</span> Fehler</span>`);
if (h.warnings) parts.push(`<span class="sources-stat-item health-warning"><span class="sources-stat-value">${h.warnings}</span> Warnungen</span>`);
if (h.ok) parts.push(`<span class="sources-stat-item health-ok"><span class="sources-stat-value">${h.ok}</span> OK</span>`);
bar.innerHTML = parts.join("");
}
function renderGlobalSources(sources) { function renderGlobalSources(sources) {
const tbody = document.getElementById("globalSourceTable"); const tbody = document.getElementById("globalSourceTable");
const cols = 7; const cols = 13;
if (sources.length === 0) { if (sources.length === 0) {
tbody.innerHTML = `<tr><td colspan="${cols}" class="text-muted">Keine Grundquellen</td></tr>`; tbody.innerHTML = `<tr><td colspan="${cols}" class="text-muted">Keine Grundquellen</td></tr>`;
return; return;
@@ -104,15 +189,26 @@ function renderGlobalSources(sources) {
const notesRow = hasNotes const notesRow = hasNotes
? `<tr class="src-notes-row" id="notes-${s.id}" style="display:none;"><td colspan="${cols}" class="src-notes-cell">${esc(s.notes)}</td></tr>` ? `<tr class="src-notes-row" id="notes-${s.id}" style="display:none;"><td colspan="${cols}" class="src-notes-cell">${esc(s.notes)}</td></tr>`
: ''; : '';
const lastSeen = s.last_seen_at ? formatDate(s.last_seen_at) : "-";
const hs = s.health_status || "unknown";
const hsLabel = { error: "Fehler", warning: "Warnung", ok: "OK", unknown: "—" }[hs];
const hsClass = "health-badge-" + (hs === "unknown" ? "unknown" : hs);
html += `<tr> html += `<tr>
<td>${infoBtn} ${esc(s.name)}</td> <td>${infoBtn} ${esc(s.name)}</td>
<td class="text-secondary" style="max-width:200px;overflow:hidden;text-overflow:ellipsis;white-space:nowrap;" title="${esc(s.url || '')}">${esc(s.url || "-")}</td> <td class="text-secondary" style="max-width:200px;overflow:hidden;text-overflow:ellipsis;white-space:nowrap;" title="${esc(s.url || '')}">${esc(s.url || "-")}</td>
<td>${esc(s.domain || "-")}</td> <td>${esc(s.domain || "-")}</td>
<td>${TYPE_LABELS[s.source_type] || s.source_type}</td> <td>${typeLabel(s.source_type)}</td>
<td class="text-right">${s.article_count || 0}</td> <td class="text-right">${s.article_count || 0}</td>
<td class="${(s.articles_30d || 0) === 0 ? "activity-cell activity-zero" : "activity-cell"}" title="7 Tage / 30 Tage"><strong>${s.articles_7d || 0}</strong> / ${s.articles_30d || 0}</td>
<td class="text-right"><span class="${(s.tenant_excluded_count || 0) === 0 ? "exclude-badge exclude-zero" : "exclude-badge"}">${s.tenant_excluded_count || 0}</span></td>
<td class="text-secondary">${esc(s.language || "-")}</td>
<td class="text-secondary" style="max-width:200px;overflow:hidden;text-overflow:ellipsis;white-space:nowrap;" title="${esc(s.bias || "")}">${esc(s.bias || "-")}</td>
<td class="text-secondary">${lastSeen}</td>
<td><span class="health-badge ${hsClass}">${hsLabel}</span></td>
<td><span class="badge badge-${s.status === "active" ? "active" : "inactive"}">${s.status === "active" ? "Aktiv" : "Inaktiv"}</span></td> <td><span class="badge badge-${s.status === "active" ? "active" : "inactive"}">${s.status === "active" ? "Aktiv" : "Inaktiv"}</span></td>
<td> <td>
<button class="btn btn-secondary btn-small" onclick="editGlobalSource(${s.id})">Bearbeiten</button> <button class="btn btn-secondary btn-small" onclick="editGlobalSource(${s.id})">Bearbeiten</button>
<button class="btn btn-secondary btn-small" onclick="showSourceAudit(${s.id}, '${esc(s.name)}')">Audit</button>
<button class="btn btn-danger btn-small" onclick="confirmDeleteGlobalSource(${s.id}, '${esc(s.name)}')">Löschen</button> <button class="btn btn-danger btn-small" onclick="confirmDeleteGlobalSource(${s.id}, '${esc(s.name)}')">Löschen</button>
</td> </td>
</tr>${notesRow}`; </tr>${notesRow}`;
@@ -142,11 +238,13 @@ function filterGlobalSources() {
 const catFilter = document.getElementById("globalFilterCategory")?.value || "";
 const statusFilter = document.getElementById("globalFilterStatus")?.value || "";
+const langFilter = document.getElementById("globalFilterLanguage")?.value || "";
 let filtered = globalSourcesCache.filter((s) => {
-if (q && !(s.name.toLowerCase().includes(q) || (s.domain || "").toLowerCase().includes(q) || (s.url || "").toLowerCase().includes(q))) return false;
+if (q && !(s.name.toLowerCase().includes(q) || (s.domain || "").toLowerCase().includes(q) || (s.url || "").toLowerCase().includes(q) || (s.bias || "").toLowerCase().includes(q))) return false;
 if (typeFilter && s.source_type !== typeFilter) return false;
 if (catFilter && s.category !== catFilter) return false;
 if (statusFilter && s.status !== statusFilter) return false;
+if (langFilter && s.language !== langFilter) return false;
 return true;
 });
@@ -154,8 +252,10 @@ function filterGlobalSources() {
 filtered.sort((a, b) => {
 let va = a[globalSortField] ?? "";
 let vb = b[globalSortField] ?? "";
-if (globalSortField === "article_count") {
-va = va || 0; vb = vb || 0;
+const NUMERIC_FIELDS = ["article_count", "articles_7d", "articles_30d", "tenant_excluded_count"];
+if (NUMERIC_FIELDS.includes(globalSortField)) {
+va = parseInt(va) || 0;
+vb = parseInt(vb) || 0;
 return globalSortAsc ? va - vb : vb - va;
 }
 va = String(va).toLowerCase();
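The hunk above widens the numeric sort from one field to a list of fields and coerces values with `parseInt`. A minimal, DOM-free sketch of that comparator follows. `compareSources` is a hypothetical helper name, not a function from this diff, and the string branch is an assumption based on the `String(va).toLowerCase()` line that closes the hunk.

```javascript
// Sketch only: compareSources is a hypothetical helper, not part of the diff.
const NUMERIC_FIELDS = ["article_count", "articles_7d", "articles_30d", "tenant_excluded_count"];

function compareSources(a, b, field, asc) {
  const va = a[field] ?? "";
  const vb = b[field] ?? "";
  if (NUMERIC_FIELDS.includes(field)) {
    // null, undefined, "" and non-numeric strings all fall back to 0.
    const na = parseInt(va, 10) || 0;
    const nb = parseInt(vb, 10) || 0;
    return asc ? na - nb : nb - na;
  }
  // Assumption: every other field is compared as a lower-cased string.
  const sa = String(va).toLowerCase();
  const sb = String(vb).toLowerCase();
  const cmp = sa < sb ? -1 : sa > sb ? 1 : 0;
  return asc ? cmp : -cmp;
}

const rows = [
  { name: "Beta", articles_7d: "12" },
  { name: "alpha", articles_7d: null },
  { name: "Gamma", articles_7d: 3 },
];
const byCount = [...rows].sort((x, y) => compareSources(x, y, "articles_7d", true));
console.log(byCount.map((r) => r.name)); // alpha (0), Gamma (3), Beta (12)
```

Coercing before subtracting matters because the API may deliver counts as strings or `null`; without it, `"12" - null` style arithmetic would still work by accident, but `undefined` would yield `NaN` and an unstable order.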
@@ -182,6 +282,7 @@ function openNewGlobalSource() {
 editingSourceId = null;
 document.getElementById("sourceModalTitle").textContent = "Neue Grundquelle";
 document.getElementById("sourceForm").reset();
+setAlignmentChips([]);
 openModal("modalSource");
 }
@@ -197,11 +298,22 @@ function editGlobalSource(id) {
 document.getElementById("sourceCategory").value = s.category;
 document.getElementById("sourceStatus").value = s.status;
 document.getElementById("sourceNotes").value = s.notes || "";
+document.getElementById("sourceLanguage").value = s.language || "";
+document.getElementById("sourceBias").value = s.bias || "";
+document.getElementById("sourceFetchStrategy").value = s.fetch_strategy || "default";
+document.getElementById("sourcePolitical").value = s.political_orientation || "";
+document.getElementById("sourceMediaType").value = s.media_type || "";
+document.getElementById("sourceReliability").value = s.reliability || "";
+document.getElementById("sourceCountryCode").value = s.country_code || "";
+document.getElementById("sourceStateAffiliated").checked = !!s.state_affiliated;
+setAlignmentChips(s.alignments || []);
 openModal("modalSource");
 }
 function setupSourceForms() {
 document.getElementById("newGlobalSourceBtn").addEventListener("click", openNewGlobalSource);
+document.getElementById("newPdfSourceBtn")?.addEventListener("click", openPdfUploadModal);
+setupPdfUploadForm();
 document.getElementById("discoverSourceBtn").addEventListener("click", () => {
 document.getElementById("discoverUrl").value = "";
 document.getElementById("discoverStatus").style.display = "none";
@@ -222,8 +334,24 @@ function setupSourceForms() {
 category: document.getElementById("sourceCategory").value,
 status: document.getElementById("sourceStatus").value,
 notes: document.getElementById("sourceNotes").value || null,
+language: document.getElementById("sourceLanguage").value || null,
+bias: document.getElementById("sourceBias").value || null,
+fetch_strategy: document.getElementById("sourceFetchStrategy").value || "default",
 };
+const pol = document.getElementById("sourcePolitical")?.value;
+if (pol) body.political_orientation = pol;
+const mt = document.getElementById("sourceMediaType")?.value;
+if (mt) body.media_type = mt;
+const rel = document.getElementById("sourceReliability")?.value;
+if (rel) body.reliability = rel;
+const cc = (document.getElementById("sourceCountryCode")?.value || "").trim().toUpperCase();
+if (cc) body.country_code = cc;
+if (editingSourceId) {
+body.state_affiliated = !!document.getElementById("sourceStateAffiliated")?.checked;
+body.alignments = getAlignmentChips();
+}
 try {
 if (editingSourceId) {
 await API.put("/api/sources/global/" + editingSourceId, body);
@@ -258,7 +386,7 @@ function confirmDeleteGlobalSource(id, name) {
 await API.del("/api/sources/global/" + id);
 loadGlobalSources();
 } catch (err) {
-alert(err.message);
+showToast(err.message, "error");
 }
 }
 );
@@ -268,47 +396,164 @@ function confirmDeleteGlobalSource(id, name) {
 async function loadTenantSources() {
 try {
 tenantSourcesCache = await API.get("/api/sources/tenant");
-renderTenantSources(tenantSourcesCache);
+tenantSelected.clear();
+populateTenantFilters();
+applyTenantFilterAndSort();
 } catch (err) {
 console.error("Kundenquellen laden fehlgeschlagen:", err);
+showToast("Kundenquellen konnten nicht geladen werden", "error");
+}
+}
function populateTenantFilters() {
// Type and category from META, org from the cache
if (window.META && window.META.types) {
populateSelect(document.getElementById("tenantFilterType"),
window.META.types.filter(t => t.key !== "excluded"), "Alle Typen");
}
if (window.META && window.META.categories) {
populateSelect(document.getElementById("tenantFilterCategory"),
window.META.categories, "Alle Kategorien");
}
// Extract the list of orgs from the data (unique values)
const orgs = Array.from(new Set(tenantSourcesCache.map(s => s.org_name).filter(Boolean))).sort();
populateSelect(
document.getElementById("tenantFilterOrg"),
orgs.map(o => ({ key: o, label: o })),
"Alle Organisationen",
);
}
function applyTenantFilterAndSort() {
const q = (tenantFilters.search || "").toLowerCase();
let filtered = tenantSourcesCache.filter(s => {
if (q && !(
(s.name || "").toLowerCase().includes(q)
|| (s.domain || "").toLowerCase().includes(q)
|| (s.org_name || "").toLowerCase().includes(q)
|| (s.url || "").toLowerCase().includes(q)
)) return false;
if (tenantFilters.type && s.source_type !== tenantFilters.type) return false;
if (tenantFilters.category && s.category !== tenantFilters.category) return false;
if (tenantFilters.org && s.org_name !== tenantFilters.org) return false;
if (tenantFilters.language && s.language !== tenantFilters.language) return false;
return true;
});
filtered.sort((a, b) => {
const va = String(a[tenantSort.field] ?? "").toLowerCase();
const vb = String(b[tenantSort.field] ?? "").toLowerCase();
const cmp = va.localeCompare(vb, "de");
return tenantSort.asc ? cmp : -cmp;
});
renderTenantSources(filtered);
// Update the sort icons
document.querySelectorAll("#sub-tenant-sources th.sortable .sort-icon").forEach(el => el.textContent = "");
const active = document.querySelector(`#sub-tenant-sources th.sortable[data-sort="${tenantSort.field}"] .sort-icon`);
if (active) active.textContent = tenantSort.asc ? " \u25B2" : " \u25BC";
}
function filterTenantSources() {
tenantFilters.search = (document.getElementById("tenantSourceSearch")?.value || "").trim();
tenantFilters.type = document.getElementById("tenantFilterType")?.value || "";
tenantFilters.category = document.getElementById("tenantFilterCategory")?.value || "";
tenantFilters.org = document.getElementById("tenantFilterOrg")?.value || "";
tenantFilters.language = document.getElementById("tenantFilterLanguage")?.value || "";
applyTenantFilterAndSort();
}
function sortTenantSources(field) {
if (tenantSort.field === field) tenantSort.asc = !tenantSort.asc;
else { tenantSort.field = field; tenantSort.asc = true; }
applyTenantFilterAndSort();
}
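The tenant-source filter and sort above read their inputs from the DOM and module state. As a rough illustration of the same logic without those dependencies, the sketch below takes the sources, filters, and sort settings as arguments. `filterAndSortTenantSources` and the sample `source_type` values are hypothetical; only the field names come from the diff.

```javascript
// Sketch only: a DOM-free variant of applyTenantFilterAndSort (hypothetical name).
function filterAndSortTenantSources(sources, filters, sort) {
  const q = (filters.search || "").toLowerCase();
  const filtered = sources.filter((s) => {
    if (q) {
      // The search term may match name, domain, organisation, or URL.
      const fields = [s.name, s.domain, s.org_name, s.url];
      if (!fields.some((v) => (v || "").toLowerCase().includes(q))) return false;
    }
    if (filters.type && s.source_type !== filters.type) return false;
    if (filters.category && s.category !== filters.category) return false;
    if (filters.org && s.org_name !== filters.org) return false;
    if (filters.language && s.language !== filters.language) return false;
    return true;
  });
  filtered.sort((a, b) => {
    const va = String(a[sort.field] ?? "").toLowerCase();
    const vb = String(b[sort.field] ?? "").toLowerCase();
    const cmp = va.localeCompare(vb, "de");
    return sort.asc ? cmp : -cmp;
  });
  return filtered;
}

// Sample data; the source_type values are placeholders.
const tenantSample = [
  { name: "Zeta News", domain: "zeta.example", org_name: "Org A", source_type: "type_a", language: "de" },
  { name: "Alpha Blog", domain: null, org_name: "Org B", source_type: "type_b", language: "en" },
  { name: "Alpha Wire", url: "https://wire.example/feed", org_name: "Org A", source_type: "type_a", language: "de" },
];
const orgA = filterAndSortTenantSources(tenantSample, { org: "Org A" }, { field: "name", asc: true });
console.log(orgA.map((s) => s.name)); // Alpha Wire, Zeta News
```

Keeping the predicate free of DOM access makes it straightforward to unit-test, and `filterTenantSources()` would then only need to collect the input values and hand them over.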
function toggleTenantSelectAll(checked) {
document.querySelectorAll("#tenantSourceTable input.tenant-select").forEach(cb => {
cb.checked = checked;
const id = parseInt(cb.dataset.id);
if (checked) tenantSelected.add(id); else tenantSelected.delete(id);
});
updateBulkButton();
}
function toggleTenantSelect(id, checked) {
id = parseInt(id);
if (checked) tenantSelected.add(id); else tenantSelected.delete(id);
updateBulkButton();
// Keep the header checkbox in sync
const visible = document.querySelectorAll("#tenantSourceTable input.tenant-select").length;
const checkedVisible = document.querySelectorAll("#tenantSourceTable input.tenant-select:checked").length;
const all = document.getElementById("tenantSelectAll");
if (all) all.checked = visible > 0 && visible === checkedVisible;
}
function updateBulkButton() {
const btn = document.getElementById("tenantBulkPromoteBtn");
if (!btn) return;
const n = tenantSelected.size;
btn.disabled = n === 0;
btn.textContent = `Ausgewählte übernehmen (${n})`;
}
async function bulkPromoteSelected() {
if (tenantSelected.size === 0) return;
const ids = Array.from(tenantSelected);
const ok = await showConfirm(
"Ausgewählte als Grundquelle übernehmen",
`Sollen ${ids.length} Kundenquelle(n) als Grundquelle übernommen werden? Sie werden dann für alle Monitore verfügbar.`,
);
if (!ok) return;
try {
const result = await API.post("/api/sources/tenant/bulk-promote", { source_ids: ids });
let msg = `${result.promoted} übernommen`;
if (result.skipped && result.skipped.length) msg += `, ${result.skipped.length} übersprungen`;
if (result.failed && result.failed.length) msg += `, ${result.failed.length} Fehler`;
showToast(msg, result.failed && result.failed.length ? "warning" : "success");
tenantSelected.clear();
await loadTenantSources();
} catch (err) {
showToast("Bulk-Promote fehlgeschlagen: " + err.message, "error");
 }
 }
 function renderTenantSources(sources) {
 const tbody = document.getElementById("tenantSourceTable");
+const cols = 10;
 if (sources.length === 0) {
-tbody.innerHTML = '<tr><td colspan="7" class="text-muted">Keine Kundenquellen</td></tr>';
+tbody.innerHTML = `<tr><td colspan="${cols}" class="text-muted">Keine Kundenquellen</td></tr>`;
+document.getElementById("tenantSourceCount").textContent = `0 / ${tenantSourcesCache.length} Kundenquellen`;
+updateBulkButton();
 return;
 }
-tbody.innerHTML = sources.map((s) => `
+tbody.innerHTML = sources.map((s) => {
+const checked = tenantSelected.has(s.id) ? "checked" : "";
+return `
 <tr>
+<td><input type="checkbox" class="tenant-select" data-id="${s.id}" ${checked} onchange="toggleTenantSelect(${s.id}, this.checked)"></td>
 <td>${esc(s.name)}</td>
 <td class="text-secondary" style="max-width:180px;overflow:hidden;text-overflow:ellipsis;white-space:nowrap;" title="${esc(s.url || '')}">${esc(s.domain || "-")}</td>
-<td>${TYPE_LABELS[s.source_type] || s.source_type}</td>
-<td>${CATEGORY_LABELS[s.category] || s.category}</td>
+<td>${typeLabel(s.source_type)}</td>
+<td>${categoryLabel(s.category)}</td>
 <td>${esc(s.org_name || "-")}</td>
+<td class="text-secondary">${esc(s.language || "-")}</td>
+<td class="text-secondary" style="max-width:180px;overflow:hidden;text-overflow:ellipsis;white-space:nowrap;" title="${esc(s.bias || "")}">${esc(s.bias || "-")}</td>
 <td>${esc(s.added_by || "-")}</td>
 <td>
 <button class="btn btn-primary btn-small" onclick="promoteSource(${s.id}, '${esc(s.name)}')">Übernehmen</button>
 </td>
-</tr>
-`).join("");
-document.getElementById("tenantSourceCount").textContent = `${sources.length} Kundenquellen`;
+</tr>`;
+}).join("");
+document.getElementById("tenantSourceCount").textContent = `${sources.length} / ${tenantSourcesCache.length} Kundenquellen`;
+updateBulkButton();
 }
 // Search tenant sources (Kundenquellen)
 document.addEventListener("DOMContentLoaded", () => {
 const el = document.getElementById("tenantSourceSearch");
-if (el) {
-el.addEventListener("input", () => {
-const q = el.value.toLowerCase();
-const filtered = tenantSourcesCache.filter((s) =>
-s.name.toLowerCase().includes(q) || (s.domain || "").toLowerCase().includes(q) || (s.org_name || "").toLowerCase().includes(q)
-);
-renderTenantSources(filtered);
-});
-}
+if (el) el.addEventListener("input", () => filterTenantSources());
 });
 function promoteSource(id, name) {
@@ -320,7 +565,7 @@ function promoteSource(id, name) {
 await API.post("/api/sources/tenant/" + id + "/promote");
 loadTenantSources();
 } catch (err) {
-alert(err.message);
+showToast(err.message, "error");
 }
 }
 );
@@ -399,7 +644,7 @@ async function addDiscoveredFeeds() {
 });
 if (selected.length === 0) {
-alert("Keine Feeds ausgewählt");
+showToast("Keine Feeds ausgewählt", "warning");
 return;
 }
@@ -411,15 +656,222 @@ async function addDiscoveredFeeds() {
 const result = await API.post("/api/sources/discover/add", selected);
 closeModal("modalDiscover");
 loadGlobalSources();
-alert(result.added + " Grundquelle(n) hinzugefügt" + (result.skipped ? ", " + result.skipped + " übersprungen" : ""));
+showToast(result.added + " Grundquelle(n) hinzugefügt" + (result.skipped ? ", " + result.skipped + " übersprungen" : ""), "success");
 } catch (err) {
-alert("Fehler: " + err.message);
+showToast("Fehler: " + err.message, "error");
 } finally {
 btn.disabled = false;
 btn.textContent = "Ausgewählte hinzufügen";
 }
 }
// === Classification review ===
const POLITICAL_LABELS = {
links_extrem: { short: "L+", full: "Links (extrem)" },
links: { short: "L", full: "Links" },
mitte_links: { short: "ML", full: "Mitte-Links" },
liberal: { short: "LIB", full: "Liberal" },
mitte: { short: "M", full: "Mitte" },
konservativ: { short: "KON", full: "Konservativ" },
mitte_rechts: { short: "MR", full: "Mitte-Rechts" },
rechts: { short: "R", full: "Rechts" },
rechts_extrem: { short: "R+", full: "Rechts (extrem)" },
na: { short: "?", full: "Nicht eingeordnet" },
};
const RELIABILITY_LABELS = {
sehr_hoch: "Sehr hoch", hoch: "Hoch", gemischt: "Gemischt",
niedrig: "Niedrig", sehr_niedrig: "Sehr niedrig", na: "Nicht eingeordnet",
};
const MEDIA_TYPE_LABELS = {
tageszeitung: "Tageszeitung", wochenzeitung: "Wochenzeitung", magazin: "Magazin",
tv_sender: "TV-Sender", radio: "Radio", oeffentlich_rechtlich: "Öffentlich-Rechtlich",
nachrichtenagentur: "Nachrichtenagentur", online_only: "Online-only", blog: "Blog",
telegram_kanal: "Telegram-Kanal", telegram_bot: "Telegram-Bot", podcast: "Podcast",
social_media: "Social Media", imageboard: "Imageboard", think_tank: "Think Tank",
ngo: "NGO", behoerde: "Behörde", staatsmedium: "Staatsmedium",
fachmedium: "Fachmedium", sonstige: "Sonstige",
};
const ALIGNMENT_LABELS = {
prorussisch: "prorussisch", proiranisch: "proiranisch", prowestlich: "prowestlich",
proukrainisch: "proukrainisch", prochinesisch: "prochinesisch", projapanisch: "projapanisch",
proisraelisch: "proisraelisch", propalaestinensisch: "propalästinensisch",
protuerkisch: "protürkisch", panarabisch: "panarabisch", neutral: "neutral", sonstige: "sonstige",
};
function setAlignmentChips(active) {
const chips = document.querySelectorAll("#sourceAlignmentChips .alignment-chip");
const set = new Set((active || []).map((a) => (a || "").toLowerCase()));
chips.forEach((chip) => {
if (set.has(chip.dataset.alignment)) chip.classList.add("active");
else chip.classList.remove("active");
});
}
function getAlignmentChips() {
return Array.from(document.querySelectorAll("#sourceAlignmentChips .alignment-chip.active"))
.map((chip) => chip.dataset.alignment);
}
function handleAlignmentChipClick(e) {
const chip = e.target.closest(".alignment-chip");
if (!chip) return;
e.preventDefault();
chip.classList.toggle("active");
}
async function refreshClassificationStats() {
try {
const stats = await API.get("/api/sources/classification/stats");
const badge = document.getElementById("classificationPendingBadge");
if (badge) badge.textContent = String(stats.pending_review || 0);
} catch (_) { /* still ok */ }
}
async function loadClassificationQueue() {
const list = document.getElementById("classificationReviewList");
if (!list) return;
const minConf = parseFloat(document.getElementById("reviewMinConfidence")?.value || "0");
list.innerHTML = '<div class="text-muted" style="padding:24px;text-align:center;">Lade…</div>';
try {
const items = await API.get(`/api/sources/classification/queue?limit=200&min_confidence=${minConf}`);
const countEl = document.getElementById("reviewPendingCount");
if (countEl) countEl.textContent = String(items.length);
refreshClassificationStats();
if (items.length === 0) {
list.innerHTML = '<div class="text-muted" style="padding:24px;text-align:center;">Keine ausstehenden Vorschläge.</div>';
return;
}
list.innerHTML = items.map((it) => renderClassificationQueueItem(it)).join("");
} catch (err) {
list.innerHTML = `<div class="text-danger" style="padding:24px;text-align:center;">Fehler: ${esc(err.message)}</div>`;
}
}
function renderClassificationQueueItem(item) {
const cur = item.current || {};
const prop = item.proposed || {};
const conf = prop.confidence || 0;
const confPct = Math.round(conf * 100);
const confClass = conf >= 0.85 ? "high" : conf >= 0.7 ? "medium" : "low";
const polFmt = (v) => (v && v !== "na" ? POLITICAL_LABELS[v]?.full || v : "–");
const mtFmt = (v) => (v ? MEDIA_TYPE_LABELS[v] || v : "–");
const relFmt = (v) => (v && v !== "na" ? RELIABILITY_LABELS[v] || v : "–");
const stateFmt = (v) => (v ? "ja" : "nein");
const ccFmt = (v) => v || "–";
const alignFmt = (v) =>
Array.isArray(v) && v.length > 0 ? v.map((a) => ALIGNMENT_LABELS[a] || a).join(", ") : "–";
const row = (label, c, p, fmt) => {
const cs = fmt(c);
const ps = fmt(p);
const changed = cs !== ps;
return `<div class="review-diff-row${changed ? " changed" : ""}">
<span class="review-diff-label">${esc(label)}</span>
<span class="review-diff-current">${esc(cs)}</span>
<span class="review-diff-arrow">→</span>
<span class="review-diff-proposed">${esc(ps)}</span>
</div>`;
};
const reasoning = prop.reasoning ? esc(prop.reasoning) : "";
return `<div class="review-card" data-source-id="${item.id}">
<div class="review-card-header">
<div class="review-card-title">
<span class="review-card-name">${esc(item.name)}</span>
${item.is_global ? '<span class="review-global-badge">Grundquelle</span>' : ""}
<span class="review-card-domain">${esc(item.domain || "")}</span>
</div>
<div class="review-card-confidence conf-${confClass}" title="LLM-Konfidenz">
<span class="conf-value">${confPct}%</span>
<span class="conf-label">Konfidenz</span>
</div>
</div>
<div class="review-card-diff">
${row("Politik", cur.political_orientation, prop.political_orientation, polFmt)}
${row("Medientyp", cur.media_type, prop.media_type, mtFmt)}
${row("Glaubwürdigkeit", cur.reliability, prop.reliability, relFmt)}
${row("Staatsnah", cur.state_affiliated, prop.state_affiliated, stateFmt)}
${row("Land", cur.country_code, prop.country_code, ccFmt)}
${row("Geopol. Nähe", cur.alignments, prop.alignments, alignFmt)}
</div>
${reasoning ? `<div class="review-card-reasoning"><strong>Begründung:</strong> ${reasoning}</div>` : ""}
<div class="review-card-actions">
<button class="btn btn-small btn-primary" onclick="approveClassification(${item.id})">Übernehmen</button>
<button class="btn btn-small btn-secondary" onclick="rejectClassification(${item.id})">Verwerfen</button>
<button class="btn btn-small btn-secondary" data-reclassify-id="${item.id}" onclick="reclassifySource(${item.id})">Neu klassifizieren</button>
</div>
</div>`;
}
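Two small rules drive the review card rendered above: the confidence bucket (thresholds 0.85 and 0.7) and the "changed" flag, which compares the formatted strings rather than the raw values. The sketch below isolates both. `confidenceClass` and `diffField` are hypothetical names, and the two formatters are simplified stand-ins for `relFmt` and `alignFmt` without the label lookup.

```javascript
// Sketch only: hypothetical helpers that mirror the inline logic of the review card.
function confidenceClass(conf) {
  return conf >= 0.85 ? "high" : conf >= 0.7 ? "medium" : "low";
}

function diffField(current, proposed, fmt) {
  const cs = fmt(current);
  const ps = fmt(proposed);
  // Comparing formatted strings treats "na", null and undefined as the same
  // empty value, and compares arrays by content instead of by reference.
  return { current: cs, proposed: ps, changed: cs !== ps };
}

// Simplified formatters (assumption: no label lookup).
const relFmtSketch = (v) => (v && v !== "na" ? v : "–");
const alignFmtSketch = (v) => (Array.isArray(v) && v.length > 0 ? v.join(", ") : "–");

console.log(confidenceClass(0.9)); // high
console.log(diffField("hoch", "gemischt", relFmtSketch).changed); // true
console.log(diffField(["neutral"], ["neutral"], alignFmtSketch).changed); // false
```

One consequence of comparing formatted output is that a proposal moving a field from unset to `"na"` is not highlighted, which appears to be the intended behavior since both render as a dash.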
async function approveClassification(id) {
try {
await API.post(`/api/sources/${id}/classification/approve`, {});
showToast("Klassifikation übernommen.", "success");
loadClassificationQueue();
} catch (err) {
showToast("Approve fehlgeschlagen: " + err.message, "error");
}
}
async function rejectClassification(id) {
try {
await API.post(`/api/sources/${id}/classification/reject`, {});
showToast("Vorschlag verworfen.", "success");
loadClassificationQueue();
} catch (err) {
showToast("Reject fehlgeschlagen: " + err.message, "error");
}
}
async function reclassifySource(id) {
const btn = document.querySelector(`[data-reclassify-id="${id}"]`);
if (btn) { btn.disabled = true; btn.textContent = "..."; }
try {
await API.post(`/api/sources/${id}/classification/reclassify`, {});
showToast("Neu klassifiziert.", "success");
loadClassificationQueue();
} catch (err) {
showToast("Reclassify fehlgeschlagen: " + err.message, "error");
} finally {
if (btn) { btn.disabled = false; btn.textContent = "Neu klassifizieren"; }
}
}
async function triggerBulkClassify() {
if (!confirm("Bulk-Klassifikation aller noch nicht klassifizierten Quellen starten? Läuft im Hintergrund (~3-5 Sek pro Quelle, ~0.02 USD pro Quelle).")) return;
try {
const r = await API.post("/api/sources/classification/bulk-classify?limit=500&only_unclassified=true", {});
showToast(`Bulk-Klassifikation gestartet (limit=${r.limit}). In ~10 min neu laden.`, "info");
} catch (err) {
showToast("Start fehlgeschlagen: " + err.message, "error");
}
}
async function bulkApproveHighConfidence() {
if (!confirm("Alle Vorschläge mit Konfidenz ≥ 0.85 genehmigen?")) return;
try {
const r = await API.post("/api/sources/classification/bulk-approve?min_confidence=0.85", {});
showToast(`${r.approved} Vorschläge übernommen.`, "success");
loadClassificationQueue();
} catch (err) {
showToast("Bulk-Approve fehlgeschlagen: " + err.message, "error");
}
}
async function triggerExternalReputationSync() {
if (!confirm("IFCN- und EUvsDisinfo-Datenbanken jetzt syncen? Läuft im Hintergrund (~30 Sek).")) return;
try {
await API.post("/api/sources/external-reputation/sync", {});
showToast("Externer Sync gestartet. Quellenliste in ~30 Sek neu laden.", "info");
} catch (err) {
showToast("Sync fehlgeschlagen: " + err.message, "error");
}
}
 function toggleSourceInfo(id) {
 const row = document.getElementById("notes-" + id);
 if (!row) return;
@@ -431,3 +883,68 @@ function toggleSourceInfo(id) {
 if (btn) btn.classList.toggle("active", !isVisible);
 }
 }
// --- PDF source upload ---
function openPdfUploadModal() {
const form = document.getElementById("pdfUploadForm");
if (form) form.reset();
const err = document.getElementById("pdfUploadError");
if (err) { err.style.display = "none"; err.textContent = ""; }
const prog = document.getElementById("pdfUploadProgress");
if (prog) prog.style.display = "none";
openModal("modalPdfUpload");
}
function setupPdfUploadForm() {
const form = document.getElementById("pdfUploadForm");
if (!form || form.dataset.bound === "1") return;
form.dataset.bound = "1";
form.addEventListener("submit", async (e) => {
e.preventDefault();
const errEl = document.getElementById("pdfUploadError");
const progEl = document.getElementById("pdfUploadProgress");
const submitBtn = document.getElementById("pdfUploadSubmitBtn");
errEl.style.display = "none";
const fileInput = document.getElementById("pdfFile");
const f = fileInput?.files?.[0];
if (!f) {
errEl.textContent = "Bitte eine PDF-Datei auswaehlen.";
errEl.style.display = "block";
return;
}
if (f.size > 50 * 1024 * 1024) {
errEl.textContent = "Datei ueberschreitet 50 MB.";
errEl.style.display = "block";
return;
}
const fd = new FormData();
fd.append("file", f);
const nm = document.getElementById("pdfName").value.trim();
if (nm) fd.append("name", nm);
fd.append("category", document.getElementById("pdfCategory").value || "sonstige");
const lng = document.getElementById("pdfLanguage").value.trim();
if (lng) fd.append("language", lng);
const nt = document.getElementById("pdfNotes").value.trim();
if (nt) fd.append("notes", nt);
submitBtn.disabled = true;
progEl.style.display = "block";
try {
await API.upload("/api/sources/global/upload-pdf", fd);
closeModal("modalPdfUpload");
if (typeof showToast === "function") {
showToast("PDF hochgeladen -- Verarbeitung laeuft im Hintergrund", "success");
}
loadGlobalSources();
} catch (err) {
errEl.textContent = err.message || "Upload fehlgeschlagen";
errEl.style.display = "block";
} finally {
submitBtn.disabled = false;
progEl.style.display = "none";
}
});
}
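The submit handler above performs two client-side checks (a file must be present and must not exceed 50 MB) and appends optional fields only when they are non-empty. A minimal sketch of those rules as pure functions follows. `validatePdfFile`, `collectPdfFields`, and `MAX_PDF_BYTES` are hypothetical names; the messages and the `"sonstige"` default are taken from the diff.

```javascript
// Sketch only: hypothetical pure helpers for the checks in setupPdfUploadForm.
const MAX_PDF_BYTES = 50 * 1024 * 1024;

// Returns an error message, or null when the file passes the checks.
function validatePdfFile(file) {
  if (!file) return "Bitte eine PDF-Datei auswaehlen.";
  if (file.size > MAX_PDF_BYTES) return "Datei ueberschreitet 50 MB.";
  return null;
}

// Collects the form fields; optional ones are dropped when blank.
function collectPdfFields(raw) {
  const fields = { category: raw.category || "sonstige" };
  for (const key of ["name", "language", "notes"]) {
    const value = (raw[key] || "").trim();
    if (value) fields[key] = value;
  }
  return fields;
}

console.log(validatePdfFile({ size: MAX_PDF_BYTES })); // null
console.log(collectPdfFields({ name: "  ", language: " de ", category: "" }));
```

These checks only spare the user a round trip; the size limit and file type still need to be enforced by the upload endpoint itself.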

src/static/js/x-scraper.js (normal file, 169 lines added)

@@ -0,0 +1,169 @@
/* X research accounts (X-Recherche-Konten): management of the twscrape account pool */
"use strict";
let xScraperCache = [];
async function loadXScraperAccounts() {
setupXScraperForms();
const tbody = document.getElementById("xScraperTable");
tbody.innerHTML = '<tr><td colspan="6" class="text-muted">Lade...</td></tr>';
try {
xScraperCache = await API.get("/api/x-scraper/accounts");
renderXScraperAccounts(xScraperCache || []);
} catch (err) {
tbody.innerHTML = '<tr><td colspan="6" class="text-muted">Fehler: ' + esc(err.message || "") + '</td></tr>';
}
}
function renderXScraperAccounts(list) {
const tbody = document.getElementById("xScraperTable");
const cnt = document.getElementById("xScraperCount");
if (cnt) cnt.textContent = list.length + (list.length === 1 ? " Konto" : " Konten");
if (!list.length) {
tbody.innerHTML = '<tr><td colspan="6" class="text-muted">Keine X-Recherche-Konten. Mit „+ Konto hinzufügen" anlegen.</td></tr>';
return;
}
tbody.innerHTML = list.map((a) => {
let status;
if (!a.active) status = '<span class="text-muted">Inaktiv</span>';
else if (a.locked) status = '<span style="color:var(--warning,#b8860b);">Gesperrt</span>';
else status = '<span style="color:var(--success,#2e7d32);">Aktiv</span>';
const lastUsed = a.last_used && typeof formatDateTime === "function"
? formatDateTime(a.last_used)
: (a.last_used || "—");
const errInfo = a.error_msg
? ' <span class="info-icon" title="' + esc(a.error_msg) + '">!</span>'
: "";
const u = esc(a.username);
const toggleLabel = a.active ? "Deaktivieren" : "Aktivieren";
return '<tr>'
+ '<td><strong>' + u + '</strong>' + errInfo + '</td>'
+ '<td>' + esc(a.email || "—") + '</td>'
+ '<td>' + status + '</td>'
+ '<td>' + (a.total_requests || 0) + '</td>'
+ '<td>' + esc(lastUsed) + '</td>'
+ '<td>'
+ '<button class="btn btn-secondary btn-small" onclick="openXScraperCookiesModal(\'' + u + '\')">Cookies erneuern</button> '
+ '<button class="btn btn-secondary btn-small" onclick="toggleXScraperActive(\'' + u + '\',' + (!a.active) + ')">' + toggleLabel + '</button> '
+ '<button class="btn btn-danger btn-small" onclick="confirmDeleteXScraper(\'' + u + '\')">Entfernen</button>'
+ '</td>'
+ '</tr>';
}).join("");
}
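The status column above is derived from two flags with a fixed precedence: an inactive account is shown as inactive even if it is also locked. The sketch below captures that rule and the singular/plural count label. `accountStatus` and `accountCountLabel` are hypothetical names; the flag names `active` and `locked` and the German labels come from the diff.

```javascript
// Sketch only: hypothetical helpers mirroring the inline logic of renderXScraperAccounts.
function accountStatus(account) {
  if (!account.active) return "Inaktiv";   // inactive takes precedence over locked
  if (account.locked) return "Gesperrt";
  return "Aktiv";
}

function accountCountLabel(n) {
  return n + (n === 1 ? " Konto" : " Konten");
}

console.log(accountStatus({ active: false, locked: true })); // Inaktiv
console.log(accountCountLabel(1)); // 1 Konto
```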
function openXScraperAddModal() {
document.getElementById("xScraperAddError").style.display = "none";
["xsUsername", "xsPassword", "xsEmail", "xsEmailPassword", "xsCookies"].forEach((id) => {
const el = document.getElementById(id);
if (el) el.value = "";
});
openModal("modalXScraperAdd");
}
function openXScraperCookiesModal(username) {
document.getElementById("xScraperCookiesError").style.display = "none";
document.getElementById("xsCookiesUsername").value = username;
document.getElementById("xsCookiesValue").value = "";
openModal("modalXScraperCookies");
}
async function toggleXScraperActive(username, active) {
try {
await API.post("/api/x-scraper/accounts/" + encodeURIComponent(username) + "/active", { active: active });
showToast("Status geändert.", "success");
loadXScraperAccounts();
} catch (err) {
showToast(err.message || "Status konnte nicht geändert werden", "error");
}
}
function confirmDeleteXScraper(username) {
showConfirm(
"Konto entfernen",
'Soll das X-Recherche-Konto "' + username + '" entfernt werden? Der Monitor nutzt es dann nicht mehr zum Scrapen.',
async () => {
try {
await API.del("/api/x-scraper/accounts/" + encodeURIComponent(username));
showToast("Konto entfernt.", "success");
loadXScraperAccounts();
} catch (err) {
showToast(err.message || "Konto konnte nicht entfernt werden", "error");
}
}
);
}
function resetXScraperLocks() {
showConfirm(
"Sperren zurücksetzen",
"Alle temporären Sperren der X-Recherche-Konten zurücksetzen?",
async () => {
try {
await API.post("/api/x-scraper/reset-locks", {});
showToast("Sperren zurückgesetzt.", "success");
loadXScraperAccounts();
} catch (err) {
showToast(err.message || "Sperren konnten nicht zurückgesetzt werden", "error");
}
}
);
}
function setupXScraperForms() {
const addForm = document.getElementById("xScraperAddForm");
if (addForm && !addForm.dataset.wired) {
addForm.dataset.wired = "1";
addForm.addEventListener("submit", async (e) => {
e.preventDefault();
const errEl = document.getElementById("xScraperAddError");
errEl.style.display = "none";
const body = {
username: document.getElementById("xsUsername").value.trim().replace(/^@/, ""),
password: document.getElementById("xsPassword").value,
email: document.getElementById("xsEmail").value.trim(),
email_password: document.getElementById("xsEmailPassword").value,
cookies: document.getElementById("xsCookies").value.trim(),
};
if (!body.username || !body.cookies) {
errEl.textContent = "Benutzername und Cookies sind erforderlich.";
errEl.style.display = "block";
return;
}
try {
await API.post("/api/x-scraper/accounts", body);
closeModal("modalXScraperAdd");
showToast("Konto angelegt.", "success");
loadXScraperAccounts();
} catch (err) {
errEl.textContent = err.message || "Anlegen fehlgeschlagen";
errEl.style.display = "block";
}
});
}
const ckForm = document.getElementById("xScraperCookiesForm");
if (ckForm && !ckForm.dataset.wired) {
ckForm.dataset.wired = "1";
ckForm.addEventListener("submit", async (e) => {
e.preventDefault();
const errEl = document.getElementById("xScraperCookiesError");
errEl.style.display = "none";
const username = document.getElementById("xsCookiesUsername").value;
const cookies = document.getElementById("xsCookiesValue").value.trim();
if (!cookies) {
errEl.textContent = "Cookies sind erforderlich.";
errEl.style.display = "block";
return;
}
try {
await API.post("/api/x-scraper/accounts/" + encodeURIComponent(username) + "/cookies", { cookies: cookies });
closeModal("modalXScraperCookies");
showToast("Cookies erneuert.", "success");
loadXScraperAccounts();
} catch (err) {
errEl.textContent = err.message || "Cookies konnten nicht erneuert werden";
errEl.style.display = "block";
}
});
}
}
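Both forms above normalize their input before calling the API: the username loses surrounding whitespace and a leading `@`, cookies are trimmed, and the username is URL-encoded when it becomes part of the path. A minimal sketch of those steps follows. `buildAddAccountBody` and `accountUrl` are hypothetical names, and the cookie string in the example is a placeholder, not a real twscrape cookie format.

```javascript
// Sketch only: hypothetical helpers for the normalization done in setupXScraperForms.
function buildAddAccountBody(input) {
  const body = {
    username: (input.username || "").trim().replace(/^@/, ""),
    password: input.password || "",
    email: (input.email || "").trim(),
    email_password: input.email_password || "",
    cookies: (input.cookies || "").trim(),
  };
  if (!body.username || !body.cookies) {
    return { ok: false, error: "Benutzername und Cookies sind erforderlich." };
  }
  return { ok: true, body };
}

// Builds the per-account endpoint; the username is encoded for use in a path.
function accountUrl(username, action) {
  const base = "/api/x-scraper/accounts/" + encodeURIComponent(username);
  return action ? base + "/" + action : base;
}

const added = buildAddAccountBody({ username: " @example_user ", cookies: " placeholder=1 " });
console.log(added.ok, added.body.username); // true example_user
console.log(accountUrl("example_user", "cookies"));
```

Encoding the username keeps characters such as `/` or `?` from changing the route that the request reaches.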

src/translation_agent.py (normal file, 422 lines added)

@@ -0,0 +1,422 @@
"""Translator agent: translates foreign-language articles into German.
Administration-portal (Verwaltungsportal) adaptation of the Monitor agent of
the same name. Uses CLAUDE_MODEL_FAST (Haiku) in batches. In the administration
portal the translator is triggered exclusively by hand via the translate button
(routers/translation.py), never automatically.
Source: AegisSight-Monitor/src/agents/translator.py - port larger changes to the
Monitor original here as well. The imports differ on purpose
(shared.agents.claude_client instead of agents.claude_client). The remaining
code below is a 1:1 copy and therefore keeps the style of the original.
"""
import json
import logging
import re

from shared.agents.claude_client import call_claude, ClaudeUsage, UsageAccumulator
from config import CLAUDE_MODEL_FAST

logger = logging.getLogger("verwaltung.translation")

# The admin portal never translates automatically: the translator runs only
# when translate_articles() is called explicitly with enabled=True.
# This constant is therefore the conservative default for enabled=None.
TRANSLATOR_ENABLED = False

# Upper bound on the number of articles sent to Claude per batch.
# Haiku's output limit is roughly 8k tokens, and each article easily produces
# 400-600 output tokens (headline_de plus content_de of up to 1000 characters).
# With 15 articles per batch the response was regularly truncated mid-JSON;
# 5 is safe with headroom.
DEFAULT_BATCH_SIZE = 5

# content_original is already capped at 1000 characters by rss_parser.
# Truncate once more for the translator in case a longer value slips through.
CONTENT_INPUT_MAX = 1200

# content_de is limited to 1000 characters, like content_original.
CONTENT_OUTPUT_MAX = 1000


def _extract_complete_objects(text: str) -> list[dict]:
    """Extract complete JSON objects from possibly truncated text.

    Brace-counting approach: every balanced top-level {...} is tried.
    """
    results = []
    depth = 0
    start = -1
    in_string = False
    escape = False
    for i, ch in enumerate(text):
        if escape:
            escape = False
            continue
        if ch == "\\":
            escape = True
            continue
        if ch == '"' and not escape:
            in_string = not in_string
            continue
        if in_string:
            continue
        if ch == "{":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "}":
            if depth == 0:
                # Stray closing brace outside any object: ignore it, otherwise
                # depth goes negative and no later object is ever recognised.
                continue
            depth -= 1
            if depth == 0 and start >= 0:
                obj_text = text[start:i + 1]
                try:
                    obj = json.loads(obj_text)
                    if isinstance(obj, dict):
                        results.append(obj)
                except json.JSONDecodeError:
                    pass
                start = -1
    return results
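
For comparison, the same recovery idea can be sketched with the standard library's `json.JSONDecoder.raw_decode` instead of a hand-written brace counter. This is an alternative under stated assumptions, not the method used in this file: the name `extract_objects_raw_decode` is hypothetical, and unlike the brace counter it may also return a complete nested object when its enclosing object is truncated.

```python
import json


def extract_objects_raw_decode(text: str) -> list[dict]:
    """Collect every JSON object that can be fully decoded from `text`."""
    decoder = json.JSONDecoder()
    results = []
    pos = 0
    while True:
        start = text.find("{", pos)
        if start == -1:
            break
        try:
            # raw_decode parses one JSON value starting at `start` and
            # returns it together with the index just past its end.
            obj, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            # Incomplete or invalid object: step past this brace, keep scanning.
            pos = start + 1
            continue
        if isinstance(obj, dict):
            results.append(obj)
        pos = end
    return results


# A response cut off in the middle of the second object.
truncated = '[{"id": 1, "headline_de": "A"}, {"id": 2, "headline_de": "B'
recovered = extract_objects_raw_decode(truncated)
```

Only the first object survives here, which is the behaviour the truncation fallback relies on: complete translations are kept and the cut-off tail is dropped.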


def _build_prompt(articles: list[dict], output_lang: str = "de") -> str:
    """Build the translation prompt for one batch."""
    lang_label = {"de": "Deutsch", "en": "Englisch"}.get(output_lang, output_lang)
    items = []
    for a in articles:
        items.append({
            "id": a["id"],
            "headline": a.get("headline", "") or "",
            "content": (a.get("content_original") or "")[:CONTENT_INPUT_MAX],
            "source_lang": a.get("language", "en"),
        })
    return f"""Du bist ein praeziser Uebersetzer fuer Nachrichten-Artikel.
Uebersetze die folgenden Artikel nach {lang_label}.
WICHTIG:
- Verwende IMMER echte UTF-8-Umlaute (ä, ö, ü, ß) - NIEMALS Umschreibungen wie ae, oe, ue, ss.
Beispiele: "Gespraeche" -> "Gespräche", "Fuehrer" -> "Führer", "grosse" -> "große".
- Behalte Eigennamen (Personen, Orte, Organisationen) im Original.
- Headline kurz und buendig wie im Original.
- Content auf MAX {CONTENT_OUTPUT_MAX} Zeichen kuerzen, kein HTML, kein Markdown.
- Wenn der Artikel schon auf {lang_label} ist (z.B. source_lang="{output_lang}"),
kopiere headline und content unveraendert.
Antworte AUSSCHLIESSLICH mit einem flachen JSON-Array (kein Wrapper-Objekt!).
Format genau so:
[
{{"id": 1, "headline_de": "Titel auf Deutsch", "content_de": "Inhalt auf Deutsch"}},
{{"id": 2, "headline_de": "...", "content_de": "..."}}
]
NICHT erlaubt: {{"translations": [...]}} oder {{"items": [...]}} oder Markdown-Codefences.
Nur das Array, ohne Einleitung, ohne Erklaerung.
ARTIKEL:
{json.dumps(items, ensure_ascii=False, indent=2)}
"""


def _parse_response(text: str) -> list[dict]:
    """Robust JSON array parsing.

    Handles:
    - plain JSON
    - JSON inside a Markdown code fence ```json ... ```
    - truncated responses (extracts the complete top-level objects)
    """
    text = text.strip()
    # Remove the Markdown code fence
    if text.startswith("```"):
        text = re.sub(r"^```(?:json)?\s*", "", text)
        text = re.sub(r"\s*```\s*$", "", text)
        text = text.strip()
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Try the array first
        match = re.search(r"\[.*\]", text, re.DOTALL)
        if match:
            try:
                data = json.loads(match.group(0))
            except json.JSONDecodeError:
                # Truncation fallback: extract individual top-level objects
                data = _extract_complete_objects(text)
        else:
            data = _extract_complete_objects(text)
    # Claude occasionally wraps the array in {"translations": [...]} or {"items": [...]}
    if isinstance(data, dict):
        for key in ("translations", "items", "results", "data"):
            if isinstance(data.get(key), list):
                data = data[key]
                break
        else:
            # A single object? Then treat it as a one-element list
            if "id" in data:
                data = [data]
            else:
                raise ValueError(f"Translator-Antwort: Dict ohne erwarteten Array-Key (keys={list(data.keys())[:5]})")
    if not isinstance(data, list):
        raise ValueError(f"Translator-Antwort ist kein Array: {type(data).__name__}")
    cleaned = []
    for item in data:
        if not isinstance(item, dict):
            continue
        aid = item.get("id")
        if not isinstance(aid, int):
            try:
                aid = int(aid)
            except (TypeError, ValueError):
                continue
        cleaned.append({
            "id": aid,
            "headline_de": (item.get("headline_de") or "").strip() or None,
            "content_de": (item.get("content_de") or "").strip() or None,
        })
    return cleaned
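
The two normalisation steps in `_parse_response` (stripping a code fence, then unwrapping a wrapper object) can be tried in isolation. The sketch below reuses the same regular expressions and wrapper keys; the helper names `strip_code_fence` and `unwrap` are hypothetical, and the fence is built from a constant only so that the example contains no literal fence line.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built programmatically


def strip_code_fence(text: str) -> str:
    """Remove a leading/trailing Markdown code fence, if present."""
    text = text.strip()
    if text.startswith(FENCE):
        text = re.sub(r"^`{3}(?:json)?\s*", "", text)
        text = re.sub(r"\s*`{3}\s*$", "", text)
    return text.strip()


def unwrap(data):
    """Return the translation list, even if the model wrapped it in an object."""
    if isinstance(data, dict):
        for key in ("translations", "items", "results", "data"):
            if isinstance(data.get(key), list):
                return data[key]
        if "id" in data:
            return [data]  # a single object counts as a one-element list
        raise ValueError("dict without an expected array key")
    return data


fenced = FENCE + 'json\n{"translations": [{"id": 1, "headline_de": "Titel"}]}\n' + FENCE
parsed = unwrap(json.loads(strip_code_fence(fenced)))
```

Both deviations are explicitly forbidden in the prompt, so this path is a safety net for responses that ignore the format instructions.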


async def translate_articles_batch(
    articles: list[dict],
    output_lang: str = "de",
) -> tuple[list[dict], ClaudeUsage]:
    """Translate one batch of articles.

    Expects articles as a list of dicts with the fields id, headline,
    content_original, language.

    Returns: (translated_articles, usage)

    If the call fails, ([], empty_usage) is returned; the caller can decide
    whether to retry or skip.
    """
    if not articles:
        return [], ClaudeUsage()
    prompt = _build_prompt(articles, output_lang)
    try:
        result_text, usage = await call_claude(prompt, tools=None, model=CLAUDE_MODEL_FAST)
    except Exception as e:
        logger.error(f"Translator Claude-Call fehlgeschlagen: {e}")
        return [], ClaudeUsage()
    try:
        translations = _parse_response(result_text)
    except Exception as e:
        logger.error(f"Translator JSON-Parsing fehlgeschlagen: {e}; raw: {result_text[:300]!r}")
        return [], usage
    # Validation: only return translations whose id really was part of the
    # requested batch
    requested_ids = {a["id"] for a in articles}
    valid = [t for t in translations if t["id"] in requested_ids]
    if len(valid) != len(translations):
        logger.warning(
            "Translator: %d von %d Translations referenzieren unbekannte IDs",
            len(translations) - len(valid), len(translations),
        )
    return valid, usage


# --- Pre-topic filter: narrow headline translation ---------------------------
#
# The topic filter (analyzer.filter_relevant_articles) is a Haiku call that
# judges, per article, whether it fits the Lage thematically. For headlines in
# non-Latin scripts (CJK/Arabic/Hebrew/Cyrillic) Haiku judges conservatively
# and often discards them because it only half understands them. As a result,
# e.g. the Japanese ministry feeds (MOD, NHK, Asahi) never made it into the
# final selection for Lagen related to Japan, even though the RSS match was
# correct.
#
# This function translates, in a single batch call, all non-Latin headlines
# plus the leading part of the content into English and attaches the result as
# article["headline_en_for_topic"] / article["content_en_for_topic"]. The
# topic filter shows this to the LLM in addition to the original.
#
# IMPORTANT: this mini translation is INDEPENDENT of the TRANSLATOR_ENABLED
# flag. It also runs when the downstream full-text translator is disabled
# (required for correct topic filtering, very low cost).
_TOPIC_TRANSLATE_CONTENT_MAX = 500


def _needs_pretopic_translate(article: dict) -> bool:
    """Detect foreign-script headlines that should be translated for the
    topic filter.

    Heuristic: the headline contains non-ASCII characters that are NOT within
    the Latin extensions typical for German/French/Spanish/Portuguese/
    Scandinavian text. These are mainly CJK (Kanji/Kana/Hangul), Arabic,
    Hebrew, Cyrillic, Thai, Devanagari etc.
    """
    headline = (article.get("headline_de") or article.get("headline") or "").strip()
    if not headline:
        return False
    for ch in headline:
        cp = ord(ch)
        # Skip ranges that are normal in Latin script:
        # ASCII (0-127), Latin-1 Supplement (128-255), Latin Extended-A/B (256-591)
        if cp <= 591:
            continue
        # Everything above is treated as a foreign script -> translate
        return True
    return False
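
One property of the code-point threshold is worth keeping in mind: typographic punctuation such as the en dash (U+2013) or German quotation marks (U+201E, U+201C) also lies above 591, so a purely Latin headline containing them is flagged as well. The following is a possible refinement, not what this file does; the function name `has_non_latin_letters` is hypothetical, and it assumes that only letters outside the Latin script should trigger a translation.

```python
import unicodedata


def has_non_latin_letters(headline: str) -> bool:
    """Return True if the headline contains a letter from a non-Latin script."""
    for ch in headline:
        if ord(ch) <= 591:
            continue  # ASCII, Latin-1 Supplement, Latin Extended-A/B
        if not ch.isalpha():
            continue  # punctuation, symbols, emoji
        # Unicode character names of Latin letters start with "LATIN",
        # e.g. the Vietnamese letters in Latin Extended Additional.
        if unicodedata.name(ch, "").startswith("LATIN"):
            continue
        return True
    return False


german_with_quotes = "Kanzler \u2013 \u201eWir schaffen das\u201c"
japanese = "\u9632\u885b\u7701\u304c\u767a\u8868"
russian = "\u041c\u043e\u0441\u043a\u0432\u0430 \u0437\u0430\u044f\u0432\u0438\u043b\u0430"
```

With this variant the German headline is left alone while the Japanese and Russian ones are still selected. The trade-off is one `unicodedata` lookup per high code point, which is negligible for headline-length strings.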


async def translate_headlines_for_topic_filter(
    articles: list[dict],
    target_lang: str = "en",
) -> tuple[int, ClaudeUsage]:
    """Translate the headlines of foreign-script articles (into English by
    default) so the downstream topic filter (Haiku) can judge them reliably.

    Sets directly on the article dicts:
        article["headline_en_for_topic"]: str | None
        article["content_en_for_topic"]: str | None

    Returns:
        (number_translated, ClaudeUsage)
    """
    if not articles:
        return 0, ClaudeUsage()
    candidates = [a for a in articles if _needs_pretopic_translate(a)]
    if not candidates:
        return 0, ClaudeUsage()
    # Unique indices (even if an article has no "id" field because it is not
    # in the DB yet): we use the position in the full articles list.
    idx_by_obj = {id(a): i for i, a in enumerate(articles)}
    items = []
    for a in candidates:
        idx = idx_by_obj.get(id(a))
        if idx is None:
            continue
        headline = (a.get("headline_de") or a.get("headline") or "").strip()
        content_src = (a.get("content_de") or a.get("content_original") or "")
        items.append({
            "i": idx,
            "h": headline[:200],
            "c": content_src[:_TOPIC_TRANSLATE_CONTENT_MAX],
        })
    if not items:
        return 0, ClaudeUsage()
    lang_label = {"en": "English", "de": "German"}.get(target_lang, target_lang)
    prompt = f"""Translate these news headlines and short content snippets to {lang_label}.
Keep proper names (people, organizations, places) untouched. Keep it concise; the goal
is to let another model judge topical relevance, not to publish.
Return ONLY a JSON array. Each item: {{"i": <index>, "h": <headline in {lang_label}>, "c": <content snippet in {lang_label}>}}.
Keep the same "i" values. No prose, no markdown fences.
INPUT:
{json.dumps(items, ensure_ascii=False)}
"""
    try:
        result_text, usage = await call_claude(prompt, tools=None, model=CLAUDE_MODEL_FAST)
    except Exception as e:
        logger.warning(f"Pre-Topic-Translate Claude-Call fehlgeschlagen: {e}")
        return 0, ClaudeUsage()
    # Robust parsing (Markdown code fence + bare array)
    text = result_text.strip()
    if text.startswith("```"):
        text = re.sub(r"^```(?:json)?\s*", "", text)
        text = re.sub(r"\s*```\s*$", "", text)
        text = text.strip()
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        m = re.search(r"\[.*\]", text, re.DOTALL)
        # A truncated array usually has no closing bracket, so the object-level
        # fallback must also run when no [...] match is found.
        try:
            data = json.loads(m.group(0)) if m else _extract_complete_objects(text)
        except json.JSONDecodeError:
            data = _extract_complete_objects(text)
        if not data:
            logger.warning(
                f"Pre-Topic-Translate: kein JSON-Array in Antwort. Sample: {text[:200]!r}"
            )
            return 0, usage
    if not isinstance(data, list):
        logger.warning(
            f"Pre-Topic-Translate: Antwort ist kein Array ({type(data).__name__})"
        )
        return 0, usage
    applied = 0
    for entry in data:
        if not isinstance(entry, dict):
            continue
        idx = entry.get("i")
        if not isinstance(idx, int) or not (0 <= idx < len(articles)):
            try:
                idx = int(idx)
                if not (0 <= idx < len(articles)):
                    continue
            except (TypeError, ValueError):
                continue
        h = (entry.get("h") or "").strip() or None
        c = (entry.get("c") or "").strip() or None
        if h:
            articles[idx]["headline_en_for_topic"] = h
        if c:
            articles[idx]["content_en_for_topic"] = c
        if h or c:
            applied += 1
    return applied, usage


async def translate_articles(
    articles: list[dict],
    output_lang: str = "de",
    batch_size: int = DEFAULT_BATCH_SIZE,
    usage_accumulator: UsageAccumulator | None = None,
    enabled: bool | None = None,
) -> list[dict]:
    """Translate any number of articles in batches.

    Sends each batch through `translate_articles_batch` and returns ONE flat
    list of translations. If a batch fails it is skipped (the other batches
    continue).

    enabled: per-call override of the module-level TRANSLATOR_ENABLED flag.
        If None, the module default applies (False in this module, so nothing
        is translated unless the caller passes enabled=True). The docstring
        inherited from the Monitor original refers to config.TRANSLATOR_ENABLED
        (derived from .env) and to an orchestrator that sets this from the org
        setting 'translator_enabled', so that jp_demo (translator mandatory)
        works even with the global flag disabled.
    """
    if not articles:
        return []
    is_enabled = TRANSLATOR_ENABLED if enabled is None else bool(enabled)
    if not is_enabled:
        logger.info(
            "Translator deaktiviert (enabled=%s, global TRANSLATOR_ENABLED=%s), %d Artikel uebersprungen",
            enabled, TRANSLATOR_ENABLED, len(articles),
        )
        return []
    all_translations = []
    for i in range(0, len(articles), batch_size):
        batch = articles[i : i + batch_size]
        translations, usage = await translate_articles_batch(batch, output_lang)
        if usage_accumulator is not None:
            usage_accumulator.add(usage)
        all_translations.extend(translations)
        logger.info(
            "Translator-Batch %d/%d: %d/%d uebersetzt (cost=$%.4f)",
            (i // batch_size) + 1,
            (len(articles) + batch_size - 1) // batch_size,
            len(translations), len(batch),
            usage.cost_usd,
        )
    return all_translations
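
The batching behaviour of `translate_articles` can be illustrated without any Claude call. In this minimal sketch `fake_translate_batch` is a hypothetical stand-in for `translate_articles_batch` that returns an empty list for one batch, mimicking a failed call; the slicing and the ceiling division for the batch count mirror the loop above.

```python
import asyncio


def batch_count(n_items: int, batch_size: int) -> int:
    """Ceiling division, as used for the 'batch x/y' log line."""
    return (n_items + batch_size - 1) // batch_size


async def fake_translate_batch(batch: list[dict]) -> list[dict]:
    # Hypothetical stub: any batch containing id 7 "fails" and yields nothing.
    if any(a["id"] == 7 for a in batch):
        return []
    return [{"id": a["id"], "headline_de": f"DE {a['id']}"} for a in batch]


async def translate_all(articles: list[dict], batch_size: int = 5) -> list[dict]:
    out = []
    for i in range(0, len(articles), batch_size):
        # A failed batch contributes nothing; later batches still run.
        out.extend(await fake_translate_batch(articles[i:i + batch_size]))
    return out


articles = [{"id": n} for n in range(1, 13)]  # 12 articles -> batches of 5, 5, 2
translated = asyncio.run(translate_all(articles))
translated_ids = [t["id"] for t in translated]
```

Because a failed batch simply yields no entries, callers that need every article translated have to compare the returned ids against the requested ones and retry the missing ones themselves.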

0
tests/__init__.py Normale Datei

23
tests/conftest.py Normale Datei

@@ -0,0 +1,23 @@
"""Pytest fixtures for the admin-portal (Verwaltung) tests.

Sets minimal env vars so that importing src/config.py does not fail.
Tests remain unit tests (no DB access, no HTTP server).
"""
import os
import sys
from pathlib import Path

# config.py strictly requires PORTAL_JWT_SECRET.
# We set a value for the test import.
os.environ.setdefault("PORTAL_JWT_SECRET", "test-secret-not-for-production")
os.environ.setdefault("DB_PATH", "/tmp/aegis-test-not-used.db")
os.environ.setdefault("SMTP_HOST", "")
os.environ.setdefault("SMTP_USER", "")
os.environ.setdefault("SMTP_PASSWORD", "")

# src/ is the Python app directory
ROOT = Path(__file__).resolve().parent.parent
SRC = ROOT / "src"
if str(SRC) not in sys.path:
    sys.path.insert(0, str(SRC))
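
The choice of `os.environ.setdefault` over plain assignment matters when a developer or CI job already exports one of these variables. A small illustration, using made-up variable names (`DEMO_PRESET`, `DEMO_UNSET`) that are not part of this project:

```python
import os

# Simulate a value that is already present in the environment.
os.environ["DEMO_PRESET"] = "from-shell"
os.environ.pop("DEMO_UNSET", None)

# setdefault only writes when the key is missing, and returns the effective value.
preset_value = os.environ.setdefault("DEMO_PRESET", "test-default")
unset_value = os.environ.setdefault("DEMO_UNSET", "test-default")
```

An externally provided value therefore wins over the test default, while missing variables receive the fallback.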

65
tests/test_api_meta.py Normale Datei

@@ -0,0 +1,65 @@
"""Integration tests for DB-free endpoints with mocked auth.

GET /api/sources/meta returns the single-source-of-truth categories/types
and needs no DB. By overriding the get_current_admin dependency we exercise
the endpoint end to end (real JSON, real schema).
"""
import pytest
from fastapi.testclient import TestClient


@pytest.fixture
def authed_client():
    from main import app
    from auth import get_current_admin

    def fake_admin():
        return {"id": 1, "email": "test@aegis-sight.de", "username": "test"}

    app.dependency_overrides[get_current_admin] = fake_admin
    yield TestClient(app)
    app.dependency_overrides = {}


def test_meta_returns_schema(authed_client):
    r = authed_client.get("/api/sources/meta")
    assert r.status_code == 200
    data = r.json()
    assert "categories" in data
    assert "types" in data
    assert isinstance(data["categories"], list)
    assert isinstance(data["types"], list)


def test_meta_categories_have_required_fields(authed_client):
    r = authed_client.get("/api/sources/meta")
    data = r.json()
    for cat in data["categories"]:
        assert "key" in cat
        assert "label" in cat
        assert isinstance(cat["key"], str) and cat["key"]
        assert isinstance(cat["label"], str) and cat["label"]


def test_meta_types_have_required_fields(authed_client):
    r = authed_client.get("/api/sources/meta")
    data = r.json()
    for t in data["types"]:
        assert "key" in t
        assert "label" in t


def test_meta_includes_specialized_categories(authed_client):
    """Phase 3b - the more specialised Lagen topics must exist as categories."""
    r = authed_client.get("/api/sources/meta")
    keys = {c["key"] for c in r.json()["categories"]}
    assert "cybercrime" in keys
    assert "ukraine-russland-krieg" in keys
    assert "russische-staatspropaganda" in keys


def test_meta_includes_all_source_types(authed_client):
    """All 5 source types must be returned."""
    r = authed_client.get("/api/sources/meta")
    keys = {t["key"] for t in r.json()["types"]}
    assert keys == {"rss_feed", "web_source", "telegram_channel", "podcast_feed", "excluded"}

97
tests/test_api_smoke.py Normale Datei

@@ -0,0 +1,97 @@
"""Smoke tests for all API endpoints: auth coverage.

Checks that every protected endpoint returns 401/403 without an auth header.
This prevents someone from accidentally writing an endpoint without
`get_current_admin` and thereby making it public.
"""
import pytest
from fastapi.testclient import TestClient


@pytest.fixture(scope="module")
def client():
    from main import app
    # raise_server_exceptions=False -> exceptions are delivered as 500.
    # We only test that auth takes effect before any DB call.
    return TestClient(app, raise_server_exceptions=False)


# (method, path)
# Auth-protected endpoints: 401 is expected in principle, but FastAPI's
# HTTPBearer with auto_error=True answers a missing Authorization header
# with 403, so both are accepted.
AUTH_PROTECTED = [
("GET", "/api/orgs"),
("POST", "/api/orgs"),
("GET", "/api/orgs/1"),
("PUT", "/api/orgs/1"),
("DELETE", "/api/orgs/1"),
("GET", "/api/licenses"),
("POST", "/api/licenses"),
("PUT", "/api/licenses/1/revoke"),
("PUT", "/api/licenses/1/extend"),
("GET", "/api/licenses/expiring"),
("GET", "/api/users"),
("POST", "/api/users"),
("PUT", "/api/users/1/deactivate"),
("PUT", "/api/users/1/activate"),
("PUT", "/api/users/1/globe-access"),
("PUT", "/api/users/1/network-access"),
("PUT", "/api/users/1/role"),
("DELETE", "/api/users/1"),
("GET", "/api/dashboard/stats"),
("GET", "/api/sources/meta"),
("GET", "/api/sources/global"),
("POST", "/api/sources/global"),
("PUT", "/api/sources/global/1"),
("DELETE", "/api/sources/global/1"),
("GET", "/api/sources/global/stats"),
("GET", "/api/sources/tenant"),
("POST", "/api/sources/tenant/1/promote"),
("POST", "/api/sources/tenant/bulk-promote"),
("POST", "/api/sources/discover"),
("POST", "/api/sources/discover/add"),
("GET", "/api/sources/health"),
("GET", "/api/sources/suggestions"),
("PUT", "/api/sources/suggestions/1"),
("POST", "/api/sources/health/run"),
("POST", "/api/sources/health/run-stream"),
("POST", "/api/sources/health/search-fix/1"),
("GET", "/api/token-usage/overview"),
("GET", "/api/token-usage/1"),
("GET", "/api/token-usage/1/current"),
("PUT", "/api/token-usage/budget/1"),
("GET", "/api/audit-log"),
("GET", "/api/audit-log/distinct"),
]


@pytest.mark.parametrize("method,path", AUTH_PROTECTED)
def test_endpoint_requires_auth(client, method, path):
    """Without an Authorization header every endpoint must return 401 or 403."""
    r = client.request(method, path, json={})
    assert r.status_code in (401, 403), (
        f"{method} {path} sollte 401/403 sein, war {r.status_code}: {r.text[:200]}"
    )


def test_magic_link_endpoint_is_public(client):
    """/api/auth/magic-link is deliberately public (otherwise nobody could log in)."""
    r = client.post("/api/auth/magic-link", json={"email": "stranger@example.com"})
    # With valid JSON -> generic 200 response, without an auth header.
    # 200 is expected (anti-enumeration), but the DB call may fail against the
    # unused test DB path, so 500 is tolerated as well. We only want to make
    # sure the response is NOT 401/403.
    assert r.status_code != 401 and r.status_code != 403


def test_verify_endpoint_is_public(client):
    """/api/auth/verify is public too (without a token we cannot have a JWT)."""
    r = client.post("/api/auth/verify", json={"token": "x" * 20})
    assert r.status_code != 401 and r.status_code != 403


def test_static_routes_public(client):
    """/ and /dashboard serve HTML without auth (the frontend handles the login redirect)."""
    r = client.get("/")
    assert r.status_code == 200
    r = client.get("/dashboard")
    assert r.status_code == 200

51
tests/test_audit.py Normale Datei

@@ -0,0 +1,51 @@
"""Tests for src/audit.py - diff() and _to_json() helpers."""
import json

from audit import diff, _to_json


def test_diff_returns_only_changed_fields():
    before = {"name": "Alt", "status": "active", "max_users": 5}
    after = {"name": "Neu", "status": "active", "max_users": 5}
    result = diff(before, after)
    assert result == {"name": {"old": "Alt", "new": "Neu"}}


def test_diff_no_changes_returns_none():
    same = {"a": 1, "b": "x"}
    assert diff(same, dict(same)) is None


def test_diff_with_none_returns_none():
    assert diff(None, {"a": 1}) is None
    assert diff({"a": 1}, None) is None


def test_diff_added_or_removed_fields():
    before = {"a": 1}
    after = {"a": 1, "b": 2}
    result = diff(before, after)
    assert result == {"b": {"old": None, "new": 2}}


def test_to_json_handles_none():
    assert _to_json(None) is None


def test_to_json_handles_dict():
    out = _to_json({"x": 1, "y": "hallo"})
    assert json.loads(out) == {"x": 1, "y": "hallo"}


def test_to_json_handles_non_serializable_via_str_default():
    """Custom objects are turned into strings via default=str."""
    class Foo:
        def __str__(self):
            return "FooObj"

    out = _to_json({"obj": Foo()})
    assert "FooObj" in out


def test_to_json_preserves_umlauts():
    """ensure_ascii=False should let German umlauts pass through."""
    out = _to_json({"name": "Müller"})
    assert "Müller" in out
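
The tests above pin down the contract of `diff()` and `_to_json()` fairly completely. As an illustration only, here is one implementation that would satisfy them; `diff_sketch` and `to_json_sketch` are hypothetical names, and the real code in src/audit.py is not shown in this diff and may differ.

```python
import json


def diff_sketch(before, after):
    """Return {field: {"old": ..., "new": ...}} for changed fields, else None."""
    if before is None or after is None:
        return None
    changes = {}
    for key in sorted(set(before) | set(after)):
        old, new = before.get(key), after.get(key)
        if old != new:
            changes[key] = {"old": old, "new": new}
    return changes or None


def to_json_sketch(value):
    """Serialise for the audit log; None stays None, unknown objects become str."""
    if value is None:
        return None
    return json.dumps(value, ensure_ascii=False, default=str)


changed = diff_sketch({"name": "Alt", "max_users": 5}, {"name": "Neu", "max_users": 5})
added = diff_sketch({"a": 1}, {"a": 1, "b": 2})
```

Returning None rather than an empty dict for "no changes" lets a caller skip writing an audit entry with a plain truthiness check.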

38
tests/test_auth.py Normale Datei

@@ -0,0 +1,38 @@
"""Tests for src/auth.py - magic-link token and JWT round trip."""
import pytest

from auth import generate_magic_token, create_token, decode_token


def test_magic_token_is_url_safe_and_random():
    t1 = generate_magic_token()
    t2 = generate_magic_token()
    assert t1 != t2
    # token_urlsafe(32) -> 43 characters of URL-safe base64
    assert 40 <= len(t1) <= 50
    # URL-safe characters only
    assert all(c.isalnum() or c in "-_" for c in t1)


def test_jwt_round_trip():
    token = create_token(admin_id=42, email="info@aegis-sight.de", username="info")
    payload = decode_token(token)
    assert payload["sub"] == "42"
    assert payload["email"] == "info@aegis-sight.de"
    assert payload["username"] == "info"
    assert payload["role"] == "portal_admin"
    assert payload["iss"] == "aegissight-portal"
    assert payload["aud"] == "aegissight-portal"


def test_jwt_username_default_from_email():
    """If no username is passed, the local part of the email is used."""
    token = create_token(admin_id=1, email="someone@example.com")
    payload = decode_token(token)
    assert payload["username"] == "someone"


def test_decode_invalid_token_raises():
    from fastapi import HTTPException
    with pytest.raises(HTTPException) as exc:
        decode_token("not.a.valid.jwt")
    assert exc.value.status_code == 401
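
The "43 characters" remark in the token test follows from how `secrets.token_urlsafe` encodes its bytes: 32 random bytes become 44 base64 characters including one padding character, which is stripped. The sketch below assumes, as the test comment suggests, that `generate_magic_token` is based on `token_urlsafe(32)`; that is an inference from the comment, not something this diff shows.

```python
import secrets
import string

# The URL-safe base64 alphabet: letters, digits, "-" and "_".
URL_SAFE_ALPHABET = set(string.ascii_letters + string.digits + "-_")

token_a = secrets.token_urlsafe(32)
token_b = secrets.token_urlsafe(32)

token_length = len(token_a)
only_url_safe = set(token_a) <= URL_SAFE_ALPHABET
```

The test's looser bound of 40 to 50 characters leaves room for a slightly different byte count without making the test brittle.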

33
tests/test_imports.py Normale Datei

@@ -0,0 +1,33 @@
"""Smoke test: all backend modules are importable (safety net for syntax errors)."""
import importlib


def test_main_app_imports():
    import main  # FastAPI app
    assert hasattr(main, "app")


def test_all_routers_importable():
    """A syntax error in one router crashes the whole app - we catch that here."""
    for mod in ("auth", "organizations", "licenses", "users",
                "dashboard", "sources", "token_usage", "audit", "translation"):
        m = importlib.import_module(f"routers.{mod}")
        assert hasattr(m, "router"), f"routers/{mod} hat kein router-Objekt"


def test_shared_modules_importable():
    """src/shared/ must be importable on its own (no sys.path hack)."""
    from shared.source_rules import discover_source, evaluate_feeds_with_claude
    from shared.services.source_health import run_health_checks
    from shared.services.source_suggester import generate_suggestions
    from shared.agents.claude_client import call_claude
    assert callable(discover_source)
    assert callable(run_health_checks)


def test_helpers_importable():
    from auth import generate_magic_token, create_token, decode_token
    from audit import log_action, diff, get_client_ip, row_to_dict
    from email_utils.sender import send_email
    from email_utils.templates import portal_magic_link_email, invite_email
    from source_meta import get_meta, category_label, type_label

53
tests/test_models.py Normale Datei

@@ -0,0 +1,53 @@
"""Tests for src/models.py - Pydantic validation."""
import pytest
from pydantic import ValidationError

from models import (
    MagicLinkRequest, MagicLinkResponse,
    VerifyTokenRequest, TokenResponse,
    OrgCreate, LicenseCreate, UserCreate,
)


def test_magic_link_request_accepts_email():
    r = MagicLinkRequest(email="info@aegis-sight.de")
    assert r.email == "info@aegis-sight.de"


def test_magic_link_request_rejects_too_short():
    with pytest.raises(ValidationError):
        MagicLinkRequest(email="a")


def test_verify_token_min_length():
    with pytest.raises(ValidationError):
        VerifyTokenRequest(token="abc")


def test_token_response_default_email_empty():
    r = TokenResponse(access_token="x" * 40, username="info")
    assert r.email == ""
    assert r.token_type == "bearer"


def test_org_create_slug_pattern():
    """The slug must be lowercase with hyphens."""
    OrgCreate(name="Test", slug="abc-123")
    with pytest.raises(ValidationError):
        OrgCreate(name="Test", slug="Wrong Case")
    with pytest.raises(ValidationError):
        OrgCreate(name="Test", slug="under_score")


def test_license_create_type_pattern():
    LicenseCreate(organization_id=1, license_type="trial")
    LicenseCreate(organization_id=1, license_type="annual")
    LicenseCreate(organization_id=1, license_type="permanent")
    with pytest.raises(ValidationError):
        LicenseCreate(organization_id=1, license_type="lifetime")


def test_user_create_role_pattern():
    UserCreate(email="a@b.de", role="member")
    UserCreate(email="a@b.de", role="org_admin")
    with pytest.raises(ValidationError):
        UserCreate(email="a@b.de", role="superuser")
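
The slug test only states the rule informally ("lowercase with hyphens"). As an illustration of what such a constraint could look like, here is a standalone regular-expression check. The pattern is an assumption chosen to match the three test cases; the actual pattern in src/models.py is not visible in this diff.

```python
import re

# Hypothetical pattern: lowercase alphanumeric groups separated by single hyphens.
SLUG_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_slug(slug: str) -> bool:
    """Return True if the whole string matches the assumed slug pattern."""
    return SLUG_PATTERN.fullmatch(slug) is not None


accepted = is_valid_slug("abc-123")
rejected_case = is_valid_slug("Wrong Case")
rejected_underscore = is_valid_slug("under_score")
```

Using `fullmatch` rather than `match` matters here: `match` would accept any string that merely starts with a valid slug.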

51
tests/test_source_meta.py Normale Datei

@@ -0,0 +1,51 @@
"""Tests for src/source_meta.py - single source of truth for categories/types."""
from source_meta import (
    SOURCE_CATEGORIES, SOURCE_TYPES,
    get_meta, category_label, type_label,
)


def test_categories_have_unique_keys():
    keys = [c["key"] for c in SOURCE_CATEGORIES]
    assert len(keys) == len(set(keys))


def test_types_have_unique_keys():
    keys = [t["key"] for t in SOURCE_TYPES]
    assert len(keys) == len(set(keys))


def test_categories_and_types_have_label():
    for c in SOURCE_CATEGORIES:
        assert "key" in c and "label" in c
        assert isinstance(c["label"], str) and c["label"]
    for t in SOURCE_TYPES:
        assert "key" in t and "label" in t


def test_get_meta_shape():
    meta = get_meta()
    assert set(meta.keys()) == {"categories", "types"}
    assert meta["categories"] == SOURCE_CATEGORIES
    assert meta["types"] == SOURCE_TYPES


def test_category_label_lookup():
    assert category_label("nachrichtenagentur") == "Nachrichtenagentur"
    assert category_label("oeffentlich-rechtlich") == "Öffentlich-Rechtlich"
    # Unknown key -> falls back to the key itself
    assert category_label("does-not-exist") == "does-not-exist"


def test_type_label_lookup():
    assert type_label("rss_feed") == "RSS-Feed"
    assert type_label("telegram_channel") == "Telegram-Kanal"
    assert type_label("does-not-exist") == "does-not-exist"


def test_category_includes_aktuelle_themen():
    """Phase 3b: Lagen-specific categories (cybercrime etc.) must be included."""
    keys = {c["key"] for c in SOURCE_CATEGORIES}
    assert "cybercrime" in keys
    assert "ukraine-russland-krieg" in keys
    assert "russische-staatspropaganda" in keys
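
The label lookups tested above follow a simple pattern: a list of `{"key", "label"}` dicts as the single source of truth, and a lookup that falls back to the key itself. A minimal sketch of that pattern follows, using only the two type labels that appear in the tests; `SOURCE_TYPES_DEMO` and `type_label_sketch` are hypothetical names, and the real source_meta.py may be structured differently.

```python
# Demo data limited to the two labels visible in the tests.
SOURCE_TYPES_DEMO = [
    {"key": "rss_feed", "label": "RSS-Feed"},
    {"key": "telegram_channel", "label": "Telegram-Kanal"},
]


def type_label_sketch(key: str) -> str:
    """Return the display label for a type key, or the key itself if unknown."""
    labels = {t["key"]: t["label"] for t in SOURCE_TYPES_DEMO}
    return labels.get(key, key)


known = type_label_sketch("rss_feed")
unknown = type_label_sketch("does-not-exist")
```

Falling back to the key keeps the UI usable when the database contains a type or category that the metadata list does not know yet.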