Commits vergleichen

...

42 Commits

Autor SHA1 Nachricht Datum
52f5debe44 Release-Notes: X-Recherche-Konten im Verwaltungsportal verwalten 2026-05-22 14:41:16 +02:00
claude-dev
8c75a70655 feat(x-scraper): X-Recherche-Konten im Verwaltungsportal verwalten
Neuer Sub-Tab "X-Recherche-Konten" unter Quellen: die X-Login-Konten,
mit denen der Monitor bei X scrapt (twscrape-Account-Pool), anzeigen,
hinzufuegen, Cookies erneuern, aktiv/inaktiv schalten, entfernen, plus
Sperren-Reset.

- neuer Router x_scraper.py, verwaltet den twscrape-Pool ueber dessen API
- X_ACCOUNTS_DB_PATH in config.py
- twscrape als Abhaengigkeit (git-main-Pin)
- Sub-Tab, Tabelle und zwei Modals in dashboard.html, Logik in x-scraper.js

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 11:22:35 +00:00
6bfff67c2f Promote develop → main (2026-05-22 11:13 UTC) 2026-05-22 13:13:41 +02:00
746b1bcd81 Release-Notes: Interne Verbesserungen 2026-05-22 13:13:39 +02:00
7ec153ca49 Release-Notes: Interne Verbesserungen 2026-05-22 13:13:27 +02:00
claude-dev
a27fe44b0b Revert "feat(sources): X-Account-Verwaltung im Verwaltungsportal"
This reverts commit bd476edb13.
2026-05-22 11:12:28 +00:00
6c623a8ae5 Promote develop → main (2026-05-22 11:09 UTC) 2026-05-22 13:09:25 +02:00
240222cb2a Release-Notes: X-Konten direkt im Verwaltungsportal verwalten 2026-05-22 13:09:23 +02:00
claude-dev
bd476edb13 feat(sources): X-Account-Verwaltung im Verwaltungsportal
Neuer Sub-Tab "X-Accounts" unter Quellen: die als Recherchequelle
eingebundenen X-Accounts anzeigen, hinzufuegen, bearbeiten und entfernen.
Schreibt source_type=x_account in die geteilte sources-Tabelle, von wo
der Monitor sie pro Lage nutzt.

- x_account im source_type-Pattern von GlobalSourceCreate/Update
- primary_language in Create/Update plus INSERT (Keyword-Matching)
- x_account-Typ und x-Kategorie in source_meta.py
- Sub-Tab, Tabelle und Modal in dashboard.html, Logik in sources.js

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 11:06:07 +00:00
ed38d68db7 Promote develop → main (2026-05-22 09:37 UTC) 2026-05-22 11:37:54 +02:00
c7d6d2eedf Release-Notes: Neue Übersetzungsfunktion im Dashboard 2026-05-22 11:37:47 +02:00
031bd9e114 feat(translation): manueller Übersetzungs-Button im Dashboard
Fremdsprachige Artikel ohne deutsche Fassung lassen sich jetzt manuell
über das Verwaltungs-Dashboard übersetzen. Hintergrund: die automatische
Übersetzung im Monitor wurde deaktiviert (TRANSLATOR_ENABLED=false),
nachdem ein sehr großer Lauf den Refresh-Worker blockiert hatte.

- translation_agent.py: Verwaltungs-Adaption des Monitor-Translators
  (Haiku-Batches), Imports auf shared.agents.claude_client umgestellt
- routers/translation.py: Endpoints /api/translation/status, /run und
  /cancel. Der Lauf läuft als entkoppelter Hintergrund-Task, blockiert
  keinen Request und ist jederzeit abbrechbar
- Dashboard-Karte mit Fortschrittsbalken, Aufwandsschätzung vorab und
  Abbrechen-Button
- test_imports.py: neuen Router in den Smoke-Test aufgenommen

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 10:07:37 +02:00
c316c67294 Kategorie stimmungsbild (#9) 2026-05-22 00:29:02 +02:00
430641b128 feat(sources): neue Kategorie 'stimmungsbild' fuer Forum-Quellen
Single Source of Truth (source_meta.SOURCE_CATEGORIES) um den Eintrag
"Forum / Stimmungsbild" erweitert. Wird vom Frontend ueber /api/sources/meta
geladen und in den Filter-Dropdowns angezeigt.

Hintergrund: jp_demo nutzt 5ch (Phase 2), Hatena Bookmark und Note Trending
als anonyme Foren-Quellen fuer eine eigene Stimmungs-Kachel im Monitor.
Diese Quellen bekommen in der DB category='stimmungsbild' + media_type='forum',
sodass sie aus dem Faktencheck rausfallen.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 00:28:04 +02:00
7c558b7cb4 Promote develop → main (2026-05-21 15:49 UTC) 2026-05-21 17:49:35 +02:00
Claude Code
c62be998d5 fix(config): CLAUDE_PATH-Default auf /usr/local/bin/claude
Bisheriger Default /home/claude-dev/.claude/local/claude existiert auf dem
Live-Server nicht. Folge: jeder call_claude im verwaltungsportal stirbt mit
FileNotFoundError [Errno 2]. Konkret betroffen: bulk_classify (alle Quellen-
Klassifizierungen schlugen still fehl bzw. brachen nach der ersten Quelle ab).

Live wurde uebergangsweise per CLAUDE_PATH-Env in /home/claude-dev/AegisSight
-Monitor-Verwaltung/.env ueberschrieben. Dieser Commit zieht den Fix in den
Code, damit der Default auch ohne .env-Override funktioniert (bzw. die .env-
Zeile spaeter wieder entfernt werden kann).

Monitor-config.py:30 hat einen abweichenden Default (/usr/bin/claude) —
nicht in diesem Commit angefasst, da Monitor heute funktioniert; getrennt
nachhalten falls auch dort Drift auftritt.
2026-05-20 19:53:03 +00:00
5d1d72bf3d Promote develop → main (2026-05-17 19:19 UTC) 2026-05-17 21:19:44 +02:00
d0b71d82e4 Release-Notes: 83 neue Quellen für Militär, Polizei-Technik & Waffen 2026-05-17 21:19:42 +02:00
claude-dev
c64675b266 feat(scripts): Bulk-Seed fuer 83 Militaer-, Polizei-Technik und Waffen-Quellen
Neues idempotentes Skript scripts/seed_military_sources.{json,py} legt 85
internationale Defense-Quellen an (RSS/Web/Telegram), kategorisiert mit
Topic-Tags im notes-Feld: [militaertechnik], [waffen-international],
[polizei-technik]. Sprachen EN/DE/FR/RU/FA/PL, country_code manuell gesetzt.

Erstlauf auf Staging-DB: 83 neu (IDs 384-466), 2 Duplikate (rybar,
osintdefender bereits vorhanden). URL-Check verhindert Duplikate, das
gleiche Skript laeuft ohne Aenderung gegen Live-DB:

  venv/bin/python scripts/seed_military_sources.py \
    --db /home/claude-dev/osint-data/osint.db

Sektionen: 31 internationale Equipment-Fachredaktionen (Janes, TWZ, Defense
News, Naval News, Army Recognition, Aviation Week ...), 8 deutsche
(ESuT, Soldat & Technik, hartpunkt, Augen geradeaus ...), 5 franzoesische
(Opex360, Mer et Marine ...), 5 russische (Topwar, TASS, RIA, bmpd, Zvezda),
4 ukrainisch/polnische (Defense Express, Militarnyi, Defence24), 2
israelische, 3 iranische, 3 chinesisch/asiatische, 8 OSINT-Tracker (ORYX,
WarSpotting, CIT, 5 Telegram), 5 Polizei-Technik (Behoerden-Spiegel, pvt,
Police Magazine ...) und 11 Waffen-Spezialisten (Small Arms Survey, SIPRI,
Conflict Armament Research, ARES, Calibre Obscura, ICRC ...).

Plan: ~/.claude/plans/gleaming-inventing-fern.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 11:18:07 +00:00
1d9ce20b68 Promote develop → main (2026-05-17 00:40 UTC) 2026-05-17 02:40:40 +02:00
claude-dev
9843ff0015 fix(pdf-upload): closeModal-Aufrufe ohne Quotes -> Abbrechen ging nicht
Die Anfuehrungszeichen waren beim Einfuegen verloren gegangen
(onclick="closeModal(modalPdfUpload)" statt closeModal())
-> Browser warf ReferenceError und der Klick blieb wirkungslos.
Mit &#39; (HTML-Entity-Apostroph) im Attribut-Wert eindeutig.
2026-05-16 23:43:41 +00:00
claude-dev
27afce7c9e feat(sources): PDF-Upload als neuer Quellentyp pdf_document
- POST /api/sources/global/upload-pdf: multipart File-Upload,
  50 MB Limit, SHA256-Dedup, speichert PDF unter <dirname(DB)>/pdfs/{sha}.pdf,
  legt Source mit processed_at=NULL an (Monitor verarbeitet asynchron)
- pattern in GlobalSourceUpdate um pdf_document erweitert (2x)
- dashboard.html: Button + Modal im Grundquellen-Sub-Tab
- sources.js: openPdfUploadModal + setupPdfUploadForm + FormData-Submit
- app.js: API.upload(path, formData) Helper fuer multipart
- requirements.txt: pypdf (Validierung optional)
2026-05-16 23:21:56 +00:00
claude-dev
d3e5fa7079 feat(orgs): Pipeline-Sprache als Org-Setting im Verwaltungsportal
- OrgCreate / OrgUpdate / OrgResponse um output_language (de | en).
- routers/organizations.py persistiert die Sprache nach create/update
  via shared.services.org_settings.set_org_setting.
- _enrich_org liest output_language aus organization_settings (Default de).
- Frontend: Dropdown im Modal Neue Organisation und im Org-Edit-Formular,
  Auto-Befuellung aus org.output_language. Cache-Buster auf app.js gebumpt.

Phase 7 von 8 (eng_demo / Org-Sprache).
2026-05-13 21:18:07 +00:00
claude-dev
521633bde9 feat(shared): org_settings Helper (Kopie aus Monitor)
Helper aus AegisSight-Monitor/src/services/org_settings.py uebernommen.
Wird in Phase 7 vom Verwaltungs-Org-Router verwendet, um output_language
beim Org-Anlegen/Bearbeiten zu setzen.

Phase 1 von 8 (eng_demo / Org-Sprache).
2026-05-13 20:46:10 +00:00
claude-dev
015255237a feat(klassifikation): Quellen-Klassifikation aus Monitor in Verwaltung verschoben
Service-Module (source_classifier, external_reputation) liegen jetzt in shared/services/, Endpoints unter /api/sources/classification/* sind hier statt im Monitor:
- classification/{stats,queue,bulk-classify,bulk-approve}
- {id}/classification/{approve,reject,reclassify}
- external-reputation/sync

modalSource erweitert um Klassifikations-Section (Politik, Medientyp, Reliability, state-affiliated, Land, 12 Alignment-Chips). Neuer Sub-Tab Klassifikation mit Review-Queue, Pending-Counter, Bulk-Actions. Auth via get_current_admin, Audit-Logging.

Begleit-Refactor: Monitor verliert die Klassifikations-UI/-Endpoints separat.
2026-05-09 21:27:55 +00:00
b56b7eeda2 Merge pull request 'Promote: sync Strategie/Karteileichen-Reihenfolge' (#5) from develop into main 2026-05-09 17:44:30 +02:00
Claude
2f7d967ce2 sync(shared): Reihenfolge Strategie-Eskalation/Karteileichen aus Monitor 5f053a3 2026-05-09 15:43:44 +00:00
1d9751ef1a Promote develop -> main 2026-05-09 17:27:03 +02:00
Claude
5e08d06784 feat(quellen-health): Strategie-Eskalation, Loesung-suchen bei Warnings, Trend-Delta
Drei zusammenhaengende Verbesserungen am Quellen-Health-Bereich:

1. shared/services/source_suggester.py:
   - sync mit Monitor commit 49c5572.
   - Neue Funktion generate_strategy_escalation_suggestions: erzeugt
     deactivate-Vorschlaege fuer Quellen mit fetch_strategy=googlebot|
     paywall, deren Reachability-Check trotzdem error meldet.

2. source-health.js: Loesung-suchen-Button erweitert.
   Bisher nur bei status=error AND check_type=reachability. Jetzt auch
   bei status=warning AND check_type=feed_validity (z.B. "Feed
   erreichbar aber leer"). Backend-Endpoint /api/sources/health/
   search-fix wird in beiden Faellen aufgerufen, Claude sucht eine
   bessere URL fuer die Quelle.

3. source-health.js: Trend-Delta im Counter.
   Liest healthHistoryCache[1] (vorletzter Run) und vergleicht mit
   aktuellen errors/warnings/ok. Zeigt z.B. "3 Fehler (+2)" rot oder
   "143 Warnungen (-15)" gruen. Bei steigenden ok-Counts ist Plus
   gruen, bei steigenden Fehlern ist Plus rot. Wenn der vorletzte
   Run nicht verfuegbar (Initial-Lauf): kein Delta.

Cache-Buster source-health.js auf 20260509l gebumpt.
2026-05-09 15:26:24 +00:00
6a02e06887 Promote develop -> main 2026-05-09 17:21:07 +02:00
Claude
719c67df3e sync(shared): Karteileichen-Heuristik aus Monitor-Repo uebernommen
Spiegelung von AegisSight-Monitor commit d973dc7. Identische Datei
(Pre-Commit-Hook prueft Drift gegen Monitor-Master = 0).

Neue Funktion generate_stale_deactivation_suggestions wird beim
manuellen Health-Check-Run ueber das Verwaltungsportal-UI aufgerufen
(/api/sources/health/run-stream am Ende). Karteileichen-Quellen
landen damit im Vorschlaege-Tab als deactivate_source-Vorschlaege
und koennen per Klick angenommen werden.
2026-05-09 15:09:45 +00:00
42a0647cd7 Promote develop → main (2026-05-09 15:05 UTC) 2026-05-09 17:05:22 +02:00
Claude (cleanup)
38a13c0b64 ux(quellen-health): Run-Verlauf-Tabelle kompakter
- Total-Spalte raus (= errors+warnings+ok, redundant).
- Spalten-Widths explizit per colgroup gesetzt: 200/160/110/130/110px,
  damit die Werte nicht in einer leeren Flaeche rechts kleben.
- Header-Bezeichnungen + Werte fuer Counter-Spalten zentriert
  (statt rechtsbuendig auf gleichmaessig verteilten Spalten).
- Run-ID gekuerzt auf 12 Zeichen, kleinerer font-size, voller
  Wert im title-Tooltip.
- Spaltenbeschriftung von "Zeitpunkt (Run-Ende)" -> "Zeitpunkt"
  (Klammer-Erklaerung war Footnote-Material).

Cache-Buster source-health.js auf 20260509k gebumpt.
2026-05-09 14:44:20 +00:00
Claude (cleanup)
3a838809c6 fix(navigation): #healthSubTabs aus globalem Top-Tab-Handler ausnehmen
Der globale setupNavTabs in app.js fing nav-tab-Clicks aus ALLEN
nav-tabs ab, ausser #orgDetailTabs und #sourceSubTabs. Das neue
#healthSubTabs (aus dem letzten Commit) war nicht in der :not()-
Liste und triggerte daher den Top-Level-Handler, der getElementById("sec-suggestions")
suchte und null bekam -> Crash beim classList.add("active").

Fix: :not(#healthSubTabs) ergaenzt an allen drei Stellen
(setupNavTabs, setupNavTabs Click-Handler, openSection-Helfer in Z. 408).
Cache-Buster fuer app.js gebumpt 20260509d -> 20260509j.
2026-05-09 14:38:36 +00:00
Claude (cleanup)
f1680c9f4f ux(quellen-health): Sub-Tabs Vorschläge / Health-Status / Verlauf, Lucide-Icons statt Emojis
Splittet die Quellen-Health-Section in drei eigene Sub-Tabs auf, damit
der User je nach Aufgabe nur den relevanten Bereich sieht und nicht
durch die ganze Seite scrollen muss.

dashboard.html:
- Innerhalb von <div id=sub-source-health>: neue nav-tabs healthSubTabs
  mit drei Buttons (Vorschläge / Health-Status / Verlauf).
- Drei Pane-Container ht-suggestions / ht-checks / ht-verlauf,
  jeweils per inline-style display kontrolliert.

source-health.js:
- setupHealthSubTabs(): Click-Handler fuer den Tab-Wechsel
  (toggle .active auf den Buttons + display none/block auf den Panes).
- renderHealthDashboard splittet jetzt in drei innerHTML-Calls,
  einen pro Pane:
    paneSuggestions <- Vorschlaege offen
    paneChecks      <- Counter + Filter + Tabelle + Mehr-laden
    paneVerlauf     <- erledigte Vorschlaege + Run-Verlauf
- Tab-Label "Vorschlaege" wird mit Counter angereichert (z.B.
  "Vorschlaege (24 offen)"), wenn welche offen sind.
- LUCIDE_ICONS-Konstante mit Inline-SVG fuer check, x, search,
  refresh. Emojis und HTML-Entities (&check; &times; ) ersetzt.
  Inline-SVG statt CDN-Library, damit keine externe Abhaengigkeit.

Cache-Buster fuer source-health.js auf 20260509i gebumpt.
2026-05-09 14:26:10 +00:00
Claude (cleanup)
5191962ce0 ux(quellen-health): Verschlankung - Beschreibung gekürzt, Verlauf eingeklappt, schmalere Health-Tabelle, Icon-Buttons
Vier UX-Hebel zusammengelegt, alle reines Frontend:

1. Vorschlaege-Tabelle: Beschreibung als Einzeiler mit Ellipsis;
   voller Text im title-Tooltip. Spart bei 24 offenen Vorschlaegen
   ~25 Bildschirmhoehen.

2. Verlauf-Card: standardmaessig eingeklappt via <details>-Element.
   Header zeigt nur "Verlauf (N erledigte Vorschlaege - klick zum
   Aufklappen)". Klick expandiert die Tabelle.

3. Health-Tabelle: Spalten Domain und Sprache aus der Tabelle raus,
   beide als Tooltip auf dem Quellen-Namen. Tabelle hat statt 8
   Spalten nur noch 6, ist schmaler und besser lesbar.

4. Aktionen-Spalten: Text-Buttons ("Annehmen", "Ablehnen", "Lösung
   suchen") durch kompakte Icon-Buttons ersetzt (✓ ✗ 🔍).
   Funktion identisch, Tooltip via title-Attribut.

Cache-Buster fuer source-health.js auf 20260509h gebumpt.
2026-05-09 14:18:04 +00:00
Claude (cleanup)
b6926df84d cleanup(sources): redundanten /health/run Endpoint entfernen
Frontend ruft ausschliesslich /health/run-stream auf. Der Legacy-Endpoint
/health/run war ein simples synchrones Pendant ohne Fortschrittsanzeige
und wurde nirgends mehr aufgerufen (verifiziert via grep -r im Repo).

Schritt 2 der Quellen-Health-Aufraeumung. Reine Code-Saeuberung,
keine UX- oder Backend-Verhaltensaenderung.
2026-05-09 14:03:51 +00:00
2594b0339f Promote develop -> main 2026-05-09 15:47:23 +02:00
claude-dev
e8bb2495ee ux(quellen-health): Default "Nur Probleme", Counter feiner gegliedert, Filter-Hint bei Pagination
Schritt 1 der Quellen-Health-Aufraeumung. Drei UX-Verbesserungen, kein Daten-Eingriff:

1. Default-Filter "Nur Probleme" (errors + warnings, ohne OK).
   - Neuer Status-Filter-Wert "issues" als virtuelles Frontend-Konstrukt.
   - applyHealthFilter behandelt "issues" als status != ok.
   - Default in healthFilters ist jetzt "issues". User sieht beim
     Tab-Klick sofort die kritischen 146 Eintraege statt der 281
     gruenen OK-Zeilen.

2. Counter aufgegliedert nach check_type.
   - Backend (/api/sources/health): zusaetzliches Feld "breakdown"
     mit der GROUP-BY (check_type, status) Aggregation.
   - Frontend rendert pro Status-Zeile die feine Aufschluesselung,
     z.B. "143 Warnungen (112 Aktualität, 27 Feed-Validität, 3 Duplikat,
     1 Erreichbarkeit)".
   - Hilft dem Admin, sofort zu sehen wo das Problem liegt.

3. Filter-Hint bei Pagination + leeren Treffern.
   - Wenn der aktuelle Filter ueber die geladenen 100 Items keinen
     Treffer findet UND has_more=true, zeigt das Frontend einen
     Hinweis-Link "Alle X Health-Checks laden und Filter erneut
     anwenden".
   - Loest das Edge-Problem, dass z.B. Filter "Nur OK" auf den
     Default-100 (errors first) leer schien.

Cache-Buster fuer source-health.js auf 20260509g gebumpt.
2026-05-09 13:24:44 +00:00
claude-dev
50749323f8 fix(quellen-health): fehlende Sub-Section + Render-Container
Tab-Button "Quellen-Health" verlinkte auf eine Sub-Section, die
es im DOM gar nicht gab:

- <button data-subtab="source-health"> existierte bereits
- <div id="sub-source-health"> fehlte komplett
- <div id="healthContent"> (Render-Anker für source-health.js) fehlte
  ebenfalls

Folge:
1. sources.js Click-Handler crashte mit
   "Cannot read properties of null (reading classList)" beim Versuch,
   die Sub-Section auf .active zu setzen
2. loadHealthData() lief zwar (über separaten Listener in
   source-health.js) und der Backend-Call ging durch, aber
   renderHealthDashboard fand kein #healthContent und brach still ab
   (if (!container) return). Nutzer sah niemals Inhalt.

Fix: Sub-Section <div id="sub-source-health"><div id="healthContent">
zwischen sub-tenant-sources und der Audit-Section eingefügt. Außerdem
das ohnehin fehlende </div> für sec-sources sauber geschlossen.

Damit löst sich das gefühlte "Quellen Health lädt minutenlang":
beim Klick wird der Tab korrekt aktiviert, der Render landet in
#healthContent und ist dank der Pagination + Cache aus den letzten
zwei Commits sofort sichtbar.
2026-05-09 13:15:16 +00:00
claude-dev
657683d491 perf(sources): Quellen-Health Pagination (default 100, plus mehr/alle laden)
Echter Bottleneck war der DOM-Render von 519 Tabellen-Zeilen, nicht
das Backend (45ms). Backend-Slim und Cache aus dem letzten Commit
haben Bandbreite und wiederholte Klicks beschleunigt, aber der erste
Klick blieb langsam, weil weiterhin alle 519 Items in einem
innerHTML-Schub gerendert wurden.

Lösung: Server-Side-Pagination.

Backend (/api/sources/health):
- Neue Query-Param: limit (default 100, max 5000), offset (default 0)
- Counters errors/warnings/ok/total_checks aus separater GROUP-BY-
  Aggregat-Query über den GESAMTEN Bestand, nicht über die Page.
- Neues Feld all_orgs in der Antwort: alle Tenants mit Health-Checks,
  damit das Filter-Dropdown auch im Pagination-Modus die volle
  Org-Liste hat.
- Neue Felder limit, offset, has_more.

Frontend (source-health.js):
- healthLoadLimit (default 100), wird durch loadMoreHealth() um 200
  hochgesetzt oder durch loadAllHealth() auf alles gesetzt.
- Cache-Key beinhaltet jetzt auch das aktuelle Limit, damit beim
  Mehr-laden nicht aus altem Cache bedient wird.
- Org-Liste kommt aus healthData.all_orgs statt aus den geladenen
  Page-Items, sonst wäre sie nach Pagination unvollständig.
- Footer mit zwei Buttons ("+200 laden", "Alle N weiteren laden")
  unter der Tabelle, nur sichtbar bei has_more=true.
- Counter-Anzeige: "X / Y angezeigt (von Z insgesamt)".

Cache-Buster für source-health.js auf 20260509f gebumpt.
2026-05-09 12:54:35 +00:00
claude-dev
f6af21e6cb perf(sources): Quellen-Health Tab schneller (Payload-Slim + 60s-Cache)
Tab "Quellen Health" lädt deutlich schneller:

1. /api/sources/health: SELECT reduziert auf nur die im Frontend wirklich
   gerenderten Felder. Weg sind: h.id, s.url, s.source_type, s.category,
   s.bias, h.details, h.checked_at. Response-Größe sinkt damit von ~198 KB
   auf grob die Hälfte (bei 519 Health-Checks) ohne UI-Verlust.

2. source-health.js: 60-Sekunden In-Memory-Cache fürs loadHealthData.
   Tab hin und her klicken ist damit instant statt jedes Mal voller
   Reload + Render der 519 Tabellen-Zeilen.
   Bei Mutationen (Vorschlag annehmen/ablehnen, run-stream beendet,
   search-fix) wird mit loadHealthData(true) der Cache umgangen,
   damit frische Daten gezeigt werden.

3. dashboard.html: Cache-Buster für source-health.js auf 20260509e gebumpt.
2026-05-09 12:33:30 +00:00
24 geänderte Dateien mit 3990 neuen und 112 gelöschten Zeilen

46
RELEASES.json Normale Datei
Datei anzeigen

@@ -0,0 +1,46 @@
[
{
"version": "2026-05-22T12:41Z",
"date": "2026-05-22",
"title": "X-Recherche-Konten im Verwaltungsportal verwalten",
"items": [
"Recherche-Konten für X (ehemals Twitter) können jetzt direkt im Verwaltungsportal hinzugefügt, bearbeitet und entfernt werden."
]
},
{
"version": "2026-05-22T11:13Z",
"date": "2026-05-22",
"title": "Interne Verbesserungen",
"items": []
},
{
"version": "2026-05-22T11:13Z",
"date": "2026-05-22",
"title": "Interne Verbesserungen",
"items": []
},
{
"version": "2026-05-22T11:09Z",
"date": "2026-05-22",
"title": "X-Konten direkt im Verwaltungsportal verwalten",
"items": [
"X-Konten können jetzt zentral über das Verwaltungsportal angelegt und verwaltet werden."
]
},
{
"version": "2026-05-22T09:37Z",
"date": "2026-05-22",
"title": "Neue Übersetzungsfunktion im Dashboard",
"items": [
"Texte können jetzt im Dashboard per Klick manuell übersetzt werden."
]
},
{
"version": "2026-05-17T19:19Z",
"date": "2026-05-17",
"title": "83 neue Quellen für Militär, Polizei-Technik & Waffen",
"items": [
"83 neue Quellen aus den Bereichen Militär, Polizei-Technik und Waffen sind jetzt verfügbar."
]
}
]

Datei anzeigen

@@ -7,3 +7,7 @@ python-multipart
aiosmtplib
httpx>=0.28
feedparser>=6.0
# PDF-Upload-Validierung
pypdf>=5.0
# X-Scraper-Konten-Verwaltung (twscrape-Account-Pool)
twscrape @ git+https://github.com/vladkens/twscrape.git@206f0942fe41149da28530399f7c772ec00be17a

Datei anzeigen

@@ -0,0 +1,104 @@
{
"_meta": {
"purpose": "Bulk-Seed fuer Militaer-, Polizei-Technik und internationale Waffen-Quellen",
"created": "2026-05-17",
"plan": "C:\\Users\\hendr\\.claude\\plans\\gleaming-inventing-fern.md"
},
"sources": [
{"name": "Janes OSINT Insights", "url": "https://www.janes.com/osint-insights/defence-news", "domain": "janes.com", "source_type": "web_source", "language": "en", "country_code": "GB", "fetch_strategy": "paywall", "notes": "[militaertechnik] Goldstandard fuer Equipment-Specs und Defense-OSINT, Vollartikel paywalled"},
{"name": "The War Zone (TWZ)", "url": "https://www.twz.com/feed", "domain": "twz.com", "source_type": "rss_feed", "language": "en", "country_code": "US", "fetch_strategy": "default", "notes": "[militaertechnik] Air/Land/Sea/Space/Cyber, sehr tiefe Equipment-Analysen, Tyler Rogoway"},
{"name": "Defense News", "url": "https://www.defensenews.com/arc/outboundfeeds/rss/?outputType=xml", "domain": "defensenews.com", "source_type": "rss_feed", "language": "en", "country_code": "US", "fetch_strategy": "default", "notes": "[militaertechnik] Industriepolitik, Beschaffung, Programme"},
{"name": "Breaking Defense", "url": "https://breakingdefense.com/full-rss-feed/", "domain": "breakingdefense.com", "source_type": "rss_feed", "language": "en", "country_code": "US", "fetch_strategy": "default", "notes": "[militaertechnik] Tech, Programme, Pentagon-Politik"},
{"name": "Naval News", "url": "https://www.navalnews.com/feed/", "domain": "navalnews.com", "source_type": "rss_feed", "language": "en", "country_code": "FR", "fetch_strategy": "default", "notes": "[militaertechnik] Marine global, Schiffstechnik, Werften, U-Boote"},
{"name": "Army Recognition", "url": "https://www.armyrecognition.com/news/army-news/feed/rss", "domain": "armyrecognition.com", "source_type": "rss_feed", "language": "en", "country_code": "BE", "fetch_strategy": "default", "notes": "[militaertechnik] Equipment-Specs Heer, sehr fahrzeugfokussiert, breite Datenbank"},
{"name": "Navy Recognition", "url": "https://www.navyrecognition.com/index.php?option=com_acymailing&ctrl=fronturl&task=rss", "domain": "navyrecognition.com", "source_type": "rss_feed", "language": "en", "country_code": "BE", "fetch_strategy": "default", "notes": "[militaertechnik] Equipment-Specs Marine, Schwesterportal Army Recognition"},
{"name": "Air Recognition", "url": "https://www.airrecognition.com/index.php?option=com_acymailing&ctrl=fronturl&task=rss", "domain": "airrecognition.com", "source_type": "rss_feed", "language": "en", "country_code": "BE", "fetch_strategy": "default", "notes": "[militaertechnik] Equipment-Specs Luftwaffe, Schwesterportal Army Recognition"},
{"name": "Aviation Week Defense", "url": "https://aviationweek.com/awn-rss/feed", "domain": "aviationweek.com", "source_type": "rss_feed", "language": "en", "country_code": "US", "fetch_strategy": "default", "notes": "[militaertechnik] Luftfahrt und Defense, seit 1916, Industrie-Insider"},
{"name": "Air & Space Forces Magazine", "url": "https://www.airandspaceforces.com/feed/", "domain": "airandspaceforces.com", "source_type": "rss_feed", "language": "en", "country_code": "US", "fetch_strategy": "default", "notes": "[militaertechnik] USAF-Schwerpunkt, Programme + Doktrin"},
{"name": "Shephard Media", "url": "https://www.shephardmedia.com/news/feed/", "domain": "shephardmedia.com", "source_type": "rss_feed", "language": "en", "country_code": "GB", "fetch_strategy": "default", "notes": "[militaertechnik] Defense News, Analyse + Daten, Land/Air/Sea/Training"},
{"name": "EDR Magazine (European Defence Review)", "url": "https://www.edrmagazine.eu/feed", "domain": "edrmagazine.eu", "source_type": "rss_feed", "language": "en", "country_code": "FR", "fetch_strategy": "default", "notes": "[militaertechnik] Europaeische Defense-Perspektive, Englisch"},
{"name": "The Defense Post", "url": "https://thedefensepost.com/feed/", "domain": "thedefensepost.com", "source_type": "rss_feed", "language": "en", "country_code": "US", "fetch_strategy": "default", "notes": "[militaertechnik] Globaler Defense-Nachrichten-Mix"},
{"name": "Defense Brief", "url": "https://defbrief.com/feed/", "domain": "defbrief.com", "source_type": "rss_feed", "language": "en", "country_code": "MK", "fetch_strategy": "default", "notes": "[militaertechnik] Defense-News-Aggregator"},
{"name": "Defense Update", "url": "https://defense-update.com/feed", "domain": "defense-update.com", "source_type": "rss_feed", "language": "en", "country_code": "IL", "fetch_strategy": "default", "notes": "[militaertechnik] Israel/US-Equipment-Tiefe, Tamir Eshel"},
{"name": "Naval Technology", "url": "https://www.naval-technology.com/feed/", "domain": "naval-technology.com", "source_type": "rss_feed", "language": "en", "country_code": "GB", "fetch_strategy": "default", "notes": "[militaertechnik] Industrieperspektive Marine"},
{"name": "Army Technology", "url": "https://www.army-technology.com/feed/", "domain": "army-technology.com", "source_type": "rss_feed", "language": "en", "country_code": "GB", "fetch_strategy": "default", "notes": "[militaertechnik] Industrieperspektive Heer"},
{"name": "Airforce Technology", "url": "https://www.airforce-technology.com/feed/", "domain": "airforce-technology.com", "source_type": "rss_feed", "language": "en", "country_code": "GB", "fetch_strategy": "default", "notes": "[militaertechnik] Industrieperspektive Luftwaffe"},
{"name": "The Aviationist", "url": "https://theaviationist.com/feed/", "domain": "theaviationist.com", "source_type": "rss_feed", "language": "en", "country_code": "IT", "fetch_strategy": "default", "notes": "[militaertechnik] Militaerluftfahrt-Specials, David Cenciotti"},
{"name": "C4ISRNET", "url": "https://www.c4isrnet.com/arc/outboundfeeds/rss/?outputType=xml", "domain": "c4isrnet.com", "source_type": "rss_feed", "language": "en", "country_code": "US", "fetch_strategy": "default", "notes": "[militaertechnik] Aufklaerung, Cyber, EW, Netze"},
{"name": "DefenseScoop", "url": "https://defensescoop.com/feed/", "domain": "defensescoop.com", "source_type": "rss_feed", "language": "en", "country_code": "US", "fetch_strategy": "default", "notes": "[militaertechnik] Pentagon-IT, Cyber, KI"},
{"name": "Federation of American Scientists", "url": "https://fas.org/feed/", "domain": "fas.org", "source_type": "rss_feed", "language": "en", "country_code": "US", "fetch_strategy": "default", "notes": "[militaertechnik, waffen-international] Nuklear, Strategic Security, Project on Government Secrecy"},
{"name": "Military Times", "url": "https://www.militarytimes.com/arc/outboundfeeds/rss/?outputType=xml", "domain": "militarytimes.com", "source_type": "rss_feed", "language": "en", "country_code": "US", "fetch_strategy": "default", "notes": "[militaertechnik] US-Streitkraefte-Alltag, Beschaffung, Truppe"},
{"name": "Stars and Stripes", "url": "https://www.stripes.com/rss", "domain": "stripes.com", "source_type": "rss_feed", "language": "en", "country_code": "US", "fetch_strategy": "default", "notes": "[militaertechnik] US-Forces Worldwide"},
{"name": "Defense One", "url": "https://www.defenseone.com/rss/all/", "domain": "defenseone.com", "source_type": "rss_feed", "language": "en", "country_code": "US", "fetch_strategy": "default", "notes": "[militaertechnik] Defense-Politik + Tech"},
{"name": "Inside Defense", "url": "https://insidedefense.com", "domain": "insidedefense.com", "source_type": "web_source", "language": "en", "country_code": "US", "fetch_strategy": "paywall", "notes": "[militaertechnik] US-Pentagon-Insider, komplett paywalled"},
{"name": "RealClearDefense", "url": "https://www.realcleardefense.com/index.xml", "domain": "realcleardefense.com", "source_type": "rss_feed", "language": "en", "country_code": "US", "fetch_strategy": "default", "notes": "[militaertechnik] Aggregator + Kommentare"},
{"name": "War on the Rocks", "url": "https://warontherocks.com/feed/", "domain": "warontherocks.com", "source_type": "rss_feed", "language": "en", "country_code": "US", "fetch_strategy": "default", "notes": "[militaertechnik] Strategie-Essays, hochwertige Analyse"},
{"name": "RUSI Commentary", "url": "https://www.rusi.org/rss/commentary", "domain": "rusi.org", "source_type": "rss_feed", "language": "en", "country_code": "GB", "fetch_strategy": "default", "notes": "[militaertechnik] Royal United Services Institute, Strategie"},
{"name": "CSIS Defense & Security", "url": "https://www.csis.org/rss.xml", "domain": "csis.org", "source_type": "rss_feed", "language": "en", "country_code": "US", "fetch_strategy": "default", "notes": "[militaertechnik] Center for Strategic and International Studies"},
{"name": "Soldier Systems Daily", "url": "https://soldiersystems.net/feed/", "domain": "soldiersystems.net", "source_type": "rss_feed", "language": "en", "country_code": "US", "fetch_strategy": "default", "notes": "[militaertechnik] Tactical Gear und Ausruestung, extrem detailreich"},
{"name": "ESuT - Europaeische Sicherheit & Technik", "url": "https://esut.de/feed/", "domain": "esut.de", "source_type": "rss_feed", "language": "de", "country_code": "DE", "fetch_strategy": "default", "notes": "[militaertechnik] Heer/Luft/Marine, Mittler Report, sehr Equipment-orientiert"},
{"name": "Soldat & Technik", "url": "https://soldat-und-technik.de/feed/", "domain": "soldat-und-technik.de", "source_type": "rss_feed", "language": "de", "country_code": "DE", "fetch_strategy": "default", "notes": "[militaertechnik] Infanterie-Ausruestung, Mittler Report"},
{"name": "hartpunkt", "url": "https://www.hartpunkt.de/feed/", "domain": "hartpunkt.de", "source_type": "rss_feed", "language": "de", "country_code": "DE", "fetch_strategy": "default", "notes": "[militaertechnik] Ruestung und Sicherheitspolitik, unabhaengig"},
{"name": "Augen geradeaus!", "url": "https://augengeradeaus.net/feed/", "domain": "augengeradeaus.net", "source_type": "rss_feed", "language": "de", "country_code": "DE", "fetch_strategy": "default", "notes": "[militaertechnik] Thomas Wiegold, Bundeswehr-Insider"},
{"name": "Bundeswehr-Journal", "url": "https://www.bundeswehr-journal.de/feed/", "domain": "bundeswehr-journal.de", "source_type": "rss_feed", "language": "de", "country_code": "DE", "fetch_strategy": "default", "notes": "[militaertechnik] Bundeswehr-Themen"},
{"name": "Strategie & Technik (Mittler Report)", "url": "https://mittler-report.de/feed/", "domain": "mittler-report.de", "source_type": "rss_feed", "language": "de", "country_code": "DE", "fetch_strategy": "default", "notes": "[militaertechnik] Mittler-Verlag-Hauptfeed, Fachartikel"},
{"name": "cpm Defence Network", "url": "https://www.cpm-defence.de/feed/", "domain": "cpm-defence.de", "source_type": "rss_feed", "language": "de", "country_code": "DE", "fetch_strategy": "default", "notes": "[militaertechnik] Deutsche Ruestungsbranche"},
{"name": "Bundeswehr (offiziell)", "url": "https://www.bundeswehr.de/de/rss", "domain": "bundeswehr.de", "source_type": "rss_feed", "language": "de", "country_code": "DE", "fetch_strategy": "default", "notes": "[militaertechnik] Offizielle BMVg/BW-Meldungen"},
{"name": "Opex360 (Zone Militaire)", "url": "https://www.opex360.com/feed/", "domain": "opex360.com", "source_type": "rss_feed", "language": "fr", "country_code": "FR", "fetch_strategy": "default", "notes": "[militaertechnik] Sehr aktiv, Equipment + Operations FR"},
{"name": "Mer et Marine", "url": "https://www.meretmarine.com/fr/rss.xml", "domain": "meretmarine.com", "source_type": "rss_feed", "language": "fr", "country_code": "FR", "fetch_strategy": "default", "notes": "[militaertechnik] Marine + Schiffbau FR"},
{"name": "FOB - Forces Operations Blog", "url": "https://www.forcesoperations.com/feed/", "domain": "forcesoperations.com", "source_type": "rss_feed", "language": "fr", "country_code": "FR", "fetch_strategy": "default", "notes": "[militaertechnik] Spezialeinheiten und Ausruestung FR"},
{"name": "Lignes de Defense", "url": "https://lignesdedefense.blogs.ouest-france.fr/index.rdf", "domain": "lignesdedefense.blogs.ouest-france.fr", "source_type": "rss_feed", "language": "fr", "country_code": "FR", "fetch_strategy": "default", "notes": "[militaertechnik] Blog Ouest-France, Defense FR"},
{"name": "Air & Cosmos Defense", "url": "https://air-cosmos.com/category/defense/feed", "domain": "air-cosmos.com", "source_type": "rss_feed", "language": "fr", "country_code": "FR", "fetch_strategy": "default", "notes": "[militaertechnik] Luftfahrt + Defense FR"},
{"name": "Topwar / Voyennoye Obozreniye (EN)", "url": "https://en.topwar.ru/rss.xml", "domain": "topwar.ru", "source_type": "rss_feed", "language": "en", "country_code": "RU", "fetch_strategy": "default", "notes": "[militaertechnik] Pro-russisch (MBFC: Right Biased, Propaganda). Wert: Sicht auf eigene Technik"},
{"name": "TASS Defense", "url": "https://tass.com/rss/v2.xml?sections=MjQ%3D", "domain": "tass.com", "source_type": "rss_feed", "language": "en", "country_code": "RU", "fetch_strategy": "default", "notes": "[militaertechnik] Russische Staatsagentur, Defense-Section"},
{"name": "RIA Novosti Army (RU)", "url": "https://ria.ru/export/rss2/army/index.xml", "domain": "ria.ru", "source_type": "rss_feed", "language": "ru", "country_code": "RU", "fetch_strategy": "default", "notes": "[militaertechnik] Russische Staatsagentur, Army-Section"},
{"name": "bmpd (LiveJournal)", "url": "https://bmpd.livejournal.com/data/rss", "domain": "bmpd.livejournal.com", "source_type": "rss_feed", "language": "ru", "country_code": "RU", "fetch_strategy": "default", "notes": "[militaertechnik] Blog des CAST (Centre for Analysis of Strategies and Technologies)"},
{"name": "Zvezda TV", "url": "https://tvzvezda.ru/news.rss", "domain": "tvzvezda.ru", "source_type": "rss_feed", "language": "ru", "country_code": "RU", "fetch_strategy": "default", "notes": "[militaertechnik] TV-Sender des russischen Verteidigungsministeriums"},
{"name": "Defense Express (UA, EN)", "url": "https://en.defence-ua.com/rss/", "domain": "defence-ua.com", "source_type": "rss_feed", "language": "en", "country_code": "UA", "fetch_strategy": "default", "notes": "[militaertechnik] Ukrainische Industrie + Technik EN"},
{"name": "Militarnyi (EN)", "url": "https://militarnyi.com/en/feed/", "domain": "militarnyi.com", "source_type": "rss_feed", "language": "en", "country_code": "UA", "fetch_strategy": "default", "notes": "[militaertechnik] Ukrainisches Defense-Portal EN"},
{"name": "Defence24 (PL)", "url": "https://defence24.pl/rss", "domain": "defence24.pl", "source_type": "rss_feed", "language": "pl", "country_code": "PL", "fetch_strategy": "default", "notes": "[militaertechnik] Polens groesstes Defense-Portal PL"},
{"name": "Defence24.com (EN)", "url": "https://defence24.com/feed", "domain": "defence24.com", "source_type": "rss_feed", "language": "en", "country_code": "PL", "fetch_strategy": "default", "notes": "[militaertechnik] Englische Ausgabe Defence24"},
{"name": "Israel Defense (EN)", "url": "https://www.israeldefense.co.il/en/rss.xml", "domain": "israeldefense.co.il", "source_type": "rss_feed", "language": "en", "country_code": "IL", "fetch_strategy": "default", "notes": "[militaertechnik] Israelische Industrie + IDF"},
{"name": "IDF Spokesperson Website", "url": "https://www.idf.il/en/mini-sites/idf-spokesperson/", "domain": "idf.il", "source_type": "web_source", "language": "en", "country_code": "IL", "fetch_strategy": "default", "notes": "[militaertechnik] Offizielle IDF-Meldungen (Telegram-Kanal haben wir bereits)"},
{"name": "Mehr News Defense (FA)", "url": "https://www.mehrnews.com/rss/tp/12", "domain": "mehrnews.com", "source_type": "rss_feed", "language": "fa", "country_code": "IR", "fetch_strategy": "default", "notes": "[militaertechnik] Halbstaatliche iranische Agentur, Defense-Section"},
{"name": "Tasnim News Defense (FA)", "url": "https://www.tasnimnews.com/de/rss/feed/0/8/6/1/1", "domain": "tasnimnews.com", "source_type": "rss_feed", "language": "fa", "country_code": "IR", "fetch_strategy": "default", "notes": "[militaertechnik] IRGC-nah, Defense-Section"},
{"name": "Fars News Defense (FA)", "url": "https://www.farsnews.ir/rss?cat=8", "domain": "farsnews.ir", "source_type": "rss_feed", "language": "fa", "country_code": "IR", "fetch_strategy": "default", "notes": "[militaertechnik] IRGC-nah, Defense-Section"},
{"name": "China Military Online (EN)", "url": "http://eng.chinamil.com.cn/", "domain": "chinamil.com.cn", "source_type": "web_source", "language": "en", "country_code": "CN", "fetch_strategy": "default", "notes": "[militaertechnik] Offizielles PLA-Organ"},
{"name": "Global Times Military (EN)", "url": "https://www.globaltimes.cn/rss/military.xml", "domain": "globaltimes.cn", "source_type": "rss_feed", "language": "en", "country_code": "CN", "fetch_strategy": "default", "notes": "[militaertechnik] Chinesisches Staatsmedium, Military-Section"},
{"name": "The Diplomat - Security", "url": "https://thediplomat.com/category/security/feed/", "domain": "thediplomat.com", "source_type": "rss_feed", "language": "en", "country_code": "JP", "fetch_strategy": "default", "notes": "[militaertechnik] Asien-Pazifik-Sicherheitsanalyse, in Tokio sitzend"},
{"name": "ORYX (Spioenkop)", "url": "https://www.oryxspioenkop.com/feeds/posts/default", "domain": "oryxspioenkop.com", "source_type": "rss_feed", "language": "en", "country_code": "NL", "fetch_strategy": "default", "notes": "[militaertechnik, waffen-international] Visually confirmed losses, Equipment-DB Ukraine-Krieg"},
{"name": "WarSpotting", "url": "https://warspotting.net/", "domain": "warspotting.net", "source_type": "web_source", "language": "en", "country_code": "NL", "fetch_strategy": "default", "notes": "[militaertechnik, waffen-international] ORYX-Nachfolger fuer Ukraine, OSINT-Verluste"},
{"name": "Conflict Intelligence Team (CIT)", "url": "https://citeam.org/feed/", "domain": "citeam.org", "source_type": "rss_feed", "language": "en", "country_code": "RU", "fetch_strategy": "default", "notes": "[militaertechnik] Russisches Exil-OSINT-Kollektiv"},
{"name": "Telegram @rybar", "url": "t.me/rybar", "domain": "t.me", "source_type": "telegram_channel", "language": "ru", "country_code": "RU", "fetch_strategy": "default", "notes": "[militaertechnik] Grosser russischer Mil-OSINT-Kanal"},
{"name": "Telegram @osintdefender", "url": "t.me/osintdefender", "domain": "t.me", "source_type": "telegram_channel", "language": "en", "country_code": "US", "fetch_strategy": "default", "notes": "[militaertechnik] Pro-westliches Equipment-Tracking"},
{"name": "Telegram @CovertCabal", "url": "t.me/CovertCabal", "domain": "t.me", "source_type": "telegram_channel", "language": "en", "country_code": "US", "fetch_strategy": "default", "notes": "[militaertechnik] Sat-Bild-OSINT"},
{"name": "Telegram @Tendar", "url": "t.me/Tendar", "domain": "t.me", "source_type": "telegram_channel", "language": "en", "country_code": "DE", "fetch_strategy": "default", "notes": "[militaertechnik] UA-Konflikt-Analyse"},
{"name": "Telegram @Osint613", "url": "t.me/Osint613", "domain": "t.me", "source_type": "telegram_channel", "language": "en", "country_code": "IL", "fetch_strategy": "default", "notes": "[militaertechnik] Nahost-OSINT"},
{"name": "Behoerden-Spiegel", "url": "https://www.behoerden-spiegel.de/feed/", "domain": "behoerden-spiegel.de", "source_type": "rss_feed", "language": "de", "country_code": "DE", "fetch_strategy": "default", "notes": "[polizei-technik] DE-BOS-Magazin, Polizei, Fuehrungstechnik"},
{"name": "pvt Polizei Verkehr + Technik", "url": "https://www.pvtweb.de/feed/", "domain": "pvtweb.de", "source_type": "rss_feed", "language": "de", "country_code": "DE", "fetch_strategy": "default", "notes": "[polizei-technik] DE-Polizeitechnik-Fachzeitschrift"},
{"name": "Police Magazine (US)", "url": "https://www.policemag.com/rss", "domain": "policemag.com", "source_type": "rss_feed", "language": "en", "country_code": "US", "fetch_strategy": "default", "notes": "[polizei-technik] US-Polizei + Ausruestung"},
{"name": "Police1.com", "url": "https://www.police1.com/rss/feed", "domain": "police1.com", "source_type": "rss_feed", "language": "en", "country_code": "US", "fetch_strategy": "default", "notes": "[polizei-technik] US-Polizei-Industrie"},
{"name": "Officer.com", "url": "https://www.officer.com/rss", "domain": "officer.com", "source_type": "rss_feed", "language": "en", "country_code": "US", "fetch_strategy": "default", "notes": "[polizei-technik] US-Polizei + Equipment"},
{"name": "Small Arms Survey", "url": "https://www.smallarmssurvey.org/rss.xml", "domain": "smallarmssurvey.org", "source_type": "rss_feed", "language": "en", "country_code": "CH", "fetch_strategy": "default", "notes": "[waffen-international] Genfer Forschungsinstitut, Goldstandard Kleinwaffen, Working Papers + Issue Briefs"},
{"name": "SIPRI Publications", "url": "https://www.sipri.org/rss/publications.xml", "domain": "sipri.org", "source_type": "rss_feed", "language": "en", "country_code": "SE", "fetch_strategy": "default", "notes": "[waffen-international] Stockholm International Peace Research, Waffenexporte, Militaerausgaben, SALW"},
{"name": "Conflict Armament Research", "url": "https://www.conflictarm.com/feed/", "domain": "conflictarm.com", "source_type": "rss_feed", "language": "en", "country_code": "GB", "fetch_strategy": "default", "notes": "[waffen-international] Field-Tracking von Waffen in Konfliktzonen, Lieferketten-Forensik"},
{"name": "Armament Research Services (ARES)", "url": "https://armamentresearch.com/feed/", "domain": "armamentresearch.com", "source_type": "rss_feed", "language": "en", "country_code": "AU", "fetch_strategy": "default", "notes": "[militaertechnik, waffen-international] Munitions- und Waffen-Identifikation, sehr Equipment-tief"},
{"name": "Calibre Obscura (Substack)", "url": "https://calibreobscura.substack.com/feed", "domain": "calibreobscura.substack.com", "source_type": "rss_feed", "language": "en", "country_code": "US", "fetch_strategy": "default", "notes": "[waffen-international] OSINT-Spezialist Kleinwaffen Nahost"},
{"name": "Arms Control Association", "url": "https://www.armscontrol.org/rss.xml", "domain": "armscontrol.org", "source_type": "rss_feed", "language": "en", "country_code": "US", "fetch_strategy": "default", "notes": "[waffen-international] US-Think-Tank, Ruestungskontrolle + Proliferation"},
{"name": "Arms Control Wonk", "url": "https://www.armscontrolwonk.com/feed/", "domain": "armscontrolwonk.com", "source_type": "rss_feed", "language": "en", "country_code": "US", "fetch_strategy": "default", "notes": "[waffen-international] Nuklear- und Raketen-Spezialisten-Blog"},
{"name": "Action on Armed Violence (AOAV)", "url": "https://aoav.org.uk/feed/", "domain": "aoav.org.uk", "source_type": "rss_feed", "language": "en", "country_code": "GB", "fetch_strategy": "default", "notes": "[waffen-international] Explosive Waffen in besiedelten Gebieten, Opferzahlen"},
{"name": "BICC Bonn", "url": "https://www.bicc.de/feed", "domain": "bicc.de", "source_type": "rss_feed", "language": "de", "country_code": "DE", "fetch_strategy": "default", "notes": "[waffen-international] Bonn International Centre for Conflict Studies, Konflikt + Konversion"},
{"name": "Stimson Center", "url": "https://www.stimson.org/feed/", "domain": "stimson.org", "source_type": "rss_feed", "language": "en", "country_code": "US", "fetch_strategy": "default", "notes": "[waffen-international] US-Think-Tank, konventionelle + nukleare Ruestung"},
{"name": "ICRC Law and Policy Blog", "url": "https://blogs.icrc.org/law-and-policy/feed/", "domain": "icrc.org", "source_type": "rss_feed", "language": "en", "country_code": "CH", "fetch_strategy": "default", "notes": "[waffen-international] Voelkerrechtliche Sicht auf Waffenwirkung"}
]
}

116
scripts/seed_military_sources.py Ausführbare Datei
Datei anzeigen

@@ -0,0 +1,116 @@
#!/usr/bin/env python3
"""Bulk-Seed fuer Militaer- und Polizei-Technik-Quellen + internationale Waffen-Spezialisten.
Liest scripts/seed_military_sources.json und legt jede Quelle idempotent in der
Ziel-DB an (Default: Verwaltungs-Staging-DB). Bestehende Quellen werden anhand
der URL erkannt und uebersprungen.
Beispiel:
.venv/bin/python scripts/seed_military_sources.py
.venv/bin/python scripts/seed_military_sources.py --db /home/claude-dev/osint-data/osint.db
"""
from __future__ import annotations
import argparse
import json
import sqlite3
import sys
from pathlib import Path
DEFAULT_DB = "/home/claude-dev/AegisSight-Monitor-staging/data/osint.db"
SEED_FILE = Path(__file__).with_suffix(".json")
INSERT_SQL = """
INSERT INTO sources (
name, url, domain, source_type, category, status, notes,
language, country_code, fetch_strategy, added_by, tenant_id
) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, 'seed_military', NULL)
"""
EXISTS_SQL = "SELECT id FROM sources WHERE url = ? AND tenant_id IS NULL"
def main() -> int:
ap = argparse.ArgumentParser(description=__doc__)
ap.add_argument("--db", default=DEFAULT_DB, help="Pfad zur Ziel-SQLite-DB")
ap.add_argument("--seed", default=str(SEED_FILE), help="Pfad zur Seed-JSON")
ap.add_argument("--dry-run", action="store_true", help="Nur loggen, nichts schreiben")
args = ap.parse_args()
seed_path = Path(args.seed)
if not seed_path.is_file():
print(f"FEHLER: Seed-Datei nicht gefunden: {seed_path}", file=sys.stderr)
return 2
with seed_path.open("r", encoding="utf-8") as fh:
seed = json.load(fh)
sources = seed.get("sources", [])
if not sources:
print("FEHLER: Seed-Datei enthaelt keine sources", file=sys.stderr)
return 2
print(f"DB: {args.db}")
print(f"Seed: {seed_path} ({len(sources)} Eintraege)")
print(f"Dry-Run: {args.dry_run}")
print()
con = sqlite3.connect(args.db)
con.row_factory = sqlite3.Row
cur = con.cursor()
created: list[tuple[int, str]] = []
skipped: list[tuple[int, str]] = []
for entry in sources:
url = entry.get("url")
name = entry.get("name", "?")
if not url:
skipped.append((-1, f"{name}: ohne url"))
continue
row = cur.execute(EXISTS_SQL, (url,)).fetchone()
if row is not None:
skipped.append((row["id"], f"{name}: existiert bereits (id={row['id']})"))
continue
params = (
name,
url,
entry.get("domain"),
entry.get("source_type", "rss_feed"),
entry.get("category", "fachmedien"),
entry.get("status", "active"),
entry.get("notes"),
entry.get("language"),
entry.get("country_code"),
entry.get("fetch_strategy", "default"),
)
if args.dry_run:
created.append((-1, name))
continue
cur.execute(INSERT_SQL, params)
created.append((cur.lastrowid, name))
if not args.dry_run:
con.commit()
con.close()
print(f"Angelegt: {len(created)}")
print(f"Uebersprungen:{len(skipped)}")
print()
if created:
print("--- Neue IDs ---")
for src_id, name in created:
print(f" {src_id:>5} {name}")
if skipped:
print()
print("--- Uebersprungen ---")
for src_id, msg in skipped:
print(f" {src_id:>5} {msg}")
return 0
if __name__ == "__main__":
sys.exit(main())

Datei anzeigen

@@ -8,6 +8,10 @@ STATIC_DIR = os.path.join(os.path.dirname(os.path.abspath(__file__)), "static")
# Gemeinsame Datenbank (gleiche wie OSINT-Monitor)
DB_PATH = os.environ.get("DB_PATH", "/mnt/gitea/osint-data/osint.db")
# twscrape-Account-Store: die X-Login-Konten, mit denen der Monitor bei X
# recherchiert. Geteilt mit dem Monitor (gleicher Pfad-Default).
X_ACCOUNTS_DB_PATH = os.environ.get("X_ACCOUNTS_DB_PATH", "/home/claude-dev/.x-scraper/accounts.db")
# JWT (eigener Secret fuer Verwaltungsportal)
JWT_SECRET = os.environ.get("PORTAL_JWT_SECRET")
if not JWT_SECRET:
@@ -42,7 +46,7 @@ PORTAL_MAGIC_LINK_EXPIRE_MINUTES = int(
)
# Source Discovery (geteilte Config mit OSINT-Monitor)
CLAUDE_PATH = os.environ.get("CLAUDE_PATH", "/home/claude-dev/.claude/local/claude")
CLAUDE_PATH = os.environ.get("CLAUDE_PATH", "/usr/local/bin/claude")
CLAUDE_TIMEOUT = 300
MAX_FEEDS_PER_DOMAIN = 3
CLAUDE_MODEL_FAST = "claude-haiku-4-5-20251001"

Datei anzeigen

@@ -11,7 +11,7 @@ from fastapi.staticfiles import StaticFiles
from fastapi.responses import FileResponse
from config import STATIC_DIR, PORT
from routers import auth, organizations, licenses, users, dashboard, sources, token_usage, audit
from routers import auth, organizations, licenses, users, dashboard, sources, token_usage, audit, translation, x_scraper
logging.basicConfig(
level=logging.INFO,
@@ -42,6 +42,8 @@ app.include_router(dashboard.router)
app.include_router(sources.router)
app.include_router(token_usage.router)
app.include_router(audit.router)
app.include_router(translation.router)
app.include_router(x_scraper.router)
# --- Statische Dateien ---
app.mount("/static", StaticFiles(directory=STATIC_DIR), name="static")

Datei anzeigen

@@ -25,11 +25,13 @@ class TokenResponse(BaseModel):
class OrgCreate(BaseModel):
name: str = Field(min_length=1, max_length=200)
slug: str = Field(min_length=1, max_length=100, pattern="^[a-z0-9-]+$")
output_language: str = Field(default="de", pattern="^(de|en)$")
class OrgUpdate(BaseModel):
name: Optional[str] = Field(default=None, max_length=200)
is_active: Optional[bool] = None
output_language: Optional[str] = Field(default=None, pattern="^(de|en)$")
class OrgResponse(BaseModel):
@@ -43,6 +45,7 @@ class OrgResponse(BaseModel):
created_at: str
globe_access: bool = False
network_access: bool = False
output_language: str = "de"
class LicenseCreate(BaseModel):

Datei anzeigen

@@ -25,6 +25,15 @@ async def _enrich_org(db: aiosqlite.Connection, row: aiosqlite.Row) -> dict:
lic = await cursor.fetchone()
org["license_status"] = lic["status"] if lic else "none"
org["license_type"] = lic["license_type"] if lic else ""
# output_language aus organization_settings (Default 'de')
cursor = await db.execute(
"SELECT value FROM organization_settings WHERE organization_id = ? AND key = 'output_language'",
(org["id"],),
)
lang_row = await cursor.fetchone()
org["output_language"] = lang_row["value"] if lang_row else "de"
return org
@@ -57,6 +66,10 @@ async def create_organization(
org_id = cursor.lastrowid
await db.commit()
# output_language als organization_settings-Eintrag persistieren
from shared.services.org_settings import set_org_setting
await set_org_setting(db, org_id, "output_language", data.output_language)
cursor = await db.execute("SELECT * FROM organizations WHERE id = ?", (org_id,))
new_row_obj = await cursor.fetchone()
await log_action(
@@ -105,6 +118,11 @@ async def update_organization(
await db.execute(f"UPDATE organizations SET {set_clause} WHERE id = ?", values)
await db.commit()
# output_language separat ueber organization_settings setzen
if data.output_language is not None:
from shared.services.org_settings import set_org_setting
await set_org_setting(db, org_id, "output_language", data.output_language)
after = await row_to_dict(db, "organizations", org_id)
await log_action(
db, admin, get_client_ip(request),

Datei anzeigen

@@ -1,30 +1,106 @@
"""Grundquellen-Verwaltung und Kundenquellen-Übersicht."""
import json
import logging
import hashlib
import os
import re
import uuid
from fastapi import APIRouter, Depends, HTTPException, Request
from fastapi import APIRouter, BackgroundTasks, Depends, File, Form, HTTPException, Request, UploadFile, status
from fastapi.responses import StreamingResponse
from pydantic import BaseModel, Field
from typing import Optional
import aiosqlite
from auth import get_current_admin
from database import db_dependency
from database import db_dependency, get_db
from audit import log_action, get_client_ip
from source_meta import get_meta
from config import HEALTH_CHECK_USER_AGENT, HEALTH_CHECK_TIMEOUT_S
from config import HEALTH_CHECK_USER_AGENT, HEALTH_CHECK_TIMEOUT_S, DB_PATH
from shared.source_rules import (
discover_source,
discover_all_feeds,
evaluate_feeds_with_claude,
domain_to_display_name,
)
from shared.services.source_classifier import (
bulk_classify,
classify_source,
ALIGNMENT_VALUES,
POLITICAL_VALUES,
MEDIA_TYPE_VALUES,
RELIABILITY_VALUES,
)
from shared.services.external_reputation import (
apply_reputation_overrides,
sync_all as sync_external_reputation,
)
logger = logging.getLogger("verwaltung.sources")
router = APIRouter(prefix="/api/sources", tags=["sources"])
SOURCE_UPDATE_COLUMNS = {"name", "url", "domain", "source_type", "category", "status", "notes", "language", "bias", "fetch_strategy"}
SOURCE_UPDATE_COLUMNS = {
"name", "url", "domain", "source_type", "category", "status", "notes",
"language", "bias", "fetch_strategy",
"political_orientation", "media_type", "reliability",
"state_affiliated", "country_code",
}
SOURCE_CLASSIFICATION_FIELDS = {
"political_orientation", "media_type", "reliability",
"state_affiliated", "country_code",
}
async def _load_alignments_for(db: aiosqlite.Connection, source_ids: list[int]) -> dict[int, list[str]]:
if not source_ids:
return {}
placeholders = ",".join("?" for _ in source_ids)
cursor = await db.execute(
f"SELECT source_id, alignment FROM source_alignments WHERE source_id IN ({placeholders}) ORDER BY alignment",
source_ids,
)
out: dict[int, list[str]] = {sid: [] for sid in source_ids}
for row in await cursor.fetchall():
out.setdefault(row["source_id"], []).append(row["alignment"])
return out
async def _replace_alignments(db: aiosqlite.Connection, source_id: int, alignments: list[str]):
"""Ersetzt die alignments-Liste einer Quelle (DELETE + INSERT) — Aufrufer muss commit() machen."""
await db.execute("DELETE FROM source_alignments WHERE source_id = ?", (source_id,))
seen: set[str] = set()
for raw in alignments:
a = (raw or "").strip().lower()
if not a or a in seen:
continue
if a not in ALIGNMENT_VALUES:
raise HTTPException(
status_code=status.HTTP_422_UNPROCESSABLE_ENTITY,
detail=f"Ungueltiger alignment-Wert: '{a}'",
)
seen.add(a)
await db.execute(
"INSERT INTO source_alignments (source_id, alignment) VALUES (?, ?)",
(source_id, a),
)
async def _clear_proposed(db: aiosqlite.Connection, source_id: int):
await db.execute(
"""UPDATE sources SET
proposed_political_orientation = NULL,
proposed_media_type = NULL,
proposed_reliability = NULL,
proposed_state_affiliated = NULL,
proposed_country_code = NULL,
proposed_alignments_json = NULL,
proposed_confidence = NULL,
proposed_reasoning = NULL,
proposed_at = NULL
WHERE id = ?""",
(source_id,),
)
@router.get("/meta")
@@ -42,7 +118,7 @@ class GlobalSourceCreate(BaseModel):
name: str = Field(min_length=1, max_length=200)
url: Optional[str] = None
domain: Optional[str] = None
source_type: str = Field(default="rss_feed", pattern="^(rss_feed|web_source|excluded|telegram_channel|podcast_feed)$")
source_type: str = Field(default="rss_feed", pattern="^(rss_feed|web_source|excluded|telegram_channel|podcast_feed|pdf_document)$")
category: str = Field(default="sonstige")
status: str = Field(default="active", pattern="^(active|inactive)$")
notes: Optional[str] = None
@@ -55,12 +131,18 @@ class GlobalSourceUpdate(BaseModel):
name: Optional[str] = Field(default=None, max_length=200)
url: Optional[str] = None
domain: Optional[str] = None
source_type: Optional[str] = Field(default=None, pattern="^(rss_feed|web_source|excluded|telegram_channel|podcast_feed)$")
source_type: Optional[str] = Field(default=None, pattern="^(rss_feed|web_source|excluded|telegram_channel|podcast_feed|pdf_document)$")
category: Optional[str] = None
status: Optional[str] = Field(default=None, pattern="^(active|inactive)$")
notes: Optional[str] = None
language: Optional[str] = Field(default=None, max_length=100)
bias: Optional[str] = Field(default=None, max_length=500)
political_orientation: Optional[str] = None
media_type: Optional[str] = None
reliability: Optional[str] = None
state_affiliated: Optional[bool] = None
country_code: Optional[str] = Field(default=None, max_length=8)
alignments: Optional[list[str]] = None
@router.get("/global")
@@ -120,7 +202,11 @@ async def list_global_sources(
WHERE s.tenant_id IS NULL
ORDER BY s.category, s.source_type, s.name
""")
return [dict(row) for row in await cursor.fetchall()]
rows = [dict(row) for row in await cursor.fetchall()]
alignments_map = await _load_alignments_for(db, [r["id"] for r in rows])
for r in rows:
r["alignments"] = alignments_map.get(r["id"], [])
return rows
@router.post("/global", status_code=201)
@@ -170,7 +256,7 @@ async def update_global_source(
admin: dict = Depends(get_current_admin),
db: aiosqlite.Connection = Depends(db_dependency),
):
"""Grundquelle bearbeiten."""
"""Grundquelle bearbeiten — inkl. Klassifikation + alignments."""
cursor = await db.execute(
"SELECT * FROM sources WHERE id = ? AND tenant_id IS NULL", (source_id,)
)
@@ -178,21 +264,50 @@ async def update_global_source(
if not row:
raise HTTPException(status_code=404, detail="Grundquelle nicht gefunden")
before = dict(row)
before_alignments = sorted((await _load_alignments_for(db, [source_id])).get(source_id, []))
updates = {}
for field, value in data.model_dump(exclude_none=True).items():
updates[field] = value
payload = data.model_dump(exclude_none=True)
alignments = payload.pop("alignments", None)
if not updates:
return before
if "political_orientation" in payload and payload["political_orientation"] not in POLITICAL_VALUES:
raise HTTPException(status_code=422, detail=f"Ungueltige political_orientation: {payload['political_orientation']}")
if "media_type" in payload and payload["media_type"] not in MEDIA_TYPE_VALUES:
raise HTTPException(status_code=422, detail=f"Ungueltiger media_type: {payload['media_type']}")
if "reliability" in payload and payload["reliability"] not in RELIABILITY_VALUES:
raise HTTPException(status_code=422, detail=f"Ungueltige reliability: {payload['reliability']}")
updates = {k: v for k, v in payload.items() if k in SOURCE_UPDATE_COLUMNS}
if "state_affiliated" in updates:
updates["state_affiliated"] = 1 if updates["state_affiliated"] else 0
classification_touched = any(k in updates for k in SOURCE_CLASSIFICATION_FIELDS) or alignments is not None
if classification_touched:
updates["classification_source"] = "manual"
updates["classified_at"] = None # CURRENT_TIMESTAMP via SQL — siehe unten
if updates:
sets = []
vals = []
for k, v in updates.items():
if k == "classified_at":
sets.append("classified_at = CURRENT_TIMESTAMP")
else:
sets.append(f"{k} = ?")
vals.append(v)
vals.append(source_id)
await db.execute(f"UPDATE sources SET {', '.join(sets)} WHERE id = ?", vals)
if alignments is not None:
await _replace_alignments(db, source_id, alignments)
set_clause = ", ".join(f"{k} = ?" for k in updates)
values = list(updates.values()) + [source_id]
await db.execute(f"UPDATE sources SET {set_clause} WHERE id = ?", values)
await db.commit()
cursor = await db.execute("SELECT * FROM sources WHERE id = ?", (source_id,))
after = dict(await cursor.fetchone())
after_alignments = sorted((await _load_alignments_for(db, [source_id])).get(source_id, []))
if before_alignments != after_alignments:
before["alignments"] = before_alignments
after["alignments"] = after_alignments
await log_action(
db, admin, get_client_ip(request),
action="update", resource_type="source", resource_id=source_id,
@@ -565,35 +680,82 @@ async def add_discovered_sources(
@router.get("/health")
async def get_health(
limit: int = 100,
offset: int = 0,
admin: dict = Depends(get_current_admin),
db: aiosqlite.Connection = Depends(db_dependency),
):
"""Health-Check-Ergebnisse abrufen."""
"""Health-Check-Ergebnisse abrufen.
Default-Limit 100, sortiert nach Status (errors first, dann warnings, dann ok).
Counters (errors/warnings/ok/total_checks) beziehen sich auf den GESAMTEN
Datenbestand, nicht nur auf die zurückgegebene Page. Damit kann das Frontend
den vollen Status anzeigen, ohne alle Zeilen rendern zu müssen.
has_more zeigt an, ob es weitere Items zum Nachladen gibt.
all_orgs liefert die Liste aller Tenants mit Health-Checks (für Filter-Dropdown).
"""
limit = max(1, min(int(limit or 100), 5000))
offset = max(0, int(offset or 0))
# Prüfen ob Tabelle existiert
cursor = await db.execute(
"SELECT name FROM sqlite_master WHERE type='table' AND name='source_health_checks'"
)
if not await cursor.fetchone():
return {"last_check": None, "total_checks": 0, "errors": 0, "warnings": 0, "ok": 0, "checks": []}
return {
"last_check": None, "total_checks": 0,
"errors": 0, "warnings": 0, "ok": 0,
"checks": [], "all_orgs": [],
"limit": limit, "offset": offset, "has_more": False,
}
# Aggregate über GESAMTEN Bestand. Eine GROUP-BY-Query nach (check_type, status)
# liefert sowohl die Top-Counters als auch das feine Breakdown für die UI.
cursor = await db.execute(
"SELECT check_type, status, COUNT(*) AS n FROM source_health_checks GROUP BY check_type, status"
)
breakdown = {} # {check_type: {status: count}}
error_count = 0
warning_count = 0
ok_count = 0
for row in await cursor.fetchall():
ct = row["check_type"]
st = row["status"]
breakdown.setdefault(ct, {})[st] = row["n"]
if st == "error":
error_count += row["n"]
elif st == "warning":
warning_count += row["n"]
elif st == "ok":
ok_count += row["n"]
total_checks = error_count + warning_count + ok_count
# Paginierte Daten
cursor = await db.execute("""
SELECT
h.id, h.source_id, s.name, s.domain, s.url, s.source_type,
s.tenant_id, s.category, s.language, s.bias,
h.source_id, s.name, s.domain, s.tenant_id, s.language,
o.name AS org_name,
h.check_type, h.status, h.message, h.details, h.checked_at
h.check_type, h.status, h.message
FROM source_health_checks h
JOIN sources s ON s.id = h.source_id
LEFT JOIN organizations o ON o.id = s.tenant_id
ORDER BY
CASE h.status WHEN 'error' THEN 0 WHEN 'warning' THEN 1 ELSE 2 END,
s.name
""")
LIMIT ? OFFSET ?
""", (limit, offset))
checks = [dict(row) for row in await cursor.fetchall()]
error_count = sum(1 for c in checks if c["status"] == "error")
warning_count = sum(1 for c in checks if c["status"] == "warning")
ok_count = sum(1 for c in checks if c["status"] == "ok")
# Org-Liste (alle Tenants mit Health-Checks, für Frontend-Filter-Dropdown)
cursor = await db.execute("""
SELECT DISTINCT s.tenant_id AS id, o.name AS name
FROM source_health_checks h
JOIN sources s ON s.id = h.source_id
LEFT JOIN organizations o ON o.id = s.tenant_id
WHERE s.tenant_id IS NOT NULL
ORDER BY o.name
""")
all_orgs = [dict(row) for row in await cursor.fetchall()]
cursor = await db.execute("SELECT MAX(checked_at) as last_check FROM source_health_checks")
row = await cursor.fetchone()
@@ -601,11 +763,16 @@ async def get_health(
return {
"last_check": last_check,
"total_checks": len(checks),
"total_checks": total_checks,
"errors": error_count,
"warnings": warning_count,
"ok": ok_count,
"breakdown": breakdown,
"checks": checks,
"all_orgs": all_orgs,
"limit": limit,
"offset": offset,
"has_more": (offset + len(checks)) < total_checks,
}
@@ -769,28 +936,6 @@ async def update_suggestion(
return {"status": new_status, "action": result_action}
@router.post("/health/run")
async def run_health_check_now(
admin: dict = Depends(get_current_admin),
db: aiosqlite.Connection = Depends(db_dependency),
):
"""Health-Check manuell starten."""
# source_health und source_suggester importieren
from shared.services.source_health import run_health_checks
from shared.services.source_suggester import generate_suggestions
result = await run_health_checks(db)
suggestion_count = await generate_suggestions(db)
return {
"checked": result["checked"],
"issues": result["issues"],
"suggestions": suggestion_count,
}
@router.post("/health/run-stream")
async def run_health_check_stream(
@@ -1056,3 +1201,420 @@ Nur das JSON, kein anderer Text."""
except Exception as e:
raise HTTPException(status_code=500, detail=f"Recherche fehlgeschlagen: {e}")
# === Klassifikations-Review (LLM-Vorschlaege approve/reject/reclassify) ===
@router.get("/classification/stats")
async def classification_stats(
admin: dict = Depends(get_current_admin),
db: aiosqlite.Connection = Depends(db_dependency),
):
"""Counts pro classification_source-Wert + Anzahl Pending-Reviews (alle Quellen)."""
cursor = await db.execute(
"""SELECT classification_source, COUNT(*) as cnt
FROM sources
WHERE status = 'active'
GROUP BY classification_source"""
)
by_source = {row["classification_source"] or "legacy": row["cnt"] for row in await cursor.fetchall()}
cursor = await db.execute(
"""SELECT COUNT(*) as cnt FROM sources
WHERE status = 'active' AND proposed_political_orientation IS NOT NULL"""
)
pending = (await cursor.fetchone())["cnt"]
return {
"by_classification_source": by_source,
"pending_review": pending,
"total": sum(by_source.values()),
}
@router.get("/classification/queue")
async def classification_queue(
limit: int = 50,
min_confidence: float = 0.0,
admin: dict = Depends(get_current_admin),
db: aiosqlite.Connection = Depends(db_dependency),
):
"""Liefert Quellen mit nicht-leeren proposed_*-Spalten (Review-Queue)."""
cursor = await db.execute(
"""SELECT s.* FROM sources s
WHERE s.proposed_political_orientation IS NOT NULL
AND COALESCE(s.proposed_confidence, 0) >= ?
ORDER BY s.proposed_confidence DESC, s.proposed_at DESC
LIMIT ?""",
(min_confidence, limit),
)
rows = [dict(r) for r in await cursor.fetchall()]
alignments_map = await _load_alignments_for(db, [r["id"] for r in rows])
out = []
for d in rows:
try:
proposed_aligns = json.loads(d.get("proposed_alignments_json") or "[]")
except (json.JSONDecodeError, TypeError):
proposed_aligns = []
out.append({
"id": d["id"],
"name": d["name"],
"url": d.get("url"),
"domain": d.get("domain"),
"source_type": d.get("source_type"),
"category": d.get("category"),
"is_global": d.get("tenant_id") is None,
"current": {
"political_orientation": d.get("political_orientation"),
"media_type": d.get("media_type"),
"reliability": d.get("reliability"),
"state_affiliated": bool(d.get("state_affiliated")),
"country_code": d.get("country_code"),
"alignments": alignments_map.get(d["id"], []),
"classification_source": d.get("classification_source"),
},
"proposed": {
"political_orientation": d.get("proposed_political_orientation"),
"media_type": d.get("proposed_media_type"),
"reliability": d.get("proposed_reliability"),
"state_affiliated": bool(d.get("proposed_state_affiliated")),
"country_code": d.get("proposed_country_code"),
"alignments": proposed_aligns,
"confidence": d.get("proposed_confidence"),
"reasoning": d.get("proposed_reasoning"),
"proposed_at": d.get("proposed_at"),
},
})
return out
@router.post("/{source_id}/classification/approve")
async def approve_classification(
source_id: int,
request: Request,
admin: dict = Depends(get_current_admin),
db: aiosqlite.Connection = Depends(db_dependency),
):
"""Uebernimmt proposed_* in echte Felder, setzt classification_source='llm_approved'."""
cursor = await db.execute("SELECT * FROM sources WHERE id = ?", (source_id,))
row = await cursor.fetchone()
if not row:
raise HTTPException(status_code=404, detail="Quelle nicht gefunden")
src = dict(row)
before_alignments = sorted((await _load_alignments_for(db, [source_id])).get(source_id, []))
before = {**src, "alignments": before_alignments}
if src.get("proposed_political_orientation") is None:
raise HTTPException(status_code=400, detail="Keine LLM-Vorschlaege fuer diese Quelle vorhanden")
try:
proposed_aligns = json.loads(src.get("proposed_alignments_json") or "[]")
except (json.JSONDecodeError, TypeError):
proposed_aligns = []
await db.execute(
"""UPDATE sources SET
political_orientation = ?,
media_type = ?,
reliability = ?,
state_affiliated = ?,
country_code = ?,
classification_source = 'llm_approved',
classified_at = CURRENT_TIMESTAMP
WHERE id = ?""",
(
src["proposed_political_orientation"],
src["proposed_media_type"],
src["proposed_reliability"],
1 if src.get("proposed_state_affiliated") else 0,
src.get("proposed_country_code"),
source_id,
),
)
await _replace_alignments(db, source_id, [a for a in proposed_aligns if a in ALIGNMENT_VALUES])
await _clear_proposed(db, source_id)
await db.commit()
try:
await apply_reputation_overrides(db, source_id)
except Exception as e:
logger.warning("Reputation-Override fuer source_id=%s fehlgeschlagen: %s", source_id, e)
cursor = await db.execute("SELECT * FROM sources WHERE id = ?", (source_id,))
after_row = dict(await cursor.fetchone())
after_alignments = sorted((await _load_alignments_for(db, [source_id])).get(source_id, []))
after = {**after_row, "alignments": after_alignments}
await log_action(
db, admin, get_client_ip(request),
action="update", resource_type="source", resource_id=source_id,
before=before, after=after,
)
return {"source_id": source_id, "status": "approved"}
@router.post("/{source_id}/classification/reject")
async def reject_classification(
source_id: int,
request: Request,
admin: dict = Depends(get_current_admin),
db: aiosqlite.Connection = Depends(db_dependency),
):
"""Verwirft die LLM-Vorschlaege ohne Uebernahme. classification_source: 'llm_pending' -> 'legacy'."""
cursor = await db.execute("SELECT * FROM sources WHERE id = ?", (source_id,))
row = await cursor.fetchone()
if not row:
raise HTTPException(status_code=404, detail="Quelle nicht gefunden")
src = dict(row)
before = dict(src)
await _clear_proposed(db, source_id)
if src.get("classification_source") == "llm_pending":
await db.execute(
"UPDATE sources SET classification_source = 'legacy' WHERE id = ?",
(source_id,),
)
await db.commit()
cursor = await db.execute("SELECT * FROM sources WHERE id = ?", (source_id,))
after = dict(await cursor.fetchone())
await log_action(
db, admin, get_client_ip(request),
action="update", resource_type="source", resource_id=source_id,
before=before, after=after,
)
return {"source_id": source_id, "status": "rejected"}
@router.post("/{source_id}/classification/reclassify")
async def reclassify_source(
source_id: int,
admin: dict = Depends(get_current_admin),
db: aiosqlite.Connection = Depends(db_dependency),
):
"""Triggert eine LLM-Klassifikation einer einzelnen Quelle (synchron, ~3-5s)."""
cursor = await db.execute("SELECT id FROM sources WHERE id = ?", (source_id,))
row = await cursor.fetchone()
if not row:
raise HTTPException(status_code=404, detail="Quelle nicht gefunden")
try:
result = await classify_source(db, source_id)
except Exception as e:
logger.error("Reclassify source_id=%s fehlgeschlagen: %s", source_id, e, exc_info=True)
raise HTTPException(status_code=500, detail=f"Klassifikation fehlgeschlagen: {e}")
return result
async def _bulk_classify_background(limit: int, only_unclassified: bool):
"""Hintergrund-Task: oeffnet eigene DB-Connection."""
db = await get_db()
try:
await bulk_classify(db, limit=limit, only_unclassified=only_unclassified)
finally:
await db.close()
@router.post("/classification/bulk-classify")
async def trigger_bulk_classify(
background_tasks: BackgroundTasks,
limit: int = 50,
only_unclassified: bool = True,
admin: dict = Depends(get_current_admin),
):
"""Startet eine Bulk-Klassifikation im Hintergrund."""
if limit < 1 or limit > 500:
raise HTTPException(status_code=400, detail="limit muss zwischen 1 und 500 liegen")
background_tasks.add_task(_bulk_classify_background, limit, only_unclassified)
return {"status": "started", "limit": limit, "only_unclassified": only_unclassified}
@router.post("/external-reputation/sync")
async def trigger_external_reputation_sync(
background_tasks: BackgroundTasks,
admin: dict = Depends(get_current_admin),
):
"""Startet Sync von IFCN- und EUvsDisinfo-Daten (Hintergrund)."""
async def _bg():
db = await get_db()
try:
await sync_external_reputation(db)
finally:
await db.close()
background_tasks.add_task(_bg)
return {"status": "started"}
@router.post("/classification/bulk-approve")
async def bulk_approve_classifications(
request: Request,
min_confidence: float = 0.85,
admin: dict = Depends(get_current_admin),
db: aiosqlite.Connection = Depends(db_dependency),
):
"""Genehmigt alle Pending-Vorschlaege ueber dem confidence-Schwellwert."""
cursor = await db.execute(
"""SELECT id, proposed_political_orientation, proposed_media_type,
proposed_reliability, proposed_state_affiliated,
proposed_country_code, proposed_alignments_json
FROM sources
WHERE proposed_political_orientation IS NOT NULL
AND COALESCE(proposed_confidence, 0) >= ?""",
(min_confidence,),
)
rows = [dict(r) for r in await cursor.fetchall()]
approved_ids: list[int] = []
for src in rows:
try:
proposed_aligns = json.loads(src.get("proposed_alignments_json") or "[]")
except (json.JSONDecodeError, TypeError):
proposed_aligns = []
await db.execute(
"""UPDATE sources SET
political_orientation = ?,
media_type = ?,
reliability = ?,
state_affiliated = ?,
country_code = ?,
classification_source = 'llm_approved',
classified_at = CURRENT_TIMESTAMP
WHERE id = ?""",
(
src["proposed_political_orientation"],
src["proposed_media_type"],
src["proposed_reliability"],
1 if src.get("proposed_state_affiliated") else 0,
src.get("proposed_country_code"),
src["id"],
),
)
await _replace_alignments(
db, src["id"], [a for a in proposed_aligns if a in ALIGNMENT_VALUES]
)
await _clear_proposed(db, src["id"])
approved_ids.append(src["id"])
await db.commit()
try:
for sid in approved_ids:
await apply_reputation_overrides(db, sid)
except Exception as e:
logger.warning("Bulk Reputation-Override fehlgeschlagen: %s", e)
await log_action(
db, admin, get_client_ip(request),
action="update", resource_type="source", resource_id=None,
after={"bulk_approved_ids": approved_ids, "min_confidence": min_confidence},
)
return {"approved": len(approved_ids), "ids": approved_ids}
# --- PDF-Upload (Quelle vom Typ pdf_document) ---
# Speicherort relativ zur DB: <dirname(DB_PATH)>/pdfs/{sha256}.pdf
# Der Monitor pollt pdf_document-Quellen mit processed_at IS NULL und
# extrahiert Text + Uebersetzungen (DE/EN). Dieser Endpoint legt nur die
# Datei + den Source-Eintrag an (kein LLM-Call hier).
MAX_PDF_SIZE_BYTES = 50 * 1024 * 1024 # 50 MB
PDF_DIR = os.path.join(os.path.dirname(os.path.abspath(DB_PATH)), "pdfs")
def _pdf_dir() -> str:
os.makedirs(PDF_DIR, exist_ok=True)
return PDF_DIR
@router.post("/global/upload-pdf", status_code=201)
async def upload_pdf_source(
request: Request,
admin: dict = Depends(get_current_admin),
db: aiosqlite.Connection = Depends(db_dependency),
file: UploadFile = File(...),
name: Optional[str] = Form(None),
category: str = Form("sonstige"),
language: Optional[str] = Form(None),
notes: Optional[str] = Form(None),
):
"""PDF hochladen + als Grundquelle (source_type=pdf_document) registrieren.
Idempotent ueber SHA256: bestehender Eintrag wird zurueckgegeben (409 mit
Detail), die Datei wird nicht erneut gespeichert.
"""
# Magic-Bytes-Check (PDF beginnt mit %PDF-)
head = await file.read(8)
if not head.startswith(b"%PDF-"):
raise HTTPException(status_code=415, detail="Datei ist kein gueltiges PDF (Magic-Bytes fehlen)")
# Datei streaming in Temp lesen + sha256 berechnen + Groesse pruefen
sha = hashlib.sha256()
sha.update(head)
total = len(head)
tmp_path = os.path.join(_pdf_dir(), f".upload-{uuid.uuid4().hex}.tmp")
try:
with open(tmp_path, "wb") as out:
out.write(head)
while True:
chunk = await file.read(1024 * 1024)
if not chunk:
break
total += len(chunk)
if total > MAX_PDF_SIZE_BYTES:
raise HTTPException(status_code=413, detail=f"PDF ueberschreitet Maximum von {MAX_PDF_SIZE_BYTES // 1024 // 1024} MB")
sha.update(chunk)
out.write(chunk)
sha_hex = sha.hexdigest()
final_path = os.path.join(_pdf_dir(), f"{sha_hex}.pdf")
rel_path = os.path.join("pdfs", f"{sha_hex}.pdf")
# Duplikat-Check ueber sha256
cursor = await db.execute(
"SELECT id, name FROM sources WHERE pdf_sha256 = ? AND tenant_id IS NULL",
(sha_hex,),
)
existing = await cursor.fetchone()
if existing:
# Datei wegwerfen, bestehende Quelle zurueckgeben
os.unlink(tmp_path)
raise HTTPException(
status_code=409,
detail=f"PDF bereits hochgeladen als Quelle '{existing['name']}' (id={existing['id']})",
)
# Atomar umbenennen
if not os.path.exists(final_path):
os.replace(tmp_path, final_path)
else:
# Datei mit gleichem sha existiert physisch, aber keine Source -> wiederverwenden
os.unlink(tmp_path)
except HTTPException:
if os.path.exists(tmp_path):
try: os.unlink(tmp_path)
except OSError: pass
raise
except Exception as e:
if os.path.exists(tmp_path):
try: os.unlink(tmp_path)
except OSError: pass
logger.exception("PDF-Upload fehlgeschlagen")
raise HTTPException(status_code=500, detail=f"PDF-Upload fehlgeschlagen: {e}")
# Name herleiten falls nicht angegeben
display_name = (name or "").strip() or re.sub(r"\.pdf$", "", file.filename or "PDF", flags=re.I)
display_name = display_name[:200]
cursor = await db.execute(
"""INSERT INTO sources
(name, url, domain, source_type, category, status, notes, language,
pdf_path, pdf_sha256, added_by, tenant_id)
VALUES (?, NULL, NULL, 'pdf_document', ?, 'active', ?, ?, ?, ?, ?, NULL)""",
(display_name, category, notes, language, rel_path, sha_hex, admin.get("email") or "system"),
)
src_id = cursor.lastrowid
await db.commit()
cursor = await db.execute("SELECT * FROM sources WHERE id = ?", (src_id,))
new_src = dict(await cursor.fetchone())
await log_action(
db, admin, get_client_ip(request),
action="upload_pdf", resource_type="source", resource_id=src_id,
after={"name": display_name, "pdf_sha256": sha_hex, "size_bytes": total},
)
return new_src

222
src/routers/translation.py Normale Datei
Datei anzeigen

@@ -0,0 +1,222 @@
"""Manuelle Artikel-Übersetzung.
Stößt die Haiku-Übersetzung fremdsprachiger Artikel an, die noch keine
deutsche Fassung haben. Im Monitor läuft der Translator seit 2026-05-22 NICHT
mehr automatisch (TRANSLATOR_ENABLED=false), weil ein sehr großer Lauf den
Refresh-Worker blockierte. Dieser Endpoint ist der bewusste manuelle Ersatz:
er läuft als entkoppelter Hintergrund-Task, blockiert keinen Request und ist
jederzeit abbrechbar.
"""
import asyncio
import logging
from datetime import datetime, timezone
from fastapi import APIRouter, Depends, HTTPException, Request
from auth import get_current_admin
from audit import log_action, get_client_ip
from database import get_db
from translation_agent import translate_articles_batch
logger = logging.getLogger("verwaltung.translation")
router = APIRouter(prefix="/api/translation", tags=["Translation"])
# Batch-Größe wie im Translator-Agent (durch das Haiku-Output-Limit bestimmt).
_BATCH_SIZE = 5
# Grobe Schätzwerte aus Produktiv-Logs (Haiku, 5 Artikel/Batch):
# rund 17 s und rund $0.03 pro Batch.
_SECONDS_PER_ARTICLE = 3.5
_COST_PER_ARTICLE = 0.006
# Artikel ohne deutsche Fassung: fremdsprachig (language gesetzt und != de)
# und headline_de ODER content_de fehlt.
_PENDING_WHERE = (
"language IS NOT NULL AND LOWER(language) != 'de' "
"AND (headline_de IS NULL OR headline_de = '' "
"OR content_de IS NULL OR content_de = '')"
)
# Modul-globaler Job-Status. Es gibt bewusst nur EINEN Übersetzungs-Job
# gleichzeitig, das hält Claude-Last und DB-Schreiblast kalkulierbar.
_job: dict = {
"running": False,
"started_at": None,
"finished_at": None,
"total": 0,
"done": 0,
"translated": 0,
"failed_batches": 0,
"cancelled": False,
"error": None,
"started_by": None,
}
_job_lock = asyncio.Lock()
_cancel_event = asyncio.Event()
# Referenz auf den laufenden Task halten, damit der Garbage Collector ihn
# nicht vorzeitig einsammelt.
_job_task: asyncio.Task | None = None
def _now_iso() -> str:
return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
async def _count_pending(db) -> int:
cursor = await db.execute(
f"SELECT COUNT(*) FROM articles WHERE {_PENDING_WHERE}"
)
row = await cursor.fetchone()
return row[0] if row else 0
async def _run_translation_job(started_by: str):
"""Hintergrund-Task: übersetzt alle ausstehenden Artikel batchweise.
Schreibt nach jedem Batch in die DB zurück und aktualisiert den
Fortschritt, damit das Frontend live mitlesen kann. Bricht zwischen den
Batches ab, sobald _cancel_event gesetzt ist.
"""
db = await get_db()
try:
# Großzügiger Lock-Timeout, weil der Monitor parallel in dieselbe
# geteilte DB schreiben kann (WAL erlaubt nur einen Writer).
await db.execute("PRAGMA busy_timeout=30000")
cursor = await db.execute(
f"SELECT id, headline, content_original, language "
f"FROM articles WHERE {_PENDING_WHERE} ORDER BY id DESC"
)
articles = [dict(r) for r in await cursor.fetchall()]
_job["total"] = len(articles)
logger.info(
"Übersetzungs-Job gestartet von %s: %d Artikel",
started_by, len(articles),
)
for i in range(0, len(articles), _BATCH_SIZE):
if _cancel_event.is_set():
_job["cancelled"] = True
logger.info(
"Übersetzungs-Job abgebrochen bei %d/%d",
_job["done"], _job["total"],
)
break
batch = articles[i : i + _BATCH_SIZE]
try:
translations, _usage = await translate_articles_batch(batch, "de")
except Exception as e: # pragma: no cover - defensiv
_job["failed_batches"] += 1
logger.error("Übersetzungs-Batch fehlgeschlagen: %s", e)
_job["done"] = min(i + _BATCH_SIZE, len(articles))
continue
for t in translations:
hd = t.get("headline_de")
cd = t.get("content_de")
if hd or cd:
await db.execute(
"UPDATE articles SET "
"headline_de = COALESCE(?, headline_de), "
"content_de = COALESCE(?, content_de) WHERE id = ?",
(hd, cd, t["id"]),
)
_job["translated"] += 1
await db.commit()
_job["done"] = min(i + _BATCH_SIZE, len(articles))
logger.info(
"Übersetzungs-Job beendet: %d/%d übersetzt, %d Batch-Fehler, abgebrochen=%s",
_job["translated"], _job["total"], _job["failed_batches"],
_job["cancelled"],
)
except Exception as e:
_job["error"] = str(e)
logger.error(
"Übersetzungs-Job mit Fehler beendet: %s", e, exc_info=True
)
finally:
_job["running"] = False
_job["finished_at"] = _now_iso()
await db.close()
@router.get("/status")
async def translation_status(admin=Depends(get_current_admin)):
"""Aktueller Job-Status plus Anzahl noch nicht übersetzter Artikel."""
db = await get_db()
try:
pending = await _count_pending(db)
finally:
await db.close()
snap = dict(_job)
snap["pending"] = pending
snap["estimate"] = {
"seconds": round(pending * _SECONDS_PER_ARTICLE),
"cost_usd": round(pending * _COST_PER_ARTICLE, 2),
}
return snap
@router.post("/run")
async def translation_run(request: Request, admin=Depends(get_current_admin)):
"""Startet die Übersetzung aller ausstehenden Artikel als Hintergrund-Task."""
global _job_task
async with _job_lock:
if _job["running"]:
raise HTTPException(
status_code=409, detail="Es läuft bereits eine Übersetzung."
)
db = await get_db()
try:
pending = await _count_pending(db)
if pending == 0:
return {"status": "nothing_to_do", "pending": 0}
await log_action(
db, admin, get_client_ip(request), "translation.run",
resource_type="articles", after={"pending": pending},
)
finally:
await db.close()
started_by = (
admin.get("email") or admin.get("username") or str(admin.get("id"))
)
# Job-Status zurücksetzen und Task entkoppelt starten.
_cancel_event.clear()
_job.update({
"running": True,
"started_at": _now_iso(),
"finished_at": None,
"total": pending,
"done": 0,
"translated": 0,
"failed_batches": 0,
"cancelled": False,
"error": None,
"started_by": started_by,
})
_job_task = asyncio.create_task(_run_translation_job(started_by))
logger.info(
"Übersetzung manuell gestartet von %s (%d Artikel)", started_by, pending
)
return {"status": "started", "pending": pending}
@router.post("/cancel")
async def translation_cancel(request: Request, admin=Depends(get_current_admin)):
"""Bricht einen laufenden Übersetzungs-Job nach dem aktuellen Batch ab."""
if not _job["running"]:
raise HTTPException(
status_code=409, detail="Es läuft keine Übersetzung."
)
_cancel_event.set()
db = await get_db()
try:
await log_action(
db, admin, get_client_ip(request), "translation.cancel",
resource_type="articles",
)
finally:
await db.close()
return {"status": "cancelling"}

224
src/routers/x_scraper.py Normale Datei
Datei anzeigen

@@ -0,0 +1,224 @@
"""X-Scraper-Konten: Verwaltung des twscrape-Account-Pools.
Das sind die X-Login-Konten, mit denen der Monitor bei X recherchiert
(scrapen). Sie liegen im twscrape-Account-Store (config.X_ACCOUNTS_DB_PATH),
nicht in der Verwaltungs-Datenbank. twscrape wird lazy importiert, damit das
Portal auch ohne installiertes twscrape startet.
"""
import logging
import os
from datetime import datetime, timezone
from typing import Optional
import aiosqlite
from fastapi import APIRouter, Depends, HTTPException, Request
from pydantic import BaseModel, Field
from auth import get_current_admin
from audit import log_action, get_client_ip
from config import X_ACCOUNTS_DB_PATH
from database import db_dependency
logger = logging.getLogger("verwaltung.x_scraper")
router = APIRouter(prefix="/api/x-scraper", tags=["x-scraper"])
def _get_pool():
"""twscrape-AccountsPool oeffnen. Wirft HTTPException wenn nicht verfuegbar."""
try:
os.makedirs(os.path.dirname(X_ACCOUNTS_DB_PATH), exist_ok=True)
except Exception:
pass
try:
from twscrape import API
except ImportError:
raise HTTPException(status_code=503, detail="twscrape ist nicht installiert")
return API(X_ACCOUNTS_DB_PATH).pool
def _summary(acc) -> dict:
"""Account-Objekt auf ein anzeigbares Dict reduzieren -- ohne Geheimnisse."""
now = datetime.now(timezone.utc)
locked = False
locked_until = None
for ts in (acc.locks or {}).values():
if ts and ts > now:
locked = True
if locked_until is None or ts > locked_until:
locked_until = ts
return {
"username": acc.username,
"email": acc.email if acc.email and acc.email != "_" else None,
"active": bool(acc.active),
"locked": locked,
"locked_until": locked_until.isoformat() if locked_until else None,
"has_cookies": bool(acc.cookies),
"total_requests": sum((acc.stats or {}).values()),
"last_used": acc.last_used.isoformat() if acc.last_used else None,
"error_msg": acc.error_msg or None,
}
class XScraperCreate(BaseModel):
username: str = Field(min_length=1, max_length=100)
password: str = Field(default="", max_length=200)
email: str = Field(default="", max_length=200)
email_password: str = Field(default="", max_length=200)
cookies: str = Field(min_length=1, max_length=4000)
class XScraperCookies(BaseModel):
cookies: str = Field(min_length=1, max_length=4000)
class XScraperActive(BaseModel):
active: bool
@router.get("/accounts")
async def list_accounts(admin: dict = Depends(get_current_admin)):
"""Alle X-Scraper-Konten auflisten (ohne Passwoerter/Cookies)."""
pool = _get_pool()
try:
accounts = await pool.get_all()
except Exception as e:
logger.error("X-Scraper get_all fehlgeschlagen: %s", e)
raise HTTPException(status_code=500, detail="Konten konnten nicht geladen werden")
return [_summary(a) for a in accounts]
@router.post("/accounts", status_code=201)
async def add_account(
data: XScraperCreate,
request: Request,
admin: dict = Depends(get_current_admin),
db: aiosqlite.Connection = Depends(db_dependency),
):
"""Neues X-Scraper-Konto anlegen."""
pool = _get_pool()
username = data.username.strip().lstrip("@")
if not username:
raise HTTPException(status_code=422, detail="Benutzername ist erforderlich")
if await pool.get_account(username) is not None:
raise HTTPException(status_code=409, detail=f"Konto '{username}' existiert bereits")
try:
await pool.add_account(
username=username,
password=data.password or "_",
email=data.email or "_",
email_password=data.email_password or "_",
cookies=data.cookies.strip(),
)
except Exception as e:
logger.error("X-Scraper add_account fehlgeschlagen: %s", e)
raise HTTPException(status_code=500, detail="Konto konnte nicht angelegt werden")
acc = await pool.get_account(username)
if acc is None:
raise HTTPException(status_code=500, detail="Konto wurde nicht gespeichert, bitte Cookies pruefen")
await log_action(
db, admin, get_client_ip(request), action="create",
resource_type="x_scraper_account", after={"username": username, "email": data.email},
)
return _summary(acc)
@router.post("/accounts/{username}/cookies")
async def refresh_cookies(
username: str,
data: XScraperCookies,
request: Request,
admin: dict = Depends(get_current_admin),
db: aiosqlite.Connection = Depends(db_dependency),
):
"""Cookies eines bestehenden Kontos erneuern (Login auffrischen)."""
pool = _get_pool()
acc = await pool.get_account(username)
if acc is None:
raise HTTPException(status_code=404, detail="Konto nicht gefunden")
# twscrape hat keine Update-Methode -- Konto mit frischen Cookies neu anlegen.
pw, em, emp = acc.password, acc.email, acc.email_password
try:
await pool.delete_accounts([username])
await pool.add_account(
username=username, password=pw, email=em,
email_password=emp, cookies=data.cookies.strip(),
)
except Exception as e:
logger.error("X-Scraper Cookie-Refresh fehlgeschlagen: %s", e)
raise HTTPException(status_code=500, detail="Cookies konnten nicht erneuert werden")
acc = await pool.get_account(username)
if acc is None:
raise HTTPException(status_code=500, detail="Konto nach Cookie-Refresh nicht gefunden")
await log_action(
db, admin, get_client_ip(request), action="update",
resource_type="x_scraper_account", after={"username": username, "change": "cookies"},
)
return _summary(acc)
@router.post("/accounts/{username}/active")
async def set_active(
username: str,
data: XScraperActive,
request: Request,
admin: dict = Depends(get_current_admin),
db: aiosqlite.Connection = Depends(db_dependency),
):
"""Konto aktiv oder inaktiv schalten."""
pool = _get_pool()
if await pool.get_account(username) is None:
raise HTTPException(status_code=404, detail="Konto nicht gefunden")
try:
await pool.set_active(username, data.active)
except Exception as e:
logger.error("X-Scraper set_active fehlgeschlagen: %s", e)
raise HTTPException(status_code=500, detail="Status konnte nicht geaendert werden")
await log_action(
db, admin, get_client_ip(request), action="update",
resource_type="x_scraper_account", after={"username": username, "active": data.active},
)
acc = await pool.get_account(username)
return _summary(acc)
@router.delete("/accounts/{username}", status_code=204)
async def delete_account(
username: str,
request: Request,
admin: dict = Depends(get_current_admin),
db: aiosqlite.Connection = Depends(db_dependency),
):
"""X-Scraper-Konto entfernen."""
pool = _get_pool()
if await pool.get_account(username) is None:
raise HTTPException(status_code=404, detail="Konto nicht gefunden")
try:
await pool.delete_accounts([username])
except Exception as e:
logger.error("X-Scraper delete fehlgeschlagen: %s", e)
raise HTTPException(status_code=500, detail="Konto konnte nicht entfernt werden")
await log_action(
db, admin, get_client_ip(request), action="delete",
resource_type="x_scraper_account", before={"username": username},
)
@router.post("/reset-locks")
async def reset_locks(
request: Request,
admin: dict = Depends(get_current_admin),
db: aiosqlite.Connection = Depends(db_dependency),
):
"""Alle temporaeren Sperren der Konten zuruecksetzen."""
pool = _get_pool()
try:
await pool.reset_locks()
except Exception as e:
logger.error("X-Scraper reset_locks fehlgeschlagen: %s", e)
raise HTTPException(status_code=500, detail="Sperren konnten nicht zurueckgesetzt werden")
await log_action(
db, admin, get_client_ip(request), action="update",
resource_type="x_scraper_account", after={"change": "reset_locks"},
)
return {"status": "ok"}

Datei anzeigen

@@ -0,0 +1,282 @@
"""Externe Reputations-Daten fuer Quellen.
Synchronisiert Domain-Listen von oeffentlichen Reputations-/Faktencheck-Datenbanken
und schreibt die Treffer in die sources-Spalten:
- IFCN-Signatories (anerkannte Faktenchecker) -> ifcn_signatory
- EUvsDisinfo (pro-Kreml-Desinformation, Zenodo-CSV) -> eu_disinfo_listed,
eu_disinfo_case_count, eu_disinfo_last_seen
Anschliessend wendet apply_reputation_overrides() Override-Regeln auf die
reliability-Spalte an:
- ifcn_signatory=1 -> reliability='sehr_hoch'
- eu_disinfo_case_count >= 5 -> reliability='sehr_niedrig'
- eu_disinfo_case_count >= 1 -> reliability eine Stufe runter (max bis 'niedrig')
"""
import csv
import io
import logging
from collections import defaultdict
from urllib.parse import urlparse
import aiosqlite
import httpx
logger = logging.getLogger("osint.external_reputation")
IFCN_LIST_URL = "https://raw.githubusercontent.com/IFCN/verified-signatories/main/list"
EU_DISINFO_CSV_URL = "https://zenodo.org/records/10514307/files/euvsdisinfo_base.csv?download=1"
HTTP_TIMEOUT = httpx.Timeout(60.0, connect=10.0)
# Generische Plattform-Domains, die NICHT als Quelle markiert werden duerfen
# (EUvsDisinfo aggregiert anonyme Telegram-/Twitter-Posts unter Plattform-Domains).
PLATFORM_DOMAINS = {
"t.me", "telegram.me", "telegram.org",
"twitter.com", "x.com", "mobile.twitter.com",
"youtube.com", "youtu.be", "m.youtube.com",
"facebook.com", "fb.com", "m.facebook.com",
"instagram.com", "tiktok.com", "vk.com", "ok.ru",
"rumble.com", "bitchute.com", "odysee.com",
"reddit.com", "old.reddit.com",
"wordpress.com", "blogspot.com", "medium.com",
"substack.com", "wixsite.com",
}
# Reliability-Skala in Stufenfolge (schlecht -> gut)
RELIABILITY_ORDER = ["sehr_niedrig", "niedrig", "gemischt", "hoch", "sehr_hoch"]
def _normalize_domain(raw: str | None) -> str | None:
"""Normalisiert eine Domain: lowercase, ohne www., ohne Schema/Pfad."""
if not raw:
return None
raw = raw.strip().lower()
if not raw:
return None
# Falls eine vollstaendige URL uebergeben wurde
if "://" in raw:
try:
raw = urlparse(raw).netloc or raw
except ValueError:
pass
# Pfad/Query strippen
raw = raw.split("/")[0].split("?")[0].split("#")[0]
if raw.startswith("www."):
raw = raw[4:]
return raw or None
async def _fetch_text(url: str) -> str:
"""Laedt Text von einer URL. Wirft HTTPException bei Fehler."""
async with httpx.AsyncClient(timeout=HTTP_TIMEOUT, follow_redirects=True) as client:
resp = await client.get(url)
resp.raise_for_status()
return resp.text
async def sync_ifcn_signatories(db: aiosqlite.Connection) -> dict:
"""Laedt IFCN-Domain-Liste und matcht gegen sources.domain.
Setzt ifcn_signatory=1 wo die Domain in der Liste vorkommt, sonst 0.
"""
text = await _fetch_text(IFCN_LIST_URL)
domains: set[str] = set()
for line in text.splitlines():
d = _normalize_domain(line)
if d:
domains.add(d)
logger.info("IFCN-Liste geladen: %d Domains", len(domains))
# Aktuelle Quellen mit Domain laden
cursor = await db.execute(
"SELECT id, domain FROM sources WHERE domain IS NOT NULL AND domain != ''"
)
sources = [dict(r) for r in await cursor.fetchall()]
matched_ids: list[int] = []
unmatched_ids: list[int] = []
for s in sources:
nd = _normalize_domain(s["domain"])
if nd and nd not in PLATFORM_DOMAINS and nd in domains:
matched_ids.append(s["id"])
else:
unmatched_ids.append(s["id"])
# Bulk-Update in zwei Statements
if matched_ids:
placeholders = ",".join("?" for _ in matched_ids)
await db.execute(
f"UPDATE sources SET ifcn_signatory = 1 WHERE id IN ({placeholders})",
matched_ids,
)
if unmatched_ids:
placeholders = ",".join("?" for _ in unmatched_ids)
await db.execute(
f"UPDATE sources SET ifcn_signatory = 0 WHERE id IN ({placeholders})",
unmatched_ids,
)
await db.commit()
logger.info("IFCN-Sync: %d Quellen als Faktenchecker markiert (von %d)",
len(matched_ids), len(sources))
return {
"list_size": len(domains),
"sources_checked": len(sources),
"matched": len(matched_ids),
}
async def sync_eu_disinfo(db: aiosqlite.Connection) -> dict:
"""Laedt EUvsDisinfo-CSV von Zenodo, aggregiert pro Domain, schreibt sources.
- eu_disinfo_listed: 1 wenn Domain mindestens 1x als 'disinformation' debunkt
- eu_disinfo_case_count: Anzahl Disinformation-Faelle
- eu_disinfo_last_seen: spaetestes debunk_date
"""
text = await _fetch_text(EU_DISINFO_CSV_URL)
reader = csv.DictReader(io.StringIO(text))
# Per-Domain aggregieren (nur class='disinformation')
counts: dict[str, int] = defaultdict(int)
last_seen: dict[str, str] = {}
total_rows = 0
for row in reader:
total_rows += 1
if (row.get("class") or "").strip().lower() != "disinformation":
continue
d = _normalize_domain(row.get("article_domain"))
if not d:
continue
counts[d] += 1
debunk_date = (row.get("debunk_date") or "").strip()
if debunk_date:
prev = last_seen.get(d)
if not prev or debunk_date > prev:
last_seen[d] = debunk_date
logger.info("EUvsDisinfo-CSV: %d Zeilen, %d Domains mit Desinformation",
total_rows, len(counts))
# Quellen laden + matchen
cursor = await db.execute(
"SELECT id, domain FROM sources WHERE domain IS NOT NULL AND domain != ''"
)
sources = [dict(r) for r in await cursor.fetchall()]
matched = 0
for s in sources:
nd = _normalize_domain(s["domain"])
if nd and nd not in PLATFORM_DOMAINS and nd in counts:
await db.execute(
"""UPDATE sources SET
eu_disinfo_listed = 1,
eu_disinfo_case_count = ?,
eu_disinfo_last_seen = ?
WHERE id = ?""",
(counts[nd], last_seen.get(nd), s["id"]),
)
matched += 1
else:
await db.execute(
"""UPDATE sources SET
eu_disinfo_listed = 0,
eu_disinfo_case_count = 0,
eu_disinfo_last_seen = NULL
WHERE id = ?""",
(s["id"],),
)
await db.commit()
logger.info("EUvsDisinfo-Sync: %d Quellen als Desinformations-Quelle markiert (von %d)",
matched, len(sources))
return {
"rows_in_csv": total_rows,
"domains_with_disinfo_in_csv": len(counts),
"sources_checked": len(sources),
"matched": matched,
}
def _override_reliability(current: str | None, ifcn: bool, eu_count: int) -> str | None:
"""Wendet Override-Regeln auf eine reliability-Stufe an.
Rueckgabe: neue Stufe (oder None, wenn unveraendert).
"""
cur = current or "na"
# IFCN gewinnt: zertifizierter Faktenchecker -> sehr_hoch (immer)
if ifcn:
return "sehr_hoch" if cur != "sehr_hoch" else None
# EUvsDisinfo: Downgrade
if eu_count >= 5:
return "sehr_niedrig" if cur != "sehr_niedrig" else None
if eu_count >= 1:
# Eine Stufe runter, mindestens bis 'niedrig'
if cur == "na":
return "niedrig"
if cur in RELIABILITY_ORDER:
idx = RELIABILITY_ORDER.index(cur)
new_idx = max(0, idx - 1)
new = RELIABILITY_ORDER[new_idx]
# Mindeststufe 'niedrig' bei eu_count >= 1
if RELIABILITY_ORDER.index(new) > RELIABILITY_ORDER.index("niedrig"):
new = "niedrig"
return new if new != cur else None
return None
async def apply_reputation_overrides(db: aiosqlite.Connection, source_id: int | None = None) -> dict:
"""Wendet Reliability-Override-Regeln an.
Wenn source_id angegeben ist, nur fuer diese Quelle. Sonst fuer alle Quellen.
"""
if source_id is not None:
cursor = await db.execute(
"SELECT id, reliability, ifcn_signatory, eu_disinfo_case_count "
"FROM sources WHERE id = ?",
(source_id,),
)
else:
cursor = await db.execute(
"SELECT id, reliability, ifcn_signatory, eu_disinfo_case_count FROM sources"
)
sources = [dict(r) for r in await cursor.fetchall()]
changed = 0
for s in sources:
new = _override_reliability(
s.get("reliability"),
bool(s.get("ifcn_signatory")),
int(s.get("eu_disinfo_case_count") or 0),
)
if new is not None:
await db.execute(
"UPDATE sources SET reliability = ? WHERE id = ?",
(new, s["id"]),
)
changed += 1
await db.commit()
logger.info("Reliability-Override: %d Quellen angepasst (von %d gepruefte)",
changed, len(sources))
return {"checked": len(sources), "changed": changed}
async def sync_all(db: aiosqlite.Connection) -> dict:
"""Vollstaendiger Sync: IFCN + EUvsDisinfo + Reliability-Override.
Setzt external_data_synced_at fuer alle Quellen.
"""
ifcn_result = await sync_ifcn_signatories(db)
eu_result = await sync_eu_disinfo(db)
override_result = await apply_reputation_overrides(db)
await db.execute(
"UPDATE sources SET external_data_synced_at = CURRENT_TIMESTAMP "
"WHERE domain IS NOT NULL AND domain != ''"
)
await db.commit()
return {
"ifcn": ifcn_result,
"eu_disinfo": eu_result,
"override": override_result,
}

Datei anzeigen

@@ -0,0 +1,104 @@
"""Organization-Settings-Helper.
KV-Store pro Organisation. Aktuell genutzt fuer output_language ('de'|'en').
Spaeter erweiterbar (Default-Modell, Telegram-Toggle, Theme, ...).
Cache: TTL 60s in-memory pro (tenant_id, key). Wird bei set_org_setting()
invalidiert.
"""
import logging
import time
from typing import Optional
import aiosqlite
logger = logging.getLogger("osint.org_settings")
_CACHE: dict[tuple[int, str], tuple[float, Optional[str]]] = {}
_TTL_SECONDS = 60.0
def _cache_get(tenant_id: int, key: str) -> tuple[bool, Optional[str]]:
"""(hit, value). hit=True heisst Cache traf; value kann auch None sein."""
entry = _CACHE.get((tenant_id, key))
if entry is None:
return (False, None)
expires_at, value = entry
if time.monotonic() > expires_at:
_CACHE.pop((tenant_id, key), None)
return (False, None)
return (True, value)
def _cache_put(tenant_id: int, key: str, value: Optional[str]) -> None:
_CACHE[(tenant_id, key)] = (time.monotonic() + _TTL_SECONDS, value)
def _cache_invalidate(tenant_id: int, key: str) -> None:
_CACHE.pop((tenant_id, key), None)
async def get_org_setting(
db: aiosqlite.Connection,
tenant_id: int,
key: str,
default: Optional[str] = None,
) -> Optional[str]:
"""Liest ein Org-Setting. Fallback auf default."""
if tenant_id is None:
return default
hit, cached = _cache_get(tenant_id, key)
if hit:
return cached if cached is not None else default
cursor = await db.execute(
"SELECT value FROM organization_settings WHERE organization_id = ? AND key = ?",
(tenant_id, key),
)
row = await cursor.fetchone()
value = row["value"] if row else None
_cache_put(tenant_id, key, value)
return value if value is not None else default
async def set_org_setting(
db: aiosqlite.Connection,
tenant_id: int,
key: str,
value: str,
) -> None:
"""Setzt ein Org-Setting (upsert)."""
await db.execute(
"""INSERT INTO organization_settings (organization_id, key, value, updated_at)
VALUES (?, ?, ?, CURRENT_TIMESTAMP)
ON CONFLICT(organization_id, key) DO UPDATE SET
value = excluded.value,
updated_at = CURRENT_TIMESTAMP""",
(tenant_id, key, value),
)
await db.commit()
_cache_invalidate(tenant_id, key)
logger.info("Org %s Setting %s='%s' gespeichert", tenant_id, key, value)
# Bekannte Sprachen + Anzeigenamen fuer Prompts
LANGUAGE_DISPLAY_NAMES = {
"de": "Deutsch",
"en": "English",
}
async def get_org_language(
db: aiosqlite.Connection,
tenant_id: int,
) -> str:
"""Liefert ISO-2-Sprachcode der Org (default 'de')."""
value = await get_org_setting(db, tenant_id, "output_language", default="de")
if value not in LANGUAGE_DISPLAY_NAMES:
logger.warning("Unbekannte output_language '%s' fuer Org %s -- fallback 'de'", value, tenant_id)
return "de"
return value
def language_display(lang_iso: str) -> str:
"""ISO-Code -> Anzeigename fuer Prompts ('de' -> 'Deutsch')."""
return LANGUAGE_DISPLAY_NAMES.get(lang_iso, lang_iso)

Datei anzeigen

@@ -0,0 +1,295 @@
"""Klassifiziert Quellen via Claude (Haiku) nach 4 Achsen + state_affiliated + country.
Schreibt Vorschlaege in die proposed_*-Spalten von sources und setzt
classification_source='llm_pending'. Approval erfolgt ueber separate Endpoints,
die proposed_* in die echten Spalten kopieren.
"""
import asyncio
import json
import logging
import re
import aiosqlite
from shared.agents.claude_client import call_claude
from config import CLAUDE_MODEL_FAST
logger = logging.getLogger("osint.source_classifier")
POLITICAL_VALUES = {
"links_extrem", "links", "mitte_links", "liberal", "mitte",
"konservativ", "mitte_rechts", "rechts", "rechts_extrem", "na",
}
MEDIA_TYPE_VALUES = {
"tageszeitung", "wochenzeitung", "magazin", "tv_sender", "radio",
"oeffentlich_rechtlich", "nachrichtenagentur", "online_only", "blog",
"telegram_kanal", "telegram_bot", "podcast", "social_media", "imageboard",
"think_tank", "ngo", "behoerde", "staatsmedium", "fachmedium", "sonstige",
}
RELIABILITY_VALUES = {"sehr_hoch", "hoch", "gemischt", "niedrig", "sehr_niedrig", "na"}
ALIGNMENT_VALUES = {
"prorussisch", "proiranisch", "prowestlich", "proukrainisch",
"prochinesisch", "projapanisch", "proisraelisch", "propalaestinensisch",
"protuerkisch", "panarabisch", "neutral", "sonstige",
}
def _build_prompt(src: dict, sample_articles: list[dict]) -> str:
sample_text = ""
if sample_articles:
lines = []
for i, art in enumerate(sample_articles[:5], 1):
headline = (art.get("headline") or art.get("headline_de") or "").strip()
if headline:
lines.append(f"{i}. {headline[:200]}")
if lines:
sample_text = "\nLetzte Artikel/Headlines:\n" + "\n".join(lines)
return f"""Du bist ein OSINT-Analyst und klassifizierst Nachrichten- und Medienquellen fuer ein Lagebild-Monitoring-System (DACH-Raum).
QUELLE:
Name: {src.get('name')}
URL: {src.get('url') or '-'}
Domain: {src.get('domain') or '-'}
Quellentyp: {src.get('source_type')}
Bisherige Kategorie: {src.get('category')}
Sprache: {src.get('language') or 'unbekannt'}
Bisherige Notiz (Freitext): {src.get('bias') or '-'}{sample_text}
AUFGABE: Klassifiziere die Quelle nach folgenden Achsen.
1. political_orientation:
- links_extrem (z.B. linksunten.indymedia)
- links (klar links, z.B. junge Welt, taz)
- mitte_links (linksliberal/sozialdemokratisch, z.B. SZ, Spiegel)
- liberal (wirtschafts-/grünliberal, z.B. NZZ, Zeit)
- mitte (politisch neutral, Agentur, z.B. dpa, Reuters, tagesschau)
- konservativ (buergerlich-konservativ, z.B. FAZ, Welt)
- mitte_rechts (rechts-buergerlich, z.B. Tichys Einblick, Achgut)
- rechts (klar rechts, z.B. Junge Freiheit, EpochTimes)
- rechts_extrem (z.B. Compact, PI-News)
- na (nicht klassifizierbar: Behoerde, Fachmedium, Think Tank ohne klare politische Linie)
2. media_type (genau einer):
tageszeitung, wochenzeitung, magazin, tv_sender, radio, oeffentlich_rechtlich,
nachrichtenagentur, online_only, blog, telegram_kanal, telegram_bot, podcast,
social_media, imageboard, think_tank, ngo, behoerde, staatsmedium, fachmedium, sonstige
3. reliability:
- sehr_hoch (etablierte Qualitaet, Faktencheck: tagesschau, dpa, FAZ, Reuters)
- hoch (serioes mit gelegentlichen Schwaechen: taz, Welt, BILD bei harten News)
- gemischt (Mix Meinung/Einseitigkeit: Tichys Einblick, Achgut, Boulevard)
- niedrig (haeufig irrefuehrend, schwache Quellenarbeit: Junge Freiheit, EpochTimes)
- sehr_niedrig (bekannt fuer Desinformation/Verschwoerung: Compact, RT, Sputnik, PI-News)
- na (nicht bewertbar)
4. alignments (Mehrfach, leeres Array wenn keine ausgepraegte Naehe):
prorussisch, proiranisch, prowestlich, proukrainisch, prochinesisch, projapanisch,
proisraelisch, propalaestinensisch, protuerkisch, panarabisch, neutral, sonstige
5. state_affiliated (true/false): true wenn vom Staat finanziert/kontrolliert
(RT, Sputnik, CGTN, PressTV, Xinhua, TRT). Public Service Broadcaster
wie ARD/ZDF/BBC sind NICHT state_affiliated.
6. country_code (ISO 3166-1 alpha-2): Heimatland (DE, AT, CH, RU, US, ...). null wenn unklar.
7. confidence (0.0-1.0): 0.85+ fuer bekannte Outlets, 0.5-0.85 fuer mittelbekannt, <0.5 fuer unsicher.
8. reasoning (1-2 Saetze): Kurze Begruendung der Hauptklassifikationen.
WICHTIG:
- Antworte AUSSCHLIESSLICH mit einem JSON-Objekt, kein Text drumherum.
- Nutze ausschliesslich die genannten enum-Werte (snake_case).
- Bei Unklarheit lieber `na` und niedrige confidence.
JSON-Schema:
{{
"political_orientation": "...",
"media_type": "...",
"reliability": "...",
"alignments": ["..."],
"state_affiliated": false,
"country_code": "DE",
"confidence": 0.9,
"reasoning": "..."
}}"""
async def _load_sample_articles(db: aiosqlite.Connection, name: str, domain: str | None, limit: int = 5) -> list[dict]:
"""Laedt die letzten Headlines einer Quelle (per name oder Domain-Match)."""
rows: list = []
if name:
cursor = await db.execute(
"SELECT headline, headline_de FROM articles WHERE source = ? ORDER BY collected_at DESC LIMIT ?",
(name, limit),
)
rows = await cursor.fetchall()
if not rows and domain:
cursor = await db.execute(
"SELECT headline, headline_de FROM articles WHERE source_url LIKE ? ORDER BY collected_at DESC LIMIT ?",
(f"%{domain}%", limit),
)
rows = await cursor.fetchall()
return [dict(r) for r in rows]
def _validate(parsed: dict) -> dict:
"""Validiert + normalisiert eine LLM-Antwort gegen die Enums."""
pol = parsed.get("political_orientation", "na")
if pol not in POLITICAL_VALUES:
pol = "na"
mt = parsed.get("media_type", "sonstige")
if mt not in MEDIA_TYPE_VALUES:
mt = "sonstige"
rel = parsed.get("reliability", "na")
if rel not in RELIABILITY_VALUES:
rel = "na"
aligns_raw = parsed.get("alignments") or []
if not isinstance(aligns_raw, list):
aligns_raw = []
aligns = sorted({a for a in aligns_raw if isinstance(a, str) and a in ALIGNMENT_VALUES})
sa = bool(parsed.get("state_affiliated", False))
cc = parsed.get("country_code")
if isinstance(cc, str) and len(cc) == 2 and cc.isalpha():
cc = cc.upper()
else:
cc = None
try:
confidence = float(parsed.get("confidence", 0.5))
confidence = max(0.0, min(1.0, confidence))
except (TypeError, ValueError):
confidence = 0.5
reasoning = str(parsed.get("reasoning", ""))[:1000]
return {
"political_orientation": pol,
"media_type": mt,
"reliability": rel,
"alignments": aligns,
"state_affiliated": sa,
"country_code": cc,
"confidence": confidence,
"reasoning": reasoning,
}
async def classify_source(
db: aiosqlite.Connection,
source_id: int,
sample_limit: int = 5,
model: str = CLAUDE_MODEL_FAST,
) -> dict:
"""Klassifiziert eine einzelne Quelle und schreibt die Vorschlaege in proposed_*-Spalten."""
cursor = await db.execute(
"SELECT id, name, url, domain, source_type, category, language, bias, "
"classification_source FROM sources WHERE id = ?",
(source_id,),
)
row = await cursor.fetchone()
if not row:
raise ValueError(f"Quelle {source_id} nicht gefunden")
src = dict(row)
sample = await _load_sample_articles(db, src["name"], src.get("domain"), sample_limit)
prompt = _build_prompt(src, sample)
response, usage = await call_claude(prompt, tools=None, model=model)
json_match = re.search(r"\{.*\}", response, re.DOTALL)
if not json_match:
raise ValueError(f"Keine JSON-Antwort von Claude fuer source_id={source_id}: {response[:200]}")
parsed = json.loads(json_match.group(0))
result = _validate(parsed)
# Nur classification_source auf 'llm_pending' setzen, wenn nicht bereits manuell/approved
new_src = "CASE WHEN classification_source IN ('manual','llm_approved') THEN classification_source ELSE 'llm_pending' END"
await db.execute(
f"""UPDATE sources SET
proposed_political_orientation = ?,
proposed_media_type = ?,
proposed_reliability = ?,
proposed_state_affiliated = ?,
proposed_country_code = ?,
proposed_alignments_json = ?,
proposed_confidence = ?,
proposed_reasoning = ?,
proposed_at = CURRENT_TIMESTAMP,
classification_source = {new_src}
WHERE id = ?""",
(
result["political_orientation"],
result["media_type"],
result["reliability"],
1 if result["state_affiliated"] else 0,
result["country_code"],
json.dumps(result["alignments"], ensure_ascii=False),
result["confidence"],
result["reasoning"],
source_id,
),
)
await db.commit()
logger.info(
"Klassifiziert source_id=%s '%s' -> %s/%s/%s conf=%.2f ($%.4f)",
source_id, src["name"], result["political_orientation"],
result["media_type"], result["reliability"], result["confidence"],
usage.cost_usd,
)
result["source_id"] = source_id
result["usage"] = {
"cost_usd": usage.cost_usd,
"input_tokens": usage.input_tokens,
"output_tokens": usage.output_tokens,
}
return result
async def bulk_classify(
db: aiosqlite.Connection,
limit: int = 50,
only_unclassified: bool = True,
model: str = CLAUDE_MODEL_FAST,
) -> dict:
"""Klassifiziert noch unklassifizierte Quellen (sequenziell).
Args:
limit: Maximale Anzahl Quellen pro Aufruf
only_unclassified: Wenn True, nur classification_source='legacy'.
Wenn False, auch 'llm_pending' neu klassifizieren.
"""
if only_unclassified:
where = "classification_source = 'legacy'"
else:
where = "classification_source IN ('legacy', 'llm_pending')"
cursor = await db.execute(
f"SELECT id FROM sources WHERE {where} AND status = 'active' "
f"AND source_type != 'excluded' ORDER BY id LIMIT ?",
(limit,),
)
ids = [row["id"] for row in await cursor.fetchall()]
total_cost = 0.0
success = 0
errors: list[dict] = []
for sid in ids:
try:
r = await classify_source(db, sid, model=model)
total_cost += r["usage"]["cost_usd"]
success += 1
except asyncio.CancelledError:
raise
except Exception as e:
logger.error("Klassifikation source_id=%s fehlgeschlagen: %s", sid, e, exc_info=True)
errors.append({"source_id": sid, "error": str(e)})
logger.info(
"Bulk-Klassifikation fertig: %d/%d erfolgreich, $%.4f Kosten, %d Fehler",
success, len(ids), total_cost, len(errors),
)
return {
"processed": len(ids),
"success": success,
"errors": errors,
"total_cost_usd": total_cost,
}

Datei anzeigen

@@ -1,18 +1,201 @@
"""KI-gestützte Quellen-Vorschläge via Haiku."""
"""KI-gestützte Quellen-Vorschläge via Haiku + deterministische Karteileichen-Heuristik."""
import json
import logging
import re
import aiosqlite
from shared.agents.claude_client import call_claude
from agents.claude_client import call_claude
from config import CLAUDE_MODEL_FAST
logger = logging.getLogger("osint.source_suggester")
# Schwelle für "stumm seit": eine Quelle, die seit mehr als so vielen Tagen
# keinen Artikel mehr geliefert hat, gilt als Karteileichen-Kandidat.
STALE_DEACTIVATE_THRESHOLD_DAYS = 60
async def generate_stale_deactivation_suggestions(
db: aiosqlite.Connection,
days_threshold: int = STALE_DEACTIVATE_THRESHOLD_DAYS,
) -> int:
"""Erzeugt deactivate_source-Vorschläge für Karteileichen-Quellen.
Karteileiche = aktive Quelle, die entweder noch nie einen Artikel geliefert hat
(article_count = 0) oder seit mehr als days_threshold Tagen stumm ist
(last_seen_at älter als die Schwelle). Reine SQL-Heuristik, kein KI-Aufruf.
Doppel-Vermeidung: existiert bereits ein pending deactivate-Vorschlag für
dieselbe source_id, wird kein neuer erzeugt.
Returns: Anzahl neu erstellter Vorschläge.
"""
cursor = await db.execute(
f"""
SELECT id, name, url, domain, article_count, last_seen_at
FROM sources
WHERE status = 'active'
AND (
COALESCE(article_count, 0) = 0
OR (last_seen_at IS NOT NULL
AND last_seen_at < datetime('now', '-{int(days_threshold)} days'))
)
"""
)
candidates = [dict(row) for row in await cursor.fetchall()]
if not candidates:
return 0
cursor = await db.execute(
"SELECT DISTINCT source_id FROM source_suggestions "
"WHERE status = 'pending' AND suggestion_type = 'deactivate_source' "
"AND source_id IS NOT NULL"
)
already_pending = {row["source_id"] for row in await cursor.fetchall()}
created = 0
for c in candidates:
sid = c["id"]
if sid in already_pending:
continue
if (c["article_count"] or 0) == 0:
reason = "Hat seit Anlage noch nie einen Artikel geliefert."
else:
reason = (
f"Letzter Artikel vor mehr als {days_threshold} Tagen "
f"(last_seen_at={c['last_seen_at']})."
)
title = f"{c['name']} (ID {sid}) - Karteileiche, deaktivieren?"
description = (
f"Quelle: {c['name']} | URL: {c['url']} | Domain: {c['domain'] or '-'}\n"
f"Begründung: {reason}\n"
f"article_count={c['article_count'] or 0}, "
f"last_seen_at={c['last_seen_at'] or 'NULL'}\n"
"Hinweis: Quelle wurde automatisch als inaktiv erkannt. "
"Bitte vor Annahme prüfen, ob sie wirklich nicht mehr gebraucht wird."
)
suggested_data = json.dumps(
{"action": "deactivate", "source_id": sid}, ensure_ascii=False
)
await db.execute(
"INSERT INTO source_suggestions "
"(suggestion_type, title, description, source_id, suggested_data, "
" priority, status) VALUES "
"('deactivate_source', ?, ?, ?, ?, 'medium', 'pending')",
(title, description, sid, suggested_data),
)
created += 1
if created > 0:
await db.commit()
logger.info(
"Karteileichen-Heuristik: %d neue deactivate-Vorschläge erstellt "
"(%d Kandidaten, %d bereits pending)",
created, len(candidates), len(already_pending),
)
else:
logger.info(
"Karteileichen-Heuristik: keine neuen Vorschläge "
"(%d Kandidaten, alle bereits pending)",
len(candidates),
)
return created
async def generate_strategy_escalation_suggestions(db: aiosqlite.Connection) -> int:
"""Erzeugt deactivate_source-Vorschläge für Quellen, bei denen die fetch_strategy
bereits eskaliert wurde (googlebot oder paywall) und der Reachability-Check
trotzdem error meldet.
Beispiel: Rheinische Post hat fetch_strategy=googlebot, kriegt aber HTTP 403.
-> Strategie greift nicht, Quelle ist faktisch nicht abrufbar. Vorschlag: deaktivieren.
Doppel-Vermeidung wie in der Karteileichen-Heuristik: nur wenn noch kein pending
deactivate-Vorschlag für die source_id existiert.
Returns: Anzahl neu erstellter Vorschläge.
"""
cursor = await db.execute(
"""
SELECT s.id, s.name, s.url, s.domain, s.fetch_strategy, h.message
FROM sources s
JOIN source_health_checks h ON h.source_id = s.id
WHERE s.status = 'active'
AND s.fetch_strategy IN ('googlebot', 'paywall')
AND h.check_type = 'reachability'
AND h.status = 'error'
"""
)
candidates = [dict(row) for row in await cursor.fetchall()]
if not candidates:
return 0
cursor = await db.execute(
"SELECT DISTINCT source_id FROM source_suggestions "
"WHERE status = 'pending' AND suggestion_type = 'deactivate_source' "
"AND source_id IS NOT NULL"
)
already_pending = {row["source_id"] for row in await cursor.fetchall()}
created = 0
for c in candidates:
sid = c["id"]
if sid in already_pending:
continue
title = f"{c['name']} (ID {sid}) - Strategie greift nicht"
description = (
f"Quelle: {c['name']} | URL: {c['url']} | Domain: {c['domain'] or '-'}\n"
f"fetch_strategy='{c['fetch_strategy']}' wurde bereits zur Eskalation gesetzt, "
f"liefert beim Health-Check aber weiter einen Fehler:\n"
f" {c['message']}\n"
"Vorschlag: deaktivieren oder fetch_strategy='skip' setzen, damit die Quelle "
"den Health-Check nicht weiter verfälscht.\n"
"Hinweis: Quelle wurde automatisch erkannt. Bitte vor Annahme prüfen."
)
suggested_data = json.dumps(
{"action": "deactivate", "source_id": sid,
"reason": "fetch_strategy_failed", "current_strategy": c["fetch_strategy"]},
ensure_ascii=False,
)
await db.execute(
"INSERT INTO source_suggestions "
"(suggestion_type, title, description, source_id, suggested_data, "
" priority, status) VALUES "
"('deactivate_source', ?, ?, ?, ?, 'high', 'pending')",
(title, description, sid, suggested_data),
)
created += 1
if created > 0:
await db.commit()
logger.info(
"Strategie-Eskalations-Heuristik: %d neue deactivate-Vorschläge "
"(%d Kandidaten, %d bereits pending)",
created, len(candidates), len(already_pending),
)
return created
async def generate_suggestions(db: aiosqlite.Connection) -> int:
"""Generiert Quellen-Vorschläge basierend auf Health-Checks und Lückenanalyse."""
"""Generiert Quellen-Vorschläge basierend auf Health-Checks und Lückenanalyse.
Drei Stufen, in dieser Reihenfolge ausgeführt (spezifisch -> generisch -> KI):
1. Deterministisch: Strategie-Eskalations-Heuristik (fetch_strategy=googlebot
oder paywall, aber Reachability weiter error) erzeugt deactivate_source-
Vorschläge mit Priorität 'high'. Spezifischste Diagnose: "Workaround
greift nicht". Läuft ZUERST, damit diese Sources nicht von der
generischeren Karteileichen-Stufe weggefangen werden.
2. Deterministisch: Karteileichen-Heuristik (article_count=0 oder >60d stumm)
erzeugt sofort deactivate_source-Vorschläge für alle übrigen toten
Quellen ohne KI-Aufruf.
3. KI-basiert: Haiku schaut sich Quellensammlung + Health-Probleme an
und schlägt weitere Verbesserungen vor (add_source, deactivate_source,
fix_url, ...).
Rückgabe ist die Gesamtzahl neu erzeugter Vorschläge aller Stufen.
"""
strategy_count = await generate_strategy_escalation_suggestions(db)
stale_count = await generate_stale_deactivation_suggestions(db)
logger.info("Starte Quellen-Vorschläge via Haiku...")
# 1. Aktuelle Quellen laden
@@ -164,15 +347,16 @@ Nur das JSON-Array, kein anderer Text."""
await db.commit()
logger.info(
f"Quellen-Vorschläge: {count} neue Vorschläge generiert "
f"Quellen-Vorschläge: {count} neue Vorschläge generiert via Haiku "
f"(+{stale_count} Karteileichen, +{strategy_count} Strategie-Eskalation) "
f"(Haiku: {usage.input_tokens} in / {usage.output_tokens} out / "
f"${usage.cost_usd:.4f})"
)
return count
return count + stale_count + strategy_count
except Exception as e:
logger.error(f"Fehler bei Quellen-Vorschlägen: {e}", exc_info=True)
return 0
return stale_count + strategy_count
async def apply_suggestion(

Datei anzeigen

@@ -27,6 +27,7 @@ SOURCE_CATEGORIES: list[CategoryEntry] = [
{"key": "international", "label": "International"},
{"key": "regional", "label": "Regional"},
{"key": "boulevard", "label": "Boulevard"},
{"key": "stimmungsbild", "label": "Forum / Stimmungsbild"},
{"key": "sonstige", "label": "Sonstige"},
{"key": "cybercrime", "label": "Cybercrime / Hacktivismus"},
{"key": "cybercrime-leaks", "label": "Cybercrime / Leaks"},

Datei anzeigen

@@ -962,3 +962,162 @@ input[type="date"].filter-select { padding: 6px 10px; }
font-weight: 400;
}
/* === Klassifikations-Review === */
.sources-tab-badge {
display: inline-flex;
align-items: center;
justify-content: center;
min-width: 20px;
padding: 0 6px;
height: 18px;
border-radius: 9px;
background: var(--accent);
color: var(--bg-primary);
font-size: 10px;
font-weight: 700;
}
.review-toolbar {
display: flex;
align-items: center;
justify-content: space-between;
padding: 10px 14px;
background: var(--bg-secondary);
border: 1px solid var(--border);
border-radius: var(--radius);
margin-bottom: 12px;
flex-wrap: wrap;
gap: 12px;
}
.review-toolbar-info {
display: flex;
align-items: center;
gap: 16px;
font-size: 13px;
color: var(--text-primary);
}
.review-conf-filter {
display: inline-flex;
align-items: center;
gap: 6px;
font-size: 12px;
color: var(--text-secondary);
}
.review-toolbar-actions { display: flex; gap: 6px; }
.review-list { display: flex; flex-direction: column; gap: 8px; }
.review-card {
background: var(--bg-secondary);
border: 1px solid var(--border);
border-radius: var(--radius);
padding: 12px 14px;
}
.review-card-header {
display: flex;
justify-content: space-between;
align-items: flex-start;
gap: 12px;
margin-bottom: 10px;
}
.review-card-title {
display: flex;
flex-wrap: wrap;
align-items: center;
gap: 8px;
}
.review-card-name { font-weight: 600; font-size: 14px; color: var(--text-primary); }
.review-card-domain { font-size: 11px; color: var(--text-muted); }
.review-global-badge {
display: inline-flex;
align-items: center;
padding: 1px 6px;
border-radius: var(--radius);
background: #5e35b1;
color: #fff;
font-size: 9px;
font-weight: 600;
letter-spacing: 0.3px;
text-transform: uppercase;
}
.review-card-confidence {
display: inline-flex;
flex-direction: column;
align-items: center;
padding: 4px 10px;
border-radius: var(--radius);
min-width: 60px;
}
.review-card-confidence .conf-value { font-size: 14px; font-weight: 700; }
.review-card-confidence .conf-label { font-size: 9px; text-transform: uppercase; letter-spacing: 0.3px; opacity: 0.8; }
.review-card-confidence.conf-high { background: rgba(34,197,94,0.15); color: var(--success); }
.review-card-confidence.conf-medium { background: rgba(245,158,11,0.15); color: var(--warning); }
.review-card-confidence.conf-low { background: rgba(239,68,68,0.15); color: var(--danger); }
.review-card-diff {
display: grid;
grid-template-columns: 1fr;
gap: 4px;
font-size: 12px;
margin-bottom: 10px;
}
.review-diff-row {
display: grid;
grid-template-columns: 130px 1fr 24px 1fr;
align-items: center;
gap: 8px;
padding: 3px 6px;
border-radius: 3px;
}
.review-diff-row.changed { background: rgba(245,158,11,0.10); }
.review-diff-label { color: var(--text-secondary); font-weight: 500; }
.review-diff-current { color: var(--text-muted); }
.review-diff-arrow { text-align: center; color: var(--text-muted); font-weight: 600; }
.review-diff-proposed { color: var(--text-primary); font-weight: 500; }
.review-diff-row.changed .review-diff-proposed { color: var(--warning); font-weight: 600; }
.review-card-reasoning {
font-size: 12px;
color: var(--text-secondary);
background: var(--bg-tertiary);
padding: 8px 10px;
border-radius: var(--radius);
margin-bottom: 10px;
line-height: 1.5;
}
.review-card-actions { display: flex; gap: 6px; flex-wrap: wrap; }
/* Edit-Form: Klassifikations-Sektion */
.sources-classification-section {
margin-top: 14px;
padding-top: 14px;
border-top: 1px solid var(--border);
}
.sources-classification-header {
font-size: 12px;
font-weight: 600;
color: var(--text-secondary);
margin-bottom: 10px;
letter-spacing: 0.3px;
text-transform: uppercase;
}
.alignment-chips { display: flex; flex-wrap: wrap; gap: 6px; }
.alignment-chip {
display: inline-flex;
align-items: center;
padding: 4px 10px;
border-radius: 999px;
font-size: 11px;
font-weight: 500;
background: transparent;
color: var(--text-secondary);
border: 1px solid var(--border);
cursor: pointer;
transition: all 0.12s ease;
}
.alignment-chip:hover { background: var(--bg-tertiary); color: var(--text-primary); }
.alignment-chip.active {
background: var(--accent);
color: var(--bg-primary);
border-color: var(--accent);
}

Datei anzeigen

@@ -59,6 +59,32 @@
</div>
</div>
</div>
<!-- Artikel-Übersetzung -->
<div class="card" id="translationCard" style="margin-top:16px;">
<div class="card-header">
<h2>Artikel-Übersetzung</h2>
</div>
<div class="card-body">
<p class="text-muted" style="margin-top:0;">
Die automatische Übersetzung im Monitor ist deaktiviert. Hier lassen sich
fremdsprachige Artikel ohne deutsche Fassung manuell übersetzen.
</p>
<p id="translationInfo" style="margin:12px 0;">Status wird geladen…</p>
<div id="translationProgressWrap" style="display:none; margin:12px 0;">
<div style="background:rgba(128,128,128,0.25); border-radius:6px; height:14px; overflow:hidden;">
<div id="translationProgressBar" style="background:#1565c0; height:100%; width:0%; transition:width .3s;"></div>
</div>
<p class="text-muted" id="translationProgressText" style="margin:6px 0 0;"></p>
</div>
<div style="margin-top:12px; display:flex; gap:8px;">
<button class="btn btn-primary" id="translationRunBtn">Übersetzung starten</button>
<button class="btn btn-danger" id="translationCancelBtn" style="display:none;">Abbrechen</button>
</div>
</div>
</div>
</div>
<!-- Organizations Section -->
@@ -166,6 +192,14 @@
<option value="false">Deaktiviert</option>
</select>
</div>
<div class="form-group">
<label for="editOrgLanguage">Pipeline-Sprache</label>
<select id="editOrgLanguage">
<option value="de">Deutsch</option>
<option value="en">English</option>
</select>
<small class="text-secondary">Bestimmt die Ausgabesprache der KI (Lagebild, Faktencheck, Recherche) und der sichtbarsten UI-Elemente fuer alle Nutzer dieser Organisation.</small>
</div>
<div style="display: flex; gap: 8px; margin-top: 16px;">
<button type="submit" class="btn btn-primary">Speichern</button>
<button type="button" class="btn btn-danger" id="deleteOrgBtn">Organisation löschen</button>
@@ -294,6 +328,8 @@
<button class="nav-tab active" data-subtab="global-sources">Grundquellen</button>
<button class="nav-tab" data-subtab="tenant-sources">Kundenquellen</button>
<button class="nav-tab" data-subtab="source-health">Quellen-Health</button>
<button class="nav-tab" data-subtab="classification-review">Klassifikation <span class="sources-tab-badge" id="classificationPendingBadge">0</span></button>
<button class="nav-tab" data-subtab="x-scraper">X-Recherche-Konten</button>
</div>
<!-- Grundquellen -->
@@ -319,6 +355,7 @@
<span class="text-secondary" id="globalSourceCount"></span>
</div>
<button class="btn btn-secondary" id="discoverSourceBtn">Erkennen</button>
<button class="btn btn-secondary" id="newPdfSourceBtn" style="margin-right:8px;">+ PDF hochladen</button>
<button class="btn btn-primary" id="newGlobalSourceBtn">+ Neue Grundquelle</button>
</div>
<div class="card">
@@ -393,6 +430,81 @@
</div>
</div>
<!-- Quellen-Health (Sub-Tab) - drei Bereiche als Sub-Sub-Tabs;
source-health.js rendert pro Bereich in den jeweiligen Container. -->
<div class="section" id="sub-source-health">
<div class="nav-tabs" id="healthSubTabs" style="margin-top:0;">
<button class="nav-tab active" data-healthtab="suggestions">Vorschläge</button>
<button class="nav-tab" data-healthtab="checks">Health-Status</button>
<button class="nav-tab" data-healthtab="verlauf">Verlauf</button>
</div>
<div id="ht-suggestions" class="health-pane active"></div>
<div id="ht-checks" class="health-pane" style="display:none;"></div>
<div id="ht-verlauf" class="health-pane" style="display:none;"></div>
</div>
<!-- Klassifikations-Review -->
<div class="section" id="sub-classification-review">
<div class="action-bar review-toolbar">
<div class="review-toolbar-info">
<span><strong id="reviewPendingCount">0</strong> Vorschläge ausstehend</span>
<label class="review-conf-filter">
Mindest-Konfidenz:
<select class="filter-select" id="reviewMinConfidence" onchange="loadClassificationQueue()">
<option value="0">alle</option>
<option value="0.5">0.5+</option>
<option value="0.7">0.7+</option>
<option value="0.85">0.85+</option>
<option value="0.9">0.9+</option>
</select>
</label>
</div>
<div class="review-toolbar-actions">
<button class="btn btn-secondary btn-small" onclick="triggerExternalReputationSync()" title="IFCN-Faktenchecker und EUvsDisinfo neu syncen">Externe Daten syncen</button>
<button class="btn btn-secondary btn-small" onclick="triggerBulkClassify()" title="LLM-Klassifikation für noch unklassifizierte Quellen starten">+ Klassifikation starten</button>
<button class="btn btn-primary btn-small" onclick="bulkApproveHighConfidence()" title="Alle Vorschläge ab 0.85 Konfidenz übernehmen">Alle ≥ 0.85 genehmigen</button>
</div>
</div>
<div class="card">
<div class="review-list" id="classificationReviewList">
<div class="text-muted" style="padding:24px;text-align:center;">Lade Review-Queue…</div>
</div>
</div>
</div>
<!-- X-Recherche-Konten (Sub-Tab) -->
<div class="section" id="sub-x-scraper">
<div class="action-bar">
<div style="display:flex;align-items:center;gap:12px;flex-wrap:wrap;">
<span class="text-secondary" id="xScraperCount"></span>
</div>
<div style="display:flex;gap:8px;">
<button class="btn btn-secondary" onclick="resetXScraperLocks()">Sperren zurücksetzen</button>
<button class="btn btn-primary" onclick="openXScraperAddModal()">+ Konto hinzufügen</button>
</div>
</div>
<div class="card">
<p class="text-secondary" style="padding:0 4px 12px;">X-Login-Konten, mit denen der Monitor bei X recherchiert. Mehr Konten bedeuten paralleleres, schnelleres Scrapen. Cookies laufen periodisch ab und müssen dann erneuert werden.</p>
<div class="table-wrap">
<table>
<thead>
<tr>
<th>Benutzername</th>
<th>E-Mail</th>
<th>Status</th>
<th>Anfragen</th>
<th>Letzte Nutzung</th>
<th>Aktionen</th>
</tr>
</thead>
<tbody id="xScraperTable"></tbody>
</table>
</div>
</div>
</div>
</div> <!-- /sec-sources -->
<!-- Audit-Log Section -->
<div class="section" id="sec-audit">
<div class="action-bar" style="flex-wrap:wrap;gap:8px;">
@@ -454,6 +566,14 @@
<label for="newOrgSlug">Slug (URL-freundlich)</label>
<input type="text" id="newOrgSlug" required pattern="[a-z0-9-]+" placeholder="z.B. bundespolizei">
</div>
<div class="form-group">
<label for="newOrgLanguage">Pipeline-Sprache</label>
<select id="newOrgLanguage">
<option value="de" selected>Deutsch</option>
<option value="en">English</option>
</select>
<small class="text-secondary">Steuert die Ausgabesprache der KI-Pipeline (Lagebild, Faktencheck, Recherche) und die sichtbarsten UI-Strings im Monitor.</small>
</div>
<div id="newOrgError" class="error-msg" style="display:none"></div>
</div>
<div class="modal-footer">
@@ -580,6 +700,7 @@
<option value="telegram_channel">Telegram-Kanal</option>
<option value="podcast_feed">Podcast-Feed</option>
<option value="excluded">Ausgeschlossen</option>
<option value="pdf_document" disabled>PDF-Dokument (nur Upload)</option>
</select>
</div>
<div class="form-group">
@@ -638,6 +759,96 @@
<label for="sourceNotes">Notizen</label>
<input type="text" id="sourceNotes" placeholder="Optional">
</div>
<div class="sources-classification-section">
<div class="sources-classification-header">Einordnung (Klassifikation)</div>
<div style="display:grid;grid-template-columns:1fr 1fr 1fr;gap:12px;">
<div class="form-group">
<label for="sourcePolitical">Politische Ausrichtung</label>
<select id="sourcePolitical">
<option value="">— unverändert —</option>
<option value="na">Nicht eingeordnet</option>
<option value="links_extrem">Links (extrem)</option>
<option value="links">Links</option>
<option value="mitte_links">Mitte-Links</option>
<option value="liberal">Liberal</option>
<option value="mitte">Mitte</option>
<option value="konservativ">Konservativ</option>
<option value="mitte_rechts">Mitte-Rechts</option>
<option value="rechts">Rechts</option>
<option value="rechts_extrem">Rechts (extrem)</option>
</select>
</div>
<div class="form-group">
<label for="sourceMediaType">Medientyp</label>
<select id="sourceMediaType">
<option value="">— unverändert —</option>
<option value="sonstige">Sonstige</option>
<option value="tageszeitung">Tageszeitung</option>
<option value="wochenzeitung">Wochenzeitung</option>
<option value="magazin">Magazin</option>
<option value="tv_sender">TV-Sender</option>
<option value="radio">Radio</option>
<option value="oeffentlich_rechtlich">Öffentlich-Rechtlich</option>
<option value="nachrichtenagentur">Nachrichtenagentur</option>
<option value="online_only">Online-only</option>
<option value="blog">Blog</option>
<option value="telegram_kanal">Telegram-Kanal</option>
<option value="telegram_bot">Telegram-Bot</option>
<option value="podcast">Podcast</option>
<option value="social_media">Social Media</option>
<option value="imageboard">Imageboard</option>
<option value="think_tank">Think Tank</option>
<option value="ngo">NGO</option>
<option value="behoerde">Behörde</option>
<option value="staatsmedium">Staatsmedium</option>
<option value="fachmedium">Fachmedium</option>
</select>
</div>
<div class="form-group">
<label for="sourceReliability">Glaubwürdigkeit</label>
<select id="sourceReliability">
<option value="">— unverändert —</option>
<option value="na">Nicht eingeordnet</option>
<option value="sehr_hoch">Sehr hoch</option>
<option value="hoch">Hoch</option>
<option value="gemischt">Gemischt</option>
<option value="niedrig">Niedrig</option>
<option value="sehr_niedrig">Sehr niedrig</option>
</select>
</div>
</div>
<div style="display:grid;grid-template-columns:1fr 1fr;gap:12px;margin-top:8px;">
<div class="form-group">
<label for="sourceCountryCode">Land (ISO 3166)</label>
<input type="text" id="sourceCountryCode" maxlength="2" placeholder="z.B. DE, RU, US" style="text-transform:uppercase;">
</div>
<div class="form-group">
<label class="checkbox-label" style="display:flex;align-items:center;gap:8px;margin-top:24px;">
<input type="checkbox" id="sourceStateAffiliated">
<span>Staatsnah / staatlich kontrolliert</span>
</label>
</div>
</div>
<div class="form-group" style="margin-top:8px;">
<label>Geopolitische Nähe (Mehrfachauswahl)</label>
<div id="sourceAlignmentChips" class="alignment-chips" onclick="handleAlignmentChipClick(event)">
<button type="button" class="alignment-chip" data-alignment="prorussisch">prorussisch</button>
<button type="button" class="alignment-chip" data-alignment="proiranisch">proiranisch</button>
<button type="button" class="alignment-chip" data-alignment="prowestlich">prowestlich</button>
<button type="button" class="alignment-chip" data-alignment="proukrainisch">proukrainisch</button>
<button type="button" class="alignment-chip" data-alignment="prochinesisch">prochinesisch</button>
<button type="button" class="alignment-chip" data-alignment="projapanisch">projapanisch</button>
<button type="button" class="alignment-chip" data-alignment="proisraelisch">proisraelisch</button>
<button type="button" class="alignment-chip" data-alignment="propalaestinensisch">propalästinensisch</button>
<button type="button" class="alignment-chip" data-alignment="protuerkisch">protürkisch</button>
<button type="button" class="alignment-chip" data-alignment="panarabisch">panarabisch</button>
<button type="button" class="alignment-chip" data-alignment="neutral">neutral</button>
<button type="button" class="alignment-chip" data-alignment="sonstige">sonstige</button>
</div>
</div>
</div>
<div id="sourceError" class="error-msg" style="display:none"></div>
</div>
<div class="modal-footer">
@@ -648,6 +859,59 @@
</div>
</div>
<!-- Modal: PDF hochladen -->
<div class="modal-overlay" id="modalPdfUpload">
<div class="modal">
<div class="modal-header">
<h3>PDF als Quelle hochladen</h3>
<button class="modal-close" onclick="closeModal(&#39;modalPdfUpload&#39;)">&times;</button>
</div>
<form id="pdfUploadForm" enctype="multipart/form-data">
<div class="modal-body">
<p class="text-secondary" style="margin-top:0;">
Die PDF wird gespeichert und vom Monitor automatisch verarbeitet:
Text extrahieren (OCR-Fallback fuer gescannte Dokumente),
Übersetzung nach Deutsch und Englisch.
</p>
<div class="form-group">
<label for="pdfFile">PDF-Datei (max. 50 MB)</label>
<input type="file" id="pdfFile" accept="application/pdf,.pdf" required>
</div>
<div class="form-group">
<label for="pdfName">Anzeige-Name (optional)</label>
<input type="text" id="pdfName" maxlength="200" placeholder="leer = Dateiname">
</div>
<div style="display:grid;grid-template-columns:1fr 1fr;gap:12px;">
<div class="form-group">
<label for="pdfCategory">Kategorie</label>
<select id="pdfCategory">
<option value="sonstige" selected>Sonstige</option>
<option value="behoerde">Behörde</option>
<option value="think-tank">Think-Tank</option>
<option value="fachmedien">Fachmedien</option>
<option value="international">International</option>
</select>
</div>
<div class="form-group">
<label for="pdfLanguage">Sprache (optional)</label>
<input type="text" id="pdfLanguage" list="languageSuggestions" placeholder="z.B. Deutsch, Englisch">
</div>
</div>
<div class="form-group">
<label for="pdfNotes">Notizen</label>
<input type="text" id="pdfNotes" placeholder="Optional">
</div>
<div id="pdfUploadError" class="error-msg" style="display:none"></div>
<div id="pdfUploadProgress" class="text-secondary" style="display:none;margin-top:8px;">Lädt hoch …</div>
</div>
<div class="modal-footer">
<button type="button" class="btn btn-secondary" onclick="closeModal(&#39;modalPdfUpload&#39;)">Abbrechen</button>
<button type="submit" class="btn btn-primary" id="pdfUploadSubmitBtn">Hochladen</button>
</div>
</form>
</div>
</div>
<!-- Modal: Discover Sources -->
<div class="modal-overlay" id="modalDiscover">
<div class="modal" style="max-width:600px;">
@@ -706,9 +970,78 @@
</div>
</div>
<script src="/static/js/app.js?v=20260509d"></script>
<script src="/static/js/sources.js?v=20260509d"></script>
<script src="/static/js/source-health.js?v=20260509d"></script>
<!-- Modal: X-Recherche-Konto hinzufügen -->
<div class="modal-overlay" id="modalXScraperAdd">
<div class="modal">
<div class="modal-header">
<h3>X-Recherche-Konto hinzufügen</h3>
<button class="modal-close" onclick="closeModal('modalXScraperAdd')">&times;</button>
</div>
<form id="xScraperAddForm">
<div class="modal-body">
<div class="form-group">
<label for="xsUsername">X-Benutzername</label>
<input type="text" id="xsUsername" required placeholder="Login-Handle des Kontos, ohne @">
</div>
<div class="form-group">
<label for="xsPassword">X-Passwort</label>
<input type="password" id="xsPassword" placeholder="optional">
</div>
<div class="form-group">
<label for="xsEmail">E-Mail</label>
<input type="text" id="xsEmail" placeholder="optional, z.B. konto@protonmail.com">
</div>
<div class="form-group">
<label for="xsEmailPassword">E-Mail-Passwort</label>
<input type="password" id="xsEmailPassword" placeholder="optional">
</div>
<div class="form-group">
<label for="xsCookies">Cookies</label>
<textarea id="xsCookies" rows="3" required placeholder="auth_token=...; ct0=..."></textarea>
<small class="text-secondary">Aus dem eingeloggten X-Browser exportiert, mindestens auth_token und ct0.</small>
</div>
<div id="xScraperAddError" class="error-msg" style="display:none"></div>
</div>
<div class="modal-footer">
<button type="button" class="btn btn-secondary" onclick="closeModal('modalXScraperAdd')">Abbrechen</button>
<button type="submit" class="btn btn-primary">Konto anlegen</button>
</div>
</form>
</div>
</div>
<!-- Modal: X-Recherche-Konto Cookies erneuern -->
<div class="modal-overlay" id="modalXScraperCookies">
<div class="modal">
<div class="modal-header">
<h3>Cookies erneuern</h3>
<button class="modal-close" onclick="closeModal('modalXScraperCookies')">&times;</button>
</div>
<form id="xScraperCookiesForm">
<div class="modal-body">
<div class="form-group">
<label for="xsCookiesUsername">Konto</label>
<input type="text" id="xsCookiesUsername" readonly>
</div>
<div class="form-group">
<label for="xsCookiesValue">Neue Cookies</label>
<textarea id="xsCookiesValue" rows="3" required placeholder="auth_token=...; ct0=..."></textarea>
<small class="text-secondary">Frisch aus dem eingeloggten X-Browser exportieren.</small>
</div>
<div id="xScraperCookiesError" class="error-msg" style="display:none"></div>
</div>
<div class="modal-footer">
<button type="button" class="btn btn-secondary" onclick="closeModal('modalXScraperCookies')">Abbrechen</button>
<button type="submit" class="btn btn-primary">Cookies setzen</button>
</div>
</form>
</div>
</div>
<script src="/static/js/app.js?v=20260522a"></script>
<script src="/static/js/sources.js?v=20260522x2"></script>
<script src="/static/js/x-scraper.js?v=20260522a"></script>
<script src="/static/js/source-health.js?v=20260509l"></script>
<script src="/static/js/audit.js?v=20260509d"></script>
<div id="toastContainer" class="toast-container" aria-live="polite" aria-atomic="true"></div>
</body>

Datei anzeigen

@@ -26,6 +26,23 @@ const API = {
post(path, body) { return this.request(path, { method: "POST", body: JSON.stringify(body) }); },
put(path, body) { return this.request(path, { method: "PUT", body: body ? JSON.stringify(body) : undefined }); },
del(path) { return this.request(path, { method: "DELETE" }); },
async upload(path, formData) {
const headers = {};
if (this.token) headers["Authorization"] = `Bearer ${this.token}`;
const res = await fetch(path, { method: "POST", headers, body: formData });
if (res.status === 401) {
localStorage.removeItem("token");
localStorage.removeItem("username");
window.location.href = "/";
return;
}
if (!res.ok) {
const data = await res.json().catch(() => ({}));
throw new Error(data.detail || `Fehler ${res.status}`);
}
return res.json();
},
};
// --- State ---
@@ -42,8 +59,10 @@ document.addEventListener("DOMContentLoaded", () => {
setupNavTabs();
setupOrgDetailTabs();
setupForms();
setupTranslation();
loadDashboard();
loadDashboardTokenStats();
loadTranslationStatus();
loadOrgs();
});
@@ -55,14 +74,15 @@ function logout() {
// --- Navigation ---
function setupNavTabs() {
document.querySelectorAll(".nav-tabs:not(#orgDetailTabs):not(#sourceSubTabs) .nav-tab").forEach(tab => {
document.querySelectorAll(".nav-tabs:not(#orgDetailTabs):not(#sourceSubTabs):not(#healthSubTabs) .nav-tab").forEach(tab => {
tab.addEventListener("click", () => {
const section = tab.dataset.section;
document.querySelectorAll(".nav-tabs:not(#orgDetailTabs):not(#sourceSubTabs) .nav-tab").forEach(t => t.classList.remove("active"));
document.querySelectorAll(".nav-tabs:not(#orgDetailTabs):not(#sourceSubTabs):not(#healthSubTabs) .nav-tab").forEach(t => t.classList.remove("active"));
tab.classList.add("active");
document.querySelectorAll(".app-content > .section").forEach(s => s.classList.remove("active"));
document.getElementById(`sec-${section}`).classList.add("active");
if (section === "dashboard") loadTranslationStatus();
if (section === "licenses") loadExpiringLicenses();
if (section === "audit" && typeof loadAudit === "function") loadAudit();
});
@@ -213,6 +233,8 @@ async function openOrg(orgId) {
document.getElementById("editOrgName").value = org.name;
document.getElementById("editOrgActive").value = org.is_active ? "true" : "false";
const langEl = document.getElementById("editOrgLanguage");
if (langEl) langEl.value = org.output_language || "de";
loadOrgUsers(orgId);
loadOrgLicenses(orgId);
@@ -405,7 +427,7 @@ document.addEventListener("DOMContentLoaded", () => {
function switchToOrg(orgId) {
// Switch to orgs tab and open detail
document.querySelectorAll(".nav-tabs:not(#orgDetailTabs):not(#sourceSubTabs) .nav-tab").forEach(t => t.classList.remove("active"));
document.querySelectorAll(".nav-tabs:not(#orgDetailTabs):not(#sourceSubTabs):not(#healthSubTabs) .nav-tab").forEach(t => t.classList.remove("active"));
document.querySelector('.nav-tab[data-section="orgs"]').classList.add("active");
document.querySelectorAll(".app-content > .section").forEach(s => s.classList.remove("active"));
document.getElementById("sec-orgs").classList.add("active");
@@ -424,6 +446,7 @@ function setupForms() {
await API.post("/api/orgs", {
name: document.getElementById("newOrgName").value,
slug: document.getElementById("newOrgSlug").value,
output_language: document.getElementById("newOrgLanguage").value || "de",
});
closeModal("modalNewOrg");
document.getElementById("newOrgForm").reset();
@@ -518,6 +541,7 @@ function setupForms() {
await API.put(`/api/orgs/${currentOrgId}`, {
name: document.getElementById("editOrgName").value,
is_active: document.getElementById("editOrgActive").value === "true",
output_language: document.getElementById("editOrgLanguage").value || "de",
});
openOrg(currentOrgId);
loadOrgs();
@@ -631,6 +655,151 @@ function formatDate(iso) {
}
// ===== Artikel-Übersetzung =====
let translationPollTimer = null;
function setupTranslation() {
const runBtn = document.getElementById("translationRunBtn");
const cancelBtn = document.getElementById("translationCancelBtn");
if (runBtn) runBtn.addEventListener("click", startTranslation);
if (cancelBtn) cancelBtn.addEventListener("click", cancelTranslation);
}
function formatDuration(seconds) {
seconds = Math.max(0, Math.round(seconds || 0));
if (seconds < 60) return seconds + " Sek.";
const min = Math.round(seconds / 60);
if (min < 60) return min + " Min.";
const h = Math.floor(min / 60), m = min % 60;
return h + " Std. " + (m ? m + " Min." : "").trim();
}
function renderTranslation(st) {
const info = document.getElementById("translationInfo");
const wrap = document.getElementById("translationProgressWrap");
const bar = document.getElementById("translationProgressBar");
const ptext = document.getElementById("translationProgressText");
const runBtn = document.getElementById("translationRunBtn");
const cancelBtn = document.getElementById("translationCancelBtn");
if (!info || !runBtn) return;
if (st.running) {
runBtn.style.display = "none";
cancelBtn.style.display = "";
wrap.style.display = "";
const pct = st.total > 0 ? Math.round((st.done / st.total) * 100) : 0;
bar.style.width = pct + "%";
ptext.textContent = `${st.done} / ${st.total} verarbeitet, ${st.translated} übersetzt (${pct}%)`;
info.textContent = "Übersetzung läuft…";
return;
}
runBtn.style.display = "";
cancelBtn.style.display = "none";
wrap.style.display = "none";
let resultLine = "";
if (st.finished_at && (st.total > 0 || st.error)) {
if (st.error) {
resultLine = `Letzter Lauf mit Fehler beendet: ${st.error}. `;
} else if (st.cancelled) {
resultLine = `Letzter Lauf abgebrochen, ${st.translated} von ${st.total} Artikeln übersetzt. `;
} else {
resultLine = `Letzter Lauf abgeschlossen, ${st.translated} Artikel übersetzt. `;
}
}
if (st.pending > 0) {
const est = st.estimate || {};
info.textContent = resultLine +
`${st.pending} Artikel ohne deutsche Übersetzung. ` +
`Geschätzt: ${formatDuration(est.seconds)}, ca. $${est.cost_usd}.`;
runBtn.disabled = false;
} else {
info.textContent = resultLine + "Alle Artikel sind übersetzt.";
runBtn.disabled = true;
}
}
async function loadTranslationStatus() {
try {
const st = await API.get("/api/translation/status");
renderTranslation(st);
if (st.running && !translationPollTimer) {
translationPollTimer = setInterval(pollTranslation, 3000);
}
} catch (e) {
const info = document.getElementById("translationInfo");
if (info) info.textContent = "Status nicht abrufbar: " + (e.message || e);
}
}
async function pollTranslation() {
try {
const st = await API.get("/api/translation/status");
renderTranslation(st);
if (!st.running) {
clearInterval(translationPollTimer);
translationPollTimer = null;
if (st.error) {
showToast("Übersetzung mit Fehler beendet", "error");
} else if (st.cancelled) {
showToast(`Übersetzung abgebrochen, ${st.translated} übersetzt`, "info");
} else {
showToast(`Übersetzung fertig: ${st.translated} Artikel`, "success");
}
}
} catch (e) {
console.warn("Translation-Poll fehlgeschlagen:", e);
}
}
async function startTranslation() {
let st;
try {
st = await API.get("/api/translation/status");
} catch (e) {
showToast(e.message || "Status nicht abrufbar", "error");
return;
}
if (st.running) { showToast("Es läuft bereits eine Übersetzung", "info"); return; }
if (!st.pending) { showToast("Es gibt nichts zu übersetzen", "info"); return; }
const est = st.estimate || {};
const ok = await showConfirm(
"Übersetzung starten",
`${st.pending} Artikel werden ins Deutsche übersetzt. ` +
`Geschätzte Dauer: ${formatDuration(est.seconds)}, geschätzte Kosten: ca. $${est.cost_usd}. ` +
`Der Lauf kann jederzeit abgebrochen werden.`
);
if (!ok) return;
try {
const res = await API.post("/api/translation/run", {});
if (res && res.status === "started") {
showToast(`Übersetzung gestartet (${res.pending} Artikel)`, "success");
await loadTranslationStatus();
if (!translationPollTimer) {
translationPollTimer = setInterval(pollTranslation, 3000);
}
} else {
showToast("Es gibt nichts zu übersetzen", "info");
loadTranslationStatus();
}
} catch (e) {
showToast(e.message || "Start fehlgeschlagen", "error");
}
}
async function cancelTranslation() {
try {
await API.post("/api/translation/cancel", {});
showToast("Übersetzung wird abgebrochen…", "info");
} catch (e) {
showToast(e.message || "Abbruch fehlgeschlagen", "error");
}
}
// ===== Token-Nutzung =====
async function loadOrgTokenUsage(orgId) {
try {

Datei anzeigen

@@ -3,9 +3,21 @@
let healthData = null;
let suggestionsCache = [];
let healthFilters = { status: "", check_type: "", org: "all" };
// Default-Filter zeigt nur Probleme (errors + warnings); OK ist meistens Rauschen.
// "issues" ist ein virtueller Status-Wert, den nur das Frontend versteht (siehe applyHealthFilter).
let healthFilters = { status: "issues", check_type: "", org: "all" };
let healthHistoryCache = [];
// 60-Sekunden-Cache, damit Tab-Wechsel nicht jedes Mal die volle Antwort neu lädt.
// Bei Mutationen (Vorschlag annehmen/ablehnen, run-stream, search-fix) wird mit force=true neu geladen.
// Cache-Key beinhaltet das aktuelle Limit, damit "Mehr laden" nicht aus altem Cache bedient wird.
let healthDataCache = { health: null, suggestions: null, history: null, ts: 0, limit: 0 };
const HEALTH_CACHE_TTL_MS = 60000;
// Default-Pagination: 100 Items reichen meistens (errors+warnings stehen vorne, ok-Status hinten).
// Wird durch loadMoreHealth() / loadAllHealth() hochgesetzt.
let healthLoadLimit = 100;
const CHECK_TYPE_LABELS = {
reachability: "Erreichbarkeit",
@@ -27,6 +39,15 @@ const PRIORITY_LABELS = {
low: "Niedrig",
};
// Lucide-Icons als Inline-SVG-Konstanten (statt CDN-Abhängigkeit oder Emojis).
// 14x14, currentColor erbt vom Button-Style.
const LUCIDE_ICONS = {
check: '<svg xmlns="http://www.w3.org/2000/svg" width="14" height="14" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2.5" stroke-linecap="round" stroke-linejoin="round" style="vertical-align:-2px;"><polyline points="20 6 9 17 4 12"/></svg>',
x: '<svg xmlns="http://www.w3.org/2000/svg" width="14" height="14" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2.5" stroke-linecap="round" stroke-linejoin="round" style="vertical-align:-2px;"><line x1="18" y1="6" x2="6" y2="18"/><line x1="6" y1="6" x2="18" y2="18"/></svg>',
search:'<svg xmlns="http://www.w3.org/2000/svg" width="14" height="14" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2.5" stroke-linecap="round" stroke-linejoin="round" style="vertical-align:-2px;"><circle cx="11" cy="11" r="8"/><line x1="21" y1="21" x2="16.65" y2="16.65"/></svg>',
refresh:'<svg xmlns="http://www.w3.org/2000/svg" width="14" height="14" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" style="vertical-align:-2px;"><polyline points="23 4 23 10 17 10"/><polyline points="1 20 1 14 7 14"/><path d="M3.51 9a9 9 0 0 1 14.85-3.36L23 10M1 14l4.64 4.36A9 9 0 0 0 20.49 15"/></svg>',
};
// --- Init ---
function setupHealthTab() {
const tab = document.querySelector('#sourceSubTabs .nav-tab[data-subtab="source-health"]');
@@ -35,19 +56,49 @@ function setupHealthTab() {
}
}
document.addEventListener("DOMContentLoaded", setupHealthTab);
// Sub-Sub-Tabs innerhalb von Quellen-Health: Vorschläge / Health-Status / Verlauf.
function setupHealthSubTabs() {
document.querySelectorAll("#healthSubTabs .nav-tab").forEach((tab) => {
tab.addEventListener("click", () => {
const which = tab.dataset.healthtab;
document.querySelectorAll("#healthSubTabs .nav-tab").forEach(t => t.classList.remove("active"));
tab.classList.add("active");
["suggestions", "checks", "verlauf"].forEach(name => {
const pane = document.getElementById("ht-" + name);
if (pane) pane.style.display = name === which ? "block" : "none";
});
});
});
}
document.addEventListener("DOMContentLoaded", () => {
setupHealthTab();
setupHealthSubTabs();
});
// --- Health-Daten laden ---
async function loadHealthData() {
async function loadHealthData(force = false) {
const now = Date.now();
if (!force
&& healthDataCache.health
&& healthDataCache.limit === healthLoadLimit
&& (now - healthDataCache.ts) < HEALTH_CACHE_TTL_MS) {
healthData = healthDataCache.health;
suggestionsCache = healthDataCache.suggestions;
healthHistoryCache = healthDataCache.history;
renderHealthDashboard();
return;
}
try {
const [health, suggestions, history] = await Promise.all([
API.get("/api/sources/health"),
API.get("/api/sources/health?limit=" + healthLoadLimit),
API.get("/api/sources/suggestions"),
API.get("/api/sources/health/history?limit=10").catch(() => []),
]);
healthData = health;
suggestionsCache = suggestions;
healthHistoryCache = history || [];
healthDataCache = { health, suggestions, history: history || [], ts: Date.now(), limit: healthLoadLimit };
renderHealthDashboard();
} catch (err) {
console.error("Health-Daten laden fehlgeschlagen:", err);
@@ -56,9 +107,22 @@ async function loadHealthData() {
}
}
// Pagination-Steuerung: hochsetzen + neu laden
function loadMoreHealth() {
healthLoadLimit += 200;
loadHealthData(true);
}
function loadAllHealth() {
healthLoadLimit = 100000;
loadHealthData(true);
}
function applyHealthFilter(checks) {
return checks.filter(c => {
if (healthFilters.status && c.status !== healthFilters.status) return false;
// "issues" = Sammelfilter für errors + warnings (Default)
if (healthFilters.status === "issues" && c.status === "ok") return false;
if (healthFilters.status && healthFilters.status !== "issues" && c.status !== healthFilters.status) return false;
if (healthFilters.check_type && c.check_type !== healthFilters.check_type) return false;
if (healthFilters.org === "global" && c.tenant_id !== null) return false;
if (healthFilters.org !== "all" && healthFilters.org !== "global"
@@ -73,8 +137,11 @@ function setHealthFilter(field, value) {
}
function renderHealthDashboard() {
const container = document.getElementById("healthContent");
if (!container) return;
// Drei Sub-Panes (statt einer monolithischen Health-Section).
const paneSuggestions = document.getElementById("ht-suggestions");
const paneChecks = document.getElementById("ht-checks");
const paneVerlauf = document.getElementById("ht-verlauf");
if (!paneSuggestions || !paneChecks || !paneVerlauf) return;
// Vorschläge rendern
const pendingSuggestions = suggestionsCache.filter((s) => s.status === "pending");
@@ -106,13 +173,13 @@ function renderHealthDashboard() {
<tr>
<td><span class="badge badge-suggestion-${s.suggestion_type}">${SUGGESTION_TYPE_LABELS[s.suggestion_type] || s.suggestion_type}</span></td>
<td>${esc(s.title)}</td>
<td class="text-secondary" style="max-width:300px;">${esc(s.description || "")}</td>
<td class="text-secondary" style="max-width:300px; white-space:nowrap; overflow:hidden; text-overflow:ellipsis;" title="${esc(s.description || "")}">${esc(s.description || "")}</td>
<td><span class="badge badge-priority-${s.priority}">${PRIORITY_LABELS[s.priority] || s.priority}</span></td>
<td class="text-secondary">${formatDateTime(s.created_at)}</td>
<td style="white-space:nowrap;">
${s.suggestion_type === "deactivate_source" && s.source_id ? `<button class="btn btn-secondary btn-small" data-source-id="${s.source_id}" data-source-name="${esc(s.title.split(':')[0] || s.title)}" onclick="searchFix(this)">Lösung suchen</button> ` : ""}
<button class="btn btn-success btn-small" onclick="handleSuggestion(${s.id}, true)">Annehmen</button>
<button class="btn btn-danger btn-small" onclick="handleSuggestion(${s.id}, false)">Ablehnen</button>
${s.suggestion_type === "deactivate_source" && s.source_id ? `<button class="btn btn-secondary btn-small" data-source-id="${s.source_id}" data-source-name="${esc(s.title.split(':')[0] || s.title)}" onclick="searchFix(this)" title="Lösung suchen">${LUCIDE_ICONS.search}</button> ` : ""}
<button class="btn btn-success btn-small" onclick="handleSuggestion(${s.id}, true)" title="Annehmen">${LUCIDE_ICONS.check}</button>
<button class="btn btn-danger btn-small" onclick="handleSuggestion(${s.id}, false)" title="Ablehnen">${LUCIDE_ICONS.x}</button>
</td>
</tr>`,
)
@@ -129,20 +196,23 @@ function renderHealthDashboard() {
</div>`;
}
// Vergangene Vorschläge
// Vergangene Vorschläge - eingeklappt by default, weil rein historisch.
let historyHtml = "";
if (recentSuggestions.length > 0) {
const shown = recentSuggestions.slice(0, 20);
historyHtml = `
<div class="card" style="margin-bottom:16px;">
<div class="card-header"><h2>Verlauf</h2></div>
<div class="table-wrap">
<details class="card" style="margin-bottom:16px;">
<summary style="cursor:pointer; padding:14px 18px; list-style:none;">
<span style="color:var(--accent, #C8A851); font-weight:600; font-size:1.02rem;">Verlauf</span>
<span class="text-secondary" style="font-size:13px; margin-left:8px;">(${recentSuggestions.length} erledigte Vorschläge - klick zum Aufklappen)</span>
</summary>
<div class="table-wrap" style="border-top:1px solid var(--border, rgba(255,255,255,0.08));">
<table>
<thead>
<tr><th>Typ</th><th>Titel</th><th>Status</th><th>Bearbeitet</th></tr>
</thead>
<tbody>
${recentSuggestions
.slice(0, 20)
${shown
.map(
(s) => `
<tr>
@@ -156,7 +226,7 @@ function renderHealthDashboard() {
</tbody>
</table>
</div>
</div>`;
</details>`;
}
// Health-Check Ergebnisse
@@ -165,17 +235,48 @@ function renderHealthDashboard() {
// Filter anwenden
const allChecks = healthData.checks;
const filtered = applyHealthFilter(allChecks);
const okCount = healthData.checks.filter((c) => c.status === "ok").length;
// Counters aus Backend-Aggregat (über Gesamt-Bestand, nicht nur Page)
const okCount = healthData.ok != null ? healthData.ok : healthData.checks.filter((c) => c.status === "ok").length;
const totalAll = healthData.total_checks != null ? healthData.total_checks : allChecks.length;
const hasMore = !!healthData.has_more;
// Org-Liste fuer Dropdown
const orgs = Array.from(new Set(allChecks.map(c => c.tenant_id).filter(t => t != null)))
.map(tid => {
const c = allChecks.find(x => x.tenant_id === tid);
return { id: String(tid), name: c ? c.org_name : `Org ${tid}` };
});
// Org-Liste aus Backend-Liste (volle Liste, auch wenn Page kleiner ist)
const orgs = (healthData.all_orgs || []).map(o => ({ id: String(o.id), name: o.name || ("Org " + o.id) }));
const checkTypes = Array.from(new Set(allChecks.map(c => c.check_type)));
// Counter-Aufgliederung aus Backend-Breakdown (pro check_type x status).
// Beispiel: { reachability: {ok: 281, error: 3, warning: 1}, feed_validity: {...}, stale: {...}, duplicate: {...} }
const breakdown = healthData.breakdown || {};
function breakdownLine(statusKey, cssClass) {
const entries = Object.entries(breakdown)
.map(([ct, byStatus]) => [ct, byStatus[statusKey] || 0])
.filter(([_, n]) => n > 0)
.sort((a, b) => b[1] - a[1]);
if (entries.length === 0) return "";
const total = entries.reduce((s, [, n]) => s + n, 0);
const detail = entries.map(([ct, n]) => `${n} ${CHECK_TYPE_LABELS[ct] || ct}`).join(", ");
const label = statusKey === "error" ? "Fehler" : (statusKey === "warning" ? "Warnungen" : "OK");
return `<span class="${cssClass}" title="${esc(detail)}">${total} ${label}</span> <span class="text-secondary" style="font-size:11px;">(${esc(detail)})</span>`;
}
// Trend-Delta zum vorletzten Run (healthHistoryCache[1]). Index 0 ist
// typischerweise der aktuelle Stand, Index 1 der davor archivierte Run.
// Wenn weniger als 2 Runs in der History: kein Delta anzeigen.
const prevRun = (healthHistoryCache && healthHistoryCache.length > 1) ? healthHistoryCache[1] : null;
function deltaBadge(currentValue, prevValue, badIsUp) {
if (prevValue == null) return "";
const d = currentValue - prevValue;
if (d === 0) return ` <span class="text-secondary" style="font-size:11px;" title="unverändert seit letztem Run">(±0)</span>`;
const sign = d > 0 ? "+" : "";
// badIsUp=true: Anstieg = schlecht (rot), Abnahme = gut (grün). Umgekehrt für OK.
const cls = (badIsUp ? (d > 0) : (d < 0)) ? "text-danger" : "text-success";
return ` <span class="${cls}" style="font-size:11px;" title="seit letztem Run">(${sign}${d})</span>`;
}
const dErr = prevRun ? deltaBadge(healthData.errors, prevRun.errors, true) : "";
const dWarn = prevRun ? deltaBadge(healthData.warnings, prevRun.warnings, true) : "";
const dOk = prevRun ? deltaBadge(okCount, prevRun.ok, false) : "";
healthHtml = `
<div class="card">
<div class="card-header">
@@ -183,14 +284,15 @@ function renderHealthDashboard() {
<span class="text-secondary" style="font-size:13px;">
Letzter Check: ${healthData.last_check ? formatDateTime(healthData.last_check) : "Noch nie"}
&nbsp;|&nbsp;
<span class="text-danger">${healthData.errors} Fehler</span> &nbsp;
<span class="text-warning">${healthData.warnings} Warnungen</span> &nbsp;
<span class="text-success">${okCount} OK</span>
${breakdownLine("error", "text-danger") || `<span class="text-danger">0 Fehler</span>`}${dErr} &nbsp;
${breakdownLine("warning", "text-warning") || `<span class="text-warning">0 Warnungen</span>`}${dWarn} &nbsp;
<span class="text-success">${okCount} OK</span>${dOk}
</span>
</div>
<div class="action-bar" style="border-bottom:1px solid var(--border, rgba(255,255,255,0.08));">
<div style="display:flex;gap:10px;flex-wrap:wrap;align-items:center;">
<select class="filter-select" onchange="setHealthFilter('status', this.value)">
<option value="issues" ${healthFilters.status === "issues" ? "selected" : ""}>Nur Probleme (Default)</option>
<option value="" ${healthFilters.status === "" ? "selected" : ""}>Alle Status</option>
<option value="error" ${healthFilters.status === "error" ? "selected" : ""}>Nur Fehler</option>
<option value="warning" ${healthFilters.status === "warning" ? "selected" : ""}>Nur Warnungen</option>
@@ -205,7 +307,9 @@ function renderHealthDashboard() {
<option value="global" ${healthFilters.org === "global" ? "selected" : ""}>Nur Grundquellen</option>
${orgs.map(o => `<option value="${esc(o.id)}" ${healthFilters.org === o.id ? "selected" : ""}>Org: ${esc(o.name)}</option>`).join("")}
</select>
<span class="text-secondary" style="font-size:13px;">${filtered.length} / ${allChecks.length} Ergebnisse</span>
<span class="text-secondary" style="font-size:13px;">
${filtered.length} / ${allChecks.length} angezeigt${totalAll > allChecks.length ? ` (von ${totalAll} insgesamt)` : ''}
</span>
</div>
</div>`;
@@ -214,30 +318,60 @@ function renderHealthDashboard() {
<div class="table-wrap">
<table>
<thead>
<tr><th>Quelle</th><th>Domain</th><th>Typ</th><th>Org</th><th>Sprache</th><th>Status</th><th>Details</th><th>Aktionen</th></tr>
<tr><th>Quelle</th><th>Typ</th><th>Org</th><th>Status</th><th>Details</th><th>Aktion</th></tr>
</thead>
<tbody>
${filtered
.map(
(c) => `
(c) => {
// Domain + Sprache in Tooltip vom Quellnamen, statt eigene Spalten.
const tipParts = [];
if (c.domain) tipParts.push(c.domain);
if (c.language) tipParts.push(c.language);
const nameTip = tipParts.length ? ` title="${esc(tipParts.join(" · "))}"` : "";
return `
<tr>
<td>${esc(c.name)}</td>
<td class="text-secondary">${esc(c.domain || "-")}</td>
<td${nameTip}>${esc(c.name)}</td>
<td>${CHECK_TYPE_LABELS[c.check_type] || c.check_type}</td>
<td class="text-secondary">${c.tenant_id == null ? '<span style="color:#94a3b8;">global</span>' : esc(c.org_name || ("Org " + c.tenant_id))}</td>
<td class="text-secondary">${esc(c.language || "-")}</td>
<td><span class="badge badge-health-${c.status}">${c.status === "error" ? "Fehler" : (c.status === "warning" ? "Warnung" : "OK")}</span></td>
<td class="text-secondary" style="max-width:250px;" title="${esc(c.message || "")}">${esc(c.message || "")}</td>
<td>${c.status === "error" && c.check_type === "reachability" ? `<button class="btn btn-secondary btn-small" data-source-id="${c.source_id}" data-source-name="${esc(c.name)}" onclick="searchFix(this)">Lösung suchen</button>` : ""}</td>
</tr>`,
<td class="text-secondary" style="max-width:300px;" title="${esc(c.message || "")}">${esc(c.message || "")}</td>
<td>${(
(c.status === "error" && c.check_type === "reachability") ||
(c.status === "warning" && c.check_type === "feed_validity")
) ? `<button class="btn btn-secondary btn-small" data-source-id="${c.source_id}" data-source-name="${esc(c.name)}" onclick="searchFix(this)" title="Lösung suchen">${LUCIDE_ICONS.search}</button>` : ""}</td>
</tr>`;
}
)
.join("")}
</tbody>
</table>
</div>`;
} else if (hasMore) {
// 0 Treffer in der Page, aber es gibt noch ungeladene Items.
// Hinweis, dass der Filter erst über die volle Liste sicher ist.
healthHtml += `
<div class="card-body text-muted">
Keine Treffer in den geladenen ${allChecks.length} von ${totalAll} Items mit dem aktuellen Filter.
<a href="#" onclick="event.preventDefault(); loadAllHealth()" style="text-decoration:underline;">
Alle ${totalAll} Health-Checks laden
</a> und Filter erneut anwenden.
</div>`;
} else {
healthHtml += '<div class="card-body text-muted">Keine Ergebnisse mit diesen Filtern.</div>';
}
// Footer mit Mehr-laden-Buttons, falls Backend has_more meldet
if (hasMore) {
const remaining = Math.max(0, totalAll - allChecks.length);
healthHtml += `
<div class="card-body" style="display:flex;justify-content:center;gap:10px;align-items:center;border-top:1px solid var(--border, rgba(255,255,255,0.08));">
<span class="text-secondary" style="font-size:13px;">${allChecks.length} von ${totalAll} geladen</span>
<button class="btn btn-secondary btn-small" onclick="loadMoreHealth()">+200 laden</button>
<button class="btn btn-secondary btn-small" onclick="loadAllHealth()">Alle ${remaining} weiteren laden</button>
</div>`;
}
healthHtml += "</div>";
} else {
healthHtml = `
@@ -247,24 +381,39 @@ function renderHealthDashboard() {
</div>`;
}
// History-View: letzte Runs
// History-View: letzte Runs. Kompakt: Total raus (= errors+warnings+ok),
// Spalten-Widths explizit, Zahlen zentriert, Run-ID gekürzt + leiser.
let runsHtml = "";
if (healthHistoryCache.length > 0) {
runsHtml = `
<div class="card" style="margin-bottom:16px;">
<div class="card-header"><h2>Verlauf der Health-Check-Runs</h2></div>
<div class="table-wrap">
<table>
<thead><tr><th>Zeitpunkt (Run-Ende)</th><th>Run-ID</th><th>Total</th><th>Fehler</th><th>Warnungen</th><th>OK</th></tr></thead>
<table style="table-layout:fixed; width:100%;">
<colgroup>
<col style="width:200px;">
<col style="width:160px;">
<col style="width:110px;">
<col style="width:130px;">
<col style="width:110px;">
</colgroup>
<thead>
<tr>
<th>Zeitpunkt</th>
<th>Run-ID</th>
<th style="text-align:center;">Fehler</th>
<th style="text-align:center;">Warnungen</th>
<th style="text-align:center;">OK</th>
</tr>
</thead>
<tbody>
${healthHistoryCache.map(r => `
<tr>
<td>${formatDateTime(r.archived_at)}</td>
<td class="text-secondary"><code>${esc(r.run_id)}</code></td>
<td class="text-right">${r.total}</td>
<td class="text-right text-danger">${r.errors || 0}</td>
<td class="text-right text-warning">${r.warnings || 0}</td>
<td class="text-right text-success">${r.ok || 0}</td>
<td class="text-secondary" style="font-size:12px;" title="${esc(r.run_id)}"><code>${esc(String(r.run_id || "").slice(0, 12))}</code></td>
<td class="text-danger" style="text-align:center;">${r.errors || 0}</td>
<td class="text-warning" style="text-align:center;">${r.warnings || 0}</td>
<td class="text-success" style="text-align:center;">${r.ok || 0}</td>
</tr>`).join("")}
</tbody>
</table>
@@ -272,7 +421,17 @@ function renderHealthDashboard() {
</div>`;
}
container.innerHTML = suggestionsHtml + historyHtml + healthHtml + runsHtml;
// Statt einer monolithischen Render: drei Sub-Panes, einer pro Sub-Tab.
paneSuggestions.innerHTML = suggestionsHtml;
paneChecks.innerHTML = healthHtml;
paneVerlauf.innerHTML = historyHtml + runsHtml;
// Tab-Label "Vorschläge" mit Counter der offenen Vorschläge anreichern.
const tabBtnSugg = document.querySelector('#healthSubTabs .nav-tab[data-healthtab="suggestions"]');
if (tabBtnSugg) {
const open = pendingSuggestions.length;
tabBtnSugg.textContent = open > 0 ? `Vorschläge (${open} offen)` : "Vorschläge";
}
}
// --- Vorschlag annehmen/ablehnen ---
@@ -289,7 +448,7 @@ async function handleSuggestion(id, accept) {
if (result.action) {
showToast("Ergebnis: " + result.action, "success");
}
loadHealthData();
loadHealthData(true);
// Grundquellen-Liste auch aktualisieren
if (typeof loadGlobalSources === "function") loadGlobalSources();
} catch (err) {
@@ -371,7 +530,7 @@ async function runHealthCheck() {
}
}
loadHealthData();
loadHealthData(true);
} catch (err) {
progressEl.innerHTML = '<span class="text-danger">Fehler: ' + esc(err.message) + '</span>';
} finally {
@@ -419,7 +578,7 @@ async function searchFix(btn) {
msg += `\n\nKosten: $${result.cost_usd.toFixed(2)}`;
}
showToast(msg, "info");
loadHealthData();
loadHealthData(true);
} catch (err) {
showToast("Fehler: " + err.message, "error");
} finally {

Datei anzeigen

@@ -37,6 +37,8 @@ function setupSourceSubTabs() {
if (subtab === "global-sources") loadGlobalSources();
else if (subtab === "tenant-sources") loadTenantSources();
else if (subtab === "source-health") loadHealthData();
else if (subtab === "classification-review") loadClassificationQueue();
else if (subtab === "x-scraper") loadXScraperAccounts();
});
});
}
@@ -280,6 +282,7 @@ function openNewGlobalSource() {
editingSourceId = null;
document.getElementById("sourceModalTitle").textContent = "Neue Grundquelle";
document.getElementById("sourceForm").reset();
setAlignmentChips([]);
openModal("modalSource");
}
@@ -298,11 +301,19 @@ function editGlobalSource(id) {
document.getElementById("sourceLanguage").value = s.language || "";
document.getElementById("sourceBias").value = s.bias || "";
document.getElementById("sourceFetchStrategy").value = s.fetch_strategy || "default";
document.getElementById("sourcePolitical").value = s.political_orientation || "";
document.getElementById("sourceMediaType").value = s.media_type || "";
document.getElementById("sourceReliability").value = s.reliability || "";
document.getElementById("sourceCountryCode").value = s.country_code || "";
document.getElementById("sourceStateAffiliated").checked = !!s.state_affiliated;
setAlignmentChips(s.alignments || []);
openModal("modalSource");
}
function setupSourceForms() {
document.getElementById("newGlobalSourceBtn").addEventListener("click", openNewGlobalSource);
document.getElementById("newPdfSourceBtn")?.addEventListener("click", openPdfUploadModal);
setupPdfUploadForm();
document.getElementById("discoverSourceBtn").addEventListener("click", () => {
document.getElementById("discoverUrl").value = "";
document.getElementById("discoverStatus").style.display = "none";
@@ -328,6 +339,19 @@ function setupSourceForms() {
fetch_strategy: document.getElementById("sourceFetchStrategy").value || "default",
};
const pol = document.getElementById("sourcePolitical")?.value;
if (pol) body.political_orientation = pol;
const mt = document.getElementById("sourceMediaType")?.value;
if (mt) body.media_type = mt;
const rel = document.getElementById("sourceReliability")?.value;
if (rel) body.reliability = rel;
const cc = (document.getElementById("sourceCountryCode")?.value || "").trim().toUpperCase();
if (cc) body.country_code = cc;
if (editingSourceId) {
body.state_affiliated = !!document.getElementById("sourceStateAffiliated")?.checked;
body.alignments = getAlignmentChips();
}
try {
if (editingSourceId) {
await API.put("/api/sources/global/" + editingSourceId, body);
@@ -641,6 +665,213 @@ async function addDiscoveredFeeds() {
}
}
// === Klassifikations-Review ===
const POLITICAL_LABELS = {
links_extrem: { short: "L+", full: "Links (extrem)" },
links: { short: "L", full: "Links" },
mitte_links: { short: "ML", full: "Mitte-Links" },
liberal: { short: "LIB", full: "Liberal" },
mitte: { short: "M", full: "Mitte" },
konservativ: { short: "KON", full: "Konservativ" },
mitte_rechts: { short: "MR", full: "Mitte-Rechts" },
rechts: { short: "R", full: "Rechts" },
rechts_extrem: { short: "R+", full: "Rechts (extrem)" },
na: { short: "?", full: "Nicht eingeordnet" },
};
const RELIABILITY_LABELS = {
sehr_hoch: "Sehr hoch", hoch: "Hoch", gemischt: "Gemischt",
niedrig: "Niedrig", sehr_niedrig: "Sehr niedrig", na: "Nicht eingeordnet",
};
const MEDIA_TYPE_LABELS = {
tageszeitung: "Tageszeitung", wochenzeitung: "Wochenzeitung", magazin: "Magazin",
tv_sender: "TV-Sender", radio: "Radio", oeffentlich_rechtlich: "Öffentlich-Rechtlich",
nachrichtenagentur: "Nachrichtenagentur", online_only: "Online-only", blog: "Blog",
telegram_kanal: "Telegram-Kanal", telegram_bot: "Telegram-Bot", podcast: "Podcast",
social_media: "Social Media", imageboard: "Imageboard", think_tank: "Think Tank",
ngo: "NGO", behoerde: "Behörde", staatsmedium: "Staatsmedium",
fachmedium: "Fachmedium", sonstige: "Sonstige",
};
const ALIGNMENT_LABELS = {
prorussisch: "prorussisch", proiranisch: "proiranisch", prowestlich: "prowestlich",
proukrainisch: "proukrainisch", prochinesisch: "prochinesisch", projapanisch: "projapanisch",
proisraelisch: "proisraelisch", propalaestinensisch: "propalästinensisch",
protuerkisch: "protürkisch", panarabisch: "panarabisch", neutral: "neutral", sonstige: "sonstige",
};
function setAlignmentChips(active) {
const chips = document.querySelectorAll("#sourceAlignmentChips .alignment-chip");
const set = new Set((active || []).map((a) => (a || "").toLowerCase()));
chips.forEach((chip) => {
if (set.has(chip.dataset.alignment)) chip.classList.add("active");
else chip.classList.remove("active");
});
}
function getAlignmentChips() {
return Array.from(document.querySelectorAll("#sourceAlignmentChips .alignment-chip.active"))
.map((chip) => chip.dataset.alignment);
}
function handleAlignmentChipClick(e) {
const chip = e.target.closest(".alignment-chip");
if (!chip) return;
e.preventDefault();
chip.classList.toggle("active");
}
async function refreshClassificationStats() {
try {
const stats = await API.get("/api/sources/classification/stats");
const badge = document.getElementById("classificationPendingBadge");
if (badge) badge.textContent = String(stats.pending_review || 0);
} catch (_) { /* still ok */ }
}
async function loadClassificationQueue() {
const list = document.getElementById("classificationReviewList");
if (!list) return;
const minConf = parseFloat(document.getElementById("reviewMinConfidence")?.value || "0");
list.innerHTML = '<div class="text-muted" style="padding:24px;text-align:center;">Lade…</div>';
try {
const items = await API.get(`/api/sources/classification/queue?limit=200&min_confidence=${minConf}`);
const countEl = document.getElementById("reviewPendingCount");
if (countEl) countEl.textContent = String(items.length);
refreshClassificationStats();
if (items.length === 0) {
list.innerHTML = '<div class="text-muted" style="padding:24px;text-align:center;">Keine ausstehenden Vorschläge.</div>';
return;
}
list.innerHTML = items.map((it) => renderClassificationQueueItem(it)).join("");
} catch (err) {
list.innerHTML = `<div class="text-danger" style="padding:24px;text-align:center;">Fehler: ${esc(err.message)}</div>`;
}
}
function renderClassificationQueueItem(item) {
const cur = item.current || {};
const prop = item.proposed || {};
const conf = prop.confidence || 0;
const confPct = Math.round(conf * 100);
const confClass = conf >= 0.85 ? "high" : conf >= 0.7 ? "medium" : "low";
const polFmt = (v) => (v && v !== "na" ? POLITICAL_LABELS[v]?.full || v : "–");
const mtFmt = (v) => (v ? MEDIA_TYPE_LABELS[v] || v : "–");
const relFmt = (v) => (v && v !== "na" ? RELIABILITY_LABELS[v] || v : "–");
const stateFmt = (v) => (v ? "ja" : "nein");
const ccFmt = (v) => v || "–";
const alignFmt = (v) =>
Array.isArray(v) && v.length > 0 ? v.map((a) => ALIGNMENT_LABELS[a] || a).join(", ") : "–";
const row = (label, c, p, fmt) => {
const cs = fmt(c);
const ps = fmt(p);
const changed = cs !== ps;
return `<div class="review-diff-row${changed ? " changed" : ""}">
<span class="review-diff-label">${esc(label)}</span>
<span class="review-diff-current">${esc(cs)}</span>
<span class="review-diff-arrow">→</span>
<span class="review-diff-proposed">${esc(ps)}</span>
</div>`;
};
const reasoning = prop.reasoning ? esc(prop.reasoning) : "";
return `<div class="review-card" data-source-id="${item.id}">
<div class="review-card-header">
<div class="review-card-title">
<span class="review-card-name">${esc(item.name)}</span>
${item.is_global ? '<span class="review-global-badge">Grundquelle</span>' : ""}
<span class="review-card-domain">${esc(item.domain || "")}</span>
</div>
<div class="review-card-confidence conf-${confClass}" title="LLM-Konfidenz">
<span class="conf-value">${confPct}%</span>
<span class="conf-label">Konfidenz</span>
</div>
</div>
<div class="review-card-diff">
${row("Politik", cur.political_orientation, prop.political_orientation, polFmt)}
${row("Medientyp", cur.media_type, prop.media_type, mtFmt)}
${row("Glaubwürdigkeit", cur.reliability, prop.reliability, relFmt)}
${row("Staatsnah", cur.state_affiliated, prop.state_affiliated, stateFmt)}
${row("Land", cur.country_code, prop.country_code, ccFmt)}
${row("Geopol. Nähe", cur.alignments, prop.alignments, alignFmt)}
</div>
${reasoning ? `<div class="review-card-reasoning"><strong>Begründung:</strong> ${reasoning}</div>` : ""}
<div class="review-card-actions">
<button class="btn btn-small btn-primary" onclick="approveClassification(${item.id})">Übernehmen</button>
<button class="btn btn-small btn-secondary" onclick="rejectClassification(${item.id})">Verwerfen</button>
<button class="btn btn-small btn-secondary" data-reclassify-id="${item.id}" onclick="reclassifySource(${item.id})">Neu klassifizieren</button>
</div>
</div>`;
}
async function approveClassification(id) {
try {
await API.post(`/api/sources/${id}/classification/approve`, {});
showToast("Klassifikation übernommen.", "success");
loadClassificationQueue();
} catch (err) {
showToast("Approve fehlgeschlagen: " + err.message, "error");
}
}
async function rejectClassification(id) {
try {
await API.post(`/api/sources/${id}/classification/reject`, {});
showToast("Vorschlag verworfen.", "success");
loadClassificationQueue();
} catch (err) {
showToast("Reject fehlgeschlagen: " + err.message, "error");
}
}
async function reclassifySource(id) {
const btn = document.querySelector(`[data-reclassify-id="${id}"]`);
if (btn) { btn.disabled = true; btn.textContent = "..."; }
try {
await API.post(`/api/sources/${id}/classification/reclassify`, {});
showToast("Neu klassifiziert.", "success");
loadClassificationQueue();
} catch (err) {
showToast("Reclassify fehlgeschlagen: " + err.message, "error");
} finally {
if (btn) { btn.disabled = false; btn.textContent = "Neu klassifizieren"; }
}
}
async function triggerBulkClassify() {
if (!confirm("Bulk-Klassifikation aller noch nicht klassifizierten Quellen starten? Läuft im Hintergrund (~3-5 Sek pro Quelle, ~0.02 USD pro Quelle).")) return;
try {
const r = await API.post("/api/sources/classification/bulk-classify?limit=500&only_unclassified=true", {});
showToast(`Bulk-Klassifikation gestartet (limit=${r.limit}). In ~10 min neu laden.`, "info");
} catch (err) {
showToast("Start fehlgeschlagen: " + err.message, "error");
}
}
async function bulkApproveHighConfidence() {
if (!confirm("Alle Vorschläge mit Konfidenz ≥ 0.85 genehmigen?")) return;
try {
const r = await API.post("/api/sources/classification/bulk-approve?min_confidence=0.85", {});
showToast(`${r.approved} Vorschläge übernommen.`, "success");
loadClassificationQueue();
} catch (err) {
showToast("Bulk-Approve fehlgeschlagen: " + err.message, "error");
}
}
async function triggerExternalReputationSync() {
if (!confirm("IFCN- und EUvsDisinfo-Datenbanken jetzt syncen? Läuft im Hintergrund (~30 Sek).")) return;
try {
await API.post("/api/sources/external-reputation/sync", {});
showToast("Externer Sync gestartet. Quellenliste in ~30 Sek neu laden.", "info");
} catch (err) {
showToast("Sync fehlgeschlagen: " + err.message, "error");
}
}
function toggleSourceInfo(id) {
const row = document.getElementById("notes-" + id);
if (!row) return;
@@ -652,3 +883,68 @@ function toggleSourceInfo(id) {
if (btn) btn.classList.toggle("active", !isVisible);
}
}
// --- PDF-Quellen-Upload ---
function openPdfUploadModal() {
const form = document.getElementById("pdfUploadForm");
if (form) form.reset();
const err = document.getElementById("pdfUploadError");
if (err) { err.style.display = "none"; err.textContent = ""; }
const prog = document.getElementById("pdfUploadProgress");
if (prog) prog.style.display = "none";
openModal("modalPdfUpload");
}
function setupPdfUploadForm() {
const form = document.getElementById("pdfUploadForm");
if (!form || form.dataset.bound === "1") return;
form.dataset.bound = "1";
form.addEventListener("submit", async (e) => {
e.preventDefault();
const errEl = document.getElementById("pdfUploadError");
const progEl = document.getElementById("pdfUploadProgress");
const submitBtn = document.getElementById("pdfUploadSubmitBtn");
errEl.style.display = "none";
const fileInput = document.getElementById("pdfFile");
const f = fileInput?.files?.[0];
if (!f) {
errEl.textContent = "Bitte eine PDF-Datei auswaehlen.";
errEl.style.display = "block";
return;
}
if (f.size > 50 * 1024 * 1024) {
errEl.textContent = "Datei ueberschreitet 50 MB.";
errEl.style.display = "block";
return;
}
const fd = new FormData();
fd.append("file", f);
const nm = document.getElementById("pdfName").value.trim();
if (nm) fd.append("name", nm);
fd.append("category", document.getElementById("pdfCategory").value || "sonstige");
const lng = document.getElementById("pdfLanguage").value.trim();
if (lng) fd.append("language", lng);
const nt = document.getElementById("pdfNotes").value.trim();
if (nt) fd.append("notes", nt);
submitBtn.disabled = true;
progEl.style.display = "block";
try {
await API.upload("/api/sources/global/upload-pdf", fd);
closeModal("modalPdfUpload");
if (typeof showToast === "function") {
showToast("PDF hochgeladen -- Verarbeitung laeuft im Hintergrund", "success");
}
loadGlobalSources();
} catch (err) {
errEl.textContent = err.message || "Upload fehlgeschlagen";
errEl.style.display = "block";
} finally {
submitBtn.disabled = false;
progEl.style.display = "none";
}
});
}

169
src/static/js/x-scraper.js Normale Datei
Datei anzeigen

@@ -0,0 +1,169 @@
/* X-Recherche-Konten: Verwaltung des twscrape-Account-Pools */
"use strict";
let xScraperCache = [];
async function loadXScraperAccounts() {
setupXScraperForms();
const tbody = document.getElementById("xScraperTable");
tbody.innerHTML = '<tr><td colspan="6" class="text-muted">Lade...</td></tr>';
try {
xScraperCache = await API.get("/api/x-scraper/accounts");
renderXScraperAccounts(xScraperCache || []);
} catch (err) {
tbody.innerHTML = '<tr><td colspan="6" class="text-muted">Fehler: ' + esc(err.message || "") + '</td></tr>';
}
}
function renderXScraperAccounts(list) {
const tbody = document.getElementById("xScraperTable");
const cnt = document.getElementById("xScraperCount");
if (cnt) cnt.textContent = list.length + (list.length === 1 ? " Konto" : " Konten");
if (!list.length) {
tbody.innerHTML = '<tr><td colspan="6" class="text-muted">Keine X-Recherche-Konten. Mit „+ Konto hinzufügen" anlegen.</td></tr>';
return;
}
tbody.innerHTML = list.map((a) => {
let status;
if (!a.active) status = '<span class="text-muted">Inaktiv</span>';
else if (a.locked) status = '<span style="color:var(--warning,#b8860b);">Gesperrt</span>';
else status = '<span style="color:var(--success,#2e7d32);">Aktiv</span>';
const lastUsed = a.last_used && typeof formatDateTime === "function"
? formatDateTime(a.last_used)
: (a.last_used || "—");
const errInfo = a.error_msg
? ' <span class="info-icon" title="' + esc(a.error_msg) + '">!</span>'
: "";
const u = esc(a.username);
const toggleLabel = a.active ? "Deaktivieren" : "Aktivieren";
return '<tr>'
+ '<td><strong>' + u + '</strong>' + errInfo + '</td>'
+ '<td>' + esc(a.email || "—") + '</td>'
+ '<td>' + status + '</td>'
+ '<td>' + (a.total_requests || 0) + '</td>'
+ '<td>' + esc(lastUsed) + '</td>'
+ '<td>'
+ '<button class="btn btn-secondary btn-small" onclick="openXScraperCookiesModal(\'' + u + '\')">Cookies erneuern</button> '
+ '<button class="btn btn-secondary btn-small" onclick="toggleXScraperActive(\'' + u + '\',' + (!a.active) + ')">' + toggleLabel + '</button> '
+ '<button class="btn btn-danger btn-small" onclick="confirmDeleteXScraper(\'' + u + '\')">Entfernen</button>'
+ '</td>'
+ '</tr>';
}).join("");
}
function openXScraperAddModal() {
document.getElementById("xScraperAddError").style.display = "none";
["xsUsername", "xsPassword", "xsEmail", "xsEmailPassword", "xsCookies"].forEach((id) => {
const el = document.getElementById(id);
if (el) el.value = "";
});
openModal("modalXScraperAdd");
}
function openXScraperCookiesModal(username) {
document.getElementById("xScraperCookiesError").style.display = "none";
document.getElementById("xsCookiesUsername").value = username;
document.getElementById("xsCookiesValue").value = "";
openModal("modalXScraperCookies");
}
async function toggleXScraperActive(username, active) {
try {
await API.post("/api/x-scraper/accounts/" + encodeURIComponent(username) + "/active", { active: active });
showToast("Status geändert.", "success");
loadXScraperAccounts();
} catch (err) {
showToast(err.message || "Status konnte nicht geändert werden", "error");
}
}
function confirmDeleteXScraper(username) {
showConfirm(
"Konto entfernen",
'Soll das X-Recherche-Konto "' + username + '" entfernt werden? Der Monitor nutzt es dann nicht mehr zum Scrapen.',
async () => {
try {
await API.del("/api/x-scraper/accounts/" + encodeURIComponent(username));
showToast("Konto entfernt.", "success");
loadXScraperAccounts();
} catch (err) {
showToast(err.message || "Konto konnte nicht entfernt werden", "error");
}
}
);
}
function resetXScraperLocks() {
showConfirm(
"Sperren zurücksetzen",
"Alle temporären Sperren der X-Recherche-Konten zurücksetzen?",
async () => {
try {
await API.post("/api/x-scraper/reset-locks", {});
showToast("Sperren zurückgesetzt.", "success");
loadXScraperAccounts();
} catch (err) {
showToast(err.message || "Sperren konnten nicht zurückgesetzt werden", "error");
}
}
);
}
function setupXScraperForms() {
const addForm = document.getElementById("xScraperAddForm");
if (addForm && !addForm.dataset.wired) {
addForm.dataset.wired = "1";
addForm.addEventListener("submit", async (e) => {
e.preventDefault();
const errEl = document.getElementById("xScraperAddError");
errEl.style.display = "none";
const body = {
username: document.getElementById("xsUsername").value.trim().replace(/^@/, ""),
password: document.getElementById("xsPassword").value,
email: document.getElementById("xsEmail").value.trim(),
email_password: document.getElementById("xsEmailPassword").value,
cookies: document.getElementById("xsCookies").value.trim(),
};
if (!body.username || !body.cookies) {
errEl.textContent = "Benutzername und Cookies sind erforderlich.";
errEl.style.display = "block";
return;
}
try {
await API.post("/api/x-scraper/accounts", body);
closeModal("modalXScraperAdd");
showToast("Konto angelegt.", "success");
loadXScraperAccounts();
} catch (err) {
errEl.textContent = err.message || "Anlegen fehlgeschlagen";
errEl.style.display = "block";
}
});
}
const ckForm = document.getElementById("xScraperCookiesForm");
if (ckForm && !ckForm.dataset.wired) {
ckForm.dataset.wired = "1";
ckForm.addEventListener("submit", async (e) => {
e.preventDefault();
const errEl = document.getElementById("xScraperCookiesError");
errEl.style.display = "none";
const username = document.getElementById("xsCookiesUsername").value;
const cookies = document.getElementById("xsCookiesValue").value.trim();
if (!cookies) {
errEl.textContent = "Cookies sind erforderlich.";
errEl.style.display = "block";
return;
}
try {
await API.post("/api/x-scraper/accounts/" + encodeURIComponent(username) + "/cookies", { cookies: cookies });
closeModal("modalXScraperCookies");
showToast("Cookies erneuert.", "success");
loadXScraperAccounts();
} catch (err) {
errEl.textContent = err.message || "Cookies konnten nicht erneuert werden";
errEl.style.display = "block";
}
});
}
}

422
src/translation_agent.py Normale Datei
Datei anzeigen

@@ -0,0 +1,422 @@
"""Translator-Agent: übersetzt fremdsprachige Artikel ins Deutsche.
Verwaltungs-Adaption des gleichnamigen Monitor-Agents. Nutzt CLAUDE_MODEL_FAST
(Haiku) in Batches. Im Verwaltungsportal wird der Translator ausschließlich
manuell über den Übersetzungs-Button (routers/translation.py) angestoßen,
niemals automatisch.
Quelle: AegisSight-Monitor/src/agents/translator.py - bei größeren Änderungen
am Monitor-Original hier nachziehen. Die Imports weichen bewusst ab
(shared.agents.claude_client statt agents.claude_client). Der restliche Code
unterhalb ist eine 1:1-Kopie und behält daher den Stil des Originals.
"""
import json
import logging
import re
from shared.agents.claude_client import call_claude, ClaudeUsage, UsageAccumulator
from config import CLAUDE_MODEL_FAST
logger = logging.getLogger("verwaltung.translation")
# Im Verwaltungsportal gibt es kein automatisches Übersetzen: der Translator
# läuft nur, wenn translate_articles() explizit mit enabled=True gerufen wird.
# Diese Konstante ist daher der konservative Default für enabled=None.
TRANSLATOR_ENABLED = False
# Pro Batch nicht mehr als so viele Artikel an Claude geben.
# Bei Haiku ist das Output-Limit ca. 8k Tokens. Pro Artikel kommen leicht
# 400-600 Tokens raus (headline_de + content_de bis 1000 Zeichen). Bei 15
# wurde regelmaessig getrunkt (mid-JSON broken). 5 ist sicher mit Reserve.
DEFAULT_BATCH_SIZE = 5
# content_original wird ohnehin auf 1000 Zeichen gecappt (rss_parser).
# Fuer den Translator nochmal verkuerzen, falls vorhanden mehr.
CONTENT_INPUT_MAX = 1200
# content_de soll wie content_original auf 1000 Zeichen begrenzt sein.
CONTENT_OUTPUT_MAX = 1000
def _extract_complete_objects(text: str) -> list[dict]:
"""Extrahiert vollstaendige JSON-Objekte aus moeglicherweise abgeschnittenem Text.
Klammer-Counter-Ansatz: jedes balancierte {...} wird probiert.
"""
results = []
depth = 0
start = -1
in_string = False
escape = False
for i, ch in enumerate(text):
if escape:
escape = False
continue
if ch == "\\":
escape = True
continue
if ch == '"' and not escape:
in_string = not in_string
continue
if in_string:
continue
if ch == "{":
if depth == 0:
start = i
depth += 1
elif ch == "}":
depth -= 1
if depth == 0 and start >= 0:
obj_text = text[start:i + 1]
try:
obj = json.loads(obj_text)
if isinstance(obj, dict):
results.append(obj)
except json.JSONDecodeError:
pass
start = -1
return results
def _build_prompt(articles: list[dict], output_lang: str = "de") -> str:
"""Bauen den Translation-Prompt fuer eine Batch."""
lang_label = {"de": "Deutsch", "en": "Englisch"}.get(output_lang, output_lang)
items = []
for a in articles:
items.append({
"id": a["id"],
"headline": a.get("headline", "") or "",
"content": (a.get("content_original") or "")[:CONTENT_INPUT_MAX],
"source_lang": a.get("language", "en"),
})
return f"""Du bist ein praeziser Uebersetzer fuer Nachrichten-Artikel.
Uebersetze die folgenden Artikel nach {lang_label}.
WICHTIG:
- Verwende IMMER echte UTF-8-Umlaute (ä, ö, ü, ß) - NIEMALS Umschreibungen wie ae, oe, ue, ss.
Beispiele: "Gespraeche" -> "Gespräche", "Fuehrer" -> "Führer", "grosse" -> "große".
- Behalte Eigennamen (Personen, Orte, Organisationen) im Original.
- Headline kurz und buendig wie im Original.
- Content auf MAX {CONTENT_OUTPUT_MAX} Zeichen kuerzen, kein HTML, kein Markdown.
- Wenn der Artikel schon auf {lang_label} ist (z.B. source_lang="{output_lang}"),
kopiere headline und content unveraendert.
Antworte AUSSCHLIESSLICH mit einem flachen JSON-Array (kein Wrapper-Objekt!).
Format genau so:
[
{{"id": 1, "headline_de": "Titel auf Deutsch", "content_de": "Inhalt auf Deutsch"}},
{{"id": 2, "headline_de": "...", "content_de": "..."}}
]
NICHT erlaubt: {{"translations": [...]}} oder {{"items": [...]}} oder Markdown-Codefences.
Nur das Array, ohne Einleitung, ohne Erklaerung.
ARTIKEL:
{json.dumps(items, ensure_ascii=False, indent=2)}
"""
def _parse_response(text: str) -> list[dict]:
"""Robustes JSON-Array-Parsing.
Handhabt:
- reines JSON
- JSON in Markdown-Codefence ```json ... ```
- abgeschnittene Antworten (extrahiert vollstaendige Top-Level-Objekte)
"""
text = text.strip()
# Markdown-Codefence entfernen
if text.startswith("```"):
text = re.sub(r"^```(?:json)?\s*", "", text)
text = re.sub(r"\s*```\s*$", "", text)
text = text.strip()
try:
data = json.loads(text)
except json.JSONDecodeError:
# Erst Array versuchen
match = re.search(r"\[.*\]", text, re.DOTALL)
if match:
try:
data = json.loads(match.group(0))
except json.JSONDecodeError:
# Truncate-Fallback: einzelne Top-Level-Objekte extrahieren
data = _extract_complete_objects(text)
else:
data = _extract_complete_objects(text)
# Claude wraps das Array gelegentlich in {"translations": [...]} oder {"items": [...]}
if isinstance(data, dict):
for key in ("translations", "items", "results", "data"):
if isinstance(data.get(key), list):
data = data[key]
break
else:
# Einzelnes Objekt? Dann als Liste mit einem Element behandeln
if "id" in data:
data = [data]
else:
raise ValueError(f"Translator-Antwort: Dict ohne erwarteten Array-Key (keys={list(data.keys())[:5]})")
if not isinstance(data, list):
raise ValueError(f"Translator-Antwort ist kein Array: {type(data).__name__}")
cleaned = []
for item in data:
if not isinstance(item, dict):
continue
aid = item.get("id")
if not isinstance(aid, int):
try:
aid = int(aid)
except (TypeError, ValueError):
continue
cleaned.append({
"id": aid,
"headline_de": (item.get("headline_de") or "").strip() or None,
"content_de": (item.get("content_de") or "").strip() or None,
})
return cleaned
async def translate_articles_batch(
articles: list[dict],
output_lang: str = "de",
) -> tuple[list[dict], ClaudeUsage]:
"""Uebersetzt eine Batch von Artikeln.
Erwartet articles als Liste von Dicts mit den Feldern id, headline,
content_original, language.
Rueckgabe: (uebersetzte_artikel, usage)
Wenn der Call fehlschlaegt, wird ([], leere_usage) zurueckgegeben - der
Caller kann entscheiden, ob retry oder skip.
"""
if not articles:
return [], ClaudeUsage()
prompt = _build_prompt(articles, output_lang)
try:
result_text, usage = await call_claude(prompt, tools=None, model=CLAUDE_MODEL_FAST)
except Exception as e:
logger.error(f"Translator Claude-Call fehlgeschlagen: {e}")
return [], ClaudeUsage()
try:
translations = _parse_response(result_text)
except Exception as e:
logger.error(f"Translator JSON-Parsing fehlgeschlagen: {e}; raw: {result_text[:300]!r}")
return [], usage
# Validierung: nur Translations zurueckgeben, deren id wirklich
# in der angefragten Batch war
requested_ids = {a["id"] for a in articles}
valid = [t for t in translations if t["id"] in requested_ids]
if len(valid) != len(translations):
logger.warning(
"Translator: %d von %d Translations referenzieren unbekannte IDs",
len(translations) - len(valid), len(translations),
)
return valid, usage
# --- Pre-Topic-Filter: schmale Headline-Übersetzung -----------------------------
#
# Der Topic-Filter (analyzer.filter_relevant_articles) ist ein Haiku-Call, der pro
# Artikel beurteilt, ob er thematisch zur Lage passt. Bei fremdsprachigen Headlines
# (CJK/Arabisch/Hebräisch/Kyrillisch) bewertet Haiku konservativ und verwirft sie
# häufig, weil er sie nur halb versteht. Damit landeten z.B. die japanischen
# Ministeriums-Feeds (MOD, NHK, Asahi) in Lagen mit Japan-Bezug nie in der finalen
# Auswahl, obwohl der RSS-Match korrekt griff.
#
# Diese Funktion übersetzt einen einzelnen Batch-Call alle nicht-lateinischen
# Headlines + erste Content-Sätze ins Englische und hängt das Ergebnis als
# article["headline_en_for_topic"] / article["content_en_for_topic"] an. Der
# Topic-Filter zeigt das dem LLM zusätzlich zum Original.
#
# WICHTIG: Diese Mini-Übersetzung ist UNABHÄNGIG vom TRANSLATOR_ENABLED-Flag —
# sie wird auch dann gemacht, wenn der nachgelagerte Volltext-Translator
# deaktiviert ist (Pflicht für korrektes Topic-Filtering, sehr kleine Kosten).
_TOPIC_TRANSLATE_CONTENT_MAX = 500
def _needs_pretopic_translate(article: dict) -> bool:
"""Erkennt fremdsprachige Headlines, die für den Topic-Filter übersetzt
werden sollten.
Heuristik: Headline enthält Non-ASCII-Zeichen, die NICHT in den typischen
deutsch/franz./span./port./skand. Latin-1-Erweiterungen liegen.
Das sind v.a. CJK (Kanji/Kana/Hangul), Arabisch, Hebräisch, Kyrillisch,
Thai, Devanagari etc.
"""
headline = (article.get("headline_de") or article.get("headline") or "").strip()
if not headline:
return False
for ch in headline:
cp = ord(ch)
# Bereiche ausschließen, die in Latin-Schrift normal sind:
# ASCII (0-127), Latin-1 Supplement (128-255), Latin Extended-A/B (256-591)
if cp <= 591:
continue
# Alles darüber sind fremde Schriftsysteme → übersetzen
return True
return False
async def translate_headlines_for_topic_filter(
articles: list[dict],
target_lang: str = "en",
) -> tuple[int, ClaudeUsage]:
"""Übersetzt die Headlines fremdsprachiger Artikel ins Englische, damit der
nachgelagerte Topic-Filter (Haiku) sie zuverlässig beurteilen kann.
Setzt direkt auf den Artikel-Dicts:
article["headline_en_for_topic"]: str | None
article["content_en_for_topic"]: str | None
Returns:
(anzahl_übersetzt, ClaudeUsage)
"""
if not articles:
return 0, ClaudeUsage()
candidates = [a for a in articles if _needs_pretopic_translate(a)]
if not candidates:
return 0, ClaudeUsage()
# Eindeutige Indizes (auch wenn article kein "id"-Feld hat, weil noch nicht
# in der DB): wir nutzen die Position in der gesamten articles-Liste.
idx_by_obj = {id(a): i for i, a in enumerate(articles)}
items = []
for a in candidates:
idx = idx_by_obj.get(id(a))
if idx is None:
continue
headline = (a.get("headline_de") or a.get("headline") or "").strip()
content_src = (a.get("content_de") or a.get("content_original") or "")
items.append({
"i": idx,
"h": headline[:200],
"c": content_src[:_TOPIC_TRANSLATE_CONTENT_MAX],
})
if not items:
return 0, ClaudeUsage()
lang_label = {"en": "English", "de": "German"}.get(target_lang, target_lang)
prompt = f"""Translate these news headlines and short content snippets to {lang_label}.
Keep proper names (people, organizations, places) untouched. Keep it concise; the goal
is to let another model judge topical relevance, not to publish.
Return ONLY a JSON array. Each item: {{"i": <index>, "h": <headline in {lang_label}>, "c": <content snippet in {lang_label}>}}.
Keep the same "i" values. No prose, no markdown fences.
INPUT:
{json.dumps(items, ensure_ascii=False)}
"""
try:
result_text, usage = await call_claude(prompt, tools=None, model=CLAUDE_MODEL_FAST)
except Exception as e:
logger.warning(f"Pre-Topic-Translate Claude-Call fehlgeschlagen: {e}")
return 0, ClaudeUsage()
# Robustes Parsing (Markdown-Codefence + nacktes Array)
text = result_text.strip()
if text.startswith("```"):
text = re.sub(r"^```(?:json)?\s*", "", text)
text = re.sub(r"\s*```\s*$", "", text)
text = text.strip()
try:
data = json.loads(text)
except json.JSONDecodeError:
m = re.search(r"\[.*\]", text, re.DOTALL)
if not m:
logger.warning(
f"Pre-Topic-Translate: kein JSON-Array in Antwort. Sample: {text[:200]!r}"
)
return 0, usage
try:
data = json.loads(m.group(0))
except json.JSONDecodeError:
data = _extract_complete_objects(text)
if not isinstance(data, list):
logger.warning(
f"Pre-Topic-Translate: Antwort ist kein Array ({type(data).__name__})"
)
return 0, usage
applied = 0
for entry in data:
if not isinstance(entry, dict):
continue
idx = entry.get("i")
if not isinstance(idx, int) or not (0 <= idx < len(articles)):
try:
idx = int(idx)
if not (0 <= idx < len(articles)):
continue
except (TypeError, ValueError):
continue
h = (entry.get("h") or "").strip() or None
c = (entry.get("c") or "").strip() or None
if h:
articles[idx]["headline_en_for_topic"] = h
if c:
articles[idx]["content_en_for_topic"] = c
if h or c:
applied += 1
return applied, usage
async def translate_articles(
articles: list[dict],
output_lang: str = "de",
batch_size: int = DEFAULT_BATCH_SIZE,
usage_accumulator: UsageAccumulator | None = None,
enabled: bool | None = None,
) -> list[dict]:
"""Uebersetzt eine beliebige Anzahl Artikel in Batches.
Bringt die Batches durch Logik in `translate_articles_batch` und gibt
EINE flache Liste der Translations zurueck. Wenn ein Batch fehlschlaegt,
wird er uebersprungen (anderer Batches laufen weiter).
enabled: Pro-Aufruf-Override des globalen TRANSLATOR_ENABLED-Flags. Wenn None,
greift das Modul-Default (config.TRANSLATOR_ENABLED, abgeleitet aus .env).
Der Orchestrator setzt das aus dem Org-Setting 'translator_enabled', damit
jp_demo (Translator zwingend an) trotz global deaktiviertem Flag funktioniert.
"""
if not articles:
return []
is_enabled = TRANSLATOR_ENABLED if enabled is None else bool(enabled)
if not is_enabled:
logger.info(
"Translator deaktiviert (enabled=%s, global TRANSLATOR_ENABLED=%s), %d Artikel uebersprungen",
enabled, TRANSLATOR_ENABLED, len(articles),
)
return []
all_translations = []
for i in range(0, len(articles), batch_size):
batch = articles[i : i + batch_size]
translations, usage = await translate_articles_batch(batch, output_lang)
if usage_accumulator is not None:
usage_accumulator.add(usage)
all_translations.extend(translations)
logger.info(
"Translator-Batch %d/%d: %d/%d uebersetzt (cost=$%.4f)",
(i // batch_size) + 1,
(len(articles) + batch_size - 1) // batch_size,
len(translations), len(batch),
usage.cost_usd,
)
return all_translations

Datei anzeigen

@@ -10,7 +10,7 @@ def test_main_app_imports():
def test_all_routers_importable():
"""Bei Syntax-Fehlern in einem Router crasht das Ganze - hier fangen wir das ab."""
for mod in ("auth", "organizations", "licenses", "users",
"dashboard", "sources", "token_usage", "audit"):
"dashboard", "sources", "token_usage", "audit", "translation"):
m = importlib.import_module(f"routers.{mod}")
assert hasattr(m, "router"), f"routers/{mod} hat keinen router-Objekt"