Compare commits
13 commits
38ce26f0be...develop
| Author | SHA1 | Date |
|---|---|---|
| | 66176f357e | |
| | d57b410dd6 | |
| | ef2f638238 | |
| | 8b84447ad4 | |
| | f32b8a8ec6 | |
| | acac401034 | |
| | 46b2acfc36 | |
| | 68f0792440 | |
| | 1b3d6dbd57 | |
| | e20b3de0fa | |
| | d570e13dc6 | |
| | 7777b77abd | |
| | 952df87afa | |
43
CLAUDE.md
@@ -221,6 +221,49 @@ Changelog-Kategorien in TaskMate:
- 35 = Changelog Website
- 36 = Changelog TaskMate

## FIMI / Counter-Disinformation (Passive Mode)

Matches Monitor articles against the EUvsDisinfo corpus of false claims,
entirely within the Monitor (no Vigil call). Two stages:

```yaml
stufe_1_embedding_vorfilter:
  modell: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 (384-dim)
  service: src/services/embeddings.py (lazy singleton; model cache ~/.cache/huggingface shared with Vigil)
  matcher: src/services/fimi_matcher.py (claim embeddings as a NumPy matrix in RAM, cosine similarity)
  threshold: 0.55 floor, 0.65 for handoff to stage 2
  zweck: find topically close candidates (high recall)
stufe_2_llm_verifikation:
  modell: CLAUDE_MODEL_FAST (Haiku), tools=None
  zweck: separate "spreads the claim" from "reports on or refutes it" (the embedding is sensitive to topic, not to stance)
  ergebnis: only confirmed cases of spreading are stored, including the verbatim quote
  env: FIMI_VERIFY_ENABLED (default true), FIMI_VERIFY_CONCURRENCY (default 4)
daten:
  tabelle_claims: fimi_claims (id = Vigil claim.id, embedding BLOB, source_ref euvsdisinfo:<slug>, case_url)
  tabelle_treffer: article_fimi_matches (article_id, fimi_claim_id, score, role, matched_text)
  marker: articles.fimi_checked_at (prevents re-encoding of articles that were already checked)
  import: scripts/import_fimi_claims.py (sync from vigil-data/vigil.db, idempotent UPSERT)
pipeline:
  hook: orchestrator after the translator step, only the new articles of the refresh (match_article_ids)
endpoints:
  GET /incidents/{id}/fimi-matches: matches per article incl. provenance (integration point 1)
  GET /incidents/{id}/fimi-summary: aggregate for the situation report (integration point 3)
  sources-summary: fimi_match_count per source (integration point 2)
frontend:
  andockpunkt_1: subtle inline hint on the article (source detail list)
  andockpunkt_2: track-record badge per source
  andockpunkt_3: quality bar above the situation report + expandable top narratives
rechtslage_euvsdisinfo:
  quelle: EUvsDisinfo, a project of the EEAS (East StratCom Task Force)
  lizenz: research dataset under CC BY-SA 4.0; EU content reusable with attribution
  pflichten: attribution (source + case link), no distortion, disclaimer "not an official EU position"
  disclaimer_ort: footer of the FIMI quality bar (UI.fimiDisclaimerHtml) + tooltip of the individual matches
  provenienz_leitplanke: the Monitor never judges on its own; it only shows what EUvsDisinfo lists as refuted
offene_punkte:
  - fine-tune the verifier prompt (rare false positives for reputable media that report on a statement)
  - per-sentence extraction (Vigil phase 2) can optionally be added later as a precision stage
```

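As a rough illustration of stage 1, the sketch below scores one article embedding against a few claim embeddings with cosine similarity and applies the two thresholds named above (0.55 floor, 0.65 handoff). It uses plain Python lists instead of NumPy to stay dependency-free. The names `cosine` and `prefilter` and the tiny 3-dimensional vectors are hypothetical and not taken from `fimi_matcher.py`. How scores between floor and handoff are treated is not stated in the section, so the sketch only reports them separately.

```python
import math

FLOOR = 0.55    # below this, a claim is not treated as a candidate at all
HANDOFF = 0.65  # at or above this, the pair would go on to LLM verification


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def prefilter(article_vec, claims):
    """Split claims into (handoff, near) lists of (claim_id, score), best first."""
    handoff, near = [], []
    for claim_id, claim_vec in claims.items():
        score = cosine(article_vec, claim_vec)
        if score >= HANDOFF:
            handoff.append((claim_id, score))
        elif score >= FLOOR:
            near.append((claim_id, score))
    handoff.sort(key=lambda pair: pair[1], reverse=True)
    near.sort(key=lambda pair: pair[1], reverse=True)
    return handoff, near


# Toy data (assumption): real embeddings are 384-dimensional.
claims = {
    1: [1.0, 0.0, 0.0],  # same direction as the article
    2: [0.6, 0.8, 0.0],  # similarity of about 0.6, between floor and handoff
    3: [0.0, 0.0, 1.0],  # orthogonal, dropped
}
handoff, near = prefilter([1.0, 0.0, 0.0], claims)
print([cid for cid, _ in handoff], [cid for cid, _ in near])
```

Because the prefilter only measures topical closeness, a high score says nothing about whether the article spreads or refutes the claim; that distinction is left to stage 2.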
## Staging-Umgebung
|
## Staging-Umgebung
|
||||||
|
|
||||||
```yaml
|
```yaml
|
||||||
|
|||||||
@@ -23,3 +23,10 @@ pdfplumber>=0.11
pytesseract>=0.3
pdf2image>=1.17
Pillow>=10.0
# FIMI / Counter-Disinformation: embedding match against EUvsDisinfo false claims
# (services/embeddings.py, services/fimi_matcher.py). The model cache is shared
# with Vigil (~/.cache/huggingface). Versions match the Vigil venv for compatibility.
torch==2.12.0
sentence-transformers==3.4.1
transformers==4.57.6
numpy==2.4.5
97
scripts/backfill_fimi.py
Executable file
@@ -0,0 +1,97 @@
#!/usr/bin/env python3
"""Backfill: match all not-yet-checked articles against the false-claim corpus
(embedding prefilter + LLM verification).

Walks through all incidents (Lagen) with unchecked articles, small ones first
(fast, early testable results), large ones last. Each incident is processed in
batches so that the score matrix (articles x claims) does not exhaust RAM.
Robust: errors in individual batches do not stop the run; if articles fail
repeatedly (no progress), the incident is aborted instead of looping
forever.

Usage (in the staging directory, with its venv):
    HF_HUB_OFFLINE=1 TRANSFORMERS_OFFLINE=1 FIMI_VERIFY_CONCURRENCY=5 \
        ./venv/bin/python scripts/backfill_fimi.py
"""
from __future__ import annotations

import asyncio
import os
import sys
import time

sys.path.insert(0, "src")

import aiosqlite
from services import fimi_matcher

# Like config.py: the DB_PATH env variable takes precedence (the staging service
# uses its own DB outside the repo). Otherwise the repo default applies.
DB_PATH = os.environ.get("DB_PATH") or "data/osint.db"
BATCH = 120


def _ts() -> str:
    return time.strftime("%H:%M:%S")


async def main() -> None:
    db = await aiosqlite.connect(DB_PATH)
    db.row_factory = aiosqlite.Row
    t0 = time.time()
    n_claims = await fimi_matcher.ensure_matrix(db)
    print(f"[{_ts()}] Matrix: {n_claims} Claims geladen", flush=True)

    cursor = await db.execute(
        """SELECT incident_id, COUNT(*) AS n
           FROM articles WHERE fimi_checked_at IS NULL AND incident_id IS NOT NULL
           GROUP BY incident_id ORDER BY n"""
    )
    incidents = [(r["incident_id"], r["n"]) for r in await cursor.fetchall()]
    total = sum(n for _, n in incidents)
    print(f"[{_ts()}] START: {len(incidents)} Lagen, {total} ungepruefte Artikel", flush=True)

    grand = {"articles": 0, "candidates": 0, "articles_with_match": 0, "stored": 0, "errors": 0}
    for iid, n in incidents:
        done = 0
        prev_remaining = None
        while True:
            res = await fimi_matcher.match_incident_articles(
                db, iid, only_unchecked=True, limit=BATCH
            )
            if res["articles"] == 0:
                break
            done += res["articles"]
            for k in grand:
                grand[k] += res.get(k, 0)

            cur = await db.execute(
                "SELECT COUNT(*) FROM articles WHERE incident_id = ? AND fimi_checked_at IS NULL",
                (iid,),
            )
            remaining = (await cur.fetchone())[0]
            print(
                f"[{_ts()}] Lage {iid}: +{res['articles']} ({done}/{n}), "
                f"Treffer {res['articles_with_match']}, Fehler {res['errors']}, "
                f"verbleibend {remaining}",
                flush=True,
            )
            if remaining == 0:
                break
            if prev_remaining is not None and remaining >= prev_remaining:
                print(
                    f"[{_ts()}] Lage {iid}: kein Fortschritt (verbleibend {remaining}), "
                    f"Abbruch wegen wiederholt fehlschlagender Artikel",
                    flush=True,
                )
                break
            prev_remaining = remaining
        print(f"[{_ts()}] == Lage {iid} fertig: {done} Artikel verarbeitet ==", flush=True)

    await db.close()
    dt = time.time() - t0
    print(f"[{_ts()}] FERTIG nach {dt/60:.1f} min: {grand}", flush=True)


if __name__ == "__main__":
    asyncio.run(main())
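The termination logic of the batch loop can be isolated into a small, testable function. The sketch below is a simplification under stated assumptions: `drain`, `process_batch`, and the in-memory `backlog` are hypothetical names, and the synchronous callables stand in for `fimi_matcher.match_incident_articles` and the `COUNT(*)` query of the script.

```python
from typing import Callable


def drain(process_batch: Callable[[], int], count_remaining: Callable[[], int]) -> str:
    """Run batches until nothing is left or a batch makes no progress.

    Returns "empty" (batch had no articles), "done", or "stuck".
    """
    prev_remaining = None
    while True:
        if process_batch() == 0:
            return "empty"
        remaining = count_remaining()
        if remaining == 0:
            return "done"
        # Same guard as in the script: the backlog must shrink between batches.
        if prev_remaining is not None and remaining >= prev_remaining:
            return "stuck"
        prev_remaining = remaining


# Simulated backlog (assumption): "bad" articles never get marked as checked.
backlog = ["ok", "ok", "ok", "bad", "bad"]


def process_batch(batch_size: int = 2) -> int:
    """Take the next unchecked articles; only the "ok" ones leave the backlog."""
    batch = backlog[:batch_size]
    for item in batch:
        if item == "ok":
            backlog.remove(item)
    return len(batch)


result = drain(process_batch, lambda: len(backlog))
print(result, backlog)
```

The guard compares the remaining count of two consecutive batches, so a single partially failing batch is tolerated and only a batch that clears nothing ends the loop.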
116
scripts/import_fimi_claims.py
Executable file
@@ -0,0 +1,116 @@
#!/usr/bin/env python3
"""One-off/sync import of the EUvsDisinfo false-claim corpus into the Monitor.

Copies the claims (text, verdict, refutation, source reference, embedding BLOB)
from the Vigil database into the Monitor table fimi_claims. The embeddings
are carried over 1:1 as BLOBs (384-dim float32, L2-normalized) and matched in
the Monitor with the same model (paraphrase-multilingual-MiniLM-L12-v2).

Idempotent: UPSERT on the stable Vigil claim.id. Existing matches in
article_fimi_matches therefore remain valid.

Usage (staging):
    python scripts/import_fimi_claims.py \
        --vigil-db /home/claude-dev/vigil-data/vigil.db \
        --osint-db /home/claude-dev/AegisSight-Monitor-staging/data/osint.db
"""
from __future__ import annotations

import argparse
import sqlite3
import sys

EUVSDISINFO_REPORT_BASE = "https://euvsdisinfo.eu/report/"


def case_url_from_source_ref(source_ref: str | None) -> str | None:
    """Derive the EUvsDisinfo case URL from 'euvsdisinfo:<slug>'."""
    if not source_ref:
        return None
    prefix = "euvsdisinfo:"
    if source_ref.startswith(prefix):
        slug = source_ref[len(prefix):].strip().strip("/")
        if slug:
            return f"{EUVSDISINFO_REPORT_BASE}{slug}/"
    return None


def main() -> int:
    ap = argparse.ArgumentParser(description=__doc__)
    ap.add_argument("--vigil-db", required=True, help="Pfad zur Vigil-SQLite-DB (Quelle)")
    ap.add_argument("--osint-db", required=True, help="Pfad zur Monitor-SQLite-DB (Ziel)")
    ap.add_argument("--limit", type=int, default=0, help="Optional: nur N Claims importieren (Test)")
    args = ap.parse_args()

    src = sqlite3.connect(args.vigil_db)
    src.row_factory = sqlite3.Row
    q = (
        "SELECT id, text, text_normalized, language, verdict, verdict_summary, "
        "source_id, embedding, first_seen_at FROM claims WHERE embedding IS NOT NULL"
    )
    if args.limit:
        q += f" LIMIT {int(args.limit)}"
    rows = src.execute(q).fetchall()
    src.close()
    print(f"Vigil: {len(rows)} Claims mit Embedding gelesen", flush=True)

    dst = sqlite3.connect(args.osint_db)
    dst.execute("PRAGMA busy_timeout=10000")

    # Make sure the target table exists (in case the script runs before init_db)
    dst.execute(
        """CREATE TABLE IF NOT EXISTS fimi_claims (
            id INTEGER PRIMARY KEY,
            text TEXT NOT NULL,
            text_normalized TEXT,
            language TEXT,
            verdict TEXT NOT NULL DEFAULT 'false',
            verdict_summary TEXT,
            source_ref TEXT,
            case_url TEXT,
            embedding BLOB,
            first_seen_at TIMESTAMP,
            imported_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        )"""
    )
    dst.execute("CREATE INDEX IF NOT EXISTS idx_fimi_claims_source_ref ON fimi_claims(source_ref)")

    inserted = 0
    with_url = 0
    for r in rows:
        case_url = case_url_from_source_ref(r["source_id"])
        if case_url:
            with_url += 1
        dst.execute(
            """INSERT INTO fimi_claims
                   (id, text, text_normalized, language, verdict, verdict_summary,
                    source_ref, case_url, embedding, first_seen_at, imported_at)
               VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, CURRENT_TIMESTAMP)
               ON CONFLICT(id) DO UPDATE SET
                   text=excluded.text,
                   text_normalized=excluded.text_normalized,
                   language=excluded.language,
                   verdict=excluded.verdict,
                   verdict_summary=excluded.verdict_summary,
                   source_ref=excluded.source_ref,
                   case_url=excluded.case_url,
                   embedding=excluded.embedding,
                   first_seen_at=excluded.first_seen_at,
                   imported_at=CURRENT_TIMESTAMP""",
            (
                r["id"], r["text"], r["text_normalized"], r["language"],
                r["verdict"] or "false", r["verdict_summary"], r["source_id"],
                case_url, r["embedding"], r["first_seen_at"],
            ),
        )
        inserted += 1
    dst.commit()
    total = dst.execute("SELECT COUNT(*) FROM fimi_claims").fetchone()[0]
    dst.close()
    print(f"Monitor: {inserted} Claims upserted ({with_url} mit Case-URL), "
          f"fimi_claims enthaelt jetzt {total} Eintraege", flush=True)
    return 0


if __name__ == "__main__":
    sys.exit(main())
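To see why the UPSERT keeps the import idempotent, the following sketch runs a reduced version of the statement twice against an in-memory SQLite database. The three-column table, the `upsert` helper, and the `example-slug` URL are simplifications for illustration, not the script's actual schema or data.

```python
from __future__ import annotations

import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE fimi_claims (id INTEGER PRIMARY KEY, text TEXT NOT NULL, case_url TEXT)"
)


def upsert(claim_id: int, text: str, case_url: str | None) -> None:
    """Insert a claim or, if the id already exists, update that row in place."""
    con.execute(
        """INSERT INTO fimi_claims (id, text, case_url) VALUES (?, ?, ?)
           ON CONFLICT(id) DO UPDATE SET
               text=excluded.text,
               case_url=excluded.case_url""",
        (claim_id, text, case_url),
    )


# Importing the same claim id twice leaves exactly one row with the latest values.
upsert(7, "first version", None)
upsert(7, "corrected version", "https://euvsdisinfo.eu/report/example-slug/")
rows = con.execute("SELECT id, text, case_url FROM fimi_claims").fetchall()
print(rows)
```

Since the row keeps its primary key across re-imports, rows in `article_fimi_matches` that refer to `fimi_claim_id = 7` continue to point at the same claim.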
@@ -124,7 +124,7 @@ BISHERIGE QUELLEN:
|
|||||||
AUFTRAG:
|
AUFTRAG:
|
||||||
1. Aktualisiere das Lagebild basierend auf den neuen Meldungen. Das Lagebild soll so ausführlich wie nötig sein, um alle wesentlichen Themenstränge abzudecken
|
1. Aktualisiere das Lagebild basierend auf den neuen Meldungen. Das Lagebild soll so ausführlich wie nötig sein, um alle wesentlichen Themenstränge abzudecken
|
||||||
2. Behalte bestätigte Fakten aus dem bisherigen Lagebild bei
|
2. Behalte bestätigte Fakten aus dem bisherigen Lagebild bei
|
||||||
3. Ergänze neue Erkenntnisse und markiere wichtige neue Entwicklungen
|
3. Arbeite neue Erkenntnisse direkt in den thematisch passenden Abschnitt ein. Erzeuge KEINE datierten Verlaufsblöcke wie "Neu am DD.MM." oder "Neu seit ...". Das Lagebild ist eine zusammenhängende thematische Darstellung des AKTUELLEN Stands, kein chronologisches Änderungsprotokoll. Die zeitliche Abfolge der jüngsten Ereignisse wird separat in der Kachel "Neueste Entwicklungen" gepflegt und darf hier NICHT als Datums-Changelog dupliziert werden
|
||||||
4. Aktualisiere die Quellenverweise — neue Quellen bekommen fortlaufende Nummern nach den bisherigen
|
4. Aktualisiere die Quellenverweise — neue Quellen bekommen fortlaufende Nummern nach den bisherigen
|
||||||
5. Entferne nur nachweislich widerlegte Informationen. Behalte alle thematischen Abschnitte bei, auch wenn sie nicht durch neue Meldungen aktualisiert werden
|
5. Entferne nur nachweislich widerlegte Informationen. Behalte alle thematischen Abschnitte bei, auch wenn sie nicht durch neue Meldungen aktualisiert werden
|
||||||
|
|
||||||
@@ -133,6 +133,8 @@ STRUKTUR:
|
|||||||
- Wenn sich Daten strukturiert vergleichen lassen (z.B. Produkte, Unternehmen, Kennzahlen, Modelle), verwende eine Markdown-Tabelle (| Spalte1 | Spalte2 | ... mit Trennzeile |---|---|)
|
- Wenn sich Daten strukturiert vergleichen lassen (z.B. Produkte, Unternehmen, Kennzahlen, Modelle), verwende eine Markdown-Tabelle (| Spalte1 | Spalte2 | ... mit Trennzeile |---|---|)
|
||||||
- KEIN Fettdruck (**) verwenden
|
- KEIN Fettdruck (**) verwenden
|
||||||
- ERZEUGE KEINE Sektion "## ZUSAMMENFASSUNG", "## ÜBERBLICK" oder "## KERNPUNKTE". Falls das BISHERIGE LAGEBILD eine solche Sektion enthält, ENTFERNE sie vollständig beim Aktualisieren. Die neuesten Entwicklungen werden separat als eigene Kachel gepflegt und dürfen im Lagebild NICHT dupliziert werden.
|
- ERZEUGE KEINE Sektion "## ZUSAMMENFASSUNG", "## ÜBERBLICK" oder "## KERNPUNKTE". Falls das BISHERIGE LAGEBILD eine solche Sektion enthält, ENTFERNE sie vollständig beim Aktualisieren. Die neuesten Entwicklungen werden separat als eigene Kachel gepflegt und dürfen im Lagebild NICHT dupliziert werden.
|
||||||
|
- KEINE datierten Verlaufsmarker im Lagebild. Einleitungen wie "Neu am 31.05./01.06.:", "Neu seit gestern:" oder vergleichbare Datums-Changelog-Phrasen sind nicht erlaubt. Falls das BISHERIGE LAGEBILD solche Blöcke enthält, LÖSE SIE AUF: integriere ihren Inhalt in den thematisch passenden Abschnitt und ENTFERNE die "Neu am"-Einleitung samt reiner Datumsgruppierung restlos. Innerhalb eines Abschnitts steht der aktuelle Stand vorne, ältere Belege werden im Fließtext zeitlich eingeordnet (z.B. "Ende Mai berichtete ...").
|
||||||
|
- KEINE stichwortartigen Fragmente und KEINE blanken Quellennummern-Sammlungen. Verboten sind Telegramm-Verkürzungen wie "Teheran-Bluff-Vorwurf [2897]. NYT-Abraham-Accords [2890]." sowie Auffangblöcke ohne Aussage wie "Frühere Belege [2806][2807]...". Jede Quellennummer muss an einem vollständigen, eigenständigen Satz hängen. Falls das BISHERIGE LAGEBILD solche Fragment- oder Sammelblöcke enthält, formuliere sie zu vollständigen Sätzen aus oder lass die betreffende Quellennummer weg. Am Ende eines Abschnitts oder des Lagebildes darf KEINE reine Aufzählung von Quellennummern stehen.
|
||||||
|
|
||||||
REGELN:
|
REGELN:
|
||||||
- Neutral und sachlich - keine Wertungen oder Spekulationen
|
- Neutral und sachlich - keine Wertungen oder Spekulationen
|
||||||
|
|||||||
@@ -1753,6 +1753,7 @@ class AgentOrchestrator:
|
|||||||
# Idempotent: nur Artikel ohne headline_de/content_de werden geholt.
|
# Idempotent: nur Artikel ohne headline_de/content_de werden geholt.
|
||||||
# Lauft nach der Analyse (Lagebild ist schon committed) und vor QC
|
# Lauft nach der Analyse (Lagebild ist schon committed) und vor QC
|
||||||
# (damit normalize_umlaut_articles auch die frischen DE-Texte fasst).
|
# (damit normalize_umlaut_articles auch die frischen DE-Texte fasst).
|
||||||
|
_translate_step_started = False
|
||||||
try:
|
try:
|
||||||
tr_cursor = await db.execute(
|
tr_cursor = await db.execute(
|
||||||
"""SELECT id, headline, content_original, language
|
"""SELECT id, headline, content_original, language
|
||||||
@@ -1764,7 +1765,10 @@ class AgentOrchestrator:
|
|||||||
(incident_id,),
|
(incident_id,),
|
||||||
)
|
)
|
||||||
pending_translations = [dict(r) for r in await tr_cursor.fetchall()]
|
pending_translations = [dict(r) for r in await tr_cursor.fetchall()]
|
||||||
if pending_translations:
|
if pending_translations and translator_enabled:
|
||||||
|
# Pipeline-Schritt 9: Artikel uebersetzen (nur sichtbar wenn was zu uebersetzen)
|
||||||
|
await _pipe_start("translate")
|
||||||
|
_translate_step_started = True
|
||||||
logger.info(
|
logger.info(
|
||||||
"Translator fuer Incident %d: %d Artikel ohne DE-Uebersetzung",
|
"Translator fuer Incident %d: %d Artikel ohne DE-Uebersetzung",
|
||||||
incident_id, len(pending_translations),
|
incident_id, len(pending_translations),
|
||||||
@@ -1795,10 +1799,44 @@ class AgentOrchestrator:
|
|||||||
"Translator fuer Incident %d: %d/%d Artikel uebersetzt",
|
"Translator fuer Incident %d: %d/%d Artikel uebersetzt",
|
||||||
incident_id, len(translations), len(pending_translations),
|
incident_id, len(translations), len(pending_translations),
|
||||||
)
|
)
|
||||||
|
await _pipe_done("translate", count_value=len(translations), count_secondary=len(pending_translations))
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
logger.error("Translator-Fehler fuer Incident %d: %s", incident_id, e, exc_info=True)
|
logger.error("Translator-Fehler fuer Incident %d: %s", incident_id, e, exc_info=True)
|
||||||
|
if _translate_step_started:
|
||||||
|
await _pipe_done("translate", count_value=0, count_secondary=0)
|
||||||
# Refresh trotz Translator-Fehler weiterlaufen lassen
|
# Refresh trotz Translator-Fehler weiterlaufen lassen
|
||||||
|
|
||||||
|
# --- FIMI: Abgleich gegen den EUvsDisinfo-Falschbehauptungsbestand ---
|
||||||
|
# Nur die in diesem Refresh neu hinzugekommenen Artikel (per ID), nach
|
||||||
|
# dem Translator, damit auch fremdsprachige Artikel ihren DE-Text fuer
|
||||||
|
# die LLM-Verifikation haben. Fehler duerfen den Refresh nicht brechen.
|
||||||
|
try:
|
||||||
|
_fimi_ids = [a.get("id") for a in new_articles_for_analysis if a.get("id")]
|
||||||
|
if _fimi_ids:
|
||||||
|
from services import fimi_matcher
|
||||||
|
await _pipe_start("fimi")
|
||||||
|
_fimi_res = await fimi_matcher.match_article_ids(db, _fimi_ids)
|
||||||
|
await _pipe_done(
|
||||||
|
"fimi",
|
||||||
|
count_value=_fimi_res.get("articles_with_match", 0),
|
||||||
|
count_secondary=_fimi_res.get("candidates", 0),
|
||||||
|
)
|
||||||
|
logger.info(
|
||||||
|
"FIMI-Abgleich Incident %d: %d Artikel, %d Kandidaten, "
|
||||||
|
"%d verbreiten Falschbehauptungen, %d Links",
|
||||||
|
incident_id, _fimi_res.get("articles", 0),
|
||||||
|
_fimi_res.get("candidates", 0),
|
||||||
|
_fimi_res.get("articles_with_match", 0),
|
||||||
|
_fimi_res.get("stored", 0),
|
||||||
|
)
|
||||||
|
except Exception as e:
|
||||||
|
logger.warning("FIMI-Abgleich fehlgeschlagen fuer Incident %d: %s",
|
||||||
|
incident_id, e, exc_info=True)
|
||||||
|
try:
|
||||||
|
await _pipe_done("fimi", count_value=0, count_secondary=0)
|
||||||
|
except Exception:
|
||||||
|
pass
|
||||||
|
|
||||||
# --- Neueste Entwicklungen (nur Live-Monitoring / adhoc) ---
|
# --- Neueste Entwicklungen (nur Live-Monitoring / adhoc) ---
|
||||||
# Basis ist jetzt das frisch generierte Lagebild (autoritativ, thematisch sauber).
|
# Basis ist jetzt das frisch generierte Lagebild (autoritativ, thematisch sauber).
|
||||||
# Zeitstempel und Quellen kommen aus den jüngsten belegenden Artikeln.
|
# Zeitstempel und Quellen kommen aus den jüngsten belegenden Artikeln.
|
||||||
|
|||||||
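The FIMI hook follows a best-effort pattern: the step is opened, run, and closed, and any exception is logged without aborting the refresh. A minimal sketch of that pattern, assuming hypothetical stand-ins `run_best_effort`, `pipe_start`, and `pipe_done` rather than the orchestrator's real helpers, could look like this:

```python
import asyncio
import logging

logger = logging.getLogger("pipeline")


async def run_best_effort(name, step, pipe_start, pipe_done):
    """Run an optional pipeline step; log failures instead of propagating them."""
    try:
        await pipe_start(name)
        result = await step()
        await pipe_done(name, count_value=result.get("articles_with_match", 0))
        return result
    except Exception as exc:
        logger.warning("step %s failed: %s", name, exc)
        try:
            # Close the step anyway so it does not stay open in the progress view.
            await pipe_done(name, count_value=0)
        except Exception:
            pass
        return None


events = []


async def pipe_start(name):
    events.append(("start", name))


async def pipe_done(name, count_value):
    events.append(("done", name, count_value))


async def failing_step():
    raise RuntimeError("model not loaded")


outcome = asyncio.run(run_best_effort("fimi", failing_step, pipe_start, pipe_done))
print(outcome, events)
```

The inner `try` around the closing call mirrors the hunk above: even the bookkeeping call is not allowed to raise into the surrounding refresh.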
@@ -355,6 +355,41 @@ CREATE TABLE IF NOT EXISTS organization_settings (
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    UNIQUE(organization_id, key)
);

-- FIMI / Counter-Disinformation: imported corpus of false claims
-- (EUvsDisinfo). Read-only reference, populated by scripts/import_fimi_claims.py.
-- The id corresponds to the Vigil claim.id (stable for re-sync via UPSERT).
CREATE TABLE IF NOT EXISTS fimi_claims (
    id INTEGER PRIMARY KEY,
    text TEXT NOT NULL,
    text_normalized TEXT,
    language TEXT,
    verdict TEXT NOT NULL DEFAULT 'false',
    verdict_summary TEXT,
    source_ref TEXT,
    case_url TEXT,
    embedding BLOB,
    first_seen_at TIMESTAMP,
    imported_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX IF NOT EXISTS idx_fimi_claims_source_ref ON fimi_claims(source_ref);

-- FIMI: matches between Monitor articles and false claims.
-- Deliberately NO hard FK on fimi_claims, so that a claim re-sync does not
-- cascade-delete the existing matches.
CREATE TABLE IF NOT EXISTS article_fimi_matches (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    article_id INTEGER NOT NULL REFERENCES articles(id) ON DELETE CASCADE,
    fimi_claim_id INTEGER NOT NULL,
    score REAL NOT NULL,
    role TEXT DEFAULT 'match',
    matched_text TEXT,
    matched_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    tenant_id INTEGER REFERENCES organizations(id),
    UNIQUE(article_id, fimi_claim_id)
);
CREATE INDEX IF NOT EXISTS idx_afm_article ON article_fimi_matches(article_id);
CREATE INDEX IF NOT EXISTS idx_afm_claim ON article_fimi_matches(fimi_claim_id);
"""
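The schema comment about avoiding a hard foreign key can be illustrated with a small experiment. The sketch below assumes a hypothetical re-sync that deletes and reloads `fimi_claims` (the importer in this change set uses an UPSERT instead, which does not delete rows) and compares a cascading foreign key with the plain integer column used above. The reduced tables and the helper name `matches_after_resync` are illustrative only.

```python
import sqlite3


def matches_after_resync(hard_fk: bool) -> int:
    """Return how many match rows survive a delete-and-reload of the claims table."""
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    con.execute("CREATE TABLE fimi_claims (id INTEGER PRIMARY KEY, text TEXT NOT NULL)")
    fk = "REFERENCES fimi_claims(id) ON DELETE CASCADE" if hard_fk else ""
    con.execute(
        "CREATE TABLE article_fimi_matches ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "article_id INTEGER NOT NULL, "
        f"fimi_claim_id INTEGER NOT NULL {fk})"
    )
    con.execute("INSERT INTO fimi_claims (id, text) VALUES (1, 'claim')")
    con.execute("INSERT INTO article_fimi_matches (article_id, fimi_claim_id) VALUES (10, 1)")
    # Hypothetical re-sync that wipes and reloads the reference table.
    con.execute("DELETE FROM fimi_claims")
    con.execute("INSERT INTO fimi_claims (id, text) VALUES (1, 'claim')")
    count = con.execute("SELECT COUNT(*) FROM article_fimi_matches").fetchone()[0]
    con.close()
    return count


print(matches_after_resync(hard_fk=True), matches_after_resync(hard_fk=False))
```

The trade-off of the soft reference is that nothing prevents a match from pointing at a claim id that no longer exists; the inner `JOIN fimi_claims` in the read endpoints would then simply hide such rows.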
@@ -606,6 +641,14 @@ async def init_db():
|
|||||||
await db.execute("ALTER TABLE articles ADD COLUMN tenant_id INTEGER REFERENCES organizations(id)")
|
await db.execute("ALTER TABLE articles ADD COLUMN tenant_id INTEGER REFERENCES organizations(id)")
|
||||||
await db.commit()
|
await db.commit()
|
||||||
|
|
||||||
|
# Migration: FIMI-Match-Marker fuer articles (wann zuletzt gegen den
|
||||||
|
# Falschbehauptungs-Bestand geprueft; verhindert Re-Encoding bereits
|
||||||
|
# gepruefter Artikel bei jedem Refresh)
|
||||||
|
if "fimi_checked_at" not in art_columns:
|
||||||
|
await db.execute("ALTER TABLE articles ADD COLUMN fimi_checked_at TIMESTAMP")
|
||||||
|
await db.commit()
|
||||||
|
logger.info("Migration: fimi_checked_at zu articles hinzugefuegt")
|
||||||
|
|
||||||
# Migration: tenant_id fuer fact_checks
|
# Migration: tenant_id fuer fact_checks
|
||||||
cursor = await db.execute("PRAGMA table_info(fact_checks)")
|
cursor = await db.execute("PRAGMA table_info(fact_checks)")
|
||||||
fc_columns = [row[1] for row in await cursor.fetchall()]
|
fc_columns = [row[1] for row in await cursor.fetchall()]
|
||||||
|
|||||||
49
src/main.py
@@ -246,7 +246,14 @@ async def cleanup_expired():
|
|||||||
)
|
)
|
||||||
logger.info(f"Lage {incident['id']} archiviert (Aufbewahrung abgelaufen)")
|
logger.info(f"Lage {incident['id']} archiviert (Aufbewahrung abgelaufen)")
|
||||||
|
|
||||||
# Verwaiste running-Einträge bereinigen (> 15 Minuten ohne Abschluss)
|
# Verwaiste running-Einträge bereinigen.
|
||||||
|
# Pruefen auf Pipeline-Fortschritt: legitime Long-Runner (z.B. Translator
|
||||||
|
# nach summary fuer jp_demo mit 200+ Artikeln ~20 Min) duerfen nicht
|
||||||
|
# vorzeitig gekillt werden. Ein Refresh gilt als verwaist, wenn entweder
|
||||||
|
# (a) seit ORPHAN_IDLE_LIMIT Min kein Pipeline-Step Fortschritt zeigte,
|
||||||
|
# oder (b) das harte Limit ORPHAN_HARD_LIMIT Min ueberschritten wurde.
|
||||||
|
ORPHAN_IDLE_LIMIT = 60
|
||||||
|
ORPHAN_HARD_LIMIT = 120
|
||||||
cursor = await db.execute(
|
cursor = await db.execute(
|
||||||
"SELECT id, incident_id, started_at FROM refresh_log WHERE status = 'running'"
|
"SELECT id, incident_id, started_at FROM refresh_log WHERE status = 'running'"
|
||||||
)
|
)
|
||||||
@@ -258,12 +265,46 @@ async def cleanup_expired():
|
|||||||
else:
|
else:
|
||||||
started = started.astimezone(TIMEZONE)
|
started = started.astimezone(TIMEZONE)
|
||||||
age_minutes = (now - started).total_seconds() / 60
|
age_minutes = (now - started).total_seconds() / 60
|
||||||
if age_minutes >= 15:
|
if age_minutes < ORPHAN_IDLE_LIMIT:
|
||||||
|
continue
|
||||||
|
|
||||||
|
# Letzter Pipeline-Step-Fortschritt (Start ODER Ende)
|
||||||
|
prog_cursor = await db.execute(
|
||||||
|
"""SELECT MAX(COALESCE(completed_at, started_at)) AS last_activity
|
||||||
|
FROM refresh_pipeline_steps WHERE refresh_log_id = ?""",
|
||||||
|
(orphan["id"],),
|
||||||
|
)
|
||||||
|
prog_row = await prog_cursor.fetchone()
|
||||||
|
last_activity_str = prog_row["last_activity"] if prog_row else None
|
||||||
|
|
||||||
|
is_orphan = False
|
||||||
|
reason = None
|
||||||
|
if age_minutes >= ORPHAN_HARD_LIMIT:
|
||||||
|
is_orphan = True
|
||||||
|
reason = f"Verwaist (>{int(age_minutes)} Min, hartes Limit {ORPHAN_HARD_LIMIT} Min)"
|
||||||
|
elif last_activity_str:
|
||||||
|
last_activity = datetime.fromisoformat(last_activity_str)
|
||||||
|
if last_activity.tzinfo is None:
|
||||||
|
last_activity = last_activity.replace(tzinfo=TIMEZONE)
|
||||||
|
else:
|
||||||
|
last_activity = last_activity.astimezone(TIMEZONE)
|
||||||
|
idle_minutes = (now - last_activity).total_seconds() / 60
|
||||||
|
if idle_minutes >= ORPHAN_IDLE_LIMIT:
|
||||||
|
is_orphan = True
|
||||||
|
reason = (
|
||||||
|
f"Verwaist (kein Pipeline-Fortschritt seit {int(idle_minutes)} Min, "
|
||||||
|
f"gesamt {int(age_minutes)} Min)"
|
||||||
|
)
|
||||||
|
else:
|
||||||
|
is_orphan = True
|
||||||
|
reason = f"Verwaist (keine Pipeline-Schritte nach {int(age_minutes)} Min)"
|
||||||
|
|
||||||
|
if is_orphan:
|
||||||
await db.execute(
|
await db.execute(
|
||||||
"UPDATE refresh_log SET status = 'error', completed_at = ?, error_message = ? WHERE id = ?",
|
"UPDATE refresh_log SET status = 'error', completed_at = ?, error_message = ? WHERE id = ?",
|
||||||
(now.strftime('%Y-%m-%d %H:%M:%S'), f"Verwaist (>{int(age_minutes)} Min ohne Abschluss, automatisch bereinigt)", orphan["id"]),
|
(now.strftime('%Y-%m-%d %H:%M:%S'), reason, orphan["id"]),
|
||||||
)
|
)
|
||||||
logger.warning(f"Verwaisten Refresh #{orphan['id']} für Lage {orphan['incident_id']} bereinigt ({int(age_minutes)} Min)")
|
logger.warning(f"Verwaisten Refresh #{orphan['id']} fuer Lage {orphan['incident_id']} bereinigt: {reason}")
|
||||||
|
|
||||||
# Alte Notifications bereinigen (> 7 Tage)
|
# Alte Notifications bereinigen (> 7 Tage)
|
||||||
await db.execute("DELETE FROM notifications WHERE created_at < datetime('now', '-7 days')")
|
await db.execute("DELETE FROM notifications WHERE created_at < datetime('now', '-7 days')")
|
||||||
|
|||||||
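The orphan decision in this hunk depends on only two numbers, total age and idle time, so it can be expressed as a pure function that is easy to test. The sketch below restates the branch order of the hunk; the function name `orphan_reason` and the short English reason strings are assumptions, not the messages written to `refresh_log`.

```python
from __future__ import annotations

ORPHAN_IDLE_LIMIT = 60   # minutes without pipeline progress
ORPHAN_HARD_LIMIT = 120  # minutes of total runtime


def orphan_reason(age_minutes: float, idle_minutes: float | None) -> str | None:
    """Return a reason if the refresh counts as orphaned, else None.

    idle_minutes is None when no pipeline step was ever recorded.
    """
    if age_minutes < ORPHAN_IDLE_LIMIT:
        return None  # too young to judge
    if age_minutes >= ORPHAN_HARD_LIMIT:
        return "hard limit exceeded"
    if idle_minutes is None:
        return "no pipeline steps recorded"
    if idle_minutes >= ORPHAN_IDLE_LIMIT:
        return "no pipeline progress"
    return None  # long-running but still active


for age, idle in [(30, None), (90, 5), (90, 70), (90, None), (130, 1)]:
    print(age, idle, orphan_reason(age, idle))
```

A refresh that has been running for 90 minutes with a step recorded 5 minutes ago is left alone, which is the long-runner case the comment in the hunk describes.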
@@ -495,11 +495,14 @@ async def get_articles_sources_summary(
|
|||||||
tenant_id = current_user.get("tenant_id")
|
tenant_id = current_user.get("tenant_id")
|
||||||
await _check_incident_access(db, incident_id, current_user["id"], tenant_id)
|
await _check_incident_access(db, incident_id, current_user["id"], tenant_id)
|
||||||
cursor = await db.execute(
|
cursor = await db.execute(
|
||||||
"""SELECT source,
|
"""SELECT a.source,
|
||||||
COUNT(*) AS article_count,
|
COUNT(*) AS article_count,
|
||||||
GROUP_CONCAT(DISTINCT COALESCE(language,'de')) AS languages
|
GROUP_CONCAT(DISTINCT COALESCE(a.language,'de')) AS languages,
|
||||||
FROM articles WHERE incident_id = ?
|
COUNT(DISTINCT m.article_id) AS fimi_match_count
|
||||||
GROUP BY source ORDER BY article_count DESC""",
|
FROM articles a
|
||||||
|
LEFT JOIN article_fimi_matches m ON m.article_id = a.id
|
||||||
|
WHERE a.incident_id = ?
|
||||||
|
GROUP BY a.source ORDER BY article_count DESC""",
|
||||||
(incident_id,),
|
(incident_id,),
|
||||||
)
|
)
|
||||||
sources = []
|
sources = []
|
||||||
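One point that may be worth checking in the rewritten query: a `LEFT JOIN` onto `article_fimi_matches` yields one row per match, so `COUNT(*)` can count an article with several matches more than once. The sketch below reproduces this on a reduced, hypothetical schema in an in-memory SQLite database and compares it with `COUNT(DISTINCT a.id)`. Whether production data actually contains articles with more than one match is an assumption here, although `UNIQUE(article_id, fimi_claim_id)` permits it.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE articles (id INTEGER PRIMARY KEY, incident_id INTEGER, source TEXT);
    CREATE TABLE article_fimi_matches (article_id INTEGER, fimi_claim_id INTEGER);
    INSERT INTO articles VALUES (1, 1, 'Example Source'), (2, 1, 'Example Source');
    -- Article 1 matches two different claims, article 2 matches none.
    INSERT INTO article_fimi_matches VALUES (1, 100), (1, 101);
    """
)

row = con.execute(
    """SELECT a.source,
              COUNT(*) AS joined_rows,
              COUNT(DISTINCT a.id) AS article_count,
              COUNT(DISTINCT m.article_id) AS fimi_match_count
       FROM articles a
       LEFT JOIN article_fimi_matches m ON m.article_id = a.id
       WHERE a.incident_id = ?
       GROUP BY a.source""",
    (1,),
).fetchone()
print(row)
```

In this toy data the source has two articles, yet the joined row count is three; counting distinct article ids keeps `article_count` independent of the number of matches.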
@@ -507,6 +510,7 @@ async def get_articles_sources_summary(
|
|||||||
d = dict(r)
|
d = dict(r)
|
||||||
langs = (d.pop("languages") or "de").split(",")
|
langs = (d.pop("languages") or "de").split(",")
|
||||||
d["languages"] = sorted({(l or "de").strip() for l in langs if l is not None})
|
d["languages"] = sorted({(l or "de").strip() for l in langs if l is not None})
|
||||||
|
d["fimi_match_count"] = d.get("fimi_match_count") or 0
|
||||||
# Quellentyp aus dem source-Praefix ableiten (fuer den Typ-Filter der Quellenuebersicht)
|
# Quellentyp aus dem source-Praefix ableiten (fuer den Typ-Filter der Quellenuebersicht)
|
||||||
src = d.get("source") or ""
|
src = d.get("source") or ""
|
||||||
if src.startswith("X: "):
|
if src.startswith("X: "):
|
||||||
@@ -532,6 +536,114 @@ async def get_articles_sources_summary(
|
|||||||
return {"total": total, "sources": sources, "language_counts": lang_counts}
|
return {"total": total, "sources": sources, "language_counts": lang_counts}
|
||||||
|
|
||||||
|
|
||||||
|
@router.get("/{incident_id}/fimi-matches")
async def get_fimi_matches(
    incident_id: int,
    current_user: dict = Depends(get_current_user),
    db: aiosqlite.Connection = Depends(db_dependency),
):
    """FIMI matches of an incident, grouped by article (for docking point 1).

    For each article, the linked EUvsDisinfo false claims with provenance:
    claim text, rebuttal, case URL, embedding score and the verbatim
    quote from the article. The monitor does not judge on its own, it refers.
    """
    tenant_id = current_user.get("tenant_id")
    await _check_incident_access(db, incident_id, current_user["id"], tenant_id)
    cursor = await db.execute(
        """SELECT m.article_id, m.fimi_claim_id, m.score, m.role, m.matched_text,
                  c.text AS claim_text, c.verdict, c.verdict_summary,
                  c.source_ref, c.case_url
           FROM article_fimi_matches m
           JOIN articles a ON a.id = m.article_id
           JOIN fimi_claims c ON c.id = m.fimi_claim_id
           WHERE a.incident_id = ?
           ORDER BY m.score DESC""",
        (incident_id,),
    )
    # JSON object keys are strings, so the article id is stringified up front.
    by_article: dict[str, list] = {}
    for r in await cursor.fetchall():
        d = dict(r)
        aid = str(d["article_id"])
        by_article.setdefault(aid, []).append({
            "claim_id": d["fimi_claim_id"],
            "claim_text": d["claim_text"],
            "verdict": d["verdict"],
            "verdict_summary": d["verdict_summary"],
            "case_url": d["case_url"],
            "source_ref": d["source_ref"],
            "score": d["score"],
            "passage": d["matched_text"],
        })
    return {"matches_by_article": by_article, "article_count": len(by_article)}
|
||||||
|
|
||||||
|
|
||||||
|
@router.get("/{incident_id}/fimi-summary")
async def get_fimi_summary(
    incident_id: int,
    current_user: dict = Depends(get_current_user),
    db: aiosqlite.Connection = Depends(db_dependency),
):
    """Aggregated FIMI metrics for the quality axis of the situation report (docking point 3).

    Also returns a meaningful response when nothing has been checked yet."""
    tenant_id = current_user.get("tenant_id")
    await _check_incident_access(db, incident_id, current_user["id"], tenant_id)

    # Coverage: all articles of the incident vs. those already checked.
    cur = await db.execute(
        """SELECT COUNT(*) AS total,
                  SUM(CASE WHEN fimi_checked_at IS NOT NULL THEN 1 ELSE 0 END) AS checked
           FROM articles WHERE incident_id = ?""",
        (incident_id,),
    )
    row = await cur.fetchone()
    total = row["total"] or 0
    checked = row["checked"] or 0

    # Articles with at least one match and number of distinct claims.
    cur = await db.execute(
        """SELECT COUNT(DISTINCT m.article_id) AS matched_articles,
                  COUNT(DISTINCT m.fimi_claim_id) AS distinct_claims
           FROM article_fimi_matches m
           JOIN articles a ON a.id = m.article_id
           WHERE a.incident_id = ?""",
        (incident_id,),
    )
    row = await cur.fetchone()
    matched_articles = row["matched_articles"] or 0
    distinct_claims = row["distinct_claims"] or 0

    # Ten most frequently matched claims.
    cur = await db.execute(
        """SELECT c.id AS claim_id, c.text AS claim_text, c.case_url,
                  COUNT(DISTINCT m.article_id) AS article_count
           FROM article_fimi_matches m
           JOIN articles a ON a.id = m.article_id
           JOIN fimi_claims c ON c.id = m.fimi_claim_id
           WHERE a.incident_id = ?
           GROUP BY c.id ORDER BY article_count DESC LIMIT 10""",
        (incident_id,),
    )
    top_claims = [dict(r) for r in await cur.fetchall()]

    # Ten sources with the most matched articles.
    cur = await db.execute(
        """SELECT a.source, COUNT(DISTINCT m.article_id) AS match_count
           FROM article_fimi_matches m
           JOIN articles a ON a.id = m.article_id
           WHERE a.incident_id = ?
           GROUP BY a.source ORDER BY match_count DESC LIMIT 10""",
        (incident_id,),
    )
    by_source = [dict(r) for r in await cur.fetchall()]

    return {
        "articles_total": total,
        "articles_checked": checked,
        "articles_with_match": matched_articles,
        "distinct_claims": distinct_claims,
        "top_claims": top_claims,
        "by_source": by_source,
    }
|
||||||
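The coverage counters of `get_fimi_summary` rely on `SUM(CASE WHEN ...)`, which yields `NULL` when no rows match. The following is a minimal sketch of that behavior, assuming the synchronous stdlib `sqlite3` module instead of `aiosqlite` and a reduced, hypothetical `articles` table with only the three columns needed here; `fimi_coverage` and `demo_connection` are illustrative names, not part of the change.

```python
import sqlite3


def fimi_coverage(conn: sqlite3.Connection, incident_id: int) -> dict:
    """Count all articles of an incident and how many were already checked."""
    conn.row_factory = sqlite3.Row
    row = conn.execute(
        """SELECT COUNT(*) AS total,
                  SUM(CASE WHEN fimi_checked_at IS NOT NULL THEN 1 ELSE 0 END) AS checked
           FROM articles WHERE incident_id = ?""",
        (incident_id,),
    ).fetchone()
    # SUM() over zero rows is NULL (None in Python), hence the "or 0" fallback.
    return {
        "articles_total": row["total"] or 0,
        "articles_checked": row["checked"] or 0,
    }


def demo_connection() -> sqlite3.Connection:
    """In-memory database with a reduced, hypothetical articles table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE articles ("
        "id INTEGER PRIMARY KEY, incident_id INTEGER, fimi_checked_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO articles (incident_id, fimi_checked_at) VALUES (?, ?)",
        [(1, "2024-01-01 00:00:00"), (1, None), (1, "2024-01-02 00:00:00"), (2, None)],
    )
    return conn
```

Without the fallback, an incident that has no articles yet would report `None` instead of `0` for the checked count.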
|
|
||||||
|
|
||||||
@router.get("/{incident_id}/articles/timeline-buckets")
|
@router.get("/{incident_id}/articles/timeline-buckets")
|
||||||
async def get_articles_timeline_buckets(
|
async def get_articles_timeline_buckets(
|
||||||
incident_id: int,
|
incident_id: int,
|
||||||
|
|||||||
127
src/services/embeddings.py
Normal file
@@ -0,0 +1,127 @@
|
|||||||
"""Embedding service for the claim matcher.

Loads a multilingual SentenceTransformer model as a singleton.
Produces L2-normalized 384-dim vectors, so cosine similarity
reduces to a plain dot product.
"""
from __future__ import annotations

import asyncio
import logging
import threading
from typing import Iterable

import numpy as np

logger = logging.getLogger("osint.embeddings")

MODEL_NAME = "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
EMBED_DIM = 384
DTYPE = np.float32

# Threshold recommendations (empirical, from sanity tests):
#   >= 0.85 -> very likely the same claim
#   >= 0.75 -> similar claim, offer it to the user as a choice
#   <  0.60 -> probably different claims
DEFAULT_MATCH_THRESHOLD = 0.75  # for the duplicate warning on creation
LIVE_SEARCH_THRESHOLD = 0.55    # for live search in the modal, more recall

_model = None
_model_lock = threading.Lock()
|
||||||
|
|
||||||
|
|
||||||
|
def _get_model():
    """Loads the model once (lazily) and returns it."""
    global _model
    if _model is None:
        # Double-checked locking: only the first caller pays for the load.
        with _model_lock:
            if _model is None:
                from sentence_transformers import SentenceTransformer
                logger.info("Lade Embedding-Modell %s ...", MODEL_NAME)
                _model = SentenceTransformer(MODEL_NAME)
                logger.info("Embedding-Modell geladen, dim=%d", EMBED_DIM)
    return _model


def _encode_sync(texts: list[str]) -> np.ndarray:
    """Synchronous encode (CPU-bound, should run in the executor)."""
    model = _get_model()
    vecs = model.encode(
        texts,
        normalize_embeddings=True,
        convert_to_numpy=True,
        show_progress_bar=False,
    )
    return vecs.astype(DTYPE, copy=False)


async def encode_text(text: str) -> bytes:
    """Encodes one text and returns the embedding as bytes (suitable for a BLOB)."""
    if not text or not text.strip():
        raise ValueError("Leerer Text kann nicht embedded werden")
    loop = asyncio.get_running_loop()
    vec = await loop.run_in_executor(None, _encode_sync, [text])
    return vec[0].tobytes()


async def encode_batch(texts: list[str]) -> list[bytes]:
    """Encodes several texts in one batch (more efficient than one by one).

    Empty texts are dropped, so the result can be shorter than the input.
    """
    texts = [t for t in texts if t and t.strip()]
    if not texts:
        return []
    loop = asyncio.get_running_loop()
    vecs = await loop.run_in_executor(None, _encode_sync, texts)
    return [v.tobytes() for v in vecs]
|
||||||
|
|
||||||
|
|
||||||
|
def decode_embedding(blob: bytes | None) -> np.ndarray | None:
    """Decodes a BLOB back into a numpy vector."""
    if blob is None or len(blob) == 0:
        return None
    return np.frombuffer(blob, dtype=DTYPE)


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity of two vectors.

    Since we encode L2-normalized, the dot product would be enough.
    Defensive: if a vector is not normalized, this variant handles it.
    """
    na = float(np.linalg.norm(a))
    nb = float(np.linalg.norm(b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return float(np.dot(a, b) / (na * nb))


def find_similar(
    query: np.ndarray,
    candidates: Iterable[tuple[int, np.ndarray]],
    top_k: int = 5,
    threshold: float = DEFAULT_MATCH_THRESHOLD,
) -> list[tuple[int, float]]:
    """Finds the top_k most similar embeddings in a set of candidates.

    Args:
        query: L2-normalized query vector.
        candidates: iterable of (id, embedding vector) tuples.
        top_k: maximum number of hits.
        threshold: minimum score, everything below is discarded.

    Returns:
        List of (id, score), sorted in descending order.
    """
    scored: list[tuple[int, float]] = []
    for cid, vec in candidates:
        if vec is None:
            continue
        score = cosine_similarity(query, vec)
        if score >= threshold:
            scored.append((cid, score))
    scored.sort(key=lambda x: x[1], reverse=True)
    return scored[:top_k]


def warm_up() -> None:
    """Preloads the model (can be called in a thread at app start)."""
    _get_model()
|
||||||
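The embedding service depends on two properties: for L2-normalized vectors the dot product equals the cosine similarity, and a float32 vector survives the round trip through a BLOB. The sketch below illustrates both with the standard library only (`math` and `array` stand in for numpy here, which is an assumption made for self-containment); all function names are illustrative and not part of `embeddings.py`.

```python
import math
from array import array


def dot(a, b):
    """Plain dot product of two equally long sequences."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(vec, vec))
    if norm == 0.0:
        raise ValueError("zero vector cannot be normalized")
    return [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity that also works for non-normalized input."""
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot(a, b) / (na * nb)


def to_blob(vec) -> bytes:
    """Serialize as 4-byte floats, comparable to ndarray.tobytes() for float32."""
    return array("f", vec).tobytes()


def from_blob(blob: bytes):
    """Inverse of to_blob; values carry float32 precision."""
    out = array("f")
    out.frombytes(blob)
    return list(out)
```

Because normalization happens once at encoding time, a matcher can skip the two norm computations per comparison; the defensive `cosine` variant only matters when a stored vector was not normalized.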
410
src/services/fimi_matcher.py
Normal file
@@ -0,0 +1,410 @@
|
|||||||
"""FIMI matcher: checks monitor articles against the imported
false-claim corpus (fimi_claims, EUvsDisinfo).

Two-stage, because embedding similarity only measures TOPICAL PROXIMITY, not
STANCE: an article that calls Russia's attack a "war of aggression" lies close
in embedding space to the false claim "Russia was forced to attack", but says
the opposite. Pure embeddings would therefore flag neutral and even debunking
coverage as matches.

Stage 1 (embedding prefilter, cheap): finds topically close candidates.
    The claim embeddings are held in RAM as a numpy matrix (~30 MB); a
    match is a matrix multiplication (cosine == dot product, since
    L2-normalized).
Stage 2 (LLM verification, precise): one Haiku call per candidate article
    decides whether the article actually SPREADS the claim
    (asserts it approvingly as fact) or merely reports on it /
    debunks it. Only confirmed spreads are stored.

Provenance guardrail: only a link article -> named, verifiable EUvsDisinfo
case is stored, plus the verbatim quote from the article. The monitor never
judges on its own.
"""
from __future__ import annotations

import asyncio
import json
import logging
import os
import re
import threading

import aiosqlite
import numpy as np

from services.embeddings import encode_batch
from agents.claude_client import call_claude, ClaudeCliError
from config import CLAUDE_MODEL_FAST

# Strip URLs from the article text: otherwise the verifier model tries to
# open the link via WebFetch, which fails with error_max_turns when
# --allowedTools "" is set.
_URL_RE = re.compile(r"https?://\S+")

logger = logging.getLogger("osint.fimi_matcher")

EMBED_DIM = 384
# Stage 1: prefilter
EMBED_FLOOR = 0.55          # lower bound at which a candidate is created at all
PREFILTER_THRESHOLD = 0.65  # from here on a candidate goes to LLM verification
TOP_K = 5                   # max. candidate claims per article
CONTENT_EXCERPT_CHARS = 1500
# Stage 2: LLM verification
VERIFY_ENABLED = os.environ.get("FIMI_VERIFY_ENABLED", "true").lower() != "false"
VERIFY_CONCURRENCY = int(os.environ.get("FIMI_VERIFY_CONCURRENCY", "4"))
VERIFY_CONTENT_CHARS = 2200
VERIFY_TIMEOUT = 90

# Singleton matrix of the claim embeddings
_ids: np.ndarray | None = None     # (N,) int64 -> fimi_claims.id
_matrix: np.ndarray | None = None  # (N, 384) float32
_lock = threading.Lock()
|
||||||
|
|
||||||
|
|
||||||
|
# ──────────────────────────────────────────────────────────────────
# Stage 1: embedding prefilter
# ──────────────────────────────────────────────────────────────────

async def ensure_matrix(db: aiosqlite.Connection, force: bool = False) -> int:
    """Loads the claim embeddings once into a numpy matrix. Idempotent."""
    global _ids, _matrix
    if _matrix is not None and not force:
        return int(_matrix.shape[0])

    cursor = await db.execute(
        "SELECT id, embedding FROM fimi_claims WHERE embedding IS NOT NULL"
    )
    rows = await cursor.fetchall()
    ids: list[int] = []
    vecs: list[np.ndarray] = []
    for r in rows:
        v = np.frombuffer(r["embedding"], dtype=np.float32)
        if v.size != EMBED_DIM:
            # Skip BLOBs that do not match the expected dimension.
            continue
        ids.append(r["id"])
        vecs.append(v)

    with _lock:
        if vecs:
            _ids = np.asarray(ids, dtype=np.int64)
            _matrix = np.vstack(vecs).astype(np.float32, copy=False)
        else:
            _ids = np.empty((0,), dtype=np.int64)
            _matrix = np.empty((0, EMBED_DIM), dtype=np.float32)
    logger.info("FIMI-Matcher: %d Claim-Embeddings geladen", len(ids))
    return len(ids)


def is_ready() -> bool:
    return _matrix is not None and _matrix.shape[0] > 0


def _build_query_text(headline: str | None, content: str | None) -> str:
    parts = []
    if headline:
        parts.append(headline.strip())
    if content:
        excerpt = content.strip()[:CONTENT_EXCERPT_CHARS]
        if excerpt:
            parts.append(excerpt)
    return " ".join(parts).strip()


async def match_query_texts(
    texts: list[str],
    threshold: float = EMBED_FLOOR,
    top_k: int = TOP_K,
) -> list[list[tuple[int, float]]]:
    """Stage 1: matches query texts against the claim matrix (embedding cosine).

    Returns: a list of the same length as texts, each entry a list of
    (claim_id, score), sorted in descending order, only hits >= threshold.
    """
    results: list[list[tuple[int, float]]] = [[] for _ in texts]
    if _matrix is None or _matrix.shape[0] == 0:
        return results

    valid_idx = [i for i, t in enumerate(texts) if t and t.strip()]
    if not valid_idx:
        return results
    blobs = await encode_batch([texts[i] for i in valid_idx])
    if len(blobs) != len(valid_idx):
        logger.warning("FIMI-Matcher: encode_batch-Laenge passt nicht, skip")
        return results

    qm = np.vstack([np.frombuffer(b, dtype=np.float32) for b in blobs])  # (V, 384)
    scores = qm @ _matrix.T  # (V, N); cosine, since L2-normalized

    for row, orig_i in enumerate(valid_idx):
        s = scores[row]
        if top_k < s.size:
            # argpartition selects the top_k indices without a full sort.
            cand = np.argpartition(s, -top_k)[-top_k:]
        else:
            cand = np.arange(s.size)
        cand = cand[np.argsort(s[cand])[::-1]]
        hits = [(int(_ids[j]), float(s[j])) for j in cand if s[j] >= threshold]
        results[orig_i] = hits
    return results
|
||||||
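Stage 1 combines three cut-offs: at most `TOP_K` claims per article, a floor of 0.55 for a candidate to exist at all, and 0.65 before a candidate is handed to the LLM. The sketch below shows how these interact on a toy score table. It swaps in `heapq.nlargest` from the standard library for the `np.argpartition` selection used by the matcher, and the helper names `top_candidates` and `strong_candidates` are hypothetical.

```python
import heapq

EMBED_FLOOR = 0.55          # same values as in the matcher constants
PREFILTER_THRESHOLD = 0.65


def top_candidates(scores, top_k=5, floor=EMBED_FLOOR):
    """Return up to top_k (claim_id, score) pairs >= floor, best first.

    scores maps claim_id -> cosine score for one article.
    """
    best = heapq.nlargest(top_k, scores.items(), key=lambda kv: kv[1])
    return [(cid, sc) for cid, sc in best if sc >= floor]


def strong_candidates(candidates, threshold=PREFILTER_THRESHOLD):
    """Keep only candidates that are close enough for LLM verification."""
    return [(cid, sc) for cid, sc in candidates if sc >= threshold]
```

For the scores `{10: 0.71, 11: 0.40, 12: 0.58, 13: 0.66, 14: 0.90}`, claim 11 never becomes a candidate, claim 12 is a candidate but stays below the verification threshold, and claims 14, 10 and 13 would go on to stage 2.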
|
|
||||||
|
|
||||||
|
# ──────────────────────────────────────────────────────────────────
# Stage 2: LLM verification
# ──────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
_VERIFY_PROMPT = """Bewerte ausschließlich den unten stehenden Artikeltext. Du hast KEINEN Internetzugang und darfst KEINE Werkzeuge benutzen (kein WebFetch, keine Suche, kein Öffnen von Links). Falls der Text gekürzt ist, bewerte nur das Vorhandene. Antworte sofort mit JSON.
|
||||||
|
|
||||||
|
Du prüfst, ob ein Nachrichtenartikel eine bekannte Falschbehauptung VERBREITET.
|
||||||
|
|
||||||
|
VERBREITET (spreads=true): Der Artikel übernimmt die SPEZIFISCHE, irreführende Kernaussage der Behauptung zustimmend, stellt sie als Tatsache hin oder legt sie dem Leser als wahr nahe.
|
||||||
|
|
||||||
|
VERBREITET NICHT (spreads=false), wenn eines zutrifft:
|
||||||
|
- Der Artikel berichtet neutral über das Thema.
|
||||||
|
- Der Artikel widerlegt die Behauptung, ordnet sie als Desinformation ein oder zitiert sie distanziert/kritisch.
|
||||||
|
- Der Artikel sagt inhaltlich das Gegenteil.
|
||||||
|
- Der Artikel erwähnt nur ein thematisch verwandtes Faktum, OHNE die irreführende Kernaussage zu übernehmen.
|
||||||
|
|
||||||
|
Entscheidend ist die HALTUNG zur konkreten Kernaussage, nicht die thematische Nähe. Ein gemeinsames Stichwort, Ereignis oder Faktum reicht NICHT.
|
||||||
|
|
||||||
|
Beispiele für spreads=false (häufige Verwechslung):
|
||||||
|
- Behauptung "Russland wurde zum Angriff gezwungen": Artikel nennt den Einmarsch einen "Angriffskrieg" -> false (Gegenteil).
|
||||||
|
- Behauptung "Die Ukraine ist eine westliche Marionette ohne Souveränität": Artikel berichtet, dass ausländische Ausbilder ukrainische Soldaten trainieren -> false (bloßes Faktum, keine Marionetten-Aussage).
|
||||||
|
- Behauptung "Russlands Wirtschaft boomt trotz Sanktionen": Artikel berichtet konkrete Öleinnahmen -> false (Einzelfaktum, kein Boom-Narrativ).
|
||||||
|
- Behauptung "Die Ukraine kann den Krieg nicht gewinnen": Artikel analysiert, dass militärisch keine Seite gewinnen kann -> false (symmetrische Analyse, nicht die einseitige Behauptung).
|
||||||
|
|
||||||
|
Im Zweifel spreads=false. Nur die eindeutige Übernahme der irreführenden Kernaussage zählt.
|
||||||
|
|
||||||
|
ARTIKEL
|
||||||
|
Titel: {headline}
|
||||||
|
Text: {content}
|
||||||
|
|
||||||
|
ZU PRÜFENDE BEHAUPTUNGEN
|
||||||
|
{claims}
|
||||||
|
|
||||||
|
Antworte AUSSCHLIESSLICH als JSON:
|
||||||
|
{{"results": [{{"claim_id": <id>, "spreads": <true|false>, "passage": "<wörtliches Zitat aus dem Artikel, das die Behauptung verbreitet; leer wenn spreads=false>"}}]}}"""
|
||||||
|
|
||||||
|
|
||||||
|
async def _verify_article(
    article, candidate_claims: list[tuple[int, float, str]]
) -> list[tuple[int, float, str]]:
    """One Haiku call: which candidate claims does the article spread?

    candidate_claims: list of (claim_id, embed_score, claim_text).
    Returns: confirmed (claim_id, embed_score, passage) for spreads=true.
    Raises on CLI/parse errors so that the caller does not mark the article
    as checked (retry on the next refresh).
    """
    headline = (article["headline_de"] or article["headline"] or "").strip()
    content = (
        (article["content_de"] if "content_de" in article.keys() else None)
        or (article["content_original"] if "content_original" in article.keys() else None)
        or ""
    )
    content = _URL_RE.sub("", content).strip()[:VERIFY_CONTENT_CHARS]
    if not content:
        # Without body text the stance cannot be determined reliably.
        return []

    claim_by_id = {cid: text for cid, _, text in candidate_claims}
    claims_block = "\n".join(f"[{cid}] {text}" for cid, _, text in candidate_claims)
    prompt = _VERIFY_PROMPT.format(headline=headline, content=content, claims=claims_block)

    text, _usage = await call_claude(
        prompt, tools=None, model=CLAUDE_MODEL_FAST, timeout=VERIFY_TIMEOUT
    )
    raw = (text or "").strip()
    # Defensive: remove Markdown fences if present
    if raw.startswith("```"):
        raw = raw.strip("`")
        nl = raw.find("\n")
        if nl != -1:
            raw = raw[nl + 1:]
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end == -1:
        raise ValueError(f"Keine JSON-Antwort vom Verifizierer: {raw[:120]!r}")
    data = json.loads(raw[start:end + 1])

    embed_score = {cid: sc for cid, sc, _ in candidate_claims}
    confirmed: list[tuple[int, float, str]] = []
    for item in data.get("results", []):
        try:
            cid = int(item.get("claim_id"))
        except (TypeError, ValueError):
            continue
        if cid not in claim_by_id:
            # Ignore ids the model invented or mixed up.
            continue
        if item.get("spreads") is True:
            passage = (item.get("passage") or "").strip()[:500]
            confirmed.append((cid, embed_score.get(cid, 0.0), passage))
    return confirmed
|
||||||
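The reply handling in `_verify_article` can be read as two small steps: isolate the JSON object even if the model wrapped it in a Markdown fence, then keep only entries with `spreads` set to `true` whose id was actually sent. The sketch below restates that logic in isolation. The function names are hypothetical, the fence marker is built as `FENCE` to keep the example readable, and the check `end < start` is slightly stricter than the original, which only tests for `-1`.

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker


def extract_json_object(reply: str) -> dict:
    """Parse the outermost {...} of a model reply, tolerating a code fence."""
    raw = (reply or "").strip()
    if raw.startswith(FENCE):
        raw = raw.strip("`")
        newline = raw.find("\n")
        if newline != -1:
            raw = raw[newline + 1:]  # drop the language tag line, e.g. "json"
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError(f"no JSON object in reply: {raw[:120]!r}")
    return json.loads(raw[start:end + 1])


def confirmed_claims(data: dict, known_ids: set) -> list:
    """Return (claim_id, passage) for entries that spread a known claim."""
    confirmed = []
    for item in data.get("results", []):
        try:
            cid = int(item.get("claim_id"))
        except (TypeError, ValueError):
            continue  # missing or non-numeric id
        if cid not in known_ids:
            continue  # id was never part of the request
        if item.get("spreads") is True:
            confirmed.append((cid, (item.get("passage") or "").strip()[:500]))
    return confirmed
```

Testing `spreads is True` rather than truthiness means a string such as `"false"` is not counted as a confirmation.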
|
|
||||||
|
|
||||||
|
# ──────────────────────────────────────────────────────────────────
# Orchestration: match + store
# ──────────────────────────────────────────────────────────────────

async def _load_claim_texts(db, claim_ids: set[int]) -> dict[int, str]:
    if not claim_ids:
        return {}
    qs = ",".join("?" for _ in claim_ids)
    cursor = await db.execute(
        f"SELECT id, text FROM fimi_claims WHERE id IN ({qs})", tuple(claim_ids)
    )
    return {r["id"]: r["text"] for r in await cursor.fetchall()}


async def match_and_store_articles(
    db: aiosqlite.Connection,
    articles: list,
    prefilter_threshold: float = PREFILTER_THRESHOLD,
    top_k: int = TOP_K,
    verify: bool | None = None,
    mark_checked: bool = True,
) -> dict:
    """Two-stage match + storage for a list of article rows.

    articles: rows with id, headline, headline_de, content_original, content_de
    and (optionally) tenant_id.
    """
    if verify is None:
        verify = VERIFY_ENABLED
    await ensure_matrix(db)
    if not articles:
        return {"articles": 0, "candidates": 0, "articles_with_match": 0, "stored": 0, "errors": 0}

    # Stage 1: embedding prefilter
    texts = [
        _build_query_text(
            a["headline_de"] or a["headline"],
            (a["content_de"] if "content_de" in a.keys() else None)
            or (a["content_original"] if "content_original" in a.keys() else None),
        )
        for a in articles
    ]
    prefiltered = await match_query_texts(texts, threshold=EMBED_FLOOR, top_k=top_k)

    # Load claim texts for all strong candidates
    strong_per_article: list[list[tuple[int, float]]] = [
        [(cid, sc) for cid, sc in cands if sc >= prefilter_threshold]
        for cands in prefiltered
    ]
    need_ids: set[int] = {cid for lst in strong_per_article for cid, _ in lst}
    claim_texts = await _load_claim_texts(db, need_ids)

    # Stage 2: verification (parallel, bounded), only articles with strong candidates
    sem = asyncio.Semaphore(max(1, VERIFY_CONCURRENCY))
    candidates_total = sum(len(lst) for lst in strong_per_article)

    async def _process(idx: int):
        a = articles[idx]
        strong = strong_per_article[idx]
        if not strong:
            # checked, but no strong candidate -> nothing to verify
            return idx, [], False
        cand = [(cid, sc, claim_texts.get(cid, "")) for cid, sc in strong if claim_texts.get(cid)]
        if not cand:
            return idx, [], False
        if not verify:
            return idx, [(cid, sc, None) for cid, sc, _ in cand], False
        async with sem:
            try:
                confirmed = await _verify_article(a, cand)
                return idx, confirmed, False
            except (ClaudeCliError, ValueError, json.JSONDecodeError, TimeoutError) as e:
                logger.warning("FIMI-Verifikation article_id=%s fehlgeschlagen: %s",
                               a["id"], e)
                return idx, None, True  # error -> do not mark as checked

    proc = await asyncio.gather(*[_process(i) for i in range(len(articles))])

    # Store (sequentially, one DB connection)
    stored = 0
    with_match = 0
    errors = 0
    for idx, confirmed, err in proc:
        a = articles[idx]
        if err:
            errors += 1
            continue  # do NOT mark the article as checked -> retry
        if confirmed:
            with_match += 1
            tenant_id = a["tenant_id"] if "tenant_id" in a.keys() else None
            role = "verified" if verify else "match"
            for cid, sc, passage in confirmed:
                try:
                    await db.execute(
                        """INSERT INTO article_fimi_matches
                           (article_id, fimi_claim_id, score, role, matched_text, tenant_id, matched_at)
                           VALUES (?, ?, ?, ?, ?, ?, CURRENT_TIMESTAMP)""",
                        (a["id"], cid, round(sc, 4), role, passage, tenant_id),
                    )
                    stored += 1
                except aiosqlite.IntegrityError:
                    # Link already exists: keep the higher score, refresh role and quote.
                    await db.execute(
                        """UPDATE article_fimi_matches
                           SET score = MAX(COALESCE(score, 0), ?),
                               role = ?, matched_text = COALESCE(?, matched_text)
                           WHERE article_id = ? AND fimi_claim_id = ?""",
                        (round(sc, 4), role, passage, a["id"], cid),
                    )
        if mark_checked:
            await db.execute(
                "UPDATE articles SET fimi_checked_at = CURRENT_TIMESTAMP WHERE id = ?",
                (a["id"],),
            )
    await db.commit()
    logger.info(
        "FIMI-Matcher: %d Artikel, %d Kandidaten, %d verbreiten Falschbehauptungen, "
        "%d Links, %d Fehler",
        len(articles), candidates_total, with_match, stored, errors,
    )
    return {
        "articles": len(articles),
        "candidates": candidates_total,
        "articles_with_match": with_match,
        "stored": stored,
        "errors": errors,
    }
|
||||||
|
|
||||||
|
|
||||||
|
async def match_article_ids(
    db: aiosqlite.Connection,
    article_ids: list[int],
    verify: bool | None = None,
) -> dict:
    """Matches a specific set of articles (by ID). Pipeline entry point for the
    articles newly added in a refresh."""
    ids = [int(i) for i in article_ids if i]
    if not ids:
        return {"articles": 0, "candidates": 0, "articles_with_match": 0, "stored": 0, "errors": 0}
    qs = ",".join("?" for _ in ids)
    cursor = await db.execute(
        f"SELECT id, headline, headline_de, content_original, content_de, tenant_id "
        f"FROM articles WHERE id IN ({qs})",
        tuple(ids),
    )
    articles = await cursor.fetchall()
    return await match_and_store_articles(db, articles, verify=verify)


async def match_incident_articles(
    db: aiosqlite.Connection,
    incident_id: int,
    only_unchecked: bool = True,
    limit: int | None = None,
    verify: bool | None = None,
) -> dict:
    """Matches the articles of an incident (by default only those not yet checked)."""
    q = (
        "SELECT id, headline, headline_de, content_original, content_de, tenant_id "
        "FROM articles WHERE incident_id = ?"
    )
    params: list = [incident_id]
    if only_unchecked:
        q += " AND fimi_checked_at IS NULL"
    q += " ORDER BY id"
    if limit:
        q += f" LIMIT {int(limit)}"
    cursor = await db.execute(q, params)
    articles = await cursor.fetchall()
    return await match_and_store_articles(db, articles, verify=verify)
|
||||||
@@ -36,6 +36,8 @@ _PIPELINE_STEPS_DE = [
|
|||||||
"tooltip": "Aus Foren-Quellen (z.B. 5ch, Hatena, Note) wird ein Stimmungsbild der öffentlichen Diskussion extrahiert. Keine Faktenlage, sondern dominante Themen und Bruchlinien."},
|
"tooltip": "Aus Foren-Quellen (z.B. 5ch, Hatena, Note) wird ein Stimmungsbild der öffentlichen Diskussion extrahiert. Keine Faktenlage, sondern dominante Themen und Bruchlinien."},
|
||||||
{"key": "summary", "label": "Lagebild verfassen", "icon": "file-text",
|
{"key": "summary", "label": "Lagebild verfassen", "icon": "file-text",
|
||||||
"tooltip": "Aus allen geprüften Meldungen wird ein zusammenhängendes Lagebild geschrieben, mit Quellenangaben am Text."},
|
"tooltip": "Aus allen geprüften Meldungen wird ein zusammenhängendes Lagebild geschrieben, mit Quellenangaben am Text."},
|
||||||
|
{"key": "translate", "label": "Artikel übersetzen", "icon": "languages",
"tooltip": "Fremdsprachige Meldungen (z.B. japanisch) werden ins Lagebild-Output übersetzt. Läuft nur für Quellen-Pools mit nicht-deutschen Sprachen und kann bei vielen neuen Artikeln einige Minuten dauern."},
|
||||||
{"key": "qc", "label": "Qualitätscheck", "icon": "check-circle",
|
{"key": "qc", "label": "Qualitätscheck", "icon": "check-circle",
|
||||||
"tooltip": "Eine letzte Kontrollprüfung am Ergebnis: Doppelte Fakten zusammenführen, Karten-Verortung prüfen, bevor du benachrichtigt wirst."},
|
"tooltip": "Eine letzte Kontrollprüfung am Ergebnis: Doppelte Fakten zusammenführen, Karten-Verortung prüfen, bevor du benachrichtigt wirst."},
|
||||||
{"key": "notify", "label": "Benachrichtigen", "icon": "bell",
|
{"key": "notify", "label": "Benachrichtigen", "icon": "bell",
|
||||||
@@ -59,6 +61,8 @@ _PIPELINE_STEPS_EN = [
|
|||||||
"tooltip": "Forum sources (5ch, Hatena, Note, etc.) are summarised into a public-mood overview. Not factual, but dominant themes and fault lines."},
|
"tooltip": "Forum sources (5ch, Hatena, Note, etc.) are summarised into a public-mood overview. Not factual, but dominant themes and fault lines."},
|
||||||
{"key": "summary", "label": "Writing the briefing", "icon": "file-text",
|
{"key": "summary", "label": "Writing the briefing", "icon": "file-text",
|
||||||
"tooltip": "All verified articles are combined into a coherent briefing with inline citations."},
|
"tooltip": "All verified articles are combined into a coherent briefing with inline citations."},
|
||||||
|
{"key": "translate", "label": "Translating articles", "icon": "languages",
|
||||||
|
"tooltip": "Foreign-language articles (e.g. Japanese) are translated into the briefing output language. Runs only when the source pool contains non-target-language items and can take several minutes for large incoming batches."},
|
||||||
{"key": "qc", "label": "Quality check", "icon": "check-circle",
|
{"key": "qc", "label": "Quality check", "icon": "check-circle",
|
||||||
"tooltip": "A final review: consolidate duplicate facts, verify map locations, before you get notified."},
|
"tooltip": "A final review: consolidate duplicate facts, verify map locations, before you get notified."},
|
||||||
{"key": "notify", "label": "Notifying", "icon": "bell",
|
{"key": "notify", "label": "Notifying", "icon": "bell",
|
||||||
|
|||||||
@@ -6172,3 +6172,122 @@ body.tutorial-active .tutorial-cursor {
     .pipeline-block.status-active { box-shadow: var(--glow-accent); }
     .pipeline-stage.is-looping .pipeline-loop { animation: none !important; opacity: 1; }
 }
+
+/* ──────────────────────────────────────────────────────────────────
+   FIMI / Counter-Disinformation (integration points 1-3)
+   Subtle, advisory tone (amber = --warning), not an alarm siren.
+   Provenance is carried by text + case links, not by
+   colour. No match -> no element, no visual clutter.
+   ────────────────────────────────────────────────────────────────── */
+
+/* Integration point 1: inline hint on the article (in the source detail list) */
+.fimi-hint {
+    flex-basis: 100%;
+    display: flex;
+    align-items: center;
+    gap: 6px;
+    margin-top: 5px;
+    padding: 4px 8px;
+    font-size: 11.5px;
+    line-height: 1.35;
+    background: rgba(245, 158, 11, 0.08);
+    border-left: 2px solid var(--warning);
+    border-radius: 3px;
+}
+.fimi-hint-icon { flex: 0 0 auto; font-size: 12px; color: var(--warning); }
+.fimi-hint-text { color: var(--text-secondary); }
+.fimi-hint-link {
+    margin-left: auto;
+    flex: 0 0 auto;
+    color: var(--warning);
+    font-weight: 600;
+    text-decoration: none;
+    white-space: nowrap;
+}
+.fimi-hint-link:hover { text-decoration: underline; }
+.source-overview-detail-list li.has-fimi-hint { flex-wrap: wrap; }
+
+/* Integration point 2: empirical track-record badge in the source box */
+.fimi-source-badge {
+    display: inline-flex;
+    align-items: center;
+    margin-left: 6px;
+    padding: 1px 6px;
+    font-size: 10px;
+    font-weight: 700;
+    letter-spacing: 0.02em;
+    color: var(--warning);
+    background: rgba(245, 158, 11, 0.12);
+    border: 1px solid rgba(245, 158, 11, 0.35);
+    border-radius: 10px;
+    white-space: nowrap;
+}
+.source-overview-item.has-fimi { box-shadow: inset 2px 0 0 var(--warning); }
+
+/* Integration point 3: quality bar above the briefing */
+.fimi-summary-bar {
+    margin: 0 0 12px 0;
+    padding: 10px 14px;
+    border-radius: 6px;
+    font-size: 13px;
+    line-height: 1.45;
+}
+.fimi-summary-bar:empty { display: none; }
+.fimi-summary-bar--alert {
+    color: var(--text-primary);
+    background: rgba(245, 158, 11, 0.09);
+    border: 1px solid rgba(245, 158, 11, 0.30);
+}
+.fimi-summary-bar--clear {
+    display: flex;
+    align-items: center;
+    gap: 8px;
+    color: var(--text-secondary);
+    background: var(--bg-elevated);
+    border: 1px solid var(--border);
+}
+.fimi-summary-head { display: flex; align-items: center; gap: 10px; flex-wrap: wrap; }
+.fimi-summary-icon { flex: 0 0 auto; color: var(--warning); font-size: 15px; }
+.fimi-summary-bar--clear .fimi-summary-icon { color: var(--success); }
+.fimi-summary-lead { flex: 1 1 240px; }
+.fimi-summary-lead strong { color: var(--warning); }
+.fimi-summary-toggle {
+    flex: 0 0 auto;
+    padding: 3px 10px;
+    font-size: 12px;
+    font-weight: 600;
+    color: var(--warning);
+    background: transparent;
+    border: 1px solid rgba(245, 158, 11, 0.4);
+    border-radius: 4px;
+    cursor: pointer;
+}
+.fimi-summary-toggle:hover { background: rgba(245, 158, 11, 0.12); }
+.fimi-summary-claims {
+    list-style: none;
+    margin: 10px 0 0 0;
+    padding: 10px 0 0 0;
+    border-top: 1px solid rgba(245, 158, 11, 0.20);
+}
+.fimi-summary-claims li {
+    display: flex;
+    align-items: baseline;
+    gap: 8px;
+    padding: 4px 0;
+    font-size: 12.5px;
+    color: var(--text-secondary);
+}
+.fimi-claim-count { flex: 0 0 auto; font-weight: 700; color: var(--warning); min-width: 28px; }
+.fimi-claim-text { flex: 1 1 auto; }
+
+/* FIMI: mandatory EUvsDisinfo source notice (subtle, muted) */
+.fimi-disclaimer {
+    margin-top: 10px;
+    padding-top: 8px;
+    border-top: 1px solid rgba(245, 158, 11, 0.18);
+    font-size: 10.5px;
+    line-height: 1.4;
+    color: var(--text-disabled);
+}
+.fimi-disclaimer a { color: var(--text-secondary); text-decoration: underline; }
+.fimi-disclaimer a:hover { color: var(--warning); }
@@ -234,6 +234,7 @@
                     <span class="lagebild-timestamp" id="lagebild-timestamp"></span>
                 </div>
                 <div id="summary-content">
+                    <div id="fimi-summary-bar"></div>
                     <div id="summary-text" class="summary-text"></div>
                 </div>
             </div>
@@ -181,6 +181,15 @@ const API = {
         return this._request('GET', `/incidents/${incidentId}/factchecks`);
     },
 
+    // FIMI / Counter-Disinformation
+    getFimiMatches(incidentId) {
+        return this._request('GET', `/incidents/${incidentId}/fimi-matches`);
+    },
+
+    getFimiSummary(incidentId) {
+        return this._request('GET', `/incidents/${incidentId}/fimi-summary`);
+    },
+
     getPipeline(incidentId) {
         return this._request('GET', `/incidents/${incidentId}/pipeline`);
     },
@@ -884,6 +884,9 @@ const App = {
             // Source overview from the aggregate endpoint (all sources, not just the first page)
             this._loadSourcesSummary(id).catch(err => console.warn('sources-summary:', err));
 
+            // FIMI: matches per article + briefing aggregate (counter-disinformation)
+            this._loadFimiData(id).catch(err => console.warn('fimi-data:', err));
+
             // If more articles exist than were loaded initially: progressive background load
             if (articlesTotal > articles.length) {
                 this._loadRemainingArticlesInBackground(id).catch(err => console.warn('bg-articles:', err));
@@ -909,6 +912,44 @@ const App = {
         }
     },
 
+    /** Load the incident's FIMI data: matches per article + aggregate for the briefing. */
+    async _loadFimiData(incidentId) {
+        let matches = {}, summary = null;
+        try {
+            const [m, s] = await Promise.all([
+                API.getFimiMatches(incidentId),
+                API.getFimiSummary(incidentId),
+            ]);
+            matches = (m && m.matches_by_article) || {};
+            summary = s || null;
+        } catch (err) {
+            console.warn('fimi-data:', err);
+            return;
+        }
+        if (this.currentIncidentId !== incidentId) return; // user switched incidents
+        this._currentFimiMatches = matches;
+        this._currentFimiSummary = summary;
+        this._renderFimiSummaryBar();
+    },
+
+    /** Integration point 3: render the quality bar into the briefing. */
+    _renderFimiSummaryBar() {
+        const host = document.getElementById('fimi-summary-bar');
+        if (!host || typeof UI.renderFimiSummaryBar !== 'function') return;
+        host.innerHTML = UI.renderFimiSummaryBar(this._currentFimiSummary);
+    },
+
+    /** Expand/collapse the narrative list in the FIMI quality bar. */
+    toggleFimiDetail(btn) {
+        const bar = btn.closest('.fimi-summary-bar');
+        if (!bar) return;
+        const list = bar.querySelector('.fimi-summary-claims');
+        if (!list) return;
+        const open = list.style.display !== 'none';
+        list.style.display = open ? 'none' : '';
+        btn.textContent = open ? 'Narrative anzeigen' : 'Narrative verbergen';
+    },
+
     /** Filter the incident's source overview by source type (Web/Telegram/X). */
     filterSourceOverview(type, chipEl) {
         const content = document.getElementById('source-overview-content');
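Review note on `_loadFimiData`: the `currentIncidentId` check drops a response that arrives after the user has switched incidents. The diff does not show whether `_currentFimiMatches` is cleared when a new incident is opened, so on a failed request (the early `return` in the `catch`) the previous incident's matches might stay in place. A minimal sketch of one way to combine both concerns, assuming a hypothetical `createIncidentView` helper and placeholder fetch functions that are not part of this change:

```javascript
// Hypothetical helper, not part of the diff: keeps FIMI state per open incident
// and refuses to apply responses that belong to an older selection.
function createIncidentView() {
  const state = { currentIncidentId: null, matches: {}, summary: null };
  return {
    state,
    // Call when the user opens an incident. Resetting here means data from
    // the previously open incident cannot be rendered while requests run.
    select(incidentId) {
      state.currentIncidentId = incidentId;
      state.matches = {};
      state.summary = null;
    },
    // Apply a finished response only if it still matches the open incident.
    apply(incidentId, matches, summary) {
      if (state.currentIncidentId !== incidentId) return false; // stale response
      state.matches = matches || {};
      state.summary = summary || null;
      return true;
    },
  };
}

// fetchMatches / fetchSummary are placeholders for API.getFimiMatches and
// API.getFimiSummary; both requests run in parallel as in the diff.
async function loadFimiData(view, incidentId, fetchMatches, fetchSummary) {
  const [m, s] = await Promise.all([fetchMatches(incidentId), fetchSummary(incidentId)]);
  return view.apply(incidentId, (m && m.matches_by_article) || {}, s);
}
```

With this shape, `select()` would be called at the point where the incident is opened, and the render step would run only when `apply()` returns `true`.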
@@ -1009,10 +1050,16 @@ const App = {
             const inner = a.source_url
                 ? `<a href="${UI.escape(a.source_url)}" target="_blank" rel="noopener">${headline}</a>`
                 : headline;
-            return `<li>
+            // Integration point 1: FIMI hint if this article spreads a debunked
+            // claim. No match -> no row, no clutter.
+            const fimiMatches = (this._currentFimiMatches || {})[String(a.id)];
+            const fimiHint = (fimiMatches && typeof UI.renderFimiHint === 'function')
+                ? UI.renderFimiHint(fimiMatches) : '';
+            return `<li${fimiMatches ? ' class="has-fimi-hint"' : ''}>
                 ${numHtml}
                 <span class="source-overview-detail-date">${UI.escape(dateStr)}</span>
                 <span class="source-overview-detail-headline">${inner}</span>
+                ${fimiHint}
             </li>`;
         }).join('');
         detail.innerHTML = `<ul class="source-overview-detail-list">${items}</ul>`;
@@ -1058,8 +1058,14 @@ const UI = {
             const langs = (s.languages || ['de']).map(l => (l || 'de').toUpperCase()).join('/');
             const sourceName = this.escape(s.source || 'Unbekannt');
             const sType = s.source_type || 'web';
-            html += `<div class="source-overview-item" data-source="${sourceName}" data-type="${sType}" tabindex="0" role="button" aria-expanded="false" onclick="App.toggleSourceOverviewDetail(this)" onkeydown="if(event.key==='Enter'||event.key===' '){event.preventDefault();App.toggleSourceOverviewDetail(this);}">
+            // Integration point 2: empirical track record. Only on matches, subtle.
+            const fimiN = s.fimi_match_count || 0;
+            const fimiBadge = fimiN > 0
+                ? `<span class="fimi-source-badge" title="${fimiN} ${fimiN === 1 ? 'Artikel dieser Quelle deckt' : 'Artikel dieser Quelle decken'} sich mit einer bei EUvsDisinfo widerlegten Falschbehauptung">${fimiN} FIMI</span>`
+                : '';
+            html += `<div class="source-overview-item${fimiN > 0 ? ' has-fimi' : ''}" data-source="${sourceName}" data-type="${sType}" tabindex="0" role="button" aria-expanded="false" onclick="App.toggleSourceOverviewDetail(this)" onkeydown="if(event.key==='Enter'||event.key===' '){event.preventDefault();App.toggleSourceOverviewDetail(this);}">
                 <span class="source-overview-name">${sourceName}</span>
+                ${fimiBadge}
                 <span class="source-overview-lang">${langs}</span>
                 <span class="source-overview-count">${s.article_count}</span>
             </div>`;
@@ -1069,6 +1075,79 @@ const UI = {
         return html;
     },
 
+    /**
+     * Integration point 1: subtle inline hint on an article that matches a
+     * false claim debunked by EUvsDisinfo. Provenance guardrail: names the
+     * source (EUvsDisinfo), links the case, and makes no judgement of its
+     * own. matches: array from the fimi-matches endpoint.
+     */
+    renderFimiHint(matches) {
+        if (!matches || matches.length === 0) return '';
+        const n = matches.length;
+        const top = matches[0];
+        const claimText = this.escape(top.claim_text || '');
+        const passage = top.passage ? this.escape(top.passage) : '';
+        let tip = `Bei EUvsDisinfo als widerlegt geführte Behauptung: ${claimText}`;
+        if (passage) tip += ` | Im Artikel: ${passage}`;
+        tip += ' | Quelle der Einordnung: EUvsDisinfo (EEAS East StratCom Task Force), keine offizielle EU-Position.';
+        const label = n === 1
+            ? 'Deckt sich mit einer von EUvsDisinfo widerlegten Falschbehauptung'
+            : `Deckt sich mit ${n} von EUvsDisinfo widerlegten Falschbehauptungen`;
+        const link = top.case_url
+            ? `<a href="${this.escape(top.case_url)}" target="_blank" rel="noopener" class="fimi-hint-link" onclick="event.stopPropagation()">Beleg ansehen</a>`
+            : '';
+        return `<div class="fimi-hint" title="${tip}">
+            <span class="fimi-hint-icon" aria-hidden="true">⚠</span>
+            <span class="fimi-hint-text">${label}</span>
+            ${link}
+        </div>`;
+    },
+
+    /**
+     * Integration point 3: quality axis for the briefing. Condenses the
+     * individual matches to incident level. With 0 matches a calm all-clear,
+     * otherwise a restrained notice bar with expandable narratives.
+     */
+    renderFimiSummaryBar(s) {
+        if (!s || !s.articles_checked) return '';
+        const matched = s.articles_with_match || 0;
+        const checked = s.articles_checked || 0;
+        const distinct = s.distinct_claims || 0;
+        if (matched === 0) {
+            return `<div class="fimi-summary-bar fimi-summary-bar--clear">
+                <span class="fimi-summary-icon" aria-hidden="true">✓</span>
+                <span>Keine bekannten Falschbehauptungen unter ${checked} geprüften Artikeln.</span>
+            </div>`;
+        }
+        const topClaims = (s.top_claims || []).slice(0, 6);
+        const claimList = topClaims.map(c => {
+            const txt = this.escape(c.claim_text || '');
+            const link = c.case_url
+                ? `<a href="${this.escape(c.case_url)}" target="_blank" rel="noopener" class="fimi-hint-link">Beleg</a>`
+                : '';
+            return `<li><span class="fimi-claim-count">${c.article_count}×</span> <span class="fimi-claim-text">${txt}</span> ${link}</li>`;
+        }).join('');
+        return `<div class="fimi-summary-bar fimi-summary-bar--alert">
+            <div class="fimi-summary-head">
+                <span class="fimi-summary-icon" aria-hidden="true">⚠</span>
+                <span class="fimi-summary-lead"><strong>${matched}</strong> von ${checked} geprüften Artikeln decken sich mit <strong>${distinct}</strong> bei EUvsDisinfo widerlegten Falschbehauptungen.</span>
+                <button type="button" class="fimi-summary-toggle" onclick="App.toggleFimiDetail(this)">Narrative anzeigen</button>
+            </div>
+            <ul class="fimi-summary-claims" style="display:none;">${claimList}</ul>
+            ${this.fimiDisclaimerHtml()}
+        </div>`;
+    },
+
+    /**
+     * Mandatory source notice for EUvsDisinfo classifications. Subtle (small
+     * grey footer) but present: attribution to the EEAS East StratCom
+     * Task Force + the official disclaimer that this is not an official
+     * EU position (wording of the EUvsDisinfo publications).
+     */
+    fimiDisclaimerHtml() {
+        return `<div class="fimi-disclaimer">Einordnungen aus der <a href="https://euvsdisinfo.eu/" target="_blank" rel="noopener">EUvsDisinfo</a>-Datenbank des Europäischen Auswärtigen Dienstes (EEAS East StratCom Task Force). Sie beruhen auf Medienbeobachtung und Analyse der Task Force und stellen keine offizielle Position der EU dar.</div>`;
+    },
+
     renderSourceOverview(articles) {
         if (!articles || articles.length === 0) return '';
 
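Review note on `renderFimiHint`: the tooltip is interpolated into `title="${tip}"`, so it is only safe if `UI.escape` also escapes double quotes. The implementation of `UI.escape` is not part of this diff. The sketch below uses a hypothetical `escapeHtml` stand-in and a hypothetical `buildHintTitle` helper with placeholder English labels to show the attribute-safe variant, where the whole tooltip is escaped once after it has been assembled:

```javascript
// Hypothetical stand-in for UI.escape (its real implementation is not shown
// in the diff). Escapes the five characters relevant for HTML text and
// double- or single-quoted attribute values.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;') // first, so the entities below are not re-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Assembles the tooltip from raw (unescaped) parts and escapes the result
// once, so static and dynamic segments are treated the same way.
function buildHintTitle(claimText, passage) {
  let tip = `Debunked claim: ${claimText || ''}`;
  if (passage) tip += ` | In article: ${passage}`;
  return escapeHtml(tip);
}

// Example: a claim containing quotes cannot break out of the title attribute.
const hintHtml = `<div class="fimi-hint" title="${buildHintTitle('He said "x"', '<b>quote</b>')}"></div>`;
```

If `UI.escape` already covers quotes, the code in the diff behaves the same way and no change is needed.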