diff --git a/CLAUDE.md b/CLAUDE.md
index f256938..d77b57b 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -213,6 +213,8 @@ CHANGE_LOG:
  - "STRUCTURE: active vs. legacy CSS/JS separated; videos/, downloads/, insights/, accountforger-video.html, and further top-level files added"
  - "Situation layout: cyberangriffe and deepfakes include the Lagebild CSS/JS from iran-konflikt (central assets)"
  - "SERVICES: sync-lagebild and insights added"
+ - "SEO stage 1: noindex/nofollow removed from 12 indexable pages; robots.txt activated (live-search AI bots allowed, training bots blocked); sitemap.xml deployed"
+ - "Situation pages: description, canonical, Open Graph, Twitter Card, Schema.org Article added; topic default in #incident-title as crawler fallback"
 
 Last-Updated: 2026-05-10
diff --git a/datenschutz.html b/datenschutz.html
index 3393f20..6601ba6 100644
--- a/datenschutz.html
+++ b/datenschutz.html
@@ -6,7 +6,6 @@
Cyberattacks on German Infrastructure
Legal Status of Deepfakes in Germany
Iran Conflict
diff --git a/impressum.html b/impressum.html
index 9a94deb..4759e31 100644
--- a/impressum.html
+++ b/impressum.html
@@ -6,7 +6,6 @@
Cyberattacks on German Infrastructure
Legal Status of Deepfakes in Germany
Iran Conflict
diff --git a/robots.txt b/robots.txt
index ca4047a..6807159 100644
--- a/robots.txt
+++ b/robots.txt
@@ -1,94 +1,23 @@
-# robots.txt for AegisSight UG
-# Block ALL web crawlers and bots from the entire site
+# robots.txt - AegisSight UG
+# Crawling generally allowed, except for API/internal paths
+# No use as training data by AI crawlers (training bots blocked)
+# Live-search AI bots (OAI-SearchBot, ChatGPT-User, ClaudeBot, PerplexityBot) are allowed
 
-# Block all bots
 User-agent: *
-Disallow: /
-Crawl-delay: 86400
+Allow: /
+Disallow: /api/
+Disallow: /_archiv/
+Disallow: /insights/
 
-# Specifically block major search engines
-User-agent: Googlebot
-Disallow: /
+# Sitemap
+Sitemap: https://aegis-sight.de/sitemap.xml
 
-User-agent: Bingbot
-Disallow: /
-
-User-agent: Slurp
-Disallow: /
-
-User-agent: DuckDuckBot
-Disallow: /
-
-User-agent: Baiduspider
-Disallow: /
-
-User-agent: YandexBot
-Disallow: /
-
-# Block social media crawlers
-User-agent: facebookexternalhit
-Disallow: /
-
-User-agent: Twitterbot
-Disallow: /
-
-User-agent: LinkedInBot
-Disallow: /
-
-User-agent: WhatsApp
-Disallow: /
-
-User-agent: TelegramBot
-Disallow: /
-
-# Block SEO and analysis bots
-User-agent: AhrefsBot
-Disallow: /
-
-User-agent: SemrushBot
-Disallow: /
-
-User-agent: DotBot
-Disallow: /
-
-User-agent: MJ12bot
-Disallow: /
-
-User-agent: SEOkicks-Robot
-Disallow: /
-
-User-agent: SeznamBot
-Disallow: /
-
-User-agent: MauiBot
-Disallow: /
-
-User-agent: Majestic-12
-Disallow: /
-
-User-agent: Majestic-SEO
-Disallow: /
-
-# Block archiving bots
-User-agent: ia_archiver
-Disallow: /
-
-User-agent: Wayback Machine
-Disallow: /
-
-User-agent: SiteSnagger
-Disallow: /
-
-User-agent: WebCopier
-Disallow: /
-
-# Block AI/ML crawlers
+# ----------------------------------------------------------------------
+# AI training crawlers -- BLOCKED (no training on our content)
+# ----------------------------------------------------------------------
 User-agent: GPTBot
 Disallow: /
 
-User-agent: ChatGPT-User
-Disallow: /
-
 User-agent: CCBot
 Disallow: /
 
@@ -98,15 +27,86 @@ Disallow: /
 User-agent: Claude-Web
 Disallow: /
 
-# Block download managers
-User-agent: wget
+User-agent: Google-Extended
 Disallow: /
 
-User-agent: curl
+User-agent: Applebot-Extended
 Disallow: /
 
+User-agent: Meta-ExternalAgent
+Disallow: /
+
+User-agent: Bytespider
+Disallow: /
+
+User-agent: cohere-ai
+Disallow: /
+
+User-agent: FacebookBot
+Disallow: /
+
+User-agent: ImagesiftBot
+Disallow: /
+
+User-agent: Diffbot
+Disallow: /
+
+User-agent: Omgilibot
+Disallow: /
+
+# ----------------------------------------------------------------------
+# AI live-search crawlers -- ALLOWED (visibility in AI answers)
+# OAI-SearchBot, ChatGPT-User, ClaudeBot, PerplexityBot are NOT
+# blocked. They crawl for live answers, not for training.
+# ----------------------------------------------------------------------
+
+# ----------------------------------------------------------------------
+# Archive bots
+# ----------------------------------------------------------------------
+User-agent: ia_archiver
+Disallow: /
+
+User-agent: archive.org_bot
+Disallow: /
+
+# ----------------------------------------------------------------------
+# SEO/spam crawlers
+# ----------------------------------------------------------------------
+User-agent: AhrefsBot
+Disallow: /
+
+User-agent: SemrushBot
+Disallow: /
+
+User-agent: MJ12bot
+Disallow: /
+
+User-agent: DotBot
+Disallow: /
+
+User-agent: SEOkicks-Robot
+Disallow: /
+
+User-agent: MauiBot
+Disallow: /
+
+User-agent: Majestic-12
+Disallow: /
+
+User-agent: BLEXBot
+Disallow: /
+
+User-agent: SerendeputyBot
+Disallow: /
+
+# ----------------------------------------------------------------------
+# Download managers
+# ----------------------------------------------------------------------
 User-agent: HTTrack
 Disallow: /
 
-# No sitemap provided
-# No crawl permissions granted
\ No newline at end of file
+User-agent: SiteSnagger
+Disallow: /
+
+User-agent: WebCopier
+Disallow: /
diff --git a/sitemap.xml b/sitemap.xml
new file mode 100644
index 0000000..9183821
--- /dev/null
+++ b/sitemap.xml
@@ -0,0 +1,100 @@
+
+