SEO stage 1: enable indexing + meta tags for situation pages (Lagen)
- Removed noindex/nofollow from 12 indexable pages (main pages DE/EN, 3 situation pages DE/EN, legal DE/EN)
- robots.txt made live: crawling generally allowed; live-search AI bots (OAI-SearchBot, ChatGPT-User, ClaudeBot, PerplexityBot) allowed; training bots (GPTBot, CCBot, anthropic-ai, Google-Extended, Applebot-Extended, Bytespider, ...) blocked
- sitemap.xml: content taken over from sitemap-launch.xml and referenced via a Sitemap line in robots.txt
- Situation pages (3 DE + 3 EN): added description, canonical, Open Graph, Twitter Card and Schema.org Article
- Situation-page hero: default topic text in <p id="incident-title"> as a crawler fallback (JS overwrites it with the date on load)
- CLAUDE.md: CHANGE_LOG updated
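The head elements listed for the situation pages can be illustrated with a small generator. This is a minimal sketch under stated assumptions, not the site's actual template: `situation_page_head` is a hypothetical helper, the `twitter:card` type `summary` and the chosen Schema.org Article properties are illustrative, and the example URL is invented. Attribute values are HTML-escaped, and the JSON-LD is produced with `json.dumps` so quoting is always valid.

```python
import html
import json


def situation_page_head(title: str, description: str, url: str, lang: str) -> str:
    """Render description, canonical, Open Graph, Twitter Card and a
    Schema.org Article (JSON-LD) block for one page.

    Hypothetical helper for illustration; not the site's actual template.
    """
    # Escape values for use inside double-quoted HTML attributes.
    t, d, u = (html.escape(s, quote=True) for s in (title, description, url))
    article = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": title,
        "description": description,
        "inLanguage": lang,
        "mainEntityOfPage": url,
    }
    # Escape "</" so the JSON cannot terminate the <script> element early.
    ld_json = json.dumps(article, ensure_ascii=False).replace("</", "<\\/")
    return "\n".join([
        f'<meta name="description" content="{d}">',
        f'<link rel="canonical" href="{u}">',
        '<meta property="og:type" content="article">',
        f'<meta property="og:title" content="{t}">',
        f'<meta property="og:description" content="{d}">',
        f'<meta property="og:url" content="{u}">',
        '<meta name="twitter:card" content="summary">',  # card type is an assumption
        f'<meta name="twitter:title" content="{t}">',
        f'<meta name="twitter:description" content="{d}">',
        f'<script type="application/ld+json">{ld_json}</script>',
    ])


if __name__ == "__main__":
    # Hypothetical values; real page titles and URLs are not part of this commit.
    print(situation_page_head(
        "Example situation report",
        "Short summary of the current situation.",
        "https://aegis-sight.de/example-situation",
        "en",
    ))
```

Keeping the canonical URL and `og:url` fed from the same argument avoids the two drifting apart between the DE and EN variants of a page.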
178
robots.txt
@@ -1,94 +1,23 @@
# robots.txt for AegisSight UG
# Block ALL web crawlers and bots from the entire site
# robots.txt - AegisSight UG
# Crawling generally allowed, except API/internal paths
# No use of our content as training data by AI crawlers (training bots blocked)
# Live-search AI bots (OAI-SearchBot, ChatGPT-User, ClaudeBot, PerplexityBot) are allowed

# Block all bots
User-agent: *
Disallow: /
Crawl-delay: 86400
Allow: /
Disallow: /api/
Disallow: /_archiv/
Disallow: /insights/

# Specifically block major search engines
User-agent: Googlebot
Disallow: /
# Sitemap
Sitemap: https://aegis-sight.de/sitemap.xml

User-agent: Bingbot
Disallow: /

User-agent: Slurp
Disallow: /

User-agent: DuckDuckBot
Disallow: /

User-agent: Baiduspider
Disallow: /

User-agent: YandexBot
Disallow: /

# Block social media crawlers
User-agent: facebookexternalhit
Disallow: /

User-agent: Twitterbot
Disallow: /

User-agent: LinkedInBot
Disallow: /

User-agent: WhatsApp
Disallow: /

User-agent: TelegramBot
Disallow: /

# Block SEO and analysis bots
User-agent: AhrefsBot
Disallow: /

User-agent: SemrushBot
Disallow: /

User-agent: DotBot
Disallow: /

User-agent: MJ12bot
Disallow: /

User-agent: SEOkicks-Robot
Disallow: /

User-agent: SeznamBot
Disallow: /

User-agent: MauiBot
Disallow: /

User-agent: Majestic-12
Disallow: /

User-agent: Majestic-SEO
Disallow: /

# Block archiving bots
User-agent: ia_archiver
Disallow: /

User-agent: Wayback Machine
Disallow: /

User-agent: SiteSnagger
Disallow: /

User-agent: WebCopier
Disallow: /

# Block AI/ML crawlers
# ----------------------------------------------------------------------
# AI training crawlers -- BLOCKED (no training on our content)
# ----------------------------------------------------------------------
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: CCBot
Disallow: /

@@ -98,15 +27,86 @@ Disallow: /
User-agent: Claude-Web
Disallow: /

# Block download managers
User-agent: wget
User-agent: Google-Extended
Disallow: /

User-agent: curl
User-agent: Applebot-Extended
Disallow: /

User-agent: Meta-ExternalAgent
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: ImagesiftBot
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: Omgilibot
Disallow: /

# ----------------------------------------------------------------------
# AI live-search crawlers -- ALLOWED (visibility in AI answers)
# OAI-SearchBot, ChatGPT-User, ClaudeBot, PerplexityBot are NOT
# blocked. They crawl for live answers, not for training.
# ----------------------------------------------------------------------

# ----------------------------------------------------------------------
# Archive bots
# ----------------------------------------------------------------------
User-agent: ia_archiver
Disallow: /

User-agent: archive.org_bot
Disallow: /

# ----------------------------------------------------------------------
# SEO/spam crawlers
# ----------------------------------------------------------------------
User-agent: AhrefsBot
Disallow: /

User-agent: SemrushBot
Disallow: /

User-agent: MJ12bot
Disallow: /

User-agent: DotBot
Disallow: /

User-agent: SEOkicks-Robot
Disallow: /

User-agent: MauiBot
Disallow: /

User-agent: Majestic-12
Disallow: /

User-agent: BLEXBot
Disallow: /

User-agent: SerendeputyBot
Disallow: /

# ----------------------------------------------------------------------
# Download managers
# ----------------------------------------------------------------------
User-agent: HTTrack
Disallow: /

# No sitemap provided
# No crawl permissions granted
User-agent: SiteSnagger
Disallow: /

User-agent: WebCopier
Disallow: /

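The new policy can be spot-checked offline. Python's standard `urllib.robotparser` applies the first matching rule in file order, so with `Allow: /` listed before `Disallow: /api/` it would report `/api/` paths as fetchable, whereas RFC 9309 (and Googlebot) pick the longest matching path. The sketch below therefore implements a minimal RFC 9309-style evaluator by hand. Assumptions: it handles plain path prefixes only (no `*` or `$` wildcards, which this file does not use), `ROBOTS_EXCERPT` is an abridged excerpt of the new file rather than its full text, and `parse_robots` / `is_allowed` are hypothetical helper names, not a library API.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Abridged excerpt of the new robots.txt (the full file has more groups).
ROBOTS_EXCERPT = """\
User-agent: *
Allow: /
Disallow: /api/
Disallow: /_archiv/
Disallow: /insights/

Sitemap: https://aegis-sight.de/sitemap.xml

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /
"""


@dataclass
class Group:
    agents: list[str] = field(default_factory=list)  # lower-cased product tokens
    rules: list[tuple[bool, str]] = field(default_factory=list)  # (is_allow, path prefix)


def parse_robots(text: str) -> tuple[list[Group], list[str]]:
    groups: list[Group] = []
    sitemaps: list[str] = []
    current: Group | None = None
    in_agent_block = False  # True while consecutive User-agent lines are read
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if not in_agent_block:  # a User-agent line after rules opens a new group
                current = Group()
                groups.append(current)
            current.agents.append(value.lower())
            in_agent_block = True
        elif key in ("allow", "disallow") and current is not None:
            if value:  # an empty Disallow matches nothing
                current.rules.append((key == "allow", value))
            in_agent_block = False
        elif key == "sitemap":
            sitemaps.append(value)
    return groups, sitemaps


def is_allowed(groups: list[Group], agent: str, path: str) -> bool:
    """Case-insensitive product-token group match with '*' fallback;
    the longest matching prefix wins and Allow wins ties."""
    if path == "/robots.txt":
        return True  # RFC 9309: robots.txt itself is implicitly allowed
    token = agent.lower()
    matched = [g for g in groups if token in g.agents]
    if not matched:
        matched = [g for g in groups if "*" in g.agents]
    best: tuple[int, bool] | None = None
    for group in matched:
        for allow, prefix in group.rules:
            if path.startswith(prefix):
                candidate = (len(prefix), allow)
                if best is None or candidate > best:
                    best = candidate
    return True if best is None else best[1]


if __name__ == "__main__":
    groups, sitemaps = parse_robots(ROBOTS_EXCERPT)
    for agent, path in [("Googlebot", "/"), ("Googlebot", "/api/status"),
                        ("GPTBot", "/"), ("OAI-SearchBot", "/")]:
        print(agent, path, is_allowed(groups, agent, path))
```

Because only training bots get their own groups, live-search bots such as OAI-SearchBot fall through to the `*` group and inherit its rules, including the `/api/` exclusion.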