Use public websites as a knowledge source to help Ravenna AI answer questions based on documentation, help centers, or other web content.

Setup

1. Select Website as your source
When adding documents to a knowledge folder, choose Website from the available sources.

2. Enter the URL
Provide the URL of the webpage you want to import.

3. Import content
Ravenna will scrape and import the content from the specified URL.

How it works

Ravenna uses browser rendering to scrape website content, ensuring accurate extraction of:
  • Main page content and text
  • Formatted content (headings, lists, paragraphs)
  • Publicly accessible information
  • JavaScript-rendered content
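Ravenna's scraper is internal to the product, but the browser-rendering approach described above can be sketched with a headless browser library such as Playwright. The library choice and the scrapePage helper below are illustrative assumptions, not Ravenna's implementation:

```typescript
// Illustrative sketch only: render a page in a headless browser so that
// JavaScript-generated content exists before the text is extracted.
// Playwright is an assumption here; Ravenna's actual scraper is not public.
import { chromium } from "playwright";

async function scrapePage(url: string): Promise<string> {
  const browser = await chromium.launch();
  try {
    const page = await browser.newPage();
    // Wait until network activity settles so client-rendered content loads.
    await page.goto(url, { waitUntil: "networkidle" });
    // Extract the visible text of the page body (headings, lists, paragraphs).
    return await page.innerText("body");
  } finally {
    await browser.close();
  }
}
```

A plain HTTP fetch would miss content that only appears after client-side scripts run; driving a real browser is what makes JavaScript-rendered pages extractable.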
By default, Ravenna imports content from the single URL you provide. This is ideal for:
  • Specific documentation pages
  • Help articles
  • FAQ pages
  • Individual blog posts
Enable crawling to import entire documentation sites automatically:
  • Automatic discovery: Ravenna follows links within the same domain to discover and import connected pages
  • Breadth-first crawling: Pages are crawled level by level, ensuring comprehensive coverage of your site structure
  • Page limit: Ravenna crawls a maximum of 100 pages to prevent overloading your website
  • Folder structure: Crawled sites are organized under a root folder named after the domain
  • Same-domain only: Crawling stays within the original domain to avoid importing external content
Crawling respects the same domain as the starting URL. External links are automatically filtered out.
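Conceptually, the crawl behavior above (breadth-first, same-domain only, capped at 100 pages) is a standard breadth-first search over links. The sketch below illustrates that idea; it reuses the hypothetical Playwright setup from the earlier sketch and does not reflect Ravenna's actual code:

```typescript
// Conceptual breadth-first crawl: same domain only, at most 100 pages.
// Assumes a headless-browser setup like the scrapePage sketch above;
// none of this reflects Ravenna's internal implementation.
import { chromium } from "playwright";

const PAGE_LIMIT = 100;

async function crawlSite(startUrl: string): Promise<Map<string, string>> {
  const origin = new URL(startUrl).origin;
  const visited = new Map<string, string>(); // url -> extracted text
  const queue: string[] = [startUrl];         // FIFO queue gives level-by-level order

  const browser = await chromium.launch();
  try {
    while (queue.length > 0 && visited.size < PAGE_LIMIT) {
      const url = queue.shift()!;
      if (visited.has(url)) continue;

      const page = await browser.newPage();
      await page.goto(url, { waitUntil: "networkidle" });
      visited.set(url, await page.innerText("body"));

      // Collect links, keep only those on the same domain, and enqueue them.
      const links = await page.$$eval("a[href]", (anchors) =>
        anchors.map((a) => (a as HTMLAnchorElement).href)
      );
      for (const link of links) {
        if (new URL(link).origin === origin && !visited.has(link)) {
          queue.push(link);
        }
      }
      await page.close();
    }
  } finally {
    await browser.close();
  }
  return visited;
}
```

Using a queue (first in, first out) is what makes the crawl breadth-first: every page at one link depth is processed before any page at the next depth, and the same-domain check filters out external links before they are ever queued.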

Requirements

  • The website must be publicly accessible (no authentication required)
  • Content must be available without login or paywalls

Managing imported content

After import, you can:
  • Archive pages to exclude them from AI responses
  • Move pages to a different knowledge folder

Auto sync

When auto-sync is enabled, Ravenna keeps your website content up-to-date:
  • Page content is re-scraped during sync to capture updates
  • Changes to the webpage are reflected in your knowledge base
  • If the page becomes unavailable, the sync will fail and you’ll be notified
Auto-sync for websites re-scrapes the same URL to check for updates. It does not discover new pages or follow links.
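At a conceptual level, a sync pass for a website source re-fetches the same URL, replaces the stored text when it has changed, and surfaces an error if the page can no longer be reached. The sketch below illustrates that flow with a content hash; the types, helper names, and return values are assumptions, not Ravenna's API:

```typescript
// Illustrative sync pass for a single website source (not Ravenna's code).
// Re-scrapes the same URL, compares a content hash to detect changes, and
// reports failure if the page is no longer reachable. No new pages are
// discovered: only the original URL is revisited.
import { createHash } from "node:crypto";

interface WebsiteSource {
  url: string;
  contentHash: string; // hash of the last imported text
  content: string;
}

async function syncWebsiteSource(
  source: WebsiteSource,
  scrapePage: (url: string) => Promise<string> // e.g. the earlier Playwright sketch
): Promise<"updated" | "unchanged" | "failed"> {
  let text: string;
  try {
    text = await scrapePage(source.url);
  } catch {
    // Page unavailable: the sync fails and the owner should be notified.
    return "failed";
  }

  const hash = createHash("sha256").update(text).digest("hex");
  if (hash === source.contentHash) return "unchanged";

  // The page changed since the last sync: store the fresh content.
  source.content = text;
  source.contentHash = hash;
  return "updated";
}
```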