How to Build a Knowledge Base Using the Web Crawler in Inquirly

The Web Crawler lets you create a Knowledge Base automatically from an existing public website. Instead of writing articles from scratch, you provide a URL and Inquirly crawls the site, extracts the content, and generates a structured collection of articles ready for review.

This is the fastest way to populate a Knowledge Base when you already have published documentation, a help center, or product pages on a public website.

Who This Is For

Role	How they use the crawler
Admins	Run crawls, manage generated collections, decide which content to keep
Content managers	Review and refine generated articles before publishing them
Agents	Use generated content as a reference when answering customers

You need permission to access the Knowledge Base and run the Web Crawler. The URL you provide must be a valid HTTP or HTTPS URL and must be publicly accessible — the crawler cannot access pages that require login.
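If you want to check a URL before pasting it into the Crawl Website dialog, the sketch below shows one way to test the HTTP/HTTPS requirement in Python. It is an illustration only and not part of Inquirly: the function name is_crawlable_url is hypothetical, and it checks only the form of the URL. It cannot tell whether a page sits behind a login, so open the URL in a private browser window to confirm it is publicly accessible.

```python
from urllib.parse import urlparse


def is_crawlable_url(url: str) -> bool:
    """Return True if url is an absolute HTTP or HTTPS URL with a host name.

    Hypothetical helper for illustration; it does not contact the site.
    """
    try:
        parsed = urlparse(url.strip())
    except ValueError:
        # urlparse rejects some malformed inputs, such as a broken IPv6 host.
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.hostname)


print(is_crawlable_url("https://help.yourcompany.com"))  # True
print(is_crawlable_url("ftp://files.yourcompany.com"))   # False
print(is_crawlable_url("help.yourcompany.com"))          # False (no scheme)
```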

Where to Find It

Go to Sidebar → Knowledge Base → Web Crawler.

Screenshot: Inquirly Knowledge Base navigation panel showing the Web Crawler option in the menu

How to Crawl a Website

Step 1: Start a new crawl

1. Open Knowledge Base from the left sidebar
2. Click Web Crawler
3. In the Crawl Website dialog, paste the URL you want to crawl, for example your documentation site or existing help center
4. Click Submit

Tip: Use a URL that points to your documentation or help center pages. Avoid marketing or landing pages — they produce low-quality articles that require heavy editing.

Step 2: Wait for the crawl to complete

After submitting, you will see a progress message such as “Preparing your knowledge base…” The crawl runs in the background. Time to completion depends on how much content is available on the site.

Screenshot: Inquirly Knowledge Base showing a completed crawl with a timestamp-named collection and a "Knowledge base ready!" notification

Step 3: Review the generated Knowledge Base

When the crawl finishes, Inquirly automatically creates:

A new Collection named with the crawl timestamp (example: Crawl – 2026-04-21 12:39:14)
A generated Knowledge Base inside that collection
Articles created from each crawled page

A “Knowledge base ready!” notification confirms the crawl is complete. Open the new collection from the left sidebar to begin reviewing.

Reviewing Generated Content

Before publishing or connecting the crawled content to Aily, do a quick review pass to confirm:

Key pages were captured: check that your most important documentation pages are present
Titles and headings are accurate: the crawler uses page headings to generate article titles, and some may need adjustment
Content is clean and readable: remove any navigation text, footer content, or repeated boilerplate that the crawler picked up alongside the main content
Login-required pages are accounted for: any page that requires authentication will not have been crawled, so note which ones you need to add manually

After review, you can edit, reorganize, rename, or delete articles exactly as you would with any other Knowledge Base content.
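Repeated boilerplate is easier to spot when you compare several articles at once. The sketch below is one possible approach, assuming you have copied the text of a few generated articles into a Python dictionary. The function name repeated_lines, the 80% threshold, and the sample text are all hypothetical, and nothing here uses an Inquirly API. Lines that show up in most articles are likely navigation or footer text worth removing.

```python
from collections import Counter


def repeated_lines(articles, min_share=0.8):
    """Return lines that occur in at least min_share of the articles.

    articles maps an article title to its plain text (assumed input format).
    """
    counts = Counter()
    for text in articles.values():
        # Count each distinct non-empty line once per article.
        counts.update({line.strip() for line in text.splitlines() if line.strip()})
    threshold = min_share * len(articles)
    return sorted(line for line, n in counts.items() if n >= threshold)


# Made-up sample text for illustration.
sample = {
    "Getting started": "Home | Docs | Contact\nCreate your account.\nAll rights reserved.",
    "Billing": "Home | Docs | Contact\nUpdate your payment method.\nAll rights reserved.",
    "Integrations": "Home | Docs | Contact\nConnect your tools.\nAll rights reserved.",
}
for line in repeated_lines(sample):
    print(line)
```

A lower min_share catches boilerplate that appears in only some sections, at the cost of flagging more legitimate repeated phrases.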

Managing Crawl Results Over Time

Each crawl creates a new, independent collection. This means you can re-run the crawler as your website changes and always have a fresh import alongside your existing content.

Recommended maintenance workflow:

1. Run a new crawl when significant website content changes
2. Review the new collection
3. Set the new collection as your active documentation set
4. Delete the previous crawl collection if it is no longer needed

If your workspace supports selecting a primary documentation source, use it to ensure the most current collection is the one your team and Aily rely on.
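Because crawl collections are named with their timestamp, the names alone tell you which collection is newest. The sketch below assumes the name format shown in the earlier example (Crawl – 2026-04-21 12:39:14) and a list of collection names that you have typed or copied out yourself. The helper names crawl_time and split_newest are hypothetical, and the actual format in your workspace may differ.

```python
import re
from datetime import datetime

# Assumed format, based on the example "Crawl – 2026-04-21 12:39:14".
# The character class accepts either an en dash or a plain hyphen.
NAME_PATTERN = re.compile(r"^Crawl\s*[–-]\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})$")


def crawl_time(name):
    """Return the crawl timestamp encoded in a collection name, or None."""
    match = NAME_PATTERN.match(name.strip())
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
    except ValueError:
        # Digits matched the pattern but do not form a real date or time.
        return None


def split_newest(names):
    """Return (newest crawl collection, older crawl collections)."""
    crawls = [n for n in names if crawl_time(n) is not None]
    crawls.sort(key=crawl_time, reverse=True)
    if not crawls:
        return None, []
    return crawls[0], crawls[1:]


names = [
    "Crawl – 2026-01-05 09:00:00",
    "Product FAQ",  # hand-written collection, ignored
    "Crawl – 2026-04-21 12:39:14",
]
newest, older = split_newest(names)
print(newest)  # Crawl – 2026-04-21 12:39:14
print(older)   # ['Crawl – 2026-01-05 09:00:00']
```

The older list is only a set of candidates for deletion. Check that nothing still depends on a collection before you remove it.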

Tips for Better Crawl Results

Target documentation URLs, not marketing pages. The crawler extracts whatever content it finds at the URL you provide. A documentation index page (for example, https://help.yourcompany.com) produces far more useful articles than a homepage or product marketing page.

Avoid pages that require login. The crawler can only access publicly available content. Protected pages will be skipped silently — if important content lives behind authentication, you will need to add those articles manually.

Scan a few articles before publishing. After every crawl, open three to five articles from different sections and read through them. This is faster than reviewing everything and gives you a representative picture of crawl quality before you share or publish the content.
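If you prefer not to pick the spot-check articles by hand, a random sample spread across sections works just as well. The sketch below assumes you have listed the generated article titles by section yourself. The function name pick_review_sample and the section and article names are hypothetical placeholders, not output from Inquirly.

```python
import random


def pick_review_sample(sections, count=5, seed=None):
    """Pick one article title from each of up to `count` different sections.

    sections maps a section name to a list of article titles (assumed input).
    """
    rng = random.Random(seed)
    # Skip empty sections and sort so a given seed always gives the same sample.
    non_empty = sorted(name for name, titles in sections.items() if titles)
    chosen = rng.sample(non_empty, k=min(count, len(non_empty)))
    return [(name, rng.choice(sections[name])) for name in chosen]


# Placeholder data for illustration.
sections = {
    "Getting started": ["Create an account", "Invite your team"],
    "Billing": ["Update payment method", "Download invoices"],
    "Integrations": ["Connect email", "Connect chat"],
    "Security": ["Enable two-factor authentication"],
    "Troubleshooting": [],
}
for section, title in pick_review_sample(sections, count=3):
    print(f"{section}: {title}")
```

Sampling one article per section keeps the check representative, which is the point of reading across different sections rather than the first few articles in the list.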

Re-crawl instead of editing extensively. If the source website changes significantly, a fresh crawl is faster than editing dozens of outdated articles. Use the previous crawl collection as a reference while the new one is being reviewed.
