Add Knowledge Source

curl -X POST https://api.magpipe.ai/functions/v1/knowledge-source-add \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/help/faq",
    "sync_period": "7d"
  }'

{
  "id": "f7a8b9c0-d1e2-3f4a-5b6c-789012def345",
  "url": "https://example.com/help/faq",
  "title": "Frequently Asked Questions - Example Inc",
  "description": "Find answers to common questions about our products and services.",
  "sync_period": "7d",
  "sync_status": "completed",
  "chunk_count": 24,
  "last_synced_at": "2024-01-15T10:35:00Z",
  "next_sync_at": "2024-01-22T10:35:00Z",
  "created_at": "2024-01-15T10:30:00Z"
}

POST /functions/v1/knowledge-source-add

Add a webpage URL as a knowledge source. The content is scraped, split into chunks, and embedded as vectors for retrieval-augmented generation (RAG). Your AI agents can then reference this knowledge when answering questions.
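For callers who prefer Python to curl, the request from the example can be sketched with the standard library alone. This is a minimal sketch, not an official client: the helper name build_add_source_request is hypothetical, and the function only builds the request so it can be inspected without a real API key.

```python
import json
import urllib.request

API_URL = "https://api.magpipe.ai/functions/v1/knowledge-source-add"


def build_add_source_request(api_key: str, url: str,
                             sync_period: str = "7d") -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example.

    Hypothetical helper for illustration; not part of any official SDK.
    """
    body = json.dumps({"url": url, "sync_period": sync_period}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_add_source_request("YOUR_API_KEY", "https://example.com/help/faq")
print(req.get_method(), req.full_url)

# Sending requires a valid key, so the call is left commented out:
# with urllib.request.urlopen(req) as resp:
#     source = json.load(resp)
```

Keeping construction separate from sending makes it easy to log or unit-test the exact body before any network call is made.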

Request Body

url

string

required

URL to scrape and add to the knowledge base. Must be publicly accessible, or the request must include auth_headers. Example: https://example.com/faq

sync_period

string

default:"7d"

How often to re-sync content from the URL. Options: 24h, 7d, 1mo, 3mo

crawl_mode

string

default:"single"

How much of the website to crawl. Options:

single - Fetch one page only (immediate)
sitemap - Crawl all pages in sitemap.xml (async)
recursive - Follow links from starting URL (async)

max_pages

integer

default:"100"

Maximum pages to crawl (for sitemap/recursive modes). Range: 1-500.

crawl_depth

integer

default:"3"

How deep to follow links (recursive mode only). Range: 1-5.

respect_robots_txt

boolean

default:"true"

Whether to honor robots.txt crawl restrictions.

auth_headers

object

Authentication headers for protected pages. Example (Bearer):

{
  "Authorization": "Bearer YOUR_API_KEY"
}

Example (Basic):

{
  "Authorization": "Basic dXNlcm5hbWU6cGFzc3dvcmQ="
}
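The documented options and ranges can be checked on the client before a request is sent. The sketch below is one possible approach, assuming nothing beyond the parameters listed above: build_payload and basic_auth_header are hypothetical helper names, and leaving out max_pages and crawl_depth for modes that do not use them is a choice made here, not documented API behavior.

```python
import base64

SYNC_PERIODS = {"24h", "7d", "1mo", "3mo"}
CRAWL_MODES = {"single", "sitemap", "recursive"}


def basic_auth_header(username: str, password: str) -> dict:
    """Return an auth_headers object for HTTP Basic authentication."""
    raw = f"{username}:{password}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


def build_payload(url, sync_period="7d", crawl_mode="single", max_pages=100,
                  crawl_depth=3, respect_robots_txt=True, auth_headers=None):
    """Validate arguments against the documented values and return the body."""
    if sync_period not in SYNC_PERIODS:
        raise ValueError(f"sync_period must be one of {sorted(SYNC_PERIODS)}")
    if crawl_mode not in CRAWL_MODES:
        raise ValueError(f"crawl_mode must be one of {sorted(CRAWL_MODES)}")
    if not 1 <= max_pages <= 500:
        raise ValueError("max_pages must be between 1 and 500")
    if not 1 <= crawl_depth <= 5:
        raise ValueError("crawl_depth must be between 1 and 5")

    payload = {
        "url": url,
        "sync_period": sync_period,
        "crawl_mode": crawl_mode,
        "respect_robots_txt": respect_robots_txt,
    }
    # Assumption: only send crawl limits for the modes that use them.
    if crawl_mode in ("sitemap", "recursive"):
        payload["max_pages"] = max_pages
    if crawl_mode == "recursive":
        payload["crawl_depth"] = crawl_depth
    if auth_headers:
        payload["auth_headers"] = auth_headers
    return payload


payload = build_payload(
    "https://example.com/docs",
    crawl_mode="recursive",
    max_pages=50,
    crawl_depth=2,
    auth_headers=basic_auth_header("username", "password"),
)
print(payload)
```

With the placeholder credentials username and password, basic_auth_header produces the same value as the Basic example above.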

Response

id

string

Unique knowledge source identifier.

url

string

The source URL.

title

string

Extracted page title.

description

string

Extracted meta description.

sync_status

string

Current sync status: pending, syncing, completed, failed.

chunk_count

integer

Number of text chunks created from the content.

next_sync_at

string

When the next automatic sync will occur (ISO 8601 timestamp, UTC).
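Since the sitemap and recursive modes are asynchronous, a client will usually want to check sync_status before relying on the new source. The following sketch parses the sample response and groups the four documented statuses. The summarize helper is hypothetical, and treating completed as "ready to use" and a missing next_sync_at as None are assumptions, not documented behavior.

```python
import json
from datetime import datetime

SAMPLE_RESPONSE = """
{
  "id": "f7a8b9c0-d1e2-3f4a-5b6c-789012def345",
  "url": "https://example.com/help/faq",
  "sync_period": "7d",
  "sync_status": "completed",
  "chunk_count": 24,
  "last_synced_at": "2024-01-15T10:35:00Z",
  "next_sync_at": "2024-01-22T10:35:00Z",
  "created_at": "2024-01-15T10:30:00Z"
}
"""


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp such as 2024-01-22T10:35:00Z."""
    # fromisoformat() accepts a trailing "Z" only on Python 3.11+, so normalize it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize(source: dict) -> dict:
    """Reduce a response object to the fields a caller typically branches on."""
    status = source["sync_status"]
    next_sync = source.get("next_sync_at")
    return {
        "id": source["id"],
        "ready": status == "completed",
        "in_progress": status in ("pending", "syncing"),
        "failed": status == "failed",
        "next_sync_at": parse_timestamp(next_sync) if next_sync else None,
    }


source = json.loads(SAMPLE_RESPONSE)
summary = summarize(source)
interval = summary["next_sync_at"] - parse_timestamp(source["last_synced_at"])
print(summary["ready"], interval)
```

In the sample, the gap between last_synced_at and next_sync_at matches the 7d sync period.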

How Knowledge Works

Scraping: Content is extracted from the URL, removing navigation and boilerplate
Chunking: Text is split into semantic chunks (~500 tokens each)
Embedding: Each chunk is converted to a vector embedding
Storage: Chunks are stored in a vector database (pgvector)
Retrieval: During calls/chats, relevant chunks are retrieved and included in agent context
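The chunk, embed, store, retrieve flow above can be illustrated with a toy pipeline. This is a conceptual sketch only and does not reflect the service's actual implementation: word-count splitting stands in for semantic chunking, a bag-of-words counter stands in for a learned embedding model, and an in-memory list stands in for pgvector. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_tokens: int = 500) -> list[str]:
    """Split text into chunks of at most max_tokens whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + max_tokens])
            for i in range(0, len(words), max_tokens)]


def embed(text: str) -> Counter:
    """Toy embedding: a sparse bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query: str, store: list[tuple[str, Counter]], top_k: int = 2) -> list[str]:
    """Return the top_k stored chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


page = ("Refunds are issued within 14 days of purchase. "
        "Shipping takes 3 to 5 business days. "
        "Support is available by email around the clock.")

chunks = chunk_text(page, max_tokens=8)       # chunking
store = [(c, embed(c)) for c in chunks]       # embedding + storage
best = retrieve("How long does shipping take?", store, top_k=1)  # retrieval
print(best)
```

A real deployment would use a learned embedding model and an approximate nearest-neighbor index, but the ranking step follows the same idea: score each stored chunk against the query and pass the best matches to the agent as context.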

Supported Content

HTML pages (blogs, FAQs, documentation)
PDF files (product manuals, guides)
Plain text files

Dynamic content loaded via JavaScript may not be captured. For single-page applications (SPAs), consider providing direct links to static content or using server-side rendered pages.
