- On the 'Sources' page, click on 'Scrape new domain'. A dialog box will appear with two options: enter a website URL directly, or enter the sitemap URL of the website.
- Enter the website URL or the sitemap URL and click submit. The platform will begin to scrape all the links of the website or sitemap.
<aside>
💡 For best results, add server-side rendered websites, as these are currently better supported by the platform. The platform also supports GitBook, Intercom articles, and various resource centers.
</aside>
- After scraping completes, all links from the website or sitemap are displayed in a table. The table shows details such as the page URL, character count, last trained on date, and status.
<aside>
💡 Note: The scraping process might take some time, depending on the size of the website or the number of links in the sitemap. The larger the site or the more extensive the sitemap, the longer it takes to fully scrape the domain.
</aside>
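Because scraping time grows with the number of links, it can help to check how many pages a sitemap lists before submitting it. The snippet below is a minimal sketch using only the Python standard library; it is not part of the platform, and the function name `sitemap_urls` and the sample URLs are hypothetical. It assumes a standard sitemap in the sitemaps.org 0.9 format and does not treat sitemap index files specially.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps (sitemaps.org protocol 0.9).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(xml_text: str) -> list[str]:
    """Return every <loc> value found in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.iter(f"{SITEMAP_NS}loc")
        if loc.text and loc.text.strip()
    ]


# Hypothetical sitemap content, for illustration only.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/docs/getting-started</loc></url>
</urlset>"""

urls = sitemap_urls(sample)
print(f"{len(urls)} pages listed in the sitemap")
```

Comparing the resulting count with the page limit of your plan gives a rough idea of whether the whole domain will fit.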
Once scraping is done, the table lists each link with the following columns:
- Page URL: The URL of the scraped page
- Character count: The total number of characters scraped from the page
- Last trained on: 'Empty' for a new domain. For existing domains, this shows the date when the bot was last trained using this page.
- Status: Indicates the current state of each individual source and its involvement in the bot training process.
- Actions: Used to manage the training status of the pages from the domain.
- Enable/Disable:
- 'Enable' appears if the page status is 'Skipped'. Clicking it changes the status to 'Draft'.
- 'Disable' appears if the page status is 'Draft' or 'Trained'. Clicking it removes the page from the bot's training process.
- Refresh: This action appears if the page status is 'Trained'. When you click 'Refresh', the platform rescrapes that page and retrains the bot with any new content from it. This keeps your bot up to date with the latest information from the source.
<aside>
💡 Remember, you can add as many domains as you like until you reach the page limit set by your chosen plan. With each new domain, your bot gains a wider knowledge base to pull from, enhancing its ability to assist your users.
</aside>
How status works
Each source used for training your bot displays a status indicating its current state in the training process. This guide will help you understand what each status means, enabling you to manage and monitor your sources effectively.
Here's a breakdown of what each status signifies:
- Draft: If a source's status is marked as 'Draft', it signifies that no action has been taken on this source yet. It remains unprocessed and has not been utilized in the training of your bot.
- Skipped: When a source's status is 'Skipped', the source has been deliberately excluded and will not be used for bot training. You might choose to skip sources that are irrelevant to the bot's knowledge base or that contain unnecessary information.
- Trained: If a source's status is 'Trained', it means the source has been used to train your bot. The data or information contained in these sources has been integrated into your bot's knowledge and can be used to assist your users.
- Training/Retraining: A source status of 'Training' or 'Retraining' means that the source is currently being used in the bot's training process. The bot is actively learning from the information in these sources.
<aside>
💡 Note: The 'Skipped' status applies only at the domain page level, not to each individual source.
</aside>
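The relationship between the statuses and the Enable, Disable, and Refresh actions can be pictured as a small state machine. The Python sketch below is illustrative only and does not reflect the platform's actual implementation; all names in it are hypothetical. It makes two assumptions that the guide does not state explicitly: that 'Disable' moves a page to 'Skipped', and that 'Refresh' moves a 'Trained' page to 'Retraining'.

```python
from enum import Enum


class Status(Enum):
    DRAFT = "Draft"
    SKIPPED = "Skipped"
    TRAINED = "Trained"
    TRAINING = "Training"
    RETRAINING = "Retraining"


# (current status, action) -> next status
TRANSITIONS = {
    (Status.SKIPPED, "enable"): Status.DRAFT,
    # Assumption: disabling a page marks it as 'Skipped'.
    (Status.DRAFT, "disable"): Status.SKIPPED,
    (Status.TRAINED, "disable"): Status.SKIPPED,
    # Assumption: refreshing a trained page puts it into 'Retraining'.
    (Status.TRAINED, "refresh"): Status.RETRAINING,
}


def available_actions(status: Status) -> list[str]:
    """List the actions offered for a page in the given status."""
    return [action for (current, action) in TRANSITIONS if current is status]


def apply_action(status: Status, action: str) -> Status:
    """Return the next status, or raise if the action is not offered."""
    try:
        return TRANSITIONS[(status, action)]
    except KeyError:
        raise ValueError(
            f"'{action}' is not available when status is '{status.value}'"
        ) from None


print(available_actions(Status.TRAINED))
print(apply_action(Status.SKIPPED, "enable").value)
```

In this sketch, pages in 'Training' or 'Retraining' have no available actions, which matches the guide only insofar as it lists no actions for those statuses.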
Knowing what each status means is crucial for understanding your bot's training process. This knowledge provides insights into what your bot has learned, what it is currently learning, and what it will potentially learn in the future.