When you want to crawl a sitemap not just for its URLs but also for the content of each page, returned as markdown, the SCAN keyword can help.
SCAN url
To make programmatic access to our docs easier, we set up a rule specifically for scanning lsd.so/docs that returns the markdown content of our documentation.
SCAN https://lsd.so/docs
If you only need a “lite” version of the documentation (e.g. because of context window limitations), you can also specify the docs path you want to retrieve.
SCAN https://lsd.so/docs/database/language
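If you generate these statements from a script, a small helper can keep the root and sub-path forms consistent. The following is a minimal sketch in Python that only builds the statement string; it does not execute anything against LSD. The names `DOCS_ROOT` and `scan_statement` are hypothetical and chosen for illustration.

```python
# Hypothetical helper: builds a SCAN statement string, nothing more.
DOCS_ROOT = "https://lsd.so/docs"


def scan_statement(path: str = "") -> str:
    """Return a SCAN statement for the docs root or for a sub-path of it."""
    # Drop leading/trailing slashes so "database/language" and
    # "/database/language/" produce the same statement.
    cleaned = path.strip("/")
    url = f"{DOCS_ROOT}/{cleaned}" if cleaned else DOCS_ROOT
    return f"SCAN {url}"


if __name__ == "__main__":
    print(scan_statement())                     # SCAN https://lsd.so/docs
    print(scan_statement("database/language"))  # SCAN https://lsd.so/docs/database/language
```

Calling the helper with no argument yields the full-docs statement, while passing a path such as `database/language` yields the narrower one, which may suit a limited context window better.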