Language

LSD is a web-native programming language that lets you query and interact with web data over a Postgres connection. Our database provides SQL without an ontology: the grammar of a SELECT statement already describes the desired structure of the output, so there is no need to CREATE TABLE beforehand.

Coming From a Postgres Background?

When you look at a SQL statement, you can figure out what the structure of the output rows would look like based on the columns requested. In LSD, if you swap URLs for tables and CSS selectors for columns, then you’ve got a language that allows you to work with web data in either a Web-Transform-Load (WTL) or Web-Load-Transform (WLT) pattern.
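
To make the "columns tell you the row shape" idea concrete, here is a minimal Python sketch that reads the output column names off a SELECT statement. It is only an illustration: the statement string is a hypothetical example of putting a URL in the table position and CSS selectors in the column positions, not confirmed LSD grammar, and `output_columns` is a made-up helper that does not handle selectors containing commas.

```python
import re

def output_columns(statement: str) -> list:
    """Return the output column names implied by a SELECT ... FROM ... statement."""
    match = re.search(r"select\s+(.*?)\s+from\s", statement,
                      flags=re.IGNORECASE | re.DOTALL)
    if match is None:
        raise ValueError("expected a SELECT ... FROM ... statement")
    columns = []
    for item in match.group(1).split(","):
        # "expr AS alias" names the column "alias";
        # without an alias, the expression itself is the column name.
        parts = re.split(r"\s+as\s+", item.strip(), flags=re.IGNORECASE)
        columns.append(parts[-1].strip())
    return columns

# Hypothetical statement (assumed syntax): URL as the table, CSS selectors as columns.
statement = "SELECT h1 AS title, a.author AS author FROM https://example.com/post"
print(output_columns(statement))  # → ['title', 'author']
```

No table definition is consulted anywhere: the row shape (`title`, `author`) comes entirely from the statement itself.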

Features

Query For and Interact With Web Data

Data is cached per page, keyed by distinct page states. The page state for simply visiting Google is different from the page state for visiting Google, entering some text into the search bar, and clicking the submit button. The principle behind this is that interactions on the web often form flows, and the page reached after a particular sequence is the one containing the data you’re interested in.
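
One way to picture state-based caching is a cache key derived from the URL plus the ordered list of interactions performed on it. The Python sketch below assumes such a scheme purely for illustration; the hashing approach, the `page_state_key` helper, and the CSS selectors are hypothetical and do not describe how LSD keys its cache internally.

```python
import hashlib
import json

def page_state_key(url, interactions=()):
    """Derive a cache key from a URL plus the ordered interactions performed on it."""
    payload = json.dumps({"url": url, "interactions": list(interactions)},
                         sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

cache = {}

# State 1: just visiting the page.
landing = page_state_key("https://www.google.com")

# State 2: same URL, but after entering text and clicking submit
# (selectors are assumptions made for this example).
searched = page_state_key(
    "https://www.google.com",
    [("ENTER", "input[name=q]", "web data"), ("CLICK", "input[type=submit]")],
)

cache[landing] = "<html>landing page</html>"
cache[searched] = "<html>results page</html>"

print(landing != searched)  # → True
print(len(cache))           # → 2
```

Because the interaction sequence is part of the key, the same URL can hold several cached states, and repeating a flow maps back to the same entry.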

Programming Language

When defining repeatable flows of interactions on the web, there may be times when you want to follow conditional rules or even iterate recursively until a stopping condition is met. To cover these scenarios, LSD is a “real” programming language that lets you write the code to get web data as succinctly as you would describe the instructions in the first place.
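
The kind of flow described here, "keep following the next page until there isn't one", can be sketched in Python as a recursive function with a stopping condition. This is not LSD code: `PAGES` is a hypothetical in-memory stand-in for a paginated site, and `collect_items` is an illustrative helper.

```python
# Hypothetical stand-in for a paginated site: each page lists items
# and an optional link to the next page.
PAGES = {
    "/page/1": {"items": ["a", "b"], "next": "/page/2"},
    "/page/2": {"items": ["c"], "next": "/page/3"},
    "/page/3": {"items": ["d", "e"], "next": None},
}

def collect_items(path, max_pages=10):
    """Recursively follow "next" links, stopping at the last page or the page budget."""
    # Stopping conditions: no further page, or the budget is used up.
    if path is None or max_pages == 0:
        return []
    page = PAGES[path]
    return page["items"] + collect_items(page["next"], max_pages - 1)

print(collect_items("/page/1"))               # → ['a', 'b', 'c', 'd', 'e']
print(collect_items("/page/1", max_pages=2))  # → ['a', 'b', 'c']
```

The conditional rule (is there a next link?) and the recursion together mirror how the instructions would be described in words.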

Keywords and Concepts

Concepts:

For querying:

For data manipulation:

  • MAP - Return sitemap and interactive elements
  • ASSIGN - Define variables
  • RUN - Execute commands
  • SCAN - (coming soon)
  • ZIP - Combine multiple queries

For navigation:

  • CRAWL - Return sitemap
  • DIVE - Navigate nested content
  • ENTER - Enter text into a text entry field

For browser automation:

  • CLICK - Interact with elements
  • TYPE - (coming soon)
  • HOVER - Mouse over elements

For data extraction:

  • HTML - Get raw HTML
  • TEXT - Get readable text from a page
  • MARKDOWN - Parse markdown
  • PDF - Extract from PDFs
  • URL - Obtain resolved URLs
