Browser

For info on how to use the browser non-technically, check out our product walkthroughs or this video showing how the browser helps quickly “pluck” data from a website.

Contents:

  1. Tutorial
  2. Running

Tutorial

Let’s suppose we’re interested in programmatically getting Google search results without using their API. You’d have two options for going about this:

Going with the former eventually limits how quickly you can add new websites while maintaining previous integrations; the bottleneck becomes how quickly you can find and implement clever tricks for getting the new data you’re looking for.

The latter, however, describes how you would get the data if you were to do it by hand. Below is a finished script that follows this second approach; in LSD SQL, browser operations are simply expressed with additional keywords.

google <| https://google.com |
search_input <| textarea[name="q"] |
query_to_try <| "What is LSD.so?" |
search_button <| form center:nth-child(2) input[value="Google Search"] |
search_item_container <| div#search a |
search_item_link <| div#search a@href |

FROM google
|> ENTER INTO search_input query_to_try
|> CLICK ON search_button
|> GROUP BY search_item_container
|> SELECT query_to_try, search_item_link

The lines at the very top are where we define variables, which make the code easier to understand and edit.

google <| https://google.com |
search_input <| textarea[name="q"] |
query_to_try <| "What is LSD.so?" |
search_button <| form center:nth-child(2) input[value="Google Search"] |
search_item_container <| div#search a |
search_item_link <| div#search a@href |

We start from the Google home page. Since starting FROM a page implies navigating to it in the first place, this is a page-altering operation, so it is forwarded to your browser to carry out if appropriate.

google <| https://google.com |

FROM google

The following two expressions specify that the data we’re interested in is what results after page-altering operations (i.e. browser interactions).

search_input <| textarea[name="q"] |
query_to_try <| "What is LSD.so?" |
search_button <| form center:nth-child(2) input[value="Google Search"] |

|> ENTER INTO search_input query_to_try
|> CLICK ON search_button

The last two expressions are where we specify the fields we want to extract from the resulting page after the page-altering operations.

query_to_try <| "What is LSD.so?" |
search_item_container <| div#search a |
search_item_link <| div#search a@href |

|> GROUP BY search_item_container
|> SELECT query_to_try, search_item_link
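
Because the search text lives in a single variable, trying a different query only means changing one definition. As a rough sketch of that idea, the Python helper below regenerates the tutorial script for any query string. The function name build_search_script is hypothetical, and since this page does not cover how quotes or pipes are escaped inside a definition, the helper rejects such queries rather than guessing.

```python
def build_search_script(query: str) -> str:
    """Return the tutorial's LSD script with a different search query."""
    # Assumption: escaping rules for quotes, pipes and newlines inside a
    # definition are not described here, so such queries are rejected.
    if any(ch in query for ch in ('"', "|", "\n")):
        raise ValueError("query must not contain quotes, pipes or newlines")

    # Variable definitions, copied from the tutorial except for the query.
    definitions = [
        "google <| https://google.com |",
        'search_input <| textarea[name="q"] |',
        f'query_to_try <| "{query}" |',
        'search_button <| form center:nth-child(2) input[value="Google Search"] |',
        "search_item_container <| div#search a |",
        "search_item_link <| div#search a@href |",
    ]

    # Page-altering operations followed by the extraction steps.
    pipeline = [
        "FROM google",
        "|> ENTER INTO search_input query_to_try",
        "|> CLICK ON search_button",
        "|> GROUP BY search_item_container",
        "|> SELECT query_to_try, search_item_link",
    ]

    return "\n".join(definitions) + "\n\n" + "\n".join(pipeline)


if __name__ == "__main__":
    print(build_search_script("What is LSD.so?"))
```

Only the query_to_try line differs between generated scripts; the selectors and the pipeline stay exactly as in the tutorial.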

Running

To run SQL that uses a browser to transform web data, follow these steps:

  1. Open your bicycle
  2. Press Command+L (a shortcut for navigating to https://lsd.so/connect)
  3. You will be presented with psql credentials for your account
  4. Connect using those credentials then run your query to see your browser be controlled in the background
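
If you would rather script the connection step, one possible approach is to turn the credentials into a standard PostgreSQL connection URI and hand it to psql. The sketch below only builds the URI and the command; every value in the example (user, password, host, port, database) is a placeholder, so substitute whatever https://lsd.so/connect presents for your account. The function names are hypothetical.

```python
from urllib.parse import quote


def build_connection_uri(user: str, password: str, host: str,
                         port: int, database: str) -> str:
    """Build a postgresql:// URI, percent-encoding the credential parts."""
    # Percent-encode so characters such as "@" or "/" in the user name or
    # password do not break the URI structure.
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}"
    )


def build_psql_command(uri: str) -> list[str]:
    """Return the argument list for starting psql against the given URI."""
    return ["psql", uri]


if __name__ == "__main__":
    # Placeholder values only; use the credentials shown for your account.
    uri = build_connection_uri(
        "me@example.com", "p@ss/word", "db.example.invalid", 5432, "mydb"
    )
    print(" ".join(build_psql_command(uri)))
```

Once psql is connected, you would run your query as described in step 4.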

To see a video of this in action with the SQL above, check out this demo on Twitter where two key features are showcased: