$ ls contents
We build tools that bring the internet to life
LSD is a fork of PostgreSQL designed to work as intended by the original creator of SQL (see the linked paper). In particular, one line from the abstract is worth highlighting:
Activities of users at terminals and most application programs should
remain unaffected when the internal representation is changed and even
when some aspects of the external representation are changed.
As such, extracting information from a web page should work the same way irrespective of whether the underlying CSS changes, hence the flexibility of the language and browser.
Note: when using LSD with a driver like psycopg2, you will need to set autocommit mode to True.
import psycopg2

conn = psycopg2.connect("dbname='lsd' host='lsd.so'")
# Autocommit must be enabled when talking to LSD
conn.set_session(autocommit=True)
with conn.cursor() as curs:
    curs.execute("<query>")
    many_rows = curs.fetchall()
LSD is a fork of PostgreSQL where tables don't need to be defined beforehand, because a SELECT statement already implies the desired structure of its output. Take, for example, the following expression:
$ psql -h lsd.so -U you
you=> SELECT
a
FROM
https://news.ycombinator.com
GROUP BY
span.titleline;
Here the URL is treated as a table identifier, and the "a" column being selected is the CSS selector for an anchor tag (basically a link on the page). You don't have to declare "a" before running the query. The GROUP BY clause states that, rather than grabbing only the first element matching the provided CSS selector, LSD should treat each "span.titleline" element as a row and then query for the anchor tag inside each of those containers.
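The grouping behaviour described above can be illustrated with a minimal Python sketch. This is not LSD's implementation: the markup is a hypothetical, simplified stand-in for a page, the helper name select_grouped is made up for this example, and it only handles a plain tag name plus a single class rather than full CSS selectors.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup (not the real Hacker News page).
PAGE = """
<html><body>
  <a href="/jobs">jobs</a>
  <span class="titleline"><a href="https://example.com/one">First post</a></span>
  <span class="titleline"><a href="https://example.com/two">Second post</a></span>
</body></html>
""".strip()

def select_grouped(page, group_tag, group_class, column_tag):
    """Return one row per group container: the text of the first column_tag inside it."""
    root = ET.fromstring(page)
    rows = []
    for container in root.iter(group_tag):
        # Keep only containers carrying the requested class.
        if group_class not in container.get("class", "").split():
            continue
        match = container.find(f".//{column_tag}")
        if match is not None:
            rows.append((match.text or "").strip())
    return rows

rows = select_grouped(PAGE, "span", "titleline", "a")
print(rows)
```

Each "span.titleline" container contributes one row, and the "jobs" link outside the containers is ignored, which mirrors the difference between grouping and grabbing the first match on the page.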
If a matching element is not found within the designated group, LSD checks whether the group itself is the element intended to be matched. Consider the following query, which gets all email links from a page:
$ psql -h lsd.so -U you
you=> SELECT
a@href AS email
FROM
<url>
WHERE
a@href LIKE 'mailto:%'
GROUP BY
a;
Since the "@" symbol isn't used in CSS selectors, LSD uses it as a mnemonic for "attribute". Here it selects the matched anchor tags' "href" attribute and aliases it into a column named "email".
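As a rough illustration of the "selector@attribute" notation, the Python sketch below splits a column expression at the "@" and reads the attribute from an element. The function names split_column and read_column are hypothetical, the sample link is invented, and only plain tag-name selectors are handled; LSD's own parser may work differently.

```python
import xml.etree.ElementTree as ET

def split_column(expr):
    """Split 'selector@attribute' into (selector, attribute); attribute is None if absent."""
    selector, sep, attribute = expr.partition("@")
    return selector, (attribute if sep else None)

def read_column(element, expr):
    """Read a column from a group element, falling back to the group itself if it matches."""
    selector, attribute = split_column(expr)
    match = element if element.tag == selector else element.find(f".//{selector}")
    if match is None:
        return None
    return match.get(attribute) if attribute else (match.text or "")

# Invented sample element standing in for one grouped anchor tag.
link = ET.fromstring('<a href="mailto:hi@example.com">Email us</a>')
print(split_column("a@href"))
print(read_column(link, "a@href"))
```

Because the group element here is itself an anchor tag, read_column uses it directly, which is the fallback case where the group is the element being matched.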
Because the anchor tag is both the selected CSS selector and the GROUP BY target, this query grabs the href attribute of every anchor tag on the page whose href starts with "mailto:", which yields all email links contained in the page. In the case of the following query:
$ psql -h lsd.so -U you
you=> SELECT
a:nth-child(7)
AS
jobs
FROM
https://news.ycombinator.com
GROUP BY
span.titleline;
No "titleline" container has an anchor tag as its 7th child, so this query returns just one row: the "jobs" link at the top of the page.
The following queries all return live data, as you would see it on the respective websites:
SELECT
h3.title.h5 AS headline
, span.date_time__KhlCV.time AS how_long_ago
FROM
https://www.goal.com/en-us/category/transfers/1/k94w8e1yy9ch14mllpf4srnks
GROUP BY
div.content-wrapper;
SELECT
td.company-listing__cell-wide.company-listing__text.u-md-hide AS company_tagline
, th.company-listing__cell-wide.company-listing__head AS company_name
, td.u-lg-hide AS company_stage
, li AS investor
FROM
https://www.sequoiacap.com/our-companies/?_categories=fintech&_sort=stage_current-asc#all-panel
GROUP BY
tr.aos-init.aos-animate;
SELECT
th.company-listing__cell-wide.company-listing__head AS company_name
, td.company-listing__cell-wide.company-listing__text.u-md-hide AS company_function
, td.u-lg-hide AS company_status
, li AS investor
FROM
https://www.sequoiacap.com/our-companies/?_stage_current=acquired&_sort=stage_current-asc#all-panel
GROUP BY
tr.aos-init;
SELECT
p.paragraph.break-spaces.hide-tablet AS what_does_it_do
, p.text-sm.hide-tablet AS location
, p.text-sm AS name
FROM
https://www.hummingbird.vc/portfolio
GROUP BY
div.grid-item-row;
SELECT
a.company-t AS portco
FROM
https://www.propel.vc/investments
GROUP BY
div.company_link_cover;
SELECT
a AS portco
FROM
https://www.abstractvc.com/companies
WHERE a != '' AND a != 'About'
GROUP BY
div.MuiGrid-item;
SELECT
img.featured-logo AS portco
FROM
https://abstractvc.com/companies
GROUP BY
div.featured-inner;
SELECT
a AS post_title,
a@href AS post_link
FROM
https://news.ycombinator.com
GROUP BY
span.titleline;
SELECT
a@href AS scraping_tool
FROM
https://www.octoparse.com/blog/top-30-free-web-scraping-software
GROUP BY
strong;
If you are looking to access the SQL database from a client-side application, you may be interested in the general /api endpoint.
Simply provide the SQL statement that you'd like to run on your database in the query parameter (the example below is in JavaScript):
fetch(
`https://lsd.so/api?query=${
encodeURIComponent(
'SELECT a FROM https://news.ycombinator.com GROUP BY span.titleline;'
)
}`
);
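For readers working from Python rather than the browser, here is a rough equivalent of the request above, using only the standard library. The helper names build_api_url and run_query are made up for this sketch, and since the response format is not described here, the body is returned as raw text. Note that urllib's quote escapes a few characters that encodeURIComponent leaves alone (such as parentheses), which is still valid percent-encoding.

```python
from urllib.parse import quote
from urllib.request import urlopen

API_ENDPOINT = "https://lsd.so/api"

def build_api_url(sql):
    """Percent-encode the SQL statement and place it in the query parameter."""
    return f"{API_ENDPOINT}?query={quote(sql, safe='')}"

def run_query(sql):
    """Send the request and return the raw response body as text (format not assumed)."""
    with urlopen(build_api_url(sql)) as response:
        return response.read().decode("utf-8")

url = build_api_url(
    "SELECT a FROM https://news.ycombinator.com GROUP BY span.titleline;"
)
print(url)
```

Calling run_query with the same statement would perform the actual network request; it is left uncalled here so the snippet runs offline.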
Knawledge lets you use natural language to obtain information about what's on a page.
To make a request for knawledge, simply hit the knawledge API with the natural language query in the query parameter (the example below is in JavaScript):
fetch(
`https://lsd.so/knawledge?query=${
encodeURIComponent(
'give me every post and link on hacker news'
)
}`
);
This works by attempting to extract the following pattern from the given input string:
<noun (field)> <noun (field)> ... <preposition> <group of nouns (source)>
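To make the pattern above concrete, here is a toy Python sketch that splits a request into fields and a source around the first preposition. It is only an illustration under loud assumptions: the word lists and the function name parse_request are invented for this example, and small hand-written lists stand in for whatever noun detection Knawledge actually performs.

```python
# Assumption: hand-written word lists stand in for real part-of-speech tagging.
PREPOSITIONS = {"on", "from", "in", "at"}
FILLER_WORDS = {"give", "me", "show", "get", "every", "all", "the", "and", "a", "an"}

def parse_request(text):
    """Split a request into (fields, source) around the first preposition."""
    words = text.lower().split()
    for index, word in enumerate(words):
        if word in PREPOSITIONS:
            # Words before the preposition that are not filler are treated as fields.
            fields = [w for w in words[:index] if w not in FILLER_WORDS]
            # Everything after the preposition is treated as the source.
            source = " ".join(words[index + 1:])
            return fields, source
    # No preposition found: the request does not fit the pattern.
    return [], ""

fields, source = parse_request("give me every post and link on hacker news")
print(fields, source)
```

For the example request used earlier, this yields the fields "post" and "link" with the source "hacker news".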
The aliases bank can be seen as the developing bibliography for the web. If you've ever spent time digging through DevTools to find the exact CSS selector, this is what you'd be interested in.
Provide either the URL or the title of the page you're interested in via a query parameter to get a list of aliases that have been labeled for that page:
fetch(
`https://lsd.so/aliases?url=${
encodeURIComponent('https://news.ycombinator.com')
}`
);
The Bicycle is an early version of what will be the modern equivalent of a memex.
In its present state, the Bicycle is a single-page web browser that, when "activated" via Control-K, provides a highlighter so that you can hover and click on the elements in front of you that you are interested in.
Normatively, if there exists information in a page in front of you, you should be able to grab it in the structured format of your liking
A natural language iMessage interface to interact with LSD
Announcement: We made our self-hosted iMessage Python client open source!
Users can send links with notes to Lucy to add to their “Me” tab and to be shown on Bicycle mobile. Users can also make natural language requests to LSD by using the command “Lucy” before the request. To start interacting with Lucy, go to the login page
Bicycle for mobile is a companion app to the Bicycle desktop browser. It lets users see the links they and others have shared, along with what people say about each link in their comments.
Bicycle for mobile has two tabs. The first tab, “Explore”, shows a chronological view of the latest links that everyone has shared through Lucy. When a link is clicked, an expanded view shows the link and all comments left on it. The second tab, “Your Links”, shows the same view but specifically for links that you have shared
If you need support with any of LSD's products, contact us and we'll do our best to resolve your issues.