When plucking content from a page, you may encounter repeating blocks or containers in the HTML whose content you’d want to return as rows; this is where the GROUP BY syntax comes into play. The query below grabs only the first post from Hacker News:
hn <| https://news.ycombinator.com |
post <| span.titleline a |
post_link <| span.titleline a@href |
FROM hn
|> SELECT post, post_link
However, what’s Hacker News without the entire front page? The container element for each post is the span tag with the titleline class applied to it, so we can refactor the above statement to select relative to that container and group by it:
hn <| https://news.ycombinator.com |
container <| span.titleline |
post <| a |
post_link <| a@href |
FROM hn
|> GROUP BY container
|> SELECT post, post_link
If it enhances the readability of the statement you’re writing, the GROUP BY expression can be written either before or after the SELECT expression, like so:
hn <| https://news.ycombinator.com |
post <| a |
post_link <| a@href |
container <| span.titleline |
FROM hn
|> SELECT post, post_link
|> GROUP BY container
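To make the difference between the ungrouped and grouped statements more concrete, here is a rough Python sketch of the idea. It is not how the query language is implemented. It only mimics the behaviour described above using the standard library, and the SAMPLE markup, the example.com URLs, and the function names are all hypothetical stand-ins rather than the real Hacker News page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a fragment of the front page.
SAMPLE = """
<table>
  <tr><td><span class="titleline"><a href="https://example.com/a">First post</a></span></td></tr>
  <tr><td><span class="titleline"><a href="https://example.com/b">Second post</a></span></td></tr>
</table>
"""

# ElementTree path for the container, roughly equivalent to `span.titleline`
# when the element carries exactly that one class.
CONTAINER_PATH = ".//span[@class='titleline']"


def row_from(container):
    """Build one row by resolving `a` and `a@href` relative to a container."""
    link = container.find(".//a")
    if link is None:
        return None
    return {"post": link.text, "post_link": link.get("href")}


def first_row(markup):
    """Ungrouped case: only the first matching container yields a row."""
    root = ET.fromstring(markup.strip())
    container = root.find(CONTAINER_PATH)
    return None if container is None else row_from(container)


def rows_by_container(markup):
    """Grouped case: every matching container yields its own row."""
    root = ET.fromstring(markup.strip())
    rows = (row_from(c) for c in root.findall(CONTAINER_PATH))
    return [row for row in rows if row is not None]


if __name__ == "__main__":
    print(first_row(SAMPLE))
    for row in rows_by_container(SAMPLE):
        print(row["post"], row["post_link"])
```

The point of the sketch is that the selectors for post and post_link are evaluated once per container, which is why they can shrink to a plain a and a@href in the grouped statements.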