List Comprehension

Definition

List comprehension is syntactic sugar that a language provides to make working with lists of values simpler; for Python, at least, it was popularized by Google’s foobar. In general-purpose programming languages, it can be useful for code golfing (cutting down the number of lines), but in LSD it’s useful for handling lists of values that exist within DOM elements.

Usage

Normally, when ASSIGNing to a variable, the selector retrieves the first element it matches. When a |> is introduced into a variable’s definition, the selector before the |> is assumed to repeat within the context it’s being selected in (whether that’s the entire HTML or a GROUP of repeating containers), and the value after the |> is what you’re interested in plucking from each of the matches.

variable <| repeating |> value_of_interest |

Here, value_of_interest yields the first matching element (or value) within each element matched by the repeating selector, so variable holds one result per repeating element.
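The per-element behavior above can be modeled with a short Python sketch. This illustrates the semantics as described and is not LSD’s implementation. It assumes a small hypothetical document and uses ElementTree’s XPath subset (`.//li`, `span`) in place of CSS selectors. `first_match` and `comprehend` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: each <li> plays the role of the "repeating" element.
DOC = """
<ul>
  <li><span>one</span><span>uno</span></li>
  <li><span>two</span><span>dos</span></li>
  <li><span>three</span></li>
</ul>
"""

def first_match(root, path):
    # Plain assignment: only the first element matching `path` is kept.
    match = root.find(path)
    return match.text if match is not None else None

def comprehend(root, repeating, value_of_interest):
    # `repeating |> value_of_interest`: for every element matched by
    # `repeating`, keep the first match of `value_of_interest` inside it.
    results = []
    for element in root.iterfind(repeating):
        match = element.find(value_of_interest)
        results.append(match.text if match is not None else None)
    return results

root = ET.fromstring(DOC)
print(first_match(root, ".//li/span"))    # one
print(comprehend(root, ".//li", "span"))  # ['one', 'two', 'three']
```

Note that the second `<span>` in each `<li>` is ignored: only the first match per repeating element is plucked.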

Attributes

When retrieving attributes from the repeating container, you do not need to specify the element you’re selecting the attribute from. For instance, the below assignment:

links <| meta |> meta@name |

is equivalent to the one below since, either way, you’re retrieving the name attribute from each matched meta tag.

links <| meta |> @name |
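As a rough Python analogy, both spellings read the attribute straight off each repeating element, so a single loop covers them. The `<head>` markup and the `pluck_attribute` helper below are hypothetical, and ElementTree tag matching stands in for the CSS selector `meta`.

```python
import xml.etree.ElementTree as ET

# Hypothetical <head> with repeating <meta> tags.
HEAD = """
<head>
  <meta name="description" content="An example page" />
  <meta name="viewport" content="width=device-width" />
</head>
"""

def pluck_attribute(root, tag, attribute):
    # Models both `meta |> meta@name` and `meta |> @name`: the attribute
    # is read from each matched element itself.
    return [element.get(attribute) for element in root.iter(tag)]

root = ET.fromstring(HEAD)
print(pluck_attribute(root, "meta", "name"))  # ['description', 'viewport']
```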

Properties

As with attributes, you can also retrieve properties via list comprehension.

a1 <| { "a" } |
a2 <| { "b" } |
a3 <| { "c" } |
arr <| { a1, a2, a3 } |> {0} |

In the example above, arr evaluates to { "a", "b", "c" } since it plucks the value at the zeroth index {0} of each nested list a1, a2, and a3. For a richer example of this in practice, see the Array modification example below.
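In Python terms, `{ a1, a2, a3 } |> {0}` corresponds to indexing into each nested list inside a comprehension. This is a sketch of the equivalent computation, not of how LSD evaluates it.

```python
# Each variable holds a one-element list, mirroring `a1 <| { "a" } |` etc.
a1 = ["a"]
a2 = ["b"]
a3 = ["c"]

# `{ a1, a2, a3 } |> {0}`: take index 0 from each nested list.
arr = [nested[0] for nested in [a1, a2, a3]]
print(arr)  # ['a', 'b', 'c']
```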

Examples

Repeating elements

For experimentation, requesting the URL https://example.lsd.so statically returns some dummy HTML. At the time of writing this doc, it contains the following:

<!DOCTYPE html>

<html>
  <head>
    <title>This is the title of the page</title>
  </head>

  <body>
    <div class="repeating-container">
      <a class="repeating-link" href="a">A</a>
      <a class="repeating-link" href="b">B</a>
      <a class="repeating-link" href="c">C</a>
    </div>
    <div class="repeating-container">
      <a class="repeating-link" href="d">D</a>
      <a class="repeating-link" href="e">E</a>
      <a class="repeating-link" href="f">F</a>
    </div>
    <div class="repeating-container">
      <a class="repeating-link" href="g">G</a>
      <a class="repeating-link" href="h">H</a>
      <a class="repeating-link" href="i">I</a>
    </div>
  </body>
</html>

There are repeating div elements with the class repeating-container and, in each of them, repeating a elements with the class repeating-link. The LSD below retrieves each container along with the lists of link hrefs and link labels inside it:

url <| https://example.lsd.so |
links <| a.repeating-link |> @href |
labels <| a.repeating-link |> a |

repeating_container <| div.repeating-container |

FROM url
|> GROUP BY repeating_container
|> SELECT links, labels
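To picture the result, here is a Python sketch over the HTML quoted above. It assumes that GROUP BY yields one row per container and that each list-comprehension variable is evaluated inside that container. LSD’s actual output format may differ. ElementTree paths stand in for the CSS selectors, and `group_by_containers` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# The <body> served by https://example.lsd.so, as quoted above.
BODY = """
<body>
  <div class="repeating-container">
    <a class="repeating-link" href="a">A</a>
    <a class="repeating-link" href="b">B</a>
    <a class="repeating-link" href="c">C</a>
  </div>
  <div class="repeating-container">
    <a class="repeating-link" href="d">D</a>
    <a class="repeating-link" href="e">E</a>
    <a class="repeating-link" href="f">F</a>
  </div>
  <div class="repeating-container">
    <a class="repeating-link" href="g">G</a>
    <a class="repeating-link" href="h">H</a>
    <a class="repeating-link" href="i">I</a>
  </div>
</body>
"""

def group_by_containers(root):
    """One row per repeating container, each holding per-link lists."""
    rows = []
    # GROUP BY repeating_container. Unlike CSS, this XPath predicate
    # matches the class attribute value exactly.
    for container in root.iterfind(".//div[@class='repeating-container']"):
        anchors = container.findall(".//a[@class='repeating-link']")
        rows.append({
            # links <| a.repeating-link |> @href |
            "links": [a.get("href") for a in anchors],
            # labels <| a.repeating-link |> a |  (the text of each link)
            "labels": [a.text for a in anchors],
        })
    return rows

rows = group_by_containers(ET.fromstring(BODY))
for row in rows:
    print(row)
# {'links': ['a', 'b', 'c'], 'labels': ['A', 'B', 'C']}
# {'links': ['d', 'e', 'f'], 'labels': ['D', 'E', 'F']}
# {'links': ['g', 'h', 'i'], 'labels': ['G', 'H', 'I']}
```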

Array modification

Sometimes you’ll want to work with a collection of complex web data objects before extracting something specific from each item.

To show an example of this, we have a dummy page at https://lsd.so/dummy_interactions with labels and checkboxes. The goal here? To check the boxes corresponding to even numbers. The only problem is that the checkboxes we want to check don’t themselves contain the number we need to evaluate; that number lives in the label next to each one.

The solution? An LSD trip that queries the page itself for a filtered list of data and uses it to decide how to proceed. Here’s what a final program for that could look like, with comments:

-- The URL to a page with interactions
page_with_interactions <| https://lsd.so/dummy_interactions |

-- CSS selector for the repeating container that wraps each pair of checkbox and label
repeating_input_container <| form > div |

-- CSS selector for the checkbox; this is so we can identify certain ones to [CLICK THROUGH] later on
checkbox_of_interest <| input[type="checkbox"] |

-- CSS selector to get the label corresponding to a checkbox
label_of_interest <| label |

-- An arg-less function to obtain complex objects containing checkboxes and labels together if and only if the label corresponds to an even number
get_checkboxes_and_labels <|>
GROUP BY repeating_input_container
|> SELECT checkbox_of_interest, label_of_interest
|> WHERE label_of_interest % 2 = 0 |

-- A function that takes a list of elements and clicks on each one recursively
click_through <|> elements <|
WHEN elements{length} > 0
THEN
     |> CLICK ON elements{0}
     |> click_through elements{1:} |

FROM page_with_interactions -- Starting at the URL we're interested in
|> checkboxes_of_interest <| get_checkboxes_and_labels |> {0} | -- Assign to the variable [checkboxes_of_interest] the value of performing list comprehension to the output of [get_checkboxes_and_labels] via retrieving the first element of each nested list
|> click_through checkboxes_of_interest -- Clicking through each selector in the list
|> CLICK ON input[type="submit"] -- Clicking on the submit button at the end to see the result
|> SELECT #result -- Retrieving the result at the end

Outline

Here’s an outline of the tutorial, walking through the code block above section by section.

Assignments

We start with assignments that make the program more readable. The first variable assignment is to the page we’re interested in:

page_with_interactions <| https://lsd.so/dummy_interactions |

The assignment for the repeating input container takes advantage of the page’s HTML structure, which looks like this:

<form>
  <div>
    <input type="checkbox" ... />
    <label>...</label>
  </div>
  <div>
    <input type="checkbox" ... />
    <label>...</label>
  </div>
  <!-- More repeating div's here -->
</form>

So, to refer to the repeating container that encapsulates each pair of checkbox and label, we simply define repeating_input_container as any div that is a direct child of the form element on the page.

repeating_input_container <| form > div |

Next, because we’ll want to click the checkboxes later on, after filtering for only the ones we’re interested in, let’s define the CSS selector for checkbox_of_interest.

checkbox_of_interest <| input[type="checkbox"] |

To determine which checkboxes we’re interested in, we also need the label in each repeating container.

label_of_interest <| label |
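To see how these three selectors line up with the page structure, here’s a Python sketch using ElementTree. The markup is hypothetical: the snippet above elides the checkbox attributes and label text, so the `id` values and the labels 1 to 4 are assumptions. The relative path `div`, evaluated from the form, stands in for the CSS child combinator in `form > div`. `rows_by_container` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup following the structure above; ids and label text
# are placeholders, not the real page's contents.
FORM = """
<form>
  <div><input type="checkbox" id="box-1" /><label>1</label></div>
  <div><input type="checkbox" id="box-2" /><label>2</label></div>
  <div><input type="checkbox" id="box-3" /><label>3</label></div>
  <div><input type="checkbox" id="box-4" /><label>4</label></div>
</form>
"""

def rows_by_container(form):
    rows = []
    # repeating_input_container <| form > div |: direct div children only.
    for container in form.findall("div"):
        # checkbox_of_interest <| input[type="checkbox"] |
        checkbox = container.find("input[@type='checkbox']")
        # label_of_interest <| label |
        label = container.find("label")
        rows.append((checkbox.get("id"), label.text))
    return rows

rows = rows_by_container(ET.fromstring(FORM))
print(rows)  # [('box-1', '1'), ('box-2', '2'), ('box-3', '3'), ('box-4', '4')]
```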

Helper functions

With those established, let’s define a function we’ll use later on to retrieve checkboxes and labels where the label corresponds to an even number. The first line of the definition has the function name, get_checkboxes_and_labels, followed by the function “argument signature” <|>. Because this function takes zero arguments, we can omit the <| that would usually follow the <|> in a function definition.

get_checkboxes_and_labels <|>

The first thing this function does is declare repeating_input_container, defined earlier, as the repeating container that each “row” is built around.

GROUP BY repeating_input_container

In each of these containers, we’d like to grab both the checkbox_of_interest and the label_of_interest.

|> SELECT checkbox_of_interest, label_of_interest

However, since the goal is to check only the checkboxes corresponding to even numbers, we have to filter for the pairs of checkboxes and labels WHERE label_of_interest is an even number; in other words, where it leaves a remainder of zero when divided by two. Since this is also the last part of the function definition, we close the overall function body with the vertical bar | at the end.

|> WHERE label_of_interest % 2 = 0 |
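The filter itself is an ordinary modulo test. In this Python sketch, the rows are hypothetical `(checkbox, label)` pairs with placeholder checkbox ids. Label text has to be converted with `int()` before taking the remainder; whether LSD performs a similar conversion implicitly is not stated here, so treat that step as an assumption of the sketch.

```python
# Hypothetical rows from GROUP BY ... |> SELECT checkbox_of_interest,
# label_of_interest, as (checkbox, label) pairs.
rows = [("box-1", "1"), ("box-2", "2"), ("box-3", "3"), ("box-4", "4")]

def even_rows(rows):
    # WHERE label_of_interest % 2 = 0; labels arrive as text, so convert.
    return [(checkbox, label) for checkbox, label in rows if int(label) % 2 == 0]

print(even_rows(rows))  # [('box-2', '2'), ('box-4', '4')]
```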

In addition to this function, we’ll define another utility function to click through each of the checkboxes after filtering for the ones we’re actually interested in. It starts with the function name and its single argument, elements:

click_through <|> elements <|

When there are no elements left to click through, we don’t want to do anything, so we start with a conditional that only proceeds when the length of the list of elements is greater than zero:

WHEN elements{length} > 0
THEN

Since there’s nothing else to do in the function body, we can use pipe operators for the rest of it, which holds the instructions to run WHEN the condition above is true.

True to the function’s name, we click on the first element in the provided array.

     |> CLICK ON elements{0}

We finish by recursively clicking through the remainder of the array, elements{1:}, which is every element after the first.

     |> click_through elements{1:} |

This brings us back to the initial conditional and repeats until there are no more elements to click through.
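The same recursion looks like this in Python. The `click` callback is a hypothetical stand-in for LSD’s CLICK ON, and the checkbox ids are placeholders.

```python
def click_through(elements, click):
    # WHEN elements{length} > 0 THEN ...; an empty list does nothing.
    if len(elements) > 0:
        click(elements[0])                  # CLICK ON elements{0}
        click_through(elements[1:], click)  # click_through elements{1:}

clicked = []
click_through(["box-2", "box-4"], clicked.append)
print(clicked)  # ['box-2', 'box-4']
```

Python caps recursion depth (1000 frames by default), so idiomatic Python would use a plain `for` loop here. The recursive form simply mirrors the LSD definition.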

Program logic

With these functions defined, we’re ready to get into the actual substance of the program. We start by declaring that we’d like to begin FROM the URL we’re interested in, which we assigned at the beginning.

FROM page_with_interactions

Next, we need the checkboxes that belong to even numbers before we can consider clicking on anything. So we assign to the variable checkboxes_of_interest the result of whatever is contained in the <| ... | block.

|> checkboxes_of_interest <| ... |

The |> at the far left of the line is the pipe operator, flowing from the previous instruction into the current one, and the <| after checkboxes_of_interest marks the value to its right as what the variable is assigned.

Here, that value is the output of the get_checkboxes_and_labels function we defined earlier, passed through list comprehension to take the zeroth item of each nested list. In this case, that item is the checkbox we got for checkbox_of_interest, since it comes first in the SELECT. All together, that looks like:

|> checkboxes_of_interest <| get_checkboxes_and_labels |> {0} |

Since we already filtered for the checkboxes WHERE our desired condition was met, we can then simply click_through them all.

|> click_through checkboxes_of_interest

To submit our answer and see whether or not we succeeded, we then CLICK ON the submit button at the end.

|> CLICK ON input[type="submit"]

Lastly, we SELECT the #result element to see whether or not it was a success.

|> SELECT #result

Related: