SQL Has Problems. We Can Fix Them: Pipe Syntax In SQL

Thinker · 6 months ago

@[email protected] · 6 months ago

The reorganization of statements is excellent but the pipe operator itself is unnecessary and annoying. It’d be far better to just rearrange the clauses and call it a day, relying on the keywords that are still present to signify clause termination…

Especially once we get into subqueries and CTEs, I never want to write:

|> LEFT JOIN |> FROM foo |> GROUP BY clusterid |> SELECT clusterid, COUNT(*)
      ON cluster.id = foo.clusterid

And I’m also not splitting out a trivial subselect like that into four lines because I respect my reader.

@[email protected] · 6 months ago

I find dplyr in R to be pretty reasonable.

https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html

I don’t know if that’s what the article is referring to by “other data languages”.

@[email protected] · 6 months ago

What about respecting the reader of the diff when there’s a change in the middle?

@[email protected] · 6 months ago

If this is something likely to change I’d space it out - but mid-line diffs are usually pretty readable in most clients.

As always, expression should cater to readability and shouldn’t be limited by syntax rules.

@[email protected] · edit-2 6 months ago

No matter which tool you’re using, this:

- |> LEFT JOIN |> FROM foo |> GROUP BY clusterid |> SELECT clusterid, COUNT(*)
+ |> LEFT JOIN |> FROM foobar |> GROUP BY clusterid |> SELECT clusterid, COUNT(*)
      ON cluster.id = foo.clusterid

is always less readable than:

  |> LEFT JOIN 
- |> FROM foo 
+ |> FROM foobar
  |> GROUP BY clusterid 
  |> SELECT clusterid, COUNT(*)
      ON cluster.id = foo.clusterid

And this isn’t even the worst example I’ve seen. That would be a file that had a bug due to duplicated entries in a list, and it became very obvious as soon as I converted it to something akin to the second version.
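To make the diff argument concrete, here is a minimal sketch using Python's standard `difflib` as a stand-in for whatever diff client you use. The `changed_lines` helper and the clause list are made up for illustration, and the clauses are a simplified version of the query above rather than the exact example.

```python
import difflib


def changed_lines(old: str, new: str) -> list[str]:
    """Return the removed/added lines of a line-based diff (hypothetical helper)."""
    old_lines, new_lines = old.splitlines(), new.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged lines are not part of the reported change
        out += ["- " + line for line in old_lines[i1:i2]]
        out += ["+ " + line for line in new_lines[j1:j2]]
    return out


# Simplified clauses borrowed from the example above (illustrative only).
old_clauses = ["FROM foo", "GROUP BY clusterid", "SELECT clusterid, COUNT(*)"]
new_clauses = ["FROM foobar", "GROUP BY clusterid", "SELECT clusterid, COUNT(*)"]

# Same query, two layouts: everything on one line vs. one clause per line.
one_line = changed_lines(" |> ".join(old_clauses), " |> ".join(new_clauses))
per_line = changed_lines("\n|> ".join(old_clauses), "\n|> ".join(new_clauses))

print("\n".join(one_line))
print("\n".join(per_line))
```

With one clause per line, the diff reports only the `FROM` clause. With everything on one line, the whole query shows up as removed and re-added, and the reader has to hunt for the change (or rely on the client's intra-line highlighting).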