Really intriguing article about a SQL syntax extension that has apparently already been trialed at Google.
As someone who works with SQL for hours every week, this makes me hopeful for potential improvements, although the likelihood of any changes to SQL arriving in my sector before I retire seems slim.
The reorganization of statements is excellent but the pipe operator itself is unnecessary and annoying. It’d be far better to just rearrange the clauses and call it a day, relying on the keywords that are still present to signify clause termination…
Especially once we get into subqueries and CTEs, I never want to write:
|> LEFT JOIN |> FROM foo |> GROUP BY clusterid |> SELECT clusterid, COUNT(*) ON cluster.id = foo.clusterid
And I’m also not splitting out a trivial subselect like that into four lines because I respect my reader.
I find dplyr in R to be pretty reasonable.
https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html
I don’t know if that’s what the article is referring to by “other data languages”.
What about respecting the reader of the diff when there’s a change in the middle?
If this is something likely to change, I’d space it out, but mid-line diffs are usually pretty readable in most clients.
As always, expression should cater to readability and shouldn’t be limited by syntax rules.
No matter which tool you’re using, this:
    - |> LEFT JOIN |> FROM foo |> GROUP BY clusterid |> SELECT clusterid, COUNT(*)
    + |> LEFT JOIN |> FROM foobar |> GROUP BY clusterid |> SELECT clusterid, COUNT(*)
      ON cluster.id = foo.clusterid
Is always less readable than:
      |> LEFT JOIN
    -   |> FROM foo
    +   |> FROM foobar
        |> GROUP BY clusterid
        |> SELECT clusterid, COUNT(*)
      ON cluster.id = foo.clusterid
And this isn’t even the worst example I’ve seen. That would be a file that had a bug due to duplicated entries in a list, and it became very obvious as soon as I converted it to something akin to the second version.
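For anyone who wants to check the diff argument on their own queries, here is a rough Python sketch using the standard-library `difflib`. It compares the one-line and one-clause-per-line layouts of the join from this thread and reports only the lines a line-based diff would flag. The helper name `changed_lines`, the constants, and the exact indentation are my own assumptions for illustration; real review tools may also highlight intra-line changes, which this does not model.

```python
import difflib


def changed_lines(before: str, after: str) -> list[str]:
    """Return only the added/removed lines of a unified diff (no context)."""
    diff = difflib.unified_diff(
        before.splitlines(), after.splitlines(), lineterm="", n=0
    )
    return [
        line
        for line in diff
        # Keep "+..." and "-..." lines, but skip the "+++ " / "--- " file headers.
        if line.startswith(("+", "-")) and not line.startswith(("+++ ", "--- "))
    ]


# Assumed layout 1: the whole join written on a single line.
ONE_LINE_OLD = (
    "|> LEFT JOIN |> FROM foo |> GROUP BY clusterid "
    "|> SELECT clusterid, COUNT(*) ON cluster.id = foo.clusterid"
)
ONE_LINE_NEW = ONE_LINE_OLD.replace("FROM foo", "FROM foobar")

# Assumed layout 2: one clause per line.
MULTI_OLD = "\n".join(
    [
        "|> LEFT JOIN",
        "  |> FROM foo",
        "  |> GROUP BY clusterid",
        "  |> SELECT clusterid, COUNT(*)",
        "ON cluster.id = foo.clusterid",
    ]
)
MULTI_NEW = MULTI_OLD.replace("FROM foo", "FROM foobar")

if __name__ == "__main__":
    for label, old, new in [
        ("one line", ONE_LINE_OLD, ONE_LINE_NEW),
        ("multi line", MULTI_OLD, MULTI_NEW),
    ]:
        lines = changed_lines(old, new)
        flagged = sum(len(line) for line in lines)
        print(f"{label}: {len(lines)} changed lines, {flagged} characters flagged")
        for line in lines:
            print("   ", line)
```

Both layouts produce two changed lines, but in the one-clause-per-line version the flagged text is just the `FROM` clause, so the reviewer does not have to scan the whole statement to find what moved.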