xkcd #2986: Every Scientific Field

@[email protected] · 4 months ago

@affiliate · 4 months ago

anytime. i’ve also had my fair share of long days studying analysis. and i feel like most of my time spent trying to learn analysis was spent fighting with the textbooks. i think the (ε,δ) stuff is to blame for that, but that’s a whole other topic.

anyways, i was thinking a bit more about the matrix stuff and i think i have a better explanation if you’re interested, since my previous one was probably a bit too abstract. i think it should honestly be criminal to teach multivariate analysis before linear algebra, since a lot of the purpose of multivariate analysis is to turn complicated problems into linear problems. but anyways, here’s the big picture:

you don’t really need to understand the ins and outs of matrices and be super familiar with them to get a sense of what the total derivative is, and how it should behave. for that purpose, here are some of the highlights of matrices and the total derivative:

Let A be an m x n matrix. Then:

Multiplication with A defines a so-called “linear function” from ℝⁿ to ℝ^m. Put simply, this means that if you have a line in ℝⁿ, and you multiply each point in that line with A, then the result is a line in ℝ^m. (This is because, under the hood, matrix multiplication is just a bunch of scalar multiplication and addition.)
There’s a slight catch to what I said above: sometimes you multiply the points in a line with a matrix and they all get sent to a single point (the 0 vector, if the line passes through the origin) instead of to another line. (Compare this to what happens when A is a 1 x 1 matrix, i.e. a number: multiplying every point in ℝ with A will either give you only the number 0, or it will give you all of ℝ.)
Now think about a plane: it’s something spanned by two lines. (The simplest case being ℝ², which is spanned by the x and y axes.) Since matrices send lines to either lines or 0, there are three options for what can happen to a plane: it gets sent to a plane (no spanning lines get sent to 0), or a line (one of the spanning lines gets sent to 0), or 0 (both spanning lines get sent to 0). You can do some fancy math to show that the first case (where a plane gets sent to a plane) is much more likely than the other two cases. So this is where the idea of a tangent plane comes from: approximate a function with a matrix, and the matrix corresponds to a plane that “stays close” to the function.
In any case, matrix multiplication is an extremely easy thing for computers to do, because there’s a formula for it. In contrast, evaluating arbitrary functions is not easy, and there’s no general formula for that. This is really the main benefit of the total derivative: you can approximate the behavior of a function with matrix multiplication. And we know a whole lot more about dealing with matrices than we do about dealing with random functions.
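
if it helps to see the “lines go to lines, except when they get squashed” idea concretely, here’s a rough sketch in plain python. the helper names (matvec, line_point) and the matrices A and B are just made up for illustration, nothing standard:

```python
# a minimal sketch: the helpers and example matrices below are hypothetical

def matvec(A, v):
    """Multiply the matrix A (a list of rows) with the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def line_point(p, d, t):
    """The point p + t*d on the line through p with direction d."""
    return [pi + t * di for pi, di in zip(p, d)]

A = [[1, 2],
     [3, 4],
     [5, 6]]           # a 3 x 2 matrix, so it sends R^2 to R^3

p, d = [1, 0], [1, 1]  # a line in R^2: all points p + t*d

# A(p + t*d) = Ap + t*Ad, so the image is the line through Ap with direction Ad
Ap, Ad = matvec(A, p), matvec(A, d)
for t in [-2, 0, 1, 3]:
    assert matvec(A, line_point(p, d, t)) == line_point(Ap, Ad, t)

# the catch: B sends every point of the line spanned by (1, 1) to the 0 vector
B = [[1, -1],
     [2, -2]]
collapsed = {tuple(matvec(B, line_point([0, 0], [1, 1], t))) for t in range(-3, 4)}
print(collapsed)  # {(0, 0)}
```

the only thing going on is that multiplying with A respects addition and scaling, which is why the image of p + t*d is Ap + t*Ad.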

So those are two ways to look at the total derivative: you can try to get a geometric understanding of what it does (approximate the function with the best fitting plane), or try to look at why it’s useful (turning harder problems into easier problems). But just to be clear, dealing with matrices is still hard, it’s just comparatively a lot easier than dealing with random functions.
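
and here’s a small sketch of the “approximate a function with a matrix” part, again in plain python. the function f is an arbitrary example i picked, and its matrix of partial derivatives (the jacobian) is worked out by hand, so treat all of it as an illustration under those assumptions rather than a general recipe:

```python
import math

def f(x, y):
    """An arbitrary example function from R^2 to R^2 (chosen for illustration)."""
    return [x * x * y, math.sin(x) + y]

def jacobian(x, y):
    """The total derivative of f at (x, y), as a 2 x 2 matrix of partial derivatives."""
    return [[2 * x * y, x * x],
            [math.cos(x), 1.0]]

def linear_approx(x0, y0, dx, dy):
    """f(x0, y0) + J*(dx, dy): the matrix-based approximation of f near (x0, y0)."""
    J = jacobian(x0, y0)
    base = f(x0, y0)
    return [b + row[0] * dx + row[1] * dy for b, row in zip(base, J)]

def error(h):
    """Largest coordinate-wise gap between f and its approximation, a step h from (1, 2)."""
    exact = f(1.0 + h, 2.0 + h)
    approx = linear_approx(1.0, 2.0, h, h)
    return max(abs(e - a) for e, a in zip(exact, approx))

# the gap should shrink much faster than the step size itself
for h in [0.1, 0.01, 0.001]:
    print(h, error(h))
```

for this particular f, making the step 10 times smaller makes the error roughly 100 times smaller, which is the precise sense in which the plane “stays close” to the function.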