• @affiliate
    English
    24 months ago

    How is it so hard?

    I think a lot of the reason is that fields (the real numbers in this case) have some pretty lousy categorical properties, and you can’t define a very nice additive and multiplicative structure on ℝn for n > 2 (n = 2 only works because ℝ2 can be identified with the complex numbers). So you end up having to deal with vector spaces instead of fields, i.e., you can’t (in general) multiply or divide points in ℝn by other points in ℝn, so you have way fewer tricks at your disposal. The other thing is that you don’t have a way to order points in ℝn, so nice things like the mean value theorem sort of disappear, at least in the form you know from one variable. There are a few other complications as well, but I think those are the big ones. It’s a whole other beast than single-variable analysis.

    How is it that the more I’m “learning” for that damn math exam the less I know?

    I feel like this is just an unfortunate part of learning math. I’m not really sure that feeling ever goes away, but it usually means you’re making progress. My experience has been that the more math I learn, the more comfortable I get with the things I already know, and the more I realize how much is left to learn. So it feels like I only really know the “basic stuff” and continue to struggle with the “hard stuff”. My advice would be to try to not be discouraged by it, although it’s easier said than done.

    Why do I need it in the first place?

    Multivariate analysis is super useful in applications, especially for 3d rendering/modeling. It shows up a lot in video game/physics programming, and probably a bunch of other things too. It’s also foundational for more advanced things like tensor calculus/differential geometry/special relativity.

    Why have exams at all?

    I’m going to be real with you, I’m completely on your side on this one. I think exams just cause a bunch of stress and that it would be better to just get rid of them. I never liked exams.

    Onto the more technical questions. I’ll try to make things handwavey to hopefully make the “big picture” shine through a bit. I think analysis textbooks are a bit guilty of getting too wrapped up in the details and missing the forest for the trees (or however the saying goes).

    What the hell is a total derivative, and why is it suddenly the same as a tangential plane?

    The total derivative is basically just a way to turn calculus problems into linear algebra problems. I think it’s best understood by first looking at the one dimensional case, and then trying to generalize it a bit to higher dimensions. The key idea is this:

    The derivative of a function f: ℝ -> ℝ at a point x0 is the best way to approximate f with a straight line at the point x0. This means that the linear equation y = f’(x0) * (x - x0) + f(x0) is the most accurate straight-line approximation of f near x0.

    Notice how in the 1-dimensional case this is just a “clever” way to rephrase that f’(x0) is the “instantaneous slope” of f at x0.
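
    To make the “best straight line” idea concrete, here is a minimal sketch. The choice f = sin and x0 = 1 is just an assumption for illustration, and tangent_line is a made-up helper name. It builds the line through (x0, f(x0)) with slope f’(x0) and prints how far the line is from f as you move a distance h away from x0.

```python
import math


def tangent_line(f, df, x0):
    """Return the line x -> f(x0) + f'(x0) * (x - x0)."""
    def line(x):
        return f(x0) + df(x0) * (x - x0)
    return line


# Hypothetical example: f = sin, f' = cos, x0 = 1.0
approx = tangent_line(math.sin, math.cos, 1.0)

for h in (0.1, 0.01, 0.001):
    x = 1.0 + h
    error = abs(math.sin(x) - approx(x))
    # error / h also shrinks as h shrinks, which is what "best line" means
    print(f"h = {h:<6} error = {error:.2e}  error / h = {error / h:.2e}")
```

    The thing to look at is the last column: the error doesn’t just go to 0, it goes to 0 faster than h does. Any other line through the same point would leave an error that is only proportional to h.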

    In higher dimensions, it no longer makes sense to approximate f with a straight line, because lines are 1-dimensional objects, whereas the domain/codomain of the function might not necessarily be 1-dimensional. However, it does still make sense to talk about the best linear approximation of f. A bit of linear algebra knowledge helps to make this idea clearer, but I’ll try to do my best to explain it with as little linear algebra as I can. (But let me know if you want a more linear algebra heavy explanation.)

    A higher dimensional linear function is (basically) just a matrix, and a matrix is basically just a way to (linearly) turn one vector into another vector. At a high level, you can think of a matrix as turning one copy of ℝm into another copy of ℝn, possibly rotating/shearing/scaling things in the process. (It can’t translate things, because a linear function always sends 0 to 0.) (Compare this to the 1-dimensional case, where a 1 x 1 matrix is just a number, and multiplying by a number “turns one copy of ℝ to another copy of ℝ”, provided that number isn’t 0.)

    So, the total derivative is basically just a matrix that gives the best way to approximate a multivariable function f at a vector x0. And as you vary the input vectors, you end up tracing out a copy of ℝn for some n. i.e., you get an n-dimensional plane that corresponds to the “best” approximation for f. And “best approximation” is just a slightly less fancy way of saying “tangential”.
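
    Here is a rough sketch of that idea in code. Everything in it is an assumption for illustration: the example function f from ℝ2 to ℝ2, the helper names jacobian and linear_approx, and the use of finite differences to estimate the matrix (a textbook would compute the partial derivatives by hand instead).

```python
def f(x, y):
    """Hypothetical example map from R^2 to R^2."""
    return (x * x * y, x + y * y)


def jacobian(func, x, y, h=1e-6):
    """Estimate the 2 x 2 matrix of partial derivatives by central differences."""
    fx_plus, fx_minus = func(x + h, y), func(x - h, y)
    fy_plus, fy_minus = func(x, y + h), func(x, y - h)
    return [
        [(fx_plus[i] - fx_minus[i]) / (2 * h),
         (fy_plus[i] - fy_minus[i]) / (2 * h)]
        for i in range(2)
    ]


def linear_approx(func, x0, y0, dx, dy):
    """Approximate func(x0 + dx, y0 + dy) by func(x0, y0) + J * (dx, dy)."""
    J = jacobian(func, x0, y0)
    base = func(x0, y0)
    return tuple(base[i] + J[i][0] * dx + J[i][1] * dy for i in range(2))


x0, y0 = 1.0, 2.0
print("matrix at (1, 2):", jacobian(f, x0, y0))

for step in (0.1, 0.01):
    exact = f(x0 + step, y0 + step)
    approx = linear_approx(f, x0, y0, step, step)
    err = max(abs(e - a) for e, a in zip(exact, approx))
    print(f"step = {step}: max error = {err:.2e}")
```

    The matrix does all the work here: once you have it, approximating f near (x0, y0) is just one matrix-vector multiplication plus a constant, and the error shrinks quickly as the step gets smaller.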

    Why is the gradient just a collection of the first partial derivatives?

    I always found the gradient to be a bit confusing. But I think it helps to understand it best in terms of what it does, and not in terms of how it’s defined. The “purpose” of the gradient is to let you compute the directional derivative, i.e., the derivative of f in the direction of a given vector v. So, let’s use the notation

    (∇f)(v) to denote the directional derivative of f, in the direction of v.

    Let’s consider the 3-dimensional case and write v = a1e1 + a2e2 + a3e3 for basis vectors ei and real numbers ai.

    Since “taking the derivative” is linear, we would expect to have

    (∇f)(v) = (∇f)(a1e1 + a2e2 + a3e3) = a1(∇f)(e1) + a2(∇f)(e2) + a3(∇f)(e3).

    In other words, we only need to compute the directional derivatives in the directions of the basis vectors in order to figure out the gradient. That’s pretty nice! Also, the derivative of f in the direction of ei is exactly the partial derivative of f taken with respect to ei. Let’s write fi for the partial derivative with respect to ei (just because I don’t know how well Lemmy handles double subscripts). Then we can rewrite the above equation as

    (∇f)(v) = a1f1 + a2f2 + a3f3.

    Now compare that with the dot product of the vectors (f1, f2, f3) and v = (a1, a2, a3). It’s exactly the same. So, the gradient can be defined in terms of taking the dot product of a vector with the partial derivatives. But I think that kind of loses a lot of the intuitive meaning of the gradient in the process.
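
    If you want to see this numerically, here is a small sketch. The function f, the point p, the direction v, and the helper names are all made up for illustration, and the derivatives are estimated with finite differences rather than computed exactly. It computes the three partial derivatives, takes their dot product with v, and compares that against the derivative along v estimated directly.

```python
def f(x, y, z):
    """Hypothetical example function from R^3 to R."""
    return x * y + z * z


def partials(func, p, h=1e-6):
    """Central-difference estimates of (f1, f2, f3) at the point p."""
    result = []
    for i in range(3):
        plus, minus = list(p), list(p)
        plus[i] += h
        minus[i] -= h
        result.append((func(*plus) - func(*minus)) / (2 * h))
    return result


def directional_derivative(func, p, v, h=1e-6):
    """Central-difference estimate of the derivative of func at p along v."""
    plus = [pi + h * vi for pi, vi in zip(p, v)]
    minus = [pi - h * vi for pi, vi in zip(p, v)]
    return (func(*plus) - func(*minus)) / (2 * h)


p = (1.0, 2.0, 3.0)
v = (0.5, -1.0, 2.0)  # v = a1*e1 + a2*e2 + a3*e3

grad = partials(f, p)
dot = sum(fi * ai for fi, ai in zip(grad, v))
direct = directional_derivative(f, p, v)

print("partial derivatives:", grad)
print("dot product with v: ", dot)
print("direct estimate:    ", direct)
```

    The two numbers at the end agree up to rounding, which is the linearity argument above in action: knowing the three partial derivatives is enough to get the derivative in any direction.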

    I hope you found some of this helpful, and feel free to ask if you have any more questions/found something I said confusing.

    • @[email protected]
      English
      24 months ago

      Thanks for answering my frustrated questions, was a long day yesterday. I’ll try to understand the deeper truths later, but I can already tell the matrix stuff goes over my head.

      • @affiliate
        English
        14 months ago

        anytime. i’ve also had my fair share of long days studying analysis. and i feel like most of my time spent trying to learn analysis was spent fighting with the textbooks. i think the (ε,δ) stuff is to blame for that, but that’s a whole other topic.

        anyways, i was thinking a bit more about the matrix stuff and i think i have a better explanation if you’re interested, since my previous one was probably a bit too abstract. i think it should honestly be criminal to teach multivariate analysis before linear algebra, since a lot of the purpose of multivariate analysis is to turn complicated problems into linear problems. but anyways, here’s the big picture:

        you don’t really need to understand the ins and outs of matrices and be super familiar with them to get a sense of what the total derivative is, and how it should behave. for that purpose, here are some of the highlights of matrices and the total derivative:

        Let A be an m x n matrix. Then:

        • Multiplication with A defines a so-called “linear function” from ℝn to ℝm. Put simply, this means that if you have a line in ℝn, and you multiply each point in that line with A, then the result is a line in ℝm. (This is because, under the hood, matrix multiplication is just a bunch of scalar multiplication and addition.)
        • There’s a slight catch to what I said above: sometimes you multiply the points in a line with a matrix and they all get sent to a single point instead of to another line. For a line through the origin, that point is the 0 vector. (Compare this to what happens when A is a 1 x 1 matrix, i.e. a number, and multiplying every point in ℝ with A will either give you only the number 0, or it will give you all of ℝ.)
        • Now think about a plane: it’s something spanned by two lines. (The simplest case being ℝ2, which is spanned by the x and y axis.) Since matrices send lines through the origin to either lines or 0, there are three options for what can happen to a plane: it gets sent to a plane (no spanning lines get sent to 0), or to a line (one of the spanning lines gets sent to 0), or to 0 (both spanning lines get sent to 0). You can do some fancy math to show that the first case (where a plane gets sent to a plane) is much more likely than the other two cases. So this is where the idea of a tangent plane comes from: approximate a function with a matrix, and the matrix corresponds to a plane that “stays close” to the function.
        • In any case, matrix multiplication is an extremely easy thing for computers to do, because there’s a formula for it. In contrast, evaluating arbitrary functions is not easy, and there’s no formula for that. This is really the main benefit of the total derivative: you can approximate the behavior of a function with matrix multiplication. And we know a whole lot more about dealing with matrices than we do about dealing with random functions.
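
        here’s a tiny sketch of the “lines go to lines, or collapse” behaviour from the list above. the matrices A and B and the helper names are made up for illustration: A is a generic 2 x 2 matrix, and B is deliberately chosen so that its second row is twice its first, which makes it squash the direction (2, -1) down to 0.

```python
def matvec(A, v):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def line_points(point, direction, ts):
    """Sample the points point + t * direction for each t in ts."""
    return [[p + t * d for p, d in zip(point, direction)] for t in ts]


A = [[2.0, 1.0],
     [0.0, 3.0]]  # generic case: sends the plane to the whole plane
B = [[1.0, 2.0],
     [2.0, 4.0]]  # second row = 2 * first row, so (2, -1) is sent to 0

ts = [-1.0, 0.0, 1.0, 2.0]
line = line_points([0.0, 0.0], [2.0, -1.0], ts)

image_A = [matvec(A, p) for p in line]
image_B = [matvec(B, p) for p in line]

print("points on the line:", line)
print("after A:", image_A)  # multiples of (3, -3), so still a line
print("after B:", image_B)  # every point lands on (0, 0)
```

        notice that the whole computation is just multiplications and additions, which is the reason this kind of thing is cheap for a computer.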

        So those are two ways to look at the total derivative: you can try to get a geometric understanding of what it does (approximate the function with the best fitting plane), or try to look at why it’s useful (turning harder problems into easier problems). But just to be clear, dealing with matrices is still hard, it’s just comparatively a lot easier than dealing with random functions.