• @gedhrel
    43
    9 months ago

    Casey’s video is interesting, but his example frames the move from 35 cycles/object to 24 cycles/object as a 1.5x speedup.

    Another way to look at it: that’s a saving of 11 cycles per object.

    If you’re writing a shader or a physics sim this is a massive difference.

    If you’re building typical business software, it isn’t; what does crop up there is the 10,000-line monster method, and that’s a maintenance disaster.

    I think “clean code principles lead to a 50% cost increase” is a takeaway that needs to be read with a fair amount of context.
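To put numbers on the two framings, here is a quick sketch. The 35 and 24 cycles/object figures are the ones quoted above (35 / 24 is about 1.46x); the object counts and the 3 GHz clock are made up purely for illustration.

```rust
/// Seconds saved for a batch, given the cycles saved per object.
fn seconds_saved(objects: u64, cycles_saved_per_object: u64, clock_hz: f64) -> f64 {
    (objects * cycles_saved_per_object) as f64 / clock_hz
}

fn main() {
    let (slow, fast) = (35u64, 24u64); // cycles/object, from the comment above
    let ratio = slow as f64 / fast as f64; // relative framing
    let saved = slow - fast; // absolute framing
    println!("speedup: {ratio:.2}x, saved: {saved} cycles/object");

    let clock_hz = 3.0e9; // assumed 3 GHz core, not from the video
    for objects in [1_000u64, 1_000_000_000] {
        let secs = seconds_saved(objects, saved, clock_hz);
        println!("{objects} objects: {secs:.9} s saved");
    }
}
```

Whether the absolute saving matters depends entirely on how many objects go through the loop, which is the point about context.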

    • @[email protected]
      15
      9 months ago

      Yup. If that per-object saving is in a hot loop, then yeah: throw a bunch of comments and tests around it, perhaps keep the “clean” version around for illustrative purposes, and then do the fast thing. Perhaps add a feature flag to switch between the “clean” and “fast but a little sketchy” versions. Maybe someone will even write a generic way to memoize pure functions, so the “clean” version can be used with minimal performance overhead.

      Clean code should be the default; optimizations should come later, as necessary.
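A minimal sketch of the "keep both versions behind a flag and test them against each other" idea. Everything here is hypothetical: the `Kind`/`Shape` layout, the coefficient table and the `use_fast` flag are stand-ins, not code from the article or the video.

```rust
#[derive(Clone, Copy)]
enum Kind { Square = 0, Rectangle = 1, Triangle = 2 }

#[derive(Clone, Copy)]
struct Shape { kind: Kind, w: f64, h: f64 }

/// "Clean" version: reads like the textbook formulas.
fn area_clean(s: &Shape) -> f64 {
    match s.kind {
        Kind::Square | Kind::Rectangle => s.w * s.h,
        Kind::Triangle => 0.5 * s.w * s.h,
    }
}

/// "Fast but a little sketchy" version: one table lookup, no match.
const COEFF: [f64; 3] = [1.0, 1.0, 0.5];
fn area_fast(s: &Shape) -> f64 {
    COEFF[s.kind as usize] * s.w * s.h
}

/// The flag picks the implementation (it could equally be a cargo feature).
fn total_area(shapes: &[Shape], use_fast: bool) -> f64 {
    if use_fast {
        shapes.iter().map(area_fast).sum()
    } else {
        shapes.iter().map(area_clean).sum()
    }
}

fn main() {
    let shapes = [
        Shape { kind: Kind::Square, w: 2.0, h: 2.0 },
        Shape { kind: Kind::Rectangle, w: 2.0, h: 3.0 },
        Shape { kind: Kind::Triangle, w: 2.0, h: 3.0 },
    ];
    // The clean version doubles as the reference the fast one is checked against.
    assert_eq!(total_area(&shapes, false), total_area(&shapes, true));
    println!("total area: {}", total_area(&shapes, true));
}
```

The equivalence check is what keeps the clean version from quietly drifting out of date: if someone changes one implementation and not the other, the test fails.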

      • @[email protected]
        1
        9 months ago

        Keeping the clean version around seems like dangerous advice.

        You know it won’t get maintained when there are changes or fixes. So by the time someone needs to rewrite that part, or the whole application, many years later (think migration to a different language), it will be more confusing than helpful.

    • @bonus_crab
      5
      9 months ago

      For what it’s worth, the cache locality of Vec<Box<dyn Trait>> is terrible in general. I feel like if you’re iterating over a large array of things and applying a polymorphic function, you’re making a mistake.

      Cache locality isn’t a problem when you’re only accessing something once, though.

      So IMO polymorphism has its place in work that isn’t iterative compute, e.g. web server handler functions and event-driven systems.
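A rough sketch of the layout difference being described. The `Shape` trait, the two structs and `ShapeEnum` are illustrative names, not taken from the article. With `Vec<Box<dyn Shape>>` every element is a pointer to a separate heap allocation; with an enum the values sit inline in the vector's buffer.

```rust
use std::f64::consts::PI;

trait Shape {
    fn area(&self) -> f64;
}

struct Square { side: f64 }
struct Circle { r: f64 }

impl Shape for Square {
    fn area(&self) -> f64 { self.side * self.side }
}
impl Shape for Circle {
    fn area(&self) -> f64 { PI * self.r * self.r }
}

/// Each element is a fat pointer (data + vtable) to its own heap allocation,
/// so walking the slice means chasing a pointer per shape.
fn total_boxed(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

/// Same data as an enum: values are stored back to back in one buffer.
enum ShapeEnum {
    Square { side: f64 },
    Circle { r: f64 },
}

impl ShapeEnum {
    fn area(&self) -> f64 {
        match self {
            ShapeEnum::Square { side } => side * side,
            ShapeEnum::Circle { r } => PI * r * r,
        }
    }
}

fn total_inline(shapes: &[ShapeEnum]) -> f64 {
    shapes.iter().map(ShapeEnum::area).sum()
}

fn main() {
    let boxed: Vec<Box<dyn Shape>> =
        vec![Box::new(Square { side: 2.0 }), Box::new(Circle { r: 1.0 })];
    let inline = vec![ShapeEnum::Square { side: 2.0 }, ShapeEnum::Circle { r: 1.0 }];
    println!("boxed: {}, inline: {}", total_boxed(&boxed), total_inline(&inline));
}
```

Both compute the same total; how much the layout costs in practice depends on allocation patterns and data size, so it is worth measuring rather than assuming.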

  • @zweieuro
    16
    9 months ago

    Correct me if I’m wrong, but isn’t “loop unrolling/unwinding” something that the C++ and Rust compilers do? Why does the loop here not get unwound?

    • @Giooschi
      English
      14
      9 months ago

      Loop unrolling is not really the speedup; autovectorization is. Loop unrolling often helps with autovectorization, but it isn’t enough, especially with floating-point numbers. For the compiler to vectorize the loop, the accumulation operation needs to be associative, and floating-point addition is not: (x + y) + z is not always equal to x + (y + z). Autovectorizing the code would therefore change its semantics, and the compiler is not allowed to do that.
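A small demonstration of the non-associativity, with values picked for illustration so that the rounding is easy to see: 1.0 is far too small to survive being added to -1e20 in an f64.

```rust
/// Adds x and y first, then z.
fn left_first(x: f64, y: f64, z: f64) -> f64 {
    (x + y) + z
}

/// Adds y and z first, then x.
fn right_first(x: f64, y: f64, z: f64) -> f64 {
    x + (y + z)
}

fn main() {
    let (x, y, z) = (1.0e20_f64, -1.0e20_f64, 1.0_f64);
    let left = left_first(x, y, z); // x and y cancel exactly, then z is added
    let right = right_first(x, y, z); // z is rounded away inside (y + z)
    println!("(x + y) + z = {left}");
    println!("x + (y + z) = {right}");
    assert_ne!(left, right);
}
```

A vectorized sum regroups the additions into per-lane partial sums, which is exactly this kind of reordering.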

      • @bonus_crab
        7
        9 months ago

        So if (somehow) the accumulator were an integer, this loop would autovectorize and the performance differences would be smaller?
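For the integer case, a sketch of why reordering is at least legal: wrapping integer addition is associative and commutative, so summing into four independent "lanes" (a hand-written stand-in for what a 4-wide SIMD loop does) gives the same answer as the plain loop. The function names and test data are made up for illustration.

```rust
/// Plain left-to-right sum with wrapping overflow.
fn sum_sequential(xs: &[u32]) -> u32 {
    xs.iter().fold(0u32, |acc, &x| acc.wrapping_add(x))
}

/// Four independent partial sums, combined at the end.
fn sum_four_lanes(xs: &[u32]) -> u32 {
    let mut lanes = [0u32; 4];
    for (i, &x) in xs.iter().enumerate() {
        lanes[i % 4] = lanes[i % 4].wrapping_add(x);
    }
    lanes.iter().fold(0u32, |acc, &lane| acc.wrapping_add(lane))
}

fn main() {
    // Includes values that overflow, to show the regrouping still agrees.
    let xs = [u32::MAX, 1, 2, 3, 4, 5, u32::MAX, 7];
    assert_eq!(sum_sequential(&xs), sum_four_lanes(&xs));
    println!("both orders give {}", sum_sequential(&xs));
}
```

This only shows the transformation would be allowed. Whether the loop from the article would actually get vectorized, and how much the gap would shrink, is something you would have to check in the generated assembly or a benchmark.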

  • Alex
    11
    9 months ago

    Does the author know about the difference between static and dynamic dispatch? 🤦🏻‍♂️
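For anyone unsure what the distinction is, a minimal sketch (the `Shape` trait and `Square` type are illustrative, not the article's code). With a generic parameter the call target is known at compile time; with a trait object each call goes through a vtable.

```rust
trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);

impl Shape for Square {
    fn area(&self) -> f64 { self.0 * self.0 }
}

/// Static dispatch: monomorphized for each concrete T, so `area` can be inlined.
/// The trade-off is that the slice must hold a single concrete type.
fn total_static<T: Shape>(shapes: &[T]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

/// Dynamic dispatch: the slice can mix types, but every call is an indirect
/// call through the trait object's vtable.
fn total_dynamic(shapes: &[&dyn Shape]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

fn main() {
    let squares = vec![Square(1.0), Square(2.0), Square(3.0)];
    let refs: Vec<&dyn Shape> = squares.iter().map(|s| s as &dyn Shape).collect();
    println!("static: {}, dynamic: {}", total_static(&squares), total_dynamic(&refs));
}
```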

  • Turun
    7
    9 months ago

    It would be interesting to see whether using an iterator instead of a manual for loop would improve the performance of the base case.

    My guess is not, because the compiler should know they are equivalent, but it would be interesting to check anyway.
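The variants one might compare, as a sketch. `Rect` is a stand-in type, not the article's shape hierarchy. All three compute the same sum in the same order; whether they compile to the same assembly is the part that would need checking.

```rust
struct Rect { w: f64, h: f64 }

impl Rect {
    fn area(&self) -> f64 { self.w * self.h }
}

/// Manual index loop.
fn sum_indexed(shapes: &[Rect]) -> f64 {
    let mut accum = 0.0;
    let mut i = 0;
    while i < shapes.len() {
        accum += shapes[i].area();
        i += 1;
    }
    accum
}

/// for-in loop, as in the article's snippet.
fn sum_for_in(shapes: &[Rect]) -> f64 {
    let mut accum = 0.0;
    for shape in shapes {
        accum += shape.area();
    }
    accum
}

/// Iterator chain.
fn sum_chain(shapes: &[Rect]) -> f64 {
    shapes.iter().map(Rect::area).sum()
}

fn main() {
    let shapes = vec![
        Rect { w: 1.0, h: 2.0 },
        Rect { w: 3.0, h: 4.0 },
        Rect { w: 5.0, h: 6.0 },
    ];
    assert_eq!(sum_indexed(&shapes), sum_for_in(&shapes));
    assert_eq!(sum_for_in(&shapes), sum_chain(&shapes));
    println!("sum of areas: {}", sum_chain(&shapes));
}
```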

    • Deebster
      2
      9 months ago

      I wonder if the compiler checks to see if the calls are pure and are therefore safe to run in parallel. It seems like the kind of thing the Rust compiler should be able to do.
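As far as I know, rustc does not split a loop like this across threads on its own, so parallelism has to be asked for explicitly. A hedged sketch using only the standard library; `area`, `total_parallel` and the worker count are hypothetical stand-ins.

```rust
use std::thread;

/// Stand-in pure function: no shared state, so calls can run in any order.
fn area(side: f64) -> f64 {
    side * side
}

/// Splits the input into chunks and sums each chunk on its own scoped thread.
fn total_parallel(sides: &[f64], workers: usize) -> f64 {
    let workers = workers.max(1);
    // Ceiling division; chunk size must be at least 1 for `chunks`.
    let chunk = ((sides.len() + workers - 1) / workers).max(1);
    thread::scope(|s| {
        // Spawn every worker first, then join, so the chunks run concurrently.
        let handles: Vec<_> = sides
            .chunks(chunk)
            .map(|part| s.spawn(move || part.iter().map(|&x| area(x)).sum::<f64>()))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum::<f64>()
    })
}

fn main() {
    let sides: Vec<f64> = (1..=100).map(|n| n as f64).collect();
    println!("total: {}", total_parallel(&sides, 4));
}
```

Note that with floating-point data the per-chunk partial sums regroup the additions, so the result can differ slightly from the sequential loop. The example uses small whole numbers, where every intermediate sum is exact.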

    • @[email protected]
      English
      1
      edit-2
      9 months ago

      Do you mean this for loop?

      for shape in &shapes {
        accum += shape.area();
      }
      

      That does use an iterator:

      for-in-loops, or to be more precise, iterator loops, are a simple syntactic sugar over a common practice within Rust, which is to loop over anything that implements IntoIterator until the iterator returned by .into_iter() returns None (or the loop body uses break).
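Roughly what that sugar expands to, as a sketch. The real desugaring differs in small details, and `Square` is a stand-in type.

```rust
struct Square { side: f64 }

impl Square {
    fn area(&self) -> f64 { self.side * self.side }
}

/// The for-in form.
fn sum_with_for(shapes: &[Square]) -> f64 {
    let mut accum = 0.0;
    for shape in shapes {
        accum += shape.area();
    }
    accum
}

/// Approximately the same loop written out by hand.
fn sum_desugared(shapes: &[Square]) -> f64 {
    let mut accum = 0.0;
    let mut iter = IntoIterator::into_iter(shapes);
    loop {
        match iter.next() {
            Some(shape) => accum += shape.area(),
            None => break, // iterator exhausted
        }
    }
    accum
}

fn main() {
    let shapes = vec![Square { side: 1.0 }, Square { side: 2.0 }];
    assert_eq!(sum_with_for(&shapes), sum_desugared(&shapes));
    println!("sum of areas: {}", sum_with_for(&shapes));
}
```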

      Anti Commercial AI thingy

      CC BY-NC-SA 4.0

        • Turun
          4
          9 months ago

          Yes. That’s what I meant.

          Though I fully expect the Rust compiler to produce identical assembly for both types of iteration.

          • @[email protected]
            2
            9 months ago

            Anti Commercial AI thingy

            Off-topic, but does that actually work? I would assume OpenAI would just ignore it and you’d have to prove that they did so.

            • @[email protected]
              English
              5
              9 months ago

              Dunno if it works. AI has been tricked into revealing its training data, so it’s possible that happens and they get sued for using copyrighted material.

              This is my drop in the ocean.

              Anti Commercial AI thingy

              CC BY-NC-SA 4.0

                • @[email protected]
                  English
                  3
                  9 months ago

                  Welcome 🙂 A drop more.

                  Btw, if you’re using Linux and X11, you can bind a keyboard shortcut to the following shell script (you’ll probably need to install xte).

                  #!/usr/bin/env bash
                  sleep 0.5
                  xte "str ::: spoiler Anti Commercial AI thingy"
                  xte "key Return"
                  xte "str [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/)"
                  xte "key Return"
                  xte "str :::"
                  
                  Anti Commercial AI thingy

                  CC BY-NC-SA 4.0

  • @[email protected]
    -1
    9 months ago

    No

    struct Shapes<const N: usize>([Shape; N]);
    
    impl<const N: usize> Shapes<N> {
     const fn area(&self) -> f64 { /* ... */ }
    }
    

    Bad article 🤨