• @zweieuro
    16 · 10 months ago

    Correct me if I am wrong, but isn't "loop unrolling/unwinding" something that the C++ and Rust compilers do? Why does the loop here not get unrolled?

    • @Giooschi
      14 · 10 months ago

      Loop unrolling is not really where the speedup comes from; autovectorization is. Loop unrolling often helps with autovectorization, but it is not enough, especially with floating-point numbers. To vectorize the loop, the accumulation operation needs to be associative, and floating-point addition is not associative (i.e. (x + y) + z is not always equal to x + (y + z)). Hence autovectorizing the code would change its semantics, and the compiler is not allowed to do that.
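To make the associativity point concrete, here is a minimal Rust sketch. It is not the code from the original post: the helper names `sum_sequential` and `sum_four_lanes` and the input values are assumptions chosen for illustration. The second helper adds the numbers in roughly the order a 4-lane vectorized loop would use, and on this input it gives a different `f32` result than the plain left-to-right loop.

```rust
/// Left-to-right sum with a single accumulator, as a plain loop specifies.
fn sum_sequential(xs: &[f32]) -> f32 {
    let mut acc = 0.0f32;
    for &x in xs {
        acc += x;
    }
    acc
}

/// Sum with four independent partial sums, roughly the order a 4-lane
/// vectorized loop would use. Hypothetical helper, for illustration only.
fn sum_four_lanes(xs: &[f32]) -> f32 {
    let mut lanes = [0.0f32; 4];
    let chunks = xs.chunks_exact(4);
    let tail = chunks.remainder();
    for chunk in chunks {
        // Each lane only ever sees every fourth element.
        for (lane, &v) in lanes.iter_mut().zip(chunk) {
            *lane += v;
        }
    }
    // Combine the partial sums, then add any leftover elements.
    let mut total = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
    for &x in tail {
        total += x;
    }
    total
}

fn main() {
    // Near 1.0e8 the gap between adjacent f32 values is 8, so adding 1.0
    // to -1.0e8 rounds straight back to -1.0e8.
    let (x, y, z) = (1.0e8f32, -1.0e8f32, 1.0f32);
    println!("(x + y) + z = {}", (x + y) + z); // evaluates to 1.0
    println!("x + (y + z) = {}", x + (y + z)); // evaluates to 0.0

    // Same effect for a reordered sum over a slice.
    let data = [1.0e8f32, 1.0, -1.0e8, 1.0];
    println!("sequential: {}", sum_sequential(&data)); // evaluates to 1.0
    println!("four lanes: {}", sum_four_lanes(&data)); // evaluates to 0.0
}
```

Neither result matches the exact sum (2), but the point is that the two orders disagree with each other, so a compiler that has to preserve the written order cannot silently switch from one to the other.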

      • @bonus_crab
        7 · 10 months ago

        So if (somehow) the accumulator were an integer, this loop would autovectorize and the performance difference would be smaller?

        • @Giooschi
          4 · 10 months ago

          Very likely, yes.
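A minimal Rust sketch of why the integer case is different, under the assumption that the accumulator uses wrapping (modular) arithmetic. The helper names `sum_sequential_u32` and `sum_four_lanes_u32` are hypothetical. Wrapping integer addition is associative, so the reordered sum always matches the sequential one, even when the sum overflows. This only shows that the reordering is safe; whether a given compiler actually vectorizes a given loop still depends on the compiler, target, and optimization flags.

```rust
/// Left-to-right wrapping sum with a single accumulator.
fn sum_sequential_u32(xs: &[u32]) -> u32 {
    let mut acc = 0u32;
    for &x in xs {
        acc = acc.wrapping_add(x);
    }
    acc
}

/// Wrapping sum with four independent partial sums, roughly the order a
/// 4-lane vectorized loop would use. Hypothetical helper, for illustration.
fn sum_four_lanes_u32(xs: &[u32]) -> u32 {
    let mut lanes = [0u32; 4];
    let chunks = xs.chunks_exact(4);
    let tail = chunks.remainder();
    for chunk in chunks {
        for (lane, &v) in lanes.iter_mut().zip(chunk) {
            *lane = lane.wrapping_add(v);
        }
    }
    // Combine the partial sums, then add any leftover elements.
    let mut total = lanes[0]
        .wrapping_add(lanes[1])
        .wrapping_add(lanes[2])
        .wrapping_add(lanes[3]);
    for &x in tail {
        total = total.wrapping_add(x);
    }
    total
}

fn main() {
    // Includes a value that forces the sum to wrap around.
    let data = [u32::MAX, 1, 2, 3, 4];
    let a = sum_sequential_u32(&data);
    let b = sum_four_lanes_u32(&data);
    println!("sequential: {a}, four lanes: {b}"); // both evaluate to 9
    assert_eq!(a, b);
}
```

`wrapping_add` is used explicitly because plain `+` on integers panics on overflow in Rust debug builds, which would make the order of additions observable again.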