I don’t think anyone means CUDA as in the platform; they mean the API for higher-level languages like C and C++.
PTX is a close-to-metal virtual ISA that exposes the GPU as a data-parallel computing device and therefore allows fine-grained optimizations, such as register allocation and thread/warp-level adjustments, that CUDA C/C++ and other high-level languages cannot enable.
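As a rough illustration of PTX living inside the CUDA toolchain, here is a minimal sketch that embeds a single PTX instruction in CUDA C++ through inline `asm`. It reads the `%laneid` special register and checks it against `threadIdx.x % warpSize`. The function names (`lane_id_ptx`, `compare_lane_ids`, `count_lane_mismatches`) are hypothetical. The sketch assumes `nvcc`, an NVIDIA GPU, and a 1-D block, and it omits error checking for brevity. It does not demonstrate register-level tuning, only that PTX can be mixed directly into CUDA C++ code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: read the lane ID (0..31 within a warp) via inline PTX.
// %laneid is a PTX special register; CUDA C++ exposes no built-in variable for it.
__device__ unsigned int lane_id_ptx() {
    unsigned int lane;
    asm volatile("mov.u32 %0, %%laneid;" : "=r"(lane));
    return lane;
}

// Hypothetical kernel: count threads whose PTX lane ID differs from the
// value computed in plain CUDA C++ (valid for 1-D blocks).
__global__ void compare_lane_ids(int *mismatches) {
    unsigned int from_ptx = lane_id_ptx();
    unsigned int from_cuda = threadIdx.x % warpSize;
    if (from_ptx != from_cuda) {
        atomicAdd(mismatches, 1);
    }
}

// Launch one 1-D block of `threads` threads and return the mismatch count.
int count_lane_mismatches(int threads) {
    int *d_mismatches = nullptr;
    int h_mismatches = -1;
    cudaMalloc(&d_mismatches, sizeof(int));
    cudaMemset(d_mismatches, 0, sizeof(int));
    compare_lane_ids<<<1, threads>>>(d_mismatches);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(&h_mismatches, d_mismatches, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_mismatches);
    return h_mismatches;
}

int main() {
    int mismatches = count_lane_mismatches(128);
    printf("mismatches: %d\n", mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

Compiling this with `nvcc` shows the inline PTX passing through the same toolchain as the surrounding CUDA C++. That is the sense in which PTX is part of the CUDA environment rather than an alternative to it.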
Some commenters on this post are clearly not aware of PTX being a part of the CUDA environment. If you know this, you aren’t who I’m trying to inform.
aah I see them now