The large language models behind AI chatbots are developing so rapidly that every eight months a model needs only half the computing power to hit the same benchmark score - which is much faster than the rate at which computer chips improve

  • @[email protected]
    8 months ago

    That is… Odd. (It’s also paywalled.)

    If they are referring to exponential increases in speed, similar to Moore’s law, I would suspect there would be some improvement over time but… Comparing transistor density to the speed improvement of an ANN is bizarre, TBH.

    Training methods may be improving? That could be a thing. An ANN uses fairly basic math but it needs to be computed en masse. That is dependent on processors, so that makes for an even weirder comparison.
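
    For a concrete sense of that "basic math, computed en masse" point, here is a minimal sketch (with arbitrary layer sizes, not taken from any real model): a single dense layer is just a matrix multiply, an add and a pointwise nonlinearity, repeated across many layers and tokens.

    ```python
    # Minimal sketch: one dense-layer forward pass, the "basic math" an ANN
    # repeats en masse. Layer sizes are arbitrary placeholders.
    import numpy as np

    rng = np.random.default_rng(0)
    batch, d_in, d_out = 32, 4096, 4096

    x = rng.standard_normal((batch, d_in))   # input activations
    W = rng.standard_normal((d_in, d_out))   # layer weights
    b = np.zeros(d_out)                      # biases

    y = np.maximum(x @ W + b, 0.0)           # multiply, add, ReLU -- nothing exotic
    flops = 2 * batch * d_in * d_out         # ~one multiply and one add per weight per sample
    print(y.shape, f"~{flops:,} floating-point ops for this single layer")
    ```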

  • @pavnilschandaOPM
    8 months ago

    Content:

    The artificial intelligence models behind popular chatbots are developing faster than Moore’s law, a measure of how quickly computer hardware performance increases. That suggests the developers of AI systems, known as large language models (LLMs), are becoming smarter at doing more with less.

    “There are basically two ways your performance might improve,” says Tamay Besiroglu at the Massachusetts Institute of Technology. One is to scale up the size of an LLM, which, in turn, requires a commensurate increase in computing power. But due to the generative AI revolution, there are global supply shortages in the graphics processing unit computer chips used to power LLMs, creating a bottleneck in AI development.

    The alternative, says Besiroglu, is to improve the underlying algorithms to make better use of the same computing hardware.

    This seems to be the approach favoured by the current crop of LLM developers, to great success. Besiroglu and his colleagues analysed the performance of 231 LLMs developed between 2012 and 2023 and found that, on average, the computing power required for subsequent versions of an LLM to hit a given benchmark halved every eight months. That is far faster than Moore’s law, a computing rule of thumb coined in 1965 that suggests the number of transistors on a chip, a measure of performance, doubles every 18 to 24 months.
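
    As a rough back-of-the-envelope comparison (the annualised figures below follow only from the quoted halving and doubling times, not from the paper itself): a halving time of eight months works out to roughly a 2.8-fold effective gain per year, versus roughly 1.4- to 1.6-fold for an 18-to-24-month doubling time.

    ```python
    # Back-of-the-envelope comparison of the quoted rates: compute needed to hit
    # a fixed benchmark halves every ~8 months (algorithmic progress), while
    # Moore's law doubles transistor counts every 18-24 months.

    def yearly_factor(doubling_months: float) -> float:
        """Effective multiplier over 12 months, given the months per doubling."""
        return 2 ** (12 / doubling_months)

    algorithmic = yearly_factor(8)    # efficiency doubles (compute halves) every ~8 months
    moore_fast  = yearly_factor(18)   # faster end of the Moore's-law rule of thumb
    moore_slow  = yearly_factor(24)   # slower end

    print(f"Algorithmic efficiency: ~{algorithmic:.2f}x per year")                  # ~2.83x
    print(f"Moore's law:            ~{moore_slow:.2f}x-{moore_fast:.2f}x per year") # ~1.41x-1.59x
    ```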

    While Besiroglu believes that this increase in LLM performance is partly due to more efficient software coding, the researchers were unable to pinpoint precisely how those efficiencies were gained – in part because AI algorithms are often impenetrable black boxes. He also points out that hardware improvements still play a big role in increased performance.

    Nevertheless, the disparity in the pace of development is an indication of how well LLM developers are making use of the resources available to them. “We should not discount human ingenuity here,” says Anima Anandkumar at the California Institute of Technology. While more powerful hardware or ever larger training datasets have driven AI progress for the past decade, that is starting to change. “We are seeing limits to the scale, both with data and compute,” she says. “The future will be algorithmic gains.”

    But Besiroglu says it might not be possible to endlessly optimise algorithms for performance. “It’s much less clear whether this is going to occur for a very long period of time,” he says.

    Whatever happens, there are concerns that making models more efficient could paradoxically increase the energy used by the AI sector. “Focusing on energy efficiency of AI alone tends to overlook the broader rebound effects in terms of usage,” says Sasha Luccioni at AI firm Hugging Face. “This has been observed in other domains, from transportation to energy,” she says. “It’s good to keep this in mind when considering the environmental impacts of compute and AI algorithms.”
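
    A toy illustration of that rebound effect, with invented numbers: if a model becomes twice as efficient per query but cheaper inference triples usage, total energy consumption still rises by half.

    ```python
    # Toy rebound-effect arithmetic with invented numbers: a 2x efficiency gain
    # is outpaced by a 3x growth in usage, so total energy consumption still rises.
    energy_per_query_before = 1.0          # arbitrary units
    energy_per_query_after  = 0.5          # model becomes 2x more efficient

    queries_before = 1_000_000
    queries_after  = 3_000_000             # cheaper inference drives 3x more usage

    total_before = energy_per_query_before * queries_before
    total_after  = energy_per_query_after * queries_after

    print(f"Total energy before: {total_before:,.0f} units")
    print(f"Total energy after:  {total_after:,.0f} units ({total_after / total_before:.1f}x)")
    ```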