• @[email protected]
    46
    6 hours ago

    > This kind of skill might help developers build AI agents that identify buttons or fields on a webpage to handle tasks like making a reservation at a restaurant.

    … to improve efficiency of click farms and to bypass captchas.

  • @[email protected]
    25
    6 hours ago

    This reads like an ad. They claim to use 1000 times less data than proprietary models, except nobody knows how much data proprietary models use or how big they actually are. There’s also a giant asterisk they fail to mention: Molmo outperforms the competition on visual benchmarks, not in actual text chat.

  • Pennomi
    9
    6 hours ago

    Daaaang, Apache license AND open dataset + training tools.

  • @[email protected]
    8
    7 hours ago

    > but an order of magnitude smaller

    I’m pretty sure that would be three orders of magnitude.

    • FaceDeer
      14
      6 hours ago

      They’re not talking about the same thing.

      > Last week, researchers at the Allen Institute for Artificial Intelligence (Ai2) released a new family of open-source multimodal models competitive with state-of-the-art models like OpenAI’s GPT-4o—but an order of magnitude smaller.

      That’s in reference to the size of the model itself.

      > They then compiled a more focused, higher quality dataset of around 700,000 images and 1.3 million captions to train new models with visual capabilities. That may sound like a lot, but it’s on the order of 1,000 times less data than what’s used in proprietary multimodal models.

      That’s in reference to the size of the training data that was used to train the model.

      Minimizing both of those things is useful, but for different reasons. A smaller training set makes the model cheaper to train, and a smaller model is cheaper to run.

  • @chemical_cutthroat
    2
    5 hours ago

    And a modern calculator has more computing power than the Apollo program… This is how tech works.