• @iAvicenna
    723 days ago

    I suppose both Pl@ntNet and deepfakes have convolutional networks as part of their architectures, though.

    • @[email protected]
      223 days ago

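      Likely transformers now (I think SD3 swaps the convolutional U-Net for a diffusion transformer and encodes text with transformer text encoders, CLIP and T5, rather than a ViT, which is a vision model; ViTs are currently among the best architectures for image classification).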