• andrew_bidlaw
    English
    1213 hours ago

    As it learns from our data, no wonder it fucks up at regexps. They are the arcane knowledge not accessible to us mere mortals, nor to LLMs.

    • @[email protected]
      612 hours ago

      If you know even a little about how an LLM works, it’s obvious why regex is basically impossible for it. I suspect Perl has similar problems, but no one is capable of actually validating that.

      • Ignotum
        14 hours ago

        What do you mean it’s impossible for it? I know how LLMs work, but I don’t know of any such limitations.

        Write me a regex that matches a letter repeated four times, followed by a 3 or 4 digit number

        Here’s your regex: ([a-zA-Z])\1{3}\d{3,4}
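For anyone who wants to check that regex rather than take it on faith, here is a minimal sketch using Python's standard `re` module. The pattern is copied from the comment above; the helper names `matches_whole` and `matches_anywhere` and the sample strings are made up for illustration. It assumes "a letter repeated four times" means the same ASCII letter in the same case, which is how the backreference `\1` behaves.

```python
import re

# Pattern quoted in the comment above: one ASCII letter (captured),
# the same letter three more times via the backreference \1,
# then three or four digits.
PATTERN = re.compile(r"([a-zA-Z])\1{3}\d{3,4}")


def matches_whole(text: str) -> bool:
    """True if the entire string fits the pattern (hypothetical helper)."""
    return PATTERN.fullmatch(text) is not None


def matches_anywhere(text: str) -> bool:
    """True if the pattern occurs somewhere inside the string (hypothetical helper)."""
    return PATTERN.search(text) is not None


if __name__ == "__main__":
    samples = ["aaaa123", "ZZZZ9999", "aaab123", "aAaA123", "aaaa12", "aaaa12345"]
    for sample in samples:
        print(f"{sample!r}: whole={matches_whole(sample)} anywhere={matches_anywhere(sample)}")
```

One thing worth noticing: the pattern has no anchors, so a search will happily match the first eight characters of `aaaa12345`. Whether that counts as correct depends on how you read "a 3 or 4 digit number" in the prompt.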

        • @[email protected]
          13 hours ago

          They aren’t context-aware; they work on statistical probability. An LLM can replicate things it has seen a lot of, like a tutorial regex, but it can’t apply that to build a more complicated one. Regex in the wild isn’t really standard at all, because it’s rarely used to solve common problems. The model has a bunch of random regexes from code it analyzed and will spit out something that looks similar.