• @MTK
    link
    818 hours ago

    Downvoted so that everyone can know I’m cool since I understand regex better than the idiot who made that meme.

  • @[email protected]
    link
    fedilink
    221 day ago

    Just pop them into regex101 or a similar tool, add sample data, see the mistake, fix the mistake, continue to do other stuff.

  • exu
    link
    fedilink
    English
    211 day ago

    I usually do

    # What we are doing (high level)
    # Why we need regex
    # Regex step by step
    # Examples of matches
    regex
    

    And I still rewrite it the next time

    • @InnerScientist
      link
      17
      edit-2
      1 day ago
      // abandon all hope ye who commit here
      (?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
      

      Edith: damit, Not the first to post this abomination

  • AItoothbrush
    link
    fedilink
    English
    151 day ago

    Me checking my own docs: “this is some voodoo shit, idk how it works”

  • @NegativeLookBehind
    link
    English
    82
    edit-2
    1 day ago

    I found your email address:

    (?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
    
    • exu
      link
      fedilink
      English
      151 day ago

      I was about to ruin your day by finding a valid email address that would be rejected by your regex, but it doesn’t even parse correctly on regex101.com

      The only valid regex for email is .+@.+ btw

      • FreshLight
        link
        fedilink
        217 hours ago

        Does this regex include

        "very.(),:;<>[]\".VERY.\"very@\\ \"very\".unusual"@strange.example.com

      • :3 3: :3 3: :3 3: :3
        link
        fedilink
        521 hours ago

        What about "user@not_domain"? It validates but isn’t valid - there’s no domain part, the @ is quoted

        • exu
          link
          fedilink
          English
          320 hours ago

          That’s not something you can determine using a regex.

          “user@com” for example could be a perfectly working email.

          The right way is to send a verification email in every case.

      • @NegativeLookBehind
        link
        English
        10
        edit-2
        1 day ago

        It’s the RFC standard regex for email, so it’s definitely valid, btw. My copy/paste just seems to add a bunch of backslashes for some reason

        • exu
          link
          fedilink
          English
          319 hours ago

          Where did you get that “RFC standard” regex? It doesn’t allow domain names with one component RFC5321

          Neither does it allow spaces in quoted string, as per RFC5322

          This, 👋@✉️.gg, is already a working email address in most clients and if RFC6532 ever gets accepted, it would be officially recognized as such.

          My point isn’t to make your regex bad, just that it doesn’t validate or invalidate an email properly. Nothing stops me from giving you and invalid but syntactically correct email after all.
          You have to send an email anyways to verify, so the most you can check is the presence of one @ symbol.

        • @[email protected]
          link
          fedilink
          522 hours ago

          The argument here is that checking complex validation is a fool’s errand. Yes, you can write a fully validating regex for RFC email. In fact, it should be possible to write a regex shorter than the one that gets passed around since the 90s, because regular expression engines support recursive patterns now. (Part of the reason that old regex is so complicated is because email allows nested comments (which is insane (how insane? (Lisp levels insane)))).

          However, it doesn’t get you much of anywhere. What you really want to know is if it’s a valid email or not, and the only way to do that is to send an email to that address with a confirmation. The only point of the regex is to throw away obviously bad addresses. For that, checking that there’s an @ symbol and something for the user and domain portions is sufficient. I’d add needing a dot in the domain portion, but it’s not that important.

          Classically, it was argued that emails don’t even need a domain portion when things are done for internal systems, or that internal domains don’t need a tld. In my personal experience, this is rarely done anymore and can be safely ignored. Maybe some very, very old legacy systems, and if you’re working on one of those, then sure. For everyone else, don’t worry about it. You’re probably working on publicly accessible systems, and even if you’re not, most users are going to prefer using their fully spec’d out email address, anyway.

    • palordrolap
      link
      fedilink
      91 day ago

      That \\. part doesn’t look right, but what do I know. Apparently control codes are valid elsewhere, so a literal backslash followed by any character, even a space or a newline, might actually be valid there.

      “Yeah, my e-mail address is abc, carriage return, three backspaces and a terminal bell at example dot com. … What do you mean your mail program doesn’t support it?”

    • @[email protected]
      link
      fedilink
      English
      41 day ago

      No My email is

      (?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
      )+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:
      \r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(
      ?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ 
      \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\0
      31]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\
      ](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+
      (?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:
      (?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
      |(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)
      ?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\
      r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[
       \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)
      ?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t]
      )*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[
       \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*
      )(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
      )+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)
      *:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+
      |\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r
      \n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:
      \r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t
      ]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031
      ]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](
      ?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?
      :(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?
      :\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?
      :(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?
      [ \t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] 
      \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|
      \\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>
      @,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"
      (?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t]
      )*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
      ".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?
      :[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[
      \]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-
      \031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(
      ?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;
      :\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([
      ^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\"
      .\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\
      ]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\
      [\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\
      r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] 
      \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]
      |\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \0
      00-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\
      .|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,
      ;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?
      :[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*
      (?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
      \[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[
      ^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]
      ]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)(?:,\s*(
      ?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
      ".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(
      ?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[
      \["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t
      ])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t
      ])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?
      :\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|
      \Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:
      [^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\
      ]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)
      ?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["
      ()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)
      ?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>
      @,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[
       \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,
      ;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t]
      )*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
      ".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?
      (?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
      \[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:
      \r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[
      "()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])
      *))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])
      +|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\
      .(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
      |(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(
      ?:\r\n)?[ \t])*))*)?;\s*)
      		
      
  • @[email protected]
    link
    fedilink
    English
    271 day ago

    I have found chatgpt to be very good at writing regex. I also don’t know how to write regex.

    • @SwordInStone
      link
      720 hours ago

      well, you won’t get better using chatgpt for it

    • @[email protected]
      link
      fedilink
      161 day ago

      In my experience, it is good at simple to medium complexity regex. For the harder ones it starts being quite useless though, at best providing a decent starting point to begin debugging from.

  • @[email protected]
    link
    fedilink
    302 days ago

    It helps if you break it apart into its component parts. Which is like anything else, really, but we’ve all accepted that regexes are supposed to run together in an unreadable mess. No reason it has to be that way.

  • @Skyrmir
    link
    English
    131 day ago

    Never debug regex, just generate a new one. It’s not worth the hassle to figure out not only what it does, but what it was meant to do.

    Better yet, just write it out in code, and never use regex. Tis a stupid thing that never should have been made.

    • @[email protected]
      link
      fedilink
      217 hours ago

      I love regex and I use it a lot, but I very rarely use it in any kind of permanent solution. When I do, I make sure to keep it as minimal as possible, supplementing with higher level programming where possible. Backreferences and assertions are a cardinal sin and should never be used.

    • @[email protected]
      link
      fedilink
      14
      edit-2
      1 day ago

      Hard disagree. The function regex serves in programs like Notepad++ can’t be easily replaced by “writing it out in code”. With a very small number of characters you can get complex search patterns and capturing groups. It’s hard to read but incredibly useful.

      • @lightsblinken
        link
        -116 hours ago

        feel like thats a notepad++ problem? in general, breaking it out into manageable human ingest-able chunks is A Good Idea

      • @[email protected]
        link
        fedilink
        11 day ago

        I fall in the abandon it camp. Code is read way more times than it is written. I’d rather read an algorithm that validates input than read a regex that validates input.

        • @[email protected]
          link
          fedilink
          41 day ago

          You’re discussing a completely different use case from what I said. RegEx can be increidbly useful but it’s not always the only/best option.

      • @Skyrmir
        link
        English
        -31 day ago

        If you’re needing that level of complexity in a text file search, you already fucked up by putting the data in a text file. There’s a reason data file formats exist.

        • @[email protected]
          link
          fedilink
          31 day ago

          Not even close. Sometimes you can have a large text file where you need to do a find replace with a pattern. For example, in the translation world this can be a common occurrence for translation files (.xliff) or translation memories (.tmx).

          There’s a reason why this is widely used and it’s not because everyone else but you is dumb.

          • @Skyrmir
            link
            English
            01 day ago

            Turns out the million hours of coding put into SQL, makes it a better option than regex, even for xml based files.

            • nickwitha_k (he/him)
              link
              fedilink
              19 hours ago

              Why would I use SQL to to reformat a poorly structured log file for programs whose source I have no input in during a live debug with a customer on system that I don’t own and can’t install anything on? Or to extract and format things like hosts from a similar file?

              That’s stuff that’s quickly and easily done in vim (which is generally part of the base install) with regex. There’s a lot of use cases that have no overlap with SQL.

            • @[email protected]
              link
              fedilink
              222 hours ago

              Maybe for your very specific use case that’s true. However, other use cases exist and for many of those RegEx is the better option.

              • @Skyrmir
                link
                English
                120 hours ago

                I’m saying if your use case makes regex the best option, you’ve gone the wrong way and should turn back. There are definitely corners you can paint yourself into that make it the way to go, but you’ve ended up there through a series of bad ideas.

                • @[email protected]
                  link
                  fedilink
                  218 hours ago

                  Maybe, just maybe, the context in which you use regex isn’t the same as everyone elses. But hey, who am I to deny you the disservice of thinking you’re the center of the world?

    • @sakodak
      link
      41 day ago

      Regex is a write only language.

  • @[email protected]
    link
    fedilink
    8
    edit-2
    1 day ago

    If I have a complex regular expression to code into my app, I write it in pomsky, then copy paste the compiled regex to my source file, but also keep the pomsky source nearby. Much more maintainable.