I wrote a TUI application to help you practice Python regular expressions. There are more than 100 exercises covering both the built-in re module and the third-party regex module.

If you have pipx, run pipx install regexexercises to install the app. See the repo for the source code and other details.

  • @alyth
    15 months ago

    Thanks for sharing this. I took the time to read through the documentation of the re module. Here’s my review of the functions.

    Useful:

    • re.finditer returns an iterator over all Match objects
    • re.search returns the first Match object or None if there are no matches.
    • r'' use raw strings for patterns so you don’t have to worry about escaping backslashes
    • the optional flags argument modifies the behaviour (case insensitive, multiline)
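A quick sketch of the “useful” items above. The sample text and the DATE pattern are made up for illustration; the calls themselves are standard re functions and Pattern methods:

```python
import re

text = "Order 66 shipped on 2023-12-01; order 67 on 2023-12-05."

# Raw string: "\d" reaches the regex engine unchanged (no string-escape handling).
DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

# search: the first Match object, or None if nothing matches.
first = DATE.search(text)
print(first.group(0))                 # 2023-12-01
print(DATE.search("no dates here"))   # None

# finditer: an iterator yielding one Match object per match.
starts = [m.start() for m in DATE.finditer(text)]
print(starts)                         # [20, 44]

# flags: re.IGNORECASE changes which "order" is found first.
print(re.search(r"order", text).start())                       # 32
print(re.search(r"order", text, flags=re.IGNORECASE).start())  # 0
```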

    Utility:

    • re.sub replaces each match in the string
    • re.split splits a string by a regular expression
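A small sketch of the two utility functions (the input strings are invented). It also shows two details that are easy to miss: re.sub accepts a function as the replacement, and a capturing group in the re.split pattern keeps the separators in the result:

```python
import re

log = "a=1,  b=2 ;c=3"

# re.sub: replace every separator (comma or semicolon plus surrounding spaces).
tidy = re.sub(r"\s*[,;]\s*", ", ", log)
print(tidy)      # a=1, b=2, c=3

# The replacement can be a function that receives each Match object.
doubled = re.sub(r"\d+", lambda m: str(int(m.group(0)) * 2), tidy)
print(doubled)   # a=2, b=4, c=6

# re.split: split on every match of the pattern.
parts = re.split(r"\s*[,;]\s*", log)
print(parts)     # ['a=1', 'b=2', 'c=3']

# With a capturing group, the separators are included in the result.
kept = re.split(r"([,;])", "a,b;c")
print(kept)      # ['a', ',', 'b', ';', 'c']
```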

    The Match object:

    • match.group(0) returns the portion of text matched by the whole pattern
    • match.group(1) returns the first capturing group
    • match.group(2) returns the second capturing group, and so on
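Note the difference between the singular and plural methods: match.group(n) returns a single group, while match.groups() returns a tuple of all capturing groups, and its optional argument is a default for groups that did not participate, not an index. A quick check with made-up inputs:

```python
import re

m = re.search(r"(\w+)@(\w+)\.com", "contact: alice@example.com")
print(m.group(0))   # alice@example.com  (whole match, same as m.group())
print(m.group(1))   # alice
print(m.group(2))   # example
print(m.groups())   # ('alice', 'example')

# The optional minor-version group does not participate in this match.
m2 = re.search(r"(\d+)(?:\.(\d+))?", "version 3")
print(m2.groups())     # ('3', None)
print(m2.groups("0"))  # ('3', '0')  (argument is the default value)
```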

    I don’t understand why these exist:

    • re.match like ‘search’, but only matches at the beginning of the string. Why not just use ‘^’ or ‘\A’ in the pattern you pass to ‘search’?
    • re.fullmatch like ‘search’, but only if the full string matches. Why not just use ‘\A’ and ‘\Z’ in the pattern you pass to ‘search’?
    • re.findall returns all matches. It seems like a shitty version of ‘finditer’. The function has three different return types, which depend on the pattern you pass to the function. Who wants to work with that?
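For anyone weighing these points, here is a small check of how the functions actually behave (sample strings are made up). Once re.MULTILINE is on, ‘^’ also matches after every newline, so it is not a drop-in replacement for re.match, while ‘\A’ is. And findall's element type does change with the number of capturing groups:

```python
import re

s = "header\n42 apples"

# re.match anchors at the start of the string only.
at_start = re.match(r"\d+", s, flags=re.MULTILINE)
print(at_start)          # None
# With MULTILINE, '^' also matches right after the newline.
caret = re.search(r"^\d+", s, flags=re.MULTILINE)
print(caret.group())     # 42
# '\A' is the start of the string regardless of flags.
abs_start = re.search(r"\A\d+", s, flags=re.MULTILINE)
print(abs_start)         # None

t = "x=1 y=22"
whole = re.findall(r"\w=\d+", t)      # no groups: whole matches
one = re.findall(r"\w=(\d+)", t)      # one group: that group's text
pairs = re.findall(r"(\w)=(\d+)", t)  # two or more groups: tuples
print(whole, one, pairs)  # ['x=1', 'y=22'] ['1', '22'] [('x', '1'), ('y', '22')]

# finditer always yields Match objects, whatever the groups.
via_iter = [m.group(2) for m in re.finditer(r"(\w)=(\d+)", t)]
print(via_iter)          # ['1', '22']
```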
    • @[email protected]
      45 months ago

      I would argue that having distinct match and search helps readability. The difference between match('((([0-9]+-[0-9]+)|([0-9]+))[,]?)+[^,]', s) and search('((([0-9]+-[0-9]+)|([0-9]+))[,]?)+[^,]', s) is clear without my having to parse the regular expression myself. It also helps code reuse. Suppose you have PHONE_NUMBER_REGEX defined somewhere. If you only had a method to “search” but not to “match” or “fullmatch”, you would have to write something like search(rf"\A{PHONE_NUMBER_REGEX}\Z", s), which is error-prone and less readable. Most likely you would end up with at least two sets of precompiled regex objects (i.e. PHONE_NUMBER_REGEX and PHONE_NUMBER_FULLMATCH_REGEX).

      Separate matching methods are also a fairly common practice in other languages’ regex libraries (cf. [1,2]). Golang, which usually offers very few ways to express the same thing, has 16 different matching methods[3].
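To make the “error-prone” point concrete, here is a sketch with a made-up PHONE_NUMBER_REGEX (the actual pattern is never shown) that contains an alternation. Pasting it between \A and \Z without a group silently changes its meaning, whereas fullmatch reuses the compiled pattern as-is:

```python
import re

# Hypothetical pattern: a 3-4 local number or a 3-3-4 number.
PHONE_NUMBER_REGEX = r"\d{3}-\d{4}|\d{3}-\d{3}-\d{4}"
PHONE = re.compile(PHONE_NUMBER_REGEX)

s = "555-1234 ext. 9"

# Naive wrapping: '\A' binds only to the first alternative and '\Z' only to
# the second, so a longer string still "matches".
naive = re.search(rf"\A{PHONE_NUMBER_REGEX}\Z", s)
print(naive.group())   # 555-1234

# Correct wrapping needs a non-capturing group around the whole pattern.
wrapped = re.search(rf"\A(?:{PHONE_NUMBER_REGEX})\Z", s)
print(wrapped)         # None

# fullmatch on the compiled pattern gets it right without any rewriting.
print(PHONE.fullmatch(s))                        # None
print(PHONE.fullmatch("555-123-4567").group())   # 555-123-4567
```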

      Regarding re.findall, I see what you mean; however, I don’t agree with your conclusions. I think it is a useful convenience method that improves readability in many cases. I found these usages in my code, and I’m quite happy that this method was available[4]:

      digits = [digit_map[digit] for digit in re.findall("(?=(one|two|three|four|five|six|seven|eight|nine|[0-9]))", line)]
      [(minutes, seconds)] = re.findall(r"You have (?:(\d+)m )?(\d+)s left to wait", text)
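A self-contained version of the two snippets above, for trying them out. digit_map, the word list and the input strings are hypothetical stand-ins, since the original code doesn't show them:

```python
import re

# Hypothetical mapping: spelled-out digits and numerals both map to ints.
WORDS = ["one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]
digit_map = {w: i for i, w in enumerate(WORDS, start=1)}
digit_map.update({str(d): d for d in range(10)})

line = "xtwone3four"
# The zero-width lookahead lets matches overlap ("two" and "one" share an "o");
# findall returns the single capturing group for each match.
digits = [digit_map[digit] for digit in
          re.findall("(?=(one|two|three|four|five|six|seven|eight|nine|[0-9]))", line)]
print(digits)  # [2, 1, 3, 4]

# Unpacking raises ValueError unless there is exactly one match; findall
# reports a non-participating group as '' rather than None.
pattern = r"You have (?:(\d+)m )?(\d+)s left to wait"
[(minutes, seconds)] = re.findall(pattern, "You have 42s left to wait")
print((minutes, seconds))  # ('', '42')
print(re.findall(pattern, "You have 3m 07s left to wait"))  # [('3', '07')]
```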
      

      [1] https://docs.oracle.com/javase/7/docs/api/java/util/regex/Matcher.html

      [2] https://en.cppreference.com/w/cpp/regex

      [3] https://pkg.go.dev/regexp

      [4] https://github.com/search?q=repo%3Ahades%2Faoc23+findall&type=code

      • @alyth
        35 months ago

        Thank you for the very thorough reply! This is the kind of high-quality stuff you love to see on Lemmy. Your use cases seem very valid.