I wrote a TUI application to help you practice Python regular expressions. There are more than 100 exercises covering both the built-in re module and the third-party regex module.
If you have pipx, use pipx install regexexercises to install the app. See the repo for source code and other details.
I would argue that having distinct match and search helps readability. The difference between match('((([0-9]+-[0-9]+)|([0-9]+))[,]?)+[^,]', s) and search('((([0-9]+-[0-9]+)|([0-9]+))[,]?)+[^,]', s) is clear without the need for me to parse the regular expression myself. It also helps code reuse. Consider that you have PHONE_NUMBER_REGEX defined somewhere. If you only had a method to “search” but not to “match”, you would have to do something like search(f"\A{PHONE_NUMBER_REGEX}\Z", s), which is error-prone and less readable. Most likely you would end up having at least two sets of precompiled regex objects (i.e. PHONE_NUMBER_REGEX and PHONE_NUMBER_FULLMATCH_REGEX). It is also a fairly common practice in other languages’ regex libraries (cf. [1,2]). Golang, which is usually very reserved in the number of ways to express the same thing, has 16 different matching methods [3].
Regarding re.findall, I see what you mean, but I don’t agree with your conclusions. I think it is a useful convenience method that improves readability in many cases. I’ve found these usages in my code, and I’m quite happy that this method was available [4]:
    digits = [digit_map[digit] for digit in re.findall("(?=(one|two|three|four|five|six|seven|eight|nine|[0-9]))", line)]
    [(minutes, seconds)] = re.findall(r"You have (?:(\d+)m )?(\d+)s left to wait", text)
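To make both points a bit more concrete, here is a rough sketch. The pattern values and sample strings are made up for illustration (the actual PHONE_NUMBER_REGEX is not shown anywhere in this thread), and ANIMAL_REGEX and WORDS are hypothetical names. It shows how search, match and fullmatch differ, one way hand-written anchors can go wrong, and why findall combined with a lookahead picks up overlapping matches.

```python
import re

# Hypothetical pattern, for illustration only.
PHONE_NUMBER_REGEX = r"\d{3}-\d{4}"

s = "call 555-1234 now"
found_anywhere = re.search(PHONE_NUMBER_REGEX, s)            # may start anywhere
found_at_start = re.match(PHONE_NUMBER_REGEX, s)             # must start at index 0
whole_string = re.fullmatch(PHONE_NUMBER_REGEX, "555-1234")  # must span the whole string

# Why hand-anchoring is error-prone: with a top-level "|" the anchors bind
# to one alternative each, so the pattern is read as (\Acat)|(dog\Z).
ANIMAL_REGEX = r"cat|dog"  # hypothetical
naive = re.search(rf"\A{ANIMAL_REGEX}\Z", "catalog")  # matches the leading "cat"
strict = re.fullmatch(ANIMAL_REGEX, "catalog")        # no match

# A capturing group inside a lookahead lets findall report overlapping
# matches, because the lookahead itself consumes no characters.
WORDS = "one|two|three|four|five|six|seven|eight|nine|[0-9]"
line = "xtwone3"
overlapping = re.findall(f"(?=({WORDS}))", line)  # sees "two" and the overlapping "one"
consuming = re.findall(WORDS, line)               # "two" consumes the "o" that "one" needs

print(found_anywhere.group(), found_at_start, whole_string.group())
print(naive.group(), strict)
print(overlapping, consuming)
```

Wrapping the pattern in a non-capturing group, as in rf"\A(?:{ANIMAL_REGEX})\Z", avoids the alternation problem, but that is exactly the kind of detail a dedicated fullmatch saves you from remembering.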
[1] https://docs.oracle.com/javase/7/docs/api/java/util/regex/Matcher.html
[2] https://en.cppreference.com/w/cpp/regex
[3] https://pkg.go.dev/regexp
[4] https://github.com/search?q=repo%3Ahades%2Faoc23+findall&type=code
Thank you for the very thorough reply! This is the kind of high-quality stuff you love to see on Lemmy. Your use cases seem very valid.