I wrote a TUI application to help you practice Python regular expressions. There are more than 100 exercises covering both the builtin `re` and third-party `regex` module.

If you have `pipx`, use `pipx install regexexercises` to install the app. See the repo for source code and other details.
Thanks for sharing this. I took the time to read through the documentation of the `re` module. Here’s my review of the functions.

Useful:
- `re.finditer` returns an iterator over all Match objects
- `re.search` returns the first Match object, or None if there are no matches
- `r''`: use raw strings for patterns so you don’t have to worry about backslashes
- the optional `flags` argument modifies the behaviour (case insensitive, multiline)
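A minimal sketch of the functions in this list. The sample text and pattern are made up for illustration:

```python
import re

text = "Error on line 12, warning on Line 40"

# Raw string: the backslash in \d reaches the regex engine untouched.
pattern = r"line (\d+)"

# re.search returns the first Match object, or None when nothing matches.
first = re.search(pattern, text)
first_line = first.group(1) if first else None

# re.finditer yields every Match object; flags modify the matching behaviour.
all_lines = [m.group(1) for m in re.finditer(pattern, text, flags=re.IGNORECASE)]

print(first_line)  # 12
print(all_lines)   # ['12', '40']
```

Without `re.IGNORECASE`, the second occurrence ("Line 40") would not match, because the pattern uses a lowercase "l".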
Utility:
- `re.sub`: replace each match in the string
- `re.split`: split a string by a regular expression
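A short sketch of both utilities, using made-up input strings:

```python
import re

# re.sub replaces every match; \1 in the replacement refers to capturing group 1.
masked = re.sub(r"(\d{3})-\d{4}", r"\1-XXXX", "call 555-1234 or 555-9876")

# re.split cuts the string wherever the pattern matches
# (here: a comma or semicolon followed by optional whitespace).
parts = re.split(r"[,;]\s*", "apples, pears;plums,  figs")

print(masked)  # call 555-XXXX or 555-XXXX
print(parts)   # ['apples', 'pears', 'plums', 'figs']
```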
The Match object:
- `match.group(0)` returns the portion of text matched by the pattern
- `match.group(1)` returns the first capturing group
- `match.group(2)` returns the second capturing group, and so on
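A small sketch with a made-up input string. Note that the accessor for a single group is `Match.group(n)`, while `Match.groups()` returns a tuple of all capturing groups:

```python
import re

m = re.search(r"(\d+)m (\d+)s", "You have 3m 25s left")

whole = m.group(0)       # '3m 25s': the full text matched by the pattern
minutes = m.group(1)     # '3': first capturing group
seconds = m.group(2)     # '25': second capturing group
all_groups = m.groups()  # ('3', '25'): every capturing group as a tuple
```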
I don’t understand why these exist:
- `re.match`: like ‘search’, but only matches at the beginning of the string. Why not just use ‘^’ or ‘\A’ in the pattern you pass to ‘search’?
- `re.fullmatch`: like ‘search’, but only if the full string matches. Why not just use ‘\A’ and ‘\Z’ in the pattern you pass to ‘search’?
- `re.findall`: returns all matches. It seems like a shitty version of ‘finditer’. The function has three different return types, which depend on the pattern you pass to the function. Who wants to work with that?
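To make the comparison concrete, here is a sketch of the anchored `search` equivalents and of the three result shapes `re.findall` can produce. The sample strings are made up:

```python
import re

s = "abc123"

# re.match only matches at the start of the string, like search with \A.
via_match = re.match(r"\d+", s)       # None: the string starts with letters
via_search = re.search(r"\A\d+", s)   # None as well

# re.fullmatch requires the whole string to match, like search with \A...\Z.
via_fullmatch = re.fullmatch(r"[a-z]+", s)      # None: "123" is left over
via_anchored = re.search(r"\A(?:[a-z]+)\Z", s)  # None as well

# re.findall always returns a list, but the element type depends on
# how many capturing groups the pattern has.
pairs = "a=1, b=2"
no_groups = re.findall(r"\w=\d", pairs)       # ['a=1', 'b=2']
one_group = re.findall(r"\w=(\d)", pairs)     # ['1', '2']
two_groups = re.findall(r"(\w)=(\d)", pairs)  # [('a', '1'), ('b', '2')]

# re.finditer yields Match objects regardless of the group count.
uniform = [m.group(0) for m in re.finditer(r"(\w)=(\d)", pairs)]  # ['a=1', 'b=2']
```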
I would argue that having distinct `match` and `search` helps readability. The difference between `match('((([0-9]+-[0-9]+)|([0-9]+))[,]?)+[^,]', s)` and `search('((([0-9]+-[0-9]+)|([0-9]+))[,]?)+[^,]', s)` is clear without the need for me to parse the regular expression myself. It also helps code reuse. Consider that you have `PHONE_NUMBER_REGEX` defined somewhere. If you only had a method to “search” but not to “match”, you would have to do something like `search(f"\A{PHONE_NUMBER_REGEX}\Z", s)`, which is error-prone and less readable. Most likely you would end up having at least two sets of precompiled regex objects (i.e. `PHONE_NUMBER_REGEX` and `PHONE_NUMBER_FULLMATCH_REGEX`). It is also a fairly common practice in other languages’ regex libraries (cf. [1,2]). Golang, which is usually very reserved in the number of ways to express the same thing, has 16 different matching methods[3].

Regarding `re.findall`, I see what you mean, but I don’t agree with your conclusions. I think it is a useful convenience method that improves readability in many cases. I’ve found these usages in my code, and I’m quite happy that this method was available[4]:

    digits = [digit_map[digit] for digit in re.findall("(?=(one|two|three|four|five|six|seven|eight|nine|[0-9]))", line)]
    [(minutes, seconds)] = re.findall(r"You have (?:(\d+)m )?(\d+)s left to wait", text)
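A sketch of both points, under stated assumptions: the phone-number pattern, the contents of `digit_map`, and the input strings are all hypothetical and only serve to make the snippets runnable.

```python
import re

# Hypothetical pattern for illustration; not a real phone-number validator.
PHONE_NUMBER_REGEX = re.compile(r"\d{3}-\d{4}")

# One compiled pattern serves all three use cases without editing the regex.
text = "call 555-1234 today"
found_anywhere = PHONE_NUMBER_REGEX.search(text)         # Match for '555-1234'
at_start = PHONE_NUMBER_REGEX.match(text)                # None: text starts with 'call'
whole_string = PHONE_NUMBER_REGEX.fullmatch("555-1234")  # Match: nothing left over

# Hypothetical digit_map: spelled-out words and digit characters to integers.
words = ["one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]
digit_map = {word: value for value, word in enumerate(words, start=1)}
digit_map.update({str(d): d for d in range(10)})

# The lookahead consumes no text, so overlapping words ("eightwo") are both found.
digit_pattern = r"(?=(one|two|three|four|five|six|seven|eight|nine|[0-9]))"
digits = [digit_map[d] for d in re.findall(digit_pattern, "eightwo3")]

# Exactly one match is expected, so the one-element list unpacks directly.
wait_pattern = r"You have (?:(\d+)m )?(\d+)s left to wait"
[(minutes, seconds)] = re.findall(wait_pattern, "You have 3m 25s left to wait")

print(digits)            # [8, 2, 3]
print(minutes, seconds)  # 3 25
```

The unpacking in the last example raises `ValueError` if the text contains zero matches or more than one, which doubles as a cheap sanity check.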
[1] https://docs.oracle.com/javase/7/docs/api/java/util/regex/Matcher.html
[2] https://en.cppreference.com/w/cpp/regex
[4] https://github.com/search?q=repo%3Ahades%2Faoc23 findall&type=code
Thank you for the very thorough reply! This is the kind of high-quality stuff you love to see on Lemmy. Your use cases seem very valid.