While working on my previous post (downloading content from codingbat), I came across this line for pulling the URL links out of a page:
urls = re.findall(r'href=[\'"]p?([^\'" >]+)', line)
r - marks the pattern as a raw string, so Python leaves the backslashes untouched and passes them straight to the regex engine.
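href=[\'"] - matches the literal text href= (lowercase only, since no re.IGNORECASE flag is passed) followed by exactly one quote character, either a single quote (') or a double quote ("). The match can begin anywhere in the line, not just at the start.
p? - matches an optional p: if the next character is a p it is consumed, otherwise nothing is matched. Because it sits outside the parentheses, a leading p is left out of the captured link.
([^\'" >]+) - captures one or more characters that are not a single quote, a double quote, a space, or a greater-than symbol. This is the link itself, and since it is the only group in the pattern, re.findall returns just this part of each match.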
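To see how the pieces behave together, here is a small sketch that runs the same pattern over a made-up line of HTML. The sample string and the names HREF_PATTERN and extract_links are my own assumptions for illustration; they are not taken from the codingbat page or the original script.

```python
import re

# Same pattern as above, kept in a constant for reuse.
HREF_PATTERN = r'href=[\'"]p?([^\'" >]+)'

def extract_links(line):
    """Return the captured link text for every href found in line."""
    return re.findall(HREF_PATTERN, line)

# Hypothetical input: three anchors with different quoting and casing.
sample = "<a href='/prob/p123'>one</a> <a href=\"prob/p456\">two</a> <a HREF='x'>three</a>"

print(extract_links(sample))
# ['/prob/p123', 'rob/p456']
```

The first link is returned whole because it starts with a slash, so p? matches nothing. The second loses its leading p, which is consumed by p? before the capture group starts. The third is skipped because the pattern is case-sensitive and HREF is uppercase.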