| Title |
Pattern Title
|
| Expression |
([\d\w-.]+?\.(a[cdefgilmnoqrstuwz]|b[abdefghijmnorstvwyz]|c[acdfghiklmnoruvxyz]|d[ejkmnoz]|e[ceghrst]|f[ijkmnor]|g[abdefghilmnpqrstuwy]|h[kmnrtu]|i[delmnoqrst]|j[emop]|k[eghimnprwyz]|l[abcikrstuvy]|m[acdghklmnopqrstuvwxyz]|n[acefgilopruz]|om|p[aefghklmnrstwy]|qa|r[eouw]|s[abcdeghijklmnortuvyz]|t[cdfghjkmnoprtvwz]|u[augkmsyz]|v[aceginu]|w[fs]|y[etu]|z[amw]|aero|arpa|biz|com|coop|edu|info|int|gov|mil|museum|name|net|org|pro)(\b|\W(?<!&|=)(?!\.\s|\.{3}).*?))(\s|$) |
| Description |
This finds URLs in plain text, with or without a protocol. It checks candidates against the full list of top-level domains to locate each URL in the text. |
| Matches |
http://www.website.com/index.html | www.website.com | website.com |
| Non-Matches |
Works in all my tests. Does not capture protocol. |
| Author |
James Johnston
|
| Source |
Modified, can't remember original source |
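As a rough check of the listed matches, the expression can be tried in Python. This is only a sketch under assumptions: the leading character class is reordered from `[\d\w-.]` to `[\d\w.-]` because Python's `re` module rejects `\w-.` as a range, and the names `TLDS`, `URL_RE` and `find_url` are made up for this illustration. Other regex engines may behave differently.

```python
import re

# Top-level domain alternation, copied from the expression above.
TLDS = (
    r"a[cdefgilmnoqrstuwz]|b[abdefghijmnorstvwyz]|c[acdfghiklmnoruvxyz]|"
    r"d[ejkmnoz]|e[ceghrst]|f[ijkmnor]|g[abdefghilmnpqrstuwy]|h[kmnrtu]|"
    r"i[delmnoqrst]|j[emop]|k[eghimnprwyz]|l[abcikrstuvy]|"
    r"m[acdghklmnopqrstuvwxyz]|n[acefgilopruz]|om|p[aefghklmnrstwy]|qa|"
    r"r[eouw]|s[abcdeghijklmnortuvyz]|t[cdfghjkmnoprtvwz]|u[augkmsyz]|"
    r"v[aceginu]|w[fs]|y[etu]|z[amw]|aero|arpa|biz|com|coop|edu|info|int|"
    r"gov|mil|museum|name|net|org|pro"
)

# Assumption: "[\d\w-.]" is rewritten as "[\d\w.-]" so that Python accepts it.
URL_RE = re.compile(
    r"([\d\w.-]+?\.(" + TLDS + r")(\b|\W(?<!&|=)(?!\.\s|\.{3}).*?))(\s|$)"
)


def find_url(text):
    """Return the first URL-like match (group 1), or None if nothing matches."""
    match = URL_RE.search(text)
    return match.group(1) if match else None


for sample in (
    "http://www.website.com/index.html",
    "www.website.com",
    "website.com",
):
    print(sample, "->", find_url(sample))
```

Group 1 holds the URL without its protocol, which agrees with the author's note that the protocol is not captured.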
Title: This does not work
Name: JohnC
Date: 11/20/2008 2:14:35 AM
Comment:
This does not work at all. Very few of the regular expressions on this site do.
Title: James Johnston's URL regex
Name: DC
Date: 6/24/2008 10:54:01 PM
Comment:
Works on almost all my tests, except for:
ftp://myname@host.dom/%2Fetc/motd
prospero://host.dom//pros/name
Title: Ok... I got it.
Name: James Johnston
Date: 3/1/2005 11:03:43 PM
Comment:
I understand. Thanks. I appreciate the help. :)
It includes all TLDs but how do I exclude certain matches?
I've noticed that if a jpeg image is listed in the text it matches the .jp part of the extension and thinks it's a URL.
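The jpeg problem described above can be reproduced with a much smaller pattern. The sketch below is an assumption-laden illustration, not the posted expression: it lists only two TLDs, and the names `LOOSE`, `STRICT` and `first_match` are hypothetical. It suggests one possible way to exclude such matches, namely requiring a word boundary right after the TLD.

```python
import re

# Simplified stand-ins for the full expression: only "jp" and "com" are listed.
# LOOSE lets any characters follow the TLD, so ".jp" inside ".jpeg" is accepted.
LOOSE = re.compile(r"([\w.-]+?\.(?:jp|com)[:/]?.*?)(?:\s|$)")

# STRICT requires a word boundary after the TLD, so "jp" followed by "eg" fails.
STRICT = re.compile(r"([\w.-]+?\.(?:jp|com)\b(?:[:/]\S*)?)(?:\s|$)")


def first_match(pattern, text):
    """Return group 1 of the first match, or None if the pattern does not match."""
    match = pattern.search(text)
    return match.group(1) if match else None


text = "see photo.jpeg here"
print(first_match(LOOSE, text))    # the file name is mistaken for a URL
print(first_match(STRICT, text))   # no match once the boundary is required
print(first_match(STRICT, "example.jp/path here"))
```

The boundary check is only one option; a negative lookahead on specific file extensions would be another, at the cost of maintaining that list.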
Title: Whuh? {1} DOES NOTHING, GET IT?
Name: Randal L. Schwartz
Date: 3/1/2005 9:37:50 PM
Comment:
You still have {1} there. It DOES NOTHING BUT WASTE THREE CHARACTERS OF YOUR REGEX.
Get it? {1} is useless. Pointless. Always.
a matches 1 a
a{1} matches 1 a
Same exact thing
Get it?
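The point about {1} is easy to confirm. A minimal check in Python (the sample strings and variable names are arbitrary choices for this illustration):

```python
import re

plain = re.compile(r"a")
counted = re.compile(r"a{1}")  # same pattern with the redundant quantifier

samples = ["a", "aa", "b", ""]

# Both patterns should accept or reject every sample identically.
same_verdicts = all(
    (plain.fullmatch(s) is None) == (counted.fullmatch(s) is None)
    for s in samples
)

# They should also find the same occurrences inside a longer string.
same_hits = plain.findall("banana") == counted.findall("banana")

print(same_verdicts, same_hits)
```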
Title: Updated...
Name: James Johnston
Date: 3/1/2005 2:16:24 PM
Comment:
There... updated it. :)
That should be better.
Title: Sorry... :]
Name: James Johnston
Date: 3/1/2005 2:00:50 PM
Comment:
Thanks for the suggestion about the Perl "URI::Find" module.
I'm new to regexps and this worked in my tests.
I didn't see a regexp on this site that did exactly what I wanted. I did notice the error with the second {1} after I'd already posted this. The last part of the regexp should be "coop){1}[:/]?.*?)(\s|$)", which would also remove the need for the regexp to be followed by a newline or space.
Is there a way to change my post?
Title: Bad
Name: Randal L. Schwartz
Date: 3/1/2005 7:00:05 AM
Comment:
First off, the {1} do absolutely nothing except take up three characters (twice!). Second, this is case sensitive. Third, you should probably look at the Perl "URI::Find" module to see how to do it right.
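The case-sensitivity complaint can be illustrated with a cut-down pattern. This is a sketch only: the two-TLD pattern and the names `TLD_PATTERN`, `case_sensitive` and `case_insensitive` are assumptions for the example, and it does not attempt to reproduce what URI::Find does.

```python
import re

# Simplified stand-in for the full expression: only "com" and "org" are listed.
TLD_PATTERN = r"[\w.-]+?\.(?:com|org)\b"

case_sensitive = re.compile(TLD_PATTERN)
case_insensitive = re.compile(TLD_PATTERN, re.IGNORECASE)

text = "WWW.WEBSITE.COM"

# Without the flag, the lowercase TLD list cannot match an uppercase host name.
print(case_sensitive.search(text))

# With re.IGNORECASE, the same pattern matches regardless of letter case.
hit = case_insensitive.search(text)
print(hit.group(0) if hit else None)
```

In engines that use inline modifiers, a leading `(?i)` would typically have the same effect as the flag.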