Extract domain from url (including the hard ones)
Question
I'm trying to write (or just find an existing) PHP method that can take a link and extract the domain. The trick is, it needs to hold up under the weight of strange-looking domains like:
www.champa.kku.ac.th
Looking at this one myself with human eyes, I still guessed it incorrectly: I thought the domain would be kku.ac.th, but that gives a DNS error when visiting.
So does anyone know of a good way to reliably extract the domain from URLs like these:
http://site.com/hello.php
http://site.com.uk/hello.php
http://subdomain.site.com/hello.php
http://subdomain.site.com.uk/hello.php
http://www.champa.kku.ac.th/hello.php // and even the one I couldn't tell
Recommended answer
PHP has the parse_url() function that will help you do the basic splitting into protocol, host, port, and so on.
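As a minimal sketch of that first step, the snippet below pulls only the host out of a URL with parse_url() and the PHP_URL_HOST component flag. The helper name extractHost is hypothetical and not part of PHP; lowercasing the result is an assumption made here for easier comparison later.

```php
<?php
// Hypothetical helper: returns the lowercased host of a URL,
// or null when parse_url() cannot find a host component.
function extractHost(string $url): ?string
{
    // With PHP_URL_HOST, parse_url() returns a string, null (no host), or false (malformed URL).
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null;
    }
    return strtolower($host);
}

foreach (['http://site.com/hello.php', 'http://www.champa.kku.ac.th/hello.php'] as $url) {
    echo extractHost($url), PHP_EOL;
}
```

Note that a string without a scheme, such as www.example.com/hello.php, is treated by parse_url() as a path, so this helper returns null for it.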
As for extracting the "right" domain in uncertain cases, that is extremely hard to determine, because "two-part TLDs" are sometimes set up by the TLD authority (e.g. in the UK) and sometimes run by private enterprises (e.g. .uk.com). I think you won't get around maintaining a list of top-level domains that have two parts, like
- .co.uk
- .ac.uk
- .ac.th
Those endings would be treated like TLDs (top-level domains), swallowing the second part.
This is the only way of reliably telling apart hosts under "two-part TLDs", like server1.ibm.co.uk (where the two-part .co.uk needs to be removed to determine the domain itself), from regular sub-domains like server1.ibm.com (where .com needs to be removed).
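The list-based approach described above could be sketched roughly as follows. The constant MULTI_PART_SUFFIXES and the function registrableDomain are hypothetical names, and the three-entry list is only the sample from this answer; a real application would need a much longer, maintained list.

```php
<?php
// Illustrative and deliberately incomplete list of two-part suffixes
// (assumption: real code would load a far longer, maintained list).
const MULTI_PART_SUFFIXES = ['co.uk', 'ac.uk', 'ac.th'];

// Hypothetical helper: returns the suffix plus the one label to its left,
// or null when the host has no label in front of the suffix.
function registrableDomain(string $host, array $suffixes = MULTI_PART_SUFFIXES): ?string
{
    $labels = explode('.', strtolower(trim($host, '.')));
    $count = count($labels);
    if ($count < 2) {
        return null;
    }

    // By default the suffix is the last label (e.g. "com").
    $suffixLabels = 1;

    // If the last two labels are a listed two-part suffix, swallow both.
    $lastTwo = implode('.', array_slice($labels, -2));
    if (in_array($lastTwo, $suffixes, true)) {
        $suffixLabels = 2;
    }

    // The host is a bare suffix such as "co.uk": nothing left to return.
    if ($count <= $suffixLabels) {
        return null;
    }

    return implode('.', array_slice($labels, -($suffixLabels + 1)));
}

echo registrableDomain('server1.ibm.co.uk'), PHP_EOL; // ibm.co.uk
echo registrableDomain('server1.ibm.com'), PHP_EOL;   // ibm.com
```

With this sample list, www.champa.kku.ac.th yields kku.ac.th, while a suffix that is missing from the list is handled like an ordinary TLD: subdomain.site.com.uk yields com.uk. The result is only as good as the list behind it.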
A good starting point to get a list of many important "two-part TLDs" is the domain search at speednames.com (select "all" in countries). A more complete list can be found as part of the Ruby domainatrix library.