How to extract links and titles from a .html page?
Problem description
I'd like to add a new feature to my website.
I would like users to be able to upload their bookmarks backup file (from any browser, if possible) so I can import it into their profile and they don't have to enter every bookmark manually.
The only part I'm missing is extracting the title and URL from the uploaded file. Can anyone give me a clue about where to start or what to read?
I used the search option, and "How to extract data from a raw HTML file?" is the question most closely related to mine, but it doesn't cover this.
I really don't mind whether the solution uses jQuery or PHP.
Thanks a lot.
Recommended answer
Thanks everyone, I figured it out!
Final code:
$html = file_get_contents('bookmarks.html');
// Create a new DOM document
$dom = new DOMDocument;
// Parse the HTML. The @ suppresses the warnings that loadHTML()
// raises when the $html string is not well-formed markup.
@$dom->loadHTML($html);
// Get all links. You could also use any other tag name here,
// like 'img' or 'table', to extract other tags.
$links = $dom->getElementsByTagName('a');
// Iterate over the extracted links and display their titles and URLs
foreach ($links as $link) {
    // Show the anchor text (the bookmark title), then the "href" attribute.
    // Both are escaped because they come from an uploaded file.
    echo htmlspecialchars($link->nodeValue), ': ';
    echo htmlspecialchars($link->getAttribute('href')), '<br>';
}
This prints the anchor text and the href of every link in the .html file.
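Since the goal in the question is to save the bookmarks to a profile rather than print them, the same approach could return the data instead. Below is a minimal sketch under a few assumptions: the function name extractBookmarks and the 'title'/'url' array keys are hypothetical (not from the original answer), and libxml_use_internal_errors() is swapped in for the @ operator as one way to silence parser warnings.

```php
<?php
// Hypothetical helper (not part of the original answer): returns every
// link's anchor text and href as an array instead of echoing them.
function extractBookmarks(string $html): array
{
    // loadHTML() rejects an empty string, so handle that case up front.
    if (trim($html) === '') {
        return [];
    }

    $dom = new DOMDocument();

    // Collect parser warnings internally instead of suppressing them with @.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $bookmarks = [];
    foreach ($dom->getElementsByTagName('a') as $link) {
        $url = trim($link->getAttribute('href'));
        if ($url === '') {
            continue; // skip anchors that have no href
        }
        $bookmarks[] = [
            'title' => trim($link->textContent),
            'url'   => $url,
        ];
    }

    return $bookmarks;
}
```

With an upload form, the input could come from something like file_get_contents($_FILES['bookmarks']['tmp_name']), where 'bookmarks' is an assumed form field name. The returned titles and URLs are untrusted user input, so they would still need validating before being stored or displayed.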
Thanks again.