I have an HTML file containing thousands of <div class='date'></div><ul>…</ul> blocks, like this:
<!DOCTYPE html>
<html>
<head>
</head>
<body>
<div class="date">Wed May 23 2018</div>
<ul>
<li>
Do laundry
<ul>
<li>
Get coins
</li>
</ul>
</li>
<li>
Wash the dishes
</li>
</ul>
<div class='date'>Thu May 24 2018</div>
<ul>
<li>
Solve the world's hunger problem
<ul>
<li>
Don't tell anyone
</li>
</ul>
</li>
<li>
Get something to wear
</li>
</ul>
<div class='date'>Fri May 25 2018</div>
<ul>
<li>
Modify the website according to GDPR
</li>
<li>
Watch YouTube
</li>
</ul>
</body>
</html>
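Each <div> and its corresponding <ul> element belong to one specific date. The <div class='date'></div><ul>…</ul> blocks are sorted in ascending order, i.e. newer dates are at the bottom of the file. I want to sort them in descending order, so that newer dates are at the top of the file, like this: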
<!DOCTYPE html>
<html>
<head>
</head>
<body>
<div class='date'>Fri May 25 2018</div>
<ul>
<li>
Modify the website according to GDPR
</li>
<li>
Watch YouTube
</li>
</ul>
<div class='date'>Thu May 24 2018</div>
<ul>
<li>
Solve the world's hunger problem
<ul>
<li>
Don't tell anyone
</li>
</ul>
</li>
<li>
Get something to wear
</li>
</ul>
<div class="date">Wed May 23 2018</div>
<ul>
<li>
Do laundry
<ul>
<li>
Get coins
</li>
</ul>
</li>
<li>
Wash the dishes
</li>
</ul>
</body>
</html>
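I'm not sure what the right tool is. A shell script? awk? Python? Is there something else that might be faster and more convenient?
Solution:
An extended Python solution:
The sort_html_by_date.py script: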
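from bs4 import BeautifulSoup
from datetime import datetime

with open('input.html') as html_doc:  # replace with your actual html file name
    soup = BeautifulSoup(html_doc, 'lxml')

# Map each parsed date to the HTML of its date <div> plus the <ul> that follows it
divs = {}
for div in soup.find_all('div', 'date'):
    # %b matches abbreviated month names ("May", "Jun"); %B would reject "Jun"
    divs[datetime.strptime(div.string.strip(), '%a %b %d %Y')] = \
        str(div) + '\n' + div.find_next_sibling('ul').prettify()

# Empty <body>, then re-insert the blocks newest-first
soup.body.clear()
for el in sorted(divs, reverse=True):
    soup.body.append(divs[el])

# formatter=None stops the inserted markup strings from being HTML-escaped
print(soup.prettify(formatter=None))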
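The script sorts by the parsed date rather than by the date text, because the text starts with the weekday name; a plain string sort would order the blocks alphabetically by weekday. The sketch below isolates that sorting step using only the standard library. It is an illustration under stated assumptions: the (date_text, block_html) pairs and the names parse_date and newest_first are hypothetical, and it uses a list instead of the script's dict so that two blocks with the same date are both kept (in a dict keyed by date, the later block overwrites the earlier one).

```python
from datetime import datetime
from typing import List, Tuple

# Format of the date <div> text, e.g. "Wed May 23 2018":
# %a = abbreviated weekday, %b = abbreviated month, %d = day, %Y = year
DATE_FORMAT = "%a %b %d %Y"


def parse_date(text: str) -> datetime:
    """Parse the text of a <div class="date"> element (surrounding whitespace ignored)."""
    return datetime.strptime(text.strip(), DATE_FORMAT)


def newest_first(blocks: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Sort hypothetical (date_text, block_html) pairs so the newest date comes first.

    sorted() is stable even with reverse=True, so blocks sharing a date
    keep their original relative order instead of being merged.
    """
    return sorted(blocks, key=lambda pair: parse_date(pair[0]), reverse=True)


if __name__ == "__main__":
    # Hypothetical sample data standing in for the blocks parsed from the HTML file
    sample = [
        ("Wed May 23 2018", "<div class='date'>Wed May 23 2018</div>"),
        ("Fri May 25 2018", "<div class='date'>Fri May 25 2018</div>"),
        ("Mon May 28 2018", "<div class='date'>Mon May 28 2018</div>"),
        ("Fri Jun 01 2018", "<div class='date'>Fri Jun 01 2018</div>"),
    ]
    for date_text, _ in newest_first(sample):
        print(date_text)
    # For comparison: sorting the raw strings orders them by weekday name
    print(sorted(text for text, _ in sample))
```

Running it prints the dates newest-first, starting with Fri Jun 01 2018, and then the plain string sort, which puts both Friday entries first regardless of their dates.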
Usage:
python sort_html_by_date.py
Output:
<!DOCTYPE html>
<html>
<head>
</head>
<body>
<div class="date">Fri May 25 2018</div>
<ul>
<li>
Modify the website according to GDPR
</li>
<li>
Watch YouTube
</li>
</ul>
<div class="date">Thu May 24 2018</div>
<ul>
<li>
Solve the world's hunger problem
<ul>
<li>
Don't tell anyone
</li>
</ul>
</li>
<li>
Get something to wear
</li>
</ul>
<div class="date">Wed May 23 2018</div>
<ul>
<li>
Do laundry
<ul>
<li>
Get coins
</li>
</ul>
</li>
<li>
Wash the dishes
</li>
</ul>
</body>
</html>
Modules used:
beautifulsoup – https://www.crummy.com/software/BeautifulSoup/bs4/doc/
datetime – https://docs.python.org/3.3/library/datetime.html#module-datetime
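The question states that the blocks are already in ascending date order, so simply reversing them also gives the desired result, without parsing any dates. For cases where installing beautifulsoup4 and lxml is inconvenient, here is a minimal standard-library sketch of that idea. It replaces the HTML parser with a regular expression that splits the <body> content at each date <div>. It therefore assumes the file is laid out like the sample: a bare <body> tag with no attributes, and date <div> elements only at the top level of <body>. The function name reverse_date_blocks is hypothetical.

```python
import re

# Assumption: each block starts with <div class="date"> or <div class='date'>,
# exactly as in the sample file, and no such <div> occurs inside a <ul>.
BLOCK_START = re.compile(r"""<div class=["']date["']>""")


def reverse_date_blocks(html: str) -> str:
    """Return html with its date <div> + <ul> blocks in reverse order."""
    # Assumption: the document contains a bare "<body>" tag (no attributes)
    body_start = html.index("<body>") + len("<body>")
    body_end = html.rindex("</body>")
    body = html[body_start:body_end]

    starts = [m.start() for m in BLOCK_START.finditer(body)]
    if not starts:
        return html  # no date blocks found, nothing to reorder

    # Each block runs from one date <div> up to the start of the next one;
    # the last block runs to the end of the <body> content.
    bounds = starts + [len(body)]
    blocks = [body[a:b].strip() for a, b in zip(bounds, bounds[1:])]

    # Keep whatever precedes the first block, then emit the blocks reversed
    # (oldest-first input becomes newest-first output).
    new_body = body[:starts[0]] + "\n".join(reversed(blocks)) + "\n"
    return html[:body_start] + new_body + html[body_end:]
```

For example, passing the contents of input.html to reverse_date_blocks returns the reordered document as a string, which can then be written to a new file. A real HTML parser, as in the script above, is the safer choice if the markup varies.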
Title of this article: shell-script – The right tool for reversing the sort order of thousands of elements in an HTML file