shell-script – The right tool for reversing the sort order of thousands of elements in an HTML file

I have an HTML file containing thousands of <div class='date'></div><ul>…</ul> blocks, like this:

<!DOCTYPE html>
<html>

    <head>
    </head>

    <body>

        <div class="date">Wed May 23 2018</div>
        <ul>
            <li>
                Do laundry
                <ul>
                    <li>
                        Get coins
                    </li>
                </ul>
            </li>
            <li>
                Wash the dishes
            </li>
        </ul>

        <div class='date'>Thu May 24 2018</div>
        <ul>
            <li>
                Solve the world's hunger problem
                <ul>
                    <li>
                        Don't tell anyone
                    </li>
                </ul>
            </li>
            <li>
                Get something to wear
            </li>
        </ul>

        <div class='date'>Fri May 25 2018</div>
        <ul>
            <li>
                Modify the website according to GDPR
            </li>
            <li>
                Watch YouTube
            </li>
        </ul>

    </body>

</html>

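Each <div> and its corresponding <ul> element belong to a specific date. The <div class='date'></div><ul>…</ul> blocks are sorted in ascending order, i.e. newer dates are at the bottom of the file. I want to arrange them in descending order, so that newer dates are at the top of the file, like this: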

<!DOCTYPE html>
<html>

    <head>
    </head>

    <body>

        <div class='date'>Fri May 25 2018</div>
        <ul>
            <li>
                Modify the website according to GDPR
            </li>
            <li>
                Watch YouTube
            </li>
        </ul>

        <div class='date'>Thu May 24 2018</div>
        <ul>
            <li>
                Solve the world's hunger problem
                <ul>
                    <li>
                        Don't tell anyone
                    </li>
                </ul>
            </li>
            <li>
                Get something to wear
            </li>
        </ul>

        <div class="date">Wed May 23 2018</div>
        <ul>
            <li>
                Do laundry
                <ul>
                    <li>
                        Get coins
                    </li>
                </ul>
            </li>
            <li>
                Wash the dishes
            </li>
        </ul>

    </body>

</html> 

I'm not sure what the right tool for this is. Is it a shell script? awk? Python? Or is there something else that would be faster and more convenient?

Solution:

Extended Python solution:

The sort_html_by_date.py script:

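from bs4 import BeautifulSoup
from datetime import datetime

with open('input.html') as html_doc:    # replace with your actual HTML file name
    soup = BeautifulSoup(html_doc, 'lxml')
    divs = {}
    for div in soup.find_all('div', 'date'):
        # Key each block by its parsed date. %b is the abbreviated month name
        # ("May", "Jun"); %B (full name) would reject most abbreviated months.
        # Note: two blocks with the same date would overwrite each other.
        divs[datetime.strptime(div.string.strip(), '%a %b %d %Y')] = \
            str(div) + '\n' + div.find_next_sibling('ul').prettify()

    # Empty <body>, then re-append the blocks newest first.
    soup.body.clear()
    for el in sorted(divs, reverse=True):
        soup.body.append(divs[el])

    # formatter=None leaves the appended markup strings unescaped.
    print(soup.prettify(formatter=None))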

Usage:

python sort_html_by_date.py

Output:

 <!DOCTYPE html>
<html>
 <head>
 </head>
 <body>
  <div class="date">Fri May 25 2018</div>
<ul>
 <li>
  Modify the website according to GDPR
 </li>
 <li>
  Watch YouTube
 </li>
</ul>
  <div class="date">Thu May 24 2018</div>
<ul>
 <li>
  Solve the world's hunger problem
  <ul>
   <li>
    Don't tell anyone
   </li>
  </ul>
 </li>
 <li>
  Get something to wear
 </li>
</ul>
  <div class="date">Wed May 23 2018</div>
<ul>
 <li>
  Do laundry
  <ul>
   <li>
    Get coins
   </li>
  </ul>
 </li>
 <li>
  Wash the dishes
 </li>
</ul>
 </body>
</html>
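
Since the question says the blocks are already in ascending date order, a plain reversal may be enough when no real re-sorting by date is needed. The following is a minimal sketch of that idea using only the standard library; it is not part of the original answer. It assumes that every date div starts on its own line, that the file has a single <body>…</body> pair, and that no date div is nested inside a list. The name reverse_date_blocks is hypothetical.

```python
import re

# Opening tag of a date div at the start of a line, with either quote style.
DATE_DIV = re.compile(r"""^[ \t]*<div\s+class=["']date["']\s*>""", re.MULTILINE)


def reverse_date_blocks(html: str) -> str:
    """Return html with the date blocks inside <body> in reverse order."""
    body_start = html.index("<body>") + len("<body>")
    body_end = html.rindex("</body>")
    body = html[body_start:body_end]

    # Offsets at which each date block begins (including its indentation).
    starts = [m.start() for m in DATE_DIV.finditer(body)]
    if not starts:
        return html  # nothing to reorder

    prefix = body[:starts[0]]               # text before the first block
    trailing = body[len(body.rstrip()):]    # whitespace before </body>

    # Each block runs from one date div up to the next (or the body's end).
    bounds = starts + [len(body)]
    blocks = [body[a:b].rstrip() for a, b in zip(bounds, bounds[1:])]

    # Reverse the blocks and separate them with one blank line.
    new_body = prefix + "\n\n".join(reversed(blocks)) + trailing
    return html[:body_start] + new_body + html[body_end:]


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python reverse_blocks.py input.html > output.html
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8") as fh:
            sys.stdout.write(reverse_date_blocks(fh.read()))
```

This keeps the original indentation and avoids a parser dependency, but unlike the BeautifulSoup script it trusts the existing order and the layout of the markup, so a real HTML parser is the safer choice for files that are not this regular.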

Modules used:

beautifulsoup – https://www.crummy.com/software/BeautifulSoup/bs4/doc/
datetime – https://docs.python.org/3.3/library/datetime.html#module-datetime
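
The dates in the file use abbreviated weekday and month names ("Wed", "May"), which correspond to the %a and %b directives of datetime.strptime. The short sketch below shows such strings being parsed and sorted newest first. It assumes an English locale, since %a and %b are locale-dependent, and the helper name parse_date is hypothetical.

```python
from datetime import datetime

# Abbreviated weekday, abbreviated month, day of month, four-digit year.
DATE_FORMAT = '%a %b %d %Y'


def parse_date(text):
    """Parse a string such as 'Wed May 23 2018' into a datetime."""
    return datetime.strptime(text.strip(), DATE_FORMAT)


dates = ['Wed May 23 2018', 'Fri May 25 2018', 'Thu May 24 2018']

# Sort the original strings by their parsed value, newest first.
newest_first = sorted(dates, key=parse_date, reverse=True)
print(newest_first)
# ['Fri May 25 2018', 'Thu May 24 2018', 'Wed May 23 2018']
```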
