How to get BeautifulSoup 4 to respect a self-closing tag?(如何让 BeautifulSoup 4 尊重自闭标签?)
问题描述
这个问题是针对 BeautifulSoup4 的问题,这使得它不同于以前的问题:
This question is specific to BeautifulSoup4, which makes it different from the previous questions:
BeautifulSoup 为什么要修改我的自闭合元素?
BeautifulSoup 中的 selfClosingTags
由于 BeautifulStoneSoup
已经消失(之前的 xml 解析器),我怎样才能让 bs4
尊重一个新的自闭合标签?例如:
Since BeautifulStoneSoup
is gone (the previous xml parser), how can I get bs4
to respect a new self-closing tag? For example:
import bs4
S = '''<foo> <bar a="3"/> </foo>'''
soup = bs4.BeautifulSoup(S, selfClosingTags=['bar'])
print soup.prettify()
不会自动关闭 bar
标签,但会给出提示.bs4 所指的这个树生成器是什么以及如何自我关闭标签?
Does not self-close the bar
tag, but gives a hint. What is this tree builder that bs4 is referring to and how to I self-close the tag?
/usr/local/lib/python2.7/dist-packages/bs4/__init__.py:112: UserWarning: BS4 does not respect the selfClosingTags argument to the BeautifulSoup constructor. The tree builder is responsible for understanding self-closing tags.
"BS4 does not respect the selfClosingTags argument to the "
<html>
<body>
<foo>
<bar a="3">
</bar>
</foo>
</body>
</html>
推荐答案
解析你传入的XMLxml"作为 BeautifulSoup 构造函数的第二个参数.
soup = bs4.BeautifulSoup(S, 'xml')
您需要安装 lxml.
您不再需要传递 selfClosingTags
:
In [1]: import bs4
In [2]: S = '''<foo> <bar a="3"/> </foo>'''
In [3]: soup = bs4.BeautifulSoup(S, 'xml')
In [4]: print soup.prettify()
<?xml version="1.0" encoding="utf-8"?>
<foo>
<bar a="3"/>
</foo>
这篇关于如何让 BeautifulSoup 4 尊重自闭标签?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持编程学习网!
本文标题为:如何让 BeautifulSoup 4 尊重自闭标签?


基础教程推荐
- 包装空间模型 2022-01-01
- PermissionError: pip 从 8.1.1 升级到 8.1.2 2022-01-01
- 使用大型矩阵时禁止 Pycharm 输出中的自动换行符 2022-01-01
- 求两个直方图的卷积 2022-01-01
- 无法导入 Pytorch [WinError 126] 找不到指定的模块 2022-01-01
- 在同一图形上绘制Bokeh的烛台和音量条 2022-01-01
- PANDA VALUE_COUNTS包含GROUP BY之前的所有值 2022-01-01
- 修改列表中的数据帧不起作用 2022-01-01
- 在Python中从Azure BLOB存储中读取文件 2022-01-01
- Plotly:如何设置绘图图形的样式,使其不显示缺失日期的间隙? 2022-01-01