Unicode Encoding Errors Python - Parsing XML can't encode a character (Star)
Problem description
I am a beginner to Python and am currently parsing a web-based XML file from the eventful.com API. However, I am receiving some Unicode errors when retrieving certain elements of the data.
I am able to retrieve 5 of the data elements I want from the XML file without any problems, but then the program terminates and produces the following error in the GAE error console:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2605' in position 0: ordinal not in range(128)
I know that the character that is tripping up my parser is a "★" character, which I would prefer not to retrieve from the XML file anyway.
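The error can be reproduced outside of App Engine. The following is a minimal sketch, not the asker's code: `ascii_error` is a hypothetical helper name, and the sample title is made up. It is written so that it also runs on Python 3, although the question itself uses Python 2.

```python
# -*- coding: utf-8 -*-
STAR = u"\u2605"  # BLACK STAR, the character named in the traceback


def ascii_error(text):
    """Return the UnicodeEncodeError raised by an ASCII encode, or None."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        return exc
    return None


# Hypothetical event title that starts with the star character.
err = ascii_error(STAR + u" Jazz Night")
if err is not None:
    # err.start is the index of the first character that could not be encoded.
    print("cannot encode character at position %d" % err.start)
```

Any step that implicitly converts the title to ASCII, such as printing to an ASCII-only output stream, hits the same failure.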
My code is as follows:
# Imports were not shown in the original post; mdom is assumed to be xml.dom.minidom.
import urllib
import webapp2
from xml.dom import minidom as mdom

class XMLParser(webapp2.RequestHandler):
    def get(self):
        base_url = 'my xml file'
        # downloads data from xml file
        response = urllib.urlopen(base_url)
        # reads the response body into a string
        data = response.read()
        # closes file
        response.close()
        # parses xml downloaded
        dom = mdom.parseString(data)
        node = dom.documentElement
        # print out all event names (titles) found in the eventful xml
        event_main = dom.getElementsByTagName('event')
        event_names = []
        for event in event_main:
            eventObj = event.getElementsByTagName("title")[0]
            event_names.append(eventObj)
        for ev in event_names:
            nodes = ev.childNodes
            for node in nodes:
                if node.nodeType == node.TEXT_NODE:
                    print node.data
Is there any way to retrieve the "title" elements and ignore unusual characters like the ★ character here? I would really appreciate any help on this matter. I have already tried solutions that use word.encode('us-ascii', 'ignore'), but this does not fix the issue.
-----------I HAVE FOUND THE SOLUTION:
After struggling with this problem and talking to a lecturer about it, I found that all it required was two lines of code to decode and then re-encode the XML data (after it was read into the program). Hope this helps someone else having the same issue!
unicode_data = data.decode('utf-8')
data = unicode_data.encode('ascii','ignore')
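The two lines above can be wrapped in a small function to see what they do to the downloaded bytes. This is only a sketch: `strip_non_ascii` and the sample `<title>` element are hypothetical names and data, and it assumes the feed is UTF-8 encoded, as the solution does.

```python
def strip_non_ascii(raw_bytes, source_encoding="utf-8"):
    """Decode raw bytes to text, then re-encode as ASCII, dropping everything else."""
    text = raw_bytes.decode(source_encoding)
    # 'ignore' silently discards every character that has no ASCII encoding.
    return text.encode("ascii", "ignore")


# Hypothetical fragment of the downloaded XML, as UTF-8 bytes.
raw = u"<title>\u2605 Jazz Night</title>".encode("utf-8")
cleaned = strip_non_ascii(raw)
```

Note that this drops every non-ASCII character, not just the star, so accented letters in event titles would be lost as well.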
Recommended answer
Where are you using your decoding methods?
I had this error in the past and had to decode the raw data. In other words, I would try doing
data = response.read()
#closes file
response.close()
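#decode (the result must be assigned; the original post called data.encode("us-ascii") and discarded it)
data = data.decode("us-ascii")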
That is, if it is in fact ASCII. My point is: make sure you encode/decode the raw result while it is still a string, before you call parseString on it.
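As an alternative to cleaning the raw bytes up front, the titles can be kept as Unicode text after parsing and converted only at output time. This is a different technique from the one in the answer, sketched here under assumptions: `SAMPLE_XML`, `extract_titles`, and `to_ascii` are hypothetical, and the `<event>`/`<title>` layout is taken from the question's code rather than from the real eventful.com feed.

```python
from xml.dom import minidom

# Hypothetical stand-in for the downloaded feed, as UTF-8 bytes.
SAMPLE_XML = (
    u'<?xml version="1.0" encoding="UTF-8"?>'
    u"<search>"
    u"<event><title>\u2605 Jazz Night</title></event>"
    u"<event><title>Book Fair</title></event>"
    u"</search>"
).encode("utf-8")


def extract_titles(xml_bytes):
    """Return the text of the first <title> of every <event> as Unicode strings."""
    dom = minidom.parseString(xml_bytes)
    titles = []
    for event in dom.getElementsByTagName("event"):
        title_nodes = event.getElementsByTagName("title")
        if not title_nodes:
            continue  # skip events without a title instead of raising IndexError
        text = u"".join(
            node.data
            for node in title_nodes[0].childNodes
            if node.nodeType == node.TEXT_NODE
        )
        titles.append(text)
    return titles


def to_ascii(text):
    """Drop non-ASCII characters and trim the whitespace left behind."""
    return text.encode("ascii", "ignore").decode("ascii").strip()


titles = extract_titles(SAMPLE_XML)
ascii_titles = [to_ascii(title) for title in titles]
```

Converting per title keeps the original text available, so the titles could instead be encoded as UTF-8 if the output channel supports it.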