ne_chunk without pos_tag in NLTK
Question
I'm trying to chunk a sentence using ne_chunk and pos_tag in NLTK.
from nltk.tag import pos_tag
from nltk.tree import Tree
from nltk.chunk import ne_chunk

sentence = "Michael and John is reading a booklet in a library of Jakarta"
# Tag each whitespace-separated token with its part of speech
tagged_sent = pos_tag(sentence.split())
# Keep only the named-entity subtrees, dropping plain (word, tag) pairs
print_chunk = [chunk for chunk in ne_chunk(tagged_sent) if isinstance(chunk, Tree)]
print(print_chunk)
This is the result:
[Tree('GPE', [('Michael', 'NNP')]), Tree('PERSON', [('John', 'NNP')]), Tree('GPE', [('Jakarta', 'NNP')])]
My question: is it possible to leave out the POS tags (like NNP above) and keep only the Tree labels 'GPE' and 'PERSON'? Also, what does 'GPE' mean?
Thanks in advance.
Recommended answer
The named entity chunker will give you a tree containing both chunks and tags. You can't change that, but you can take the tags out. Starting from your tagged_sent:
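import nltk
from nltk.tree import Tree

# tagged_sent is the pos_tag output from the question
chunks = nltk.ne_chunk(tagged_sent)
simple = []
for elt in chunks:
    if isinstance(elt, Tree):
        # Named-entity chunk: keep the label, drop the POS tags
        simple.append(Tree(elt.label(), [word for word, tag in elt]))
    else:
        # Plain (word, tag) pair outside any chunk: keep only the word
        simple.append(elt[0])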
If you only want the chunks, omit the else: clause in the above. You can adapt the code to wrap the chunks any way you want. I used an nltk Tree to keep the changes to a minimum. Note that some chunks consist of multiple words (try adding "New York" to your example), so the chunk's contents must be a list, not a single element.
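As one way of "wrapping the chunks any way you want", here is a minimal sketch that returns plain (label, text) pairs instead of Trees. The function name named_entities and the hand-built input tree are my own illustration, not part of NLTK; the tree only mimics the shape that ne_chunk returns, so the example runs without downloading the chunker models. The fallback Tree class is likewise an assumption, a stand-in used only if NLTK is not installed.

```python
try:
    from nltk.tree import Tree
except ImportError:
    # Assumption: minimal stand-in that mimics only the parts of
    # nltk.tree.Tree used below (a list of children plus label()).
    class Tree(list):
        def __init__(self, label, children):
            super().__init__(children)
            self._label = label

        def label(self):
            return self._label


def named_entities(chunked):
    """Return (label, text) pairs for each named-entity subtree.

    Hypothetical helper: `chunked` is expected to look like the output
    of ne_chunk, i.e. a Tree whose children are either (word, tag)
    pairs or labelled subtrees of (word, tag) pairs.
    """
    entities = []
    for elt in chunked:
        if isinstance(elt, Tree):
            # Join multi-word chunks such as "New York" into one string
            words = [word for word, tag in elt]
            entities.append((elt.label(), " ".join(words)))
    return entities


# Hand-built tree in the shape ne_chunk produces (not real chunker output)
chunked = Tree("S", [
    Tree("PERSON", [("John", "NNP")]),
    ("visited", "VBD"),
    Tree("GPE", [("New", "NNP"), ("York", "NNP")]),
])

entities = named_entities(chunked)
print(entities)  # [('PERSON', 'John'), ('GPE', 'New York')]
```

With real data you would pass the result of ne_chunk(tagged_sent) instead of the hand-built tree. Joining the words into one string is a convenience; keep the list if you need the individual tokens.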
PS. "GPE" stands for "geo-political entity" (so labeling "Michael" as GPE is obviously a chunker mistake). You can see a list of the "commonly used tags" in the NLTK book.