Keyword (OR, AND) search in Lucene
I am using Lucene in my portal (J2EE based) for indexing and search services.
The problem concerns Lucene's keywords: when you use one of them as a term in the search query, you get an error.
For example:
searchTerms = "ik OR jij"
This works fine, because it searches for "ik" or "jij".
searchTerms = "ik AND jij"
This works fine too: it searches for "ik" and "jij".
But when you search:
searchTerms = "OR"
searchTerms = "AND"
searchTerms = "ik OR"
searchTerms = "OR ik"
and so on, it fails with an error:
Component Name: STSE_RESULTS Class: org.apache.lucene.queryParser.ParseException Message: Cannot parse 'OR jij': Encountered "OR" at line 1, column 0. Was expecting one of: ...
That makes sense: these words are Lucene keywords, so they are probably reserved and always act as operators.
In Dutch, the word "OR" is important because it stands for "Ondernemings Raad". It is used in many texts, and it needs to be findable. Searching for lowercase "or" does work, but it does not return texts matching the term "OR". How can I make it searchable?
How can I escape the keyword "OR"? Or how can I tell Lucene to treat "OR" as a search term rather than as a keyword?
I suppose you have tried putting the "OR" into double quotes?
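To make the quoting idea concrete, here is a minimal sketch of a pre-processing step that could run on the user's input before it is handed to the query parser. It wraps "OR" and "AND" in double quotes whenever they do not sit between two ordinary terms, so "ik OR jij" stays a boolean query while "OR", "ik OR" and "OR ik" become quoted terms. The class OperatorQuoter and its method quoteStrayOperators are hypothetical names, not part of Lucene, and the code deliberately ignores phrases, parentheses and other query syntax. Whether the quoted term then finds anything may still depend on the analyzer (lowercasing, stop-word removal), so treat this as a starting point rather than a confirmed fix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, NOT part of Lucene: rewrites the raw search string
// before it is passed to the query parser.
class OperatorQuoter {

    // Upper-case words the query syntax treats as boolean operators.
    // Only the two keywords from the question are handled here.
    private static final Set<String> OPERATORS =
            new HashSet<>(Arrays.asList("OR", "AND"));

    // Wraps OR/AND in double quotes unless the word has an ordinary term
    // on both sides. Naive: splits on whitespace only, so existing
    // phrases and parentheses are not understood.
    static String quoteStrayOperators(String userInput) {
        String trimmed = userInput.trim();
        if (trimmed.isEmpty()) {
            return "";
        }
        String[] tokens = trimmed.split("\\s+");
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            String token = tokens[i];
            boolean isKeyword = OPERATORS.contains(token);
            boolean hasLeftOperand = i > 0 && !OPERATORS.contains(tokens[i - 1]);
            boolean hasRightOperand =
                    i < tokens.length - 1 && !OPERATORS.contains(tokens[i + 1]);
            if (isKeyword && !(hasLeftOperand && hasRightOperand)) {
                out.add("\"" + token + "\""); // treat as a search term
            } else {
                out.add(token);               // leave terms and real operators alone
            }
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        for (String q : Arrays.asList("ik OR jij", "OR", "ik OR", "OR ik")) {
            System.out.println(q + " -> " + quoteStrayOperators(q));
        }
    }
}
```

The result of quoteStrayOperators(searchTerms) would then be passed to the parser in place of the raw searchTerms string.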
If that doesn't work I think you might have to go so far as to change the Lucene source and then recompile the whole thing, as the operator "OR" is buried deep inside the code. Actually, compiling probably isn't even enough: you'll have to change the file QueryParser.jj in the source package that serves as input for JavaCC, then run JavaCC, then recompile the whole thing.
The good news, however, is that there's only one line to change:
| <OR: ("OR" | "||") >
becomes
| <OR: ("||") >
That way, you'll have only "||" as the logical OR operator. There is a build.xml that also contains the invocation of JavaCC, but you have to download that tool yourself. I can't try it myself right now, I'm afraid.
This is perhaps a good question for the Lucene developer mailing list, but please let us know if you do that and they come up with a simpler solution ;-)