Pattern matching in Elasticsearch?
Question
Continuing from my earlier post, I have changed the query because, according to femtoRgon's post, some characters and anchors are not supported by Elasticsearch.
I am looking for a way to match a pattern like "xxx-xx-xxxx" in order to find documents containing social security numbers using Elasticsearch.
Suppose that, among the indexed documents, I would like to find all documents containing a social security number that matches the "xxx-xx-xxxx" pattern.
Sample code to index the document:
InputStream is = null;
try {
    is = new FileInputStream("/home/admin/Downloads/20121221.doc");
    ContentHandler contenthandler = new BodyContentHandler();
    Metadata metadata = new Metadata();
    Parser parser = new AutoDetectParser();
    parser.parse(is, contenthandler, metadata, new ParseContext());
}
catch (Exception e) {
    e.printStackTrace();
}
finally {
    if (is != null) is.close();
}
Sample code to search:
QueryBuilder queryBuilderFullText = null;
queryBuilderFullText = QueryBuilders.filteredQuery(QueryBuilders.matchAllQuery(),
        FilterBuilders.regexpFilter("_all", "[0-9]{3}?[0-9]{2}?[0-9]{4}"));
SearchRequestBuilder requestBuilder;
requestBuilder = client.prepareSearch()
        .setIndices(getDomainIndexId(project))
        .setTypes(getProjectTypeId(project))
        .setQuery(queryBuilderFullText);
SearchResponse response = requestBuilder.execute().actionGet(ES_TIMEOUT_MS);
SearchHits hits = response.getHits();
if (hits.getTotalHits() > 0) {
    System.out.println(hits.getTotalHits());
} else {
    return 0L;
}
I am getting the following in return:
45-555-5462
457-55-5462
4578-55-5462
457-55-54623
457-55-5462-23
But as per my requirement, it should only return "457-55-5462" (based on the pattern "xxx-xx-xxxx").
Please help.
Recommended answer
Seeing as ^, $ and \d can't be used, I would do this:
[^0-9-][0-9]{3}-[0-9]{2}-[0-9]{4}[^0-9-]
Or in Java:
FilterBuilders.regexpFilter("_all", "[^0-9-][0-9]{3}-[0-9]{2}-[0-9]{4}[^0-9-]")
This checks that the matched number is neither preceded nor followed by another digit or dash. It does require some character before and after the match, though, so it won't catch documents where the social security number sits at the very beginning or very end.
Regex101 demo
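To make the boundary behaviour concrete, here is a minimal sketch that runs the answer's pattern against the five sample values from the question. It uses java.util.regex purely for illustration; that is not the regex dialect Elasticsearch evaluates, so treat it as a check of the character-class logic only. The class name SsnPatternDemo, its method names, and the lookaround variant are assumptions added here, not part of the original answer. The lookaround variant is meant for client-side verification of returned text, since it also accepts a number at the very start or end of a string.

```java
import java.util.regex.Pattern;

public class SsnPatternDemo {
    // Pattern from the answer: one non-digit, non-dash character is required on each side.
    static final Pattern SURROUNDED =
            Pattern.compile("[^0-9-][0-9]{3}-[0-9]{2}-[0-9]{4}[^0-9-]");

    // Hypothetical client-side variant: lookarounds reject a neighbouring digit or dash
    // without requiring a neighbouring character to exist.
    static final Pattern LOOKAROUND =
            Pattern.compile("(?<![0-9-])[0-9]{3}-[0-9]{2}-[0-9]{4}(?![0-9-])");

    static boolean containsSurrounded(String text) {
        return SURROUNDED.matcher(text).find();
    }

    static boolean containsSsn(String text) {
        return LOOKAROUND.matcher(text).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "45-555-5462", "457-55-5462", "4578-55-5462", "457-55-54623", "457-55-5462-23"
        };
        for (String s : samples) {
            // Pad with spaces so the answer's pattern has a character on each side.
            String padded = " " + s + " ";
            System.out.println(s
                    + " padded=" + containsSurrounded(padded)
                    + " bare=" + containsSurrounded(s)
                    + " lookaround=" + containsSsn(s));
        }
    }
}
```

Only "457-55-5462" passes the padded and lookaround checks. The bare check fails even for that value, which is the start/end limitation the answer describes.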