Issue with below snippet on boundary matchers regex (\b)
Problem description
My input:
1. end
2. end of the day or end of the week
3. endline
4. something
5. "something" end
Based on the above discussions, if I try to replace a single string using this snippet, it removes the appropriate words from the line successfully:
import java.io.*;

public class DeleteTest {
    public static void main(String[] args) {
        try {
            File file = new File("C:/Java samples/myfile.txt");
            File temp = File.createTempFile("myfile1", ".txt", file.getParentFile());
            String delete = "end";
            BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(file)));
            PrintWriter writer = new PrintWriter(new OutputStreamWriter(new FileOutputStream(temp)));
            for (String line; (line = reader.readLine()) != null;) {
                // "\\b" is the regex word boundary; a bare "\b" would be a backspace character
                line = line.replaceAll("\\b" + delete + "\\b", "");
                writer.println(line);
            }
            reader.close();
            writer.close();
        }
        catch (Exception e) {
            System.out.println("Something went Wrong");
        }
    }
}
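In a Java string literal, \b is the backspace character (U+0008), not the regex word boundary; the regex engine needs the two characters \ and b, which are written "\\b" in source code. The sketch below (the class name BackspaceEscapeDemo is illustrative, not from the question) compares both spellings on one line of the sample input.

```java
public class BackspaceEscapeDemo {
    // In source code, "\b" is one character: backspace (U+0008).
    static final String BACKSPACE_PATTERN = "\bend\b";
    // "\\b" is a backslash followed by 'b', which the regex engine reads as a word boundary.
    static final String BOUNDARY_PATTERN = "\\bend\\b";

    public static void main(String[] args) {
        String line = "end of the day or end of the week";
        System.out.println("\b".length());        // 1
        System.out.println((int) "\b".charAt(0)); // 8
        // Unchanged: the line contains no backspace characters to match.
        System.out.println(line.replaceAll(BACKSPACE_PATTERN, ""));
        // Both standalone "end" words are removed, leaving the surrounding spaces.
        System.out.println(line.replaceAll(BOUNDARY_PATTERN, ""));
    }
}
```

Only the "\\b" version removes the standalone end words; endline would be left alone because there is no word boundary between "end" and "line".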
My output if I use the above snippet (also my expected output):
1.
2. of the day or of the week
3. endline
4. something
5. "something"
But when I include more words to delete and use a Set for them, as in the snippet below:
public static void main(String[] args) {
    try {
        File file = new File("C:/Java samples/myfile.txt");
        File temp = File.createTempFile("myfile1", ".txt", file.getParentFile());
        BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(file)));
        PrintWriter writer = new PrintWriter(new OutputStreamWriter(new FileOutputStream(temp)));
        Set<String> toDelete = new HashSet<>();
        toDelete.add("end");
        toDelete.add("something");
        for (String line; (line = reader.readLine()) != null;) {
            line = line.replaceAll("\\b" + toDelete + "\\b", "");
            writer.println(line);
        }
        reader.close();
        writer.close();
    }
    catch (Exception e) {
        System.out.println("Something went Wrong");
    }
}
I get this output instead (it just removes the spaces):
1. end
2. endofthedayorendoftheweek
3. endline
4. something
5. "something" end
Can you guys help me with this?
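Why the Set version only strips spaces: concatenating a Set to a String calls its toString(), which yields something like [end, something]. Inside a regex, square brackets form a character class, so the pattern becomes \b[end, something]\b, which matches one single character from that class with a word boundary on each side. In this input the only such characters are the spaces between words. The sketch below (class and method names are illustrative) reproduces that behavior.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class SetToStringDiagnosis {
    // Rebuilds the regex produced by the question's second snippet.
    static String buggyRegex(Set<String> toDelete) {
        // String concatenation calls toDelete.toString(), e.g. "[end, something]",
        // so the regex is \b[end, something]\b: a character class matching ONE
        // character out of e, n, d, ',', ' ', s, o, m, t, h, i, g.
        return "\\b" + toDelete + "\\b";
    }

    public static void main(String[] args) {
        Set<String> toDelete = new HashSet<>(Arrays.asList("end", "something"));
        String regex = buggyRegex(toDelete);
        System.out.println(regex);
        // A space between two words has a boundary on each side, so it matches;
        // a letter would only match if it formed a one-letter word.
        System.out.println("end of the day or end of the week".replaceAll(regex, ""));
    }
}
```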
Recommended answer
You need to create an alternation group out of the set with
String.join("|", toDelete)
and use it as
line = line.replaceAll("\\b(?:" + String.join("|", toDelete) + ")\\b", "");
The pattern will look like
\b(?:end|something)\b
See the regex demo. Here, (?:...) is a non-capturing group that is used to group several alternatives without creating a memory buffer for the capture (you do not need one since you remove the matches).
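A minimal sketch of the alternation approach applied to the sample input from the question. It uses a LinkedHashSet only so the printed pattern has a predictable order; with the question's HashSet the alternatives may appear in a different order, which does not affect the result for these two words. The class and method names are illustrative.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

public class AlternationDemo {
    // Wraps the words in a non-capturing alternation group between word boundaries.
    static String alternationRegex(Set<String> words) {
        return "\\b(?:" + String.join("|", words) + ")\\b";
    }

    public static void main(String[] args) {
        Set<String> toDelete = new LinkedHashSet<>(Arrays.asList("end", "something"));
        String regex = alternationRegex(toDelete);
        System.out.println(regex); // \b(?:end|something)\b
        String[] lines = {
            "end",
            "end of the day or end of the week",
            "endline",
            "something",
            "\"something\" end"
        };
        for (String line : lines) {
            // Brackets make leftover leading/trailing spaces visible.
            System.out.println("[" + line.replaceAll(regex, "") + "]");
        }
    }
}
```

Note that removing a word leaves the spaces around it in place, so "end of the day" becomes " of the day".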
Or, better, compile the regex before entering the loop:
Pattern pat = Pattern.compile("\\b(?:" + String.join("|", toDelete) + ")\\b");
...
line = pat.matcher(line).replaceAll("");
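Putting the precompiled pattern into a complete file-processing method might look like the sketch below. It assumes the words contain only word characters, and it swaps the question's manual close() calls for try-with-resources and an explicit UTF-8 charset; those are design choices, not part of the original answer. DeleteWords and deleteWords are hypothetical names, and temporary files stand in for the question's input path.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

public class DeleteWords {
    // Compiles the alternation once and reuses it for every line of the input file.
    // Assumes the words contain only word characters.
    static void deleteWords(Path in, Path out, Set<String> words) throws IOException {
        Pattern pat = Pattern.compile("\\b(?:" + String.join("|", words) + ")\\b");
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            for (String line; (line = reader.readLine()) != null; ) {
                writer.write(pat.matcher(line).replaceAll(""));
                writer.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Temporary files stand in for the question's C:/Java samples/myfile.txt.
        Path in = Files.createTempFile("myfile", ".txt");
        Path out = Files.createTempFile("myfile1", ".txt");
        Files.write(in, Arrays.asList("end", "end of the day or end of the week", "endline"),
                StandardCharsets.UTF_8);
        deleteWords(in, out, new HashSet<>(Arrays.asList("end", "something")));
        for (String line : Files.readAllLines(out, StandardCharsets.UTF_8)) {
            System.out.println("[" + line + "]");
        }
    }
}
```

Unlike the question's code, the reader and writer are closed even if an exception is thrown partway through the file.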
Update:
To allow matching whole "words" that may contain special chars, you need to Pattern.quote those words to escape the special chars, and then use unambiguous word boundaries: a (?<!\w) negative lookbehind instead of the initial \b to make sure there is no word char before the match, and a (?!\w) negative lookahead instead of the final \b to make sure there is no word char after it.
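To see why \b is not enough once a word starts or ends with a special character, the sketch below compares the two delimiters (the class and method names, and the sample word +end, are illustrative). With \b, +end is not removed from "a +end b" because there is no boundary between the space and the +, yet it is removed from inside "x+end b"; the lookaround version behaves the other way round.

```java
import java.util.regex.Pattern;

public class BoundaryVsLookaround {
    // Whole-word removal using \b on both sides of the quoted word.
    static String removeWithWordBoundary(String line, String word) {
        return line.replaceAll("\\b" + Pattern.quote(word) + "\\b", "");
    }

    // Whole-word removal using "no word char before" and "no word char after".
    static String removeWithLookarounds(String line, String word) {
        return line.replaceAll("(?<!\\w)" + Pattern.quote(word) + "(?!\\w)", "");
    }

    public static void main(String[] args) {
        String word = "+end"; // hypothetical word that starts with a non-word char
        for (String line : new String[] {"a +end b", "x+end b"}) {
            System.out.println("\\b:          [" + removeWithWordBoundary(line, word) + "]");
            System.out.println("lookarounds: [" + removeWithLookarounds(line, word) + "]");
        }
    }
}
```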
In Java 8, you may use this code:
// Quote each word so regex metacharacters such as + or - are matched literally
Set<String> nToDel = toDelete.stream()
        .map(Pattern::quote)
        .collect(Collectors.toCollection(HashSet::new));
String pattern = "(?<!\\w)(?:" + String.join("|", nToDel) + ")(?!\\w)";
The regex will look like (?<!\w)(?:\Q+end\E|\Qsomething-\E)(?!\w). Note that the symbols between \Q and \E are parsed as literal symbols.
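A self-contained version of the Java 8 snippet that prints the generated regex. This sketch uses Collectors.joining to build the alternation directly instead of collecting into an intermediate HashSet; LinkedHashSet is used only to keep the printed order stable, and the sample words +end and something- are taken from the regex shown above. QuotedAlternation and buildPattern are illustrative names.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class QuotedAlternation {
    // Quotes every word, then joins them into a lookaround-delimited alternation.
    static String buildPattern(Set<String> toDelete) {
        String alternatives = toDelete.stream()
                .map(Pattern::quote) // "+end" becomes \Q+end\E
                .collect(Collectors.joining("|"));
        return "(?<!\\w)(?:" + alternatives + ")(?!\\w)";
    }

    public static void main(String[] args) {
        Set<String> toDelete = new LinkedHashSet<>(Arrays.asList("+end", "something-"));
        String regex = buildPattern(toDelete);
        System.out.println(regex); // (?<!\w)(?:\Q+end\E|\Qsomething-\E)(?!\w)
        Pattern pat = Pattern.compile(regex);
        System.out.println("[" + pat.matcher("go +end now, something- else").replaceAll("") + "]");
    }
}
```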