Elegant ways to count the frequency of words in a file
Question
What are elegant and effective ways to count the frequency of each English word in a file?
Recommended answer
First of all, I define letter_only, a std::ctype<char> facet that is installed into the stream's std::locale, so that the stream ignores punctuation and reads only valid English letters. That way, the stream treats "ways", "ways." and "ways!" as the same word "ways", because punctuation such as "." and "!" is classified as whitespace and skipped.
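#include <algorithm>  // std::fill
#include <locale>     // std::ctype, std::locale
#include <vector>

struct letter_only : std::ctype<char>
{
    letter_only() : std::ctype<char>(get_table()) {}
    static std::ctype_base::mask const* get_table()
    {
        // Start with every character classified as whitespace...
        static std::vector<std::ctype_base::mask>
            rc(std::ctype<char>::table_size, std::ctype_base::space);
        // ...then mark only A-Z and a-z as letters. Filling 'A'..'z' in a
        // single call would also mark [ \ ] ^ _ ` as letters.
        std::fill(&rc['A'], &rc['Z' + 1], std::ctype_base::alpha);
        std::fill(&rc['a'], &rc['z' + 1], std::ctype_base::alpha);
        return &rc[0];
    }
};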
Solution 1
#include <fstream>
#include <iostream>
#include <map>
#include <string>

int main()
{
    std::map<std::string, int> wordCount;
    std::ifstream input;
    input.imbue(std::locale(std::locale(), new letter_only())); // enable reading only letters!
    input.open("filename.txt");
    std::string word;
    while (input >> word)
    {
        ++wordCount[word]; // operator[] creates a missing entry with count 0
    }
    for (std::map<std::string, int>::iterator it = wordCount.begin(); it != wordCount.end(); ++it)
    {
        std::cout << it->first << " : " << it->second << std::endl;
    }
}
Solution 2
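#include <algorithm>  // std::for_each
#include <fstream>
#include <iostream>
#include <iterator>   // std::istream_iterator
#include <map>
#include <string>

// Function object that accumulates word counts as std::for_each visits each word.
struct Counter
{
    std::map<std::string, int> wordCount;
    void operator()(const std::string& item) { ++wordCount[item]; }
    operator std::map<std::string, int>() { return wordCount; }
};

int main()
{
    std::ifstream input;
    input.imbue(std::locale(std::locale(), new letter_only())); // enable reading only letters!
    input.open("filename.txt");
    std::istream_iterator<std::string> start(input);
    std::istream_iterator<std::string> end;
    // std::for_each returns the functor, which then converts to the map.
    std::map<std::string, int> wordCount = std::for_each(start, end, Counter());
    for (std::map<std::string, int>::iterator it = wordCount.begin(); it != wordCount.end(); ++it)
    {
        std::cout << it->first << " : " << it->second << std::endl;
    }
}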