Compare group of tags to find similarity/score with PHP/MySQL(比较标签组以查找与 PHP/MySQL 的相似性/分数)
问题描述
如何将一组标签与我数据库中另一个帖子的标签进行比较以获取相关帖子?
我要做的是将帖子上的一组标签与另一个帖子的标签进行比较,而不是单独比较每个标签.因此,假设您想根据帖子中的标签获取真正相关的项目,然后从最相关到最不相关显示它们.无论关系级别如何,每次都必须显示三个相关项目.
What I'm trying to do is compare a group of tags on a post to another post's tags, but not each tag individually. So say you wanted to get truly related items based on tags from a post and then show them from the most related to the least related. Each time there have to be three related items shown, no matter the relationship level.
帖子 A 的标签有:建筑"、木材"、现代"、瑞士"
帖子 B 的标签有:建筑"、木材"、现代"
Post C 有标签:建筑"、现代"、石头"
帖子 D 有标签:建筑"、房子"、住宅"
Post A has the tags: "architecture", "wood", "modern", "switzerland"
Post B has the tags: "architecture", "wood", "modern"
Post C has the tags: "architecture", "modern", "stone"
Post D has the tags: "architecture", "house", "residence"
帖子 B 与帖子 A 的相关度为 75%(3 个相关标签)
帖子 C 与帖子 A 的相关度为 50%(2 个相关标签)
帖子 D 与帖子 A 的相关度为 25%(1 个相关标签)
Post B is related to post A by 75% (3 related tags)
Post C is related to post A by 50% (2 related tags)
Post D is related to post A by 25% (1 related tag)
我该怎么做?我目前正在使用 3-tables.
How can I do that? I'm currently using a 3-tables.
posts
> id
> image
> date
post_tags
> post_id
> tag_id
tags
> id
> name
我已经搜索了 Internet 和 Stack Overflow 以了解如何执行此操作.我最接近的发现是 如何查找相关项目"在 PHP 中,但它实际上并没有为我解决太多问题.
I have searched the Internet and Stack Overflow to find out how to do this. My closest find was How to find "related items" in PHP, but it actually didn't solve much for me.
推荐答案
注意:此解决方案仅适用于 MySQL,因为 MySQL 对 GROUP BY 有自己的解释
NOTE: This solution is MySQL only, as MySQL has its own interpretation of GROUP BY
我还使用了我自己的相似度计算方法.我将相同标签的数量除以帖子 A 和帖子 B 中的平均标签数.因此,如果帖子 A 有 4 个标签,而帖子 B 有 2 个与 A 共享的标签,则相似度为 66%.
I've also used my own calculation of similarity. I've taken the number of identical tags and divided it by the average tag count in post A and post B. So if post A has 4 tags, and post B has 2 tags which are both shared with A, the similarity is 66%.
(SHARED:2/((A:4 + B:2)/2)
或 (SHARED:2)/(AVG:3)
如果您想要/需要更改公式应该很容易......
It should be easy to change the formula if you want/need to...
SELECT
sourcePost.id,
targetPost.id,
/* COUNT NUMBER OF IDENTICAL TAGS */
/* REF GROUPING OF sourcePost.id and targetPost.id BELOW */
COUNT(targetPost.id) /
(
(
/* TOTAL TAGS IN SOURCE POST */
(SELECT COUNT(*) FROM post_tags WHERE post_id = sourcePost.id)
+
/* TOTAL TAGS IN TARGET POST */
(SELECT COUNT(*) FROM post_tags WHERE post_id = targetPost.id)
) / 2 /* AVERAGE TAGS IN SOURCE + TARGET */
) as similarity
FROM
posts sourcePost
LEFT JOIN
post_tags sourcePostTags ON (sourcePost.id = sourcePostTags.post_id)
INNER JOIN
post_tags targetPostTags ON (sourcePostTags.tag_id = targetPostTags.tag_id
AND
sourcePostTags.post_id != targetPostTags.post_id)
LEFT JOIN
posts targetPost ON (targetPostTags.post_id = targetPost.id)
GROUP BY
sourcePost.id, targetPost.id
这篇关于比较标签组以查找与 PHP/MySQL 的相似性/分数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持编程学习网!
本文标题为:比较标签组以查找与 PHP/MySQL 的相似性/分数
基础教程推荐
- 在 PHP 中强制下载文件 - 在 Joomla 框架内 2022-01-01
- 超薄框架REST服务两次获得输出 2022-01-01
- 在 Woocommerce 中根据运输方式和付款方式添加费用 2021-01-01
- 如何在 PHP 中的请求之间持久化对象 2022-01-01
- 在多维数组中查找最大值 2021-01-01
- mysqli_insert_id 是否有可能在高流量应用程序中返回 2021-01-01
- Libpuzzle 索引数百万张图片? 2022-01-01
- XAMPP 服务器不加载 CSS 文件 2022-01-01
- WooCommerce 中选定产品类别的自定义产品价格后缀 2021-01-01
- 通过 PHP SoapClient 请求发送原始 XML 2021-01-01