Deciding and implementing a trending algorithm in Django

Question

I have a Django application in which I need to implement a simple trending/ranking algorithm. I'm very lost:

I have two models, Book and Reader. Every night, new books are added to my database. The number of readers for each book is also updated every night, i.e. one book will have multiple reader-statistic records (one record per day).

Over a given period (past week, past month, or past year), I would like to list the most popular books. What algorithm should I use for this?

The popularity doesn't need to be real-time in any way, because the reader count for each book is only updated daily.

I found an article, referenced in another Stack Overflow post, that showed how trending Wikipedia articles are calculated, but it only showed how the current trend was calculated.

As someone pointed out on Stack Overflow, it is a very simple baseline trend algorithm that only calculates the slope between two data points, so I guess it shows the trend between yesterday and today.
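
To make that baseline concrete: a two-point slope is just the change in reader count divided by the number of days between the two observations, so with consecutive daily records it reduces to today's count minus yesterday's. A minimal sketch; the helper name `two_point_slope` and the sample dates and counts are illustrative only:

```python
from datetime import date


def two_point_slope(d1, count1, d2, count2):
    """Slope (readers per day) of the straight line through two observations.

    Hypothetical helper for illustration; d1 and d2 are datetime.date objects.
    """
    days = (d2 - d1).days
    if days == 0:
        raise ValueError("the two observations must be on different days")
    return (count2 - count1) / days


# Yesterday vs. today: the slope is simply the day-over-day change.
print(two_point_slope(date(2024, 1, 1), 556, date(2024, 1, 2), 593))  # 37.0
```

Because it looks at only two points, a single unusual day moves this number a lot, which is why it is only a baseline.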

I'm not looking for an uber-complex trending algorithm like those used on Hacker News, Reddit, etc.

I have only two data axes: reader count and date.

Any ideas on what I should implement, and how? For someone who has never worked with anything statistics- or algorithm-related, this seems like a very daunting undertaking.

Thanks in advance.

Recommended answer

Probably the simplest possible trending "algorithm" I can think of is the n-day moving average. I'm not sure how your data is structured, but say you have something like this:

```python
books = {'Twilight': [500, 555, 580, 577, 523, 533, 556, 593],
         'Harry Potter': [650, 647, 653, 642, 633, 621, 625, 613],
         'Structure and Interpretation of Computer Programs': [1, 4, 15, 12, 7, 3, 8, 19]
        }
```
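
If the nightly statistics are stored as one Reader row per book per day, one way to get them into this shape is to group the rows by book and order each book's counts by date. A minimal sketch, assuming each record holds that day's reader count; the `(title, day, count)` tuple layout and the name `build_series` are hypothetical, and in a Django app the rows would come from a query on your Reader model (field names depend on your schema):

```python
from collections import defaultdict
from datetime import date


def build_series(records):
    """Group (title, day, count) tuples into {title: [count, ...]} ordered by day.

    Hypothetical helper: assumes one record per book per day, where count is
    the number of readers recorded for that day.
    """
    grouped = defaultdict(list)
    for title, day, count in records:
        grouped[title].append((day, count))
    # Sort each book's records chronologically, then keep only the counts,
    # so the last element of each list is the most recent day.
    return {
        title: [count for _, count in sorted(rows)]
        for title, rows in grouped.items()
    }


records = [
    ("Twilight", date(2024, 1, 2), 555),
    ("Twilight", date(2024, 1, 1), 500),
    ("Harry Potter", date(2024, 1, 1), 650),
]
print(build_series(records))
# {'Twilight': [500, 555], 'Harry Potter': [650]}
```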

A simple moving average just takes the last n values and averages them:

```python
def moving_av(l, n):
    """Take a list, l, and return the average of its last n elements."""
    window = l[-n:]  # l[-n:] is the whole list when n > len(l)
    return sum(window) / float(len(window))
```

The slice notation simply grabs the tail end of the list, starting from the nth-to-last element. A moving average is a fairly standard way to smooth out the noise that a single spike or dip could introduce. The function could be used like so:

```python
book_scores = {}
for book, reader_list in books.items():  # dict.iteritems() exists only in Python 2
    book_scores[book] = moving_av(reader_list, 5)
```
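
Turning the scores into the ranked list the question asks for is then just a sort. A self-contained sketch that repeats the sample data and `moving_av` from above, keeps the same 5-day window, and ranks books by score, highest first:

```python
def moving_av(l, n):
    """Average of the last n elements of l (same as the function above)."""
    window = l[-n:]
    return sum(window) / float(len(window))


books = {
    "Twilight": [500, 555, 580, 577, 523, 533, 556, 593],
    "Harry Potter": [650, 647, 653, 642, 633, 621, 625, 613],
    "Structure and Interpretation of Computer Programs": [1, 4, 15, 12, 7, 3, 8, 19],
}

# Score each book by its 5-day moving average, then sort highest first.
book_scores = {book: moving_av(readers, 5) for book, readers in books.items()}
ranking = sorted(book_scores, key=book_scores.get, reverse=True)
for book in ranking:
    print(f"{book}: {book_scores[book]:.1f}")
```

With this window, Harry Potter (626.8) ranks above Twilight (556.4) and SICP (9.8): a plain moving average rewards absolute readership.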

You'll want to experiment with the number of days you average over. If you want to emphasize recent trends, you can also look at using something like a weighted moving average.
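
One common form is a linearly weighted moving average, where the oldest day in the window gets weight 1 and the newest gets the highest weight. A minimal sketch; the name `weighted_moving_av` and the linear 1..n weights are illustrative choices, not something the answer prescribes:

```python
def weighted_moving_av(l, n):
    """Linearly weighted moving average of the last n elements of l.

    The oldest value in the window gets weight 1 and the most recent gets
    weight len(window), so recent days count more than in moving_av.
    """
    window = l[-n:]
    weights = range(1, len(window) + 1)
    total = sum(w * x for w, x in zip(weights, window))
    return total / float(sum(weights))


twilight = [500, 555, 580, 577, 523, 533, 556, 593]
print(weighted_moving_av(twilight, 5))  # about 560.73, vs. 556.4 unweighted
```

Because Twilight's last day is its strongest, the weighted average sits above the plain 5-day average.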

If you want to focus less on absolute readership and more on increases in readership, compute the relative (percent) change of the 5-day moving average against the 30-day moving average:

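```python
# Relative change of the recent 5-day average against the 30-day baseline;
# reader_list is one book's daily counts, as in the loop above.
d5_moving_av = moving_av(reader_list, 5)
d30_moving_av = moving_av(reader_list, 30)
book_score = (d5_moving_av - d30_moving_av) / d30_moving_av
```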
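
Applied to the sample data, this score reverses the ranking produced by the plain moving average. A self-contained sketch; `growth_score` is a hypothetical name, and returning 0.0 when the long average is zero is an added assumption to avoid dividing by zero:

```python
def moving_av(l, n):
    """Average of the last n elements of l (same as the function above)."""
    window = l[-n:]
    return sum(window) / float(len(window))


def growth_score(reader_list, short=5, long=30):
    """Relative change of the short moving average versus the long one.

    Positive means recent readership is above the longer-run baseline.
    Returns 0.0 when the long average is zero (an assumption; you may
    prefer to skip such books instead).
    """
    long_av = moving_av(reader_list, long)
    if long_av == 0:
        return 0.0
    return (moving_av(reader_list, short) - long_av) / long_av


books = {
    "Twilight": [500, 555, 580, 577, 523, 533, 556, 593],
    "Harry Potter": [650, 647, 653, 642, 633, 621, 625, 613],
    "Structure and Interpretation of Computer Programs": [1, 4, 15, 12, 7, 3, 8, 19],
}

scores = {book: growth_score(readers) for book, readers in books.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
for book in ranking:
    print(f"{book}: {scores[book]:+.1%}")
```

With only eight days of sample data, the 30-day window covers each whole list. SICP, the least-read book, now comes first (about +13.6%), while Harry Potter drops to last (about -1.4%) because its recent readership is falling.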

With these simple tools you have a fair amount of flexibility in how much you emphasize past trends and how much you want to smooth out (or not smooth out) spikes.
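
To tie this back to the question's past week, month, or year: with one record per day, each period is just a window size in days. A sketch under that assumption; `PERIOD_DAYS` and `top_books` are hypothetical names, and treating a month as 30 days and a year as 365 days is an approximation:

```python
# Hypothetical mapping from the question's periods to window sizes in days.
PERIOD_DAYS = {"week": 7, "month": 30, "year": 365}


def moving_av(l, n):
    """Average of the last n elements of l (same as the function above)."""
    window = l[-n:]
    return sum(window) / float(len(window))


def top_books(books, period, limit=10):
    """Return up to `limit` titles ranked by moving average over `period`.

    books maps title -> daily reader counts ordered oldest to newest.
    Books with no records are skipped to avoid dividing by zero.
    """
    n = PERIOD_DAYS[period]
    scores = {title: moving_av(counts, n) for title, counts in books.items() if counts}
    return sorted(scores, key=scores.get, reverse=True)[:limit]


books = {
    "Twilight": [500, 555, 580, 577, 523, 533, 556, 593],
    "Harry Potter": [650, 647, 653, 642, 633, 621, 625, 613],
    "Structure and Interpretation of Computer Programs": [1, 4, 15, 12, 7, 3, 8, 19],
}
print(top_books(books, "week", limit=2))
```

Swapping `moving_av` for a weighted average or a growth score changes what "popular" means without changing this structure.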
