The number of differences in a column


Problem description

I would like to retrieve a column that shows how many letters differ between the values in each row. For instance:

If one row has the value "test" and another row has the value "testing ", then the difference between "test" and "testing " is 4 letters. The column would then hold the value 4.

I have thought about it, but I don't know where to begin.

id || value   || category || differences
----------------------------------------
 1 || test    || 1        || 4
 2 || testing || 1        || null
11 || candy   || 2        || -3
12 || ca      || 2        || null

In this scenario and context, there is no difference between "Test" and "rest".

Recommended answer

I think what you are looking for is a measure of edit difference, rather than just counting prefix similarity, and there are a few common algorithms for that. Levenshtein's method is one that I've used before, and I've seen it implemented as T-SQL functions. The answers to this SO question suggest a couple of T-SQL implementations that you might be able to take and use as-is.

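(Do take the time to test the code and understand the method, though, rather than just copying the code and using it, so that you can make sense of the output if something goes wrong. Otherwise you may be taking on technical debt that you will have to pay back later.)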
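To make the idea concrete, here is a minimal sketch of the Levenshtein method in Python rather than T-SQL. It is not one of the implementations from the linked answers; the function name levenshtein and the sample strings (taken from the question, including the trailing space in "testing ") are only for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance where insert, delete and substitute each cost 1."""
    # previous[j] = distance between the prefix of a handled so far and b[:j]
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute ch_a with ch_b (or match)
            ))
        previous = current
    return previous[-1]


print(levenshtein("test", "testing "))  # 4
print(levenshtein("candy", "ca"))       # 3
print(levenshtein("Test", "rest"))      # 1
```

Note that this measure is never negative and reports 1 for "Test" versus "rest", whereas the question expects -3 in one row and treats "Test" and "rest" as no different. If that is really the requirement, a plain difference in string lengths may be closer to what was asked than an edit distance.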

Exactly which distance calculation method you want will depend on how you want to count certain things. For instance, do you count a substitution as one change, or as a delete plus an insert? And if your strings are long enough for it to matter, do you want to consider substring moves, and so forth?
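As a rough illustration of how that choice changes the result, the sketch below makes the substitution cost a parameter. The function name edit_distance and the parameter substitution_cost are hypothetical, and substring moves are not handled at all.

```python
def edit_distance(a: str, b: str, substitution_cost: int = 1) -> int:
    """Edit distance with unit-cost inserts and deletes and a configurable
    substitution cost (1 = one change, 2 = a delete plus an insert)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            step = 0 if ch_a == ch_b else substitution_cost
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + step,  # substitute (or match at no cost)
            ))
        previous = current
    return previous[-1]


# Substitution counted as a single change
print(edit_distance("kitten", "sitting", substitution_cost=1))  # 3
# Substitution counted as a delete plus an insert
print(edit_distance("kitten", "sitting", substitution_cost=2))  # 5
```

The same pair of strings scores 3 or 5 depending on the convention, so it is worth deciding which one matches your notion of a "difference" before storing the numbers in a column.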
