Python: Rename duplicates in list with progressive numbers without sorting list
Question
Given a list like this:
mylist = ["name", "state", "name", "city", "name", "zip", "zip"]
I would like to rename the duplicates by appending a number to get the following result:
mylist = ["name1", "state", "name2", "city", "name3", "zip1", "zip2"]
I do not want to change the order of the original list. The solutions suggested for this related Stack Overflow question sort the list, which I do not want to do.
Recommended answer
This is how I would do it. I rewrote it as a more general utility function, since people seem to like this answer.
from collections import Counter  # Counter counts the number of occurrences of each item
from itertools import tee, count

mylist = ["name", "state", "name", "city", "name", "zip", "zip"]
check = ["name1", "state", "name2", "city", "name3", "zip1", "zip2"]
copy = mylist[:]  # so we will only mutate the copy in case of failure

def uniquify(seq, suffs=None):
    """Make all the items unique by adding a suffix (1, 2, etc).

    `seq` is a mutable sequence of strings.
    `suffs` is an optional alternative suffix iterable.
    """
    if suffs is None:
        # Create a fresh counter on every call; a count(1) default argument
        # would be shared between calls and keep advancing.
        suffs = count(1)
    not_unique = [k for k, v in Counter(seq).items() if v > 1]  # so we have: ['name', 'zip']
    # suffix generator dict - e.g., {'name': <my_gen>, 'zip': <my_gen>}
    suff_gens = dict(zip(not_unique, tee(suffs, len(not_unique))))
    for idx, s in enumerate(seq):
        try:
            suffix = str(next(suff_gens[s]))
        except KeyError:
            # s was unique
            continue
        else:
            seq[idx] += suffix

uniquify(copy)
assert copy == check  # raise an error if we failed
mylist = copy  # success
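The key step in the function is the call to `tee`, which splits one suffix source into several independent iterators, one per duplicated string. The following is a minimal sketch of that behaviour on its own; the variable names are illustrative and not part of the answer.

```python
from itertools import count, tee

# Two independent copies of the same suffix source. After tee(), the
# original iterator should not be consumed directly any more.
name_suffixes, zip_suffixes = tee(count(1), 2)

# Advancing one copy does not affect the other.
first_name_suffixes = [next(name_suffixes) for _ in range(3)]
first_zip_suffixes = [next(zip_suffixes) for _ in range(2)]

print(first_name_suffixes)  # [1, 2, 3]
print(first_zip_suffixes)   # [1, 2]
```

This is why "name" and "zip" each start counting from 1 even though they share a single `suffs` argument.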
If you wanted to add an underscore before each count, you could do something like this:
>>> mylist = ["name", "state", "name", "city", "name", "zip", "zip"]
>>> uniquify(mylist, (f'_{x!s}' for x in range(1, 100)))
>>> mylist
['name_1', 'state', 'name_2', 'city', 'name_3', 'zip_1', 'zip_2']
...or if you wanted to use letters instead:
>>> mylist = ["name", "state", "name", "city", "name", "zip", "zip"]
>>> import string
>>> uniquify(mylist, (f'_{x!s}' for x in string.ascii_lowercase))
>>> mylist
['name_a', 'state', 'name_b', 'city', 'name_c', 'zip_a', 'zip_b']
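Both suffix sources in the two examples above are finite (`range(1, 100)` and the 26 lowercase letters). If a string is repeated more often than there are suffixes, its iterator runs out and `next` raises `StopIteration`. One possible way around that, sketched here with a hypothetical helper name, is an unbounded generator built on `itertools.count`:

```python
from itertools import count, islice

def underscore_suffixes():
    """Hypothetical helper: yield '_1', '_2', '_3', ... without end."""
    return (f"_{n}" for n in count(1))

# Peek at the first few suffixes without consuming an infinite stream.
preview = list(islice(underscore_suffixes(), 3))
print(preview)  # ['_1', '_2', '_3']
```

It would be passed in the same way as the other suffix iterables, e.g. `uniquify(mylist, underscore_suffixes())`.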
NOTE: this is not the fastest possible algorithm; for that, refer to the answer by ronakg. The advantage of the function above is that it is easy to read and understand, and you are not going to see much of a performance difference unless you have an extremely large list.
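For comparison, here is a sketch of a simpler variant that makes two linear passes using only dictionary lookups and returns a new list instead of mutating its input. It is an illustration written for this page, not the code from ronakg's answer, and the function name `rename_duplicates` is made up.

```python
from collections import Counter

def rename_duplicates(items):
    """Return a new list with repeated strings numbered in order of appearance."""
    totals = Counter(items)  # how many times each string occurs overall
    seen = Counter()         # how many times each string has been seen so far
    result = []
    for item in items:
        if totals[item] > 1:
            seen[item] += 1
            result.append(f"{item}{seen[item]}")
        else:
            # Strings that occur only once are left unchanged.
            result.append(item)
    return result

renamed = rename_duplicates(["name", "state", "name", "city", "name", "zip", "zip"])
print(renamed)
# ['name1', 'state', 'name2', 'city', 'name3', 'zip1', 'zip2']
```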
Here is my original answer as a one-liner. However, the order is not preserved, and it uses the .index method, which is extremely suboptimal (as explained in the answer by DTing). See the answer by queezz for a nice 'two-liner' that preserves order.
[s + str(suffix) if num>1 else s for s,num in Counter(mylist).items() for suffix in range(1, num+1)]
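# Produces (with arbitrary dict ordering, as on older Pythons): ['zip1', 'zip2', 'city', 'state', 'name1', 'name2', 'name3']
# On Python 3.7+, where dicts keep insertion order: ['name1', 'name2', 'name3', 'state', 'city', 'zip1', 'zip2']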