CUDA: In-warp reduction and the volatile keyword

Problem description

After reading the question and its answers at the following link:
LINK

I still have one question. Coming from a C/C++ background, I understand that using volatile has its drawbacks. The answers also point out that in CUDA, if the volatile keyword is omitted, the compiler may optimize the shared-array accesses by keeping the data in registers.

I would like to know what performance issues can arise when computing a (sum) reduction, for example:

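// In-warp sum reduction over s_data[0..31]; the total ends up in s_data[0].
// volatile forces every access to go to shared memory instead of a register.
// Relies on the lanes of the warp executing in lockstep (no explicit sync).
__device__ void sum(volatile int *s_data, int tid)
{
    if (tid < 16)
    {
        s_data[tid] += s_data[tid + 16];
        s_data[tid] += s_data[tid +  8];
        s_data[tid] += s_data[tid +  4];
        s_data[tid] += s_data[tid +  2];
        s_data[tid] += s_data[tid +  1];
    }
}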

I am using an in-warp reduction. Since all threads within a warp execute in sync, I believe there is no need for the __syncthreads() construct.
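For comparison, here is a minimal sketch of the same reduction written without volatile. It assumes CUDA 9 or later, where __syncwarp() is available. It makes the ordering between steps explicit instead of relying on lockstep execution, which GPUs with independent thread scheduling (Volta and later) do not guarantee. The names warp_sum_syncwarp, warp_sum_kernel and run_warp_sum are hypothetical. The sketch also assumes a single block of 32 threads, so that tid is the lane index, and it omits error checking for brevity.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: sums s_data[0..31] into s_data[0].
// Assumes all 32 lanes of one warp call it and tid is the lane index (0..31).
__device__ void warp_sum_syncwarp(int *s_data, int tid)
{
    int v = s_data[tid];                    // running partial sum, kept in a register
    for (int offset = 16; offset > 0; offset >>= 1) {
        if (tid < offset) {
            v += s_data[tid + offset];      // read the partner's partial sum
            s_data[tid] = v;                // publish the updated partial sum
        }
        __syncwarp();                       // order this step's writes before the next step's reads
    }
}

// Hypothetical test kernel: one block of exactly 32 threads.
__global__ void warp_sum_kernel(const int *in, int *out)
{
    __shared__ int s_data[32];
    int tid = threadIdx.x;
    s_data[tid] = in[tid];
    __syncwarp();                           // inputs visible to the whole warp
    warp_sum_syncwarp(s_data, tid);
    if (tid == 0) *out = s_data[0];
}

// Hypothetical host wrapper: reduces the values 0..31 on the device.
int run_warp_sum()
{
    int h_in[32], h_out = -1;
    for (int i = 0; i < 32; ++i) h_in[i] = i;

    int *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, sizeof(h_in));
    cudaMalloc(&d_out, sizeof(int));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

    warp_sum_kernel<<<1, 32>>>(d_in, d_out);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

Calling run_warp_sum() from host code launches the kernel once and returns the value lane 0 wrote. In each step the reads (indices offset and above) and the writes (indices below offset) touch disjoint elements, so a single __syncwarp() per step is enough in this particular layout.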

Will removing the volatile keyword break my sum (because of CUDA compiler optimizations)? Can I use a reduction like this without the volatile keyword?

Since I call this reduction function many times, will the volatile keyword cause any performance degradation?
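One way to avoid the volatile shared-memory traffic altogether is to exchange values between lanes with warp shuffles. The following is only a sketch of that alternative, not the approach from the linked answers. It assumes CUDA 9 or later (for __shfl_down_sync) and a full warp of 32 active lanes. The names warp_sum_shfl, warp_sum_shfl_kernel and run_warp_sum_shfl are hypothetical, and error checking is omitted.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: every lane passes its own value; lane 0 gets the warp total.
// Assumes all 32 lanes of the warp are active (full mask).
__device__ int warp_sum_shfl(int v)
{
    for (int offset = 16; offset > 0; offset >>= 1) {
        // Each lane adds the value held by the lane `offset` positions above it.
        v += __shfl_down_sync(0xffffffffu, v, offset);
    }
    return v;                               // only lane 0's result is the full sum
}

// Hypothetical test kernel: one block of exactly 32 threads.
__global__ void warp_sum_shfl_kernel(const int *in, int *out)
{
    int tid = threadIdx.x;
    int total = warp_sum_shfl(in[tid]);
    if (tid == 0) *out = total;
}

// Hypothetical host wrapper: reduces the values 0..31 on the device.
int run_warp_sum_shfl()
{
    int h_in[32], h_out = -1;
    for (int i = 0; i < 32; ++i) h_in[i] = i;

    int *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, sizeof(h_in));
    cudaMalloc(&d_out, sizeof(int));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

    warp_sum_shfl_kernel<<<1, 32>>>(d_in, d_out);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

Here the partial sums live in registers throughout, so the question of whether the compiler caches shared memory in registers does not come up. Whether this is actually faster than the volatile version depends on the hardware and the surrounding kernel, so it is worth measuring.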