CUDA: Thread ID assignment in 2D grid

Question

Let's suppose I have a kernel call with a 2D grid, like so:

dim3 dimGrid(x, y); // not important what the actual values are
dim3 dimBlock(blockSize, blockSize);
myKernel <<< dimGrid, dimBlock >>>();

Now I've read that multidimensional grids are merely meant to ease programming - the underlying hardware will only ever use 1D linearly cached memory (unless you use texture memory, but that's not relevant here).

My question is: In what order will the threads be assigned to the grid indices during warp scheduling? Will they be assigned horizontally ("iterate" x, then y) or vertically ("iterate" y, then x)? This might be relevant to improve memory coalescing, depending on how I access my memory in the kernel.

To make this clearer, let's say the following represents the thread IDs as applied to my (imaginary) grid with a "horizontal" distribution:

[ 0  1  2  3 ]
[ 4  5  6  7 ]
[ 8  9 10 11 ]
[ ...        ]

A "vertical" distribution would be:

[ 0  4  8 .. ]
[ 1  5  9 .. ]
[ 2  6 10 .. ]
[ 3  7 11 .. ]

I hope you can see how this might affect coalescing: With each variant, there will be a specific optimal way to access my device memory buffer.

Unfortunately, I have not found any detailed information on this yet.

Recommended answer

Horizontal and vertical are arbitrary labels, but threads do have well-defined x, y, and z dimensions. Within a block, threads are grouped into warps in x, y, z order, with x varying fastest. So a 16x16 thread block will have the following threads, in this order, in its first 32-thread warp:

warp lane: thread ID (x,y,z)

  • 0: 0,0,0
  • 1: 1,0,0
  • 2: 2,0,0
  • 3: 3,0,0
  • ...
  • 15: 15,0,0
  • 16: 0,1,0
  • 17: 1,1,0
  • 18: 2,1,0
  • 19: 3,1,0
  • ...
  • 31: 15,1,0
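
The ordering above follows from how CUDA linearizes thread indices within a block: for a block of size (Dx, Dy, Dz), thread (x, y, z) has linear ID x + y*Dx + z*Dx*Dy, and each consecutive run of 32 linear IDs forms one warp. The following is a minimal sketch of that mapping and is not part of the original answer. The names LaneInfo, recordLanes, and warpLayout are hypothetical, and the warp and lane numbers are derived from the formula rather than read back from the hardware.

```cuda
#include <vector>
#include <cuda_runtime.h>

// One record per thread: its position in the block and in its warp.
// (Hypothetical helper type, introduced only for this illustration.)
struct LaneInfo {
    unsigned x, y, z;  // threadIdx components
    unsigned warp;     // warp index within the block (derived)
    unsigned lane;     // lane index within the warp (derived)
};

// Each thread writes its own coordinates at its linear index.
__global__ void recordLanes(LaneInfo *out)
{
    // Linear thread ID within the block: x varies fastest, then y, then z.
    const unsigned tid = threadIdx.x
                       + threadIdx.y * blockDim.x
                       + threadIdx.z * blockDim.x * blockDim.y;
    const unsigned ws = warpSize;  // 32 on current NVIDIA GPUs

    out[tid].x = threadIdx.x;
    out[tid].y = threadIdx.y;
    out[tid].z = threadIdx.z;
    out[tid].warp = tid / ws;
    out[tid].lane = tid % ws;
}

// Launches a single block of the given shape and returns one LaneInfo per
// thread, ordered by linear thread ID. Returns an empty vector on any CUDA
// error. The block must respect the device limit on threads per block.
std::vector<LaneInfo> warpLayout(dim3 block)
{
    const size_t n = static_cast<size_t>(block.x) * block.y * block.z;
    std::vector<LaneInfo> host(n);

    LaneInfo *dev = nullptr;
    if (cudaMalloc(&dev, n * sizeof(LaneInfo)) != cudaSuccess) return {};

    recordLanes<<<1, block>>>(dev);
    cudaError_t err = cudaGetLastError();  // catches invalid launch configs
    if (err == cudaSuccess) {
        // cudaMemcpy waits for the kernel to finish before copying.
        err = cudaMemcpy(host.data(), dev, n * sizeof(LaneInfo),
                         cudaMemcpyDeviceToHost);
    }
    cudaFree(dev);

    if (err != cudaSuccess) return {};
    return host;
}
```

Calling warpLayout(dim3(16, 16)) and looking at the entries with warp == 0 should reproduce the list above: lanes 0 to 15 hold threads (0..15, 0, 0) and lanes 16 to 31 hold threads (0..15, 1, 0). For coalescing, this suggests indexing a row-major buffer so that threadIdx.x selects the fastest-varying (contiguous) dimension, for example buf[row * width + col] with col computed from threadIdx.x, so that neighboring lanes touch neighboring addresses.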
