Skip to content

gpuClustering.h - countModules

The whole kernel:

countModules kernel
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
  template <bool isPhase2>
  __global__ void countModules(uint16_t const* __restrict__ id,
                               uint32_t* __restrict__ moduleStart,
                               int32_t* __restrict__ clusterId,
                               int numElements) {
    int first = blockDim.x * blockIdx.x + threadIdx.x;
    constexpr int nMaxModules = isPhase2 ? phase2PixelTopology::numberOfModules : phase1PixelTopology::numberOfModules;
    assert(nMaxModules < maxNumModules);
    for (int i = first; i < numElements; i += gridDim.x * blockDim.x) {
      clusterId[i] = i;
      if (invalidModuleId == id[i])
        continue;
      auto j = i - 1;
      while (j >= 0 and id[j] == invalidModuleId)
        --j;
      if (j < 0 or id[j] != id[i]) {
        // boundary...
        auto loc = atomicInc(moduleStart, nMaxModules);
        moduleStart[loc + 1] = i;
      }
    }
  }

1. Init for clustering

Again we have some part of the code here that has nothing to do with counting the modules.

 9
10
for (int i = first; i < numElements; i += gridDim.x * blockDim.x) {
    clusterId[i] = i;

We initialise the clusterIds for the findClus kernel.

2. Digi order

Let's say we have a snippet from our id array.

Instead of having numbers for the id we'll use letters, A, B, C and D, and mark invalid module ids with ❌.

idAAABBBCCD

Digis ordered by modules

It is a prerequisite and we know that digis belonging to one module will appear consecutive in our buffer. They might be separated by invalid digis/hits.

3. Look for boundary elements

Let's use our example digi array from the previous point.

In the first row we'll show id and in the second column the threadIdx.x.

idAAABBBCCD
thid.x012345678910111213

Let's execute some of our code:

11
12
13
if (invalidModuleId == id[i])
  continue;
auto j = i - 1;
idAAABBBCCD
thid.x012345678910111213
i012345678910111213
j-1034678912
14
15
while (j >= 0 and id[j] == invalidModuleId)
  --j;
idAAABBBCCD
thid.x012345678910111213
i012345678910111213
j before-1034678912
while
j after-1014578910
16
17
18
19
20
if (j < 0 or id[j] != id[i]) {
  // boundary...
  auto loc = atomicInc(moduleStart, nMaxModules);
  moduleStart[loc + 1] = i;
}

Let's set cond = (j < 0 or id[j] != id[i]). Check when this will be true (T is true, F is false, ❌ is not evaluated because that thread terminated early):

idAAABBBCCD
thid.x012345678910111213
i012345678910111213
j after-1014578910
condTFFTFFTFT

Let's just look at the first and last rows and get rid if False and not evaluated threads for cond to better see what is happening.

idAAABBBCCD
condTTTT

4. set moduleStart for each module

18
19
auto loc = atomicInc(moduleStart, nMaxModules);
moduleStart[loc + 1] = i;

We fill the moduleStart array with starting module indices. Note that we can't make sure that the first module we mark is A and then B, etc. This code is executed competitively so we might have different moduleStart array each execution:

pos0123456
moduleStart40591300
moduleStart40913500
moduleStart41309500

The order will be determined by in what order each thread reaches the line 18.

18
auto loc = atomicInc(moduleStart, nMaxModules);

5. Conclusion

Conclusion

We initialise our clusterIds for our later clustering algorrithm in findClus and fill our moduleStart array with the indices of the first digi/hit in each module.