gpuClustering.h - countModules
The whole kernel:
countModules kernel | |
|---|---|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 | |
1. Init for clustering
Again we have some part of the code here that has nothing to do with counting the modules.
9 10 | |
We initialise the clusterIds for the findClus kernel.
2. Digi order
Let's say we have a snippet from our id array.
Instead of having numbers for the id we'll use letters, A, B, C and D, and mark invalid module ids with ❌.
| id | A | A | ❌ | ❌ | A | B | ❌ | B | B | C | C | ❌ | ❌ | D |
Digis ordered by modules
It is a prerequisite and we know that digis belonging to one module will appear consecutive in our buffer. They might be separated by invalid digis/hits.
3. Look for boundary elements
Let's use our example digi array from the previous point.
In the first row we'll show id and in the second column the threadIdx.x.
| id | A | A | ❌ | ❌ | A | B | ❌ | B | B | C | C | ❌ | ❌ | D |
| thid.x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
Let's execute some of our code:
11 12 13 | |
| id | A | A | ❌ | ❌ | A | B | ❌ | B | B | C | C | ❌ | ❌ | D |
| thid.x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
| i | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
| j | -1 | 0 | ❌ | ❌ | 3 | 4 | ❌ | 6 | 7 | 8 | 9 | ❌ | ❌ | 12 |
14 15 | |
| id | A | A | ❌ | ❌ | A | B | ❌ | B | B | C | C | ❌ | ❌ | D |
| thid.x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
| i | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
| j before | -1 | 0 | ❌ | ❌ | 3 | 4 | ❌ | 6 | 7 | 8 | 9 | ❌ | ❌ | 12 |
| while | ↓ | ↓ | ↓ | |||||||||||
| j after | -1 | 0 | ❌ | ❌ | 1 | 4 | ❌ | 5 | 7 | 8 | 9 | ❌ | ❌ | 10 |
16 17 18 19 20 | |
Let's set cond = (j < 0 or id[j] != id[i]). Check when this will be true (T is true, F is false, ❌ is not evaluated because that thread terminated early):
| id | A | A | ❌ | ❌ | A | B | ❌ | B | B | C | C | ❌ | ❌ | D |
| thid.x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
| i | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
| j after | -1 | 0 | ❌ | ❌ | 1 | 4 | ❌ | 5 | 7 | 8 | 9 | ❌ | ❌ | 10 |
| cond | T | F | ❌ | ❌ | F | T | ❌ | F | F | T | F | ❌ | ❌ | T |
Let's just look at the first and last rows and get rid if False and not evaluated threads for cond to better see what is happening.
| id | A | A | ❌ | ❌ | A | B | ❌ | B | B | C | C | ❌ | ❌ | D |
| cond | T | T | T | T |
4. set moduleStart for each module
18 19 | |
We fill the moduleStart array with starting module indices. Note that we can't make sure that the first module we mark is A and then B, etc. This code is executed competitively so we might have different moduleStart array each execution:
| pos | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
| moduleStart | 4 | 0 | 5 | 9 | 13 | 0 | 0 |
| moduleStart | 4 | 0 | 9 | 13 | 5 | 0 | 0 |
| moduleStart | 4 | 13 | 0 | 9 | 5 | 0 | 0 |
The order will be determined by in what order each thread reaches the line 18.
18 | |
5. Conclusion
Conclusion
We initialise our clusterIds for our later clustering algorrithm in findClus and fill our moduleStart array with the indices of the first digi/hit in each module.