[pytorch/pytorch] [CUDA] AdaptiveAvgPool2d: START_IND macro causes int32 overflow → OOB read
### ROOT CAUSE
The `START_IND` macro in `AdaptiveAveragePooling.cu` performs its index arithmetic in 32-bit `int`. In the expression `(a) / (b) * (c) + (a) % (b)`, the product `(a) / (b) * (c)` can exceed `INT32_MAX` when the input dimensions are large, so the intermediate result overflows before it is ever stored. The overflowed value is then used as a start index in the kernel, producing an out-of-bounds memory read.
### CODE FIX
Replace the `START_IND` macro with a version using `int64_t` for intermediate calculations to prevent overflow:
```c
// Original (problematic) macro -- all arithmetic in 32-bit int:
#define START_IND(a,b,c) ((a) / (b) * (c) + (a) % (b))
// Fixed macro:
#define START_IND(a,b,c) (((int64_t)(a) / (b)) * (c) + ((int64_t)(a) % (b)))
```
This change ensures all intermediate calculations are performed in `int64_t`, avoiding overflow for large values. The fix is applied in the CUDA kernel code for `adaptive_average_pool`.