[pytorch/pytorch] MPS: scaled_dot_product_attention returns wrong output shape when value dim != query/key dim
### ROOT CAUSE
The issue stems from the MPS backend implementation of `scaled_dot_product_attention` not handling the case where the value tensor's last dimension (Ev) differs from the query/key tensors' last dimension (Eq). Per the operator's contract, the output shape is `(..., L, Ev)`: the last dimension comes from the value tensor. The CPU implementation follows this, but the MPS-specific code computes the output shape from the query/key dimension instead, so the output has the wrong last dimension whenever Ev != Eq.
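The expected shape semantics can be demonstrated with the CPU backend, which behaves correctly. Here the value tensor deliberately has a larger last dimension than the query/key tensors:

```python
import torch
import torch.nn.functional as F

# Query/key share the embedding dim Eq = 8; value uses Ev = 16.
q = torch.randn(1, 2, 4, 8)
k = torch.randn(1, 2, 4, 8)
v = torch.randn(1, 2, 4, 16)

out = F.scaled_dot_product_attention(q, k, v)
# The output's last dimension follows the value tensor (Ev), not Eq.
print(out.shape)  # torch.Size([1, 2, 4, 16])
```

On an affected MPS build, running the same call with tensors on `device="mps"` would instead produce a last dimension of 8.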
### CODE FIX
We need to modify the MPS implementation of `scaled_dot_product_attention` so that the output shape takes its last dimension from the value tensor (Ev) instead of the query/key tensors (Eq). A sketch of the change (exact variable names and surrounding code may differ in the actual source):
```cpp
// File: torch/csrc/autograd/functions/mps/scaled_dot_product_attention.cpp
// ... existing code ...
// Sketch of the fix: build the output shape from the query's sizes, but
// take the last dimension from the value tensor (Ev) rather than the
// query/key dimension (Eq), matching the CPU backend's (..., L, Ev) shape.
auto output_sizes = query.sizes().vec();
output_sizes.back() = value.size(-1);
auto output = at::empty(output_sizes, query.options());
// ... existing code ...
```
This change ensures the output tensor's last dimension matches the value tensor's last dimension, aligning with the expected behavior.
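A simple way to verify the fix is a regression check that asserts the output's last dimension equals the value tensor's, run on both CPU and MPS. The helper below is hypothetical (not part of PyTorch's test suite); the MPS leg only runs when the backend is available:

```python
import torch
import torch.nn.functional as F

def check_sdpa_value_dim(device):
    # Regression check: SDPA output's last dim must equal the value's last dim.
    q = torch.randn(1, 2, 4, 8, device=device)
    k = torch.randn(1, 2, 4, 8, device=device)
    v = torch.randn(1, 2, 4, 16, device=device)
    out = F.scaled_dot_product_attention(q, k, v)
    assert out.shape[-1] == v.shape[-1], (
        f"{device}: got last dim {out.shape[-1]}, expected {v.shape[-1]}"
    )
    return out.shape

cpu_shape = check_sdpa_value_dim("cpu")
if torch.backends.mps.is_available():
    check_sdpa_value_dim("mps")  # fails on affected builds, passes with the fix
```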