Verified Solution

[pytorch/pytorch] MPS: scaled_dot_product_attention returns wrong output shape when value dim != query/key dim

### ROOT CAUSE

The MPS backend implementation of `scaled_dot_product_attention` does not handle the case where the value tensor's last dimension (Ev) differs from the query/key tensors' last dimension (Eq). The MPS path incorrectly uses the query/key dimension Eq for the output shape, while the CPU implementation correctly uses the value dimension Ev. The discrepancy comes from the output-dimension calculation in the MPS-specific code: for inputs query `[..., L, Eq]`, key `[..., S, Eq]`, and value `[..., S, Ev]`, the output should be `[..., L, Ev]`.

### CODE FIX

The MPS implementation of `scaled_dot_product_attention` should allocate its output using the value tensor's last dimension (Ev) rather than the query/key dimension (Eq). A sketch of the fix (the exact file and the surrounding code in the MPS attention kernel may differ):

```cpp
// Sketch: inside the MPS scaled_dot_product_attention implementation.
// Build the output shape from the query's sizes, but take the last
// dimension from the value tensor, so the output is [..., L, Ev].
auto out_sizes = query.sizes().vec();
out_sizes.back() = value.size(-1);  // use Ev, not Eq
auto output = at::empty(out_sizes, query.options());
// ... existing code ...
```

This change ensures the output tensor's last dimension matches the value tensor's last dimension, aligning the MPS backend with the CPU behavior.
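For reference, the expected shape semantics can be verified against the CPU (math) backend, where the output's last dimension follows the value tensor even when Ev differs from Eq. A minimal check, with illustrative tensor sizes:

```python
import torch
import torch.nn.functional as F

# Query/key share the last dimension Eq = 16; value has Ev = 32.
q = torch.randn(1, 2, 4, 16)  # [batch, heads, L, Eq]
k = torch.randn(1, 2, 4, 16)  # [batch, heads, S, Eq]
v = torch.randn(1, 2, 4, 32)  # [batch, heads, S, Ev]

out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)  # the last dimension is Ev = 32, not Eq = 16
```

On the CPU backend this prints `torch.Size([1, 2, 4, 32])`; the reported bug is that the MPS backend instead returned a last dimension of 16 (Eq).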
