Verified Solution

[pytorch/pytorch] MPS: scaled_dot_product_attention returns wrong output shape when value dim != query/key dim

### ROOT CAUSE

The MPS backend's `scaled_dot_product_attention` incorrectly returns an output shaped like the query when the value's embedding dimension differs from the query/key dimension. Attention computes `softmax(QKᵀ/√d) @ V`, so the output's last dimension must come from the value tensor: for query `(..., L, E)`, key `(..., S, E)`, and value `(..., S, Ev)`, the correct output shape is `(..., L, Ev)`. The non-MPS backends return this shape; the MPS path instead forces the output back to the query's dimension, producing the wrong shape whenever `Ev != E`.

### CODE FIX

```cpp
// In ScaledDotProductAttention.cpp (MPS backend), locate the forward
// function and remove the block that projects the output back to the
// query's embedding dimension, if such a block exists:
if (!isValueEqualQuery) {
  // This block likely projects the output to the query dimension. Remove it.
}
```

Ensure the output is always computed as the matrix multiplication of the attention weights and the value tensor, so that its last dimension follows the value tensor, consistent with the non-MPS backends.
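To make the expected shape semantics concrete, here is a minimal pure-NumPy reference for single-head scaled dot-product attention (a sketch, not PyTorch's actual implementation). It shows that when the value dimension `Ev` differs from the query/key dimension `E`, the output's last dimension comes from the value:

```python
import numpy as np

def sdpa_reference(q, k, v):
    # q: (L, E), k: (S, E), v: (S, Ev) -> output: (L, Ev)
    scores = q @ k.T / np.sqrt(q.shape[-1])       # (L, S)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                             # (L, Ev)

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))    # L=4, E=8
k = rng.standard_normal((6, 8))    # S=6, E=8
v = rng.standard_normal((6, 16))   # Ev=16 != E
out = sdpa_reference(q, k, v)
print(out.shape)  # (4, 16): rows from the query, columns from the value
```

The buggy MPS path effectively returns shape `(4, 8)` here (the query's dimension) instead of `(4, 16)`.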
