
[pytorch/pytorch] Feature Request: Extend device checks to support privateuse1 for shared at::native operators logic

### ROOT CAUSE

Device checks in some shared `at::native` operators reject the `privateuse1` device, which blocks custom backends from reusing the stable operator implementations. These checks are more restrictive than necessary: even device-agnostic operators refuse `privateuse1`, forcing backend authors to duplicate code.

### CODE FIX

Relax the device checks in the affected `at::native` operators so that `at::DeviceType::PrivateUse1` is accepted alongside the devices they already allow. For example:

```cpp
// Before: every non-CPU device was rejected outright.
// After: PrivateUse1 is also accepted, so custom backends can
// reuse the shared at::native implementation.
if (device.type() != at::DeviceType::CPU &&
    device.type() != at::DeviceType::PrivateUse1) {
  TORCH_CHECK(false, "unsupported device: ", device);
}
```

With this change the `privateuse1` device passes the check and the shared operator implementation is reused, reducing code duplication and leveraging PyTorch's stable operator logic. Apply the same relaxation to every affected operator in the codebase.
