M3 Ultra Mac Studio Review: Testing the 512GB RAM Model with Deepseek R1 AI

Summary:

The M3 Ultra Mac Studio is fast across the board, but in the 512GB configuration the memory is the standout feature. Traditional tasks like video editing and gaming see improvements, yet the real advantage is the ability to run very large AI models locally, such as the 671 billion parameter Deepseek R1. This is particularly useful for privacy-sensitive applications like healthcare data analysis. The Mac Studio's unified memory architecture provides the high bandwidth that AI model performance depends on. macOS initially limited vRAM allocation, which took a terminal command to fix; after that, the system ran the large quantized Deepseek R1 model at a usable speed (around 17-18 tokens/sec) with remarkably low power consumption (under 200W). Despite the high cost ($10K+ minimum for this configuration), local, private AI inference is the unique value proposition of this specific Mac Studio configuration.

Deepseek R1 performance benchmarks comparing M3 Ultra, M2 Ultra, and M4 Max [ 00:02:49 ]

Introduction and Specs [00:00:01]

The M3 Ultra Mac Studio features a 32-core CPU and 80-core GPU.
A key configuration offers 512GB of unified RAM with 819 GB/s bandwidth.
The reviewer's previous M1 Ultra Mac Studio was already sufficient for his video editing workflow (four simultaneous 4K cameras).
The M3 Ultra speeds up this workflow, but not by much for his specific video editing needs.

Traditional Performance [00:00:44]

Apple marketing claims improvements in 3D rendering, code compiling, and other tasks.
- The M3 Ultra shows improvements in benchmarks like Cinebench and Adobe Premiere render time compared to previous Ultra/Max chips.
- CPU performance benchmarks [ 00:00:45 ]
- Xcode build time is faster on the M3 Ultra.
- Xcode benchmark showing build time in seconds [ 00:01:20 ]
Gaming performance is also faster due to the powerful GPU.
- GPU performance benchmarks for GFXBench, Baldur's Gate, and Shadow of the Tomb Raider [ 00:01:00 ]
However, the reviewer notes that users are unlikely to buy this machine solely for these traditional tasks or gaming, especially in the high-RAM configuration.

The Significance of 512GB RAM [00:01:16]

The standout feature is the ability to have 512GB of high-bandwidth unified memory.
- Memory Bandwidth and Max Memory comparison across chips [ 00:02:44 ]
This large, fast memory capacity is uniquely suited for running very large AI models locally.
Local AI processing addresses privacy concerns, particularly for sensitive data like patient records in a health clinic setting.
The ability to fine-tune or use models on a local device is becoming increasingly desirable.

Running Large AI Models Locally [00:02:00]

The M3 Ultra Mac Studio was tested with several AI models, focusing on the Deepseek R1 671 billion parameter model.
Deepseek R1 requires a significant amount of memory (404GB in 4-bit quantized form).
Initial attempts to run the model failed due to a macOS limitation on vRAM allocation.
- Out of the box, macOS caps how much of the unified memory the GPU can use as vRAM.
- This cap was raised with a terminal command (sudo sysctl iogpu.wired_limit_mb=458752), which sets the limit to 448GB (448 x 1024 = 458752 MB).
- Calculator and Terminal window showing vRAM allocation command [ 00:03:19 ]
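The number in that command is just the desired limit converted from GB to MB. A minimal sketch of the arithmetic follows; the helper names are hypothetical, the sysctl key is copied from the command shown in the video, and the script only builds the command string rather than running it.

```python
def wired_limit_mb(limit_gb: int) -> int:
    """Convert a vRAM limit in GB to the MB value used in the command (1 GB = 1024 MB)."""
    return limit_gb * 1024


def build_sysctl_command(limit_gb: int, total_ram_gb: int = 512) -> str:
    """Build the command string for a given limit; does not execute anything."""
    # Sanity check (an assumption of this sketch): leave some memory for macOS itself.
    if not 0 < limit_gb < total_ram_gb:
        raise ValueError("limit must be positive and below total RAM")
    return f"sudo sysctl iogpu.wired_limit_mb={wired_limit_mb(limit_gb)}"


if __name__ == "__main__":
    print(build_sysctl_command(448))
    # prints: sudo sysctl iogpu.wired_limit_mb=458752
```

With a 448GB limit on a 512GB machine, 64GB stays outside the GPU's reach for the operating system and other applications.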
The tested Deepseek R1 model was a 4-bit quantization, which reduces accuracy compared to the full version but still contains all 671 billion parameters.
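As a rough check on the 404GB figure, the sketch below compares it with the size that exactly 4 bits per parameter would give. It assumes decimal gigabytes (1 GB = 10^9 bytes), and the function names are hypothetical.

```python
PARAMS = 671e9        # parameter count from the review
MODEL_SIZE_GB = 404   # quantized model size from the review (assumed decimal GB)


def ideal_size_gb(params: float, bits_per_weight: float) -> float:
    """Size in GB if every parameter took exactly bits_per_weight bits."""
    return params * bits_per_weight / 8 / 1e9


def effective_bits_per_weight(size_gb: float, params: float) -> float:
    """Average number of bits stored per parameter for a given model size."""
    return size_gb * 1e9 * 8 / params


if __name__ == "__main__":
    print(f"pure 4-bit size: {ideal_size_gb(PARAMS, 4):.1f} GB")
    print(f"effective bits/weight at 404 GB: "
          f"{effective_bits_per_weight(MODEL_SIZE_GB, PARAMS):.2f}")
```

Under these assumptions a pure 4-bit encoding would take about 335.5GB, while 404GB works out to roughly 4.8 bits per parameter. The gap is plausibly quantization metadata or some tensors kept at higher precision, though the review does not say.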
Deepseek R1 Performance:
- The large 671B parameter model ran successfully on the 512GB configuration.
- Deepseek R1 performance benchmarks comparing different model sizes and configurations [ 00:02:59 ]
- Measured speed was around 17-18 tokens per second, which the reviewer finds usable for most applications, including code generation.
- Deepseek R1 generating JSON output [ 00:04:48 ]
- Deepseek R1 generating Python script output [ 00:04:58 ]
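To put 17-18 tokens per second in practical terms, the sketch below converts a response length into waiting time. The 500-token response length is a hypothetical example, not a figure from the review.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to generate num_tokens at a steady decoding speed."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    response_tokens = 500  # hypothetical response length
    for speed in (17, 18):  # measured range from the review
        seconds = generation_seconds(response_tokens, speed)
        print(f"{response_tokens} tokens at {speed} tok/s: {seconds:.1f} s")
```

At these speeds a 500-token answer takes a little under half a minute. This ignores prompt processing time, which the review does not quantify.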
Output Quality:
- The reviewer notes that 4-bit quantization reduces accuracy compared to the full model.
- Output quality is "not perfect" and depends on the specific task.
- Even so, the impressive part is that a model of this size runs locally at all.

Power Consumption [00:03:56]

Running the large Deepseek R1 model resulted in a power draw well under 200 watts at the wall.
- Power meter showing current wattage [ 00:03:59 ]
Achieving similar capability with traditional PC hardware (multiple GPUs) would draw significantly more power, potentially ten times as much.
The M3 Ultra Mac Studio's peak power consumption (480W) is higher than previous generations (M1/M2 Ultra were around 370W), reflecting the increased chip power.
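The under-200W wall measurement and the 17-18 tokens/sec speed can be combined into a rough energy-per-token estimate. The sketch below uses 200W as a ceiling, so the joules-per-token results are upper bounds; the function names are hypothetical.

```python
def joules_per_token(power_watts: float, tokens_per_second: float) -> float:
    """Energy per generated token: watts are joules per second."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return power_watts / tokens_per_second


def tokens_per_watt_hour(power_watts: float, tokens_per_second: float) -> float:
    """Tokens generated per watt-hour (1 Wh = 3600 J)."""
    return 3600 / joules_per_token(power_watts, tokens_per_second)


if __name__ == "__main__":
    POWER_CEILING_W = 200  # the review reports "well under" this figure
    for speed in (17, 18):
        print(f"{speed} tok/s: <= {joules_per_token(POWER_CEILING_W, speed):.1f} J/token, "
              f">= {tokens_per_watt_hour(POWER_CEILING_W, speed):.0f} tokens/Wh")
```

On these numbers each token costs at most about 11-12 joules, or at least roughly 300 tokens per watt-hour.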

Pricing and Value for Local AI [00:05:14]

The M3 Ultra with 512GB of RAM is expensive, costing at least $10,000.
Although that is competitive with building a consumer PC with a similar total amount of vRAM, $10,000 would also buy a substantial amount of server time or cloud AI subscriptions.
The value proposition is specifically for users requiring private, localized large language model inference.
This configuration enables workflows that were previously not feasible on a local desktop system due to memory limitations.
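The trade-off against cloud spending can be framed as a simple break-even calculation. In the sketch below the monthly cloud figures are purely hypothetical placeholders, not prices from the review or from any provider, and electricity, resale value, and the privacy benefit are all ignored.

```python
def break_even_months(hardware_cost_usd: float, monthly_cloud_spend_usd: float) -> float:
    """Months of cloud spending that add up to the hardware price."""
    if monthly_cloud_spend_usd <= 0:
        raise ValueError("monthly_cloud_spend_usd must be positive")
    return hardware_cost_usd / monthly_cloud_spend_usd


if __name__ == "__main__":
    HARDWARE_COST_USD = 10_000  # minimum price quoted in the review
    for monthly in (100, 500, 1000):  # hypothetical monthly cloud budgets
        months = break_even_months(HARDWARE_COST_USD, monthly)
        print(f"${monthly}/month of cloud spend: break-even after {months:.0f} months")
```

For light usage the break-even point is years away, which is why the argument for this machine rests on privacy and local control rather than cost alone.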