Does using an eGPU with Tinygrad and a MacBook Air make LLMs faster?
You may have seen this in the news a while ago: the driver released by Tinygrad apparently lets you connect an eGPU to a Mac and use it for LLMs.
For this test I used the GPD G1 eGPU I had on hand.
Its specs are as follows.
- AMD Radeon RX 7600M XT 8GB GDDR6
- RDNA3
- About 225 x 111 x 30 mm
- About 0.92 kg
Tinygrad requires RDNA3 or newer, and the RX 7600M XT is RDNA3, so it appears to satisfy that requirement.
Setup
First, install the driver. There is also a desktop app called TinyGPU.app, but I followed the method below from the documentation instead.
$ curl -fsSL https://raw.githubusercontent.com/tinygrad/tinygrad/master/extra/setup_tinygpu_osx.sh | sh
There may be some preliminary steps depending on your Mac environment. In my case, I was asked to use a venv. You may also be asked to clone the repository with Git and run python3 -m pip install -e . from the repository root.
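Since the setup script may ask for a venv, it can help to confirm which interpreter is active before installing anything. The following is a minimal sketch using only the standard library; the function name in_virtualenv is my own and is not part of Tinygrad or its setup script.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when this interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"venv active: {sys.prefix}")
    else:
        print("no venv active; one can be created with: python3 -m venv .venv")
```

Running this with the same python3 you plan to use for the install shows whether packages would land in a venv or in the system environment.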
Next, set up the compiler.
$ curl -fsSL https://raw.githubusercontent.com/tinygrad/tinygrad/master/extra/setup_hipcomgr_osx.sh | sh
If anyone gets stuck, I think it will probably be at this stage. There may be compatibility issues with the eGPU, or conflicts with an existing Python environment. I had a little trouble, but overall it went fairly smoothly.
First, trying a small LLM
After finishing setup successfully, I tried playing with a small model before running benchmarks.
$ DEV=AMD python3 tinygrad/apps/llm.py
using model "Llama 3.2 1B Instruct" with 1,021,800,576 bytes and 1,498,482,688 params
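>>> こんにちは  (Hello)
こんにちは!どういたしまして?  (Hello! You're welcome?)
>>> あなたはだれですか?  (Who are you?)
私はAIです。  (I am an AI.)
>>> これはAMDのGPUをつかっているのでしょうか?  (Is this using an AMD GPU?)
はい、AMD Radeon GPUです。  (Yes, it is an AMD Radeon GPU.)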
It worked. It takes some time at first, but once it starts running, it feels fairly smooth. At this stage, it already feels like the eGPU is being used.
Checking whether the eGPU is actually recognized
Even if the LLM says, "Yes, it is an AMD Radeon GPU," that is hardly proof, since a model will happily make up such an answer. So I checked with system_profiler.
$ system_profiler SPDisplaysDataType
Graphics/Displays:
  Display:
    Type: External GPU
    Bus: PCIe
    PCIe Lane Width: x8
    Vendor: AMD (0x1002)
    Device ID: 0x7480
    Revision ID: 0x00c7
  Apple M3:
    Chipset Model: Apple M3
    Type: GPU
    Bus: Built-In
    Total Number of Cores: 10
    Vendor: Apple (0x106b)
    Metal Support: Metal 4
    Displays:
      Color LCD:
        Display Type: Built-in Liquid Retina Display
        Resolution: 2560 x 1664 Retina
        Main Display: Yes
        Mirror: Off
        Online: Yes
        Automatically Adjust Brightness: No
        Connection Type: Internal
The first entry is an External GPU on PCIe with vendor AMD (0x1002), so the eGPU really is recognized.
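The same check can be scripted. This is a rough sketch that scans the system_profiler text for a block with both "Type: External GPU" and the AMD vendor ID 0x1002. The helper names are hypothetical, and the parsing assumes that the Type line comes before the Vendor line, as it does in the output above. Other macOS versions may lay the output out differently.

```python
import shutil
import subprocess

AMD_VENDOR_ID = "0x1002"  # PCI vendor ID as printed in the output above


def external_amd_gpu_present(profiler_text: str) -> bool:
    """Return True if a block has 'Type: External GPU' followed by an AMD vendor line."""
    external = False
    for raw in profiler_text.splitlines():
        line = raw.strip()
        if line.startswith("Type:"):
            # Remember whether the device block we are in is an external GPU.
            external = line == "Type: External GPU"
        elif line.startswith("Vendor:") and external and AMD_VENDOR_ID in line:
            return True
    return False


def read_profiler() -> str:
    """Run system_profiler (macOS only) and return its text output."""
    result = subprocess.run(
        ["system_profiler", "SPDisplaysDataType"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Only call the tool where it exists, so the file is harmless on other systems.
if __name__ == "__main__" and shutil.which("system_profiler"):
    print("eGPU detected:", external_amd_gpu_present(read_profiler()))
```

This only confirms that macOS sees the device on the bus. It does not show that tinygrad is actually running on it, which is what the benchmark is for.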
Running benchmarks
Now, let's compare the speed in each case and see how much faster it gets.
First, I ran it with the GPD G1 connected.
$ DEV=AMD python3 tinygrad/apps/llm.py --benchmark 50
using model "Llama 3.2 1B Instruct" with 1,021,800,576 bytes and 1,498,482,688 params
15133.79 ms, 0.07 tok/s, 0.07 GB/s, 1084/1290 MB -- <|begin_of_text|>Tags
4018.62 ms, 0.25 tok/s, 0.26 GB/s, 1043/1290 MB -- <|begin_of_text|>Tags:
1089.41 ms, 0.92 tok/s, 0.96 GB/s, 1044/1292 MB -- <|begin_of_text|>Tags:
39.04 ms, 25.62 tok/s, 26.77 GB/s, 1045/1292 MB -- <|begin_of_text|>Tags: 201
...
36.34 ms, 27.52 tok/s, 29.59 GB/s, 1075/1292 MB -- <|begin_of_text|>Tags: 2019, 2020, 2021, 2022, 2023, 2024, 2025, 2026, 2027, 2028, 2029, 2030
Next, the MacBook Air by itself, with no DEV specified. In this case the built-in GPU is probably being used.
$ python3 tinygrad/apps/llm.py --benchmark 50
using model "Llama 3.2 1B Instruct" with 1,021,800,576 bytes and 1,498,482,688 params
11085.22 ms, 0.09 tok/s, 0.10 GB/s, 1084/1290 MB -- <|begin_of_text|>Tags
5598.67 ms, 0.18 tok/s, 0.19 GB/s, 1043/1290 MB -- <|begin_of_text|>Tags:
861.38 ms, 1.16 tok/s, 1.21 GB/s, 1044/1292 MB -- <|begin_of_text|>Tags:
142.57 ms, 7.01 tok/s, 7.33 GB/s, 1045/1292 MB -- <|begin_of_text|>Tags: 201
...
143.95 ms, 6.95 tok/s, 7.47 GB/s, 1075/1292 MB -- <|begin_of_text|>Tags: 2019, 2020, 2021, 2022, 2023, 2024, 2025, 2026, 2027, 2028, 2029, 2030
Finally, CPU only. As expected, this is slow.
$ DEV=CPU python3 tinygrad/apps/llm.py --benchmark 50
using model "Llama 3.2 1B Instruct" with 1,021,800,576 bytes and 1,498,482,688 params
13319.61 ms, 0.08 tok/s, 0.08 GB/s, 1080/1290 MB -- <|begin_of_text|>Tags
6992.82 ms, 0.14 tok/s, 0.15 GB/s, 1043/1290 MB -- <|begin_of_text|>Tags:
1167.93 ms, 0.86 tok/s, 0.89 GB/s, 1044/1292 MB -- <|begin_of_text|>Tags:
353.84 ms, 2.83 tok/s, 2.95 GB/s, 1044/1292 MB -- <|begin_of_text|>Tags: 201
...
391.23 ms, 2.56 tok/s, 2.74 GB/s, 1073/1292 MB -- <|begin_of_text|>Tags: 2019, 2020, 2021, 2022, 2023, 2024, 2025, 2026, 2027, 2028, 2029, 2030
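In all three runs the first three lines are far slower than the rest, so they look like warm-up rather than steady-state speed. Here is a small sketch of how the benchmark output could be parsed while skipping those lines. The regular expression is based only on the line format shown above. The function names and the default of three warm-up lines are my own assumptions, not anything defined by tinygrad.

```python
import re
from statistics import mean

# Matches lines such as "36.34 ms, 27.52 tok/s, 29.59 GB/s, 1075/1292 MB -- ..."
LINE_RE = re.compile(r"^\s*([\d.]+) ms, ([\d.]+) tok/s, ([\d.]+) GB/s")

# The AMD lines printed above (the elided middle part stays elided).
AMD_LOG = """\
using model "Llama 3.2 1B Instruct" with 1,021,800,576 bytes and 1,498,482,688 params
15133.79 ms, 0.07 tok/s, 0.07 GB/s, 1084/1290 MB -- <|begin_of_text|>Tags
4018.62 ms, 0.25 tok/s, 0.26 GB/s, 1043/1290 MB -- <|begin_of_text|>Tags:
1089.41 ms, 0.92 tok/s, 0.96 GB/s, 1044/1292 MB -- <|begin_of_text|>Tags:
39.04 ms, 25.62 tok/s, 26.77 GB/s, 1045/1292 MB -- <|begin_of_text|>Tags: 201
...
36.34 ms, 27.52 tok/s, 29.59 GB/s, 1075/1292 MB -- <|begin_of_text|>Tags: 2019
"""


def token_times_ms(log: str) -> list[float]:
    """Extract the per-token time in ms from every benchmark line."""
    times = []
    for line in log.splitlines():
        match = LINE_RE.match(line)
        if match:  # header and "..." lines do not match and are skipped
            times.append(float(match.group(1)))
    return times


def steady_tok_per_s(log: str, warmup: int = 3) -> float:
    """Average tokens per second after dropping the first `warmup` samples."""
    steady = token_times_ms(log)[warmup:]
    if not steady:
        raise ValueError("no samples left after dropping warm-up lines")
    return 1000.0 / mean(steady)


if __name__ == "__main__":
    print(f"{steady_tok_per_s(AMD_LOG):.2f} tok/s after warm-up")
```

With the full 50-line output piped in instead of the excerpt, the average would cover every steady-state token rather than just the two shown here.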
Results
- AMD (GPD G1): about 27.5 tok/s (36 ms per token). It feels usable enough.
- Default (probably the MacBook Air's built-in graphics): about 7 tok/s (144 ms per token).
- CPU (CPU only): about 2.6 tok/s (391 ms per token).
On a MacBook Air M3, the eGPU made token generation about four times faster than the default run (roughly 27.5 vs. 7 tok/s) and more than ten times faster than CPU only. The GPD G1 is about half the size of the MacBook Air, so carrying the eGPU around with you seems feasible. Remotely accessing a GPU-equipped machine at home may be the more practical option, but in terms of sheer appeal, this setup may win.
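For anyone who wants to check the arithmetic, the sketch below recomputes tokens per second and the speedup ratios from the final per-token times of the three runs. The numbers are copied from the benchmark output above; the helper names are mine.

```python
# Final per-token times in ms, taken from the last line of each run above.
final_ms = {"AMD eGPU": 36.34, "Default": 143.95, "CPU": 391.23}


def tok_per_s(ms_per_token: float) -> float:
    """Convert a per-token latency in milliseconds to tokens per second."""
    return 1000.0 / ms_per_token


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


for name, ms in final_ms.items():
    print(f"{name}: {tok_per_s(ms):.2f} tok/s")

print(f"eGPU vs default: {speedup(final_ms['Default'], final_ms['AMD eGPU']):.2f}x")
print(f"eGPU vs CPU:     {speedup(final_ms['CPU'], final_ms['AMD eGPU']):.2f}x")
```

The ratios come out at just under 4x against the default run and just under 11x against CPU only. Keep in mind this is a single run of a 1B model, so treat the figures as a rough indication.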