Broader Frontends
Author: Kazuhiro Hara

Does using an eGPU with Tinygrad and a MacBook Air make LLMs faster?

Some of you may have seen this in the news a little while ago: Tinygrad has released a driver that apparently lets you connect an eGPU to a Mac and use it for LLMs.

For this test, I used the GPD G1 eGPU I already had on hand.

The specs are as follows.

  • AMD Radeon RX 7600M XT 8GB GDDR6
  • RDNA3
  • About 225 x 111 x 30 mm
  • About 0.92 kg

Tinygrad requires an RDNA3 or newer GPU, and the RX 7600M XT is RDNA3, so it appears to satisfy that requirement.

Setup

First, install the driver. There is also a desktop app called TinyGPU.app, but I followed the method below from the documentation instead.

$ curl -fsSL https://raw.githubusercontent.com/tinygrad/tinygrad/master/extra/setup_tinygpu_osx.sh | sh

There may be some preliminary steps depending on your Mac environment. In my case, I was asked to use venv. You may also be asked to clone the repository with Git and run python3 -m pip install -e . from inside it.
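Since the setup may ask for a venv and for Git, a quick pre-flight check can save a failed install. The following is a minimal sketch using only the Python standard library; the function names in_virtualenv and preflight are made up for illustration and are not part of Tinygrad or its setup scripts.

```python
import shutil
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the interpreter it was created from.
    return sys.prefix != sys.base_prefix


def preflight() -> dict:
    """Collect a few facts that are handy to know before installing.

    Hypothetical helper for illustration; not part of Tinygrad.
    """
    return {
        "python": sys.version.split()[0],
        "in_venv": in_virtualenv(),
        "git_found": shutil.which("git") is not None,
    }


if __name__ == "__main__":
    for name, value in preflight().items():
        print(f"{name}: {value}")
```

If in_venv is False, creating one with python3 -m venv and activating it before running the pip command is one way to keep the install away from the system Python.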

Next, set up the compiler.

$ curl -fsSL https://raw.githubusercontent.com/tinygrad/tinygrad/master/extra/setup_hipcomgr_osx.sh | sh

If you get stuck anywhere, it will probably be at this stage. There may be compatibility issues with the eGPU, or conflicts with an existing Python environment. I had a little trouble, but overall it went fairly smoothly.

First, trying a small LLM

After finishing setup successfully, I tried playing with a small model before running benchmarks.

$ DEV=AMD python3 tinygrad/apps/llm.py
using model "Llama 3.2 1B Instruct" with 1,021,800,576 bytes and 1,498,482,688 params
>>> こんにちは
こんにちは!どういたしまして?

>>> あなたはだれですか?
私はAIです。

>>> これはAMDのGPUをつかっているのでしょうか?
はい、AMD Radeon GPUです。

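It worked. In the Japanese exchange above, I said hello, asked the model who it is ("I am an AI"), and asked whether it is running on an AMD GPU ("Yes, it is an AMD Radeon GPU"). It takes some time at first, but once it starts running, it feels fairly smooth. At this stage, it already feels like the eGPU is being used.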

Checking whether the eGPU is actually recognized

Even if the LLM says, "Yes, it is an AMD Radeon GPU," that still feels a bit suspicious, so I tried using system_profiler.

$ system_profiler SPDisplaysDataType  
Graphics/Displays:

    Display:

      Type: External GPU
      Bus: PCIe
      PCIe Lane Width: x8
      Vendor: AMD (0x1002)
      Device ID: 0x7480
      Revision ID: 0x00c7

    Apple M3:

      Chipset Model: Apple M3
      Type: GPU
      Bus: Built-In
      Total Number of Cores: 10
      Vendor: Apple (0x106b)
      Metal Support: Metal 4
      Displays:
        Color LCD:
          Display Type: Built-in Liquid Retina Display
          Resolution: 2560 x 1664 Retina
          Main Display: Yes
          Mirror: Off
          Online: Yes
          Automatically Adjust Brightness: No
          Connection Type: Internal

An entry of type External GPU with vendor AMD (0x1002) shows up on the PCIe bus, so the eGPU does appear to be recognized.
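The same check can be scripted instead of read by eye. This is a rough sketch that assumes the text layout seen in the output above (device headers indented by four spaces, fields by six); that layout is an observation from this one machine, not a documented format, and the names parse_devices and external_gpus are hypothetical. SAMPLE is an abbreviated copy of the output above.

```python
# Abbreviated copy of the system_profiler output shown above.
SAMPLE = """\
Graphics/Displays:

    Display:

      Type: External GPU
      Bus: PCIe
      PCIe Lane Width: x8
      Vendor: AMD (0x1002)
      Device ID: 0x7480
      Revision ID: 0x00c7

    Apple M3:

      Chipset Model: Apple M3
      Type: GPU
      Bus: Built-In
      Vendor: Apple (0x106b)
"""


def parse_devices(text: str) -> list[dict]:
    """Split the report into one dict per device block.

    Assumes 4-space device headers and 6-space "key: value" fields,
    as in the sample; deeper nesting is ignored.
    """
    devices = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        indent = len(raw) - len(raw.lstrip(" "))
        if indent == 4 and line.endswith(":"):
            devices.append({"name": line[:-1]})
        elif indent == 6 and devices and ": " in line:
            key, value = line.split(": ", 1)
            devices[-1][key] = value
    return devices


def external_gpus(text: str) -> list[dict]:
    """Return only the device blocks reported as 'External GPU'."""
    return [d for d in parse_devices(text) if d.get("Type") == "External GPU"]


if __name__ == "__main__":
    for gpu in external_gpus(SAMPLE):
        print(gpu["name"], gpu.get("Vendor"), gpu.get("Device ID"))
```

On a real Mac, the text could come from subprocess.run(["system_profiler", "SPDisplaysDataType"], capture_output=True, text=True).stdout rather than from SAMPLE.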

Running benchmarks

Now, let's compare the speed in each case and see how much faster it gets.

First, I ran it with the GPD G1 connected.

$ DEV=AMD python3 tinygrad/apps/llm.py --benchmark 50
using model "Llama 3.2 1B Instruct" with 1,021,800,576 bytes and 1,498,482,688 params
15133.79 ms,   0.07 tok/s,    0.07 GB/s, 1084/1290 MB  --  <|begin_of_text|>Tags
4018.62 ms,   0.25 tok/s,    0.26 GB/s, 1043/1290 MB  --  <|begin_of_text|>Tags:
1089.41 ms,   0.92 tok/s,    0.96 GB/s, 1044/1292 MB  --  <|begin_of_text|>Tags: 
 39.04 ms,  25.62 tok/s,   26.77 GB/s, 1045/1292 MB  --  <|begin_of_text|>Tags: 201

...

 36.34 ms,  27.52 tok/s,   29.59 GB/s, 1075/1292 MB  --  <|begin_of_text|>Tags: 2019, 2020, 2021, 2022, 2023, 2024, 2025, 2026, 2027, 2028, 2029, 2030

Next, the MacBook Air on its own, with DEV left unset. This is probably the case where the built-in GPU is used.

$ python3 tinygrad/apps/llm.py --benchmark 50 
using model "Llama 3.2 1B Instruct" with 1,021,800,576 bytes and 1,498,482,688 params
11085.22 ms,   0.09 tok/s,    0.10 GB/s, 1084/1290 MB  --  <|begin_of_text|>Tags
5598.67 ms,   0.18 tok/s,    0.19 GB/s, 1043/1290 MB  --  <|begin_of_text|>Tags:
861.38 ms,   1.16 tok/s,    1.21 GB/s, 1044/1292 MB  --  <|begin_of_text|>Tags: 
142.57 ms,   7.01 tok/s,    7.33 GB/s, 1045/1292 MB  --  <|begin_of_text|>Tags: 201

...

143.95 ms,   6.95 tok/s,    7.47 GB/s, 1075/1292 MB  --  <|begin_of_text|>Tags: 2019, 2020, 2021, 2022, 2023, 2024, 2025, 2026, 2027, 2028, 2029, 2030

Finally, CPU only. As expected, this is slow.

$ DEV=CPU python3 tinygrad/apps/llm.py --benchmark 50
using model "Llama 3.2 1B Instruct" with 1,021,800,576 bytes and 1,498,482,688 params
13319.61 ms,   0.08 tok/s,    0.08 GB/s, 1080/1290 MB  --  <|begin_of_text|>Tags
6992.82 ms,   0.14 tok/s,    0.15 GB/s, 1043/1290 MB  --  <|begin_of_text|>Tags:
1167.93 ms,   0.86 tok/s,    0.89 GB/s, 1044/1292 MB  --  <|begin_of_text|>Tags: 
353.84 ms,   2.83 tok/s,    2.95 GB/s, 1044/1292 MB  --  <|begin_of_text|>Tags: 201

...

391.23 ms,   2.56 tok/s,    2.74 GB/s, 1073/1292 MB  --  <|begin_of_text|>Tags: 2019, 2020, 2021, 2022, 2023, 2024, 2025, 2026, 2027, 2028, 2029, 2030
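The benchmark lines in all three runs share one format, so the final numbers can be pulled out and sanity-checked with a few lines of Python. The sketch below parses the last line of each run and confirms that the reported tok/s is simply 1000 divided by the per-token time in milliseconds. The regular expression is inferred from the log lines shown here rather than from Tinygrad's source, parse_line is a hypothetical helper, and the generated text after "--" is cut short with "...".

```python
import re

# Pattern inferred from the log lines above, not taken from Tinygrad itself.
LINE_RE = re.compile(
    r"^\s*(?P<ms>[\d.]+) ms,\s*(?P<tok_s>[\d.]+) tok/s,\s*(?P<gb_s>[\d.]+) GB/s,"
)


def parse_line(line: str):
    """Return ms, tok/s and GB/s as floats, or None for non-benchmark lines."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    return {key: float(value) for key, value in match.groupdict().items()}


# Final line of each run, with the generated text abbreviated to "...".
FINAL_LINES = {
    "AMD": " 36.34 ms,  27.52 tok/s,   29.59 GB/s, 1075/1292 MB  --  ...",
    "Default": "143.95 ms,   6.95 tok/s,    7.47 GB/s, 1075/1292 MB  --  ...",
    "CPU": "391.23 ms,   2.56 tok/s,    2.74 GB/s, 1073/1292 MB  --  ...",
}

if __name__ == "__main__":
    for name, line in FINAL_LINES.items():
        row = parse_line(line)
        derived = 1000.0 / row["ms"]  # one token per `ms` milliseconds
        print(f"{name:8s} reported {row['tok_s']:.2f} tok/s, 1000/ms = {derived:.2f}")
```

The first three lines of each run are far slower than the rest, so any average should probably skip them and use only the steady-state lines.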

Results

  • AMD: Using the GPD G1, ending at about 27.5 tok/s. It feels usable enough.
  • Default: Probably using the MacBook Air's built-in graphics, ending at about 7.0 tok/s.
  • CPU: Running on CPU only, ending at about 2.6 tok/s.

On a MacBook Air M3, the eGPU was about four times faster than the default run (27.52 vs. 6.95 tok/s) and more than ten times faster than CPU only (2.56 tok/s) in the runs above. The GPD G1 is about half the size of the MacBook Air, so taking the eGPU out with you seems feasible. Remotely accessing a GPU-equipped machine at home may be the more practical option, but in terms of sheer appeal, this setup may win.
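For reference, the speedup figure comes straight from the final tok/s values in the logs. Here is the arithmetic as a small sketch; the dictionary keys are just labels chosen for illustration, and the numbers are from one run of each configuration, so they should be read as rough.

```python
# Final tok/s values copied from the last line of each benchmark run.
FINAL_TOK_S = {"AMD": 27.52, "Default": 6.95, "CPU": 2.56}


def speedup(fast: float, slow: float) -> float:
    """How many times faster `fast` is than `slow` (both in tok/s)."""
    return fast / slow


vs_default = speedup(FINAL_TOK_S["AMD"], FINAL_TOK_S["Default"])
vs_cpu = speedup(FINAL_TOK_S["AMD"], FINAL_TOK_S["CPU"])

print(f"eGPU vs default: {vs_default:.2f}x")  # about 3.96x
print(f"eGPU vs CPU:     {vs_cpu:.2f}x")      # about 10.75x
```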

Tags: AI, eGPU, LLM, Mac
