AI Image generation on AMD AI MAX 395
Having gotten an AMD AI MAX 395-based machine this month, I’ve been using Lemonade to run models, which takes care of bootstrapping ROCm. I now wanted to bring this into a workflow I had been testing a couple of months ago, when I was toying around with making a game. My goal was to create a point-and-click game but, given that I’m awful at graphics, I had been taking photos and using AI to restyle them and make tweaks, in preparation for building them into a game.
To do this, I had been using AUTOMATIC1111’s Stable Diffusion WebUI, which runs the models and provides an API for image generation, and then openOutpaint to load the images, rework (inpaint) and extend (outpaint) them.
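For reference, a minimal sketch of what a call to that API can look like. It assumes the WebUI was started with the --api flag and is listening on port 7860; the base URL, prompt and sizes are placeholders, and build_txt2img_request and decode_images are hypothetical helper names, not something from either project. It only builds the request, so nothing is sent until it is passed to urlopen.

```python
import base64
import json
import urllib.request


def build_txt2img_request(base_url, prompt, width=512, height=512, batch_size=1, steps=20):
    """Build (but do not send) a POST request for the WebUI's txt2img endpoint."""
    payload = {
        "prompt": prompt,
        "width": width,
        "height": height,
        "batch_size": batch_size,
        "steps": steps,
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/sdapi/v1/txt2img",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def decode_images(response_body):
    """The response JSON carries the results as base64 strings under "images"."""
    return [base64.b64decode(image) for image in json.loads(response_body)["images"]]
```

Sending it would be a matter of passing the request to urllib.request.urlopen, reading the body and handing it to decode_images, which returns the raw image bytes.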
I was originally running this on my 2060 but, due to VRAM limitations, I could only process a 512x512 section at a time. The issue wasn’t “I had to run it multiple times and it took too long” (each render only took ~30s, and ~2m for batches of ~8, though my memory may be hazy here). It was more that I couldn’t include enough of the surrounding image for context, which matters when adding features that need to fit in with the rest of the image, and even more so when making an image that is not only high resolution but also wide enough for the user to explore (>3K pixels wide).
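To put a number on that, here is a small sketch of how many 512-pixel passes a wide image needs. The 128-pixel overlap between neighbouring sections is an assumption for illustration, and tile_offsets is a hypothetical helper, not part of openOutpaint.

```python
def tile_offsets(length, tile=512, overlap=128):
    """Start offsets of `tile`-sized sections covering [0, length), sharing at least `overlap` px."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be non-negative and smaller than the tile size")
    if length <= tile:
        return [0]
    step = tile - overlap
    offsets = list(range(0, length - tile, step))
    offsets.append(length - tile)  # final section sits flush against the far edge
    return offsets


offsets = tile_offsets(3072)
print(len(offsets), "sections:", offsets)
```

For a 3072-pixel-wide image this gives eight passes per row, and each pass still only sees 512 pixels of context, which is the real limitation.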
So, now armed with my Framework Desktop, I wanted to really make use of its 128GB of unified memory. Honestly though, I knew performance would take a hit (and yes, I am writing this in the past tense, but I am writing as I go, so, no, I have not yet tried it!). Not only would it be several times slower than the GPU, the larger context size (etc.) would slow it down further as well. Perhaps I’d get a batched render of ~1500x800 done in… 5 minutes?
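As a rough sanity check on that guess, here is a back-of-the-envelope sketch. It assumes SD 1.x-style latent diffusion, where the VAE downsamples by 8 in each dimension, and that self-attention cost grows with the square of the number of latent positions. Both are simplifications, and 1536x800 stands in for ~1500x800 so that both sides are multiples of 8. latent_positions is a hypothetical helper.

```python
def latent_positions(width, height, downsample=8):
    """Number of latent positions for an image, assuming an 8x VAE downsample."""
    if width % downsample or height % downsample:
        raise ValueError("width and height must be multiples of the downsample factor")
    return (width // downsample) * (height // downsample)


small = latent_positions(512, 512)    # 64 * 64
large = latent_positions(1536, 800)   # 192 * 100

linear_ratio = large / small          # work that scales with pixel count
attention_ratio = linear_ratio ** 2   # work that scales quadratically (self-attention)

print(f"{small} -> {large} latent positions")
print(f"~{linear_ratio:.1f}x linear work, ~{attention_ratio:.0f}x attention work")
```

Under those assumptions the larger render is somewhere between roughly 4.7x and 22x the work of a single 512x512 section, before accounting for the slower hardware.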
But the most important challenge to begin with was how to integrate it into my setup. Given the need for ROCm, I assumed that just using the previously mentioned projects “vanilla” would probably not work, and I know that Lemonade does support some image generation (or at least provides some APIs for it). I’m not fully certain at this point whether the APIs provided by Stable Diffusion WebUI are standard enough that Lemonade provides the same ones, or whether I need to use Stable Diffusion WebUI with a “cloud” provider and point that at the Lemonade APIs.
Running Stable Diffusion WebUI
Even before this, there’s a wrinkle: the way I’m using this machine is fully via Docker. I have absolutely nothing else installed and I wish to keep it this way. Stable Diffusion WebUI (SDWUI from here on, because I’m tired of writing it out) expects to be installed on a host, and there’s no mention of a Dockerfile except in a PR from someone, which doesn’t even appear to work.
The README states that for “newer” Linux versions an “old” version of Python 3 (3.11) needs to be installed using some hokey PPA. Debian trixie is the latest release at this point, but bookworm is only just behind it and natively provides 3.11, so I’ll stick with bookworm.
We’ll start with a very basic shim to kick us off:
FROM debian:bookworm
RUN apt update && apt install --assume-yes wget git python3 python3-venv libgl1 libglib2.0-0 && apt clean all
ADD . /code
WORKDIR /code
RUN bash webui.sh
(I’m only documenting this because I just have a feeling something AI-related is going to make this interesting).
################################################################
ERROR: This script must not be launched as root, aborting...
################################################################
Huh, what a cheek!
The first build ended up failing:
Successfully installed MarkupSafe-3.0.3 certifi-2026.5.20 charset_normalizer-3.4.7 filelock-3.29.0 fsspec-2026.4.0 idna-3.16 jinja2-3.1.6 mpmath-1.3.0 networkx-3.6.1 numpy-2.4.6 pillow-12.2.0 requests-2.34.2 sympy-1.14.0 torch-2.1.2+cu121 torchvision-0.16.2+cu121 triton-2.1.0 typing-extensions-4.15.0 urllib3-2.7.0
Traceback (most recent call last):
File "/code/launch.py", line 48, in <module>
main()
File "/code/launch.py", line 39, in main
prepare_environment()
File "/code/modules/launch_utils.py", line 387, in prepare_environment
raise RuntimeError(
RuntimeError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check
COMMIT
--> a8582561756d
a8582561756d812c5807d7423ac48a7942f3c952ce5a04873058f9a9321d6e8b
It’s super annoying to have this one script install dependencies and do other things in one go (especially when the build exits with success!), though maybe this isn’t a bad thing: I have the dependencies installed now, so I could add the env variable after the installation and then the script will work when running normally. Unfortunately, I trashed my Docker build cache due to not ignoring the Dockerfile, so alas the 3GB of Python dependencies were re-downloaded. Honestly, we shouldn’t even need these, because this application won’t even be running inference on the model, but oh well.
Just to recap, we’re now looking closer to:
FROM debian:bookworm
RUN apt update && apt install --assume-yes wget git python3 python3-venv libgl1 libglib2.0-0 && apt clean all
RUN adduser notroot
RUN mkdir -p /code/.pip-cache
RUN chown notroot:notroot /code/.pip-cache
RUN install -d -o notroot /code
USER notroot
ADD --chown=notroot . /code
WORKDIR /code
RUN --mount=type=cache,target=/code/.pip-cache \
PIP_CACHE_DIR=/code/.pip-cache \
bash webui.sh
ENV COMMANDLINE_ARGS="--skip-torch-cuda-test"
ENTRYPOINT /bin/bash
CMD /code/webui.sh
(with a .dockerignore containing Dockerfile!)
But oh no:
notroot@258e72dc42b8:/code$ /code/webui.sh
################################################################
Install script for stable-diffusion + Web UI
Tested on Debian 11 (Bullseye), Fedora 34+ and openSUSE Leap 15.4 or newer.
################################################################
################################################################
Running on notroot user
################################################################
################################################################
Repo already cloned, using it as install directory
################################################################
################################################################
Create and activate python venv
################################################################
################################################################
Launching launch.py...
################################################################
glibc version is 2.36
Cannot locate TCMalloc. Do you have tcmalloc or google-perftool installed on your system? (improves CPU memory usage)
Python 3.11.2 (main, Apr 8 2026, 01:58:00) [GCC 12.2.0]
Version: v1.10.1-1-g6b5a4c36
Commit hash: 6b5a4c36eec3ca8e9985077ec2906dcb9f518df9
Installing clip
Traceback (most recent call last):
File "/code/launch.py", line 48, in <module>
main()
File "/code/launch.py", line 39, in main
prepare_environment()
File "/code/modules/launch_utils.py", line 394, in prepare_environment
run_pip(f"install {clip_package}", "clip")
File "/code/modules/launch_utils.py", line 144, in run_pip
return run(f'"{python}" -m pip {command} --prefer-binary{index_url_line}', desc=f"Installing {desc}", errdesc=f"Couldn't install {desc}", live=live)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/code/modules/launch_utils.py", line 116, in run
raise RuntimeError("\n".join(error_bits))
RuntimeError: Couldn't install clip.
Command: "/code/venv/bin/python" -m pip install https://github.com/openai/CLIP/archive/d50d76daa670286dd6cacf3bcd80b5e4823fc8e1.zip --prefer-binary
Error code: 1
stdout: Collecting https://github.com/openai/CLIP/archive/d50d76daa670286dd6cacf3bcd80b5e4823fc8e1.zip
Using cached d50d76daa670286dd6cacf3bcd80b5e4823fc8e1.zip (4.3 MB)
Installing build dependencies: started
Installing build dependencies: finished with status 'done'
Getting requirements to build wheel: started
Getting requirements to build wheel: finished with status 'error'
stderr: error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [20 lines of output]
Traceback (most recent call last):
File "/code/venv/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 389, in <module>
main()
File "/code/venv/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 373, in main
json_out["return_val"] = hook(**hook_input["kwargs"])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/code/venv/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 143, in get_requires_for_build_wheel
return hook(config_settings)
^^^^^^^^^^^^^^^^^^^^^
File "/tmp/pip-build-env-cc6h99ax/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 333, in get_requires_for_build_wheel
return self._get_build_requires(config_settings, requirements=[])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/pip-build-env-cc6h99ax/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 301, in _get_build_requires
self.run_setup()
File "/tmp/pip-build-env-cc6h99ax/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 520, in run_setup
super().run_setup(setup_script=setup_script)
File "/tmp/pip-build-env-cc6h99ax/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 317, in run_setup
exec(code, locals())
File "<string>", line 3, in <module>
ModuleNotFoundError: No module named 'pkg_resources'
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed to build 'https://github.com/openai/CLIP/archive/d50d76daa670286dd6cacf3bcd80b5e4823fc8e1.zip' when getting requirements to build wheel
I’m not sure of the value in enforcing installation of packages at startup (even in launch.py, not just webui.sh). Given we’re not doing any actual inference, they’re mostly pointless (I suspect?), particularly because all of the nvidia* packages are literally gigabytes in size…
But alas, just to keep it happy, I created a slightly modified Dockerfile:
FROM debian:bookworm
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y \
git \
wget \
python3 \
python3-venv \
python3-pip \
python3-setuptools \
libgl1 \
libglib2.0-0 \
&& rm -rf /var/lib/apt/lists/*
RUN adduser notroot
WORKDIR /code
COPY --chown=notroot:notroot . /code
USER notroot
RUN python3 -m venv /code/venv
ENV VIRTUAL_ENV=/code/venv
ENV PATH="/code/venv/bin:$PATH"
ENV PIP_NO_BUILD_ISOLATION=1
ENV PIP_CACHE_DIR=/home/notroot/.cache/pip
RUN mkdir -p /home/notroot/.cache/pip
RUN --mount=type=cache,target=/home/notroot/.cache/pip \
pip install --upgrade pip setuptools wheel
RUN --mount=type=cache,target=/home/notroot/.cache/pip \
pip install -r requirements_versions.txt
ENV COMMANDLINE_ARGS="--skip-torch-cuda-test --no-half --disable-safe-unpickle --listen --port 7860"
CMD ["python", "launch.py"]
and then used the following to fix the remaining recurring issues:
RUN /code/venv/bin/pip install setuptools wheel
RUN /code/venv/bin/pip install git+https://github.com/openai/CLIP.git
ENV PIP_NO_BUILD_ISOLATION=0
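The traceback above ends in a ModuleNotFoundError for pkg_resources, raised from the setuptools inside pip’s temporary build environment. Whenever pip builds without isolation it relies on the packages already in the venv instead, so it can be worth checking what the venv actually provides. A minimal sketch, meant to be run with /code/venv/bin/python; has_module is a hypothetical helper name.

```python
import importlib.util


def has_module(name):
    """Return True if the top-level module `name` can be found by this interpreter."""
    return importlib.util.find_spec(name) is not None


for module in ("setuptools", "pkg_resources", "wheel"):
    print(f"{module}: {'available' if has_module(module) else 'MISSING'}")
```

If pkg_resources shows as missing here, a build of CLIP that imports it in setup.py would be expected to fail in the same way as the log above.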
I then re-implemented the change from https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/17207/files in the code:
RUN sed -i 's#https://github.com/Stability-AI/stablediffusion.git#https://github.com/w-e-w/stablediffusion.git#g' /code/modules/launch_utils.py
After this, the first run of a container did download the GitHub repos and did try to download a model, but that was fixed by updating the arguments:
ENV COMMANDLINE_ARGS="--skip-torch-cuda-test --no-half --disable-safe-unpickle --listen --port 7860 --no-download-sd-model"
However, I couldn’t see how this was going to work. Was my memory wrong? SDWUI doesn’t appear to support connecting to a provider. On top of this, openOutpaint doesn’t appear to support hitting Lemonade directly (it calls /sdapi/v1 endpoints that don’t exist there).
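A quick way to confirm which server answers which endpoints is to probe them and compare status codes. This is a minimal sketch: the /sdapi/v1 paths are the ones AUTOMATIC1111’s WebUI serves when started with --api, while the /api/v1/models path is an assumption about Lemonade’s OpenAI-style API and may differ between versions. probe and CANDIDATE_PATHS are hypothetical names.

```python
import urllib.error
import urllib.request

# Paths to try; the last one is an assumption about Lemonade, not a confirmed route.
CANDIDATE_PATHS = [
    "/sdapi/v1/options",
    "/sdapi/v1/sd-models",
    "/api/v1/models",
]


def probe(base_url, paths=CANDIDATE_PATHS, timeout=5.0):
    """GET each path and return {path: HTTP status, or None if unreachable}."""
    # Ignore proxy environment variables: these servers are on the local network.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    results = {}
    for path in paths:
        url = base_url.rstrip("/") + path
        try:
            with opener.open(url, timeout=timeout) as response:
                results[path] = response.status
        except urllib.error.HTTPError as err:
            results[path] = err.code  # e.g. 404 when the endpoint does not exist
        except (urllib.error.URLError, OSError):
            results[path] = None  # server not reachable at all
    return results
```

Calling probe once against the SDWUI address and once against the Lemonade address (whatever host and port each is listening on) makes the mismatch visible as a column of 404s.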
Getting Stable Diffusion WebUI working with ROCm
In the webui.sh bash script, it tries to detect the device and then updates an env variable to determine which pip packages to use:
...
...
So I got the latest version (ROCm 7.2), stuck it into the Dockerfile and allowed it to download the model again:
RUN --mount=type=cache,target=/home/notroot/.cache/pip \
pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm7.2
ENV COMMANDLINE_ARGS="--skip-torch-cuda-test --no-half --disable-safe-unpickle --listen --port 7860"
Then run:
podman run -ti --device=/dev/kfd --device=/dev/dri -p 0.0.0.0:7860:7860 stable-diffusion-webui
Hmm, we now have a ROCm build of torch installed, but we’re not seeing the GPU:
notroot@39468eb83c9c:/code$ python -c "import torch; print(torch.__version__)"
2.4.1+rocm6.0
notroot@39468eb83c9c:/code$ python -c "import torch; print(torch.cuda.is_available()); print(torch.cuda.device_count())"
False
0
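Note that the version printed above carries a rocm6.0 tag even though the Dockerfile asked for the rocm7.2 index, so it may be worth confirming which wheel actually ended up in the environment. A minimal sketch that gathers the relevant facts in one place; backend_tag and describe_torch are hypothetical helpers, and describe_torch returns an empty dict where torch is not installed.

```python
def backend_tag(torch_version):
    """Return the local build tag of a torch version string, e.g. 'rocm6.0'."""
    _, sep, tag = str(torch_version).partition("+")
    return tag if sep else None


def describe_torch():
    """Summarise which torch build this interpreter sees (empty dict if none)."""
    try:
        import torch
    except ImportError:
        return {}
    return {
        "version": str(torch.__version__),
        "backend": backend_tag(torch.__version__),
        "hip": torch.version.hip,  # None unless this is a ROCm build
        "gpu_available": torch.cuda.is_available(),
        "gpu_count": torch.cuda.device_count(),
    }


print(backend_tag("2.4.1+rocm6.0"))  # the version string printed above
print(describe_torch())
```

A mismatch between the backend tag and the index that was requested would suggest a later install step replaced the wheel.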
I attempted to set HIP_VISIBLE_DEVICES=0 HSA_OVERRIDE_GFX_VERSION=11.0.0, to no avail.
root@a1696dc128a3:/code# rocminfo | grep gfx
Warning: Agent creation failed.
The GPU node has an unrecognized id.
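One cause that is cheap to rule out is the container user lacking access to the device nodes passed in with --device. The sketch below only checks existence and read/write permission. It does not explain the “unrecognized id” warning above, which was seen as root, but it helps when running as notroot. render_nodes and device_access are hypothetical helpers.

```python
import os


def render_nodes(dri_dir="/dev/dri"):
    """List the renderD* nodes under `dri_dir` (empty list if the directory is absent)."""
    if not os.path.isdir(dri_dir):
        return []
    return [os.path.join(dri_dir, name)
            for name in sorted(os.listdir(dri_dir))
            if name.startswith("renderD")]


def device_access(paths):
    """Report whether each path exists and is readable and writable by this user."""
    return {
        path: {
            "exists": os.path.exists(path),
            "read_write": os.access(path, os.R_OK | os.W_OK),
        }
        for path in paths
    }


for path, status in device_access(["/dev/kfd", *render_nodes()]).items():
    print(path, status)
```

If a node exists but is not readable and writable, the fix is on the container or host side (user and group mapping) rather than in PyTorch.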
Just before this rebuild I had fixed the caching from the previous Dockerfile (adding a UID to the adduser command and uid=1000 to the cache mount to fix the cache directory permissions), so thankfully I got:
Using cached torch-2.12.0%2Brocm7.2-cp311-cp311-manylinux_2_28_x86_64.whl (6176.9 MB)
But, alas, upgrading to rocm7.2 didn’t help either…
I was about to start looking at ComfyUI, but I took a look at their installation instructions and saw that https://rocm.nightlies.amd.com/ is what they use. I installed rocminfo in the container and it couldn’t detect the GPU. So, my next step was to switch to the ROCm Docker images for PyTorch, picking the latest (7.2.2) on Ubuntu 22.04 (since it has a more palatable Python 3.10, rather than the 3.12 in Ubuntu 24.04). Since this image contains PyTorch, we’ll disable installing it in the Dockerfile…
Some debugging later, I was able to remove the venv entirely from Stable Diffusion WebUI, which made it easier to retain the packages pre-installed in the image. And finally…
root@0ef236a85088:/code# python -c "import torch; print(torch.cuda.is_available()); print(torch.cuda.device_count())"
True
1
But alas during generation:
$ rocm-smi
=================================================== Concise Info ===================================================
Device Node IDs Temp Power Partitions SCLK MCLK Fan Perf PwrCap VRAM% GPU%
(DID, GUID) (Edge) (Socket) (Mem, Compute, ID)
====================================================================================================================
0 1 0x1586, 48292 45.0 C 106.012W N/A, N/A, 0 N/A 1000Mhz 0% auto N/A 31% 0%
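One way to tell “the WebUI is not using the GPU” apart from “rocm-smi is not reporting utilisation for this device” is to run a small workload on the GPU directly and compare it with the CPU. This is a minimal sketch, assuming a torch build where ROCm devices are addressed as "cuda"; the matrix size and repeat count are arbitrary, and time_matmul and compare_devices are hypothetical helpers that return nothing useful where torch is not installed.

```python
import time


def time_matmul(device, size=2048, repeats=5):
    """Average seconds per size x size matmul on `device`, or None without torch."""
    try:
        import torch
    except ImportError:
        return None
    a = torch.randn(size, size, device=device)
    b = torch.randn(size, size, device=device)
    (a @ b).sum().item()  # warm-up; .item() forces the result back to the host
    start = time.perf_counter()
    for _ in range(repeats):
        result = a @ b
    result.sum().item()  # wait for queued GPU work before stopping the clock
    return (time.perf_counter() - start) / repeats


def compare_devices(size=2048, repeats=5):
    """Time the CPU, plus the GPU when torch reports one; returns {device: seconds}."""
    timings = {}
    cpu = time_matmul("cpu", size, repeats)
    if cpu is None:
        return timings  # torch is not installed
    timings["cpu"] = cpu
    import torch
    if torch.cuda.is_available():
        timings["cuda"] = time_matmul("cuda", size, repeats)
    return timings
```

If the "cuda" timing is clearly faster than the CPU one while rocm-smi still shows 0%, the reporting is the suspect; if the two are similar, the work is probably not reaching the GPU.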
The following did not work either:
export COMMANDLINE_ARGS="$COMMANDLINE_ARGS --opt-sdp-attention"
TBC?
At this point I’d spent far longer fighting container images, Python environments, ROCm versions, PyTorch wheels and hardware detection than I had actually generating images.
The GPU was now visible to PyTorch, Stable Diffusion WebUI was launching, and yet generation still wasn’t making use of the hardware. Every next step seemed to reveal another layer of assumptions baked into tooling that was primarily designed around NVIDIA GPUs and bare-metal installations.
I could have kept digging. There were still avenues left to explore: newer ROCm nightlies, ComfyUI, alternative Stable Diffusion frontends, different container configurations, and whatever combination of flags someone on a forum thread from six months ago had discovered at 2am. But by this point the original goal had been completely lost.
I wasn’t trying to become an expert in ROCm containers. I just wanted to generate some images for a point-and-click game.
So that’s where this story ends. The AMD AI MAX 395 is undoubtedly capable hardware, and I’m fairly convinced there’s a working configuration hiding somewhere behind the next few hours, or days, of debugging. But I reached the point where the value of solving the problem was lower than the cost of continuing to chase it, and I suspect I’ll be back to try again with completely different tools later.