Server Interface (REST API)#

The lemonade SDK server interface allows your application to load an LLM onto Ryzen AI hardware in a server process, and then communicate with that process using standard REST APIs. This allows applications written in any language (C#, JavaScript, Python, C++, etc.) to easily integrate with Ryzen AI LLMs.

Server interfaces are used across the LLM ecosystem because they allow no-code, plug-and-play integration between the higher levels of the application stack (GUIs, agents, RAG, etc.) and the LLM and hardware that have been abstracted by the server.

For example, open-source projects such as Open WebUI have out-of-the-box support for connecting to a variety of server interfaces, which in turn allows users to quickly start working with LLMs in a GUI.

Server Setup#

The fastest way to set up the server is with the lemonade server installer.

  1. Make sure your system has the Ryzen AI 1.3 driver installed:

  • Download the NPU driver installation package (NPU Driver)

  • Install the NPU drivers by following these steps:

    • Extract the downloaded NPU_RAI1.3.zip file.

    • Open a terminal in administrator mode and run .\npu_sw_installer.exe.

  • Ensure that the NPU MCDM driver (Version: 32.0.203.237 or 32.0.203.240) is correctly installed by opening Device Manager -> Neural processors -> NPU Compute Accelerator Device.

  2. Download and install Lemonade_Server_Installer.exe from the latest TurnkeyML release.

  3. Launch the server by double-clicking the lemonade_server shortcut added to your desktop.

Server Usage#

The lemonade server provides the following OpenAI-compatible endpoints:

  • POST /api/v0/chat/completions - Chat Completions (send chat messages, receive a completion)

  • GET /api/v0/models - List available models

Please refer to the server specification document in the lemonade repository for details about the request and response formats for each endpoint.
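As a quick illustration, the following sketch sends a chat completions request to a locally running lemonade server using only the Python standard library. The base URL matches the address used in the Open WebUI demo below; the model name is a placeholder assumption, so substitute any model listed by GET /api/v0/models on your install.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/api/v0"  # default local server address (see demo below)


def build_chat_request(messages, model):
    """Build the JSON body for POST /api/v0/chat/completions (OpenAI-style)."""
    return {"model": model, "messages": messages}


def chat(messages, model="YOUR-MODEL-Hybrid"):  # placeholder model name
    body = json.dumps(build_chat_request(messages, model)).encode("utf-8")
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    # OpenAI-compatible responses carry the text in choices[0].message.content
    return reply["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(chat([{"role": "user", "content": "Hello!"}]))
```

Because the request and response shapes follow the OpenAI chat completions format, existing OpenAI client libraries can also be pointed at the base URL above instead of hand-rolling HTTP calls.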

The OpenAI API documentation also has code examples for integrating streaming completions into an application.
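In the OpenAI streaming format, the server sends server-sent events: each line beginning with `data: ` carries a JSON chunk whose `choices[0].delta.content` holds the next text fragment, and `data: [DONE]` ends the stream. A minimal parser sketch for one such line (this follows the OpenAI wire format; consult the server specification for the features lemonade supports):

```python
import json


def parse_sse_chunk(line):
    """Extract the text delta from one SSE line.

    Returns None for blank/keep-alive lines, the sentinel string "[DONE]"
    for the stream terminator, or the incremental text fragment otherwise.
    """
    line = line.strip()
    if not line.startswith("data: "):
        return None  # blank lines and SSE comments carry no payload
    payload = line[len("data: "):]
    if payload == "[DONE]":
        return "[DONE]"
    chunk = json.loads(payload)
    # Streaming chunks put incremental text in choices[0].delta.content
    return chunk["choices"][0].get("delta", {}).get("content")
```

In practice you would iterate over the HTTP response line by line, feed each line to parse_sse_chunk, and append the returned fragments until "[DONE]" arrives.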

Open WebUI Demo#

The best way to experience the lemonade server is to try it with an OpenAI-compatible application, like Open WebUI.

Instructions:#

First, launch the lemonade_server (see: server setup).

In a terminal, install Open WebUI using the following commands:

conda create -n webui python=3.11
conda activate webui
pip install open-webui
open-webui serve

To launch the UI, open a browser and navigate to http://localhost:8080/.

In the top-right corner of the UI, click the profile icon and then:

  1. Go to Settings -> Connections.

  2. Click the ‘+’ button to add a new OpenAI-compatible connection.

  3. In the URL field, enter http://localhost:8000/api/v0, enter "-" in the key field, then press Save.

Done! You are now able to run Open WebUI with Hybrid models. Feel free to choose any of the available “-Hybrid” models in the model selection menu.

Next Steps#

  • Visit the Supported LLMs table to see the set of hybrid checkpoints that can be used with the server.

  • Check out the lemonade server specification to learn more about supported features.

  • Try out your lemonade server install with any application that uses the OpenAI chat completions API.