Using the LLM client¶
The LLM Client library in YARF provides Robot Framework keywords for interacting with Large Language Model servers that support the OpenAI Chat Completions API format. This enables you to integrate AI capabilities into your test automation workflows.
In this guide, we will cover:

- Setting up an LLM server
- Basic text prompting
- Using system prompts
- Using images in prompts
- Using VQA GUI keywords
- Using LLM GUI keywords
- Configuring the client
Setting up an LLM server¶
The LLM Client is designed to work with any server that implements the OpenAI Chat Completions API.
Option 1: Inference Snap¶
Inference snaps are Canonical’s way of packaging AI models as snaps that are tuned for efficient local inference. Each snap automatically detects your machine’s available hardware (CPU/GPU/NPU) and selects a compatible runtime and model optimizations.
They are an easy way to run an OpenAI-compatible LLM endpoint locally, which makes them a great fit for YARF’s LLM Client. For this guide, you’ll install the Qwen VL snap, and configure it to be used with YARF.
For more details on available snaps and management commands, see the official Inference Snaps docs: https://documentation.ubuntu.com/inference-snaps/.
Steps¶
Install an inference snap. For this example, we will use the Qwen VL snap, which provides a vision-capable model:
Install the Qwen VL inference snap:

```shell
sudo snap install qwen-vl --channel "2.5/beta"
```
Inference snaps start their API service automatically. Check the active API URL:
Get the OpenAI-compatible endpoint:

```shell
qwen-vl status
# engine: cpu-avx512
# endpoints:
#   openai: http://localhost:8326/v1
```
Get the exact model name from the snap:
List model IDs exposed by the inference snap:

```shell
curl -s http://localhost:8326/v1/models | jq -r '.data[].id'
# /snap/qwen-vl/components/248/model-qwen2-5-vl-7b-instruct-q4-k-m/Qwen2.5-VL-7B-Instruct-Q4_K_M.gguf
```
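If `jq` is not available, the same ids can be pulled out with a few lines of Python. The payload below is a hard-coded sample in the standard OpenAI "list" shape rather than a live response; in practice you would fetch it from the snap's `/v1/models` endpoint:

```python
# Extract model ids from an OpenAI-style /v1/models response.
# The payload is a hard-coded sample mirroring the snap's output;
# normally it would come from http://localhost:8326/v1/models.
import json

sample = json.loads("""
{
  "object": "list",
  "data": [
    {
      "id": "/snap/qwen-vl/components/248/model-qwen2-5-vl-7b-instruct-q4-k-m/Qwen2.5-VL-7B-Instruct-Q4_K_M.gguf",
      "object": "model"
    }
  ]
}
""")

model_ids = [entry["id"] for entry in sample["data"]]
print(model_ids[0])
```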
Configure the LLM Client in your Robot Framework tests to use the snap’s endpoint and model:
Configure LLM Client for an inference snap endpoint:

```robotframework
*** Test Cases ***
Use Inference Snap
    Configure Llm Client
    ...    server_url=http://localhost:8326/v1
    ...    endpoint=/chat/completions
    ...    model=/snap/qwen-vl/components/248/model-qwen2-5-vl-7b-instruct-q4-k-m/Qwen2.5-VL-7B-Instruct-Q4_K_M.gguf
```
Note
The inference snap only exposes a single model, so you can also omit the `model` parameter and that model will be used by default.
Option 2: Ollama¶
Ollama is a local LLM server that allows you to run various language models on your machine. It provides an OpenAI-compatible API, making it easy to integrate with the LLM Client.
Steps¶
Install Ollama:
Install Ollama on Linux¶curl -fsSL https://ollama.com/install.sh | sh
Run a vision-capable model (example: `qwen3-vl:2b-instruct`):

```shell
ollama run qwen3-vl:2b-instruct
```
By default, Ollama uses the same default values as the LLM Client (`http://localhost:11434/v1` and `/chat/completions`), so you only need to configure the model:

```robotframework
*** Test Cases ***
Use Ollama Model
    Configure Llm Client
    ...    model=qwen3-vl:2b-instruct
```
Basic text prompting¶
To use the LLM Client in your Robot Framework tests, first import the library:
```robotframework
*** Settings ***
Library    yarf.rf_libraries.libraries.llm_client.LlmClient
```
Simple text prompt¶
```robotframework
*** Test Cases ***
Ask LLM a Question
    ${response}=    Prompt Llm    What is the capital of France?
    Log    ${response}
    Should Contain    ${response}    Paris
```
Using system prompts¶
System prompts help guide the LLM’s behavior and responses:
```robotframework
*** Test Cases ***
Structured Response
    ${response}=    Prompt Llm
    ...    prompt=Analyze this test result
    ...    system_prompt=You are a test automation expert. Provide concise, actionable feedback.
    Log    ${response}
```
Using images in prompts¶
The LLM Client supports multi-modal prompts that include both text and images. This is particularly useful for visual testing scenarios.
Several image-based keywords can either receive an explicit image or grab a
fresh screenshot automatically. When the image argument is omitted, the LLM
Client uses the active VideoInput library to capture the current screen.
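For context, OpenAI-style servers usually receive images as base64 data URLs inside a multi-modal message. A sketch of that wire shape (`encode_image` and `image_message` are illustrative helpers, not YARF keywords):

```python
# Sketch: package raw image bytes as a data URL, the common way to embed an
# image in an OpenAI-style multi-modal message. These helpers are
# illustrative only; the YARF LLM Client handles this internally.
import base64


def encode_image(image_bytes: bytes, mime: str = "image/png") -> str:
    """Return a data URL carrying the image as base64."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{b64}"


def image_message(prompt: str, image_bytes: bytes) -> dict:
    """Build a user message combining text and an image."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": encode_image(image_bytes)}},
        ],
    }


msg = image_message("Describe this screen", b"\x89PNG fake bytes")
print(msg["content"][1]["image_url"]["url"][:22])
```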
Image from file path¶
```robotframework
*** Test Cases ***
Analyze Interface Screenshot
    # Take a screenshot first (example using YARF's screenshot capabilities)
    ${image}=    Grab Screenshot
    # Prompt the LLM to analyze the image
    ${analysis}=    Prompt Llm
    ...    prompt=Describe what you see in this user interface
    ...    image=${image}
    Log    ${analysis}
    # The screenshot shows an image of a simple calculator
    Should Contain    ${analysis}    calculator
```
Image validation workflows¶
```robotframework
*** Test Cases ***
Validate Installation Screen
    ${image}=    Grab Screenshot
    ${validation}=    Prompt Llm
    ...    prompt=Does this screen show the ubuntu installation on the "choose your language" step? Answer with YES or NO.
    ...    image=${image}
    ...    system_prompt=You are a UI testing assistant. Be very precise in your answers.
    Should Start With    ${validation}    YES
```
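Models occasionally pad a YES/NO answer with whitespace or lowercase letters. If a strict `Should Start With` check proves brittle, a small normalization step can make the assertion more tolerant; `is_yes` below is our illustration, not a YARF keyword:

```python
# Normalize a model's YES/NO reply before asserting on it.
# is_yes is an illustrative helper, not part of the YARF LLM Client.
def is_yes(response: str) -> bool:
    """True if the reply starts with YES, ignoring case and whitespace."""
    return response.strip().upper().startswith("YES")


print(is_yes("  yes, this is the language step."))
```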
Using VQA GUI keywords¶
The LLM Client includes keywords that leverage the model’s vision capabilities to perform visual validation and assertions on the current screen.
Checking for visual corruption¶
Check For Visual Corruption asks the model whether an image contains visual
artifacts or corruption. If no image is provided, the keyword grabs a
screenshot from VideoInput.
```robotframework
*** Test Cases ***
Current Screen Is Not Corrupted
    Check For Visual Corruption
```
If the model reports corruption, the keyword raises a VQAValidationError.
Asserting screen state¶
Assert State verifies that the screen matches a natural-language description.
This is useful for checks that are difficult to express with template matching
or OCR alone.
```robotframework
*** Test Cases ***
Desktop Is Visible
    Assert State    desktop is visible and ready for input
```
If the model decides that the state does not match, the keyword raises an
AssertionError and includes the model’s reasoning in the failure message.
Using LLM GUI keywords¶
The LLM Client also provides higher-level keywords for GUI testing. These keywords ask a vision-capable model to inspect the current screen and return structured results that can be used in Robot Framework tests.
Locating an object¶
Get Object Position finds a described object on the screen and returns a
normalized point as [x, y], where each value is relative to the screen size.
For example, [0.5, 0.5] is the center of the screen.
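Converting that normalized point into pixel coordinates is plain scaling by the screen dimensions; a quick sketch (`to_pixels` is an illustrative helper, not a YARF keyword):

```python
# Scale a normalized [x, y] point (as returned by Get Object Position)
# to pixel coordinates for a given screen size. Illustrative helper only.
def to_pixels(point, width, height):
    x, y = point
    return (round(x * width), round(y * height))


print(to_pixels([0.5, 0.5], 1920, 1080))  # center of a 1920x1080 screen: (960, 540)
```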
```robotframework
*** Test Cases ***
Find OK Button
    ${point}=    Get Object Position    the OK button
    Log    ${point}
```
If the object is not found, the keyword raises a VQAValidationError.
Choosing and executing a GUI action¶
Get Single Gui Action asks the model to choose one action for a task. The
returned action can then be passed to Execute Gui Action.
Supported action types are:

- Left Click
- Right Click
- Double Click
- Write
- Wait
```robotframework
*** Test Cases ***
Click The OK Button
    ${action}=    Get Single Gui Action    click the OK button
    Execute Gui Action    ${action}
```
Pointer actions returned by Get Single Gui Action use the model’s 1000x1000
coordinate grid internally. Execute Gui Action normalizes that point before
moving the pointer with the active HID library.
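That normalization amounts to dividing each grid coordinate by 1000 to get a screen-relative point; a minimal sketch of the idea (`grid_to_normalized` is our illustration, not the library's internal function):

```python
# Sketch of the point normalization described above: the model selects a
# point on a 1000x1000 grid, which is scaled down to screen-relative
# coordinates in [0, 1]. Illustrative only, not YARF internals.
GRID_SIZE = 1000


def grid_to_normalized(point):
    x, y = point
    return (x / GRID_SIZE, y / GRID_SIZE)


print(grid_to_normalized((500, 250)))  # (0.5, 0.25)
```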
For typing text directly, the action contains action_type=Write and the text
to enter:
```robotframework
*** Test Cases ***
Type A Search Query
    ${action}=    Get Single Gui Action    type "network settings"
    Execute Gui Action    ${action}
```
When YARF_LOG_LEVEL is set to DEBUG, the GUI action keywords log the
screenshot sent to the model, the point selected by the model, and the
screenshot after an executed action.
Configuring the client¶
The LLM Client can be configured to work with different servers, models, and parameters.
Changing the model¶
```robotframework
*** Test Cases ***
Use Different Model
    Configure Llm Client    model=phi4-mini:3.8b
    ${response}=    Prompt Llm    Hello, what model are you?
    Log    ${response}
```
Using a different server¶
```robotframework
*** Test Cases ***
Remote Server Setup
    Configure Llm Client
    ...    server_url=http://192.168.1.100:11434/v1
    ...    model=llama3.2-vision:11b
    ${response}=    Prompt Llm    Test connection
    Log    ${response}
```
Adjusting token limits¶
```robotframework
*** Test Cases ***
Short Response
    Configure Llm Client    max_tokens=100
    ${response}=    Prompt Llm    Write a brief summary of automated testing
    Log    ${response}
```
Complete configuration example¶
```robotframework
*** Test Cases ***
Custom Configuration
    Configure Llm Client
    ...    model=qwen3-vl:7b-instruct
    ...    server_url=http://llm-server:11434/v1
    ...    endpoint=/chat/completions
    ...    max_tokens=2048
```