LiteRT-LM CLI

The Command Line Interface (CLI) lets you test models immediately—no code required.

Supported Platforms:

  • Linux
  • macOS
  • Windows
  • Raspberry Pi

Installation

Method 1: uvx (No install)

Run litert-lm immediately without installing it permanently. Requires uv.

You can prefix any litert-lm command with uvx to run it on-demand:

uvx litert-lm run --help

Method 2: uv (Persistent install)

Installs litert-lm as a system-wide binary. Requires uv.

uv tool install litert-lm

Method 3: pip

Standard installation within a virtual environment. Using --upgrade ensures you get the latest version even if a previous version was already installed.

python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade litert-lm

Upgrading

To upgrade litert-lm to the latest version:

If using uvx (Method 1)

No action required. uvx automatically runs the latest version.

If installed with uv (Method 2)

uv tool upgrade litert-lm

If installed with pip (Method 3)

Activate your virtual environment and run:

pip install --upgrade litert-lm

Chat

Download the model from Hugging Face and run it:

litert-lm run \
  --from-huggingface-repo=litert-community/gemma-4-E2B-it-litert-lm \
  gemma-4-E2B-it.litertlm \
  --prompt="What is the capital of France?"

🔴 New: Multi-Token Prediction (MTP)

Multi-Token Prediction (MTP) is a performance optimization that significantly accelerates decode speed. It is recommended for all tasks on GPU backends.
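MTP is enabled through the speculative-decoding flag below: a cheap draft proposes several tokens ahead, and the main model verifies them in one pass, committing the longest agreed prefix. The toy sketch below illustrates only the accept/reject idea; it is not LiteRT-LM's implementation, and `verify` stands in for the main model's per-position prediction.

```python
def speculative_step(draft_tokens, verify):
    """Keep the longest prefix of draft_tokens the main model agrees with.

    draft_tokens: tokens proposed by the cheap draft model.
    verify: callable returning the main model's token at position i.
    Returns the accepted tokens. On agreement, several tokens are
    committed for the cost of a single verification pass.
    """
    accepted = []
    for i, tok in enumerate(draft_tokens):
        if verify(i) == tok:
            accepted.append(tok)        # draft agreed: token accepted for free
        else:
            accepted.append(verify(i))  # first mismatch: take the real token, stop
            break
    return accepted

# Toy usage: the main model agrees with the first two draft tokens,
# so three tokens are produced in one verification step.
main = ["The", "capital", "is"]
tokens = speculative_step(["The", "capital", "of"], lambda i: main[i])
```

When the draft model agrees often, decode speed improves because each verification pass can emit multiple tokens instead of one.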

To enable MTP in the CLI, use the --enable-speculative-decoding=true flag:

litert-lm run \
  --from-huggingface-repo=litert-community/gemma-4-E2B-it-litert-lm \
  gemma-4-E2B-it.litertlm \
  --backend=gpu \
  --enable-speculative-decoding=true \
  --prompt="What is the capital of France?"

Function Calling / Tools

You can run tools with presets. Create a preset.py:

import datetime

def get_current_time() -> str:
    """Returns the current date and time."""
    return datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")

system_instruction = "You are a helpful assistant with access to tools."
tools = [get_current_time]
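Tools can also take arguments. Assuming the CLI derives the parameter schema from the function's type hints and docstring, as in the zero-argument example above, a hypothetical preset with a parameterized tool might look like this (`add_hours` is an illustrative name, not part of LiteRT-LM):

```python
import datetime

def get_current_time() -> str:
    """Returns the current date and time."""
    return datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")

def add_hours(timestamp: str, hours: int) -> str:
    """Returns the given "YYYY-MM-DD HH:MM:SS" timestamp shifted by `hours`."""
    t = datetime.datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    return (t + datetime.timedelta(hours=hours)).strftime("%Y-%m-%d %H:%M:%S")

system_instruction = "You are a helpful assistant with access to tools."
tools = [get_current_time, add_hours]
```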

Run with preset:

litert-lm run \
  --from-huggingface-repo=litert-community/gemma-4-E2B-it-litert-lm \
  gemma-4-E2B-it.litertlm \
  --preset=preset.py

A sample prompt and its interactive output:

> what will the time be in two hours?
[tool_call] {"arguments": {}, "name": "get_current_time"}
[tool_response] {"name": "get_current_time", "response": "2026-03-25 21:54:07"}
The current time is 2026-03-25 21:54:07.

In two hours, it will be **2026-03-25 23:54:07**.

What is Happening Here?

When you ask a question that requires external information (like the current time), the model recognizes that it needs to call a tool.

  1. Model Emits tool_call: The model outputs a JSON request to call the get_current_time function.
  2. CLI Executes Tool: The LiteRT-LM CLI intercepts this call and executes the corresponding Python function defined in your preset.py.
  3. CLI Sends tool_response: The CLI sends the result back to the model.
  4. Model Generates Final Answer: The model uses the tool response to compute and generate the final answer for the user.

This "Function Calling" loop happens automatically within the CLI, allowing you to augment local LLMs with Python capabilities without writing any complex orchestration code.
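The four steps above can be sketched generically. This is an illustration of the protocol, not LiteRT-LM's internals: `generate` is a hypothetical model call that returns either a tool call or final text, and the message shapes are assumptions for the sketch.

```python
import json

def run_with_tools(generate, tools, prompt):
    """Minimal function-calling loop: feed tool results back to the model
    until it produces plain text.

    generate: hypothetical model call taking the message list and returning
              either {"tool_call": {"name": ..., "arguments": {...}}}
              or {"text": "..."}.
    tools:    plain Python functions, looked up by name.
    """
    registry = {fn.__name__: fn for fn in tools}
    messages = [{"role": "user", "content": prompt}]
    while True:
        reply = generate(messages)
        call = reply.get("tool_call")
        if call is None:
            return reply["text"]               # step 4: final answer
        fn = registry[call["name"]]            # step 2: execute the tool
        result = fn(**call.get("arguments", {}))
        messages.append({                      # step 3: send tool_response
            "role": "tool",
            "content": json.dumps({"name": call["name"], "response": result}),
        })
```

The CLI performs this loop for you; the sketch only shows why no orchestration code is needed on your side.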

The same capabilities are available from the Python, C++, and Kotlin APIs.

Uninstalling

To uninstall litert-lm:

If using uvx (Method 1)

No action required. uvx runs from a temporary cache and does not install permanently.

If installed with uv (Method 2)

uv tool uninstall litert-lm

If installed with pip (Method 3)

pip uninstall litert-lm