LiteRT-LM CLI

The Command Line Interface (CLI) lets you test models immediately—no code required.

Supported Platforms:

  • Linux
  • macOS
  • Windows
  • Raspberry Pi

Installation

Method 1: uvx (No install)

Run litert-lm immediately without installing it permanently. Requires uv.

You can prefix any litert-lm command with uvx to run it on-demand:

uvx litert-lm run --help

Method 2: uv (Persistent install)

Installs litert-lm as a system-wide binary. Requires uv.

uv tool install litert-lm

Method 3: pip

Standard installation within a virtual environment. Using --upgrade ensures you get the latest version even if a previous version was already installed.

python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade litert-lm

Upgrading

To upgrade litert-lm to the latest version:

If using uvx (Method 1)

No action required. uvx automatically runs the latest version.

If installed with uv (Method 2)

uv tool upgrade litert-lm

If installed with pip (Method 3)

Activate your virtual environment and run:

pip install --upgrade litert-lm

Chat

Download the model from Hugging Face and run it:

litert-lm run \
  --from-huggingface-repo=litert-community/gemma-4-E2B-it-litert-lm \
  gemma-4-E2B-it.litertlm \
  --prompt="What is the capital of France?"

🔴 New: Multi-Token Prediction (MTP)

Multi-Token Prediction (MTP) is a performance optimization that significantly accelerates decode speed. It is recommended for all tasks on GPU backends.
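MTP is enabled through the speculative-decoding flag below: a cheap draft proposes several tokens ahead, and the main model verifies them in one pass, committing the longest agreed prefix. The toy sketch below illustrates only the accept/reject idea; it is not LiteRT-LM's implementation, and `verify` stands in for the main model's per-position prediction.

```python
def speculative_step(draft_tokens, verify):
    """Keep the longest prefix of draft_tokens the main model agrees with.

    draft_tokens: tokens proposed by the cheap draft model.
    verify: callable returning the main model's token at position i.
    Returns the accepted tokens. On agreement, several tokens are
    committed for the cost of a single verification pass.
    """
    accepted = []
    for i, tok in enumerate(draft_tokens):
        if verify(i) == tok:
            accepted.append(tok)        # draft agreed: token accepted for free
        else:
            accepted.append(verify(i))  # first mismatch: take the real token, stop
            break
    return accepted

# Toy usage: the main model agrees with the first two draft tokens,
# so three tokens are produced in one verification step.
main = ["The", "capital", "is"]
tokens = speculative_step(["The", "capital", "of"], lambda i: main[i])
```

When the draft model agrees often, decode speed improves because each verification pass can emit multiple tokens instead of one.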

To enable MTP in the CLI, use the --enable-speculative-decoding=true flag:

litert-lm run \
  --from-huggingface-repo=litert-community/gemma-4-E2B-it-litert-lm \
  gemma-4-E2B-it.litertlm \
  --backend=gpu \
  --enable-speculative-decoding=true \
  --prompt="What is the capital of France?"

Function Calling / Tools

You can run tools with presets. Create a preset.py:

import datetime

def get_current_time() -> str:
    """Returns the current date and time."""
    return datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")

system_instruction = "You are a helpful assistant with access to tools."
tools = [get_current_time]
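Tools can also take arguments. Assuming the CLI derives the parameter schema from the function's type hints and docstring, as in the zero-argument example above, a hypothetical preset with a parameterized tool might look like this (`add_hours` is an illustrative name, not part of LiteRT-LM):

```python
import datetime

def get_current_time() -> str:
    """Returns the current date and time."""
    return datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")

def add_hours(timestamp: str, hours: int) -> str:
    """Returns the given "YYYY-MM-DD HH:MM:SS" timestamp shifted by `hours`."""
    t = datetime.datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    return (t + datetime.timedelta(hours=hours)).strftime("%Y-%m-%d %H:%M:%S")

system_instruction = "You are a helpful assistant with access to tools."
tools = [get_current_time, add_hours]
```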

Run with preset:

litert-lm run \
  --from-huggingface-repo=litert-community/gemma-4-E2B-it-litert-lm \
  gemma-4-E2B-it.litertlm \
  --preset=preset.py

A sample prompt and its interactive output:

> what will the time be in two hours?
[tool_call] {"arguments": {}, "name": "get_current_time"}
[tool_response] {"name": "get_current_time", "response": "2026-03-25 21:54:07"}
The current time is 2026-03-25 21:54:07.

In two hours, it will be **2026-03-25 23:54:07**.

What is Happening Here?

When you ask a question that requires external information (like the current time), the model recognizes that it needs to call a tool.

  1. Model Emits tool_call: The model outputs a JSON request to call the get_current_time function.
  2. CLI Executes Tool: The LiteRT-LM CLI intercepts this call and executes the corresponding Python function defined in your preset.py.
  3. CLI Sends tool_response: The CLI sends the result back to the model.
  4. Model Generates Final Answer: The model uses the tool response to compute and generate the final answer for the user.

This "Function Calling" loop happens automatically within the CLI, allowing you to augment local LLMs with Python capabilities without writing any complex orchestration code.
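The four steps above can be sketched generically. This is an illustration of the protocol, not LiteRT-LM's internals: `generate` is a hypothetical model call that returns either a tool call or final text, and the message shapes are assumptions for the sketch.

```python
import json

def run_with_tools(generate, tools, prompt):
    """Minimal function-calling loop: feed tool results back to the model
    until it produces plain text.

    generate: hypothetical model call taking the message list and returning
              either {"tool_call": {"name": ..., "arguments": {...}}}
              or {"text": "..."}.
    tools:    plain Python functions, looked up by name.
    """
    registry = {fn.__name__: fn for fn in tools}
    messages = [{"role": "user", "content": prompt}]
    while True:
        reply = generate(messages)
        call = reply.get("tool_call")
        if call is None:
            return reply["text"]               # step 4: final answer
        fn = registry[call["name"]]            # step 2: execute the tool
        result = fn(**call.get("arguments", {}))
        messages.append({                      # step 3: send tool_response
            "role": "tool",
            "content": json.dumps({"name": call["name"], "response": result}),
        })
```

The CLI performs this loop for you; the sketch only shows why no orchestration code is needed on your side.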

The same capabilities are available from the Python, C++, and Kotlin APIs.

Uninstalling

To uninstall litert-lm:

If using uvx (Method 1)

No action required. uvx runs from a temporary cache and does not install permanently.

If installed with uv (Method 2)

uv tool uninstall litert-lm

If installed with pip (Method 3)

pip uninstall litert-lm