Smarter Communication with AI Agents – Voice-Controlled Command & Control in VR-Forces#

In our previous work, we focused on integrating Large Language Models (LLMs) into VR-Forces to make simulation entities smarter and more autonomous. We explored how LLM-powered agents could interpret human intent, generate Lua code, and execute meaningful actions inside complex simulation environments. While these early results were promising, one challenge remained: the user interface.

Each natural language command still required multiple clicks through menus and windows, followed by manual text entry. Communicating with AI agents was powerful—but not effortless.

Our latest update changes that.

Taking Command through Voice#

In the new version, users can assume control of a Command & Control (C2) entity within the scenario. Communication with AI-controlled entities now happens through a dedicated C2 window, which simulates a realistic radio network.

All radio messages are shown in a chat-style interface. Operators can type commands as before, but now they can also speak naturally using a built-in voice recognition system.

For example, the user can say:

“Magic, proceed direct to Waypoint Alpha, speed 350 knots.”

The system recognizes the callsign Magic and sends the corresponding radio message to that entity, which forwards it to the LLM. The LLM interprets the command, generates the appropriate Lua code snippet to be executed, and produces a radio-style response—displayed directly in the chat window.
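As a minimal sketch of the callsign-recognition step (the function name `find_addressee` is ours, not part of VR-Forces): radio convention puts the callsign at the start of the message, so known callsigns can be matched against the beginning of the transcript, longest first.

```python
def find_addressee(message, callsigns):
    """Pick which entity a radio message addresses.

    Radio convention puts the callsign first ("Magic, proceed ..."),
    so we match known callsigns against the start of the message.
    Longest-first ordering ensures "Jedi 2 1" is preferred over a
    shorter overlapping callsign such as "Jedi 2".
    """
    for callsign in sorted(callsigns, key=len, reverse=True):
        if message.lower().startswith(callsign.lower()):
            return callsign
    return None  # no known entity addressed
```

For example, `find_addressee("Magic, proceed direct to Waypoint Alpha, speed 350 knots.", ["Magic", "Jedi 2 1"])` returns `"Magic"`.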

Making Voice Recognition Reliable#

Voice recognition brings a new level of immersion, but it also introduces new challenges. Speech-to-text models often misinterpret specialized scenario terms such as callsigns or waypoint names.

For instance, the AI might transcribe a message as:

“Jedi 21, proceed direct to Koningsby vest.”

If the scenario contains a waypoint called Coningsby West and a callsign called Jedi 2 1, our post-processing algorithm refines the text to:

“Jedi 2 1, proceed direct to Coningsby West.”

This correction happens through post-processing with fuzzy matching against all known scenario object names. The algorithm automatically fixes small spelling errors and restores proper capitalization, ensuring that the simulation logic can correctly identify and execute the intended command.
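A simplified version of this correction step can be built with Python's standard `difflib`. The helper below is an illustration under our own assumptions (the name `correct_transcript` and the similarity cutoff are ours, not the production algorithm): it fuzzy-matches word spans of the transcript against all known scenario object names, longest spans first, so a two-word waypoint name can absorb a two-word transcription error.

```python
import difflib
import re

def correct_transcript(transcript, known_names, cutoff=0.8):
    """Fix misrecognized scenario names via fuzzy matching (sketch)."""
    words = transcript.split()
    result = transcript
    lowered = {name.lower(): name for name in known_names}
    # Check multi-word spans first, longest to shortest, so that
    # "Koningsby vest" can match the two-word name "Coningsby West".
    for n in range(3, 0, -1):
        for i in range(len(words) - n + 1):
            raw = " ".join(words[i : i + n])
            span = re.sub(r"[^\w\s]", "", raw)  # drop punctuation for matching
            if not span:
                continue
            hit = difflib.get_close_matches(
                span.lower(), list(lowered), n=1, cutoff=cutoff
            )
            if hit:
                # Replace with the known name, restoring capitalization.
                result = result.replace(span, lowered[hit[0]])
    return result
```

With `known_names = ["Coningsby West", "Jedi 2 1", "Waypoint Alpha"]`, the transcript from the example above is corrected to `"Jedi 2 1, proceed direct to Coningsby West."`. The cutoff is a trade-off: too low and unrelated spans get rewritten, too high and genuine misspellings slip through.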

How It Works#

When the voice input button (or its hotkey) is pressed, the system begins recording. Upon release, a locally running voice recognition model transcribes the spoken text.

  1. The transcribed text is automatically checked and corrected for known scenario object names.
  2. The corrected text is routed to the intended simulation entity based on its callsign.
  3. That entity sends the command to the LLM.
  4. The LLM generates executable Lua code and a radio-style verbal response.
  5. The response appears in the chat window as if it were an authentic radio exchange.
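The five steps above can be sketched as one push-to-talk round trip. Everything here is a stand-in: the components are injected as plain callables, and the Lua string in the usage example is hypothetical, not actual VR-Forces API code.

```python
def radio_exchange(audio, transcribe, correct, route, llm, chat_log):
    """One push-to-talk exchange (sketch; all components are injected).

    transcribe/correct/route/llm are placeholders for the speech-to-text
    model, the fuzzy name correction, the callsign-based routing, and
    the LLM back end described in this post.
    """
    text = correct(transcribe(audio))      # steps 1-2: transcribe, fix names
    entity = route(text)                   # step 2: pick entity by callsign
    lua_code, reply = llm(entity, text)    # steps 3-4: LLM -> Lua + response
    chat_log.append((entity, text))        # operator's transmitted message
    chat_log.append((entity, reply))       # step 5: radio-style answer
    return lua_code
```

A usage example with trivial lambdas in place of the real components:

```python
log = []
lua = radio_exchange(
    b"raw-audio-bytes",
    transcribe=lambda a: "Magic, proceed direct to Waypoint Alpha.",
    correct=lambda t: t,                       # no names to fix here
    route=lambda t: t.split(",")[0],           # callsign before the comma
    llm=lambda e, t: (
        'moveToWaypoint("Waypoint Alpha")',    # hypothetical Lua snippet
        "Magic, roger, proceeding direct Waypoint Alpha.",
    ),
    chat_log=log,
)
```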

The result is a fluid, natural communication loop between the human operator and AI-driven entities—just like talking over the radio in a real mission.

Easier, More Natural Interaction#

The goal of this work is to simplify communication with AI agents and make simulation control more intuitive. By combining voice input, fuzzy text correction, and LLM-powered reasoning, we’ve taken another step toward natural human–machine collaboration inside simulation environments.

No menus. No typing. Just speak naturally—and watch your AI agents respond.

Watch the Demo Video#

We’ve prepared a new video showcasing this feature in action.
🎥 Watch the video here →


Interested in AI topics? Join the RTDynamics AI Newsletter here

Visit www.rtdynamics.com