Clients using our ML tools often mention that it can be difficult to know which models to use. We generally advise users to start with the simplest appropriate model as a baseline (this is the default option), then to try additional models for comparison. However, beyond general guidance and common pitfalls such as overfitting, it is difficult to give specific advice without knowing the details of the dataset and what you are trying to predict.
The ML Assistant uses a large language model (LLM) to offer guidance tailored to your session. The aim is to make the first steps of model selection more accessible, while encouraging users to think critically about their data, their models, and the results they see. The ML Assistant is included in the latest release, dated 28th April 2026, and will be enabled by default for all licensed users at no additional cost. At launch, we are using Claude Sonnet 4.6 from Anthropic as the underlying LLM.
We have thought carefully about how to summarise the session details to provide the LLM with the context needed to offer good advice, while maintaining data confidentiality. To ensure transparency around the data that is sent, we have included a button to preview the request - this shows exactly what will be sent to the LLM. This also offers insights into how the process works for those who are curious:
The ML Assistant is designed around the workflow in our ML tools, rather than simply passing a general question to an LLM. The request is structured by IGI: we provide the relevant session context, the models and settings available in the application, and instructions that frame the response around practical experimentation, comparison, and interpretation. This helps keep the guidance aligned with the way users work through ML experiments.
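As a rough sketch of what "structured by IGI" could look like in practice (all function and field names here are hypothetical, for illustration only, not our actual API):

```python
# Illustrative sketch of assembling a structured LLM request.
# Names and fields are hypothetical, not IGI's actual implementation.

def build_request(session_context: dict, available_models: list) -> dict:
    """Combine a fixed system prompt with per-session context."""
    system_prompt = (
        "You are advising a user of an ML application. "
        "Frame advice around practical experimentation, "
        "comparison, and interpretation of results."
    )
    user_message = {
        "session_context": session_context,    # column metadata, experiments run
        "available_models": available_models,  # models/settings in the app
    }
    return {"system": system_prompt, "user": user_message}

request = build_request(
    {"target": "TOC", "n_rows": 420, "experiments": []},
    ["linear_regression", "random_forest"],
)
```

The key point is that the application, not the user, decides what goes into the request: the same framing instructions every time, plus whatever the current session contains.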
Selecting "Preview Request" will display the request content, including the system prompt (the same for all sessions) and the user message, which contains context about your session:

The user message includes:
Column metadata - summary information about each column, as this is important context. However, we are careful to send a statistical overview only:
The request does not include the underlying dataset or row-level values. We also avoid sending example values for text fields, so that identifiable details such as well names or sample IDs are not included.
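As a rough illustration of the idea (not our actual implementation), a column summary along these lines would send statistics for numeric fields and only counts for text fields, so no row-level values leave the application:

```python
# Illustrative sketch of a privacy-preserving column summary.
# This is an assumed approach, not IGI's actual code.
import statistics

def summarise_column(name, values):
    """Return a statistical overview of one column, never raw values."""
    numeric = [v for v in values if isinstance(v, (int, float))]
    if numeric and len(numeric) == len(values):
        return {
            "name": name,
            "type": "numeric",
            "count": len(values),
            "mean": round(statistics.mean(numeric), 3),
            "std": round(statistics.stdev(numeric), 3) if len(numeric) > 1 else 0.0,
            "min": min(numeric),
            "max": max(numeric),
        }
    # Text columns: report only counts, never example values, so that
    # identifiers such as well names or sample IDs are not included.
    return {
        "name": name,
        "type": "text",
        "count": len(values),
        "n_unique": len(set(values)),
    }

summarise_column("Depth_m", [1200.5, 1310.0, 1455.2])
summarise_column("Well_Name", ["A-1", "A-2", "A-1"])
```

Note that for the text column the summary reports three values and two unique entries, but none of the well names themselves.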
The "System Prompt" section at the top can be expanded to show the instructions we give to the LLM about how to advise users. This gives the LLM information about the application context as well as setting out our approach to guiding users through machine learning experiments:
You can dismiss the request preview dialog when ready and click "Ask ML Assistant" to send the request.
We have been using this feature internally for a few months and have tailored both the system prompt and the session context to make responses as helpful as possible. The aim is to guide users through an exploration rather than offer a single "correct" answer. Large language models can produce variable or occasionally incorrect output, so the guidance should always be reviewed critically. However, we have found in testing that the answers typically reflect the kind of guidance we would give to users directly.
The response opens with a quick answer, followed by more detail and, where appropriate, recommended next steps:
When you have finished with the response, you can collapse or close the dialog. Once closed, a badge appears that lets you reopen the dialog if needed:
If you want further advice after running additional experiments, you can click on "Ask ML Assistant" again and it will send a new request with an updated context that includes any new experiments. Each request is self-contained rather than a back-and-forth conversation, which keeps the assistant focused on giving a complete answer based on the current state of the session. The dots at the top of the dialog allow you to access previous messages:
There is much more we could do in this space, but we also want to proceed carefully and listen to feedback from users. Please let us know whether the ML Assistant's guidance is useful, where it could be clearer, and what other forms of assistance would help you when working through ML experiments.