Foundation Models for Data Visualization
Can LLMs and Diffusion Models upend a classically troublesome workflow?
If you've ever done much data visualization work, you know how annoying the process can be. Despite myriad tools with huge numbers of chart types and endless configuration options, it is often extremely difficult to show data in precisely the way you want.
Yet, effective data visualization is often all about these minor, “last mile” details. Take a look at a New York Times data science article or a McKinsey slide deck, and often what makes the chart effective are the callouts, the highlights, the annotations, or more broadly the deviations from a standard bar, column, or line chart. This is why almost all great visualizations start in a BI tool but end in PowerPoint or some other pixel-editing software - you often need both to create something great.
If you look at the developer side of data visualization, a similar story emerges. Basic charting libraries like Chart.js are extremely easy to use but very limited in their expressiveness. If you want to build a really compelling, dynamic data visualization, you can use a library like d3.js, but it is extraordinarily complex: there are so many options, and the APIs are so low-level, that it takes months to really master.
It strikes me that recent advances in foundation models may be able to change these dynamics substantially. Specifically, the following attributes of foundation models are interesting relative to data visualization:
Language models are very good at analyzing & understanding data - ChatGPT works well for data science, data preparation/munging, SQL generation, and more.
Diffusion models are very good at image creation & modification. While most work thus far in diffusion models focuses on cinematic or art-oriented images, it would not surprise me if a focused model for data visualization could be built.
Foundation models are very good at iterative refinement tasks expressed through a conversational or natural language interface. I think this maps very well onto an ideal workflow for data visualization and storytelling.
If you combine these things, a data visualization workflow of the future might look like the following:
Natural language is used to describe the narrative or story you want to tell relative to a dataset. “Create a graph which highlights changes in COVID hospitalizations and deaths over the last 2 years”
A fine-tuned language model, trained to map natural language inputs to a set of graphing & visualization primitives, creates a baseline visualization.
The user is then able to use natural language to specify or alter the visualization. “Change the color of the COVID deaths line to a muted orange, add a legend, and add a highlight that calls out August 23, 2021 as the date the first vaccine was released”
A language model is used to interpret the changes and to edit/modify the graph accordingly. Potentially, a diffusion model is used in some cases to “draw” or “layer” certain embellishments on the graph (discussed in more depth below). A rough sketch of what this conversational loop might look like follows below.
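As a concrete (and very simplified) illustration, here is a TypeScript sketch of that loop. Everything in it is an assumption made for illustration: `callModel` is a hypothetical wrapper around whatever chat-style LLM API you use, and Vega-Lite JSON is used only as an example of the “graphing & visualization primitives” the model might target.

```typescript
// Hypothetical sketch of the natural-language -> spec -> refined-spec loop.
// `callModel` stands in for any chat-style LLM API; Vega-Lite JSON is just one
// possible target format for "visualization primitives".
type ChartSpec = Record<string, unknown>; // e.g. a Vega-Lite spec as plain JSON

declare function callModel(systemPrompt: string, userPrompt: string): Promise<string>;

// Step 1: natural language -> baseline visualization spec
async function generateSpec(request: string, columns: string[]): Promise<ChartSpec> {
  const system =
    "Translate the user's chart request into a single Vega-Lite JSON spec. " +
    `The dataset has columns: ${columns.join(", ")}. Output only JSON.`;
  return JSON.parse(await callModel(system, request));
}

// Step 2: natural language -> edited spec (the conversational refinement loop)
async function editSpec(spec: ChartSpec, instruction: string): Promise<ChartSpec> {
  const system =
    "Apply the user's instruction to the given Vega-Lite JSON spec and " +
    "output only the full, updated JSON.";
  const user = `${instruction}\n\nCurrent spec:\n${JSON.stringify(spec)}`;
  return JSON.parse(await callModel(system, user));
}

// Usage, mirroring the prompts above:
// const spec = await generateSpec(
//   "Create a graph which highlights changes in COVID hospitalizations and deaths over the last 2 years",
//   ["date", "hospitalizations", "deaths"]
// );
// const revised = await editSpec(spec,
//   "Change the color of the COVID deaths line to a muted orange and add a legend");
```

In practice the spec would be rendered after each step, so the user sees the result before issuing the next instruction.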
Such an approach would dramatically improve the iteration speed of building a data visualization, especially for non-experts who have not spent years using Tableau or similar tools. It would also allow “institutional knowledge” to be baked in - there are a number of fairly well-understood best practices in data visualization which I believe could be embedded into a fine-tuned language model, such that in many cases it can choose the right chart type for you and your data if you do not explicitly specify one in the prompt. Finally, it would allow much more expressive charting APIs to be used without increasing the cognitive burden on the user.
Questions to consider for this workflow:
System Architecture - There are a number of ways to build a system like this. The most naive would be to simply prompt a language model to output a structured specification that maps to a data visualization API (e.g. imagine prompting a language model: “Output d3.js code which creates a data visualization that adheres to the user’s data visualization request. Run a javascript REPL to execute that code with the dataset provided”). The most advanced would be to train a custom model that maps natural language to a layered/componentized design file. There are many options in between, and I imagine the right answer is a complex, hybrid system. A sketch of the naive end of that spectrum is below.
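To make that naive end concrete, here is a hedged sketch, again assuming the hypothetical `callModel` helper from the earlier sketch. It asks the model for runnable d3.js code rather than a declarative spec; in a real system the returned string would need to run in a properly isolated sandbox (an iframe, worker, or server-side runtime), never trusted directly.

```typescript
// Sketch of the "naive" architecture: prompt a language model for d3.js code,
// then hand the code to a sandboxed JavaScript runtime along with the dataset.
declare function callModel(systemPrompt: string, userPrompt: string): Promise<string>;

async function naiveD3Pipeline(request: string, rows: object[]): Promise<string> {
  const system =
    "Output only d3.js code defining a function render(data, svg) that draws " +
    "the requested chart into the provided SVG selection. No prose, no markdown.";
  const user = `${request}\n\nSample rows:\n${JSON.stringify(rows.slice(0, 3))}`;
  // The result is model-generated code; executing it safely requires real
  // sandboxing, which is omitted here.
  return callModel(system, user);
}
```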
Should a diffusion/image model be used at all? You will still want to allow traditional, non-language-based editing workflows, such as clicking on an axis to change the axis title. These sorts of workflows will be difficult to achieve if the output is an image without layers/components. Diffusion/image models seem potentially most useful for the last-mile embellishments commonly used to decorate a data visualization; one way to reconcile the two is sketched below.
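One way to square these two requirements is to treat the diffusion output as just another layer in a componentized document, so the chart itself stays editable. The types below are purely illustrative - a hypothetical shape for such a format, not a real file specification.

```typescript
// Hypothetical layered output format: the chart remains a declarative,
// editable spec, while diffusion-generated embellishments are raster layers
// composited on top of it.
type Layer =
  | { kind: "chart"; spec: Record<string, unknown> }            // declarative chart spec (editable)
  | { kind: "annotation"; text: string; x: number; y: number }  // callouts, highlights
  | { kind: "raster"; pngBase64: string; x: number; y: number;  // e.g. diffusion-model output
      width: number; height: number };

interface VisualizationDocument {
  width: number;
  height: number;
  layers: Layer[]; // rendered bottom-to-top; each layer independently editable or replaceable
}
```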
What data can be used to train/fine-tune the system? The good news is there are millions of good examples of data visualizations online, and almost all have an associated summary/story. This creates an implicitly labeled pairing of natural language and data visualization. The issue is that these online examples do not contain the “intermediate” representation you really care about for the editing workflow - some componentized/layered design file, or the API specification for a charting library.
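For illustration, a training record under this framing might look like the sketch below; the field names are hypothetical. The first two fields are what the web gives you for free, and the optional third is the piece that is largely missing and would likely need to be synthesized or collected some other way.

```typescript
// Hypothetical shape of a training example: (story, published image) pairs are
// abundant online; the intermediate representation mostly is not.
interface TrainingExample {
  story: string;                              // the natural-language summary or narrative
  finalImage: string;                         // URL or path to the published chart image
  intermediateSpec?: Record<string, unknown>; // charting-API spec or layered design file (usually absent)
}
```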
How does this relate to the exploratory data analysis workflow? Feature engineering and data munging are core parts of data visualization, so it is hard to consider data visualization in isolation. How should this fit into existing data tools & systems? Would it be better off bundled with them?
I suspect there may be an opportunity for a startup that deeply rethinks the end-to-end data storytelling and visualization process using these recent AI primitives. Even outside of BI and analytics tools, there are a number of companies in data visualization that generate substantial revenues with fairly basic functionality - Mekko Graphics, Efficient Elements, and Prism, for example. It would be especially interesting to see a model developed that maps natural language to a custom, layered design format for complex charts (analogous to a Figma/Sketch file). This “vertical” approach would allow one to take full advantage of image/visual models while still preserving control for the user, and it would enable a more end-to-end feedback loop.