Desiderium Sciendi

David Durant’s personal blog

Thinking about human-like AI avatars in government


A picture of an AI-generated human that can interact via spoken conversation in real time.

State of the art

Okay — before we start, I strongly suggest you (may require Google Chrome).

Smart speakers (Amazon Echo, Google Nest, Apple HomePod, etc) have been around since 2014. They can answer simple questions, perform very simple functions (such as adding an item to a shopping list) or control home automation (such as turning your lights on and off).

More recently, a number of AI systems have been released that enable much more natural spoken conversations. I’ve enjoyed interacting with, and have learned a lot from, Voice Mode in the ChatGPT app, which responds impressively well to interruptions and vocal cues, and tunes its responses to what it assesses as your current emotional state. I’ve not personally tried Gemini Live or Microsoft Copilot Voice, but I hear they’re similar in that they can cover an extremely wide range of topics, though in a transactional way. Even when you’re logged in, they don’t pretend to be a single ongoing “person” that you’re interacting with.

In fact, when I was chatting to ChatGPT recently about how it works, it described itself as “pattern recognition wearing a very human-sounding coat” which was disarmingly humorous.

OpenAI has recently been .

is another popular site using this technology — . It provides a platform for users to build interactive chatbots that others can use. . Click on “new chat” and you’ll be able to interact with the game via a spoken conversation, just as if you were in a two-person roleplaying game.

At the same time, other AI products allow you to create versions of famous people for you to interact with. On , a site that doesn’t even have an entry on Wikipedia, I asked it to create an interactive version of , which turned out like this.

Hey there! Here’s a lookalike image of yours truly, conjured up by DALL·E (OpenAI’s image generation wizard). It captures my whole science-nerd vibe pretty well, don’t you think?

Now, hello! I’m Hank Green — science enthusiast, nerdfighter, TikTok ranter, and guy who accidentally became everyone’s favorite internet biology teacher. What would you like to nerd out about today? Evolution? Existential dread? The chemical makeup of cheese? I’m ready.

If you’d rather just have a celebrity say whatever you’d like them to say, then there’s a lot of options for that too — although the free ones I experimented with online certainly wouldn’t have fooled anyone.

The next step is generating avatar video from supplied text. Here’s an example using HeyGen again and the URL of one of my previous blog posts.

Plus there are many people on YouTube who can tell you how to .

Oh, and for completeness, I suppose I should add in the NotebookLM podcast two-hander generator. Here’s one for a previous post ().

After all that — where have we arrived? Well, we have high-quality conversational interactions in a number of contexts, attempts at personality and speech cloning, plus avatar video generation.

So, let’s take the obvious next step — real-time conversational video avatars. There are a number of companies already advertising these for a variety of uses, including learning and development, sales discussions and customer support. The stand-outs at the moment are , and .

Introducing the government related context

You could rightly ask why I’m spending time writing about this when I mostly blog about government-related things. The answer is simple — I’ve spoken to many people I know, both in technology and government, who think it is highly likely that human-like AI avatars will become the default way many, if not most, people will interact with both public sector and private sector organisations. And sooner than you’d think.

I’ve been reading about invisible AI actors fixing things for their users behind the scenes since at least 2012, when I bought ’s excellent book . But the idea of having a human-like personal avatar that works for you is becoming a common trope across all kinds of media.

I’ll be honest. I don’t stay very up-to-date with discussions around AI being used by the government (or AIs potentially interacting with the government). I tend to leave that to folks like , and other extremely capable people I trust. That said, from what I have read, the conversations currently taking place in that space appear to be mostly about much more fundamental issues, such as data privacy vs. data sharing, AI ethics and the government not being seduced by what I call Shiny AI Shysters (remember, to quote Rachel, “FOMO is not a strategy”).

What I’ve not seen discussed at all yet is: what does the world look like when we’re interacting with the state via human-like AI avatars?

Firstly, to be very clear, this won’t be for everyone and certainly won’t apply in every context. However, I believe there are a lot of factors that will make this happen more quickly, and be more widely accepted, than many people imagine.

For starters, the government is always under pressure to drive down costs, and if a third-party “customer support” human-like bot can be shown to do the job of a front-line member of staff, especially remotely, for less money, there will be a lot of pressure to introduce them.

This may be the most significant driver from the government side, but I also think there will be demand from people who interact with the state. Many people are relatively happy to interact with public services in a transactional way. Because they need to use public services relatively rarely, they may grumble, but will fill out their name and address yet again to achieve what they need. Although GDS hasn’t been especially forthcoming about the planned strategy for , it’s my hope that the ID service, combined with highly secure cross-government data sharing, will mean that gets a lot easier. Including, as promised in the , having to supply data to government “only once” (e.g. ).

But people with complex needs may interact with central or local government very frequently, dealing with multiple case workers who each know only a fraction of their overall situation (see examples in ). For such people, having a human-like AI avatar that can patiently listen to them, in a way that a harried front-line worker is unlikely to be able to, while also retaining their full context across multiple parts of the bureaucracy, might be a huge advantage.

So, if it seems likely that this sort of thing is coming, then what will it be like?

Fundamentally, it seems there are two ways this kind of interface can work. Either the individual has their own agent / avatar that connects to government services on their behalf, or the state provides an avatar for them to interact with.

Personal human-like avatar assistants

Leaving aside for a moment how difficult it is even to get government departments in the UK to provide reasonable API access to their systems, let’s think about what a personal human-like avatar owned by the individual might be like.

Well, to begin with, it’s going to do a lot more than just provide an interface to the state. ChatGPT and its equivalents already effectively provide a huge range of information gathering / processing and advising services and they’re not yet empowered to undertake actions on their user’s behalf (such as paying bills or buying things). It seems inevitable that the conversational versions of these products will, very soon, be extended to provide human-like avatars and be given the ability to act on their user’s behalf in a wide range of contexts.
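The “act on their user’s behalf” idea is essentially the tool-calling agent pattern that current LLM products are built around. As a purely illustrative sketch — the `PersonalAgent` class, the tool names and the permissions model below are all invented for this post, not any real product’s API — the core shape might look like this:

```python
# A toy sketch of an assistant with delegated powers: the avatar proposes
# an action, the runtime checks it against permissions the user has
# explicitly granted, and only then executes it (logging either way).

from dataclasses import dataclass, field
from typing import Callable

@dataclass
class PersonalAgent:
    # Actions the user has explicitly authorised the avatar to take.
    granted_permissions: set = field(default_factory=set)
    # Registry of callable "tools" the avatar can invoke.
    tools: dict = field(default_factory=dict)
    # Record of every attempted action, for later review.
    audit_log: list = field(default_factory=list)

    def register_tool(self, name: str, fn: Callable) -> None:
        self.tools[name] = fn

    def act(self, tool_name: str, **kwargs) -> str:
        if tool_name not in self.granted_permissions:
            self.audit_log.append(f"REFUSED: {tool_name}")
            return f"Not authorised to '{tool_name}' on your behalf."
        result = self.tools[tool_name](**kwargs)
        self.audit_log.append(f"DONE: {tool_name}")
        return result

# Hypothetical tool: paying a bill (a stub, not a real payment API).
def pay_bill(payee: str, amount: float) -> str:
    return f"Paid £{amount:.2f} to {payee}"

agent = PersonalAgent(granted_permissions={"pay_bill"})
agent.register_tool("pay_bill", pay_bill)
agent.register_tool("renew_passport", lambda: "Renewal submitted")

print(agent.act("pay_bill", payee="Council Tax", amount=120.0))
print(agent.act("renew_passport"))  # not granted, so this is refused
```

The interesting design questions are all in that permission check and audit log: who grants the authority, how finely grained it is, and who can inspect what the avatar did afterwards.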

People will naturally have a significant choice in how their avatar is presented. As we’ve seen above, it could easily be a version of themselves, someone they know (with or without their permission — I’ve seen nothing on any of the existing sites that checks who owns the appearance or voice being supplied) or someone who’s died.

What I think is most likely, though, is that famous people will start selling their likenesses as avatar “skins”, so that people will interact with the internet through their brand.

In short, I would not be at all surprised if, 10 years from now, a lot of people are applying for state benefits, passport renewals or appealing parking tickets through discussions with Taylor Swift.

Human-like avatars representing the state

Of course, the alternative to interacting with the state via your own personal avatar is doing so via one provided by the state. In the UK, we’ve already seen the very earliest steps of this in .

What a human-like state avatar may look or act like is a complex question and something I hope serious people are already discussing (though I wish they would do so more in the open). The first question is whether there could even be a cross-government consensus on what this interface should be like. Because of the historically crazy way power in the UK government is structured (see monarchy and ), it’s practically impossible for any part of government to tell any other part of government what to do. Fiefdoms are very strong — which is one of the reasons sensible digital services based around rather than organisational responsibilities are still rare in the UK. What that means is that I wouldn’t be at all surprised to see a “DWP human-like avatar” and an “HMRC human-like avatar” presented as two different people.

Lots of user research is needed on this because it’s entirely possible that some people might actually want the avatar they speak to about their health to be different to the one they speak to about their taxes or their immigration status.

On the other hand, I’m sure that some people would very much like to have one “person” who understands all of their complex context and is able to take their entire situation into account when making decisions or offering advice. One of the things I’ve often anecdotally read that people want most from the government representatives they deal with is just more time. A human-like avatar interface theoretically has endless time to listen to a client.

I can foresee a lot of robust conversations between different parts of government, with different — sometimes conflicting — goals, about what the attitude of such an avatar should be. For example, a team focused on Universal Credit process compliance or reducing recidivism may think the avatar needs to be stern (at the very least), compared to one helping someone with substance addiction issues or addressing .

Sometimes, you may want the “person” you’re conversing with to be friendly and empathic. Other times, perhaps for a medical or legal situation, you may be more comfortable with an avatar that represents an authority figure.

The key thing may be having a single avatar that learns over time how a particular person would like to be communicated with, using clues from previous discussions, related personal data and the current topic to tune its conversational style.
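At its simplest, that kind of style tuning is just accumulating feedback signals per tone and applying context overrides. This is a deliberately toy sketch — the style labels, feedback signals and topic rules are all invented here, and a real system would be vastly more subtle:

```python
# A toy model of conversational style tuning: tally positive/negative
# signals per style across past conversations, but let certain topics
# override the learned preference with a fixed register.

from collections import Counter

class StylePreferences:
    STYLES = ("empathetic", "formal", "concise")

    def __init__(self):
        self.scores = Counter()

    def record_feedback(self, style: str, positive: bool) -> None:
        # e.g. the user stayed engaged (+1) or abandoned the chat (-1)
        self.scores[style] += 1 if positive else -1

    def preferred_style(self, topic: str) -> str:
        # Context override: default to a formal register for legal or
        # medical topics, where some users may prefer an authority figure.
        if topic in {"legal", "medical"}:
            return "formal"
        # Otherwise pick the style with the best learned score.
        return max(self.STYLES, key=lambda s: self.scores[s])

prefs = StylePreferences()
prefs.record_feedback("empathetic", positive=True)
prefs.record_feedback("formal", positive=False)
print(prefs.preferred_style("benefits"))  # empathetic
print(prefs.preferred_style("medical"))   # formal
```

Even this crude version surfaces the hard policy question: which topics, if any, should be allowed to override what the individual has shown they prefer?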

In terms of how such human-like AI avatars adapt, perhaps the most interesting question will be how much of it is done in an obvious way and how much will be too subtle for us to notice. As so often happens, this puts me in mind of Douglas Adams’ incredibly prescient 1990 documentary Hyperland, which explored similar themes at a time when almost no-one had the internet at home.

I can imagine a future where there is a set of “rapport building” metrics for every individual in the UK, as to how well the state human-like interface is working with them.

Naturally, there’s a whole other topic that I feel woefully unqualified to discuss around displayed protected characteristics of any such avatar. Will people be most comfortable speaking to someone who looks, sounds and appears to have the same cultural context as them?

The University of Central Florida and Google have been doing some early work on this. Part of their output is a wide set of low-fi multicultural human-like avatars called .

There’s a huge amount of research that can, and should, be done in this area. From the broad aspects of Design Responsibility in the context of digital systems and AI, to the possibilities of co-design with specific communities or people with specific needs, to fully .

In all this, I’m not even going to touch on the changes in legal context that might be needed for such an avatar to be able to take actions on behalf of a client or offer specific personal advice over and above what is generically available on GOV.UK. Particularly who is responsible if something goes wrong. Once again, I hope this is already something smart people are working on.

Human-like AIs in political contexts

If we could have a single human-like avatar representing the state and businesses, then what other organisations could do the same thing? Will we see the end of traditional doorstep canvassing for elections because the voters can speak to avatars that represent the different political parties? Do we eventually have an avatar that represents the UK on the international stage? If so, would it be or someone that better represents our current diverse national community?

To some extent, China already does this with English language propaganda provided by .

I sometimes think about Orson Scott Card’s portrayal of so-called “Great Debaters” online in Ender’s Game. Demosthenes, the pseudonym used by Ender’s sister, writes persuasive, emotionally charged pieces appealing to populist and internationalist sentiments. Locke, the pseudonym used by Ender’s brother, adopts a more reasoned, moderate and diplomatic tone to appeal to intellectuals and political elites. Alas, in the real world, instead of great open debates we’re reduced to tribal echo-chambers and public name-calling. What if, instead, we could watch civil discussions between different avatars, each trained to represent the virtues and values of its side of the debate?

Finally, all of this could be closer than we think. I’ve been using ChatGPT to help me do research for this blog post. One of the most fascinating things we discussed is the range of LLMs that already exist. I predict that, in the very near future, someone will engineer an online conversational debate between ChatGPT and Baidu’s ERNIE Bot (which very closely follows the Chinese state party line). Their intention will be to highlight differing output related to Taiwan, Tiananmen Square and similar issues — but I foresee such an event being a fascinating way to show much deeper-rooted differences in cultural values.

At the current pace of technological advancement, it’s all but impossible to predict what we’ll see next, but it seems a safe bet that fully conversational human-like avatars will become common across a huge variety of contexts in the very near future.

David Durant

Written by David Durant

Ex GDS / GLA / HackIT. Co-organiser of unconferences. Opinionated when awake, often asleep.
