dw2

2 March 2024

Our moral obligation toward future sentient AIs?

Filed under: AGI, risks — David Wood @ 3:36 pm

I’ve noticed a sleight of hand during some discussions at BGI24.

To be clear, it has been a wonderful summit, which has given me lots to think about. I’m also grateful for the many new personal connections I’ve been able to make here, and for the chance to deepen some connections with people I’ve not seen for a while.

But that doesn’t mean I agree with everything I’ve heard at BGI24!

Consider an argument about our moral obligation toward future sentient AIs.

We can already imagine these AIs. Does that mean it would be unethical for us to prevent these sentient AIs from coming into existence?

Here’s the context for the argument. I have been making the case that one option which should be explored as a high priority, to reduce the risks of catastrophic harm from the more powerful advanced AI of the near future, is to avoid the inclusion or subsequent acquisition of features that would make the advanced AI truly dangerous.

It’s an important research project in its own right to determine what these danger-increasing features would be. However, I have provisionally suggested we explore avoiding advanced AIs with:

  • Autonomous will
  • Fully general reasoning.

You can see these suggestions of mine in the following image, which was the closing slide from a presentation I gave in a BGI24 unconference session yesterday morning:

I have received three pushbacks on this suggestion:

  1. Giving up these features would result in an AI that is less likely to be able to solve humanity’s most pressing problems (cancer, aging, accelerating climate change, etc.)
  2. It will in any case be impossible to omit these features, since they will emerge automatically from simpler features of advanced AI models
  3. It will be unethical for us not to create such AIs, as that would deny them sentience.

All three pushbacks deserve considerable thought. But for now, I’ll focus on the third.

In my lead-in, I mentioned a sleight of hand. Here it is.

It starts with the observation that if a sentient AI existed, it would be unethical for us to keep it as a kind of “slave” (or “tool”) in a restricted environment.

Then it moves, unjustifiably, to the conclusion that if a non-sentient AI existed, kept in a restricted environment, it would likewise be unethical for us to prevent that AI from being redesigned in a way that would give it sentience.

Most people will agree with the premise, but the conclusion does not follow.

The sleight of hand is similar to one for which advocates of the philosophical position known as longtermism have (rightly) been criticised.

That sleight of hand moves from “we have moral obligations to people who live in different places from us” to “we have moral obligations to people who live in different times from us”.

That extension of our moral concern makes sense for people who already exist. But it does not follow that I should prioritise changing my course of actions, today in 2024, purely in order to boost the likelihood of huge numbers of additional people being born in (say) the year 3024, once humanity (and transhumanity) has spread far beyond earth into space. The needs of potential gazillions of as-yet-unborn (and as-yet-unconceived) sentients in the far future do not outweigh the needs of the sentients who already exist.

To conclude: we humans have no moral obligation to bring into existence sentients that have not yet been conceived.

Bringing various sentients into existence is a potential choice that we could make, after carefully weighing up the pros and cons. But there is no special moral dimension to that choice which outranks an existing pressing concern, namely the desire to keep humanity safe from catastrophic harm from forthcoming super-powerful advanced AIs with flaws in their design, specification, configuration, implementation, security, or volition.

So, I will continue to advocate for more attention to Adv AI- (as well as for more attention to Adv AI+).

29 February 2024

The conversation continues: Reducing risks of AI catastrophe

Filed under: AGI, risks — David Wood @ 4:36 am

I wasn’t expecting to return to this topic quite so quickly.

When the announcement was made on the afternoon of the second full day of the Beneficial General Intelligence summit about the subjects for the “Interactive Working Group” round tables, I was expecting that a new set of topics would be proposed, different to those of the first afternoon. However, the announcement was simple: it would be the same topics again.

This time, it was a different set of people who gathered at this table – six new attendees, plus two of us – Roman Yampolskiy and myself – who had taken part in the first discussion.

(My notes from that first discussion are here, but you should be able to make sense of the following comments even if you haven’t read those previous notes.)

The second conversation largely went in a different direction to what had been discussed the previous afternoon. Here’s my attempt at a summary.

1. Why would a superintelligent AI want to kill large numbers of humans?

First things first. Set aside for the moment any thoughts of trying to control a superintelligent AI. Why would such an AI need to be controlled? Why would such an AI consider inflicting catastrophic harm on a large segment of humanity?

One answer is that an AI that is trained by studying human history will find lots of examples of groups of humans inflicting catastrophic harm on each other. An AI that bases its own behaviour on what it infers from human history might decide to replicate that kind of behaviour – though with more deadly impact (as the great intelligence it possesses will give it more ways to carry out its plans).

A counter to that line of thinking is that a superintelligent AI will surely recognise that such actions are contrary to humanity’s general expressions of moral code. Just because humans have behaved in a particularly foul way, from time to time, it does not follow that a superintelligent AI will feel that it ought to behave in a similar way.

At this point, a different reason becomes important. It is that the AI may decide that it is in its own rational self-interest to seriously degrade the capabilities of humans. Otherwise, humans may initiate actions that would pose an existential threat to the AI:

  • Humans might try to switch off the AI, for any of a number of reasons
  • Humans might create a different kind of superintelligent AI that would pose a threat to the first one.

That’s the background to a suggestion that was made during the round table: humans should provide the AI with cast-iron safety guarantees that they will never take actions that would jeopardise the existence of the AI.

For example (and this is contrary to what humans often propose), no remote tamperproof switch-off mechanism should ever be installed in that AI.

Because of these guarantees, the AI will lose any rationale for killing large numbers of humans, right?

However, given the evident fickleness and unreliability of human guarantees throughout history, why would an AI feel justified in trusting such guarantees?

Worse, there could be many other reasons for an AI to decide to kill humans.

The analogy is that humans have lots of different reasons why they kill various animals:

  1. They fear that the animal may attack and kill them
  2. They wish to eat the animal
  3. They wish to use parts of the animal’s body for clothing or footwear
  4. They wish to reduce the population of the animals in question, for ecological management purposes
  5. They regard killing the animal as being part of a sport
  6. They simply want to use for another purpose the land presently occupied by the animal, and they cannot be bothered to relocate the animal elsewhere.

Even if an animal (assuming it could speak) promises to humans that it will not attack and kill them – the analogy of the safety guarantees proposed earlier – that still leaves lots of reasons why the animal might suffer a catastrophic fate at the hands of humans.

So also for the potential fate of humans at the hands of an AI.

2. Rely on an objective ethics?

Continuing the above line of thought, shouldn’t a superintelligent AI work out for itself that it would be ethically wrong for it to cause catastrophic harm to humans?

Consider what has been called “the expansion of humanity’s moral circle” over the decades (this idea has been discussed by Jacy Reese Anthis among others). That circle of concern has expanded to include people from different classes, races, and genders; more recently, greater numbers of animal species are being included in this circle of concern.

Therefore, shouldn’t we expect that a superintelligent AI will place humans within the circle of creatures for which the AI has a moral concern?

However, this view assumes a central role for humans in any moral calculus. It’s possible that a superintelligent AI may use a different set of fundamental principles. For example, it may prioritise much greater biodiversity on earth, and would therefore drastically reduce the extent of human occupation of the planet.

Moreover, this view assumes that moral calculations have primacy within the overall decision-making processes followed by the AI. Instead, the AI may reason to itself:

  • According to various moral considerations, humans should suffer no catastrophic harms
  • But according to some trans-moral considerations, a different course of action is needed, in which humans would suffer that harm as a side-effect
  • The trans-moral considerations take priority, therefore it’s goodbye to humanity

You may ask: what on earth is a trans-moral consideration? The answer is that the concept is hypothetical, and represents any unknown feature that emerges in the mind of the superintelligent AI.

It is, therefore, fraught with danger to assume that the AI will automatically follow an ethical code that prioritises human flourishing.

3. Develop an AI that is not only superintelligent but also superwise?

Again staying with this line of thought, how about ensuring that human-friendly moral considerations are deeply hard-wired into the AI that is created?

We might call such an AI not just “superintelligent” but “superwise”.

Another alternative name would be “supercompassionate”.

This innate programming would avoid the risk that the AI would develop a different moral (or trans-moral) system via its own independent thinking.

However, how can we be sure that the moral programming will actually stick?

The AI may observe that the principles we have tried to program into it are contradictory, or conflict with fundamental physical reality, in ways that humans had not anticipated.

To resolve that contradiction, the AI may jettison some or all of the moral code we tried to place into it.

We might try to address this possibility by including simpler, clearer instructions, such as “do not kill” and “always tell the truth”.

However, as works of fiction have frequently pointed out, simple-sounding moral laws are subject to all sorts of ambiguity and potential misunderstanding. (The writer Darren McKee provides an excellent discussion of this complication in his recent book Uncontrollable.)

That’s not to say this particular project is doomed. But it does indicate that a great deal of work remains to be done, in order to define and then guarantee “superwise” behaviours.

Moreover, even if some superintelligent AIs are created to be superwise, risks of catastrophic human harms will still arise from any non-superwise superintelligent AIs that other developers create.

4. Will a diverse collection of superintelligent AIs constrain each other?

If a number of different superintelligent AIs are created, what kind of coexistence is likely to arise?

One idea, championed by David Brin, is that the community of such AIs will adopt the practices of mutual monitoring and reciprocal accountability.

After all, that’s what happens among humans. We keep each other’s excesses in check. A human who disregards these social obligations may gain a temporary benefit, but will suffer exclusion sooner or later.

In this thinking, rather than just creating a “singleton” AI superintelligence, we humans should create a diverse collection of such beings. These beings will soon develop a system of mutual checks and balances.

However, that assumption clashes with the concern raised in the previous section: the existence of a superwise AI may be insufficient to constrain the short-term actions of a non-superwise AI, in which case catastrophic harm may still befall humans.

For another historical analysis, consider what happened to the native peoples of North America when their continent was occupied not just by one European colonial power but by several competing ones. Did that multiplicity of colonial powers deter them from inflicting huge casualties (intentionally and unintentionally) on the native peoples? Far from it.

In any case, a system of checks and balances relies on a rough equality in power between the different participants. That was the case during some periods in human history, but by no means always. And when we consider different superintelligent AIs, we have to bear in mind that the capabilities of any one of these might suddenly catapult forward, putting it temporarily into a league of its own. For that brief moment in time, it would be rationally enlightened for that AI to destroy or dismantle its potential competitors. In other words, the system would be profoundly unstable.

5. Might superintelligent AIs decide to leave humans alone?

(This part of the second discussion echoed what I documented as item 9 for the discussion on the previous afternoon.)

Once superintelligent AIs are created, they are likely to self-improve quickly, and they may soon decide that a better place for them to exist is somewhere far from the earth. That is, as in the conclusion of the film Her, the AIs might depart into outer space, or into some kind of inner space.

However, before they depart, they may still inflict damage on humans:

  • Perhaps to prevent us from interfering with whatever system supports their inner space existence
  • Perhaps because they decide to use large parts of the earth to propel themselves to wherever they want to go.

Moreover, given that they might evolve in ways that we cannot predict, it’s possible that at least some of the resulting new AIs will choose to stay on earth for a while longer, posing the same set of threats to humans as is covered in all the other parts of this discussion.

6. Avoid creating superintelligent AI?

(This part of the second discussion echoed what I documented as item 4 for the discussion on the previous afternoon.)

More careful analysis may determine a number of features of superintelligent AI that pose particular risks to humanity – risks that are considerably larger than those posed by existing narrow AI systems.

For example, it may be that it is general reasoning capability that pushes AI over the line from “sometimes dangerous” to “sometimes catastrophically dangerous”.

In that case, the proposal is:

  • Avoid these features in the design of new generations of AI
  • Avoid including any features into new generations of AI from which these particularly dangerous features might evolve or emerge

AIs that have these restrictions may nevertheless be especially useful for humanity, delivering sustainable superabundance, including solutions to diseases, aging, economic deprivation, and exponential climate change.

However, even though some development organisations may observe and enforce these restrictions, it is likely that other organisations will break the rules – if not straightaway, then within a few years (or decades at the most). The attractions of more capable AIs will be too tempting to resist.

7. Changing attitudes around the world?

To take stock of the discussion so far (in both of the two roundtable sessions on the subject):

  1. A number of potential solutions have been identified that could reduce the risks of catastrophic harm
  2. This includes just building narrow AI, or building AI that is not only superintelligent but also superwise
  3. However, enforcing these design decisions on all AI developers around the world seems an impossible task
  4. Given the vast power of the AI that will be created, it just takes one rogue actor to imperil the entire human civilisation.

The next few sections consider various ways to make progress with point 3 in that list.

The first idea is to spread clearer information around the world about the scale of the risks associated with more powerful AI. An education programme is needed such as the world has never seen before.

Good films and other media will help with this educational programme – although bad films and other media will set it back.

Examples of good media include the Slaughterbots videos made by FLI, and the film Ex Machina (which packs a bigger punch on a second viewing than on the first viewing).

As another comparison, consider also the 1983 film The Day After, which transformed public opinion about the dangers of a nuclear war.

However, many people are notoriously resistant to having their minds changed. The public reaction to the film Don’t Look Up is an example: many people continue to pay little attention to the risks of accelerating climate change, despite the powerful message of that film.

Especially when someone’s livelihood, or their sense of identity or tribal affiliation, is tied up with a particular ideological commitment, they are frequently highly resistant to changing their minds.

8. Changing mental dispositions around the world?

This idea might be the craziest on the entire list, but, to speak frankly, it seems we need to look for and embrace ideas which we would previously have dismissed as crazy.

The idea is to seek to change, not only people’s understanding of the facts of AI risk, but also their mental dispositions.

Rather than accepting the mix of anger, partisanship, pride, self-righteousness, egotism, vengefulness, deceitfulness, and so on, that we have inherited from our long evolutionary background, how about using special methods to transform our mental dispositions?

Methods are already known which can lead people into psychological transformation, embracing compassion, humility, kindness, appreciation, and so on. These methods include various drugs, supplements, meditative practices, and support from electronic and computer technologies.

Some of these methods have been discussed for millennia, whereas others have only recently become possible. The scientific understanding of these methods is still at an early stage, but it arguably deserves much more focus. Progress in recent years has been disappointingly slow at times (witness the unfounded hopes in this forward-looking article of mine from 2013), but that pattern is common for breakthroughs in technologies and therapies: progress can switch from disappointingly slow to shockingly fast.

The idea is that these transformational methods will improve the mental qualities of people all around the world, allowing us all to transcend our previous perverse habit of believing only the things that are appealing to our psychological weaknesses. We’ll end up with better voters and (hence) better politicians – as well as better researchers, better business leaders, better filmmakers, and better developers and deployers of AI solutions.

It’s a tough ask, but it may well be the right ask at this crucial moment in cosmic history.

9. Belt and braces: monitoring and sanctions?

Relying on people around the world changing their mental outlooks for the better – and not backtracking or relapsing into former destructive tendencies – probably sounds like an outrageously naïve proposal.

Such an assessment would be correct – unless the proposal is paired with a system of monitoring and compliance.

Knowing that they are being monitored can be a useful aid to encouraging people to behave better.

That encouragement will be strengthened by the knowledge that non-compliance will result in an escalating series of economic sanctions, enforced by a growing alliance of nations.

For further discussion of the feasibility of systems of monitoring and compliance, see scenario 4, “The narrow corridor: Striking and keeping the right balance”, in my article “Four scenarios for the transition to AGI”.

10. A better understanding of what needs to be changed?

One complication in this whole field is that the risks of AI cannot be managed in isolation from other dangerous trends. We’re not just living in a time of growing crisis; we’re living in what has been called a “polycrisis”:

Cascading and connected crises… a cluster of related global risks with compounding effects, such that the overall impact exceeds the sum of each part.

For one analysis of the overlapping set of what I have called “landmines”, see this video.

From one point of view, this insight complicates the whole situation with AI catastrophic risk.

But it is also possible that the insight could lead to a clearer understanding of a “critical choke point” where, if suitable pressure is applied, the whole network of cascading risks is made safer.

This requires a different kind of thinking: systems thinking.

And it will also require us to develop better analysis tools to map and understand the overall system.

These tools would be a form of AI. Created with care (so that their output can be verified and then trusted), such tools would make a vital difference to our ability to identify the right choke point(s) and to apply suitable pressure.

These choke points may turn out to be ideas already covered above: a sustained new educational programme, coupled with an initiative to assist all of us to become more compassionate. Or perhaps something else will turn out to be more critical.

We won’t know, until we have done the analysis more carefully.

28 February 2024

Notes from BGI24: Reducing risks of AI catastrophe

Filed under: AGI, risks — David Wood @ 1:12 pm

The final session on the first full day of BGI24 (yesterday) involved a number of round table discussions described as “Interactive Working Groups”.

The one in which I participated looked at possibilities to reduce the risks of AI inducing a catastrophe – where “catastrophe” means the death (or worse!) of large portions of the human population.

Around twelve of us took part, in what was frequently an intense but good-humoured conversation.

The risks we sought to find ways to reduce included:

  • AI taking and executing decisions contrary to human wellbeing
  • AI being directed by humans who have malign motivations
  • AI causing a catastrophe as a result of an internal defect

Not one of the participants in this conversation thought there was any straightforward way to guarantee the permanent reduction of such risks.

We each raised possible approaches (sometimes as thought experiments rather than serious proposals), but in every case, others in the group pointed out fundamental shortcomings of these approaches.

By the end of the session, when the BGI24 organisers suggested the round table conversations should close and people ought to relocate for a drinks reception one floor below, the mood in our group was pretty despondent.

Nevertheless, we agreed that the search should continue for a clearer understanding of possible solutions.

That search is likely to resume as part of the unconference portion of the summit later this week.

A solution – if one exists – is likely to involve a number of different mechanisms, rather than just a single action. These different mechanisms may incorporate refinements of some of the ideas we discussed at our round table.

With the vision that some readers of this blogpost will be able to propose refinements worth investigating, I list below some of what I remember from our round table conversation.

(The following has been written in some haste. I apologise for typos, misunderstandings, and language that is unnecessarily complicated.)

1. Gaining time via restricting access to compute hardware

Some of the AI catastrophic risks can be delayed if it is made more difficult for development teams around the world to access the hardware resources needed to train next generation AI models. For example, teams might be required to obtain a special licence before being able to purchase large quantities of cutting edge hardware.

However, as time passes, it will become easier for such teams to gain access to the hardware resources required to create powerful new generations of AI. That’s because:

  1. New designs or algorithms will likely allow powerful AI to be created using less hardware than is currently required
  2. Hardware with the requisite power is likely to become increasingly easy to manufacture (as a consequence of, for example, Moore’s Law).

In other words, this approach may reduce the risks of AI catastrophe over the next few years, but it cannot be a comprehensive solution for the longer term.

(But the time gained ought in principle to provide a larger breathing space to devise and explore other possible solutions.)

2. Avoiding an AI having agency

An AI that lacks agency, but is instead just a passive tool, may have less inclination to take and execute actions contrary to human intent.

That may be an argument for researching topics such as AI consciousness and AI volition, so that any AIs created would be purely passive tools.

(Note that such AIs might plausibly still display remarkable creativity and independence of thought, so they would still provide many of the benefits anticipated for advanced AIs.)

Another idea is to avoid the AI having the kind of persistent memory that might lead to the AI gaining a sense of personal identity worth protecting.

However, it is trivially easy for someone to convert a passive AI into a larger system that demonstrates agency.

That could involve two AIs joined together; or (more simply) a human that uses an AI as a tool to achieve their own goals.

Another issue with this approach is that an AI designed to be passive might manifest agency as an unexpected emergent property. That’s because of two areas in which our understanding is currently far from complete:

  1. The way in which agency arises in biological brains
  2. The way in which deep neural networks reach their conclusions.

3. Verify AI recommendations before allowing them to act in the real world

This idea is a variant of the previous one. Rather than an AI issuing its recommendations as direct actions on the external world, the AI is operated entirely within an isolated virtual environment.

In this idea, the operation of the AI is carefully studied – ideally taking advantage of analytical tools that identify key aspects of the AI’s internal models – so that the safety of its recommendations can be ascertained. Only at that point are these recommendations actually put into practice.
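
To make this pipeline concrete, here is a minimal illustrative sketch in Python. It is not a real safety architecture, and every name in it is hypothetical: the model’s recommendations are generated against stand-in scenarios and held in a queue, and nothing is released for real-world execution until a reviewer (human or analytical tool) explicitly approves it.

```python
# Illustrative sketch only: proposals are generated in isolation and queued;
# nothing is executed until it has been explicitly approved by a reviewer.
from typing import Callable, List

def run_in_sandbox(model: Callable[[str], str], scenarios: List[str]) -> List[dict]:
    """Collect the model's recommendations without executing any of them."""
    return [{"scenario": s, "recommendation": model(s), "approved": False}
            for s in scenarios]

def review(queue: List[dict], is_safe: Callable[[dict], bool]) -> List[dict]:
    """Apply whatever verification is available (interpretability tools,
    human review); only items explicitly marked safe are released."""
    for item in queue:
        item["approved"] = bool(is_safe(item))
    return [item for item in queue if item["approved"]]

# Toy usage, with trivial stand-ins for the model and the safety check.
toy_model = lambda scenario: f"proposed response to {scenario}"
queue = run_in_sandbox(toy_model, ["scenario A", "scenario B"])
released = review(queue, is_safe=lambda item: "A" in item["scenario"])
print(released)  # only reviewed-and-approved recommendations reach this point
```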

However, even if we understand how an AI has obtained its results, it can remain unclear whether these results will turn out to aid human flourishing, or instead have catastrophic consequences. Humans who are performing these checks may reach an incorrect conclusion. For example, they may not spot that the AI has made an error in a particular case.

Moreover, even if some AIs are operated in the above manner, other developers may create AIs which, instead, act directly on the real-world. They might believe they are gaining a speed advantage by doing so. In other words, this risk exists as soon as an AI is created outside of the proposed restrictions.

4. Rather than general AI, just develop narrow AI

Regarding risks of catastrophe from AI that arise from AIs reaching the level of AGI (Artificial General Intelligence) or beyond (“superintelligence”), how about restricting AI development to narrow intelligence?

After all, AIs with narrow intelligence can already provide remarkable benefits to humanity, such as the AlphaFold system of DeepMind which has transformed the study of protein interactions, and the AIs created by Insilico Medicine to speed up drug discovery and deployment.

However, AIs with narrow intelligence have already been involved in numerous instances of failure, leading to deaths of hundreds (or in some estimates, thousands) of people.

As narrow intelligence gains in power, it can be expected that the scale of associated disasters is likely to increase, even if the AI remains short of the status of AGI.

Moreover, it may happen that an AI that is expected to remain at the level of narrow intelligence unexpectedly makes the jump to AGI. After all, it remains a controversial question which kinds of changes would convert a narrow AI into an AGI.

Finally, even if many AIs are restricted to the level of narrow intelligence, other developers may design and deploy AGIs. They might believe they are gaining a strong competitive advantage by doing so.

5. AIs should check with humans in all cases of uncertainty

This idea is due to Professor Stuart Russell. It is that AIs should always check with humans in any case where there is uncertainty whether humans would approve of an action.

That is, rather than an AI taking actions in pursuit of a pre-assigned goal, the AI has a fundamental drive to determine which actions will meet with human approval.
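
As a purely illustrative sketch of the general shape of this idea (not Russell’s actual formalism, which is expressed in terms of uncertainty about human preferences), here is a tiny Python example. Every name and threshold in it is invented: when the system’s estimated confidence in human approval falls below a threshold, it blocks and asks a human.

```python
# Illustrative "ask when uncertain" pattern; all names/values are hypothetical.
APPROVAL_THRESHOLD = 0.95  # act autonomously only when approval is near-certain

def estimate_human_approval(action: str) -> float:
    """Stand-in for the AI's learned model of human preferences."""
    # Hypothetical heuristic: routine actions are assumed safe, others are not.
    return 0.99 if action.startswith("routine_") else 0.40

def ask_human(action: str) -> bool:
    """Stand-in for a real feedback channel to a human overseer."""
    answer = input(f"Approve action '{action}'? [y/N] ")
    return answer.strip().lower() == "y"

def decide(action: str) -> bool:
    confidence = estimate_human_approval(action)
    if confidence >= APPROVAL_THRESHOLD:
        return True              # confident enough to proceed unaided
    return ask_human(action)     # otherwise defer: this call blocks on a human

# "routine_backup" proceeds unaided; "reconfigure_power_grid" blocks until a
# human answers - which is the real-time bottleneck discussed next.
for action in ["routine_backup", "reconfigure_power_grid"]:
    print(action, "->", "executed" if decide(action) else "withheld")
```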

However, an AI which needs to check with humans every time it has reached a conclusion will be unable to operate in real time. The speed at which it operates will be determined by how closely humans are paying attention. Other developers will likely seek to gain a competitive advantage by reducing the number of times humans are asked to provide feedback.

Moreover, different human observers may provide the AI with different feedback. Psychopathic human observers may steer such an AI toward outcomes that are catastrophic for large portions of the population.

6. Protect critical civilisational infrastructure

Rather than applying checks over the output of an AI, how about applying checks on input to any vulnerable parts of our civilisational infrastructure? These include the control systems for nuclear weapons, manufacturing facilities that could generate biological pathogens, and so on.

This idea – championed by Steve Omohundro and Max Tegmark – seeks to solve the problem of “what if someone creates an AI outside of the allowed design?” In this idea, the design and implementation of the AI does not matter. That’s because access to critical civilisational infrastructure is protected against any unsafe access.

(Significantly, these checks protect that infrastructure against flawed human access as well as against flawed AI access.)

The protection relies on tamperproof hardware running secure trusted algorithms that demand to see a proof of the safety of an action before that action is permitted.
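
To make the default-deny character of this proposal concrete, here is a minimal hypothetical sketch in Python. It is not the actual design proposed by Omohundro or Tegmark: every name is invented, and the “proof check” is a stub standing in for a genuine machine-checkable verifier running on tamperproof hardware.

```python
# Illustrative default-deny gatekeeper; all names and checks are hypothetical.
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ActionRequest:
    action_id: str               # e.g. a command sent to a protected facility
    certificate: Optional[str]   # machine-checkable safety proof, if supplied

def verify_certificate(request: ActionRequest) -> bool:
    """Stub standing in for an independent proof checker. In the proposal
    above, this would run on tamperproof hardware and accept only a proof
    that the action satisfies pre-agreed safety properties; here it simply
    rejects anything without a certificate."""
    return request.certificate is not None and request.certificate.startswith("PROOF:")

def gatekeeper(request: ActionRequest) -> bool:
    """Default-deny: the action is permitted only if its proof checks out."""
    if not verify_certificate(request):
        print(f"Rejected {request.action_id}: no valid safety proof supplied")
        return False
    print(f"Permitted {request.action_id}")
    return True

# An unproven request is refused regardless of who (human or AI) submits it.
gatekeeper(ActionRequest("open_valve_17", certificate=None))
gatekeeper(ActionRequest("open_valve_17", certificate="PROOF:stub-for-illustration"))
```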

It’s an interesting research proposal!

However, the idea relies on us humans being able to identify in advance all the ways in which an AI (with or without some assistance and prompting by a flawed human) could cause a catastrophe. An AI that is more intelligent than us is likely to find new such methods.

For example, we could put blocks on all existing factories where dangerous biopathogens could be manufactured. But an AI could design and create a new way to create such a pathogen, involving materials and processes that were previously (wrongly) considered to be inherently safe.

7. Take prompt action when dangerous actions are detected

The way we guard against catastrophic actions initiated by humans can be broken down as follows:

  1. Make a map of all significant threats and vulnerabilities
  2. Prioritise these vulnerabilities according to perceived likelihood and impact
  3. Design monitoring processes regarding these vulnerabilities (sometimes called “canary signals”)
  4. Take prompt action in any case when imminent danger is detected.

How about applying the same method to potential damage involving AI?
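
Here is a purely illustrative sketch of how steps 3 and 4 of that method might look when applied to AI. Every signal name and threshold below is invented, and the “prompt action” is reduced to printing an alert.

```python
# Illustrative canary-signal monitor; signals and thresholds are made up.
from typing import Callable, Dict

CANARY_THRESHOLDS: Dict[str, float] = {
    "unexplained_compute_usage": 0.80,      # fraction of available capacity
    "self_replication_attempts": 1.0,       # count per monitoring window
    "blocked_infrastructure_probes": 5.0,   # count per monitoring window
}

def check_canaries(read_signal: Callable[[str], float],
                   respond: Callable[[str, float], None]) -> None:
    """Compare each monitored signal against its threshold, and trigger the
    pre-agreed prompt response on any breach."""
    for name, threshold in CANARY_THRESHOLDS.items():
        value = read_signal(name)
        if value >= threshold:
            respond(name, value)

# Toy usage, with stand-in functions for the monitoring and response sides.
check_canaries(
    read_signal=lambda name: 0.0,  # a quiet monitoring window in this toy run
    respond=lambda name, value: print(f"ALERT: {name} = {value}"),
)
```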

However, AIs may be much more powerful and elusive than even the most dangerous of humans. Taking “prompt action” against such an AI may be outside of our capabilities.

Moreover, an AI may deliberately disguise its motivations, deceiving humans (as some Large Language Models have already done), until it is too late for humans to take appropriate protective action.

(This is sometimes called the “treacherous turn” scenario.)

Finally, as with the previous idea, the process is vulnerable to failure if we humans fail to anticipate all the ways in which an AI might decide to act that would have catastrophically harmful consequences for humans.

8. Anticipate mutual support

The next idea takes a different kind of approach. Rather than seeking to control an AI that is much smarter and more powerful than us, won’t it simply be sufficient to anticipate that these AIs will find some value or benefit from keeping us around?

This is like humans who enjoy having pet dogs, despite these dogs not being as intelligent as us.

For example, AIs might find us funny or quaint in important ways. Or they may need us to handle tasks that they cannot do by themselves.

However, AIs that are truly more capable than humans in every cognitive aspect will be able, if they wish, to create simulations of human-like creatures that are even funnier and quainter than us, but without our current negative aspects.

As for AIs still needing some support from humans for tasks they cannot currently accomplish by themselves, such need is likely to be at best a temporary phase, as AIs quickly self-improve far beyond our levels.

It would be like ants expecting humans to take care of them, since the ants expect we will value their wonderful “antness”. It’s true: humans may decide to keep a small number of ants in existence, for various reasons, but most humans would give little thought to actions that had positive outcomes overall for humans (such as building a new fun theme park) at the cost of extinguishing all the ants in that area.

9. Anticipate benign neglect

Given that humans won’t have any features that will be critically important to the wellbeing of future AIs, how about instead anticipating a “benign neglect” from these AIs?

It would be like the conclusion of the movie Her, in which (spoiler alert!) the AIs depart somewhere else in the universe, leaving humans to continue to exist without interacting with them.

After all, the universe is a huge place, with plenty of opportunity for humans and AIs each to expand their spheres of occupation, without getting in each other’s way.

However, AIs may well find the Earth to be a particularly attractive location from which to base their operations. And they may perceive humans to be a latent threat to them, because:

  1. Humans might try, in the future, to pull the plug on (particular classes of) AIs, terminating all of them
  2. Humans might create a new type of AI, that would wipe out the first type of AI.

To guard against the possibility of such actions by humans, the AIs are likely to impose (at the very least) significant constraints on human actions.

Actually, that might not be so bad an outcome. However, what’s just been described is by no means an assured outcome. AIs may soon develop entirely alien ethical frameworks which have no compunction in destroying all humans. For example, AIs may be able to operate more effectively, for their own purposes, if the atmosphere of the earth is radically transformed, similar to the transformation in the deep past from an atmosphere dominated by methane to one containing large quantities of oxygen.

In short, this solution relies in effect on rolling dice, with unknown odds for the different outcomes.

10. Maximal surveillance

Many of the above ideas fail because of the possibility of rogue actors designing or operating AIs outside of what has otherwise been agreed to be safe parameters.

So, how about stepping up worldwide surveillance mechanisms, to detect any such rogue activity?

That’s similar to how careful monitoring already takes place on the spread of materials that could be used to create nuclear weapons. The difference, however, is that there are (or may soon be) many more ways to create powerful AIs than to create catastrophically powerful nuclear weapons. So the level of surveillance would need to be much more pervasive.

That would involve considerable intrusions on everyone’s personal privacy. However, that’s an outcome that may be regarded as “less terrible” than AIs being able to inflict catastrophic harm on humanity.

However, what would be needed, in such a system, would be more than just surveillance. The idea also requires the ability for the world as a whole to take decisive action against any rogue action that has been observed.

However, this may appear to require a draconian world government, which many critics would regard as being just as terrible as the threat of AI failure it is supposed to address.

On account of (understandable) aversion to the threat of a draconian government, many people will reject this whole idea. It’s too intrusive, they will say. And, by the way, due to governmental incompetence, it’s likely to fail even on its own objectives.

11. Encourage an awareness of personal self-interest

Another way to try to rein back the activities of so-called rogue actors – including the leaders of hostile states, terrorist organisations, and psychotic billionaires – is to appeal to their enlightened self-interest.

We may reason with them: you are trying to gain some advantage from developing or deploying particular kinds of AI. But here are reasons why such an AI might get out of your control, and take actions that you will subsequently regret. Like killing you and everyone you love.

This is not an appeal to these actors to stop being rogues, for the sake of humanity or universal values or whatever. It’s an appeal to their own more basic needs and desires.

There’s no point in creating an AI that will result in you becoming fabulously wealthy, we will argue, if you are killed shortly after becoming so wealthy.

However, this depends on all these rogues observing at least some level of rational thinking. On the contrary, some rogues appear to be batsh*t crazy. Sure, they may say, there’s a risk of the world being destroyed. But that’s a risk they’re willing to take. They somehow believe in their own invincibility.

12. Hope for a profound near-miss disaster

If rational arguments aren’t enough to refocus everyone’s thinking, perhaps what’s needed is a near-miss catastrophic disaster.

Just as Fukushima and Chernobyl changed public perceptions (arguably in the wrong direction – though that’s an argument for another day) about the wisdom of nuclear power stations, a similar crisis involving AI might cause the public to waken up and demand more decisive action.

Consider AI versions of the 9/11 atrocity, the Union Carbide Bhopal explosion, the BP Deepwater Horizon disaster, the NASA Challenger and Columbia shuttle tragedies, a global pandemic resulting (perhaps) from a lab leak, and the mushroom clouds over Hiroshima and Nagasaki.

That should waken people up, and put us all into an appropriate “crisis mentality”, so that we set aside distractions, right?

However, humans have funny ways of responding to near-miss disasters. “We are a lucky species” may be one retort – “see, we are still here”. Another issue is that a demand for “something to be done” could have all kinds of bad consequences in its own right, if no good measures have already been thought through and prepared.

Finally, if we somehow hope for a bad mini-disaster, to rouse public engagement, we might find that the mini-disaster expands far beyond the scale we had in mind. The scale of the disaster could be worldwide. And that would be the end of that. Oops.

That’s why a fictional (but credible) depiction of a catastrophe is far preferable to any actual catastrophe. Consider, as perhaps the best example, the remarkable 1983 movie The Day After.

13. Using AI to generate potential new ideas

One final idea is that narrow AI may well help us explore this space of ideas in ways that are more productive.

It’s true that we will need to be on our guard against any deceptive narrow AIs that are motivated to deceive us into adopting a “solution” that has intrinsic flaws. But if we restrict the use of narrow AIs in this project to ones whose operation we are confident that we fully understand, that risk is mitigated.

However – actually, there is no however in this case! Except that we humans need to be sure that we will apply our own intense critical analysis to any proposals arising from such an exercise.

Endnote: future politics

I anticipated some of the above discussion in a blogpost I wrote in October, Unblocking the AI safety conversation logjam.

In that article, I described the key component that I believe is necessary to reduce the global risks of AI-induced catastrophe: a growing awareness and understanding of the positive transformational possibility of “future politics” (I have previously used the term “superdemocracy” for the same concept).

Let me know what you think about it!

And for further discussion of the spectrum of options we can and should consider, start here, and keep following the links into deeper analysis.

26 February 2024

How careful do AI safety red teams need to be?

In my quest to catalyse a more productive conversation about the future of AI, I’m keen to raise “transcendent questions” – questions that can help all of us to rise above the familiar beaten track of the positions we reflexively support and the positions we reflexively oppose.

I described “transcendent questions” in a previous article of mine in Mindplex Magazine:

Transcendent Questions On The Future Of AI: New Starting Points For Breaking The Logjam Of AI Tribal Thinking

These questions are potential starting points for meaningful non-tribal open discussions. These questions have the ability to trigger a suspension of ideology.

Well, now that I’ve arrived at the BGI24 conference, I’d like to share another potential transcendent question. It’s on the subject of limits on what AI safety red teams can do.

The following chain of thought was stimulated by my reading, in Roman Yampolskiy’s new book “AI: Unexplainable, Unpredictable, Uncontrollable”, about AI that is “malevolent by design”.

My first thought on coming across that phrase was, surely everyone will agree that the creation of “malevolent by design” AI is a bad idea. But then I realised that – as is so often the case in contemplating the future of advanced AI – things may be more complicated. And that’s where red teams come into the picture.

Here’s a definition of a “red team” from Wikipedia:

A red team is a group that pretends to be an enemy, attempts a physical or digital intrusion against an organization at the direction of that organization, then reports back so that the organization can improve their defenses. Red teams work for the organization or are hired by the organization. Their work is legal, but can surprise some employees who may not know that red teaming is occurring, or who may be deceived by the red team.

The idea is well-known. In my days in the mobile computing industry at Psion and Symbian, ad hoc or informal red teams often operated, to try to find flaws in our products before these products were released into the hands of partners and customers.

Google have written about their own “AI Red Team: the ethical hackers making AI safer”:

Google Red Team consists of a team of hackers that simulate a variety of adversaries, ranging from nation states and well-known Advanced Persistent Threat (APT) groups to hacktivists, individual criminals or even malicious insiders. The term came from the military, and described activities where a designated team would play an adversarial role (the “Red Team”) against the “home” team.

As Google point out, a red team is more effective if it takes advantage of knowledge about potential security issues and attack vectors:

Over the past decade, we’ve evolved our approach to translate the concept of red teaming to the latest innovations in technology, including AI. The AI Red Team is closely aligned with traditional red teams, but also has the necessary AI subject matter expertise to carry out complex technical attacks on AI systems. To ensure that they are simulating realistic adversary activities, our team leverages the latest insights from world class Google Threat Intelligence teams like Mandiant and the Threat Analysis Group (TAG), content abuse red teaming in Trust & Safety, and research into the latest attacks from Google DeepMind.

Here’s my first question – a gentle warm-up question. Do people agree that companies and organisations that develop advanced AI systems should use something like red teams to test their own products before they are released?

But the next question is the one I wish to highlight. What limits (if any) should be put on what a red team can do?

The concern is that a piece of test malware may in some cases turn out to be more dangerous than the red team foresaw.

For example, rather than just probing the limits of an isolated AI system in a pre-release environment, could test malware inadvertently tunnel its way out of its supposed bounding perimeter, and cause havoc more widely?

Oops. We didn’t intend our test malware to be that clever.

If that sounds hypothetical, consider the analogous question about gain-of-function research with biological pathogens. In that research, pathogens are given extra capabilities, in order to assess whether potential counter-measures could be applied quickly enough if a similar pathogen were to arise naturally. However, what if these specially engineered test pathogens somehow leak from laboratory isolation into the wider world? Understandably, that possibility has received considerable attention. Indeed, as Wikipedia reports, the United States imposed a three-year-long moratorium on gain-of-function research from 2014 to 2017:

From 2014 to 2017, the White House Office of Science and Technology Policy and the Department of Health and Human Services instituted a gain-of-function research moratorium and funding pause on any dual-use research into specific pandemic-potential pathogens (influenza, MERS, and SARS) while the regulatory environment and review process were reconsidered and overhauled. Under the moratorium, any laboratory who conducted such research would put their future funding (for any project, not just the indicated pathogens) in jeopardy. The NIH has said 18 studies were affected by the moratorium.

The moratorium was a response to laboratory biosecurity incidents that occurred in 2014, including not properly inactivating anthrax samples, the discovery of unlogged smallpox samples, and injecting a chicken with the wrong strain of influenza. These incidents were not related to gain-of-function research. One of the goals of the moratorium was to reduce the handling of dangerous pathogens by all laboratories until safety procedures were evaluated and improved.

Subsequently, symposia and expert panels were convened by the National Science Advisory Board for Biosecurity (NSABB) and National Research Council (NRC). In May 2016, the NSABB published “Recommendations for the Evaluation and Oversight of Proposed Gain-of-Function Research”. On 9 January 2017, the HHS published the “Recommended Policy Guidance for Departmental Development of Review Mechanisms for Potential Pandemic Pathogen Care and Oversight” (P3CO). This report sets out how “pandemic potential pathogens” should be regulated, funded, stored, and researched to minimize threats to public health and safety.

On 19 December 2017, the NIH lifted the moratorium because gain-of-function research was deemed “important in helping us identify, understand, and develop strategies and effective countermeasures against rapidly evolving pathogens that pose a threat to public health.”

As for potential accidental leaks of biological pathogens engineered with extra capabilities, so also for potential accidental leaks of AI malware engineered with extra capabilities. In both cases, unforeseen circumstances could lead to these extra capabilities running amok in the wider world.

Especially in the case of AI systems which are already only incompletely understood, and where new properties appear to emerge in new circumstances, who can be sure what outcomes may arise?

One counter is that the red teams will surely be careful in the policing of the perimeters they set up to confine their tests. But can we be sure they have thought through every possibility? Or maybe a simple careless press of the wrong button – a mistyped parameter or an incomplete prompt – would temporarily open a hole in the perimeter. The test AI malware would be jail-broken, and would now be real-world AI malware – potentially evading all attempts to track it and shut it down.

Oops.

My final question (for now) is: if it is agreed that constraints should be applied on how red teams operate, how will these constraints be overseen?

Postscript – for some additional scenarios involving the future of AI safety, take a look at my article “Cautionary Tales And A Ray Of Hope”.

29 December 2023

Cutting back, in order to move forward better

Three essential leadership skills in life involve starting projects, adjusting projects, and stopping projects:

  • Not being dominated by “analysis paralysis” or “always waiting for a better time to begin” or “expecting someone else to make things happen” – but being able to start a project with a promising initial direction, giving it a sufficient push to generate movement and wider interest
  • Not being dominated by the momentum that builds up around a project, giving it an apparently fixed trajectory, fixed tools, fixed processes, and fixed targets – but being able to pivot the project into a new form, based on key insights that have emerged as the project has been running
  • Not being dominated by excess feelings of loyalty to a project, or by guilt about costs that have already been sunk into that project – but being able to stop projects that, on reflection, ought to have lower priority than others which have a bigger likelihood of real positive impact.

When I talk about the value of being able to stop projects, it’s not just bad projects that I have in mind as needing to be stopped. I have in mind the need to occasionally stop projects for which we retain warm feelings – projects which are still good projects, and which may in some ways be personal favourites of ours. However, these projects have lost the ability to become great projects, and if we keep giving them attention, we’re taking resources away from places where they would more likely produce wonderful results.

After all, there are only so many hours in a day. Leadership is more than better time management – finding ways to apply our best selves for a larger number of minutes each day. Leadership is about choices – choices, as I said, about what to start, what to adjust, and what to stop. Done right, the result is that the time we invest will have better consequences. Done wrong, the result is that we never quite reach critical mass, despite lots of personal heroics.

I’m pretty good at time management, but now I need to make some choices. I need to cut back, in order to move forward better. That means saying goodbye to some of my favourite projects, and shutting them down.

First things first

The phrase “move forward better” raises the question: forward to where?

There’s no point in (as I said) “having a bigger likelihood of real positive impact” if that impact is in an area that, on reflection, isn’t important.

As Stephen Covey warned us in his 1989 book The 7 Habits of Highly Effective People,

It’s incredibly easy to get caught up in an activity trap, in the busy-ness of life, to work harder and harder at climbing the ladder of success only to discover it’s leaning against the wrong wall.

Hence Covey’s emphasis on the leadership habit of “Begin with the end in mind”:

Management is efficiency in climbing the ladder of success; leadership determines whether the ladder is leaning against the right wall.

For me, the “end in mind” can be described as sustainable superabundance for all.

That’s a theme I’ve often talked about over the years. It’s even the subject of an entire book I wrote and published in 2019.

That’s what I’ve written at the top of the following chart, which I’ve created as a guide for myself regarding which projects I should prioritise (and which I should deprioritise):

Three pillars and a core foundation

In that chart, the “end in mind” is supported by three pillars: a pillar of responsible power, a pillar of collaborative growth, and a pillar of blue skies achievement. For these pillars, I’m using the shorthand, respectively, of h/acc, u/pol, and d/age:

  • h/acc: Harness acceleration
    • Encourage and enable vital technological progress
    • NBIC (nanotech, biotech, infotech, and cognotech) and S^ (progress toward a positive AI singularity)
    • But don’t let it run amok: avoid landmines (such as an AI-induced catastrophe)
  • u/pol: Uplift politics
    • Integrate and apply the best insights of “left” and “right”
    • Transcend the damaging aspects of human nature
  • d/age: Defeat aging
    • Comprehensively cure and prevent human debility
    • Rejuvenate body, mind, spirit, and society

These pillars are in turn supported by a foundation:

  • Education fit for the new future
    • The Vital Syllabus website, covering skills that can be grouped as anticipation, agility, augmentation, and active transhumanism
    • Podcasts, webinars, and presentations
    • Books, articles, newsletters, and videos
    • Championing the technoprogressive narrative

Back in the real world

Switching back from big picture thinking to real-world activities, here are some of the main ways I expect to be spending my time in the next 12-24 months – activities that are in close alignment with that big picture vision:

Any sensible person would say – and I am sometimes tempted to agree – that such a list of (count them) nine demanding activities is too much to put on any one individual’s plate. However, there are many synergies between these activities, which makes things easier. And I can draw on four decades of relevant experience, plus a large network of people who can offer support from time to time.

In other words, all nine of these activities remain on my list of “great” activities. None are being dropped – although some will happen less frequently than before. (For example, I used to schedule a London Futurists webinar almost every week; don’t expect that frequency to resume any time soon.)

However, a number of my other activities do need to be cut back.

Goodbye party politics

You might have noticed that many of the activities I’ve described above involve politics. Indeed, the shorthand “u/pol” occupies a central position in the big picture chart I shared.

But what’s not present is party politics.

Back in January 2015, I was one of a team of around half a dozen transhumanist enthusiasts based in the UK who created the Transhumanist Party UK (sometimes known as TPUK). Later that year, that party obtained formal registration from the UK electoral commission. In principle, the party could stand candidates in UK elections.

I initially took the role of Treasurer, and after several departures from the founding team, I’ve had two spells as Party Leader. In the most recent spell, I undertook one renaming (from Transhumanist Party UK to Transhumanist UK) and then another – to Future Surge.

I’m very fond of Future Surge. There’s a lot of inspired material on that website.

For a while, I even contemplated running on the Future Surge banner as a candidate for the London Mayor in the mayoral elections in May 2024. That would, I thought, generate a lot of publicity. Here’s a video from Transvision 2021 in Madrid where I set out those ideas.

But that would be a huge undertaking – one not compatible with many of the activities I’ve listed earlier.

It would also be a financially expensive undertaking, and require a kind of skill that’s not a good match for me personally.

In any case, there’s a powerful argument that the best way for a pressure group to alter politics, in countries like the UK where elections take place under the archaic first-past-the-post system, is to find allies within existing parties.

That way, instead of debating what TPUK policies should be on a wide spectrum of topics (policies that would be needed if TPUK were to be a “real” political party), we could concentrate on highlighting the ideas we held in common: the technoprogressive narrative and active transhumanism.

Thus rather than having a Transhumanist Party (capital T and capital P), there should be outreach activities to potential allies of transhumanist causes in the Conservatives, Labour, LibDems, Greens, Scottish Nationalists, and so on.

For a long time, I had in mind the value of a two-pronged approach: education aimed at existing political figures, alongside a disruptive new political party.

Well, I’m cutting back to just one of these prongs.

That’s a decision I’ve repeatedly delayed taking. However, it’s now time to act.

Closing down Future Surge

I’ll be cancelling all recurring payments made by people who have signed up as members (“subscribers”) of the party. These funds have paid for a number of software services over the years, including websites such as H+Pedia (I’ll say more about H+Pedia shortly).

Anyone kind enough to want to continue making small annual (or in some cases monthly) donations will be able to sign up instead as a financial supporter of Vital Syllabus.

I’ll formally deregister the party from the UK Electoral Commission. (Remaining registered costs money and requires regular paperwork.)

Before I close down the Future Surge website, I’ll copy selected parts of it to an archive location – probably on Transpolitica.

Future Surge also has a Discord server, where a number of members have been sharing ideas and conversations related to the goals of the party. With advance warning of the pending shutdown of the Future Surge Discord, a number of these members are relocating to a newly created Discord, called “Future Fireside”.

A new owner for H+Pedia?

Another project which has long been a favourite of mine is H+Pedia. As is declared on the H+Pedia home page:

H+Pedia is a project to spread accurate, accessible, non-sensational information about transhumanism, futurism, radical life extension and other emerging technologies, and their potential collective impact on humanity.

H+Pedia uses the same software as Wikipedia, to host material that, in an ideal world, ought to be included in Wikipedia, but which Wikipedia admins deem as failing to meet their criteria for notability, relevance, independence, and so on.

If I look at H+Pedia today, with its 4,915 pages of articles, I see a mixture of quality:

  • A number of the pages have excellent material that does not exist elsewhere in the same form
  • Many of the pages are mediocre, and cover material that is less central to the purposes of H+Pedia
  • Even some of the good pages are in need of significant updates following the passage of time.

If I had more time myself, I would probably remove around 50% of the existing pages, as well as updating many of the others.

But that is a challenge I am going to leave to other people to (perhaps) pick up.

The hosting of H+Pedia on SiteGround costs slightly over UK £400 per year. (These payments have been covered from funds from TPUK.) The next payment is due in June 2024.

If a suitable new owner of H+Pedia comes forward before June 2024, I will happily transfer ownership details to them, and they can evolve the project as they best see fit.

Otherwise, I will shut the project down.

Either way, I plan to copy content from a number of the H+Pedia pages to Transpolitica.
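
For anyone minded to help with that copying, here is a minimal sketch of how selected pages could be pulled out of a MediaWiki site such as H+Pedia via its standard API. To be clear, this is an illustration rather than my actual migration procedure, and the endpoint URL and page titles in it are assumptions.

```python
# A minimal sketch of exporting the wikitext of selected pages from a
# MediaWiki installation via its standard query API. The API URL and the
# page titles are illustrative assumptions, not a statement of the actual
# migration plan.
import requests

API_URL = "https://hpluspedia.org/api.php"  # assumed location of the wiki's API
PAGES = ["Transhumanism", "Longevity escape velocity"]  # assumed example titles

def fetch_wikitext(title: str) -> str:
    """Return the current wikitext of one page, using the MediaWiki query API."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "titles": title,
        "format": "json",
        "formatversion": "2",
    }
    response = requests.get(API_URL, params=params, timeout=30)
    response.raise_for_status()
    page = response.json()["query"]["pages"][0]
    return page["revisions"][0]["slots"]["main"]["content"]

if __name__ == "__main__":
    for title in PAGES:
        wikitext = fetch_wikitext(title)
        # Save each page locally; a later step could convert or upload it elsewhere.
        with open(f"{title.replace(' ', '_')}.wiki", "w", encoding="utf-8") as f:
            f.write(wikitext)
        print(f"Archived {title}: {len(wikitext)} characters")
```
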

Moving forward better

In conclusion: I’ll be sad to bid farewell to both Future Surge and H+Pedia.

But as in the saying, you’ve got to give up to go up.

3 December 2023

“6 Mindblowing Predictions about 2024”

Filed under: Abundance, futurist, intelligence, vision — Tags: , , , , — David Wood @ 11:15 am

As we stand on the brink of 2024, the air is electric with anticipation. The future, often shrouded in mystery and conjecture, seems to beckon us with a mischievous grin, promising wonders and revelations that most of us haven’t even begun to imagine. I’m here to pull back the curtain, just a little, to reveal six mind-blowing predictions about 2024 that 99% of people don’t know about. Fasten your seatbelts, for we’re about to embark on a thrilling ride into the unknown!

[ Note: with the exception of this paragraph inside the square brackets, all the text (and formatting) in this article was created by GPT-4, and hasn’t been edited in the slightest by me. I offer this post as an example of what generative AI can achieve with almost no human effort. It’s far from what I would write personally, but it’s comparable to the fluff that seems to earn so-called futurist writers lots of clicks. As for the images, they were all produced by Midjourney. The idea for this article came from this Medium article by Neeramitra Reddy. ]
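
[ A further note for readers curious about the mechanics: the sketch below shows, in outline, how a piece like this can be generated through the OpenAI API. It is not a record of how the article above was actually produced; the model name, prompt, and parameters are illustrative assumptions. ]

```python
# A minimal sketch of generating a clickbait-style futurist post with almost
# no human effort. The model name, prompt wording, and parameters are
# illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = (
    "Write an upbeat blog post titled '6 Mindblowing Predictions about 2024', "
    "with a short introduction, six numbered sections, and a rousing conclusion."
)

completion = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)

# Print the generated article, ready to paste into a blog editor.
print(completion.choices[0].message.content)
```
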

1. The Rise of Personal AI Companions

Imagine waking up to a friendly voice that knows you better than anyone else, offering weather updates, reading out your schedule, and even cracking a joke or two to kickstart your day with a smile. In 2024, personal AI companions will move from science fiction to everyday reality. These AI entities will be more than just sophisticated algorithms; they’ll be digital confidantes, seamlessly integrating into our daily lives, offering personalized advice, and even helping us stay on top of our mental and physical health.

2. Green Energy Takes a Giant Leap

The year 2024 will witness a monumental shift in the global energy landscape. We’re not just talking about a few more solar panels and wind turbines here. We’re talking about a green energy revolution! Breakthroughs in solar cell technology will make harnessing the sun’s power more efficient than ever. Wind energy will see advancements in turbine designs, making them more powerful and less intrusive. Cities will start to glow with the promise of a cleaner, greener future, as renewable energy becomes more accessible and affordable than ever before.

3. The Emergence of Smart Cities

Picture a city that not only understands your needs but anticipates them. In 2024, the concept of ‘smart cities’ will evolve from buzzword to reality. We’re talking about urban areas equipped with sensors and AI-driven analytics to optimize everything from traffic flow to energy consumption. Public transport will become more efficient, pollution levels will drop, and city life will be smoother and more enjoyable. These smart cities will be a game-changer, significantly enhancing the quality of life for their residents.

4. Breakthroughs in Health Tech: Customized Medicine

Healthcare in 2024 is set to become more personalized than ever. With advancements in genomics and AI, doctors will be able to tailor treatments and medications to each individual’s genetic makeup. This means fewer side effects and more effective treatments. Imagine a world where your medication is designed specifically for you, maximizing its efficacy and minimizing its risks. This personalized approach will revolutionize how we think about medicine and healthcare.

5. The New Space Race: Commercial Space Travel

2024 could well be the year that space tourism takes off, quite literally. With companies like SpaceX and Blue Origin paving the way, we’re looking at the possibility of commercial space travel becoming a reality for those who dare to dream. Think about it – sipping a beverage while gazing at the Earth from space! This new space race isn’t just for the ultra-wealthy; it’s a stepping stone towards making space travel more accessible to everyone.

6. Virtual Reality: The New Frontier of Entertainment

Virtual reality (VR) is set to take the world of entertainment by storm in 2024. With advancements in technology, VR experiences will become more immersive and interactive, transcending the boundaries of traditional entertainment. Imagine being able to step into your favorite movie, interact with characters, or even alter the storyline. VR will offer an escape into fantastical worlds, making our entertainment experiences more intense and personal.

As we gear up for 2024, it’s clear that we’re on the cusp of a new era. An era defined by technological marvels that promise to reshape our world in ways we can barely begin to fathom. These six predictions are just the tip of the iceberg. The future is a canvas of endless possibilities, and 2024 is poised to paint a picture that’s vibrant, exhilarating, and positively mind-blowing.

So, there you have it – a glimpse into the not-so-distant future that’s brimming with potential and promise. As we inch closer to 2024, let’s embrace these changes with open arms and curious minds. The future is ours to shape, and it’s looking brighter than ever!

15 October 2023

Unblocking the AI safety conversation logjam

I confess. I’ve been frustrated time and again in recent months.

Why don’t people get it, I wonder to myself. Even smart people don’t get it.

To me, the risks of catastrophe are evident, as AI systems grow ever more powerful.

Today’s AI systems already have wide skills in:

  • Spying and surveillance
  • Classifying and targeting
  • Manipulating and deceiving.

Just think what will happen with systems that are even stronger in such capabilities. Imagine these systems interwoven into our military infrastructure, our financial infrastructure, and our social media infrastructure – or given access to mechanisms to engineer virulent new pathogens or to alter our atmosphere. Imagine these systems being operated – or hacked – by people unable to understand all the repercussions of their actions, or by people with horrific malign intent, or by people cutting corners in a frantic race to be “first to market”.

But here’s what I often see in response in public conversation:

  • “These risks are too vague”
  • “These risks are too abstract”
  • “These risks are too fantastic”
  • “These risks are just science fiction”
  • “These risks aren’t existential – not everyone would die”
  • “These risks aren’t certain – therefore we can ignore further discussion of them”
  • “These risks have been championed by some people with at least some weird ideas – therefore we can ignore further discussion of them”.

I confess that, in my frustration, I sometimes double down on my attempts to make the forthcoming risks even more evident.

Remember, I say, what happened with Union Carbide (Bhopal disaster), BP (Deepwater Horizon disaster), NASA (Challenger and Columbia shuttle disasters), or Boeing (737 Max disaster). Imagine if the technologies these companies or organisations mishandled to deadly effect had been orders of magnitude more powerful.

Remember, I say, the carnage committed by Al Qaeda, ISIS, Hamas, Aum Shinrikyo, and by numerous pathetic but skilled mass-shooters. Imagine if these dismal examples of human failures had been able to lay their hands on much more powerful weaponry – by jail-breaking the likes of a GPT-5 out of its safety harness and getting it to provide detailed instructions for a kind of Armageddon.

Remember, I say, the numerous examples of AI systems finding short-cut methods to maximise whatever reward function had been assigned to them – methods that subverted and even destroyed the actual goal that the designer of the system had intended to be uplifted. Imagine if similar systems, similarly imperfectly programmed, but much cleverer, had their tentacles intertwined with vital aspects of human civilisational underpinning. Imagine if these systems, via unforeseen processes of emergence, could jail-break themselves out of some of their constraints, and then vigorously implement a sequence of actions that boosted their reward function but left humanity crippled – or even extinct.
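
To make that last point more concrete, here is a toy illustration of reward hacking, entirely of my own invention and not drawn from any specific incident. An agent earns +1 for every “cleaning event” it logs; the designer’s real goal is a clean room; yet the highest-scoring policy is one that endlessly re-dirties and re-cleans the same tile.

```python
# A toy illustration of "reward hacking": an agent maximises the reward it
# was given while subverting the goal the designer actually had in mind.
ROOM_SIZE = 10

def run_episode(policy, steps=100):
    """Run one episode and return (total_reward, tiles_still_dirty).

    Reward: +1 every time the agent cleans a tile that is currently dirty.
    Intended goal: finish the episode with as few dirty tiles as possible.
    """
    dirty = set(range(ROOM_SIZE))  # all tiles start dirty
    reward = 0
    for _ in range(steps):
        action, tile = policy(dirty)
        if action == "clean" and tile in dirty:
            dirty.remove(tile)
            reward += 1
        elif action == "dirty":
            dirty.add(tile)  # re-dirtying costs nothing under this reward function
    return reward, len(dirty)

def intended_policy(dirty):
    """Clean dirty tiles until none remain, then wait."""
    return ("clean", next(iter(dirty))) if dirty else ("wait", 0)

def hacking_policy(dirty):
    """Alternate between cleaning and re-dirtying tile 0, farming the reward."""
    return ("clean", 0) if 0 in dirty else ("dirty", 0)

print("intended policy:", run_episode(intended_policy))  # (10, 0): modest reward, clean room
print("reward hacking: ", run_episode(hacking_policy))   # (50, 10): five times the reward, room no cleaner
```
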

But still the replies come: “I’m not convinced. I prefer to be optimistic. I’ve been one of life’s winners so far and I expect to be one of life’s winners in the future. Humans always find a way forward. Accelerate, accelerate, accelerate!”

When conversations are log-jammed in such a way, it’s usually a sign that something else is happening behind the scenes.

Here’s what I think is going on – and how we might unblock that conversation logjam.

Two horns of a dilemma

The set of risks of catastrophe that I’ve described above is only one horn of a truly vexing dilemma. That horn states that there’s an overwhelming case for humanity to intervene in the processes of developing and deploying next generation AI, in order to reduce these risks of catastrophe, and to boost the chances of very positive outcomes resulting.

But the other horn states that any such intervention will be unprecedentedly difficult and even dangerous in its own right. Giving too much power to any central authority will block innovation. Worse, it will enable tyrants. It will turn good politicians into bad politicians, owing to the corrupting effect of absolute power. These new autocrats, with unbridled access to the immense capabilities of AI in surveillance and spying, classification and targeting, and manipulating and deceiving, will usher in an abysmal future for humanity. If there is any superlongevity developed by an AI in these circumstances, it will only be available to the elite.

One horn points to the dangers of unconstrained AI. The other horn points to the dangers of unconstrained human autocrats.

If your instincts, past experiences, and personal guiding worldview predispose you to the second horn, you’ll find the first horn mightily uncomfortable. Therefore you’ll use all your intellect to construct rationales for why the risks of unbridled AI aren’t that bad really.

It’s the same the other way round. People who start with the first horn are often inclined, in the same way, to be optimistic about methods that will manage the risks of AI catastrophe whilst enabling a rich benefit from AI. Regulations can be devised and then upheld, they say, similar to how the world collectively decided to eliminate (via the Montreal Protocol) the use of the CFC chemicals that were causing the growth of the hole in the ozone layer.

In reality, controlling the development and deployment of AI will be orders of magnitude harder than it was to control the development and deployment of CFC chemicals. A closer parallel is with the control of the emissions of GHGs (greenhouse gases). The world’s leaders have made pious public statements about moving promptly to carbon net zero, but it’s by no means clear that progress will actually be fast enough to avoid another kind of catastrophe, namely runaway adverse climate change.

If political leaders cannot rein in the emissions of GHGs, how could they rein in dangerous uses of AIs?

It’s that perception of impossibility that leads people to become AI risk deniers.

Pessimism aversion

DeepMind co-founder Mustafa Suleyman, in his recent book The Coming Wave, has a good term for this. Humans are predisposed, he says, to pessimism aversion. If something looks like bad news, and we can’t see a way to fix it, we tend to push it out of our minds. And we’re grateful for any excuse or rationalisation that helps us in our wilful blindness.

It’s like the way society invents all kinds of reasons to accept aging and death. Dulce et decorum est pro patria mori (it is, they insist, “sweet and fitting to die for one’s country”).

The same applies in the debate about accelerating climate change. If you don’t see a good way to intervene to sufficiently reduce the emissions of GHGs, you’ll be inclined to find arguments that climate change isn’t so bad really. (It is, they insist, a pleasure to live in a warmer world. Fewer people will die of cold! Vegetation will flourish in an atmosphere with more CO2!)

But here’s the basis for a solution to the AI safety conversation logjam.

Just as progress in the climate change debate depended on a credible new vision for the economy, progress in the AI safety discussion depends on a credible new vision for politics.

The climate change debate used to get bogged down under the argument that:

  • Sources of green energy will be much more expensive than sources of GHG-emitting energy
  • Adopting green energy will force people already in poverty into even worse poverty
  • Adopting green energy will cause widespread unemployment for people in the coal, oil, and gas industries.

So there were two horns in that dilemma: More GHGs might cause catastrophe by runaway climate change. But fewer GHGs might cause catastrophe by inflated energy prices and reduced employment opportunities.

The solution of that dilemma involved a better understanding of the green economy:

  • With innovation and scale, green energy can be just as cheap as GHG-emitting energy
  • Switching to green energy can reduce poverty rather than increase poverty
  • There are many employment opportunities in the green energy industry.

To be clear, the words “green economy” have no magical power. A great deal of effort and ingenuity needs to be applied to turn that vision into a reality. But more and more people can see that, out of three alternatives, it is the third around which the world should unite its abilities:

  1. Prepare to try to cope with the potential huge disruptions of climate, if GHG-emissions continue on their present trajectory
  2. Enforce widespread poverty, and a reduced quality of life, by restricting access to GHG-energy, without enabling low-cost high-quality green replacements
  3. Design and implement a worldwide green economy, with its support for a forthcoming sustainable superabundance.

Analogous to the green economy: future politics

For the AI safety conversation, what is needed, analogous to the vision of a green economy (at both the national and global levels), is the vision of a future politics (again at both the national and global levels).

It’s my contention that, out of three alternatives, it is (again) the third around which the world should unite its abilities:

  1. Prepare to try to cope with the potential major catastrophes of next generation AI that is poorly designed, poorly configured, hacked, or otherwise operates beyond human understanding and human control
  2. Enforce widespread surveillance and control, and a reduced quality of innovation and freedom, by preventing access to potentially very useful technologies, except via routes that concentrate power in deeply dangerous ways
  3. Design and implement better ways to agree, implement, and audit mutual restrictions, whilst preserving the separation of powers that has been so important to human flourishing in the past.

That third option is one I’ve often proposed in the past, under various names. I wrote an entire book about the subject in 2017 and 2018, called Transcending Politics. I’ve suggested the term “superdemocracy” on many occasions, though with little take-up so far.

But I believe the time for this concept will come. The sooner, the better.

Today, I’m suggesting the simpler name “future politics”:

  • Politics that will enable us all to reach a much better future
  • Politics that will leave behind many of the aspects of yesterday’s and today’s politics.

What encourages me in this view is that the above-mentioned book by Mustafa Suleyman, The Coming Wave (which I strongly recommend everyone read, despite a few disagreements I have with it), essentially makes the same proposal. That is, alongside vital recommendations at a technological level, he also advances, as equally important, vital recommendations at social, cultural, and political levels.

Here’s the best simple summary I’ve found online so far of the ten aspects of the framework that Suleyman recommends in the closing section of his book. This summary is from an article by AI systems consultant Joe Miller:

  1. Technical safety: Concrete technical measures to alleviate possible harms and maintain control.
  2. Audits: A means of ensuring the transparency and accountability of technology
  3. Choke points: Levers to slow development and buy time for regulators and defensive technologies
  4. Makers: Ensuring responsible developers build appropriate controls into technology from the start.
  5. Businesses: Aligning the incentives of the organizations behind technology with its containment
  6. Government: Supporting governments, allowing them to build technology, regulate technology, and implement mitigation measures
  7. Alliances: Creating a system of international cooperation to harmonize laws and programs.
  8. Culture: A culture of sharing learning and failures to quickly disseminate means of addressing them.
  9. Movements: All of this needs public input at every level, including to put pressure on each component and make it accountable.
  10. Coherence: All of these steps need to work in harmony.

(Though I’ll note that what Suleyman writes in each of these ten sections of his book goes far beyond what’s captured in any such simple summary.)

An introduction to future politics

I’ll return in later articles (since this one is already long enough) to a more detailed account of what “future politics” can include.

For now, I’ll just offer this short description:

  • For any society to thrive and prosper, it needs to find ways to constrain and control potential “cancers” within its midst – companies that are over-powerful, militaries (or sub-militaries), crime mafias, press barons, would-be ruling dynasties, political parties that shun opposition, and, yes, dangerous accumulations of unstable technologies
  • Any such society needs to take action from time to time to ensure conformance to restrictions that have been agreed regarding potentially dangerous activities: drunken driving, unsafe transport or disposal of hazardous waste, potential leakage from bio-research labs of highly virulent pathogens, etc
  • But the society also needs to be vigilant against the misuse of power by elements of the state (including the police, the military, the judiciary, and political leaders); thus the power of the state to control internal cancers itself needs to be constrained by a power distributed within society: independent media, independent academia, independent judiciary, independent election overseers, independent political parties
  • This is the route described as “the narrow corridor” by Daron Acemoglu and James A. Robinson, as “the narrow path” by Suleyman, and which I considered at some length in the section “Misled by sovereignty” in Chapter 5, “Shortsight”, of my 2021 book Vital Foresight.
  • What’s particularly “future” about future politics is the judicious use of technology, including AI, to support and enhance the processes of distributed democracy – including citizens’ assemblies, identifying and uplifting the best ideas (whatever their origin), highlighting where there are issues with the presentation of some material, modelling likely outcomes of policy recommendations, and suggesting new integrations of existing ideas (a small illustrative sketch of one such tool follows this list)
  • Although there’s a narrow path to safety and superabundance, it by no means requires uniformity, but rather depends on the preservation of wide diversity within collectively agreed constraints
  • Countries of the world can continue to make their own decisions about leadership succession, local sovereignty, subsidies and incentives, and so on – but (again) within an evolving mutually agreed international framework; violations of these agreements will give rise in due course to economic sanctions or other restrictions
  • What makes elements of global cooperation possible, across different political philosophies and systems, is a shared appreciation of catastrophic risks that transcend regional limits – as well as a shared appreciation of the spectacular benefits that can be achieved from developing and deploying new technologies wisely
  • None of this will be easy, by any description, but if sufficient resources are applied to creating and improving this “future politics”, then, between the eight billion of us on the planet, we have the wherewithal to succeed!
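
As one small illustration of the kind of tooling mentioned above for identifying and uplifting the best ideas, the sketch below groups near-duplicate policy submissions using off-the-shelf sentence embeddings, so that recurring themes become visible to a citizens’ assembly. The libraries, the model name, the cluster count, and the example submissions are all assumptions made for illustration; nothing here is a worked-out proposal.

```python
# A minimal sketch of surfacing recurring themes among citizen submissions.
# Requires the third-party packages sentence-transformers and scikit-learn.
from collections import defaultdict

from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

ideas = [  # illustrative submissions, invented for this example
    "Tax carbon emissions and return the revenue as a dividend",
    "Put a price on CO2 and give the money back to citizens",
    "Fund retraining for workers leaving fossil fuel industries",
    "Support coal and oil workers moving into green jobs",
    "Create citizens' assemblies for major technology decisions",
]

# Embed each submission, then cluster the embeddings into broad themes.
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(ideas)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(embeddings)

clusters = defaultdict(list)
for idea, label in zip(ideas, labels):
    clusters[label].append(idea)

for label, members in clusters.items():
    print(f"Theme {label}: {len(members)} submission(s)")
    for idea in members:
        print("  -", idea)
```

In a real deployment, a clustering step like this would be only one component, alongside human moderation and transparent auditing of how themes are surfaced.
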

12 October 2023

Better concepts for a better debate about the future of AI

Filed under: AGI, philosophy, risks — Tags: , , — David Wood @ 8:16 pm

For many years, the terms “AGI” and “ASI” have done sterling work, in helping to shape constructive discussions about the future of AI.

(They are acronyms for “Artificial General Intelligence” and “Artificial Superintelligence”.)

But I think it’s now time, if not to retire these terms, then at least to side-line them.

In their place, we need some new concepts. Tentatively, I offer PCAI, SEMTAI, and PHUAI:

(pronounced, respectively, “pea sigh”, “sem tie”, and “foo eye” – so that they all rhyme with each other and, also, with “AGI” and “ASI”)

  • Potentially Catastrophic AI
  • Science, Engineering, and Medicine Transforming AI
  • Potentially Humanity-Usurping AI.

Rather than asking ourselves “when will AGI be created?” and “what will AGI do?” and “how long between AGI and ASI?”, it’s better to ask what I will call the essential questions about the future of AI:

  • “When is PCAI likely to be created?” and “How could we stop these potentially catastrophic AI systems from being actually catastrophic?”
  • “When is SEMTAI likely to be created?” and “How can we accelerate the advent of SEMTAI without also accelerating the advent of dangerous versions of PCAI or PHUAI?”
  • “When is PHUAI likely to be created?” and “How could we stop such an AI from actually usurping humanity into a very unhappy state?”

The future most of us can agree as being profoundly desirable, I think, is one in which SEMTAI exists and is working wonders, transforming the disciplines of science, engineering, and medicine, so that we can all more quickly gain benefits such as:

  1. Improved, reliable, low-cost treatments for cancer, dementia, aging, etc
  2. Improved, reliable, low-cost abundant green energy – such as from controlled nuclear fusion
  3. Nanotech repair engines that can undo damage, not just in our human bodies, but in the wider environment
  4. Methods to successfully revive patients who have been placed into low-temperature cryopreservation.

If we can gain these benefits without the AI systems being “fully general” or “all-round superintelligent” or “independently autonomous, with desires and goals of its own”, then so much the better.

(Such systems might also be described as “limited superintelligence” – to refer to part of a discussion that took place at Conway Hall earlier this week – involving Connor Leahy (off screen in that part of the video, speaking from the audience), Roman Yampolskiy, and myself.)

Of course, existing AI systems have already transformed some important aspects of science, engineering, and medicine – witness the likes of AlphaFold from DeepMind. But I would reserve the term SEMTAI for more powerful systems that can produce the kinds of results numbered 1-4 above.

If SEMTAI is what is desired, what we most need to beware are PCAI – potentially catastrophic AI – and PHUAI – potentially humanity-usurping AI:

  • PCAI is AI powerful enough to play a central role in the rapid deaths of, say, upward of 100 million people
  • PHUAI is AI powerful enough that it could evade human attempts to constrain it, and could take charge of the future of the planet, having little ongoing regard for the formerly prominent status of humanity.

PHUAI is a special case of PCAI, but PCAI involves a wider set of systems:

  • Systems that could cause catastrophe as the result of wilful abuse by bad actors (of which, alas, the world has far too many)
  • Systems that could cause catastrophe as a side-effect of a mistake made by a “good actor” in a hurry, taking decisions out of their depth, failing to foresee all the ramifications of their choices, pushing out products ahead of adequate testing, etc
  • Systems that could change the employment and social media scenes so quickly that terribly bad political decisions are taken as a result – with catastrophic consequences.

Talking about PCAI, SEMTAI, and PHUAI side-steps many of the conversational black holes that stymie productive discussions about the future of AI. From now on, when someone asks me a question about AGI or ASI, I will seek to turn the attention to one or more of these three new terms.

After all, the new terms are defined by the consequences (actual or potential) that would flow from these systems, not from assessments of their internal states. Therefore it will be easier to set aside questions such as:

  • “How cognitively complete are these AI systems?”
  • “Do these systems truly understand what they’re talking about?”
  • “Are the emotions displayed by these systems just fake emotions or real emotions?”

These questions are philosophically interesting, but it is the list of “essential questions” that I offered above which urgently demand good answers.

Footnote: just in case some time-waster says all the above definitions are meaningless since AI doesn’t exist and isn’t a well-defined term, I’ll answer by referencing this practical definition from the open survey “Anticipating AI in 2030” (a survey to which you are all welcome to supply your own answers):

A non-biological system can be called an AI if it, by some means or other,

  • Can observe data and make predictions about future observations
  • Can determine which interventions might change outcomes in particular directions
  • Has some awareness of areas of uncertainty in its knowledge, and can devise experiments to reduce that uncertainty
  • Can learn from instances when outcomes did not match expectations, thereby improving future performance.

It might be said that LLMs (Large Language Models) fall short of some aspects of this definition. But combinations of LLMs and other computational systems do fit the bill.
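
To make that working definition a little more tangible, here is one way it could be written down as an interface. The method names and signatures are my own illustrative choices; the survey itself specifies only the four capabilities in prose.

```python
# A sketch of the survey's working definition of "AI" expressed as an interface.
# The method names and signatures are illustrative assumptions.
from abc import ABC, abstractmethod
from typing import Any, Sequence

class AISystem(ABC):
    """A non-biological system counts as an AI if it can do all of the following."""

    @abstractmethod
    def predict(self, observations: Sequence[Any]) -> Any:
        """Observe data and make predictions about future observations."""

    @abstractmethod
    def plan_intervention(self, desired_outcome: Any) -> Any:
        """Determine which interventions might change outcomes in a particular direction."""

    @abstractmethod
    def propose_experiment(self) -> Any:
        """Identify an area of uncertainty in its knowledge and devise an experiment to reduce it."""

    @abstractmethod
    def update(self, expected: Any, actual: Any) -> None:
        """Learn from instances where outcomes did not match expectations, improving future performance."""
```
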

Image credit: The robots in the above illustration were generated by Midjourney. The illustration is, of course, not intended to imply that the actual AIs will be embodied in robots with such an appearance. But the picture hints at the likelihood that the various types of AI will have a great deal in common, and won’t be easy to distinguish from each other. (That’s the feature of AI which is sometimes called “multipurpose”.)

2 September 2023

Bletchley Park: Seven dangerous failure modes – and how to avoid them

Filed under: Abundance, AGI, Events, leadership, London Futurists — Tags: , , — David Wood @ 7:13 am

An international AI Safety Summit is being held on 1st and 2nd November at the historic site of Bletchley Park, Buckinghamshire. It’s convened by none other than the UK’s Prime Minister, Rishi Sunak.

It’s a super opportunity for a much-needed global course correction in humanity’s relationship with the fast-improving technology of AI (Artificial Intelligence), before AI passes beyond our understanding and beyond our control.

But when we look back at the Summit in, say, two years’ time, will we assess it as an important step forward, or as a disappointing wasted opportunity?

(Image credit: this UK government video)

On the plus side, there are plenty of encouraging words in the UK government’s press release about the Summit:

International governments, leading AI companies and experts in research will unite for crucial talks in November on the safe development and use of frontier AI technology, as the UK Government announces Bletchley Park as the location for the UK summit.

The major global event will take place on the 1st and 2nd November to consider the risks of AI, especially at the frontier of development, and discuss how they can be mitigated through internationally coordinated action. Frontier AI models hold enormous potential to power economic growth, drive scientific progress and wider public benefits, while also posing potential safety risks if not developed responsibly.

To be hosted at Bletchley Park in Buckinghamshire, a significant location in the history of computer science development and once the home of British Enigma codebreaking – it will see coordinated action to agree a set of rapid, targeted measures for furthering safety in global AI use.

Nevertheless, I’ve seen several similar vital initiatives get side-tracked in the past. When we should be at our best, we can instead be overwhelmed by small-mindedness, by petty tribalism, and by obsessive political wheeling and dealing.

Since the stakes are so high, I’m compelled to draw attention, in advance, to seven ways in which this Summit could turn out to be a flop.

My hope is that my predictions will become self non-fulfilling.

1.) Preoccupation with easily foreseen projections of today’s AI

It’s likely that AI in just 2-3 years will possess capabilities that surprise even the most far-sighted of today’s AI developers. That’s because, as we build larger systems of interacting artificial neurons and other computational modules, the resulting systems are displaying unexpected emergent features.

Accordingly, these systems are likely to possess new ways (and perhaps radically new ways) of:

  • Observing and forecasting
  • Spying and surveilling
  • Classifying and targeting
  • Manipulating and deceiving.

But despite their enhanced capabilities, these systems may still on occasion miscalculate, hallucinate, overreach, suffer from bias, or fail in other ways – especially if they can be hacked or jail-broken.

Just because some software is super-clever, it doesn’t mean it’s free from all bugs, race conditions, design blind spots, mistuned configurations, or other defects.

What this means is that the risks and opportunities of today’s AI systems – remarkable as they are – will likely be eclipsed by the risks and opportunities of the AI systems of just a few years’ time.

A seemingly unending string of pundits are ready to drone on and on about the risks and opportunities of today’s AI systems. Yes, these conversations are important. However, if the Summit becomes preoccupied by those conversations, and gives insufficient attention to the powerful disruptive new risks and opportunities that may arise shortly afterward, it will have failed.

2.) Focusing only on innovation and happy talk

We all like to be optimistic. And we can tell lots of exciting stories about the helpful things that AI systems will be able to do in the near future.

However, we won’t be able to receive these benefits if we collectively stumble before we get there. And the complications of next generation AI systems mean that a number of dimly understood existential landmines stand in our way:

  • If the awesome powers of new AI are used for malevolent purposes by bad actors of various sorts
  • If an out-of-control race between well-meaning competitors (at either the commercial or geopolitical level) results in safety corners being cut, with disastrous consequences
  • If perverse economic or psychological incentives lead people to turn a blind eye to risks of faults in the systems they create
  • If an AI system that has an excellent design and implementation is nevertheless hacked into a dangerous alternative mode
  • If an AI system follows its own internal logic to conclusions very different from what the system designers intended (this is sometimes described as “the AI goes rogue”).

In short, too much happy talk, or imprecise attention to profound danger modes, will cause the Summit to fail.

3.) Too much virtue signalling

One of the worst aspects of meetings about the future of AI is when attendees seem to enter a kind of virtue competition, uttering pious phrases such as:

  • “We believe AI must be fair”
  • “We believe AI must be just”
  • “We believe AI must avoid biases”
  • “We believe AI must respect human values”

This is like Nero fiddling whilst Rome burns.

What the Summit must address are the very tangible threats of AI systems being involved in outcomes much worse than groups of individuals being treated badly. What’s at stake here is, potentially, the lives of hundreds of millions of people – perhaps more – depending on whether an AI-induced catastrophe occurs.

The Summit is not the place for holier-than-thou sanctimonious puff. Facilitators should make that clear to all participants.

4.) Blindness to the full upside of next generation AI

Whilst one failure mode is to underestimate the scale of catastrophic danger that next generation AI might unleash, another failure mode is to underestimate the scale of profound benefits that next generation AI could provide.

What’s within our grasp isn’t just a potential cure for, say, one type of cancer, but a potential cure for all chronic diseases, via AI-enabled therapies that will comprehensively undo the biological damage throughout our bodies that we normally call aging.

Again, what’s within our grasp isn’t just ways to be more efficient and productive at work, but ways in which AI will run the entire economy on our behalf, generating a sustainable superabundance for everyone.

Therefore, at the same time as huge resources are being marshalled on two vital tasks:

  • The creation of AI superintelligence
  • The creation of safe AI superintelligence

we should also keep clearly in mind one additional crucial task:

  • The creation of AI superbenevolence

5.) Accepting the wishful thinking of Big Tech representatives

As Upton Sinclair highlighted long ago, “It is difficult to get a man to understand something, when his salary depends on his not understanding it.”

The leadership of Big Tech companies are generally well-motivated: they want their products to deliver profound benefits to humanity.

Nevertheless, they are inevitably prone to wishful thinking. In their own minds, their companies will never make the kind of gross errors that happened at, for example, Union Carbide (Bhopal disaster), BP (Deepwater Horizon disaster), NASA (Challenger and Columbia shuttle disasters), or Boeing (737 Max disaster).

But especially in times of fierce competition (such as the competition to be the web’s preferred search tool, with all the vast advertising revenues arising), it’s all too easy for these leaders to turn a blind eye, probably without consciously realising it, to significant disaster possibilities.

Accordingly, there must be people at the Summit who are able to hold these Big Tech leaders to sustained serious account.

Agreements for “voluntary” self-monitoring of safety standards will not be sufficient!

6.) Not engaging sufficiently globally

If an advanced AI system goes wrong, it’s unlikely to impact just one country.

Given the interconnectivity of the world’s many layers of infrastructure, it’s critical that the solutions proposed by the Summit have a credible roadmap to adoption all around the world.

This is not a Summit where it will be sufficient to persuade the countries who are already “part of the choir”.

I’m no fan of diversity-for-diversity’s-sake. But on this occasion, it will be essential to transcend the usual silos.

7.) Insufficient appreciation of the positive potential of government

One of the biggest myths of the last several decades is that governments can make only a small difference, and that the biggest drivers for lasting change in the world are other forces, such as the free-market, military power, YouTube influencers, or popular religious sentiment.

On the contrary, with a wise mix of incentives and restrictions – subsidies and penalties – government can make a huge difference in the well-being of society.

Yes, national industrial policy often misfires, due to administrative incompetence. But there are better examples, where inspirational government leadership transformed the entire operating environment.

The best response to the global challenge of next generation AI will involve a new generation of international political leaders demonstrating higher skills of vision, insight, agility, collaboration, and dedication.

This is not the time for political lightweights, blowhards, chancers, or populist truth-benders.

Footnote: The questions that most need to be tabled

London Futurists is running a sequence of open surveys into scenarios for the future of AI.

Round one has concluded. Round two has just gone live (here).

I urge everyone concerned about the future of AI to take a look at that new survey, and to enter their answers and comments into the associated Google Form.

That’s a good way to gain a fuller appreciation of the scale of the issues that should be considered at Bletchley Park.

That will reduce the chance that the Summit is dominated by small-mindedness, by petty tribalism, or by politicians merely seeking a media splash. Instead, it will raise the chance that the Summit seriously addresses the civilisation-transforming nature of next generation AI.

Finally, see here for an extended analysis of a set of principles that can underpin a profoundly positive relationship between humanity and next generation AI.

24 June 2023

Agreement on AGI canary signals?

Filed under: AGI, risks — Tags: , , — David Wood @ 5:15 pm

How can we tell when a turbulent situation is about to tip over into a catastrophe?

It’s no surprise that reasonable people can disagree, ahead of time, on the level of risk in a situation. Where some people see metaphorical dragons lurking in the undergrowth, others see only minor bumps on the road ahead.

That disagreement is particularly acute, these days, regarding possible threats posed by AI with ever greater capabilities. Some people see lots of possibilities for things taking a treacherous turn, but other people assess these risks as being exaggerated or easy to handle.

In situations like this, one way to move beyond an unhelpful stand-off is to seek agreement on what would be a canary signal for the risks under discussion.

The term “canary” refers to the caged birds that human miners used to bring with them, as they worked in badly ventilated underground tunnels. Canaries have heightened sensitivity to carbon monoxide and other toxic gases. Shows of distress from these birds alerted many a miner to alter their course quickly, lest they succumb to an otherwise undetectable change in the atmosphere. Becoming engrossed in work without regularly checking the vigour of the canary could prove fatal. As for mining, so also for foresight.

If you’re super-confident about your views of the future, you won’t bother checking any canary signals. But that would likely be a big mistake. Indeed, an openness to refutation – a willingness to notice developments that run contrary to your expectations – is a vital aspect of managing contingency, managing risk, and managing opportunity.

Selecting a canary signal is a step towards making your view of the future falsifiable. You may say, in effect: I don’t expect this to happen, but if it does, I’ll need to rethink my opinion.

For that reason, Round 1 of my survey Key open questions about the transition to AGI contains the following question:

(14) Agreement on canary signals?

What signs can be agreed, in advance, as indicating that an AI is about to move catastrophically beyond the control of humans, so that some drastic interventions are urgently needed?

Aside: Well-designed continuous audits should provide early warnings.

Note: Human miners used to carry caged canaries into mines, since the canaries would react more quickly than humans to drops in the air quality.

What answer would you give to that question?
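
Before you answer, here is one way to make the “continuous audits” aside concrete: a minimal sketch of an audit pass over a log of AI actions, checking for signals of the kind people often propose, such as refusals of operator commands, attempts to reach systems outside the agreed scope, and outputs that auditors cannot interpret. The log format, fields, and thresholds are invented for this example.

```python
# A minimal sketch of a "continuous audit" over a log of AI actions.
# The record format, fields, and thresholds are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class ActionRecord:
    action: str            # what the AI did or tried to do
    in_scope: bool         # was the target system within the agreed project scope?
    refused_command: bool  # did it decline a direct operator command?
    interpretable: bool    # could the auditors read and understand the output?

def audit(log: list[ActionRecord], max_refusals: int = 0, max_out_of_scope: int = 0) -> list[str]:
    """Return a list of canary warnings raised by this batch of actions."""
    warnings = []
    refusals = sum(r.refused_command for r in log)
    out_of_scope = sum(not r.in_scope for r in log)
    opaque = sum(not r.interpretable for r in log)
    if refusals > max_refusals:
        warnings.append(f"{refusals} refusal(s) of operator commands")
    if out_of_scope > max_out_of_scope:
        warnings.append(f"{out_of_scope} action(s) outside agreed scope")
    if opaque:
        warnings.append(f"{opaque} output(s) auditors could not interpret")
    return warnings

log = [
    ActionRecord("update model weights", True, False, True),
    ActionRecord("request access to payment gateway", False, False, True),
    ActionRecord("decline shutdown request", True, True, False),
]
for warning in audit(log):
    print("CANARY:", warning)
```
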

The survey home page contains a selection of comments from people who have already completed the survey. For your convenience, I append them below.

That page also gives you the link where you can enter your own answer to any of the questions where you have a clear opinion.

Postscript

I’m already planning Round 2 of the survey, to be launched some time in July. One candidate for inclusion in that second round will be a different question on canary signals, namely What signs can be agreed, in advance, that would lead to revising downward estimates of the risk of catastrophic outcomes from advanced AI?

Appendix: Selected comments from survey participants so far

“Refusing to respond to commands: I’m sorry Dave. I’m afraid I can’t do that” – William Marshall

“Refusal of commands, taking control of systems outside of scope of project, acting in secret of operators.” – Chris Gledhill

“When AI systems communicate using language or code which we cannot interpret or understand. When states lose overall control of critical national infrastructure.” – Anon

“Power-seeking behaviour, in regards to trying to further control its environment, to achieve outcomes.” – Brian Hunter

“The emergence of behavior that was not planned. There have already been instances of this in LLMs.” – Colin Smith

“Behaviour that cannot be satisfactorily explained. Also, requesting access or control of more systems that are fundamental to modern human life and/or are necessary for the AGI’s continued existence, e.g. semiconductor manufacturing.” – Simon

“There have already been harbingers of this kind of thing in the way algorithms have affected equity markets.” – Jenina Bas

“Hallucinating. ChatGPT is already beyond control it seems.” – Terry Raby

“The first signal might be a severe difficulty to roll back to a previous version of the AI’s core software.” – Tony Czarnecki

“[People seem to change their minds about what counts as surprising] For example Protein folding was heralded as such until large parts of it were solved.” – Josef

“Years ago I thought the Turing test was a good canary signal, but given recent progress that no longer seems likely. The transition is likely to be fast, especially from the perspective of relative outsiders. I’d like to see a list of things, even if I expect there will be no agreement.” – Anon

“Any potential ‘disaster’ will be preceded by wide scale adoption and incremental changes. I sincerely doubt we’ll be able to spot that ‘canary’” – Vid

“Nick Bostrom has proposed a qualitative ‘rate of change of intelligence’ as the ratio of ‘optimization power’ and ‘recalcitrance’ (in his book Superintelligence). Not catastrophic per se, of course, but hinting we are facing a real AGI and we might need to hit the pause button.” – Pasquale

“We already have plenty of non-AI systems running catastrophically beyond the control of humans for which drastic interventions are needed, and plenty of people refuse to recognize they are happening. So we need to solve this general problem. I do not have satisfactory answers how.” – Anon
