Preliminary results suggest there will never be a major accident in a nuclear power plant. The odds on a major catastrophe were one in one billion to one in ten billion years for a given reactor.[Dr. Herbert Kouts, Head of AEC Division of Reactor Safety to Associated Press, 1974-01-14]
If another accident were to occur, I fear the general public will no longer believe any contention that the risk of a severe accident is so small as to be almost negligible.[Hans Blix, IAEA Director General to the IAEA Board of Governors, 1986-05-12]
Nothing can replace the knowledge that when all else fails, the consequences of the worst realistic casualty are tolerable.[Ted Rockwell, 2008]
Some choir members are puzzled why I never write about PRA (Probabilistic Risk Assessment). It is a good question. PRA is the centerpiece of the NRC's approach to nuclear safety analysis. Millions of ratepayer- and taxpayer-funded man-hours are expended on this activity annually. Plus I'm supposedly an expert on probability. I taught probability at MIT. My first book was on Bayesian decision making. The reason I don't write about PRA is that, as practiced by the NRC, it's a bunch of crap. But that's not good enough. I need to tell you why.
The Event Tree is a Fractal Bush
To implement PRA, we need to enumerate all possible casualties and then create a tree of all the possible events that could lead up to each casualty. If such a tree exists, it is a fractal bush, which no matter how detailed could always be made more detailed. And if we could somehow come up with this bush, we would not only have to assign probabilities to the infinite number of branches, but also to all the possible interdependencies, which are factorial in the number of branches.
This is manifestly impossible. In practice, the tree is a tiny subset of all the possibilities, which minuscule subset is chosen by the applicant, and perhaps expanded a little by the NRC. The result is unrepresentative of the real world.[1] It should come as no surprise that almost all nuclear casualties to date involved a series of events that were not in the PRA tree.
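The scale problem can be sketched with a quick count (a toy illustration; the branch counts are made up, not taken from any actual PRA):

```python
from math import comb

# A binary event tree with n sequential branch points has 2**n leaf
# sequences, each needing a probability. A full treatment would also
# need the interdependencies among branches; even counting only the
# two-way ones, that is comb(n, 2) more numbers.
for n in (10, 20, 40):
    print(f"{n} branch points: {2 ** n:,} sequences, "
          f"{comb(n, 2)} pairwise dependencies")
```

Forty branch points already imply over a trillion sequences, and real casualty chains are not binary.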
In March, 1975, a workman accidentally set fire to the sensor and control cables at the Browns Ferry Plant in Alabama. He was using a candle to check the polyurethane foam seal that he had applied to the opening where the cables entered the spreading room. The foam caught fire and this spread to the insulation. The whole thing got out of control and the plant was shut down for a year for repairs. Are we to blame the PRA analysts for not including this event in their event tree? (If they did, what should they use for the probability?) Not if we are rational. The blame should be for focusing on the event tree instead of picking a non-flammable sealant and insulation.
Figure 1. Davis-Besse Loss of Feedwater Event Tree
Here's another example. On June 9, 1985, the Davis-Besse plant experienced a complete loss of feedwater. That's a major problem. The casualty sequence included 12 different equipment failures and one operator error: the operator hit the wrong pair of poorly marked buttons. Figure 1 shows the NRC's diagram of a tiny portion of the PRA event tree, with the red line supposedly representing the sequence that actually happened.\cite{nrc-2008} At each node, the lower branch is Fails, the upper branch is Works. PRA requires that we put probabilities on all these branches. According to this figure, there were 39 possible sequences.
But this post-hoc drawing is not representative of this casualty. The 12 failures identified by the investigation team have been turned into three. The operator error isn't even shown, in part because how do you put probabilities on human screw-ups? And failures don't have to be binary. Partial failures are not uncommon. In this case, a pressure relief valve called a PORV worked twice, then failed open, then later closed itself. A semi-realistic event tree of this casualty would require a drawing the size of a large table. A semi-realistic event tree of all possible casualty sequences would require a drawing the size of a football field.
PRA probabilities range from unreliable to meaningless
Suppose we could come up with a meaningful event tree. Now we must put probabilities on each of the branches. On many of the branches, we will be dealing with extremely rare events, often events that have never happened. In such cases, we have no data on which to base a probability. But PRA says we must have a probability. So we concoct them. There are two ways to do this:
A. Build a Model
These models require a whole range of arguable assumptions. Almost invariably, one or more of these assumptions is crucial to the probability that emerges from the model. The problem becomes:
1. Create a model and set of assumptions that cranks out the target probability.
2. Convince the NRC guy that the model and the assumptions are acceptable.

What comes out of this process is a negotiated number. Different negotiators will end up with different numbers. This is inherent in a situation where we do not have the data needed to come up with an objective probability. The problem is compounded by the multiplicative nature of probabilities: it takes only one incorrectly low number in a chain of probabilities to render the output meaningless.
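The multiplicative problem is easy to see in a toy sequence (all numbers invented for illustration): one negotiated-down branch moves the final answer by the full amount.

```python
from math import prod

# Branch probabilities for one hypothetical casualty sequence.
honest = [1e-2, 3e-2, 5e-2, 1e-1, 2e-1]
print(f"honest product:     {prod(honest):.1e}")      # 3.0e-07

# Negotiate a single branch down by a factor of 100 ...
negotiated = list(honest)
negotiated[2] /= 100
# ... and the sequence probability drops by that same factor of 100.
print(f"negotiated product: {prod(negotiated):.1e}")  # 3.0e-09
```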
B. Make the numbers up
Even with models, there will still be blanks. To fill in these blanks, PRA uses the aptly named Delphi Method, although the NRC prefers ``expert elicitation". The Delphi Method is based on asking a group of ``experts" what they think the probability of an event is, often an event that has never happened. Sometimes the answers differ by a factor of 1000 or more.\cite{wellock-2021}[p 71] You mush all the guesses together into a distribution, from which you grab a statistic, say the mean, which you then treat as if it were an objective probability, as if you had all sorts of data on the event in question. Pick the right experts and you can come up with just about any target number. PRA is supposed to be objective, but there is nothing objective about unsupported opinions.
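Here is what that mushing-together looks like with hypothetical expert guesses spanning a factor of 1000. Even the choice of summary statistic moves the answer by nearly a factor of ten:

```python
import statistics

# Hypothetical expert guesses for the probability of a never-observed
# event, per reactor-year, spanning a factor of 1000.
guesses = [1e-7, 5e-7, 2e-6, 1e-5, 1e-4]

print(f"arithmetic mean: {statistics.mean(guesses):.1e}")           # 2.3e-05
print(f"geometric mean:  {statistics.geometric_mean(guesses):.1e}") # 2.5e-06
# Neither number is any more "objective" than the guesses behind it.
```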
Figure 2. NRC vs plant estimate of increase in CDF due to fault
Often the NRC's and the plant's probabilities don't match. If an inspection reveals a fault, the NRC bins the problem: green, white, yellow or red. The increase in Core Damage Frequency (CDF) associated with the fault is calculated by the operator using his PRA model and by the NRC using the Standardized Plant Analysis Risk (SPAR) model. Figure 2 compares the two numbers for five of the yellow and red faults.\cite{lochbaum-2015} The y-axis is logarithmic. None of the numbers match within a factor of ten. In one case, the NRC number is 800 times higher than the plant's number. These analyses cover a tiny, well-defined portion of the tree, performed by two groups supposedly following the same rules. Both sets of numbers are meaningless.
PRA more important than the design
Despite the impossibility of doing a meaningful PRA, Probabilistic Risk Assessment has become the principal focus of the applicant and the NRC. Events that are not in the tree are ignored. The focus is not on a robust, well-engineered design but on making the number: convincing the NRC that the PRA proves the design meets the target probabilities. People who are good at this make great salesmen, lawyers, and politicians. They tend to be lousy engineers. Good engineers, when presented with a bogus number, have this nasty habit of saying it looks like a bogus number. To get through the process, the applicant needs to put the salesmen in charge. The wrong people get promoted, and this starts a vicious circle in which like picks like in the promotion process.
It also creates a cottage industry of PRA experts, hired guns who claim to know the secrets of getting through the process. When these people are not out selling their magic potion, they are spending their time on various industry groups, strengthening PRA, making sure that PRA is more firmly ingrained into the regulatory process, producing still more consulting fees.
Here's a proposition for these experts in probability. I will bet $10,000 even money that the next significant release involves a chain of events that was not in the plant's PRA event tree. Any takers?
PRA breeds complexity
One way of making the number is to add layers of backup or redundancy. Double or triple the number of pumps or valves. Tack on safety system after safety system. This is often called Defense in Depth. As long as you assume independent failures, with enough layers and redundancy, you can hit any target probability. But you also make the system exponentially more complex. You add new failure modes and factorially more interdependencies, some of which you will not catch. And you multiply the number of individual failures which put the system in a non-normal state.[2] PRA favors fragile, complex designs over robust, simple designs.
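A standard way to see how the independence assumption flatters redundancy is the beta-factor model of common-cause failure, in which a fraction beta of failures take out every redundant train at once. The numbers below are illustrative, not from any plant:

```python
p = 1e-2      # per-demand failure probability of one train (illustrative)
beta = 0.05   # fraction of failures that are common cause (illustrative)

for n in (1, 2, 3, 4):
    independent = p ** n                          # naive multiplication
    with_ccf = ((1 - beta) * p) ** n + beta * p   # beta-factor estimate
    print(f"{n} trains: independent {independent:.0e}, "
          f"with common cause {with_ccf:.1e}")
# Independence promises 1e-08 for four trains; the common-cause floor
# (beta * p = 5e-04) means the real gain from the extra trains is tiny.
```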
And then a common mode casualty comes along and wipes out your redundancy. In August, 1984, the Indian Point plant lost all its emergency cooling water pumps. The pumps were all in the same space; the space flooded and all the motors shorted out.[3] Much the same thing happened at San Onofre, 1982-02-27, at Cooper, 1984-04-04, at LaSalle, 1985-05-31, at Hatch, 1985-12-21, and most importantly at Fukushima.
``Adding provisions to solve a non-problem merely provides additional paths to failure." Ted Rockwell.\cite{rockwell-2008} Zirconium sheets covering the stainless steel core spreader in the Fermi plant were a last-minute safety addition to handle an event that was later determined to be impossible. But they also added a new failure mode that apparently no one thought much about. In operation, some of the zirconium pulled off the steel, balled up, and clogged some of the coolant channels, which overheated portions of the core. The plant was shut down for four years to try to correct this.
Bogus Probabilities will be Misused
In a light water reactor, the used fuel elements are transferred from the core to a spent fuel pool where they are allowed to cool under water for about four years. The water provides both shielding and cooling. The original plan was that after cooling for four years the fuel elements would be sent to a reprocessing facility or a centralized air cooled repository. But in the US, both reprocessing and a repository got hung up in political wrangling and neither materialized. The obvious fallback was on-site dry cask storage. But dry cask storage adds about 0.03 to 0.06 cents per kWh to the cost of the electricity.\cite{alvarez-2003}
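For scale, here is a rough annual figure for one large reactor, using assumed round numbers (1 GW net output, 90% capacity factor, 0.05 cents per kWh adder):

```python
# Rough annual dry-cask cost for one reactor (assumed round numbers:
# 1 GW net, 90% capacity factor, 0.05 cents per kWh adder).
annual_kwh = 1_000_000 * 8760 * 0.90       # kW * hours/year * capacity factor
annual_cost = annual_kwh * (0.05 / 100)    # dollars
print(f"about {annual_cost / 1e6:.1f} million dollars per year")  # about 3.9
```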
Most spent fuel pools are outside containment and many are elevated. They could be damaged and drained by a screw-up, a natural event such as an earthquake, or a terrorist attack. If the fuel elements overheat to about 600C, the gas pressure inside the elements will burst the cladding and cause a release. Therefore, the original plan called for \DEF{open-racking}. The fuel elements were spaced far enough apart that, even if the pool drained, air cooling by natural circulation would keep the elements below the temperature at which the cladding would rupture. It was a good plan.
But when the spent fuel pools started filling up, the NRC approved \DEF{dense-packing}, which quadrupled the capacity of the pools by encasing each bundle of fuel elements in a neutron absorbing shield to avoid criticality. The problem is that the NRC's own study indicated that air cooling would no longer keep the elements intact if the pool were drained.\cite{sandia-1979} The NRC justified dense-packing by doing a PRA, which came up with a probability of pool draining of less than one in one million per pool year. I have no idea how they arrived at this probability. The NRC itself admitted that the probability does not take into account terrorist attacks.
So now we have some 35,000 tons of used fuel sitting in vulnerable spent fuel pools waiting for something bad to happen and cause a major release in order to put off spending about 0.05 cents per kWh for a few years. Absolutely nuts, but with bogus probabilities you can defend just about anything.
PRA means we don't have to test. Glory be.
PRA was concocted by the 1974 Reactor Safety Study (RSS). Its job was to show that the worst case in the Brookhaven Study (WASH-740) had such an extremely low probability that we don't have to worry about it. The RSS was given this job after Brookhaven National Laboratory, despite intense pressure from the AEC, refused to come up with this probability, honestly saying: ``a quantitative determination of reactor accident probabilities cannot be made at this time due to the paucity of input data."\cite{ford-1982}[p 77] At the time, the RSS results were considered fraudulent by almost all statisticians.
The RSS was reviewed by the Lewis Panel, a group of prominent physicists, almost all of whom consulted to the US government. As they politely put it, ``Based on our experience with problems of this nature involving very low probabilities, we do not now have confidence in the presently calculated values of the probabilities."\cite{lewis-1975} In other words, your probabilities are bogus. Steven Hanauer, one of the key NRC organizers of the RSS, had written as early as 1971, ``I do not consider the numerical results [from fault tree analysis] to be reliable."\cite{ford-1982}[p 146] Even the NRC itself agrees. In 1979, the Commission announced:
In the light of the [Lewis] Review Group's conclusions on accident probabilities, the Commission does not regard as reliable the Reactor Safety Study's numerical estimate of the overall risk of reactor accident.\cite{nrc-1979a}
Despite this, PRA was pounced on by the industry and the NRC, and became not just part of the regulatory process but its centerpiece. The reason: PRA relieved the industry of the need to do full scale casualty tests. The PRA paperwork might be horribly expensive, but it was a hell of a lot cheaper than building a plant just to put it through a series of rigorous stress tests.
For new nukes, PRA is a Catch 22
If existing nuclear technologies can't produce meaningful event trees and probabilities, think where that puts nuclear technologies for which we have no operating experience. We need a PRA before we can get a license. But to do a meaningful PRA, we need all sorts of probabilities, and to get the data behind those probabilities we need operating experience and a set of casualty tests. But we can't test without a license. Catch PRA.
PRA is a stupid lie
PRA is an embodiment of the nuclear power establishment's philosophy that any major casualty is intolerable, where a major casualty is defined as any unplanned release of radioactive material. The perception is that nuclear has to be perfect, or at least claim to be perfect, for political reasons. Since we can't actually say a large release is impossible, we use PRA to produce astronomically low probabilities and use those to imply that it is virtually impossible, or in the industry jargon ``not credible".
This is a stupid, self-defeating lie. Radioactive releases are inevitable; and when they do happen, public trust is lost for a very long time, if not forever. While we should take reasonable measures to make casualties like large radioactive releases rare, the real issue is what are the consequences of the casualty. How many people were killed? How many were injured? And most importantly, how does this compare with the alternatives?
Real Safety Analysis focuses at least as strongly on the consequences as on the casualty itself. In dealing with the casualty itself, the underlying principle is: if it can happen, it will happen. This avoids made-up probabilities. It avoids a lie that is certain to backfire. And now we can go about the process of designing plants which have a reasonably low --- albeit unknown --- probability of major casualties and, when those casualties occur, reasonably low consequences.
If prior to TMI the nuclear power establishment had said
We are working hard to make casualties such as core meltdown very rare. But sooner or later we will have a major casualty at a nuclear plant, and, when that happens, we have taken a series of measures to ensure that over time nuclear will result in far fewer deaths and injuries than coal, or gas, or oil.
Then when TMI happened, the establishment would have been able to say:
Damn, we had a major casualty. Lost a brand new plant. We will learn from it just like the airlines learn something from every crash, and use that info to make such casualties rarer.
But thank God, the casualty was almost entirely contained and nobody was hurt. Nuclear remains by far the safest source of electricity. This slide shows the up to date numbers.
No lies. No loss of trust.
Footnotes

[1] Prior to the Three Mile Island release, a key weapon in pruning the bush down to a manageable tree was the ``single failure criterion", which was interpreted to mean we don't have to consider sequences of events involving multiple failures. This defied experience, in which the vast majority of major casualties involve a chain of failures, the non-occurrence of any one of which would have avoided the actual outcome. TMI gave birth to the Interim Reliability Evaluation Program, which was supposed ``to identify high risk accident sequences and determine regulatory initiatives to reduce these high-risk sequences". What in the world was PRA doing up to this point? In any event, the IREP goal explicitly admits we are looking at only a very small part of the bush.
[2] One consequence is more costly shutdowns. If we add a fourth pump to get more redundancy, we increase the probability that some pump is in a failed state at any given time. But under NRC rules, if any of the pumps is down, the plant must shut down until it is fixed.
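The arithmetic behind this, with an illustrative per-pump unavailability:

```python
# Probability that at least one of n identical pumps is down, assuming
# each is independently unavailable with probability q (illustrative).
q = 0.02
for n in (3, 4):
    print(f"{n} pumps: P(at least one down) = {1 - (1 - q) ** n:.3f}")
# The fourth pump raises the chance of a forced shutdown from
# roughly 0.059 to roughly 0.078.
```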
[3] The flooding required three valves in series to fail. The valves were rarely tested. PRA would prescribe a fourth valve. A much better solution would be frequent tests of a two-valve system. And if you are depending on pump redundancy, don't put the pumps in the same space.
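The trade-off can be sketched with the standard standby-unavailability approximation lambda*T/2 (failure rate times test interval, valid when the product is small; the rates and intervals below are illustrative):

```python
lam = 0.05                 # valve failures per year (illustrative)
u_rare = lam * 8 / 2       # unavailability, tested every 8 years -> 0.20
u_freq = lam * 0.25 / 2    # unavailability, tested quarterly     -> 0.00625

print(f"three rarely tested valves:   {u_rare ** 3:.1e}")   # 8.0e-03
print(f"two frequently tested valves: {u_freq ** 2:.1e}")   # 3.9e-05
# Two well-tested valves beat three neglected ones by a factor of ~200.
```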