Repeater Failure Planning: SOPs for When the Public Safety System Goes Down

A repeater dies. A trunked site goes offline. A microwave link drops in a thunderstorm. Your crews are still on a working fire and the radios just got a lot quieter. This is the SOP framework that keeps everyone alive and on the same channel when the system fails.

In this guide
  1. Why repeater failure planning is not optional
  2. How public safety systems actually fail
  3. Detection: knowing the system is down
  4. The four-tier fallback model
  5. Simplex and talkaround tactics on the fireground
  6. A working SOP template
  7. Training and exercise: making the SOP muscle memory
  8. Capturing what happened: the after-action loop

Why repeater failure planning is not optional

Most departments that run on a county or regional public safety system treat the radio system the way they treat the power grid. It works. It almost always works. The few times it doesn't are uncomfortable but brief.

Then a tower loses commercial power, the generator fails to transfer, the backup batteries run for six hours, and in hour seven you are two hours into a working structure fire. Now your crews are inside a residential structure and the IC's portable will not key up. That is not a hypothetical. Versions of that sequence appear in FEMA after-action reports going back two decades, with surprising regularity.

FEMA's after-action publications from major incidents repeatedly list communications failures as a primary contributing factor to firefighter and law enforcement injuries. The 9/11 Commission Report flagged radio interoperability as a critical gap. After-action reviews from Hurricane Katrina, the 2017 California wildfires, and Hurricane Maria all describe extended periods where public safety repeaters and trunked sites were offline for hours or days. The pattern is consistent: when systems go down, departments that planned for it kept operating. Departments that didn't, didn't.

The Department of Homeland Security has been telling us this for twenty years. The SAFECOM Interoperability Continuum and the National Emergency Communications Plan (NECP) both name resilience and survivability as core capabilities. Translation for a working chief: your primary system will fail at some point, and your job is to have a plan that does not depend on it.

The hard truth

If your "backup plan" is a single sentence in a binder that says "switch to channel 2," you don't have a plan. You have a wish.

How public safety systems actually fail

Knowing the failure modes shapes the SOP. Here are the ones that show up most often in real outages.

Single repeater failure

The simplest case. One conventional VHF or UHF repeater goes down. The cause could be a transmitter PA failure, a duplexer issue, antenna or feedline damage from ice or wind, or a failed power supply. Coverage on that channel collapses to whatever simplex range your portables have, which on a fireground is typically a quarter-mile to a mile depending on terrain and building construction.

Site failure on a trunked system

Trunked P25 systems (Phase 1 or Phase 2) usually have multiple sites. When one site loses its backhaul to the master controller, or the controller itself goes down, the site can drop into "site trunking" mode, which keeps local talkgroups working but cuts off cross-site communication. If the site loses everything, radios in that coverage area will scan and try to affiliate with the next strongest site, which may be too far away. Depending on the radio model, crews may see a message such as "Out of Range" or "No Service" on the display.

Master controller / core failure

Rare but catastrophic. The trunked controller dies and every site drops into site trunking or fails over to a backup core. Patches between agencies break. Console operations at dispatch may go down. This is the failure mode that takes a regional system offline for hours.

Backhaul failure

Microwave links, fiber circuits, or T1/IP backhaul between sites and the dispatch center fail. The repeaters and sites might be physically fine, but they cannot reach dispatch. You will hear unit-to-unit traffic but no dispatch acknowledgment.

Power failure with depleted backup

Commercial power drops, generators don't start (or run out of fuel), batteries last their rated time and then quit. Most public safety sites are spec'd for 8 to 24 hours of battery backup and 72 hours of generator runtime if the fuel is topped off. In a multi-day storm event, that is not enough.

Console or CAD failure

The radio system is fine. The dispatch consoles are not. Dispatchers can't transmit, can't see talkgroup activity, or the CAD that drives status changes is down. Field units can still talk to each other but coordination from dispatch evaporates.

Cyber events

Increasingly common. A ransomware event hits the county IT network and either takes CAD down or affects the IP backhaul that the radio system rides on. Treat this as a backhaul failure but assume restoration takes days, not hours.

Each of these failure modes calls for a different fallback. A single repeater failure means you switch to a different channel. A master controller failure means you might be running the entire incident on direct simplex.

Detection: knowing the system is down

You cannot fall back to a plan if you do not know the system has failed. The first 30 seconds of an outage are the worst, because crews assume the problem is on their end. They check their volume, swap batteries, change channels. Meanwhile the IC has lost accountability.

A good SOP defines failure indicators clearly:

Build the failure check into your radio discipline. The first unit on scene calling for a working fire should always get an explicit acknowledgment from dispatch. If they don't, the IC's first action is a radio check, not a tactical assignment.
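The acknowledgment rule above is simple enough to express as a counter, which can help when turning it into a CAD alert or a training aid. The sketch below is a hypothetical helper, not part of any radio or CAD product; the class name and both threshold values are assumptions. A threshold of one matches the working-fire rule above; a looser threshold is shown for routine traffic.

```python
from dataclasses import dataclass


@dataclass
class AckMonitor:
    """Counts consecutive transmissions that dispatch did not acknowledge.

    Hypothetical helper for illustration only. threshold=1 mirrors the rule
    that a single unanswered working-fire report triggers a radio check.
    """
    threshold: int = 1
    unacked: int = 0

    def record(self, acknowledged: bool) -> bool:
        """Log one transmission; return True once a radio check is warranted."""
        if acknowledged:
            self.unacked = 0  # any acknowledgment resets the streak
        else:
            self.unacked += 1
        return self.unacked >= self.threshold


# Working-fire report goes unanswered: radio check immediately.
print(AckMonitor().record(acknowledged=False))  # True

# Routine traffic with a looser, assumed threshold of three misses in a row.
routine = AckMonitor(threshold=3)
results = [routine.record(a) for a in [False, False, True, False, False, False]]
print(results)  # [False, False, False, False, False, True]
```

Resetting on any acknowledgment keeps a single garbled transmission from being mistaken for an outage; only an unbroken run of silence counts.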

Train dispatchers on the announcement

When dispatch detects an outage (console alarm, supervisor notification, or repeated unit complaints), they should broadcast on every working channel: "Attention all units, the [system name] is experiencing a partial outage. All units are directed to [fallback channel] effective immediately." Short, scripted, repeatable. Many comm centers do not have this script written down. Write it.
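If your comm center keeps scripts in a CAD macro or quick-reference tool, the announcement can be stored as a fill-in template so the only decisions under pressure are the system name and the fallback channel. A minimal Python sketch; the function name, the `scope` parameter, and the sample system and channel names are assumptions for illustration.

```python
def outage_announcement(system_name: str, fallback_channel: str, scope: str = "partial") -> str:
    """Return the scripted outage broadcast with the blanks filled in.

    Refuses empty fields so a dispatcher never reads a bracketed placeholder on the air.
    """
    if not system_name.strip() or not fallback_channel.strip():
        raise ValueError("system name and fallback channel are both required")
    return (
        f"Attention all units, the {system_name} is experiencing a {scope} outage. "
        f"All units are directed to {fallback_channel} effective immediately."
    )


# Hypothetical system and channel names; substitute your own.
script = outage_announcement("County P25 system", "FALLBACK 2")
print(script)
```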

The four-tier fallback model

DHS and SAFECOM materials describe interoperability in tiers. Adapt the same idea to failure planning. Build your SOP around four tiers, in order, and train every member to know which tier they're operating on.

Tier 1: Primary system, normal operation

Your daily talkgroup or repeater. Dispatch in the loop. Status changes via CAD or radio. Mutual aid available through patches.

Tier 2: Alternate channel on the same infrastructure

A different talkgroup or a different repeater on the same trunked system or conventional infrastructure. Useful when one channel is overloaded or one repeater is down but the system itself is fine. Dispatch is still in the loop. Most departments already use this for major incidents (a dedicated tactical channel).

Tier 3: Independent secondary system

A separate radio system not dependent on the primary. Examples that small and mid-size departments actually use:

Tier 4: Direct simplex / talkaround

No infrastructure. Radio to radio, line of sight, transmitting on a simplex frequency or, in talkaround, on the repeater's output frequency (the one your radios normally receive on) so other radios hear you without the repeater. This is the last resort and the most likely tier you will actually need on a fireground when the local repeater fails. Range is short, terrain matters, and dispatch is not in the loop unless someone is relaying.

The order matters. A good SOP does not jump from Tier 1 to Tier 4. The IC moves down one tier at a time, makes the call once, and broadcasts the change to all units before continuing tactical operations. Anyone who didn't get the change is now on a different channel from the rest of the crew. That is how people get hurt.
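The one-tier-at-a-time rule behaves like a small state machine: from any tier there is exactly one legal next step. The sketch below is illustrative; the `Tier` names and the broadcast wording are assumptions, not language from any standard.

```python
from enum import IntEnum


class Tier(IntEnum):
    """The four fallback tiers; a higher number means less infrastructure."""
    PRIMARY = 1    # Tier 1: primary system, normal operation
    ALTERNATE = 2  # Tier 2: alternate channel on the same infrastructure
    SECONDARY = 3  # Tier 3: independent secondary system
    SIMPLEX = 4    # Tier 4: direct simplex / talkaround


def step_down(current: Tier) -> Tier:
    """Move exactly one tier lower; there is no skipping from Tier 1 to Tier 4."""
    if current is Tier.SIMPLEX:
        raise ValueError("already on Tier 4; there is no lower tier")
    return Tier(current + 1)


def change_order(current: Tier) -> str:
    """Wording the IC broadcasts to all units before resuming tactical operations."""
    new = step_down(current)
    return f"All units, switch from Tier {current.value} to Tier {new.value} ({new.name.lower()})."


print(change_order(Tier.PRIMARY))  # All units, switch from Tier 1 to Tier 2 (alternate).
```

Because `step_down` only ever returns the next tier, the message always names both the tier being abandoned and the one being joined, which is what a crew member who missed the first call needs to hear.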

The pre-programmed "fallback bank"

Every portable in your department should have a clearly labeled zone or bank called something like "FALLBACK" or "BACKUP" with the Tier 2, Tier 3, and Tier 4 channels in a fixed order. Same order on every radio. Same labels. If you are reaching for the knob in a smoke-filled hallway, you should not have to think about which position you need.

Simplex and talkaround tactics on the fireground

When you drop to direct simplex on a working incident, the rules of the fireground change. You no longer have system-wide coverage. You have a bubble around the apparatus that may extend a quarter-mile or more in good conditions and shrink to 200 feet inside a residential structure with the doors closed.

Adjust tactics accordingly:

NFPA 1221 (now consolidated into NFPA 1225) deals with public safety communications systems and addresses backup and survivability for the dispatch side. NFPA 1561 covers incident management and the radio discipline expected on the ground. Both are worth pulling off the shelf when you write the SOP.

A working SOP template

Here is a structure that works for departments running 25 to 200 members. Adapt the specifics to your jurisdiction, but keep the bones.

Section 1: Purpose and scope

One paragraph. Names the purpose of the SOP, the systems it covers, and the personnel it applies to. Reference DHS / SAFECOM guidance and NFPA 1225 / 1561 as the authority basis.

Section 2: Definitions

Define terms explicitly. Repeater. Trunked site. Site trunking. Talkaround. Simplex. Mutual aid channel. Patch. Console. Tier 1 through Tier 4. Don't assume members coming through new academies know all of these. Many do not.

Section 3: Failure indicators

Bullet list of the five or six things that mean "the system is degraded or down." From the detection section above.

Section 4: Tier definitions

Name the actual channels at each tier. Be specific.

Frequencies are illustrative. Use yours.

Section 5: Decision authority

Who can call the tier change? Spell it out. Typically:

The IC does not need permission to drop to a lower tier on their incident. They need to announce it.

Section 6: Notification procedure

When a tier change is called, the announcement is made on the channel being abandoned and on the new channel. Three repetitions, slowly, with PAR (Personnel Accountability Report) to follow within five minutes on the new channel. Every member acknowledges the channel change. Anyone who doesn't is presumed lost on the old channel and someone is dispatched to find them physically.
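For departments that run accountability boards or tablets, the acknowledgment rule reduces to a set difference: everyone on the roster minus everyone who acknowledged. A minimal sketch, assuming the accountability officer records each acknowledgment by name; the function names, unit designators, and times are hypothetical.

```python
from datetime import datetime, timedelta

PAR_WINDOW = timedelta(minutes=5)  # PAR due within five minutes of the tier change


def presumed_lost(roster: set[str], acknowledged: set[str]) -> list[str]:
    """Members who never acknowledged the change: presumed still on the old channel."""
    return sorted(roster - acknowledged)


def par_overdue(change_time: datetime, now: datetime, par_complete: bool) -> bool:
    """True when the PAR on the new channel has not been completed in time."""
    return not par_complete and now - change_time > PAR_WINDOW


# Hypothetical unit designators and times.
roster = {"E1 Captain", "E1 FF-1", "E1 FF-2", "L1 Lieutenant"}
acks = {"E1 Captain", "E1 FF-1", "L1 Lieutenant"}
print(presumed_lost(roster, acks))  # ['E1 FF-2']
changed = datetime(2024, 3, 2, 1, 12)
print(par_overdue(changed, datetime(2024, 3, 2, 1, 18), par_complete=False))  # True
```

Any name returned by `presumed_lost` is the member someone goes to find physically.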

Section 7: Dispatch coordination

If dispatch is still operational, they switch to the new tier with the field units. If dispatch is part of the failure (console down, CAD down), the SOP names the alternate dispatch process. That might be the on-duty BC running tactical dispatch from a chief's vehicle, or a neighboring center taking over on a backup talkgroup. Have it written.

Section 8: Mutual aid coordination

Name the regional plan you participate in. Reference the DHS NECP and your Statewide Communication Interoperability Plan (SCIP) by name. List the interop channels programmed in your radios and the agencies pre-authorized to share them.

Section 9: Documentation requirements

The IC documents the tier change, time, reason, and duration in the incident report. The shift commander notifies the radio system manager (county comm tech or vendor) of the failure with timestamps. This is what feeds the after-action loop.
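If tier changes are logged electronically, a fixed record shape keeps every entry carrying the same four facts: tier change, time, reason, and duration. The sketch below is one possible shape; the field names, the summary format, and the sample incident number are assumptions, not a reporting standard.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class TierChange:
    """One documented tier change. Field names are illustrative, not a standard."""
    incident: str
    from_tier: int
    to_tier: int
    reason: str
    started: datetime
    ended: Optional[datetime] = None  # None while units are still on the fallback tier

    def duration(self) -> Optional[timedelta]:
        return None if self.ended is None else self.ended - self.started

    def summary(self) -> str:
        """One-line, timestamped note for the incident report and the radio system manager."""
        d = self.duration()
        length = "ongoing" if d is None else f"{int(d.total_seconds() // 60)} min"
        return (f"{self.incident}: Tier {self.from_tier} -> Tier {self.to_tier} "
                f"at {self.started:%Y-%m-%d %H:%M}, reason: {self.reason}, duration: {length}")


# Hypothetical incident number and times.
change = TierChange("24-01234", 1, 2, "local repeater down",
                    datetime(2024, 3, 2, 1, 12), datetime(2024, 3, 2, 2, 47))
print(change.summary())
# 24-01234: Tier 1 -> Tier 2 at 2024-03-02 01:12, reason: local repeater down, duration: 95 min
```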

Section 10: Training and drill requirements

Frequency of drills. Who runs them. What is tested. Covered in the next section.

Radios that aren't programmed are radios that don't work

None of this matters if half your portables don't have the fallback channels programmed, or have them in a different bank position than the apparatus radios. Auditing radio programming is a one-day project that prevents an indefinite problem. Do it once a year, after every fleet upgrade, and after every new member is issued a radio.
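The audit itself is a comparison against one reference layout. If your radio shop can export each radio's zone or bank contents to text, a short script can flag mismatches. The export format, radio IDs, and channel labels below are all assumptions; adapt them to whatever your programming software actually produces.

```python
# Hypothetical FALLBACK bank labels, in the fixed order every radio should carry.
REFERENCE_BANK = ["TIER2 ALT", "TIER3 SEC", "TIER4 SMPLX"]


def audit_radio(radio_id: str, bank: list[str]) -> list[str]:
    """Compare one radio's FALLBACK bank against the reference, position by position."""
    problems = []
    for pos, expected in enumerate(REFERENCE_BANK, start=1):
        if pos > len(bank):
            problems.append(f"{radio_id}: position {pos} missing, expected {expected}")
        elif bank[pos - 1] != expected:
            problems.append(f"{radio_id}: position {pos} is {bank[pos - 1]}, expected {expected}")
    return problems


def audit_fleet(fleet: dict[str, list[str]]) -> list[str]:
    """Audit every radio; an empty result means the whole fleet matches."""
    return [p for radio_id, bank in sorted(fleet.items()) for p in audit_radio(radio_id, bank)]


fleet = {
    "P-101": ["TIER2 ALT", "TIER3 SEC", "TIER4 SMPLX"],  # correct
    "P-102": ["TIER2 ALT", "TIER4 SMPLX", "TIER3 SEC"],  # positions swapped
    "P-103": ["TIER2 ALT"],                              # incomplete programming
}
for line in audit_fleet(fleet):
    print(line)
```

Checking position as well as presence matters here: a radio with every fallback channel programmed but in the wrong slots fails the "same order on every radio" requirement just as surely as one that is missing a channel.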

Training and exercise: making the SOP muscle memory

An SOP that has never been drilled is a document, not a procedure. The members who need to act on it during an actual outage are not going to read it on the fireground.

Tabletop, monthly

15 minutes at the start of a training night or shift meeting. The training officer presents a scenario: "It is 0200 on a Tuesday. You are at a working basement fire on Maple Street. Dispatch has gone silent and your last three transmissions were not acknowledged. Walk me through what you do." Cycle through scenarios for repeater failure, trunked site failure, console failure, full system outage. Members talk through the response. Five minutes of debrief.

Radio drill, quarterly

Take a piece of apparatus to the worst part of your district: a low spot, a steel-frame industrial building, a basement. Have the crew operate on Tier 4 simplex from inside. Map the dead spots. The fact that simplex doesn't reach the IC from the southeast corner of the church basement is something you want to know on a Saturday morning, not on a working fire.

Live exercise, annually

Coordinate with the 911 center and run a planned outage. Switch the entire department to Tier 3 for a two-hour block, with dispatch coordinating on the alternate channel. Run actual calls (low-acuity ones, with a standing plan to switch back if anything serious comes in). Document what broke. Practice the PAR on the alternate channel.

Include mutual aid

At least once a year, your drill should include the neighboring departments who would arrive on a mutual aid box alarm. They need to know your fallback channels and you need to know theirs. Regional comm exercises run by your Statewide Interoperability Coordinator (SWIC) are often the cheapest way to do this.

The training requirement should be written into the SOP itself. NFPA 1561 expects regular exercise of the incident management system, and that includes the comms component. ISO PPC reviews increasingly look for documented comm training and outage drills as part of the dispatch and communications credit.
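A training officer tracking the cadence above can reduce it to one question per drill type: how long since it last ran? A minimal sketch; the drill names and the day counts standing in for "monthly," "quarterly," and "annually" are assumptions, so match them to your SOP's wording.

```python
from datetime import date, timedelta

# Longest allowed gap per drill type, following the cadence above.
# Day counts are assumptions; adjust to your SOP's wording.
MAX_GAP = {
    "tabletop": timedelta(days=31),           # monthly
    "radio_drill": timedelta(days=92),        # quarterly
    "live_exercise": timedelta(days=366),     # annually
    "mutual_aid_drill": timedelta(days=366),  # at least once a year
}


def overdue_drills(last_run: dict[str, date], today: date) -> list[str]:
    """Drill types never run, or last run longer ago than their allowed gap."""
    return [
        drill for drill, gap in MAX_GAP.items()
        if drill not in last_run or today - last_run[drill] > gap
    ]


history = {"tabletop": date(2024, 5, 20), "radio_drill": date(2024, 1, 10)}
print(overdue_drills(history, date(2024, 6, 1)))
# ['radio_drill', 'live_exercise', 'mutual_aid_drill']
```

Treating "never run" the same as "overdue" is deliberate: a drill with no date on record is exactly the written-but-never-drilled plan that fails during a real outage.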

Capturing what happened: the after-action loop

Every outage, even a five-minute one, should generate a brief incident note. Not a full after-action report, just a structured log entry that captures:

Aggregate these quarterly. Patterns emerge. The repeater that drops every time it rains hard. The console that the night dispatcher cannot operate without help. The mutual aid channel that three of your portables don't have programmed. These are the actionable findings. Send them to the radio system manager, the comm center supervisor, and your county emergency management coordinator.
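The quarterly roll-up can be as simple as counting outage notes by quarter and component. The sketch below assumes each note is a record with a date and the component that failed; the field names, the sample components, and the threshold of two are hypothetical, since the exact contents of your log entry are up to your SOP.

```python
from collections import Counter
from datetime import date


def quarter(d: date) -> str:
    """Label a date with its calendar quarter, e.g. 2024-Q2."""
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"


def repeat_offenders(notes: list[dict], min_count: int = 2) -> list[tuple[str, str]]:
    """(quarter, component) pairs that failed at least min_count times."""
    counts = Counter((quarter(note["date"]), note["component"]) for note in notes)
    return sorted(key for key, count in counts.items() if count >= min_count)


# Hypothetical outage notes; field names are assumptions.
notes = [
    {"date": date(2024, 4, 3), "component": "Hilltop repeater", "cause": "heavy rain"},
    {"date": date(2024, 5, 19), "component": "Hilltop repeater", "cause": "heavy rain"},
    {"date": date(2024, 6, 2), "component": "Night console", "cause": "operator unfamiliar"},
    {"date": date(2024, 7, 8), "component": "Hilltop repeater", "cause": "heavy rain"},
]
print(repeat_offenders(notes))  # [('2024-Q2', 'Hilltop repeater')]
```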

FEMA's after-action publications, including the archive of the former Lessons Learned Information Sharing (LLIS) program now held by the Homeland Security Digital Library, are full of incidents where the gap between "we have a plan" and "the plan worked" was bridged only by departments that ran this kind of feedback loop. The DHS SAFECOM program and the National Council of Statewide Interoperability Coordinators (NCSWIC) publish guidance and templates that align with this approach. Useful starting points:

The single biggest finding across decades of after-action reports

Departments that maintained communications during major outages had practiced fallback procedures within the previous twelve months. Departments that lost communications during major outages had a written plan but had never drilled it. Time on the practice range is the difference.

A short closing note for small departments

If you are running a 25-member volunteer department in a rural county, this whole thing can feel oversized. It is not. The smallest departments are often the most exposed, because the county-run trunked system is twenty miles away, the closest mutual aid is fifteen minutes out, and you might be the only crew on scene for an hour. When the system goes down, you are it.

The minimum viable plan for a small department:

  1. Every portable has the same four channels in the same four positions: Primary, Alt, County Mutual Aid VHF, Simplex.
  2. Every member can name those four channels and the order without thinking.
  3. One radio drill a year that uses the simplex channel from inside a structure.
  4. A one-page laminated card on every apparatus with the tier definitions and the IC's notification script.
  5. An annual review with the county radio shop or comm center to verify the programming is current.

That's it. Five things. None of them are expensive. All of them save lives the day the tower stops working.

Document your comms SOP where the crew will actually find it

RunBoard's SOP and Training modules let you publish your fallback comms plan, track which members have signed off on it, schedule the quarterly radio drills, and log every outage with timestamps. The Asset module tracks portable programming versions so you know which radios are due for an audit. Built for departments that have to run this stuff with three people on a Tuesday.


Further reading