Repeater Failure Planning: SOPs for When the Public Safety System Goes Down
A repeater dies. A trunked site goes offline. A microwave link drops in a thunderstorm. Your crews are still on a working fire and the radios just got a lot quieter. This is the SOP framework that keeps everyone alive and on the same channel when the system fails.
- Why repeater failure planning is not optional
- How public safety systems actually fail
- Detection: knowing the system is down
- The four-tier fallback model
- Simplex and talkaround tactics on the fireground
- A working SOP template
- Training and exercise: making the SOP muscle memory
- Capturing what happened: the after-action loop
Why repeater failure planning is not optional
Most departments that run on a county or regional public safety system treat the radio system the way they treat the power grid. It works. It almost always works. The few times it doesn't are uncomfortable but brief.
Then a tower loses commercial power, the generator fails to transfer, the backup batteries last six hours, and you are in hour seven of the outage, two hours into a working structure fire. Now your crews are inside a residential structure and the IC's portable will not key up. That is not a hypothetical. That sequence appears in FEMA after-action reports going back two decades, with surprising regularity.
FEMA's after-action publications from major incidents repeatedly list communications failures as a primary contributing factor to firefighter and law enforcement injuries. The 9/11 Commission Report flagged radio interoperability as a critical gap. After-action reviews from Hurricane Katrina, the 2017 California wildfires, and Hurricane Maria all describe extended periods where public safety repeaters and trunked sites were offline for hours or days. The pattern is consistent: when systems go down, departments that planned for it kept operating. Departments that didn't, didn't.
The Department of Homeland Security has been telling us this for twenty years. The SAFECOM Interoperability Continuum and the National Emergency Communications Plan (NECP) both name resilience and survivability as core capabilities. Translation for a working chief: your primary system will fail at some point, and your job is to have a plan that does not depend on it.
If your "backup plan" is a single sentence in a binder that says "switch to channel 2," you don't have a plan. You have a wish.
How public safety systems actually fail
Knowing the failure modes shapes the SOP. Here are the ones that show up most often in real outages.
Single repeater failure
The simplest case. One conventional VHF or UHF repeater goes down. Could be a transmitter PA failure, a duplexer issue, antenna or feedline damage from ice or wind, or a failed power supply. Coverage on that channel collapses to whatever simplex range your portables have, which on a fireground is typically a quarter-mile to a mile depending on terrain and building construction.
Site failure on a trunked system
Trunked P25 systems (Phase 1 or Phase 2) usually have multiple sites. When one site loses its backhaul to the master controller or the controller itself goes down, the site can drop into "site trunking" mode, which keeps local talkgroups working but cuts off cross-site communication. If the site loses everything, radios in that coverage area will scan and try to affiliate with the next strongest site, which may be too far away. Your crews will see "Out of Range" or "No Service" on the display.
Master controller / core failure
Rare but catastrophic. The trunked controller dies and every site either drops into site trunking or fails over to a backup core. Patches between agencies break. Console operations at dispatch may go down. This is the failure mode that takes a regional system offline for hours.
Backhaul failure
Microwave links, fiber circuits, or T1/IP backhaul between sites and the dispatch center fail. The repeaters and sites might be physically fine, but they cannot reach dispatch. You will hear unit-to-unit traffic but no dispatch acknowledgment.
Power failure with depleted backup
Commercial power drops, generators don't start (or run out of fuel), batteries last their rated time and then quit. Most public safety sites are spec'd for 8 to 24 hours of battery backup and 72 hours of generator runtime if the fuel is topped off. In a multi-day storm event, that is not enough.
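Run the numbers and the problem is obvious. The sketch below is rough arithmetic, not a vendor runtime model. It assumes the batteries only bridge to the generator and recharge while it runs, that the generator stops when its fuel does, and that nobody refuels during the storm. The function names are hypothetical.

```python
def hours_on_backup(battery_hours: float, generator_hours: float,
                    generator_transfers: bool) -> float:
    """Rough hours a site stays on the air after commercial power drops.

    Simplifying assumptions (illustrative only):
    - If the generator transfers, it carries the load until its fuel runs out,
      then the batteries (recharged in the meantime) carry it a while longer.
    - If the generator fails to transfer, the batteries are all you get.
    """
    if generator_transfers:
        return generator_hours + battery_hours
    return battery_hours


def survives(outage_hours: float, battery_hours: float, generator_hours: float,
             generator_transfers: bool) -> bool:
    """True if backup power lasts at least as long as the commercial outage."""
    return hours_on_backup(battery_hours, generator_hours, generator_transfers) >= outage_hours


# A four-day (96-hour) storm outage, best case from the spec range above:
# 24 hours of battery plus 72 hours of generator fuel, generator transfers.
print(survives(96, 24, 72, True))
# Same storm, low end of the battery spec: only 80 hours of coverage.
print(survives(96, 8, 72, True))
# The scenario at the top of this article: six hours of battery, no generator.
print(hours_on_backup(6, 72, False))
```

Even the best case only just covers a four-day outage, with zero margin and no refueling. The gap is what your fallback plan has to cover.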
Console or CAD failure
The radio system is fine. The dispatch consoles are not. Dispatchers can't transmit, can't see talkgroup activity, or the CAD that drives status changes is down. Field units can still talk to each other but coordination from dispatch evaporates.
Cyber events
Increasingly common. A ransomware event hits the county IT network and either takes CAD down or affects the IP backhaul that the radio system rides on. Treat this as a backhaul failure but assume restoration takes days, not hours.
Each of these failure modes calls for a different fallback. A single repeater failure means you switch to a different channel. A master controller failure means you might be running the entire incident on direct simplex.
Detection: knowing the system is down
You cannot fall back to a plan if you do not know the system has failed. The first 30 seconds of an outage are the worst, because crews assume the problem is on their end. They check their volume, swap batteries, change channels. Meanwhile the IC has lost accountability.
A good SOP defines failure indicators clearly:
- No acknowledgment from dispatch on a transmission you would expect to be answered. Single missed transmission, no big deal. Three in a row, something is wrong.
- Bonk tone or "out of range" indicator on a P25 or DMR portable when you are in normal coverage area.
- Site trunking announcement on the radio (some systems beep and display a site-trunking icon).
- On a conventional system, no repeater carrier or squelch tail after you unkey. If the repeater is not retransmitting, nothing breaks the PL/DPL squelch on anyone else's radio either.
- Dispatch broadcast announcing system status. The best ones do this proactively. Many don't.
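It helps to write the "three in a row" rule down precisely so nobody debates it mid-incident. A minimal sketch, assuming the three-miss threshold from the first indicator above and a hypothetical AckMonitor helper. Any acknowledged transmission resets the count, so a single missed call never trips it.

```python
class AckMonitor:
    """Counts consecutive transmissions that dispatch did not acknowledge.

    Hypothetical helper; the default threshold of 3 mirrors the rule of thumb above.
    """

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.consecutive_misses = 0

    def record(self, acknowledged: bool) -> bool:
        """Log one transmission; return True once the system should be treated as down."""
        if acknowledged:
            self.consecutive_misses = 0   # one good acknowledgment clears the count
        else:
            self.consecutive_misses += 1
        return self.consecutive_misses >= self.threshold


monitor = AckMonitor()
for acked in [False, True, False, False, False]:
    suspect_outage = monitor.record(acked)
print(suspect_outage)  # True: the last three transmissions went unanswered
```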
Build the failure check into your radio discipline. The first unit on scene calling for a working fire should always get an explicit acknowledgment from dispatch. If they don't, the IC's first action is a radio check, not a tactical assignment.
When dispatch detects an outage (console alarm, supervisor notification, or repeated unit complaints), they should broadcast on every working channel: "Attention all units, the [system name] is experiencing a partial outage. All units are directed to [fallback channel] effective immediately." Short, scripted, repeatable. Many comm centers do not have this script written down. Write it.
The four-tier fallback model
DHS and SAFECOM materials describe interoperability in tiers. Adapt the same idea to failure planning. Build your SOP around four tiers, in order, and train every member to know which tier they're operating on.
Tier 1: Primary system, normal operation
Your daily talkgroup or repeater. Dispatch in the loop. Status changes via CAD or radio. Mutual aid available through patches.
Tier 2: Alternate channel on the same infrastructure
A different talkgroup or a different repeater on the same trunked system or conventional infrastructure. Useful when one channel is overloaded or one repeater is down but the system itself is fine. Dispatch is still in the loop. Most departments already use this for major incidents (a dedicated tactical channel).
Tier 3: Independent secondary system
A separate radio system not dependent on the primary. Examples that small and mid-size departments actually use:
- The county's old VHF conventional system that was kept in service when they upgraded to trunked P25.
- A regional mutual aid VHF or UHF channel licensed under 47 CFR Part 90 (the FCC public safety pool), often a state-designated common channel.
- The national interoperability channels VCALL10, VTAC11-14 (VHF), UCALL40, UTAC41-43 (UHF), or the 700/800 MHz interop channels - if your radios are programmed for them and your agency has the appropriate sharing agreements in place.
- Amateur radio operators (47 CFR Part 97) under a formal MOU, used at the strategic and logistics level, not on the fireground.
Tier 4: Direct simplex / talkaround
No infrastructure. Radio to radio, line of sight, transmitting on the repeater's output frequency (your radio's normal receive frequency) so other portables hear you directly, with no repeater in the path. This is the last resort and the most likely tier you will actually need on a fireground when the local repeater fails. Range is short, terrain matters, and dispatch is not in the loop unless someone is relaying.
The order matters. A good SOP does not jump from Tier 1 to Tier 4. The IC moves down one tier at a time, makes one clear decision, and broadcasts the change to all units before continuing tactical operations. Anyone who misses the change ends up on a different channel from the rest of the crew. That is how people get hurt.
Every portable in your department should have a clearly labeled zone or bank called something like "FALLBACK" or "BACKUP" with the Tier 2, Tier 3, and Tier 4 channels in a fixed order. Same order on every radio. Same labels. If you are reaching for the knob in a smoke-filled hallway, you should not have to think about which position you need.
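The tier order and the one-step-at-a-time rule fit in a few lines. This is an illustrative sketch: the tier names mirror the four tiers above, and Tier and step_down are hypothetical names, not anything in radio programming software. It assumes the FALLBACK zone is programmed in tier order, so the next tier down is always the next position on the knob.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Fallback tiers, numbered in the same order as every radio's FALLBACK zone."""
    PRIMARY = 1       # Tier 1: primary system, normal operation
    ALTERNATE = 2     # Tier 2: alternate channel on the same infrastructure
    INDEPENDENT = 3   # Tier 3: independent secondary system
    SIMPLEX = 4       # Tier 4: direct simplex / talkaround


def step_down(current: Tier) -> Tier:
    """Move exactly one tier down; Tier 4 is the floor."""
    if current is Tier.SIMPLEX:
        raise ValueError("Already on Tier 4 simplex; there is no lower tier")
    return Tier(current + 1)


tier = Tier.PRIMARY
tier = step_down(tier)
print(tier.name)  # ALTERNATE
```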
Simplex and talkaround tactics on the fireground
When you drop to direct simplex on a working incident, the rules of the fireground change. You no longer have system-wide coverage. You have a bubble around the apparatus that may extend a quarter-mile in good conditions and 200 feet inside a residential structure with the doors closed.
Adjust tactics accordingly:
- Tighten the span of control. If you were running four divisions, you may need to collapse to two. The IC needs to be able to reach every supervisor on simplex.
- Move the IC's command post. Get the IC physically closer to the structure. The driver-in-the-cab IC position works fine on a trunked system. On simplex with a working interior fire, the IC needs to be at the front of the apparatus or between the engine and the structure to maintain contact with interior crews.
- Use a dedicated relay. Assign one member - typically the safety officer or a chief's aide - whose only job is to relay traffic between the interior division and the IC if the radio path is marginal. This is sometimes called a "radio runner" and it is exactly as low-tech as it sounds.
- Reduce non-essential traffic. No status reports, no progress updates that aren't requested, no extraneous chatter. Every transmission on simplex blocks the channel for everyone.
- Mayday discipline becomes more critical. NFPA 1561 calls out the importance of tactical channels and accountability. On simplex, a Mayday transmitted from inside a structure may not reach command. Consider scheduled brief radio checks every five minutes during interior operations - "Division A to Command, all clear on radio check" - so the IC catches a degraded link before a Mayday tries to use it.
- Mutual aid units may not have your simplex programmed. If a mutual aid engine arrives mid-incident on a normal trunked talkgroup that no longer works, they need somewhere to go. Have a "common simplex" channel that every department in your mutual aid box has agreed to use, and that every responding officer knows by heart.
NFPA 1221 (now consolidated into NFPA 1225) deals with public safety communications systems and addresses backup and survivability for the dispatch side. NFPA 1561 covers incident management and the radio discipline expected on the ground. Both are worth pulling off the shelf when you write the SOP.
A working SOP template
Here is a structure that works for departments running 25 to 200 members. Adapt the specifics to your jurisdiction, but keep the bones.
Section 1: Purpose and scope
One paragraph. Names the purpose of the SOP, the systems it covers, and the personnel it applies to. Reference DHS / SAFECOM guidance and NFPA 1225 / 1561 as the authority basis.
Section 2: Definitions
Define terms explicitly. Repeater. Trunked site. Site trunking. Talkaround. Simplex. Mutual aid channel. Patch. Console. Tier 1 through Tier 4. Don't assume members coming through new academies know all of these. Many do not.
Section 3: Failure indicators
Bullet list of the five or six things that mean "the system is degraded or down." From the detection section above.
Section 4: Tier definitions
Name the actual channels at each tier. Be specific.
- Tier 1 (Primary): [Department] Operations talkgroup, primary trunked system.
- Tier 2 (Alternate): [Department] Tac-2 talkgroup or county VHF Repeater 2 (151.xxxx, PL xxx).
- Tier 3 (Independent): County VHF Mutual Aid (155.xxxx simplex), state interop channel, or VCALL10.
- Tier 4 (Simplex): Department fireground simplex channel (154.xxxx, no PL on receive).
Frequencies are illustrative. Use yours.
Section 5: Decision authority
Who can call the tier change? Spell it out. Typically:
- The IC on an incident, for that incident.
- The on-duty supervisor or shift commander, department-wide.
- Dispatch, for system-wide outages affecting all units.
The IC does not need permission to drop to a lower tier on their incident. They need to announce it.
Section 6: Notification procedure
When a tier change is called, the announcement is made on the channel being abandoned and on the new channel. Three repetitions, slowly, with PAR (Personnel Accountability Report) to follow within five minutes on the new channel. Every member acknowledges the channel change. Anyone who doesn't is presumed lost on the old channel and someone is dispatched to find them physically.
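Section 6 is really a small accountability procedure: announce, collect acknowledgments, and treat anyone still silent when the PAR window closes as lost on the old channel. A minimal sketch, assuming the five-minute PAR window above. The ChannelChange record, the roster entries, and timestamps in seconds on an incident clock are all hypothetical.

```python
from dataclasses import dataclass, field

PAR_WINDOW_SECONDS = 5 * 60  # PAR due within five minutes of the tier change


@dataclass
class ChannelChange:
    """Tracks who has acknowledged a tier change on the new channel."""
    announced_at: float                  # seconds on the incident clock
    roster: set[str]                     # every unit or member on the incident
    acknowledged: set[str] = field(default_factory=set)

    def ack(self, member: str) -> None:
        """Record an acknowledgment; names not on the roster are ignored."""
        if member in self.roster:
            self.acknowledged.add(member)

    def presumed_lost(self, now: float) -> set[str]:
        """Who to go find physically once the PAR window has closed."""
        if now - self.announced_at < PAR_WINDOW_SECONDS:
            return set()                 # window still open: keep calling
        return self.roster - self.acknowledged


change = ChannelChange(announced_at=0, roster={"Engine 1", "Engine 2", "Ladder 1"})
change.ack("Engine 1")
change.ack("Ladder 1")
print(change.presumed_lost(now=120))   # set(): still inside the window
print(change.presumed_lost(now=300))   # {'Engine 2'}
```

The function deliberately returns nothing while the window is open: a missing acknowledgment at minute two calls for another radio call, not a search.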
Section 7: Dispatch coordination
If dispatch is still operational, they switch to the new tier with the field units. If dispatch is part of the failure (console down, CAD down), the SOP names the alternate dispatch process - might be the on-duty BC running tactical dispatch from a chief's vehicle, might be a neighboring center taking over on a backup talkgroup. Have it written.
Section 8: Mutual aid coordination
Name the regional plan you participate in. Reference the DHS NECP and your state communications interoperability plan (SCIP) by name. List the interop channels available in your radios and the agencies pre-authorized to share them.
Section 9: Documentation requirements
The IC documents the tier change, time, reason, and duration in the incident report. The shift commander notifies the radio system manager (county comm tech or vendor) of the failure with timestamps. This is what feeds the after-action loop.
Section 10: Training and drill requirements
Frequency of drills. Who runs them. What is tested. Covered in the next section.
None of this matters if half your portables don't have the fallback channels programmed, or have them in a different bank position than the apparatus radios. Auditing radio programming is a one-day project that prevents a problem you would otherwise discover on the fireground. Do it once a year, after every fleet upgrade, and whenever a new member is issued a radio.
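The audit itself is mechanical: compare each radio's FALLBACK zone against the department's reference order and flag anything missing or out of position. A sketch under stated assumptions: the channel labels and radio IDs are placeholders, and the per-radio channel lists are assumed to come from whatever export your programming software or radio shop provides (no vendor format is implied).

```python
# Reference FALLBACK zone, in the fixed position order every radio must match.
# Labels are placeholders; use your department's actual channel aliases.
REFERENCE_ZONE = ["TAC-2", "CO MUTAID", "VCALL10", "FG SIMPLEX"]


def audit_radio(radio_id: str, zone: list[str]) -> list[str]:
    """Return readable problems for one radio's FALLBACK zone.

    Only the reference positions are checked; extra channels after them are ignored.
    """
    problems = []
    for position, expected in enumerate(REFERENCE_ZONE, start=1):
        actual = zone[position - 1] if position <= len(zone) else None
        if actual != expected:
            problems.append(f"{radio_id}: position {position} is {actual!r}, expected {expected!r}")
    return problems


def audit_fleet(fleet: dict[str, list[str]]) -> list[str]:
    """Audit every radio; an empty result means the whole fleet matches."""
    findings = []
    for radio_id, zone in sorted(fleet.items()):
        findings.extend(audit_radio(radio_id, zone))
    return findings


fleet = {
    "E1-PORT-1": ["TAC-2", "CO MUTAID", "VCALL10", "FG SIMPLEX"],
    "E1-PORT-2": ["TAC-2", "VCALL10", "CO MUTAID", "FG SIMPLEX"],  # two positions swapped
    "L1-PORT-1": ["TAC-2", "CO MUTAID", "VCALL10"],                # simplex missing
}
for finding in audit_fleet(fleet):
    print(finding)
```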
Training and exercise: making the SOP muscle memory
An SOP that has never been drilled is a document, not a procedure. The members who need to act on it during an actual outage are not going to read it on the fireground.
Tabletop, monthly
15 minutes at the start of a training night or shift meeting. The training officer presents a scenario: "It is 0200 on a Tuesday. You are at a working basement fire on Maple Street. Dispatch has gone silent and your last three transmissions were not acknowledged. Walk me through what you do." Cycle through scenarios for repeater failure, trunked site failure, console failure, full system outage. Members talk through the response. Five minutes of debrief.
Radio drill, quarterly
Take a piece of apparatus to the worst part of your district - a low spot, a steel-frame industrial building, a basement. Have the crew operate on Tier 4 simplex from inside. Map the dead spots. The fact that simplex doesn't reach the IC from the southeast corner of the church basement is something you want to know on a Saturday morning, not on a working fire.
Live exercise, annually
Coordinate with the 911 center and run a planned outage. Switch the entire department to Tier 3 for a two-hour shift, with dispatch coordinating on the alternate channel. Run actual calls (low-acuity ones, with a plan to switch back to the primary system if anything serious comes in). Document what broke. Practice the PAR on the alternate channel.
Include mutual aid
At least once a year, your drill should include the neighboring departments who would arrive on a mutual aid box alarm. They need to know your fallback channels and you need to know theirs. Regional comm exercises run by your state interoperability coordinator are the cheapest way to do this.
The training requirement should be written into the SOP itself. NFPA 1561 expects regular exercise of the incident management system, and that includes the comms component. ISO PPC reviews increasingly look for documented comm training and outage drills as part of the dispatch and communications credit.
Capturing what happened: the after-action loop
Every outage, even a five-minute one, should generate a brief incident note. Not a full after-action report - just a structured log entry that captures:
- Date, time start, time restored.
- What system failed, and the root cause if known.
- Which tier the department dropped to.
- Whether any incidents were active during the outage and how they were handled.
- Anything that didn't work as the SOP said it would.
Aggregate these quarterly. Patterns emerge. The repeater that drops every time it rains hard. The console that the night dispatcher cannot operate without help. The mutual aid channel that three of your portables don't have programmed. These are the actionable findings. Send them to the radio system manager, the comm center supervisor, and your county emergency management coordinator.
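The log entry and the quarterly roll-up map directly onto a small record type. A minimal sketch, assuming the fields listed above; the OutageNote record, the sample entries, and the "two or more outages in a quarter" threshold for a repeat offender are hypothetical choices to tune, not a standard.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class OutageNote:
    """One brief outage log entry, mirroring the fields listed above."""
    start: datetime
    restored: datetime
    system: str                  # what failed
    root_cause: Optional[str]    # None if not known
    tier_used: int               # which tier the department dropped to (1-4)
    active_incidents: str        # what was active and how it was handled
    sop_gaps: str = ""           # anything that didn't work as the SOP said

    @property
    def minutes(self) -> float:
        return (self.restored - self.start).total_seconds() / 60


def repeat_offenders(notes: list[OutageNote], min_count: int = 2) -> dict[str, int]:
    """Systems that failed at least min_count times in the period."""
    counts = Counter(note.system for note in notes)
    return {system: n for system, n in counts.items() if n >= min_count}


# Hypothetical quarter of log entries.
notes = [
    OutageNote(datetime(2024, 4, 2, 14, 5), datetime(2024, 4, 2, 14, 40),
               "North tower repeater", "water in feedline", 4, "one MVA, run on simplex"),
    OutageNote(datetime(2024, 5, 19, 3, 10), datetime(2024, 5, 19, 3, 25),
               "North tower repeater", None, 2, "none active"),
    OutageNote(datetime(2024, 6, 7, 22, 0), datetime(2024, 6, 7, 22, 50),
               "Dispatch console", None, 3, "none active"),
]
print(repeat_offenders(notes))  # {'North tower repeater': 2}
```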
FEMA's after-action publications, available through the Homeland Security Digital Library (which now hosts the former Lessons Learned Information Sharing archive), are full of incidents where the gap between "we have a plan" and "the plan worked" was closed by departments that ran this kind of feedback loop. The DHS SAFECOM program and the National Council of Statewide Interoperability Coordinators (NCSWIC) publish guidance and templates that align with this approach. Useful starting points:
- CISA SAFECOM program - interoperability guidance, the SAFECOM continuum, and templates.
- National Emergency Communications Plan (NECP) - the federal framework for emergency communications resilience.
- FEMA Lessons Learned - searchable database of after-action material from real incidents.
- Your state's Statewide Communication Interoperability Plan (SCIP) - every state has one, published by the state interoperability coordinator.
The pattern in the after-action material is consistent. Departments that maintained communications during major outages had usually practiced their fallback procedures within the previous twelve months. Departments that lost communications often had a written plan but had never drilled it. Time on the practice range is the difference.
A short closing note for small departments
If you are running a 25-member volunteer department in a rural county, this whole thing can feel oversized. It is not. The smallest departments are often the most exposed, because the county-run trunked system is twenty miles away, the closest mutual aid is fifteen minutes out, and you might be the only crew on scene for an hour. When the system goes down, you are it.
The minimum viable plan for a small department:
- Every portable has the same four channels in the same four positions: Primary, Alt, County Mutual Aid VHF, Simplex.
- Every member can name those four channels and the order without thinking.
- One radio drill a year that uses the simplex channel from inside a structure.
- A one-page laminated card on every apparatus with the tier definitions and the IC's notification script.
- An annual review with the county radio shop or comm center to verify the programming is current.
That's it. Five things. None of them are expensive. All of them save lives the day the tower stops working.
Document your comms SOP where the crew will actually find it
RunBoard's SOP and Training modules let you publish your fallback comms plan, track which members have signed off on it, schedule the quarterly radio drills, and log every outage with timestamps. The Asset module tracks portable programming versions so you know which radios are due for an audit. Built for departments that have to run this stuff with three people on a Tuesday.
Further reading
- CISA SAFECOM - interoperability continuum, governance templates, and grant guidance.
- DHS National Emergency Communications Plan - the federal framework for resilient public safety communications.
- FEMA Lessons Learned - after-action material from past incidents.
- NFPA 1225 - public safety communications systems standard (consolidates the former NFPA 1221).