In this deep-dive episode, Brian Scanlan, Principal Systems Engineer at Intercom, describes how the company’s on-call process works. He explains how the process started and the key changes Intercom has made over the years, including a new volunteer model and changes to compensation.
Discussion points:
(1:28) How on-call started at Intercom
(10:11) Brian’s background and interest in being on-call
(14:06) Getting engineers motivated to be on-call
(16:37) Challenges Intercom saw with on-call as it grew
(19:53) Having too many people on-call
(23:20) Having alarms that aren’t useful
(26:03) Recognizing uneven workload with compensation
(27:22) Initiating changes to the on-call process
(30:08) Creating a volunteer model
(33:02) Addressing concerns that volunteers wouldn’t take action on alarms
(34:40) Equitability in a volunteer model
(36:36) Expectations of expertise for being on-call
(40:56) How volunteers sign up
(44:15) The Incident Commander role
(46:19) Using code review for changes to alarms
(50:02) On-call compensation
(52:50) Other approaches to compensating on-call
(55:08) Whether other companies should compensate on-call
(57:32) How Intercom’s on-call process compares to other companies
(1:00:46) Recent changes to the on-call process
(1:04:13) Balancing responsiveness and burnout
(1:07:12) Signals for evaluating the on-call process
Mentions and links:
Brian’s article: How we fixed our on call process to avoid engineer burnout
Gergely Orosz’s On-Call Compensation
Intercom’s approach to a great on-call experience | Brian Scanlan (Intercom)