(10/18-5/20)
Challenge
Ongoing reliability issues with a business-critical 3rd party chat solution that had no documentation and no integration monitoring.
Action
The key aspect of the Decorist experience is the connection between the Client (demand side) and the Designer (supply side). To facilitate that connection, many years prior, the business had adopted a then-nascent Chat-as-a-Solution provider as part of the user experience; it made sense to “Buy” instead of “Build.”
Architecture Overview
The chat UX is loaded into an iframe. When users chat, their payload is posted to the 3rd party’s backend. The 3rd party then fires a webhook to Decorist, which records the event in the DB and sends a transactional email depending on business logic.
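The receiving side of that flow can be sketched roughly as follows. This is a minimal illustration, not the actual Decorist code: the payload fields (`event_id`, `sender_role`, `project_id`, `body`), the in-memory dict standing in for the DB, the duplicate-delivery check, and the "email the other party" rule are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ChatEvent:
    # Hypothetical payload shape; the real 3rd party schema is not documented here.
    event_id: str
    sender_role: str  # "client" or "designer"
    project_id: str
    body: str


@dataclass
class WebhookHandler:
    # send_email(recipient_role, project_id) is injected so it can be faked in tests.
    send_email: Callable[[str, str], None]
    # In-memory stand-in for the events table in the DB.
    events: Dict[str, ChatEvent] = field(default_factory=dict)

    def handle(self, payload: dict) -> bool:
        """Record the event and send an email; return False for a repeat delivery."""
        event = ChatEvent(
            event_id=payload["event_id"],
            sender_role=payload["sender_role"],
            project_id=payload["project_id"],
            body=payload["body"],
        )
        # Assumption: webhooks may be delivered more than once, so skip known ids.
        if event.event_id in self.events:
            return False
        self.events[event.event_id] = event
        # Placeholder business rule: notify whichever party did not send the message.
        recipient = "designer" if event.sender_role == "client" else "client"
        self.send_email(recipient, event.project_id)
        return True
```

Injecting `send_email` keeps the handler testable without a mail provider, which matters when the webhook is the only path by which a transactional email is triggered.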
The Problem
The ways the bugs manifested boiled down to:
- chat UX not appearing (as shown below, often the result of a bad 3rd party deploy)
- emails not sending because webhooks were not called
Lacking integration monitoring, issues often bubbled up through first-tier support.
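For illustration, the two failure modes above map onto two simple checks that integration monitoring could run on a schedule. This is a hedged sketch of the idea, not something that was in place: the function names, the `marker` string expected in the chat embed HTML, and the silence threshold are all hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Optional


def webhook_is_stale(
    last_received: Optional[datetime], now: datetime, max_silence: timedelta
) -> bool:
    """True if no webhook has arrived within the allowed silence window."""
    if last_received is None:
        return True
    return now - last_received > max_silence


def embed_looks_healthy(status_code: int, html: str, marker: str) -> bool:
    """True if the chat embed responded with 200 and contains the expected marker."""
    return status_code == 200 and marker in html


def collect_alerts(
    last_received: Optional[datetime],
    now: datetime,
    max_silence: timedelta,
    status_code: int,
    html: str,
    marker: str,
) -> List[str]:
    """Return human-readable alerts for whichever checks fail."""
    alerts = []
    if not embed_looks_healthy(status_code, html, marker):
        alerts.append("chat embed did not load as expected")
    if webhook_is_stale(last_received, now, max_silence):
        alerts.append("no webhook received within the expected window")
    return alerts
```

The silence window would need tuning against real chat volume, since quiet periods such as nights and weekends could otherwise look like an outage.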

Improving the Dependency
Noticing the reliability issues, I first delegated triage to one experienced engineer, and then to another.
I also dug in on my own and discovered and remedied bugs in our own webhooks, while providing data-backed outage information to the 3rd party and escalating to the 3rd party’s CTO when necessary.
Almost quarterly, the company revisits the decision to continue using the 3rd party given the reliability issues. Each time, though all stakeholders are aware of the pain collectively experienced, the decision has been to punt on replacing the solution.
Results
- Ensured remediation of 3rd party issues within 24-36h, even on the lowest support tier.