Categories
Database Monitoring ORM Performance Engineering Troubleshooting

40s to 10s

(Decorist : 10/19-10/19)

Challenge

A business-critical page, used both internally and externally, was taking 40s to load, then started timing out for everyone.

Action

  • Dug in, found DB CPU utilization pegged at 100%, then located and killed a runaway DB process.
  • Set up an alarm to notify whenever DB CPU utilization goes over 70%.
  • Page was still taking 40s. Realized queries weren’t being logged, turned query logging on, and saw LEFT OUTER JOINs across eight large tables.
  • Removed six lines of unneeded Django ORM select_related calls without affecting page functionality.
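
The query-logging step above can be sketched as a Django settings fragment. This is a minimal illustration, not the configuration actually used: the handler name "console" and the choice to log to the console are assumptions. Django emits executed SQL on the "django.db.backends" logger at DEBUG level, and by default only when settings.DEBUG is True.

```python
import logging
import logging.config

# Hypothetical settings.py fragment; the handler name and console output
# are illustrative assumptions, not the original project's configuration.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "console": {"class": "logging.StreamHandler"},
    },
    "loggers": {
        # Django writes each executed SQL statement to this logger at DEBUG level.
        "django.db.backends": {
            "handlers": ["console"],
            "level": "DEBUG",
            "propagate": False,
        },
    },
}

# Django applies LOGGING itself at startup. Calling dictConfig directly here
# only lets the fragment be checked outside a Django project.
logging.config.dictConfig(LOGGING)
```

With SQL visible in the log, joins generated by select_related appear directly in the logged statements, which makes it easier to see which relations a page is pulling in without using them.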

Results

  • Cut page load time for critical internal Admin UX from 40s to 10s.
Categories
Management Monitoring Process Site Reliability Troubleshooting

All the False Positives

(Decorist : 9/18-12/18)

Challenge

Inherited a situation where the exception reporting software (Sentry) was logging an incomprehensible 1000+ issues per day.

Action

Identified a KR and delegated it to the FE lead: create a daily process to chip away at remediation. Encouraged accountability by having the FE lead give a weekly status report.

Results

  • Reduced system alert notifications from 1000+ to 7.