Categories
Affiliate Growth Innovation Troubleshooting

Business Model Shift

(3/20-9/20)

Challenge

Given a steady stream of revenue, fundamentally alter the underlying business model towards CPC/CPA.

Action

Phase 1

In March 2020, pre-COVID, did a quick dive into the Skimlinks documentation and put together a short overview to demonstrate how easy it would be to integrate. The business climate wasn’t right, so didn’t pursue.

Phase 2

In Sep 2020, after the CEO stepped down and several months after Phase 1, explored using Viglink (Sovrn), only to discover that the business had previously been CPA-based before fulfillment was brought in-house ~2017.

Digging around, found some old Skimlinks code, then leveraged updated documentation to prove click-tracking with custom params could still work.

Phase 3

The business, unsure whether to use one or several affiliate networks, needed a partner to figure out the best implementation path. For four networks, I kludged scripts into production and verified clicks, evaluating how custom parameters flowed through the click lifecycle and were eventually reported through the networks’ APIs.

When the business decided to focus on Commission Junction and Skimlinks, I led development to integrate the JS libs (monkey-patching CJ to play well on the same page as Skimlinks) and verify clicks and custom params.
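The click verification described above comes down to attaching a custom parameter to an outbound link and confirming the value is still there further down the lifecycle. The following is a minimal Python sketch of that idea only. The parameter name `custom_id` and the example URL are hypothetical; each network defines its own parameter names, which are not given here.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def tag_link(url: str, param: str, value: str) -> str:
    """Return url with param=value in its query string, replacing any prior value."""
    parts = urlparse(url)
    # Keep every existing query pair except an earlier copy of our parameter.
    query = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k != param
    ]
    query.append((param, value))
    return urlunparse(parts._replace(query=urlencode(query)))


def read_param(url: str, param: str):
    """Return the value of param in url, or None when it is absent."""
    pairs = parse_qsl(urlparse(url).query, keep_blank_values=True)
    return dict(pairs).get(param)
```

Tagging a link with `tag_link(url, "custom_id", "order-42")` and later calling `read_param` on the URL a network reports back is one way to check that the custom value survived the round trip.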

Result

Overcame fits and starts to deliver the affiliate model.


Categories
Collaboration Database Troubleshooting

Fixing Database Replication

(6/20-6/20)

Challenge

Usually our MySQL->Postgres data replication powering the business’s analytics dashboard runs without issue. One day, it completely failed.

Action

  • Spotted the AWS alert notification and worked with the Data Engineer to confirm it wasn’t a normal hiccup in our replication pipeline.
  • Stopped the AWS DMS task for replication.
  • Examined CloudWatch logs for direction on where the problem was. They showed that a table (awsdms_apply_exceptions) didn’t exist.
  • Dug into online documentation about the issue.
  • Created a new Postgres copy target of the analytics database.
  • Created a new AWS DMS task against the copy target DB so that DMS would create the table public.awsdms_apply_exceptions.
  • Grabbed the DDL statement (i.e. the CREATE TABLE) for the awsdms_apply_exceptions table.
  • In the (original) analytics DB, created the ‘public’ schema.
  • Also in the (original) analytics DB, applied the CREATE TABLE for awsdms_apply_exceptions.
  • Deleted 1) the new Postgres copy target and 2) the AWS DMS task as cleanup.
  • (Never did figure out why the public schema and table disappeared.)

Results

  • Resolved the data replication issue with minimal downtime for the analytics dashboard.
Categories
Forecasting Management Process Troubleshooting

Rolling With the Punches

(11/18-5/20)

Challenge

Finding the right-sized engineering team as the business ebbed and flowed.

Action

Nov 2018 – Mar 2019

Two months after joining, in Oct 2018, was surprised by a request to provide a KLO (keep-the-lights-on) budget slashing engineering by 60%.

Having only a basic understanding of team members’ strengths and weaknesses, I anticipated the following year’s needs and then presented guidance, together with the SVP of Product and the CEO, to the corporate parent’s COO.

We secured fiscal year funding to ensure team/business continuity at 38 headcount.

Dec 2019

In Nov 2019, was informed we needed to reduce our 25-person Delhi team to 8 for the 2020 fiscal year. I looked over the skillsets of the team and, with a solid year under my belt and first-hand knowledge of who the top performers were, decided who would stay.

Saved company $800K by reducing engineering headcount from 38 to 25.

Mar 2020

Then, with the onset of COVID, needed to further reduce team size for both the India and Pakistan teams.

Given the impact of COVID, made decisions leading to an additional $840K reduction, from 25 to 9.

Results

  • Adjusted team size as necessary to meet needs of the business.
Categories
Architecture Chat Collaboration Integration Site Reliability SOA Troubleshooting

Connecting Supply with Demand

(10/18-5/20)

Challenge

Ongoing reliability issues with a 3rd-party chat solution crucial to business operations, with no documentation or integration monitoring.

Action

The key aspect of the Decorist experience is the connection between the Client (Demand-side) and the Designer (Supply-side). To facilitate that connection, many years prior, the business had included a then-nascent Chat-as-a-Solution provider as part of the user experience; it made sense to “Buy” instead of “Build.”

Architecture Overview

The chat UX is loaded into an iFrame. When users chat, their payload is posted to the 3rd party’s backend. The 3rd party then fires a webhook to Decorist which tracks the event in the DB and fires a transactional email depending on business logic.
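The webhook leg of this flow can be sketched roughly as follows. This is an illustration under stated assumptions, not Decorist's actual handler: the payload fields, the `chat_events` table, and the "only email an offline recipient" rule are all hypothetical stand-ins for the real schema and business logic.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the application database (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE chat_events (sender_id TEXT, recipient_id TEXT, body TEXT)"
    )
    return db


def handle_chat_webhook(db, payload, send_email) -> bool:
    """Track the chat event, then send a transactional email if the rule allows.

    Returns True when an email was sent.
    """
    # Step 1: record the event so there is a trail even if the email step fails.
    db.execute(
        "INSERT INTO chat_events (sender_id, recipient_id, body) VALUES (?, ?, ?)",
        (payload["sender_id"], payload["recipient_id"], payload["body"]),
    )
    db.commit()
    # Step 2: hypothetical business rule, notify only when the recipient is offline.
    if payload.get("recipient_online", False):
        return False
    send_email(payload["recipient_id"], "You have a new chat message")
    return True
```

Because the email only fires from inside the webhook handler, a webhook that is never called means the email is never sent and nothing is logged on the receiving side.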

The Problem

The ways the bugs manifested boiled down to:

  • chat UX not appearing (like below, often as the result of a 3rd party deploy gone bad)
  • emails not sending because webhooks weren’t called

Lacking integration monitoring, issues often bubbled up through first-tier support.

Chat not loading

Improving the Dependency

Noticing the reliability issues, I first delegated triage to one experienced engineer and then to another.

I also dug in on my own and discovered/remedied bugs in our own webhooks while also providing data-backed outage information to the 3rd party, escalating to the 3rd party’s CTO when necessary.

Almost quarterly, the company revisits the decision to continue using the 3rd party given the reliability issues. Each time, though all stakeholders are aware of the pain collectively experienced, the decision has been made to punt on replacing the solution.

Results

  • Ensured remediation of 3rd party issues in 24-36h, even on lowest support tier.
Categories
Process Quality Troubleshooting

Zero to Hero QA

(Decorist : 10/18-1/20)

Challenge

Inherited a codebase with no unit tests, no functional tests, and no CI/CD.

Action

Recruited a Sr. Automation QA Engineer, facilitating the introduction of automated testing to the company, and had him create Happy-Path coverage of the main UXes.

After three months, all Happy Paths of the demand- and supply-side user experiences were covered.

After six months, 100% of the site (positive and negative scenarios) was covered.

Results

  • Introduced automated QA testing, saving the team from shipping show-stopping bugs.
Categories
Analytics Growth Troubleshooting

Jumping In For Analytics

(Decorist : 11/19-11/19)

Challenge

A newly added VWO Custom Dimension wasn’t tracking.

Action

We had just introduced VWO into the stack for A/B testing when one of the experiments wasn’t reporting correctly.

I jumped in with PROD to debug, worked with VWO to resolve, and found a race condition in our Angular code that prevented the Custom Dimension from being beaconed.

I fixed the issue and deployed to prod.

Results

  • Collaborated w/Product Manager to get VWO experimentation working.
Categories
Emails Process Site Reliability Troubleshooting

Flying Blind

(Decorist : 5/19-10/19)

Challenge

Lack of logging/monitoring (including for email sending/deliverability) made it impossible to know if integrations were affecting system/application uptime.

Action

Odd for an email marketing company: stakeholders would ask about email deliverability and engineering had no insight because no logging/tracking had ever been instrumented.

Planned an initiative to add integration monitoring while digging deep into the application code and Amazon SES. New features would be instrumented going forward but, given time and resource constraints, existing ones would not be retrofitted.
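One lightweight way to instrument new integration calls going forward is a decorator that logs the outcome of every call. The sketch below is an assumption about how that could look, not the instrumentation that was actually built: the `monitored` decorator, the logger name, and the placeholder `send_email` function are all hypothetical, and the real Amazon SES call is deliberately left out.

```python
import logging
from functools import wraps

logger = logging.getLogger("integrations")


def monitored(integration: str):
    """Decorator that logs success or failure of each call to an integration."""

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            try:
                result = func(*args, **kwargs)
            except Exception:
                # Log with traceback, then re-raise so callers still see the failure.
                logger.exception("integration=%s outcome=error", integration)
                raise
            logger.info("integration=%s outcome=ok", integration)
            return result

        return wrapper

    return decorator


@monitored("email")
def send_email(to: str, subject: str) -> dict:
    # Placeholder body: a real implementation would hand off to the email provider here.
    return {"to": to, "subject": subject}
```

Applying the decorator only to newly written functions matches the "instrument going forward, don't retrofit" trade-off: existing code paths stay untouched while every new one emits a countable log line.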

Results

  • Created visibility into metrics for company’s key component: email marketing.
Categories
Database Monitoring ORM Performance Engineering Troubleshooting

40s to 10s

(Decorist : 10/19-10/19)

Challenge

Business-critical page used internally and externally was taking 40s to load, then started timing out for everyone.

Action

  • Dug in, realized DB CPU utilization was pegged at 100%, found and killed a runaway DB process.
  • Set up an alarm to be notified whenever CPU goes over 70%.
  • Page was still taking 40s; realized queries weren’t being logged, turned query logging on, and saw LEFT OUTER JOINs across eight large tables.
  • Removed six lines of Django ORM select_related syntax without affecting page functionality.
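For context on the last step: Django's `select_related` follows foreign keys by adding JOINs to the query, and for nullable foreign keys those are LEFT OUTER JOINs. The sketch below uses plain `sqlite3` rather than Django to contrast the two query shapes. The `project` and `designer` tables are hypothetical and only two tables are shown, whereas the real query spanned eight.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Build a tiny in-memory database with a nullable foreign key (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE designer (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE project (
            id INTEGER PRIMARY KEY,
            title TEXT,
            designer_id INTEGER REFERENCES designer(id)
        );
        INSERT INTO designer VALUES (1, 'Ada');
        INSERT INTO project VALUES (1, 'Living room', 1);
        INSERT INTO project VALUES (2, 'Den', NULL);
        """
    )
    return db


# Query shape with eager loading: every related table is joined in up front.
EAGER_SQL = """
    SELECT project.id, project.title, designer.name
    FROM project
    LEFT OUTER JOIN designer ON project.designer_id = designer.id
    ORDER BY project.id
"""

# Query shape once eager loading is removed: the base table only.
LEAN_SQL = "SELECT project.id, project.title FROM project ORDER BY project.id"


def titles(db: sqlite3.Connection, sql: str) -> list:
    """Return the project titles produced by the given query."""
    return [row[1] for row in db.execute(sql)]
```

Both queries return the same base-table rows, which is consistent with the page continuing to work after the `select_related` lines were removed. The trade-off is that any related object the page does touch is then fetched with its own query on access.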

Results

  • Cut page load time for critical internal Admin UX from 40s to 10s.
Categories
Analytics Troubleshooting

Improving Our Data Collection

(Decorist : 6/19-6/19)

Challenge

Anytime instrumentation changes were made, different engineers applied different standards of implementation, causing discrepancies in data collection.

Action

Created templatized checklist in Jira (in conjunction with Dir of Data Sci) to improve analytics instrumentation.

Results

  • Introduced checklist leading to consistent data analytics collection, growing trust with non-technical stakeholders.
Categories
Analytics Troubleshooting

Missing Mixpanel

(Decorist : 3/19-5/19)

Challenge

Late discovery that a new initiative/partnership was missing key metrics instrumentation.

Action

Jumped in and instrumented, verifying in Mixpanel. Worked with Director of Data Science to formulate Jira stories breaking down the problem into discrete chunks/units-of-work towards cleaning up/adding new events in the future.

Results

  • Launched new strategic initiative on time and empowered the business to have greater analytics clarity.