Categories
Architecture Chat Collaboration Integration Site Reliability SOA Troubleshooting

Connecting Supply with Demand

(10/18-5/20)

Challenge

Ongoing reliability issues with a business-critical 3rd-party chat solution that had neither documentation nor integration monitoring.

Action

The key aspect of the Decorist experience is the connection between the Client (Demand-side) and the Designer (Supply-side). To facilitate that connection, many years prior, the business had incorporated a then-nascent Chat-as-a-Service provider into the user experience; it made sense to “Buy” instead of “Build.”

Architecture Overview

The chat UX is loaded into an iFrame. When users chat, their message payload is posted to the 3rd party’s backend. The 3rd party then fires a webhook to Decorist, which tracks the event in the DB and, depending on business logic, sends a transactional email.
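The webhook flow above can be sketched in Python (the language of the Django backend). This is a minimal illustration, not the actual Decorist handler: the `ChatEvent` fields, the `message.created` event name, the `chat_events` table, and the `send_email` callback are all hypothetical, since the vendor's payload schema isn't described here. Recording each event under its ID before emailing lets a retried delivery be ignored rather than producing a duplicate email.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable

# Hypothetical: which vendor event types should trigger a transactional email.
EMAIL_TRIGGERING_EVENTS = {"message.created"}


@dataclass
class ChatEvent:
    event_id: str      # assumed unique per event and reused on vendor retries
    event_type: str
    room_id: str
    sender_role: str   # "client" or "designer" (assumed field)


def init_db(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chat_events ("
        "event_id TEXT PRIMARY KEY, event_type TEXT, room_id TEXT, sender_role TEXT)"
    )


def handle_webhook(
    conn: sqlite3.Connection,
    event: ChatEvent,
    send_email: Callable[[str, str], None],
) -> bool:
    """Track the event in the DB, then email the other party if rules say so.

    Returns True if an email was sent.
    """
    cur = conn.execute(
        "INSERT OR IGNORE INTO chat_events VALUES (?, ?, ?, ?)",
        (event.event_id, event.event_type, event.room_id, event.sender_role),
    )
    conn.commit()
    if cur.rowcount == 0:
        return False  # duplicate delivery: already tracked, don't email again
    if event.event_type not in EMAIL_TRIGGERING_EVENTS:
        return False  # tracked only, no email
    # Notify whoever did not send the message.
    recipient = "client" if event.sender_role == "designer" else "designer"
    send_email(recipient, event.room_id)
    return True
```

Keeping the email decision behind the DB write also means the `chat_events` table doubles as an audit trail when investigating “emails not sending” reports.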

The Problem

The bugs manifested in two ways:

  • the chat UX not appearing (often the result of a bad 3rd-party deploy)
  • emails not sending because webhooks were not called

With no integration monitoring in place, issues often bubbled up through first-tier support.
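Even simple integration monitoring could have flagged the “webhooks not called” failure before first-tier support did. A minimal sketch, assuming webhook delivery timestamps are already being recorded and that a long silence during active hours signals a problem; the function name and the 30-minute threshold used in the test are hypothetical choices.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


def find_webhook_gaps(
    received_at: Iterable[datetime],
    window_start: datetime,
    window_end: datetime,
    max_gap: timedelta,
) -> List[Tuple[datetime, datetime]]:
    """Return silent intervals longer than max_gap within the window.

    The window bounds act as sentinels, so silence at the very start or
    end of the window is also reported.
    """
    inside = sorted(t for t in received_at if window_start <= t <= window_end)
    points = [window_start] + inside + [window_end]
    return [
        (prev, cur)
        for prev, cur in zip(points, points[1:])
        if cur - prev > max_gap
    ]
```

In practice, a non-empty result would feed an alert (email, chat, pager); that wiring is omitted here.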

[Figure: Chat not loading]

Improving the Dependency

Noticing the reliability issues, I delegated triage first to one experienced engineer and then to another.

I also dug in on my own, discovering and remedying bugs in our own webhooks while providing data-backed outage reports to the 3rd party and escalating to its CTO when necessary.

Almost every quarter, the company revisits the decision to continue using the 3rd party given the reliability issues. Each time, though all stakeholders are aware of the pain collectively experienced, the decision has been to punt on replacing the solution.

Results

  • Ensured 3rd-party issues were remediated within 24-36h, even on the lowest support tier.
Categories
Capacity Planning Collaboration Culture Forecasting Process Program Management Roadmapping

Introducing Program Management

(11/19-3/20)

Challenge

The absence of project or program management made it challenging to forecast delivery or to get engineer time set aside for maintenance.

Action

Investigated a few tools (Portfolio for Jira, BigPicture, Aha) for visually depicting dependencies and estimating team capacity.

Ultimately introduced lightweight (spreadsheet-based) processes to support capacity planning and crafted a business case for the CEO on why we should invest in infrastructure.
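The spreadsheet math behind this kind of capacity planning fits in a few lines. This is a hedged sketch of one common approach, not the actual spreadsheet: it assumes story-point velocity as the capacity unit and reserves a fixed, hypothetical share of each sprint for maintenance before forecasting feature delivery.

```python
import math


def sprints_to_deliver(
    backlog_points: float,
    velocity_per_sprint: float,
    maintenance_share: float,
) -> int:
    """Forecast whole sprints needed for a backlog when a fixed share of
    each sprint's capacity is reserved for maintenance work."""
    if not 0 <= maintenance_share < 1:
        raise ValueError("maintenance_share must be in [0, 1)")
    if velocity_per_sprint <= 0:
        raise ValueError("velocity_per_sprint must be positive")
    feature_capacity = velocity_per_sprint * (1 - maintenance_share)
    return math.ceil(backlog_points / feature_capacity)
```

For example, a 120-point backlog at 40 points per sprint takes `sprints_to_deliver(120, 40, 0.25)` = 4 sprints with 25% reserved for maintenance, versus 3 with no reservation; making that trade-off explicit is what lets maintenance time be negotiated rather than squeezed out.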

Results

  • Empowered the organization by increasing capacity planning visibility.
Categories
Emails Execution Process

A Tale of Two Email Providers

(Decorist : 8/18-2/20)

Challenge

Transactional emails were being managed inefficiently across two email providers.

Action

Worked with Product and Design to take inventory of the transactional emails needing to be migrated, created line items in an Epic as the project plan, and incorporated them sprint-by-sprint as capacity allowed.

Results

  • Saved business $120K/yr by enabling shutdown of Iterable account.
Categories
Process Quality Troubleshooting

Zero to Hero QA

(Decorist : 10/18-1/20)

Challenge

Inherited a codebase with no unit tests, no functional tests, and no CI/CD.

Action

Recruited a Sr. Automation QA Engineer, facilitating the introduction of automated testing to the company, and had him create Happy-Path coverage of the main UXes.

After three months, all Happy Paths of demand- and supply-side user experiences were covered.

After six months, 100% of the site (positive and negative scenarios) had been covered.

Results

  • Introduced automated QA testing, saving the team from shipping show-stopping bugs.
Categories
Analytics Growth Troubleshooting

Jumping In For Analytics

(Decorist : 11/19-11/19)

Challenge

A newly added VWO Custom Dimension wasn’t tracking.

Action

We had just introduced VWO into the stack for A/B testing when one of the experiments wasn’t reporting correctly.

I jumped in with Product to debug, worked with VWO to resolve the issue, and figured out that a race condition in our Angular code was preventing the Custom Dimension from being beaconed.

I fixed the issue and deployed to prod.
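The exact Angular fix isn't described here, but a common remedy for this kind of race is a command queue: calls made before the tracking library has loaded are buffered and replayed once it signals readiness. The sketch below illustrates that pattern in Python for consistency with the other examples; `DeferredTracker`, `set_dimension`, and `on_ready` are hypothetical names and do not reflect VWO's actual API.

```python
from typing import Any, Callable, List, Optional, Tuple

Sender = Callable[[str, Any], None]


class DeferredTracker:
    """Buffer custom-dimension calls until the analytics library is ready.

    Generic command-queue pattern; hypothetical names, not VWO's API.
    """

    def __init__(self) -> None:
        self._send: Optional[Sender] = None
        self._pending: List[Tuple[str, Any]] = []

    def set_dimension(self, name: str, value: Any) -> None:
        if self._send is None:
            # Library not loaded yet: queue the call instead of dropping it.
            self._pending.append((name, value))
        else:
            self._send(name, value)

    def on_ready(self, send: Sender) -> None:
        """Call once the library has loaded; replays queued calls in order."""
        self._send = send
        pending, self._pending = self._pending, []
        for name, value in pending:
            send(name, value)
```

The key property is that the order in which the app code and the library load no longer matters: early calls are delayed, not lost.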

Results

  • Collaborated w/Product Manager to get VWO experimentation working.
Categories
eCommerce Performance Engineering Site Reliability

Always Be Improving

(Decorist : 8/18-12/19)

Challenge

Leading by example to own and improve systems as the sole engineer with SRE/DevOps/Frontend/Backend experience.

Action

Mar 2019

Watching our AWS costs rise ~8% monthly…

[Figure: Costs Rising]
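To put ~8% monthly growth in perspective, compounding shows how quickly spend escalates: at 8% MoM, monthly costs roughly double in 9 months (1.08^9 ≈ 2.00) and grow about 2.5× in a year (1.08^12 ≈ 2.52). A small helper, assuming a constant monthly rate:

```python
def compound_growth(monthly_rate: float, months: int) -> float:
    """Multiplier on monthly spend after `months` of constant MoM growth."""
    return (1 + monthly_rate) ** months


# Print the spend multiplier at a few horizons for 8% MoM growth.
for months in (3, 6, 9, 12):
    print(f"{months:>2} months: {compound_growth(0.08, months):.2f}x")
```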

I learned about and subscribed to Reserved Instances to realize cost savings on our hosting spend.

Dec 2019

Though it doesn’t lead to cost savings or revenue generation, part of my responsibilities has been database administration: jumping in when the production DB spiked, figuring out whether a runaway process needed to be terminated, a slow query was bringing it to its knees, or a cron job was introducing load, and doing whatever was needed to keep the site up.

Or when bots crawled the site and brought it down, necessitating an IP block.

Or when digging into the logs revealed a route returning 500s that had to be fixed.
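Both kinds of firefighting start with the web server's access log. A minimal sketch, assuming logs in the common Apache/Nginx “combined” format (the actual server and log format aren't stated here): it counts requests per IP to surface crawler candidates for an IP block and tallies 5xx responses per route, with query strings stripped.

```python
import re
from collections import Counter
from typing import Iterable

# Start of a "combined" log line, e.g.
# 203.0.113.7 - - [10/Dec/2019:13:55:36 +0000] "GET /path?x=1 HTTP/1.1" 500 1234 "-" "UA"
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+) [^"]*" (\d{3}) ')


def summarize(lines: Iterable[str], top_n: int = 5):
    """Return (top requesting IPs, 5xx counts per route) from log lines."""
    by_ip: Counter = Counter()
    errors_by_route: Counter = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip malformed or non-matching lines
        ip, _method, path, status = m.groups()
        by_ip[ip] += 1
        if status.startswith("5"):
            errors_by_route[path.split("?", 1)[0]] += 1  # drop query string
    return by_ip.most_common(top_n), errors_by_route.most_common()
```

This only ranks; deciding when a heavy IP is a misbehaving bot, and where to block it, remains a judgment call.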

Mar 2020

Using Cloudcraft, I diagrammed our AWS infrastructure, identifying and deleting 1000 unused SQS queues.

I also identified and deleted numerous unused RDS snapshots.

Together, these changes led to yet another 37% reduction in MoM AWS costs.

Results

  • Saved the company 115% of my salary in 2019 through process improvements.
Categories
Collaboration Process

CCPA

(Decorist : 7/19-12/19)

Challenge

Getting ready for CCPA.

Action

Worked with Corporate Counsel to determine the necessary site updates/changes and, together with the in-house Designer, slated Jira stories for the team.

Worked with Corporate IT to receive customers’ access and delete requests via SFTP, and created a bash script to parse them into delete or access requests.
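The original parser was a bash script; for consistency with the other examples, here is a Python sketch of the same idea. The file layout is an assumption: a CSV with a header row and a `request_type` column holding `access` or `delete`. Anything else is routed to an `unknown` bucket for manual review rather than silently dropped.

```python
import csv
import io
from typing import Dict, List


def split_requests(csv_text: str) -> Dict[str, List[Dict[str, str]]]:
    """Sort CCPA requests into 'access', 'delete', and 'unknown' buckets."""
    buckets: Dict[str, List[Dict[str, str]]] = {"access": [], "delete": [], "unknown": []}
    for row in csv.DictReader(io.StringIO(csv_text)):
        kind = (row.get("request_type") or "").strip().lower()
        # Anything unrecognized is kept for manual review, never dropped.
        buckets[kind if kind in ("access", "delete") else "unknown"].append(row)
    return buckets
```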

Results

  • Met the 1/1/20 CCPA launch deadline.
Categories
Backend eCommerce Frontend Performance Engineering Process

What To Do With a 9s Page Load?

(Decorist : 9/18-12/19)

Challenge

Inherited a two-fold challenge: 1) a less-than-optimal user experience, with a 9.01s Avg. Page Load Speed, and 2) a culture that did not yet value performance engineering.

Action

Identified initial frontend and backend low-hanging fruit (e.g., page structure, image resolutions, N+1 queries, lazy-loading). Identified and delegated KRs to FE leads. Introduced a process to methodically follow up each week, pressing the case for performance engineering over months.
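For readers unfamiliar with the N+1 pattern: code fetches N rows with one query, then issues one more query per row to load related data. In Django this is typically addressed with `select_related` or `prefetch_related`; the framework-free sketch below uses Python's built-in `sqlite3` with hypothetical `designer` and `project` tables and counts the statements each approach issues via `set_trace_callback`.

```python
import sqlite3
from typing import Callable, List, Tuple


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE designer (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE project (id INTEGER PRIMARY KEY, designer_id INTEGER);
        INSERT INTO designer VALUES (1, 'Ana'), (2, 'Ben');
        INSERT INTO project VALUES (10, 1), (11, 2), (12, 1);
        """
    )
    return conn


def names_n_plus_one(conn: sqlite3.Connection) -> List[str]:
    # 1 query for the projects, then 1 more query per project (N+1).
    rows = conn.execute("SELECT id, designer_id FROM project ORDER BY id").fetchall()
    return [
        conn.execute("SELECT name FROM designer WHERE id = ?", (designer_id,)).fetchone()[0]
        for _, designer_id in rows
    ]


def names_joined(conn: sqlite3.Connection) -> List[str]:
    # Same result from a single JOIN query.
    sql = (
        "SELECT d.name FROM project p "
        "JOIN designer d ON d.id = p.designer_id ORDER BY p.id"
    )
    return [name for (name,) in conn.execute(sql)]


def count_queries(
    conn: sqlite3.Connection, fn: Callable[[sqlite3.Connection], List[str]]
) -> Tuple[List[str], int]:
    """Run fn(conn) and count the SQL statements SQLite executes."""
    statements: List[str] = []
    conn.set_trace_callback(statements.append)
    try:
        result = fn(conn)
    finally:
        conn.set_trace_callback(None)
    return result, len(statements)
```

With three projects, `count_queries(conn, names_n_plus_one)` records 4 statements while `names_joined` needs 1; the gap grows linearly with the number of rows, which is why N+1 queries show up as page-load time on list-heavy pages.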

Below, a sampling …

Some initial efforts shaved ~3s off the Homepage load time.

Lighthouse score soared from 5 to 61.

Backend query improvements yielded multiple speed improvements of up to 50% on various pages across the site, and SEO traffic saw a +300% YoY bump (verified by a 3rd-party SEO consultant).

After Dec 2019, having addressed the low-hanging fruit for all UXes, we shifted focus to non-converted UXes.
[Figure: Sept 2018 to Dec 2019]

Results

  • Record Avg. Site Page Load Speed of the previous four years: 9.03s (Sep 2018) to 4.13s (Dec 2019).
Categories
Architecture eCommerce Management Process Re-platforming Roadmapping SOA

Changing Engine Mid-flight : Again

(Decorist : 8/18-12/19)

Challenge

Take a half-baked v1.5 SOA and continue re-platforming it to support scaled integration with the corporate parent and its subsidiaries.

Action

Aug 2018

When I joined, it was with the idea that I would be instrumental in helping scale the Decorist relationship-creation paradigm, which leads to higher AOV, to its corporate parent, Bed Bath and Beyond, and its subsidiaries: Buy Buy Baby and Cost Plus World Market.

I inherited the beginnings of a movement away from a Django/Angular/MySQL monolith and towards a Django/React/Postgres multi-tenant platform; a v1.5 of the application architecture. My predecessor had extracted some facets of the monolith to build out the initial stages of the SOA in partnership with another Bed Bath and Beyond subsidiary: One Kings Lane.

Although labeled a multi-tenant platform, it had been built quickly – effectively as a prototype – to service one tenant and was not truly extensible for other silos, though it did encompass solid SOA principles for reuse.

Dec 2018

As part of roadmapping for 2019, wanting to strategically position ‘scalability’ for the corporate org, I worked with my boss, the SVP Prod/Tech, to detail a project plan to leverage the existing nascent SaaS paradigm for the next generation of the Decorist user experience.

The CEO sanctioned a reduced scope for the roadmap; it would be the very first internal effort toward a new platform to power the site in the future. We focused on porting a manual, back-office task: determining supply-side match and availability against demand-side need.

Wanting to keep an engineer engaged, I shepherded him and helped him understand the cost-benefit trade-offs we would need to make to meet the deadline, setting a course of re-using the User service for authentication, building out a new React-based UI, and integrating with the monolith.

Feb 2019

A few months later, we delivered the supply-side Matching and Availability feature.

Shortly after launch, business priorities shifted towards a focus on a new eCom offering and we had to back-burner re-platforming efforts.

Dec 2019

Given guidance that 2020 might be the year we could re-visit re-platforming, I ideated around what was still lacking in v1.5 and what would need to be built out, creating a project plan and technical roadmap for the go-forward.

I worked with my boss again on planning and prioritization, and we again provided guidance to the CEO. Ultimately, we could not secure the necessary corporate integration buy-in and have had to defer further progress given other priorities.

Results

  • Mentored Lead Engineer towards implementation of first feature for new SOA paradigm.
Categories
Distributed Teams eCommerce Management Process Release Management

Managing Distributed Teams (pre-COVID)

(Decorist : 8/18-12/19)

Challenge

Inherited a new, flat, remote, full-time team (web) of 25 (mostly) junior engineers in Delhi at a time when the dominant local (SF) office culture was not optimized for remote work. Needed to develop them into Bay Area-level talent and maneuver them into a manageable hierarchy while simultaneously incorporating a 13-person contracting team (VR) in Lahore.

Action

Team Organization

Re-organized the team into Tribes, which became productive units.

Transitioned work culture to remote by getting stakeholders more comfortable with off-peak-hour meetings.

Processes

Introduced the Agile paradigm / Scrum meetings.

Leveraged Google Forms surveys to improve processes, both quantitatively and qualitatively.

Introduced story-point estimations.

Began mastering Jira / work-breakdown through Epics, etc. Started Automated Testing.

Introduced Coda for Program Management.

Performance

Tried Koan for performance management and remote visibility.

Introduced reviews and performance tracking with PeopleGoal, empowering local Indian General Manager with structural tools.

Introduced OKRs.

Introduced accountability through personal / sprint goals.

Introduced a bi-weekly “Get to Know Me!” email series (managed through Google Forms), highlighting two Delhi team members per installment. Also eventually created team videos from smartphone-captured content, stitched together in iMovie.

Releases

Created branching and release strategies.

Provided the team with a Release Notes template to improve communication with stakeholders.

Speed

Incorporated Performance Engineering mindset.

Quality

Led by example by including hyperlinks and project IDs in correspondence.

Reminded people to provide more context in Jira stories.

Reminded QA to provide easy, concise steps-to-verify and testing credentials to stakeholders.

Introduced the concept of “Bug Severity,” Estimated Time to Resolution, and a simple Google Form for reporting (and emailing) bugs to multiple engineering stakeholders who can triage.
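Bug Severity and Estimated Time to Resolution pair naturally: each severity level maps to a target resolution window. The levels and windows below are hypothetical placeholders (the actual definitions aren't documented here), sketched in Python to show the mapping.

```python
from datetime import datetime, timedelta
from enum import Enum


class Severity(Enum):
    """Hypothetical severity levels mapped to target time-to-resolution."""

    S1_SITE_DOWN = timedelta(hours=4)
    S2_MAJOR = timedelta(days=1)
    S3_MINOR = timedelta(days=5)
    S4_COSMETIC = timedelta(days=14)


def estimated_resolution(reported_at: datetime, severity: Severity) -> datetime:
    """Target resolution time: report time plus the severity's window."""
    return reported_at + severity.value
```

Keeping the mapping in one place lets the bug-report form, the triage email, and any follow-up agree on the same deadline.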

Overall, the whole company has seen a qualitative improvement in the content of communications between distributed offices; we are all now including more details, leading to faster turnaround of higher-quality code.

Results

  • Created outcome-driven processes, best practices, structure, and mentorship for a new, offshore engineering team.