Categories
Budgeting, Collaboration, Roadmapping, VR

3D Cost Modeling

(7/20-7/20)

Challenge

To scale v2 of our shop-the-room experience, the business turned to me to understand how our costs would grow as the process scaled.

Action

The interim CEO asked me to determine costs as we considered beefing up our rendering pipeline by orders of magnitude.

Applying a KISS paradigm, I threw together an initial back-of-the-napkin estimate. That was good enough at first, but the interim CEO then needed a deeper level of understanding, so I worked with my Pakistani Technical Project Manager to build out a more comprehensive version of the model, taking into account parameters like the following:

  • Scale Factor
  • Model Throughput Per Week
  • Aspirational Efficiency Factor
  • Cost Per Model
  • Number of Modelers Needed
  • Model Category
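
To make those parameters concrete, here is a minimal sketch of how such a model might combine them; the function shape, the names, and every number below are illustrative assumptions, not the actual figures from the model.

    import math

    def category_plan(models_per_week, throughput_per_modeler,
                      aspirational_efficiency, cost_per_model):
        """Estimate headcount and weekly spend for one model category."""
        effective_throughput = throughput_per_modeler * aspirational_efficiency
        modelers_needed = math.ceil(models_per_week / effective_throughput)
        weekly_spend = models_per_week * cost_per_model
        return modelers_needed, weekly_spend

    # Scale factor applied against a purely hypothetical baseline of
    # 50 models/week at $40/model and 5 models per modeler per week.
    for scale_factor in (1, 10, 100):
        print(scale_factor, category_plan(50 * scale_factor,
                                          throughput_per_modeler=5,
                                          aspirational_efficiency=1.2,
                                          cost_per_model=40))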

Results

  • Iteratively delivered a cost model for the 3D modeling pipeline supporting shop-the-room.
Categories
Collaboration, Database, Troubleshooting

Fixing Database Replication

(6/20-6/20)

Challenge

Our MySQL-to-Postgres data replication, which powers the business’s analytics dashboard, usually runs without issue. One day, it failed completely.

Action

  • Spotted an AWS alert notification and worked with our Data Engineer to determine this wasn’t a normal hiccup in the replication pipeline.
  • Stopped the AWS DMS task for replication.
  • Examined CloudWatch logs for direction on where the problem was; they pointed to a table (awsdms_apply_exceptions) that didn’t exist.
  • Dug into online documentation about the issue.
  • Created a new Postgres copy target of the analytics database.
  • Created a new AWS DMS task pointed at the copy target DB, which should create the table public.awsdms_apply_exceptions.
  • Grabbed the DDL statement (e.g. CREATE TABLE) for the awsdms_apply_exceptions table.
  • In the (original) analytics DB, created the ‘public’ schema.
  • Also in the (original) analytics DB, applied the CREATE TABLE for awsdms_apply_exceptions (see the sketch after this list).
  • Deleted 1) the new Postgres copy target and 2) the AWS DMS task as cleanup.
  • (Never did figure out why the public schema and table disappeared.)
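
For illustration, the manual fix amounted to something like the following. The connection string is a placeholder, and the column definitions only approximate the DDL we grabbed from the copy target, so treat them as assumptions rather than the canonical schema.

    import psycopg2  # assumes the psycopg2 Postgres driver

    # Approximation of the captured DDL; verify against the generated table.
    DDL = """
    CREATE SCHEMA IF NOT EXISTS public;
    CREATE TABLE IF NOT EXISTS public.awsdms_apply_exceptions (
        "TASK_NAME"   varchar(128) NOT NULL,
        "TABLE_OWNER" varchar(128) NOT NULL,
        "TABLE_NAME"  varchar(128) NOT NULL,
        "ERROR_TIME"  timestamp    NOT NULL,
        "STATEMENT"   text         NOT NULL,
        "ERROR"       text         NOT NULL
    );
    """

    with psycopg2.connect("dbname=analytics") as conn:  # placeholder DSN
        with conn.cursor() as cur:
            cur.execute(DDL)  # commits on clean exit from the with-block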

Results

  • Resolved the data replication issue with minimal downtime for the analytics dashboard.
Categories
Architecture, Chat, Collaboration, Integration, Site Reliability, SOA, Troubleshooting

Connecting Supply with Demand

(10/18-5/20)

Challenge

Ongoing reliability issues with a 3rd party chat solution crucial to business operations, with no documentation or integration monitoring.

Action

The key aspect of the Decorist experience is the connection between the Client (Demand-side) and the Designer (Supply-side). To facilitate that connection, many years prior, the business had included a then-nascent chat-as-a-service provider as part of the user experience; it made sense to “Buy” instead of “Build.”

Architecture Overview

The chat UX is loaded into an iframe. When users chat, their payload is posted to the 3rd party’s backend. The 3rd party then fires a webhook to Decorist, which tracks the event in the DB and fires a transactional email depending on business logic.
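
As a rough sketch of the webhook leg of that flow (the framework, route, event shape, and function names here are illustrative stand-ins, not the actual Decorist code):

    from flask import Flask, request  # illustrative framework choice

    app = Flask(__name__)

    def record_chat_event(event):
        """Stub: the real handler tracked the event in the DB."""

    def should_email(event):
        """Stub: business logic deciding whether to email."""
        return event.get("type") == "message.sent"  # assumed event shape

    @app.route("/webhooks/chat", methods=["POST"])
    def chat_webhook():
        event = request.get_json(force=True)
        record_chat_event(event)
        if should_email(event):
            pass  # fire the transactional email here
        return "", 204  # acknowledge so the 3rd party doesn't retry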

The Problem

The ways the bugs manifested boiled down to:

  • chat UX not appearing (as pictured below, often the result of a 3rd party deploy gone bad)
  • emails not sending because webhooks weren’t called

Lacking integration monitoring, issues often bubbled up through first-tier support.

[Screenshot: chat not loading]

Improving the Dependency

Noticing the reliability issues, I first delegated triage to one experienced engineer, then to another.

I also dug in on my own, discovering and remedying bugs in our own webhooks while providing data-backed outage reports to the 3rd party, escalating to the 3rd party’s CTO when necessary.

Almost quarterly, the company revisits the decision to continue using the 3rd party given the reliability issues. Each time, though all stakeholders are aware of the pain collectively experienced, the decision has been to punt on replacing the solution.

Results

  • Ensured remediation of 3rd party issues within 24-36 hours, even on the lowest support tier.
Categories
Capacity Planning, Collaboration, Culture, Forecasting, Process, Program Management, Roadmapping

Introducing Program Management

(11/19-3/20)

Challenge

The absence of project or program management made delivery hard to forecast and made it impossible to get engineering time set aside for maintenance.

Action

Investigated a few tools (Portfolio for Jira, BigPicture, Aha!) for visually depicting dependencies and estimating team capacity.

Ultimately introduced lightweight (spreadsheet-based) processes to support capacity planning and crafted a business case for the CEO on why we should invest in infrastructure.

Results

  • Empowered the organization by increasing capacity planning visibility.
Categories
Collaboration, Process

CCPA

(Decorist : 7/19-12/19)

Challenge

Getting ready for the California Consumer Privacy Act (CCPA).

Action

Worked with Corporate Counsel to determine necessary site updates and changes, and slated Jira stories for the team in conjunction with our in-house Designer.

Worked with Corporate IT to receive customers’ access and delete requests via SFTP, and created a bash script to parse each request as a delete or an access request (see the sketch below).
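
The original was a bash script; as an illustration of the triage logic, here is a rough Python rendering, with the file layout (a CSV with a request_type column) assumed rather than known:

    import csv
    import sys

    def partition_requests(path):
        """Split CCPA requests into access vs. delete buckets.
        Assumes a CSV with a request_type column; the real format
        received over SFTP may have differed."""
        access, delete = [], []
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                kind = row.get("request_type", "").strip().lower()
                (delete if kind == "delete" else access).append(row)
        return access, delete

    if __name__ == "__main__":
        access, delete = partition_requests(sys.argv[1])
        print(f"{len(access)} access requests, {len(delete)} delete requests")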

Results

  • Met the 1/1/20 deadline to launch for CCPA.
Categories
3D, Collaboration, Process, Troubleshooting, VR

Salvaged PR Opportunity

(Decorist : 3/19-3/19)

Challenge

A marketing opportunity showcasing the company’s proprietary shop-the-room modeling was four months behind and at risk: a key in-house designer had departed and the PR client was losing patience.

Action

I took ownership of project management, working with the new designer and the remote Pakistani VR engineering team and introducing a lightweight process (spreadsheet-based, with daily 15-minute check-ins) to get the backlog of shop-the-room models modeled and bundled into our (Unity-based) VR design app.

Results

  • Delivered the backlogged shop-the-room models in the VR design app, salvaging the PR opportunity.
Categories
APIs, Collaboration, Emails, Management

Improving Deliverability

(The RealReal : 3/17-9/17)

In a bid to

  1. improve deliverability of our email campaigns (by sending from the same IP) and
  2. free Engineering resources (by shifting email template management to Marketing)

… an under-resourced team[1] that I headed was tasked with further migrating from SendGrid templates to ExactTarget.

Given competing priorities and resource availability, and to support the engineer’s professional development toward broader full-stack experience, I tasked the Sr. Frontend Engineer (the newest member of the team, who had never done email marketing before) with the port.

For further context: by the time I arrived in Q3/16, approximately 10% of system-generated transactional emails had already been ported (such as the demand-side signup confirmation email), so the team already had some precedent and knowledge for such ports.

Scope Change

Working with Product and Marketing, we identified the accounting email sent monthly to supply-side users (30K recipients, arguably the most important email in the system) as the next candidate for porting. Based on what we could deduce from the documentation, and on assumptions about implementation drawn from similar previous endeavors, the port seemed like a slam dunk deliverable in two weeks.

As the engineer got into implementation, I realized that the scope was much larger than originally anticipated. Instead of a one-off API call, as had previously been the case (e.g. for the demand-side signup confirmation email), we were looking at a fundamentally different business process: not a one-off send but a batch send of 30K emails as part of the monthly payout managed by the Director of Accounting (kicked off by a single admin button push).

Furthermore: if a demand-side signup confirmation email did not land in a user’s inbox, that was certainly problematic; but if a supply-side user’s monthly accounting email (or, worst case, multiple thousands of them) did not arrive, that would cause an exponential increase in Customer Service requests and, much worse, violate the company’s contractual SLA with supply-side customers around informing them of their monthly payout.

We (all of us – me, the engineer, Marketing, Design, the Product Manager) had wandered into uncharted territory by tackling a template that was blasted out at such volume. We had to revisit our assumptions about how to build out the feature.

Feature Diligence

Emulating a send at the user level was fairly trivial: simply invoke a method call from a console. Truly emulating the batch blast on staging proved much more complicated, as two conditions were necessary: a block of 30K test users, and integration with (and kick-off of) the monthly payout, which took four hours to finish.

Though I assumed the engineer would be able to navigate an unfamiliar domain, given his full-stack abilities and his systematic reasoning about architecture in other circumstances, he had difficulty iterating at the unit and functional levels to get a local (development) email send working. I coached him on the first tactic, providing guidance to test code at various levels (unit, integration) to mitigate risk: focusing on a method call that would send SMTP email from a local dev environment and then invoke the ExactTarget API. Then I coached him to deploy a feature branch integrating with the payout onto staging (where users had already been anonymized) and to set up a test blast of 200.
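
The local-send leg of that tactic might look like the sketch below; the addresses, ports, and names are illustrative, and the ExactTarget call is deliberately left as a stub since its client API isn’t shown here.

    import smtplib
    from email.message import EmailMessage

    def send_local(to_addr, subject, body):
        """Dev-level check: route the send through a local debugging SMTP
        server so the template can be iterated on without touching the API."""
        msg = EmailMessage()
        msg["From"] = "noreply@example.com"  # placeholder sender
        msg["To"] = to_addr
        msg["Subject"] = subject
        msg.set_content(body)
        with smtplib.SMTP("localhost", 1025) as smtp:  # local debug server
            smtp.send_message(msg)

    def send_exacttarget(to_addr, subject, body):
        """Stub: the next level up invokes the ExactTarget API (omitted)."""
        raise NotImplementedError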

With that blast, we ran into a race condition around data extensions (which no one in either Engineering or Marketing had complete mastery of) where emails were being delivered empty; simply explained, no sequential dependency existed between data population and emailing. Over the following weeks, we tried to determine the best way forward while interacting with our ExactTarget sales representative and then their engineers, experiencing suboptimal support and delays in getting answers to our questions.

Result

After two months of development effort, back-and-forths with Product and Marketing, and several back-and-forths with ExactTarget sales reps and engineers, in a meeting shortly before production go-live of the port (involving the VP PROD, CTO, and VP Marketing), and given the potential risk to the business of such important emails not being delivered, an executive decision was made to abandon the port.

Takeaways

If I had the opportunity to manage it again, I would have:

  1. found a way to free up time for the more experienced Full-stack Engineer (who had had some experience with ExactTarget but was overcommitted at that point on other priorities) to initially pair with the Sr. Frontend Engineer. That would have exposed the initial underestimation of scope earlier, leading to a better cost-benefit analysis (for all stakeholders) given the information we had.
  2. advocated for refactoring, given what I learned about the state of the code, architecture, and process, and how tightly coupled the mailing code was with the payout code (making even a semblance of proxy integration testing arduous). Refactoring would have given us the opportunity to identify business-critical functions and ensure reliable automated testing around them.

Reflecting on the process improvements we could take away, I have learned to leverage these ideas:

  1. kick off any new project with a technical design review
  2. understand the typical load of a feature before making a change
  3. identify system changes affecting user experience as one-off or batch (which impacts scope accordingly)
  4. for anything involving a 3rd party, scope work at 3X as a rule-of-thumb (this heuristic is informed not only by this project but by others that involved 3rd party APIs)

One further follow-on I hope the experience highlighted for Product and Engineering executives: given the difficulties not only in this project but in others involving Salesforce/ExactTarget, smaller vendors with clearer and easier technical documentation could be more cost-effective in the long run.

[1]: a Jr. Full-stack Engineer (2 yrs. exp.), a Full-stack Engineer (7 yrs. exp.), and a Sr. Frontend Engineer (7 yrs. exp.)

Results

  • Led the engineering effort to improve email deliverability (which we ultimately had to abandon).
Categories
Collaboration, Innovation, Management

Collaboration @ TRR

(The RealReal : 9/16-5/17)

Facilitated smooth cross-team (PROD, DSGN, Platform, Mobile, Web) interactions towards scoping and delegating engineering work for:

  • Black Friday/Cyber Monday
  • modifications to supply-side NUX towards increased signups
    • first in altering/improving the basic flow
    • then while incorporating different verticals (Art, Home)
  • conversion of non-responsive UIs to responsive
  • shipping/up-sell through Narvar
  • payment processing through Hyperwallet
  • analytics reporting shift from Snowplow to Segment
  • Google Tag Manager modifications/optimizations
  • cross-browser test coverage of business critical paths through Rainforest QA
  • shoppability/conversion usability improvements (e.g. faceted navigation filters)

Results

  • Built rapport with business partners (Product Managers, Designers, Marketing) and direct reports to get work done.
Categories
APIs, Architecture, Backend, Collaboration, Frontend, Innovation, SOA, SPAs

Next Gen FE

(Bluxome Labs : 7/16-7/16)

Leading the Charge

Upon returning from vacation, I discovered that the VP ENGR had given the recently-joined Team Lead carte blanche to move ahead with the next phase of re-engineering the codebase toward a further SOA-influenced frontend and backend split.

Being intimately familiar with the current system architecture, I was in a race against time in Week One: I had only a week before the VP ENGR wanted Best Practices, new workflows, and adoption of new technologies, and one substantially differing proposal (likely much more costly from development and operational standpoints) was on the table. I set out to formulate a concrete strategy for delivering in the short term while providing a basis to iterate against a longer-term architectural shift.

Architecting the Solution

First, I advocated for adopting the ‘Launch Page’ as the testbed with which to move forward. Then, I reviewed that page to reverse-engineer whatever model attributes I might be able to use (in a Backbone model) as exposed by a simplistic GET request on page load for initial state.

I then modeled that simplistic endpoint using the Swagger Editor.

Next, I dumped the YAML of that endpoint and leveraged Swagger codegen tools to create an initial version of the endpoint in Node.js (and Ruby and Java), then fired up the Node.js version to serve HTTP requests.
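
For flavor, a minimal Swagger 2.0 spec for such an endpoint might look like the following; the path and attribute names are illustrative stand-ins, not the actual Launch Page model:

    swagger: "2.0"
    info:
      title: Launch Page API        # hypothetical service name
      version: "0.1.0"
    paths:
      /launch_page:                 # illustrative route
        get:
          produces:
            - application/json
          responses:
            "200":
              description: Initial state consumed by the Backbone model
              schema:
                type: object
                properties:
                  title:
                    type: string
                  hero_image_url:   # attribute names are assumptions
                    type: string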

At that point, having proved to myself the utility of Swagger for spinning up simple Node.js services, I turned to building out a placeholder SPA using React that would consume a Node.js service. I opted to stay within the Rails paradigm (instead of statically hosting assets elsewhere), first using react_on_rails to incorporate React and then react-rails-hot-loader for HMR.

I wound up needing to patch the latter locally to get WebSockets working correctly in my environment.

Having put the Big Rocks in place, I could hand off the SPA POC to the F2E consultant for further refinement and focus on building the case for the overall approach, which the VP ENGR then blessed.

Building the API

During this phase (Weeks Two and Three), I implemented GET, POST, and PUT JSON-serving routes in Rails (so as not to worry about CORS) while also developing in the context of Amazon’s Lambda and API Gateway, taking the Node.js server stub above and dropping it in as an HTTP proxy to collate information from existing API endpoints (as microservices).

I also worked with five backend specialists, and my work guided the conversation around establishing Best Practices for API development.

Results

  • Championed a bottom-up approach to decoupling microservices, influenced the VPE, Team Lead, and F2E to adopt it, and prototyped using React with Hot Module Reloading.
Categories
Architecture, Building buy-in, Collaboration, demand-side eCommerce, Frontend, Leadership, Performance Engineering, Process

UI Tech Debt

(Bluxome Labs : 3/16-6/16)

A remnant of the legacy codebase from the company’s origins eight years earlier, the two most important routes on the platform for quality assurance had never been moved from the company’s original Merb app to its upgraded companion Rails experience (dating back to a first movement toward SOA three years prior).

They were rightly regarded with trepidation given their dependence on MooTools when the rest of the platform had moved to jQuery, especially as the Jasmine suite covering them had been mothballed about 18 months before and those same MooTools libs were tightly coupled between two platform applications.

Toward a future of less frontend tech debt, I seized the opportunity to champion and shepherd the project (as a Q2 engineering goal) through to completion.

Over the course of three months, I led the project scoping, weekly communication around engineering effort, and the architecture and implementation, interfacing with Product and Design to ensure quality in light of the absent test coverage.

Results

  • Delivered on a decoupling strategy for static asset management and Webpack’d bundles while collaboratively iterating with the VPE, VP PROD, and Sr. Engineers.