Categories
Collaboration Database Troubleshooting

Fixing Database Replication

(6/20-6/20)

Challenge

Usually our MySQL->Postgres data replication powering the business’s analytics dashboard runs without issue. One day, it completely failed.

Action

  • Spotted an AWS alert notification and worked with the Data Engineer to confirm it wasn’t a normal hiccup in our replication pipeline.
  • Stopped the AWS DMS task for replication.
  • Examined CloudWatch logs for direction on where the problem was. They showed that a table (awsdms_apply_exceptions) didn’t exist.
  • Dug into online documentation about the issue.
  • Created a new Postgres copy target of the analytics database.
  • Created a new AWS DMS task pointing at the copy target DB, so that DMS would create the table public.awsdms_apply_exceptions there.
  • Grabbed the DDL statement (e.g. CREATE TABLE) for the awsdms_apply_exceptions table.
  • In the (original) analytics DB, created the ‘public’ schema.
  • Also in the (original) analytics DB, applied the CREATE TABLE for awsdms_apply_exceptions.
  • Deleted 1) the new Postgres copy target and 2) the AWS DMS task as cleanup.
  • (Never did figure out why the public schema and table disappeared.)
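
The order of the last two fixes matters: the schema has to exist before the table can be created in it. Below is a minimal Python sketch of assembling that restore script, assuming the DDL has already been copied from the scratch target. The function name and the placeholder DDL are hypothetical; the real column list should come from the table DMS created, not from this example.

```python
def build_restore_statements(table_ddl: str, schema: str = "public") -> list[str]:
    """Return the SQL statements to run, in order, against the original target DB."""
    ddl = table_ddl.strip().rstrip(";")
    if not ddl.upper().startswith("CREATE TABLE"):
        raise ValueError("expected a CREATE TABLE statement")
    return [
        # Recreate the missing schema first; IF NOT EXISTS makes a re-run harmless.
        f"CREATE SCHEMA IF NOT EXISTS {schema};",
        # Then apply the DDL exactly as captured from the scratch target.
        f"{ddl};",
    ]


# Placeholder DDL: the column list is elided on purpose (assumption, not the real DDL).
captured_ddl = (
    "CREATE TABLE public.awsdms_apply_exceptions "
    "(/* columns copied from the scratch target */);"
)
for statement in build_restore_statements(captured_ddl):
    print(statement)
```

Keeping the captured DDL verbatim avoids guessing at column names or types that DMS expects.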

Results

  • Resolved the data replication issue, keeping downtime for the analytics dashboard to a minimum.
Categories
eCommerce Performance Engineering Site Reliability

Always Be Improving

(Decorist : 8/18-12/19)

Challenge

Leading by example to own and improve systems as the sole engineer with SRE/DevOps/Frontend/Backend experience.

Action

Mar 2019

Watching our AWS costs rise ~8% monthly…

Costs Rising

I learned about and purchased Reserved Instances to realize cost savings on our hosting spend.
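
To show why ~8% monthly growth was worth acting on, here is a small worked example of how that rate compounds. The baseline figure is hypothetical and only the growth rate comes from the text above.

```python
def projected_monthly_cost(current: float, monthly_growth: float, months: int) -> float:
    """Project a monthly bill forward, assuming a constant compounding growth rate."""
    return current * (1 + monthly_growth) ** months


baseline = 10_000.0  # hypothetical monthly bill, for illustration only
after_year = projected_monthly_cost(baseline, 0.08, 12)
print(f"{after_year / baseline:.2f}x")  # prints 2.52x
```

Left alone, 8% a month multiplies the bill by roughly two and a half within a year.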

Dec 2019

Though it did not lead to cost savings or revenue generation, part of my responsibilities was database administration: jumping in when the production DB would spike like below, figuring out whether a runaway process needed to be terminated, whether a slow query was bringing it to its knees, whether a cron job was introducing load, or whatever else needed to be done to keep the site up.

Or when bots would crawl the site and bring it down, necessitating an IP block.

Or when digging into the logs revealed that a route was returning 500s and had to be fixed.
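
A minimal sketch of that kind of log triage, in Python. It assumes a simplified, hypothetical log format of `ip method path status` per line; real access logs would need their own parser, and the sample paths are made up.

```python
from collections import Counter


def triage(lines):
    """Count requests per client IP and 5xx responses per route."""
    requests_by_ip = Counter()
    errors_by_route = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip lines that do not match the assumed format
        ip, _method, path, status = parts
        requests_by_ip[ip] += 1
        if status.startswith("5"):
            errors_by_route[path] += 1
    return requests_by_ip, errors_by_route


sample = [
    "203.0.113.7 GET /rooms 200",
    "203.0.113.7 GET /rooms 200",
    "203.0.113.7 GET /checkout 500",
    "198.51.100.2 GET /checkout 500",
]
by_ip, by_route = triage(sample)
print(by_ip.most_common(1))     # [('203.0.113.7', 3)]
print(by_route.most_common(1))  # [('/checkout', 2)]
```

The top entry in the first counter is a candidate for an IP block, and the top entry in the second is the route to investigate.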

Mar 2020

Using Cloudcraft, I diagrammed our AWS infrastructure, identifying and deleting 1000 unused SQS queues.

Also identified and deleted numerous unused RDS snapshots:
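
One way to shortlist snapshots for review can be sketched in Python as below. The definition of "unused" here is an assumption (manual snapshots older than a cutoff whose source instance no longer exists), as is the function name. The dictionary keys mirror fields returned by the RDS DescribeDBSnapshots API, but the sample data is made up.

```python
from datetime import datetime, timedelta, timezone


def stale_manual_snapshots(snapshots, live_instance_ids, now, max_age_days=90):
    """Return identifiers of old manual snapshots whose source instance is gone."""
    cutoff = now - timedelta(days=max_age_days)
    return [
        s["DBSnapshotIdentifier"]
        for s in snapshots
        if s["SnapshotType"] == "manual"
        and s["DBInstanceIdentifier"] not in live_instance_ids
        and s["SnapshotCreateTime"] < cutoff
    ]


now = datetime(2020, 3, 1, tzinfo=timezone.utc)
snapshots = [
    {"DBSnapshotIdentifier": "legacy-db-2018-06-01", "DBInstanceIdentifier": "legacy-db",
     "SnapshotType": "manual", "SnapshotCreateTime": datetime(2018, 6, 1, tzinfo=timezone.utc)},
    {"DBSnapshotIdentifier": "prod-db-2018-06-01", "DBInstanceIdentifier": "prod-db",
     "SnapshotType": "manual", "SnapshotCreateTime": datetime(2018, 6, 1, tzinfo=timezone.utc)},
]
print(stale_manual_snapshots(snapshots, live_instance_ids={"prod-db"}, now=now))
# ['legacy-db-2018-06-01']
```

The function only produces a list of candidates; deleting anything is deliberately left as a separate, manual step.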

All changes led to yet another 37% reduction in MoM AWS costs:

Results

  • Saved company 115% of my salary in 2019 through process improvements.
Categories
eCommerce Frontend Innovation supply-side

Supply-side Refresh

(CrowdFlower : 7/13-12/13)

This was an enormous effort to overhaul a product whose UX had not been altered much in five years.

We took a piece-by-piece approach to swapping out components because of the complexity of the legacy behemoth. First, we refreshed the views in the legacy app, which involved changing styling in three different places (because the app had grown “organically” over the years, taking on three different styling paradigms: styling was defined in custom stylesheets, in Less, and inline).

In parallel, part of the team started building out the new peer Rails 3 app, the eventual destination for all views, complete with the company’s brand-new proprietary SSO solution (also built in parallel). Finally, routing was updated to send all traffic to the Rails app.

Forming

Between August and September of 2013, we coalesced as a team under the project champion, the company’s CTO, and began formulating what the new UX should be and do.

Below is a screenshot of an example of the dashboard as seen by the end user (Merb, built in 2008)

Below is a screenshot of the progress of a microtask job, also as seen by the user (sensitive information redacted)

Norming

Between September and October of 2013, we cranked out the new experience.

Based on a design concept by the other F2E in the team, we began restyling low-risk interfaces of the system. The new design was not simply a reskin, but involved introducing a similar-yet-improved information architecture, an example of which can be seen below

Following are a few more example screenshots demonstrating the evolving look-and-feel

Configuration Panel

As we were tackling the UX, a backend engineer in a peer team was working in parallel to create a custom role-based SSO system that we would leverage for enforcing authentication and authorization in a new way for the company.

Shortly before the conference, a decision was made to go with a second design concept, not entirely different from the original, but a little more polished. A designer was requisitioned to provide the new design. From that point forward to product launch, we mostly fine-tuned the details.

The following screenshot demonstrates not only the new design but also the use of the new SSO solution, which can be seen where certain UI elements are disabled based on the user’s permissions
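
The idea of disabling UI elements based on a user's permissions can be sketched in Python as follows. This is not the company's SSO implementation; the role names, permission strings, and function names are all hypothetical and only illustrate a role-based check.

```python
# Hypothetical role-to-permission mapping; the real system's roles are not documented here.
ROLE_PERMISSIONS = {
    "viewer": {"job:read"},
    "editor": {"job:read", "job:edit"},
    "admin": {"job:read", "job:edit", "job:launch"},
}


def permissions_for(roles):
    """Union the permissions granted by each of the user's roles."""
    granted = set()
    for role in roles:
        granted |= ROLE_PERMISSIONS.get(role, set())
    return granted


def element_state(required_permission, roles):
    """Decide whether a UI element should render as enabled or disabled."""
    return "enabled" if required_permission in permissions_for(roles) else "disabled"


print(element_state("job:launch", ["editor"]))  # disabled
print(element_state("job:launch", ["admin"]))   # enabled
```

In a design like this, the view only asks whether a permission is held, so roles can change without touching the templates.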

To QA the new experience, we ran it in alpha against production data repositories just prior to the conference.

Performing

After the launch, we maintained the product, adding features we had not been able to squeeze in.

Below is an example screenshot of how the final product shaped up

Results

  • Consolidated multiple styling paradigms for new UX ahead of company-sponsored conference.
Categories
CMS demand-side Frontend SPAs

Demand-side Internal Tooling

(CrowdFlower : 7/13-8/13)

The CrowdFlower platform is consumed via a number of microtasking sites. Each site registers and maintains its own users, but to better track unique identities across the CrowdFlower platform, we built a Single Page App in Ember.js to allow associating users across partner microtasking sites with one unique identifier in the CrowdFlower platform.
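
The core association can be illustrated with a small Python sketch, assuming an in-memory mapping from a (site, site user ID) pair to one platform-wide identifier. The class and method names are hypothetical, and the actual tool was an Ember.js app backed by the platform's own data store.

```python
import uuid


class IdentityLinker:
    """Map accounts on partner sites to a single platform-wide identifier."""

    def __init__(self):
        self._by_site_account = {}  # (site, site_user_id) -> platform_id

    def link(self, site, site_user_id, platform_id=None):
        """Associate a site account with a platform ID, minting one if none is given."""
        key = (site, site_user_id)
        existing = self._by_site_account.get(key)
        if existing is not None:
            if platform_id is not None and platform_id != existing:
                raise ValueError(f"{key} is already linked to {existing}")
            return existing
        platform_id = platform_id or str(uuid.uuid4())
        self._by_site_account[key] = platform_id
        return platform_id

    def accounts_for(self, platform_id):
        """List every site account tied to one platform ID."""
        return sorted(k for k, v in self._by_site_account.items() if v == platform_id)


linker = IdentityLinker()
pid = linker.link("site-a", "alice01")
linker.link("site-b", "a.smith", platform_id=pid)
print(linker.accounts_for(pid))  # [('site-a', 'alice01'), ('site-b', 'a.smith')]
```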

Results

  • Implemented a CRUD tool for managing users using Ember.js while iterating in conjunction with Product Manager as requirements changed.
Categories
Architecture demand-side Frontend Innovation SPAs TDD

Improving QA with SPA

(CrowdFlower : 4/13-7/13)

Test Questions are used as the gold standard of quality in the CF platform, but they can be laborious to create, particularly for work that’s periodically repeated.

As no templatized solution existed, a team of three of us (me as F2E, Product Manager, and Backend Engineer) tackled creation of an internal product to simplify the workflow.

The user flow was to create “Cases” of Test Questions that got sent to jobs as “Batches”, where the composite idea of a “Mold” encompassed all “Cases” and “Batches” for a particular set of target jobs.
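
A rough Python sketch of how these three concepts relate is shown below. The field names and validation rules are assumptions made for illustration and do not reflect the product's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Case:
    """A named group of Test Questions."""
    name: str
    test_questions: list[str] = field(default_factory=list)


@dataclass
class Batch:
    """A set of Cases sent to one job."""
    job_id: int
    case_names: list[str]


@dataclass
class Mold:
    """All Cases and Batches for a particular set of target jobs."""
    name: str
    target_job_ids: list[int]
    cases: dict[str, Case] = field(default_factory=dict)
    batches: list[Batch] = field(default_factory=list)

    def add_case(self, case: Case) -> None:
        self.cases[case.name] = case

    def send_batch(self, job_id: int, case_names: list[str]) -> Batch:
        """Record a Batch, allowing only this Mold's jobs and known Cases."""
        if job_id not in self.target_job_ids:
            raise ValueError(f"job {job_id} is not a target of mold {self.name!r}")
        missing = [n for n in case_names if n not in self.cases]
        if missing:
            raise KeyError(f"unknown cases: {missing}")
        batch = Batch(job_id, list(case_names))
        self.batches.append(batch)
        return batch


mold = Mold(name="weekly-sentiment", target_job_ids=[101, 102])
mold.add_case(Case("sarcasm", ["Is this tweet sarcastic?"]))
print(mold.send_batch(101, ["sarcasm"]))
```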

(“The Forge” was the product’s original name, derived from a time when “Test Questions” were known as “Gold.”)

One of the more challenging aspects of the project was the testing of the app. Selenium has always been a robust solution for testing even JS-heavy experiences, but given its heft, Poltergeist was used instead.

The product was to eventually be made available externally but never was.

Results

  • Built internal workflow tool using Ember.js.
Categories
demand-side eCommerce Frontend

Owned Demand-side UX

(CrowdFlower : 1/13-3/13)

The app is CrowdFlower’s most highly-trafficked app. It also happens to be one of the company’s most technically complex, given its history.

Its architecture is that of a Rails app, wrapping a Gem that extracted business logic from the company’s legacy (original) Merb app. The Gem contains all logic around rendering, styling, and providing interactivity for CML, the basis of abstracting microtasks in the platform.

The app was built (before my time) in order to bring a richer, more interactive experience to those doing microtasking work. When the original architect departed only weeks after I joined the company, maintenance and feature implementation fell to me.

Results

  • Supported site’s most highly-trafficked, revenue-generating UI (allowing for custom JS and CSS).