nat@natwelch.com +1 707 799 8675
Based in Beacon, NY, I am a Software Developer, Technical Lead and Manager. I have been writing software for the web professionally since 2005. I have two passions in software. The first is building beautiful tools for people to share knowledge, advice and experience. The second is creating reliable infrastructure that is easy to use and maintain. Outside of building systems, I love fostering team culture and helping folks grow in their careers.
Outside of the tech world, I am an Eagle Scout. I enjoy contributing to open source projects, writing, reading, fishing, listening to and creating music and wandering through cities and countrysides. I maintain a personal website at natwelch.com.
Writing, Speaking & Open Source
Author & Speaker
- Co-Author of the book Reliable Webservers with Go from Newline. 2021.
- Author of the book Real World SRE from Packt Publishing. 2018.
- Actively working on my third book, titled "locative.garden."
- Published in Issue Three and Issue Six of Code Words. 2015 & 2016.
- Spoke at LinuxConf 2014, Strange Loop 2017, SRECon Americas 2017, SRECon Americas 2019, Illuminate 2022 and others.
- Since 2015, I have offered mentoring through natwelch.com/wiki/mentoring. From 2016 to 2020, I mentored through Out of Office Hours, and in 2022 and 2023 through ADPList. I help folks with career and architecture questions weekly, averaging around 30 people a year (over 300 engineers to date).
Open Source
Former lead of fog-google (~89 million installs), lead maintainer of danluu/post-mortems (over 12k GitHub stars), and author of ~440 public repos across infrastructure, tooling, and personal projects.
Experience
Nat Welch Consulting
Sole Proprietor 2016 to Present.
- Helped with software infrastructure for Cory Booker's 2020 campaign launch.
- Helped with software infrastructure for the final weeks of the Harris for President 2024 campaign.
- Consulted with small and medium-sized startups on infrastructure, incident response, navigating vendors and handling scale.
Laurel (fka Time by Ping)
Engineering Manager November 2024 to Present.
- Manage infrastructure and security teams.
- Contribute code to all micro-services.
- In charge of all technical vendor relationships, evaluating new vendors and managing engineering budget.
- Led efforts to regionalize all data processing to comply with data privacy laws and requirements.
- Drove adoption of continuous deployment using ArgoCD across all teams.
- Improved developer experience by promoting AI adoption and building internal tools.
- Led the team to build and deploy a variety of security-focused features, including audit logs and BYOK.
Principal Software Engineer, Cloud Platform Lead November 2020 to November 2024.
- Led the infrastructure team.
- Wrote code across our micro-services in TypeScript, Go, Python and React.
- Managed cloud infrastructure (AWS, Terraform), databases (Postgres and MongoDB), security, automation, developer tooling, observability (OpenTelemetry), performance and reliability efforts.
- Owned all technical vendor relationships and evaluated new vendors. Drove a vendor consolidation effort that cut our observability providers from nine to one and our CI/CD providers from three to one. Sat on one vendor's CAB and helped multiple vendors ship features that made our developers more successful.
- Led two significant migrations: Aptible to MongoDB Atlas, and Aptible to AWS EKS.
- Managed the infrastructure budget alongside our Head of Finance, reporting on COGS and technical spend. Led multiple projects to lower costs and improve efficiency.
- Provided architectural guidance and defined infrastructure and reliability requirements for product teams. Regularly drove myself and the team to push contributions back to the open source community.
- Owned much of our technical communication with customers, and led integrations with four large law firms and one of the world's largest accounting firms.
- Defined and managed our on-call policies and rotations, and was an active member of the infrastructure rotation.
- Led the team to achieve SOC 2 Type 2 and HIPAA compliance.
Google
Senior Site Reliability Engineer November 2018 to November 2020.
- Worked on the Customer Reliability Engineering team. CRE helps customers achieve Google-level reliability by partnering with them to implement SRE operational best practices. My role was a mix of Tech Lead, Developer Advocate, Software Developer and Traveling Consultant.
- Helped companies architect and plan for global launches, re-architect on-prem systems as they moved to cloud, and develop SRE programs. Built multiple data pipelines to evaluate customer reliability.
- CRE, a small team, was listed as one of the top three strengths of GCP in both the 2018 and 2019 Gartner IaaS reports.
- Regularly presented to customers on building reliable systems, to audiences as large as 500. Some of those presentations helped close multi-million-dollar deals.
- SRE lead on Google's COVID-19 Exposure Notification System, the cross-platform Apple/Google protocol deployed by U.S. states and dozens of countries.
First Look Media
Lead Site Reliability Engineer March 2017 to October 2018.
- Migrated three services from colos to AWS. Maintained Terraform config and AWS infrastructure for the company.
- Improved deploy reliability and automation, wrote new features for most services and refactored our entire ECS infrastructure.
- Automated capacity planning, started a postmortem culture, and improved performance and reliability of our CMS platform.
- Mentored engineers around infrastructure, reliability and architecture design.
- Wrote Go and Node.js, with extensive work on a GraphQL API.
Hillary for America
Staff Site Reliability Engineer January 2016 to December 2016.
Survived constant attacks throughout the 2016 cycle with minimal externally visible downtime. Promoted reliability across web-serving infrastructure and data analytics pipelines, and built tools and guardrails to prevent humans from making mistakes while sleep deprived.
littleBits Electronics
Technical Lead August 2015 to January 2016.
- Led optimization efforts for holiday traffic. Cut average site load time in half and reduced average API response time by two orders of magnitude.
- Managed a team of three full-time software engineers. Helped define code review and code style policies for the development team.
Google
Site Reliability Engineer III April 2012 to March 2015.
- SRE for Google Compute Engine in London and San Francisco. Joined the on-call rotation. Wrote software to maintain, monitor and optimize millions of servers globally.
- While in SRE, I also worked on Google Cloud Storage and designed and built Google Cloud Status.
Software Engineer II August 2011 to April 2012.
Worked on Punchd, Google Offers and Google Local.
Punchd
Software Developer January 2011 to September 2012.
Maintained the backend app and migrated it to AWS. Punchd was acquired by Google.
iFixit
Software Developer April 2009 to April 2011.
Built and launched Answers. Greatly improved the wiki and image processors. Wrote the first version of the oManual specification.
Pseudoweb Contracting
Software Developer 2005 to 2009.
Web design and Linux systems management for various clients.
Early Career
Adobe Systems (Dreamweaver QE Intern, Summers 2007 & 2008), Cal Poly CSC Department (Computer Lab Staff, 2007), County of Sonoma ISD (Software Development Intern, Summer 2005), and BSA Camp Oljato (Nature Director 2006, Camp Counselor 2002 to 2004).
Education
Computer Science, B.S. California Polytechnic State University, San Luis Obispo. Fall 2006 to Spring 2011.
Recurser, Recurse Center, New York. Spring 2015.