nat@natwelch.com +1 707 799 8675
Based in Beacon, NY, I am a Software Developer, Technical Lead and Manager. I have been writing software for the web professionally since 2005. I have two passions in software. The first is building beautiful tools for people to share knowledge, advice and experience. The second is creating reliable infrastructure that is easy to use and maintain. Outside of building systems, I love growing team culture and helping folks grow in their careers.
Outside of the tech-world, I am an Eagle Scout. I enjoy contributing to open source projects, writing, reading, fishing, listening to music and wandering through cities and countrysides. I maintain a personal website at natwelch.com.
Experience
Author & Speaker
- Co-Author of the book Reliable Webservers with Go from Newline. 2021.
- Author of the book Real World SRE from Packt Publishing. 2018.
- Published in Issue Three and Issue Six of Code Words. 2015 & 2016.
- Spoke at LinuxConf 2014, Strange Loop 2017, SRECon Americas 2017, SRECon Americas 2019, Illuminate 2022 and others.
- Since 2015, I have mentored through natwelch.com/wiki/mentoring. From 2016 to 2020, I mentored through Out of Office Hours. In 2022 and 2023, I mentored through ADPList. I help folks with career and architecture questions weekly, averaging around 30 individuals a year.
Open Source Developer
I am the former lead of the open source project fog-google (~75 million installs), the lead maintainer of danluu/post-mortems, and a contributor to several other small projects.
Nat Welch Consulting
Sole Proprietor 2016 to present.
- Helped with software infrastructure for Cory Booker's 2020 campaign launch.
- Helped with final weeks of Harris for President 2024 Campaign's software infrastructure.
- Various consultations with small and medium startups around infrastructure, incident response, navigating vendors and handling scale.
Laurel (fka Time by Ping)
Engineering Manager November 2024 to Present.
- Manage the infrastructure and security teams.
- Contribute code to all micro-services.
- In charge of all technical vendor relationships and evaluating new vendors.
Principal Software Engineer, Cloud Platform Lead November 2020 to November 2024.
- Led the infrastructure team.
- Wrote code for our micro-services in TypeScript, Go, Python and React.
- Managed our cloud infrastructure, databases (Postgres and MongoDB), security, automation, developer tooling, observability (OpenTelemetry), performance and reliability efforts.
- In charge of all technical vendor relationships and evaluating new vendors. I drove a vendor consolidation project that shrank us from nine observability providers to one and from three CI/CD providers to one. I am also a member of one vendor's CAB and have helped multiple vendors implement features that help our developers be more successful.
- Led two significant migrations, one from Aptible to MongoDB Atlas, the other from Aptible to AWS EKS.
- I manage our infrastructure budget and work closely with our Head of Finance to manage and report our COGS and other technical spend. I have led multiple projects to lower our costs and improve efficiency.
- Provide architectural guidance. Define infrastructure and reliability requirements for teams. I also regularly push myself and the team to contribute our work back to the open source community.
- I often lead technical communication with customers. I led integrations with four large law firms and one of the world's largest accounting firms.
- I define and manage our on-call policies and rotations, and am an active member of the infrastructure rotation.
- Led the team to achieve SOC 2 Type II and HIPAA compliance.
Google
Senior Site Reliability Engineer November 2018 to November 2020.
- Worked on the Customer Reliability Engineering (CRE) team. CRE helps customers achieve Google-level reliability by partnering with them to implement SRE operational best practices. I gave presentations to audiences of every seniority level and size at companies of every size. I helped companies architect and plan for global launches, re-architect on-prem systems as they moved to the cloud, and develop SRE programs. My role was a mix of Tech Lead, Developer Advocate, Software Developer and Traveling Consultant.
- Gartner's 2018 and 2019 IaaS reports both listed CRE, a small team, as one of the top three strengths of GCP.
- Built multiple data pipelines to evaluate customer reliability.
- Regularly gave talks on building reliable systems to customer groups as large as 500, and gave presentations that helped close multi-million dollar sales deals.
- Worked as the SRE lead on Google's COVID-19 Exposure Notification system.
First Look Media
Lead Site Reliability Engineer March 2017 to October 2018.
Migrated three services from colocation facilities to AWS. Maintained the company's Terraform configuration and AWS infrastructure. Improved deploy reliability and automation, wrote new features for most services and refactored the entire ECS infrastructure. Mentored engineers on infrastructure, reliability and architecture design. Automated capacity planning, started a postmortem culture, and improved the performance and reliability of our CMS platform. Wrote Go and Node.js, including extensive work on a GraphQL API.
Hillary for America
Staff Site Reliability Engineer January 2016 to December 2016.
Promoted reliability in both our web serving infrastructure and data analytics pipelines. Built tools and infrastructure to prevent humans from making mistakes while sleep deprived. Survived constant attacks with minimal externally visible downtime throughout the campaign.
littleBits Electronics
Technical Lead August 2015 to January 2016.
- Led optimization efforts for holiday traffic. Cut average site load time in half and reduced average API response time by two orders of magnitude.
- Managed a team of three full-time software engineers. Helped define code review and code style policies for the development team.
Google
Site Reliability Engineer III April 2012 to March 2015.
- SRE for Google Compute Engine in London and San Francisco. My job included being part of an on-call rotation and writing software to maintain, monitor and optimize millions of servers globally.
- While on SRE I also worked on Google Cloud Storage and designed and built Google Cloud Status.
Software Engineer II August 2011 to April 2012.
I worked on Punchd, Google Offers and Google Local.
Punchd
Software Developer January 2011 to September 2012.
Maintained the backend app and migrated it to AWS. Punchd was acquired by Google.
iFixit
Software Developer April 2009 to April 2011.
Built and launched Answers. Greatly improved wiki and image processors. Wrote the first version of the oManual specification.
Pseudoweb Contracting
Software Developer 2005 to 2009.
I worked as a contract software developer, mainly on web design and Linux systems management.
Adobe Systems Incorporated
Dreamweaver Quality Engineering Intern Summer 2007 and 2008.
BSA Camp Oljato
Nature Director Summer 2006.
Camp Counselor Summer 2002, 2003 and 2004.
Education
Computer Science, B.S. California Polytechnic State University, San Luis Obispo. Fall 2006 - Spring 2011.
Recurser, Recurse Center, New York. Spring 2015.