Senior Platform Engineer
Company: OnHires
Location: San Francisco
Posted on: February 16, 2026
Job Description:

About Us:
We are building a robust, scalable trading platform to serve high-traffic, latency-sensitive applications. Our infrastructure leverages state-of-the-art technologies to support real-time trading while providing unparalleled reliability and performance. Join us to shape the future of our platform and engineering culture.

Job Summary:
We are looking for a Senior DevOps & Platform Engineer to lead the design, implementation, and management of our AWS-centric infrastructure. You will play a pivotal role in maximizing the velocity of our product engineering team, ensuring platform scalability, reliability, and security. This is a high-impact role, combining elements of DevOps, Platform Engineering, and Site Reliability Engineering (SRE). You will champion best practices, shape the engineering culture, and ensure our platform is robust, efficient, and ready for the future.

Key Responsibilities:

Platform Engineering
Infrastructure Design: Architect and implement scalable infrastructure to support the deployment and management of our trading platform.
Developer Tooling: Build and maintain internal tools to streamline developer workflows, including advanced CI/CD pipelines.
Infrastructure as Code (IaC): Champion IaC practices using Terraform, CloudFormation, or Pulumi.
Core Services Management: Manage and optimize platform-critical services such as NATS Cluster, RabbitMQ, AWS RDS PostgreSQL, and Redis Cluster.

DevOps
Automation and CI/CD: Automate and optimize deployment processes to ensure seamless continuous integration and delivery.
Container Orchestration: Manage and scale containerized workloads using Kubernetes and Docker.
Cloud Optimization: Monitor and optimize cloud resource usage for performance and cost efficiency.

Site Reliability Engineering (SRE)
Reliability Metrics: Define and maintain Service Level Objectives (SLOs) and Service Level Indicators (SLIs).
Monitoring & Observability: Implement observability tools and dashboards (e.g., Prometheus, Datadog, Grafana) for real-time system monitoring.
Incident Management: Lead incident response efforts, conduct root cause analysis, and implement actionable postmortem reviews.

Infrastructure Management
AWS Expertise: Architect and manage cloud-based systems to handle high-traffic, latency-sensitive applications.
Disaster Recovery: Implement robust disaster recovery and business continuity strategies, including backups and multi-region failover.
Security Practices: Collaborate with security teams to enforce best practices for IAM, encryption, and compliance.

Collaboration & Leadership
Cross-Team Collaboration: Partner with software engineers to design infrastructure solutions tailored to their application needs.
Culture Building: Help shape the engineering culture, promoting a philosophy of security, velocity, and reliability.
Mentorship: Mentor junior engineers and document best practices to drive knowledge sharing and operational excellence.

Long-Term Tech Evolution
Backend Transition: Contribute to evolving our backend microservices (currently NodeJS, with some Python and C#) towards Go and Rust.
Third-Party Integration: Evaluate and integrate critical third-party software and infrastructure, such as payment gateways and mobility stacks.

Your Impact:
Simplify infrastructure concerns for product teams to accelerate builds, deployments, and scaling.
Advocate for modern practices like Zero Trust Networking and continuously improve platform architecture.
Balance the demands of product velocity with a well-managed, secure, and scalable platform.

Required Skills & Experience:

Technical Expertise
Cloud Experience: 5-8 years of hands-on experience with cloud platforms, particularly AWS, including services like EC2, RDS, S3, Lambda, and VPC.
Containerization: Proficiency with Docker and Kubernetes (EKS) or ECS.
Infrastructure as Code (IaC): Strong experience with Terraform, CloudFormation, or Pulumi.
Programming Skills: Proficiency in at least one programming language (e.g., Python, Go, TypeScript/JavaScript, Ruby, Java).

DevOps & SRE
CI/CD Pipelines: Expertise in building and maintaining CI/CD workflows using tools like GitLab CI, Jenkins, or GitHub Actions.
Monitoring Tools: Experience with observability platforms (e.g., Prometheus, Datadog, Grafana).
Incident Management: Proven ability to handle incident response, root cause analysis, and postmortem reviews.

Soft Skills
Problem-Solving: Ability to research, design, and deliver solutions to complex infrastructure challenges.
Collaboration: Experience working directly with product engineers to improve workflows incrementally.
Leadership: Ownership mindset with the ability to mentor team members and advocate for best practices.

Preferred Skills (Nice-to-Have):
Familiarity with backend languages like Go or Rust.
AWS certifications (e.g., Solutions Architect, DevOps Engineer).
Experience with networking concepts (e.g., load balancers, DNS, VPNs) and traffic optimization.
Knowledge of emerging CNCF technologies and CI/CD trends.

What We Offer:
Competitive salary with future equity options.
Opportunities to work with cutting-edge technologies and evolve our platform.
Flexible working hours and a remote-friendly environment.
Professional growth through certifications, conferences, and internal training.
Collaborative culture focused on innovation and operational excellence.

Keywords: OnHires, San Mateo, Senior Platform Engineer, IT / Software / Systems, San Francisco, California