Site Reliability Engineer

Float - Engineering

Location: Remote

Job Type: Online

Experience: Managing Kubernetes

Deadline: 2025-08-16

About the Company

Float is the leading resource management software for professional services teams. Since 2012, we’ve grown every year—independently, self-funded, and profitably. We’re rated #1 for resource management on G2 and trusted by 4,500+ customers worldwide.

As a certified B Corporation, we’re committed to making a positive impact on our team, customers, the environment, and the remote community. Our 50+ person team works 100% remotely across the globe, supported in living our Best Work Life. You'll collaborate with teammates across Australia, Mexico, the UK, Nigeria, Canada, and the US. Learn more about our data security practices for employment or service contracts. See why our customers love Float.

We’re on a scale-up journey, and we’re seeking people who thrive in this stage. We want Float to be the place where you have the autonomy and opportunity to do the best work of your career.

Job Description

Why We’re Hiring For This Role

Float’s infrastructure has grown rapidly, meaning more customers, more complex systems, and more opportunities to build for scale. As the scale of our systems increases, we’re growing our SRE team to match. You’ll be the third site reliability engineer, and will be working alongside our QA team. This role is about stepping into a high-impact space: helping us automate smarter, improve visibility across engineering, and ensure reliability as we scale. You’ll join a team that’s laying the groundwork for stronger SLAs and an even better experience for our customers.

You’ll be working asynchronously with a bright, dedicated team from across the globe, with a strong focus on taking complex problems and creating solutions that feel simple and intuitive for our customers.

Once you are a bit more settled, we expect that you will jump into the following projects:

Service mesh & ingress security: Lead our exploration and implementation of service mesh options and harden ingress layers to defend against spam and abuse.
Incident response playbooks: Define and roll out standardised playbooks to improve clarity and speed during production incidents.
CDC layer support: Build deep familiarity with our next-gen data layer (CDC) to support new teams building on top of it.
SLO coaching & support: Help teams define, measure, and meet reliability goals—enabling engineering to own quality into production and drive better outcomes for customers.

Benefits

Pay for this role is US$133,000 (Level 2).

Responsibilities

Upgrade paths: Maintain and validate the processes that keep our Kubernetes infrastructure up-to-date, ensuring upgrades happen smoothly, safely, and regularly.

Service hygiene: Remove noisy, unused, or misfiring boot alerts and improve the team's ability to trust alerts as meaningful signals.

Service integration: Partner with engineers to configure services within our clusters and support service migrations where possible.

Kubernetes optimisation: Review and optimise usage across Kubernetes services, including right-sizing scale node specifications.

Requirements

Bash + programming language: Confident writing scripts in Bash and proficient in at least one go-to language (ideally PHP, NodeJS, or Python).

Kubernetes: Strong production experience managing and optimising Kubernetes clusters.

Terraform: Solid understanding of infrastructure as code using Terraform.

GCP: Familiarity with Google Cloud Platform, or eagerness to get up to speed quickly.

Iteration mindset: You believe in shipping value early and improving over time, not chasing one-shot perfection.

Written communication: You write clearly and concisely, whether it's documenting infrastructure, proposing changes, or sharing learnings across teams.

Timezone Preference: We’re ideally looking for someone based between UTC-5 and UTC+3 so there’s good overlap with the rest of the team for hands-on support.


