Job Description
JOB CODE
TEC4017
Staff Software Engineer
MANAGEMENT LEVEL
6 Senior Manager
DISCIPLINE
Software Engineering
REPORTS TO
Director of Engineering
FLSA CLASSIFICATION
Exempt
The Staff Software Engineer provides technical leadership for complex, scalable systems. This role drives architecture, code quality, and reliability across critical services while mentoring engineers and aligning solutions with business outcomes. The engineer partners with product, security, and operations to design resilient platforms, reduce risk, and accelerate delivery. Success requires hands-on development, thoughtful tradeoffs, and clear communication that advances engineering standards and unlocks team effectiveness.
Staff Software Engineer – Enterprise Event Streaming
Location: Nashville, TN or Sterling, VA
We are the Enterprise Event Streaming platform team. We own the Kafka-based substrate that every product domain in the company uses to exchange data: the brokers, the producer and consumer services, the stream-processing applications, the client libraries, and the developer tooling that makes it all self-serve.
This year's largest initiative is a multi-quarter modernization of our managed-Kafka topology that simultaneously rethinks our network posture, our payload-level security model, and our approach to event-contract enforcement.
As a Staff Software Engineer, you will provide technical leadership for these complex, scalable systems. This role drives architecture, code quality, and reliability across our critical services while mentoring engineers and aligning solutions with business outcomes. You will partner with product, security, operations, and consumer teams to design resilient platforms, reduce risk, and accelerate delivery. Success requires hands-on development, thoughtful tradeoffs, and clear communication that advances engineering standards and unlocks team effectiveness.
Core Responsibilities
- Platform Operations & Modernization: Design, build, and operate Kafka-based streaming services and stream-processing applications running 24x7 in multi-cloud production. Lead an end-to-end workstream of the platform modernization initiative: sequence the cutover, prove equivalence between the old and new topologies, and partner with the consumer teams in the blast radius.
- Event Governance & Architecture: Lead system architecture and evolve event governance, including topic conventions, access control, encryption posture, schema/contract evolution, and GitOps tooling. Write design records for non-trivial decisions and contribute to architectural reviews.
- Developer Experience: Improve the developer experience (DX) for internal customers through client libraries, self-serve tooling, and onboarding automation that lets a new team start producing events without filing a ticket.
- Technical Leadership & Quality: Guide engineers through complex implementation decisions, elevate code quality with rigorous reviews, and mentor team members with constructive feedback.
- Reliability & Incident Management: Improve system reliability through observability and automation. Carry the on-call rotation for the full platform, diagnosing unfamiliar failure modes and authoring actionable runbooks.
Technology Stack
Kotlin • Java • Spring Boot • Apache Kafka • Apache Flink • AWS • GCP • Kubernetes • Terraform • TypeScript
On-Call Expectations
This is a platform engineering role. The on-call rotation covers the entire platform we operate today, including services, stream-processing apps, connectors, client libraries, and operational tooling that predate your arrival. Candidates who are not comfortable reading code to diagnose unfamiliar failure modes, authoring runbooks, and acting as a first responder for production systems will not be a fit.
Minimum Qualifications
- Education: Bachelor's degree in Computer Science, a related technical field, or equivalent practical experience.
- Experience: 8+ years of professional software engineering experience, with a proven track record of delivering production systems.
- Language Expertise: 5+ years of software engineering experience in a JVM language (Java or Kotlin).
- Distributed Systems: 3+ years of experience designing, building, and operating distributed systems or streaming systems in production.
- Kafka Mastery: Hands-on production experience with Apache Kafka — partitioning, consumer-group rebalances, idempotent producers, transactional writes, retention, and compaction.
- Cloud & Infrastructure: Experience with at least one major public cloud (AWS, GCP, or Azure) and infrastructure-as-code (Terraform).
- Operations: Experience supporting 24x7 production systems on a rotating on-call schedule, including the triage of services you did not author.
Preferred Qualifications
- Education: Master's degree in a relevant discipline.
- Prior Impact: A track record as a Staff-level engineer driving cross-team technical change (e.g., written communication plans, partner-team office hours, deprecation enforcement).
- Stream Processing: Production experience with a stream-processing framework — Apache Flink, Kafka Streams, or comparable.
- Contract Design: Schema-based event-contract design in production — Avro, Protobuf, or JSON Schema — including backward and forward compatibility.
- Complex Migrations: Direct experience migrating production workloads at scale (broker change, datastore swap, encryption-stack change, network re-architecture), including cutover sequencing and rollback design.
- Managed Kafka: Operational experience with a managed Kafka offering — cluster sizing, private networking, ACL administration, schema-registry administration.
- Additional Tech: Experience with reactive programming on the JVM (Project Reactor, RxJava), working knowledge of Kubernetes and GitOps deployment patterns, and TypeScript/Node.js for serverless components.
Expected Technical Competencies
- Data Engineering & Systems Design: Designing and managing data infrastructure, pipelines, and processing systems to support scalable analytics and multi-cloud workloads.
- APIs & Integration: Designing and implementing APIs that enable seamless communication between systems and platforms.
- DevOps & CI/CD: Integrating development and operations to enable continuous delivery, using containerization and orchestration (Kubernetes) and test automation.
- Security Engineering: Building security controls and safeguards into the platform's network posture and payload-level security model.
Leadership & Soft Skills
- Influences without authority across teams; drives consensus with pragmatic decision-making.
- Communicates complex ideas clearly and concisely.
- Balances short-term delivery and long-term technical health.