Job Description
We are building a high-fidelity software platform for space. The system spans a high-performance physics simulation engine, a scientific data pipeline, a REST and event-streaming backend, and a native desktop client with 3D rendering.
This is not an architecture-only role. The Principal Software Engineer is the final technical authority and is expected to be hands-on across all layers — writing simulation code, designing data interfaces, making HPC infrastructure decisions, and being the person who can sit down, open an editor, and solve the hard problem anywhere in the system.
You will own the architecture, make and document the decisions that stick, and be directly accountable for correct implementation across the team.
What You Will Actually Do
- Own the end-to-end software architecture and be the final arbiter of all design decisions across every layer
- Write and review production-quality numerical simulation code in a compiled systems language
- Design and implement the data access layer that sits between HPC job output and the application backend — handling large binary scientific datasets, incremental writes, and on-the-fly computation at query time
- Define and enforce interface contracts between the simulation core, the data layer, the API backend, and the desktop client
- Design the HPC job lifecycle — submission, monitoring, interactive and batch execution modes, command channels, and event streaming to connected clients
- Build and maintain the application backend including real-time event fan-out, role-based access control, and report generation
- Make technology selection decisions with explicit written rationale that survives team turnover
- Write detailed design documents that junior engineers can implement from directly: unambiguous, complete, and correct
- Mentor engineers across the stack; conduct reviews with the depth of someone who wrote the code themselves
Requirements
Orbital Mechanics and Astrodynamics
- Solid working knowledge of analytical orbit propagation — not just calling a library, but understanding what it computes, its accuracy envelope, and where it breaks down
- Hands-on experience with numerical orbit propagation using adaptive step integrators, with full perturbation models: atmospheric drag, higher-order gravity, solar radiation pressure, third-body effects
- Familiarity with standard orbit element formats, epoch handling, and the practical limits of catalogue-quality data
- Working knowledge of coordinate frame transforms and of how to obtain and use authoritative ephemeris sources
- Understanding of close-approach geometry and conjunction screening concepts
- Familiarity with empirical upper atmosphere models and their sensitivity to solar activity inputs
High-Performance Simulation Core (Compiled Language)
- Strong, recent hands-on experience in a compiled systems language (C++ preferred) writing numerically intensive code that must be correct first and performant second
- Shared-memory parallelism: threading models, race condition analysis, false sharing avoidance, thread-safe logging
- Memory layout design for cache-friendly access across large object populations
- Integration with third-party C scientific libraries without memory leaks or undefined behaviour
- Attitude representation: quaternion algebra, renormalisation, singularity avoidance; able to articulate from first principles why representations that suffer from gimbal lock (such as Euler angles) are unacceptable in a simulation context
- Adaptive step integration: step acceptance and rejection logic, error estimation, handling of state variables that change continuously during propulsive events
- Spacecraft sensor and actuator modelling: understanding of the measurement chain from physical sensor to estimated state, and actuator dynamics including saturation behaviour
Scientific Data Pipeline
- Design and implementation of binary scientific data formats for large time-series datasets produced by long-running HPC jobs: incremental write patterns, crash safety, and reading partially written output from a concurrent consumer
- On-the-fly computation at query time over large catalogues — understanding the trade-offs between pre-computation and real-time evaluation
- Ground truth vs estimated state: designing a system where operators never see true simulation state, only what an estimation chain would produce from simulated sensor measurements
Application Backend
- Async server development: async/await patterns, event loops, long-lived connection management
- Server-Sent Events or equivalent push mechanisms: fan-out to multiple heterogeneous subscribers, backpressure handling, reconnection
- Managing long-running subprocesses from the backend: holding a subprocess handle, writing commands to its input, reading events from its output asynchronously, detecting and handling crashes
- Lightweight relational storage for operational metadata: schema design, migration strategy, single-writer concurrency constraints
- Role-based access control enforced at the API layer: route guards, token-based authentication, optional second-factor support in air-gapped environments
- Background scheduling within the application process: periodic external data refresh, job status polling
- Programmatic report generation (PDF): layout, tables, structured data rendering — not template-based
HPC Job Management
- Direct experience writing job scripts for a workload manager (SLURM or equivalent): resource allocation, node selection, job arrays
- Interactive vs batch job submission — knowing the operational difference and when each is appropriate
- Job lifecycle monitoring: status polling, sentinel-file-based completion detection, epilog handling
- Single-node multi-core parallelism within a job; understanding when intra-job distributed parallelism is unnecessary complexity
- Shared storage access from multiple compute nodes; filesystem coherency considerations
- Deploying scientific software on Linux HPC clusters: build systems, shared library management, packaging for the deployment OS
Native Desktop Client and 3D Rendering
- Native desktop application development (not web-based): signal/slot or equivalent event model, OpenGL integration via a widget, animation timers decoupled from simulation time
- Custom OpenGL rendering: vertex array objects, instanced geometry, batched buffer uploads, shader authorship — able to render tens of thousands of objects in a single draw call
- 3D Earth visualisation: geodetic ellipsoid rendering, instanced marker sprites, trajectory polyline batching, arcball or equivalent camera control
- Role-based panel visibility driven by login state within a single application — not multiple executables
- Desktop application packaging: producing an installer for analyst workstations that lets the server address be configured at install time
Build, Packaging, and Tooling
- Build system authorship for compiled code: dependency resolution, compiler flags, packaging for the target deployment OS
- Python environment and dependency management: deterministic lockfiles, reproducible environments across nodes
- Structured logging strategy: separating per-job logs from per-process service logs, human-readable and machine-parseable outputs, never sharing file handles across process boundaries
Benefits
We offer strong career growth, ESOPs, gratuity, Provident Fund (PF), and health insurance.