FPT Software

G13 - Operations Support Engineer

FPT Software  •  Singapore, SG (Onsite)  •  2 months ago

Job Description

Responsibilities:

  • Design and own the service observability usage model: ensure all service metrics, logs, and traces flow into Elastic Cloud (the authoritative source); maintain dashboards and SLOs; evaluate pragmatic use of CloudWatch and AWS Managed Prometheus / Grafana for supplemental or fallback views.
  • Build proactive, noise‑reduced alerting and incident response playbooks; drive post‑incident RCA and remediation tracking against a closure SLA.
  • Optimize service performance (profiling, caching layers, autoscaling heuristics, concurrency tuning) to meet latency and throughput targets.
  • Implement secure supply chain & runtime controls (image scanning, SBOM consumption, secrets management, TLS / mTLS) leveraging shared platform tooling.
  • Curate operational runbooks, golden dashboards, and reliability-readiness and production-readiness checklists.
  • Integrate model / guardrail service telemetry (latency, queue depth, GPU/CPU utilization) into unified Elastic Cloud views.
  • Support compliance & audit evidence collection (access logs, config lineage, change histories) via automated evidence capture fed into Elastic.
  • Introduce configuration drift detection & policy-as-code guardrails (OPA / Kyverno) at the workload / namespace layer to enforce baseline controls.
  • Mentor engineers on production readiness, observability patterns, and operational excellence; evolve on-call playbooks.
  • Participate in (and improve) an equitable on-call rotation focusing on sustainable alert volumes & burnout prevention.

Requirements:

  • 4+ years (or equivalent impact) in SRE / Production Ops / Platform / Reliability for SaaS or high-throughput services.
  • Working knowledge of AWS & Kubernetes (deployment, troubleshooting, networking concepts) sufficient to collaborate effectively with platform owners (not necessarily owning cluster upgrade orchestration).
  • Familiarity with Infrastructure as Code & GitOps (Terraform, Argo, etc.) to consume modules, review changes, and enforce policy.
  • Observability implementation & usage (metrics, logs, traces, profiling) with Elastic Cloud; understanding of Prometheus / OpenTelemetry concepts.
  • Proven on-call & incident management experience (triage, MTTR reduction, RCA authorship).
  • Scripting / automation in Python, Bash, or Go for ops tooling.
  • Security and compliance awareness: vulnerability management, image scanning, supply chain controls.
  • Clear, concise communication of operational risk and trade-offs to technical and non-technical stakeholders.
About FPT Software

FPT Software, a subsidiary of FPT Corporation, is a global technology and IT services provider headquartered in Vietnam, with USD 1.22 billion in revenue (2024) and over 33,000 employees in 30 countries.

Embracing an AI-first approach, FPT Software enables breakthrough speed, scalability and quality through AI-powered services and solutions and an AI-augmented workforce. It has partnered with over 1,100 clients worldwide, more than 130 of which are Fortune Global 500 companies in Aviation, Automotive, Banking, Financial Services and Insurance, Healthcare, Logistics, Manufacturing, Utilities, and more.

For more information, please visit https://fptsoftware.com/.

Industry: IT & Software
Company Size: 10,000+ employees
Headquarters: Hanoi, VN