Razer Inc.

Senior AIOps Engineer

Razer Inc.  •  Shah Alam, MY (Onsite)  •  4 days ago

Job Description

Joining Razer will place you on a global mission to revolutionize the way the world games. Razer is a place to do great work, offering you the opportunity to make an impact globally while working with a team spread across 5 continents. Razer is also a great place to work, providing you the unique, gamer-centric #LifeAtRazer experience that will accelerate your growth, both personally and professionally.

Job Responsibilities:

We are seeking an experienced Senior AIOps Engineer to enhance the reliability, scalability, and operational intelligence of mission-critical payment platform infrastructure and services.

This role focuses on leveraging automation, advanced analytics, and AI-driven operational tooling to improve system observability, incident response efficiency, performance optimization, and proactive risk detection across high-throughput transaction processing environments.

The successful candidate will work closely with DevOps, SRE, Engineering, and Platform teams to design intelligent operational workflows that reduce manual intervention, improve service availability, and support continuous platform growth.

Key Responsibilities:

AIOps Platform Development & Automation

  • Design and implement intelligent automation solutions to improve operational efficiency and reduce repetitive infrastructure and application support tasks.

  • Develop tools and pipelines for automated incident triage, alert enrichment, and operational diagnostics.

  • Integrate AI/ML capabilities into monitoring, logging, and event management platforms.

  • Improve signal-to-noise ratio by optimizing alerting strategies and anomaly detection mechanisms.

  • Other duties as assigned.

Observability Engineering & Operational Intelligence

  • Enhance monitoring frameworks covering infrastructure, applications, transaction flows, and distributed system dependencies.

  • Build intelligent dashboards and predictive insights to support proactive reliability management.

  • Analyze large-scale operational datasets including logs, metrics, traces, and transaction telemetry.

  • Define and track SLIs, SLOs, and reliability indicators for critical payment services.

Incident Prediction & Reliability Optimization

  • Implement predictive models and heuristics to identify early indicators of system degradation or failure.

  • Collaborate with SRE and platform teams to automate remediation workflows and self-healing mechanisms.

  • Reduce Mean Time to Detect (MTTD) and Mean Time to Recover (MTTR) through intelligent automation and operational playbooks.

  • Contribute to resilience engineering initiatives including chaos testing and reliability simulations.

Platform Performance & Capacity Intelligence

  • Develop analytics to forecast workload growth, capacity requirements, and scaling thresholds.

  • Provide recommendations for infrastructure tuning, cost efficiency, and performance optimization.

  • Support engineering teams in identifying performance bottlenecks across compute, database, messaging, and network layers.

Security, Compliance & Governance Support

  • Ensure AI-driven operational tooling aligns with secure engineering practices and regulated environment requirements.

  • Support audit readiness through improved operational visibility and traceability.

  • Contribute to anomaly detection use cases related to infrastructure misuse or unusual operational patterns.

AI Innovation & Research Collaboration

  • Evaluate emerging AIOps tools, frameworks, and techniques for suitability in high-availability payment environments.

  • Prototype intelligent operational capabilities such as:

    • predictive incident correlation

    • automated runbook execution

    • intelligent deployment risk analysis

    • log summarization and pattern clustering

    • transaction degradation early-warning signals

  • Promote responsible AI adoption and knowledge sharing across engineering teams.

Requirements:

  • Bachelor’s Degree in Computer Science, Engineering, Data Science, or related field.

  • Minimum of 5 years of experience in DevOps, SRE, Platform Engineering, or Operational Analytics roles.

  • Strong understanding of distributed systems, cloud infrastructure, and reliability engineering principles.

  • Experience working with monitoring and observability platforms.

  • Familiarity with scripting or programming languages such as Python, Go, or Bash.

  • Experience analyzing operational data such as logs, metrics, or event streams.

  • Strong troubleshooting, analytical, and problem-solving skills.

Preferred Qualifications:

  • Experience in Payment Gateway, FinTech, Banking, or high-transaction systems.

  • Exposure to cloud platforms such as AWS, GCP, or Azure.

  • Experience with container platforms and orchestration environments.

  • Familiarity with messaging systems or event streaming architectures.

  • Knowledge of AI/ML tooling for anomaly detection, pattern recognition, or predictive analytics.

  • Experience supporting regulated environments such as PCI DSS.

Pre-Requisites:

Razer is proud to be an Equal Opportunity Employer. We believe that diverse teams drive better ideas, better products, and a stronger culture. We are committed to providing an inclusive, respectful, and fair workplace for every employee across all the countries we operate in. We do not discriminate on the basis of race, ethnicity, colour, nationality, ancestry, religion, age, sex, sexual orientation, gender identity or expression, disability, marital status, or any other characteristic protected under local laws. Where needed, we provide reasonable accommodations - including for disability or religious practices - to ensure every team member can perform and contribute at their best.

Are you game?


About Razer Inc.

Razer™ is the world’s leading lifestyle brand for gamers.

The triple-headed snake trademark of Razer is one of the most recognized logos in the global gaming and esports communities.

With a fan base that spans every continent, the company has designed and built the world’s largest gamer-focused ecosystem of hardware, software and services.

Razer’s award-winning hardware includes high-performance gaming peripherals and Blade gaming laptops. Razer’s software platform, with over 70 million users, includes Razer Synapse (an Internet of Things platform), Razer Chroma™ (a proprietary RGB lighting technology system), and Razer Cortex (a game optimizer and launcher).

In services, Razer Gold is one of the world’s largest virtual credit services for gamers, and Razer Fintech is one of the largest online-to-offline digital payment networks in Southeast Asia.

Founded in 2005 and dual-headquartered in Irvine and Singapore, Razer has 18 offices worldwide and is recognized as the leading brand for gamers in the USA, Europe and China.

Industry: Hardware & Semiconductors
Company Size: 1,001-5,000 employees
Headquarters: Irvine, CA
Year Founded: 2005
Website: razer.com