Ness Digital Engineering

IT AI Operations Engineer

Ness Digital Engineering  •  Timişoara, RO (Onsite)  •  2 hours ago

Job Description

Job ID 8783

Why Ness

We know that people are our greatest asset. Our staff’s professionalism, innovation, teamwork, and dedication to excellence have helped us become one of the world’s leading technology companies. It is these qualities that are vital to our continued success. As a Ness employee, you will be working on products and platforms for some of the most innovative software companies in the world.

Working alongside other highly skilled professionals, you’ll gain knowledge that will help accelerate your career progression.

You’ll also benefit from a range of advantages, such as access to training and certifications, bonuses and aids, social activities, and attractive compensation.

Requirements and responsibilities

Our client is running a growing internal AI platform that powers intelligent automation across the enterprise. As IT AI Operations Engineer, you are the person who keeps that platform healthy in day-to-day use, administering, supporting, and monitoring the AI solutions that the Architecture and Development teams design and the DevOps team provisions.

Where the DevOps team builds the pipelines and infrastructure, you operate what runs on top of it. You own the administration, support, and monitoring of the live AI services, measure the KPIs that show the platform is delivering value, and act as the first point of contact for users when something needs attention.

A core part of your mission is enabling business users: helping them publish and deploy their own self-service AI deliverables (agents, assistants, automations, and dashboards) safely and within governance, so the platform scales adoption without scaling friction. If you enjoy operating production systems, supporting people, and turning telemetry into insight, this is that role.

What You’ll Do

You will operate and support our client's AI platform in production, administer its components, monitor health and usage, measure KPIs, and empower business users to deploy their own self-service AI deliverables, working closely with the Development, DevOps, and Security teams.

  • Platform Administration & Operations

- Administer the live AI platform components built by the Development and DevOps teams: AI agents, the Models Gateway, RAG services, connectors, and self-service tooling.
- Manage day-to-day operational tasks: user and access administration, configuration changes, license and quota management, and routine maintenance.
- Operate within established guardrails, raising change requests to DevOps rather than modifying infrastructure directly.
- Maintain operational runbooks, standard operating procedures, and a knowledge base for recurring tasks.

  • Monitoring, Health & Incident Support

- Monitor platform health, availability, and performance using the dashboards and observability stack (Azure Monitor, Application Insights, Log Analytics, Grafana).
- Act as first-line and second-line support for AI platform issues: triage, diagnose, resolve, or escalate to Development / DevOps with clear context.
- Own the incident and ticket lifecycle (e.g. in ServiceNow): logging, prioritization, communication, resolution, and follow-up.
- Track recurring issues and feed improvement requests back to the engineering teams.

  • KPI Measurement & Reporting

- Define, measure, and report the KPIs that show platform value: adoption, active users, usage by team, token and cost consumption, deflection / automation rates, latency, and reliability.
- Build and maintain operational and executive dashboards that turn telemetry into clear, decision-ready insight.
- Produce regular service reports (usage, SLAs, cost, satisfaction) for stakeholders and leadership.
- Surface trends and anomalies, and recommend actions to improve adoption, performance, and cost efficiency.

  • Business-User Enablement & Self-Service

- Help business users publish and deploy their own self-service AI deliverables (agents, assistants, automations, and dashboards) safely and within governance.
- Provide hands-on onboarding, guidance, and office hours; review self-service submissions for compliance and quality before they go live.
- Maintain enablement materials: how-to guides, templates, training sessions, and FAQs.
- Champion adoption across business units, gathering feedback and translating it into platform improvements.

  • Governance, Security & Compliance Support

- Operate the platform in line with security and governance policies: access reviews, usage policies, data-handling rules, and audit logging.
- Partner with the Security team on monitoring for misuse, policy violations, and compliance reporting.
- Manage user access and permissions following least-privilege, and support periodic certification and audits.
- Ensure self-service deployments meet approval and guardrail requirements before release.

  • Continuous Improvement & Collaboration

- Work closely with the Development and DevOps teams as the operational voice of the platform, feeding real-world usage and pain points into the roadmap.
- Contribute to automation of repetitive operational tasks to reduce toil and improve response times.
- Participate in the on-call / support rotation for business-hours (and critical after-hours) platform support.
- Continuously refine support processes, KPIs, and enablement based on user feedback and service data.

What You’ll Bring

  • 3+ years in IT operations, application support, platform operations, or a similar service-oriented role;
  • Hands-on experience operating and monitoring cloud services (Azure preferred): reading dashboards, logs, and metrics to diagnose issues;
  • Familiarity with observability and monitoring tools: Azure Monitor, Application Insights, Log Analytics, or Grafana;
  • Experience with IT service management and ticketing (e.g. ServiceNow): incident, request, and change processes;
  • Strong support and customer-service mindset, with the ability to explain technical topics to non-technical business users;
  • Understanding of identity and access administration: Entra ID, RBAC, least-privilege, and access reviews;
  • Ability to define and report KPIs and build clear dashboards and service reports;
  • Comfort working with AI / LLM-based products as an operator and enabler (agents, assistants, RAG, copilots);
  • Fluency in English.

Nice to have

  • Experience supporting or administering AI / LLM platforms, chatbots, or automation tooling (e.g. Claude, Copilot, n8n, Power Platform);
  • Basic scripting for automation of operational tasks (Python, PowerShell, or Bash);
  • Familiarity with FinOps concepts: usage tracking, cost dashboards, and chargeback / showback;
  • Exposure to data visualization tools (Power BI, Grafana) for KPI and adoption reporting;
  • Understanding of AI governance, responsible-AI, and data-handling practices in an enterprise;
  • ITIL foundation or equivalent service-management knowledge.

We are hiring across the band: AI Operations Engineer (L2–L3) and Senior AI Operations Engineer (L3–L4), calibrated on experience.

Not meeting every single requirement?

If this role sounds good to you, even if you don’t meet every single bullet point in the job description, we encourage you to apply anyway. For most of the candidates that applied, we found a role that was a very good fit with their skills.

Let’s meet and you may just be the right candidate for one of our roles.

At Ness Digital Engineering, we are committed to building a work culture based on diversity, inclusion, and authenticity.

About Ness Digital Engineering

Ness Digital Engineering is a global provider of Intelligent Data and Software Engineering services, specializing in data, AI, and cloud-powered solutions that drive innovation and deliver measurable business outcomes. With over 25 years of engineering expertise and our proprietary data and software platforms and accelerators, Ness helps enterprises modernize systems, accelerate product development, and achieve scalable impact with speed and precision.

Our differentiation lies at the intersection of industry domain and technology. We focus on clients across financial services, technology and ISVs, media and entertainment, manufacturing, transportation, and retail. Ness is known for its delivery excellence, deep domain knowledge, and product engineering expertise. Ness Digital Engineering is a portfolio company of KKR, a leading global investment firm.

Industry: IT & Software
Company Size: 5,001-10,000 employees
Headquarters: New York, NY
Year Founded: 1999
Website: ness.com