Ingram Micro

Lead AI Engineer

Ingram Micro  •  Republic of India (Onsite)  •  1 month ago

Job Description

It's fun to work in a company where people truly BELIEVE in what they're doing!

Deployment & Infrastructure Management:

  • Deploy, configure, and manage AI models, agentic systems, and supporting infrastructure in cloud (e.g., GCP) and on-premises environments.
  • Implement and maintain CI/CD pipelines for AI/ML models and agentic applications (MLOps/Agent Ops).
  • Manage and optimize cloud resources, ensuring cost-effectiveness and scalability for AI workloads.
  • Collaborate with infrastructure teams to ensure network, storage, and compute resources meet the demands of AI systems.

Monitoring, Logging & Alerting:

  • Develop and implement comprehensive monitoring, logging, and alerting solutions for AI agents and infrastructure to ensure high availability and performance.
  • Proactively identify and address potential issues, performance bottlenecks, and anomalies in production AI systems.
  • Track key operational metrics and create dashboards for system health and performance.

Incident Response & Troubleshooting:

  • Provide operational support for production AI systems, including incident response, root cause analysis, and resolution of technical issues.
  • Develop and maintain runbooks and standard operating procedures for common operational tasks and incident management.
  • Participate in on-call rotations as needed to support critical AI services.

Automation & Operational Excellence:

  • Automate routine operational tasks, deployment processes, and system maintenance activities using scripting (e.g., Python, Bash) and automation tools.
  • Contribute to the development and enforcement of operational best practices, security standards, and compliance requirements for AI systems.
  • Work with development teams to improve the deployability, manageability, and observability of AI applications.

Collaboration & Documentation:

  • Collaborate effectively with AI developers, data scientists, AI architects, and other stakeholders to ensure smooth transitions from development to production.
  • Maintain clear and comprehensive documentation for system configurations, operational procedures, and troubleshooting guides.
  • Provide feedback to development teams on operational aspects and system performance.

Preferred Qualifications & Experience:

  • Bachelor’s degree in Computer Science, Information Technology, Engineering, or a related technical field.
  • 4-7+ years of experience in an MLOps or Agent Ops role, preferably supporting AI/ML or data-intensive applications.
  • Hands-on experience with cloud computing platforms (e.g., Google Cloud Platform, especially Vertex AI) and managing cloud-based infrastructure.
  • Proficiency in scripting languages such as Python, Bash, or PowerShell for automation.
  • Experience with CI/CD tools and practices (e.g., Bitbucket Pipelines, GitLab CI, GitHub Actions).
  • Familiarity with containerization and orchestration technologies (e.g., Docker, Kubernetes).
  • Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack, Datadog, Google Cloud Monitoring, Langfuse).
  • Understanding of networking concepts, security best practices, and infrastructure-as-code (IaC) principles and tools (e.g., Terraform, Ansible).
  • Strong troubleshooting and problem-solving skills with an analytical mindset.
  • Excellent communication skills and ability to work collaboratively in a team environment.
  • A proactive approach to identifying and resolving issues and improving system reliability.
  • Master's degree in a relevant field.
  • Specific experience in MLOps or Agent Ops, including deploying and managing machine learning models or large language model applications in production.
  • Familiarity with AI/ML frameworks and libraries (e.g., TensorFlow, PyTorch, scikit-learn).
  • Understanding of agentic AI concepts and the operational challenges they present.
  • Experience with managing vector databases or other specialized data stores for AI.
  • Knowledge of data pipeline tools (e.g., Apache Airflow, Kubeflow Pipelines).
  • Relevant cloud certifications (e.g., Google Cloud Professional ML Engineer).
  • Experience working in an agile development environment.

Why Join Us?

  • Play a critical role in operationalizing cutting-edge agentic AI and other AI systems for a global industry leader.
  • Gain hands-on experience with the latest MLOps, Agent Ops, and cloud technologies.
  • Work in a dynamic, innovative, and collaborative AI Center of Excellence.
  • Make a significant impact on the reliability and efficiency of transformative AI solutions.
  • Competitive salary, bonus, and benefits package.

About Ingram Micro

Ingram Micro is a leading technology company for the global information technology ecosystem. With the ability to reach nearly 90% of the global population, we play a vital role in the worldwide IT sales channel, bringing products and services from technology manufacturers and cloud providers to a highly diversified base of business-to-business technology experts. Through Ingram Micro Xvantage™, our AI-powered digital platform, we offer what we believe to be the industry’s first comprehensive business-to-consumer-like experience, integrating hardware and cloud subscriptions, personalized recommendations, instant pricing, order tracking, and billing automation. We also provide a broad range of technology services, including financing, specialized marketing, and lifecycle management, as well as technical pre- and post-sales professional support. Learn more at www.ingrammicro.com.

Industry: IT & Software
Company Size: 10,000+ employees
Headquarters: Irvine, CA
Year Founded: 1979