Amazon

Server Lab Engineer, ML-IL

Amazon  •  Tel Aviv, IL (Onsite)  •  12 days ago

Job Description

Machine Learning Israel (MLIL), part of Annapurna Labs / Amazon, is hiring a Lab Engineer to own and operate the labs that power the bring-up and validation of our next-generation ML training and inference racks. In this role you will build, maintain, and continuously evolve the lab infrastructure, from bench setups to server racks, used daily by HW, FW, and SW engineers. You will be the go-to person for delivering working, instrumented setups that the R&D teams can pick up and run with.

Key job responsibilities
• Own the MLIL hardware lab in the Tel Aviv office: physical layout, power and cooling budget, network topology, cabling, asset tracking, and day-to-day operations.
• Build, configure, and connect new lab setups for HW, FW, and SW engineers (including servers, GPU sleds, PCIe switches, retimers, NICs, and DRAM modules) and deliver them ready for R&D use.
• Administer and maintain Linux-based servers and systems, including installation, configuration, and optimization.
• Manage and configure network services such as DHCP, PXE, and other critical infrastructure components.
• Run sanity tests on every delivered setup — boot, PCIe enumeration, basic DRAM check, network reachability — so R&D teams pick up a known-good baseline and can focus on their work.
• Write and maintain automation scripts (Python / Bash) for repetitive lab tasks — power cycling, log collection, provisioning, imaging, test-harness setup.
• Procure, inventory, and manage lab equipment: bench PSUs, scopes, protocol analyzers, thermal chambers, JTAG debuggers, cables, and fixtures.
• Triage lab-level issues (power, network, cabling, imaging) to unblock R&D fast; escalate deep HW / FW / SW debug (e.g., RDMA / GPU / EFA internals) to the relevant specialist teams.

Basic Qualifications


- 3+ years of experience as a System Administrator / Lab Engineer or in a similar role.
- Knowledge of Linux operating systems and server administration.
- Solid understanding of networking fundamentals — Ethernet, TCP/IP, link-layer debug, switch / NIC configuration.

Preferred Qualifications

- Proven hands-on experience with lab instrumentation: scopes, logic analyzers, protocol analyzers, bench PSUs, JTAG / BMC debug.
- B.Sc in Electrical / Electronics / Computer Engineering, or a Practical Engineer diploma (הנדסאי) with hands-on experience.
- Solid understanding of PCIe — enumeration, link training, lane configuration, error reporting (AER), and common debug flows.
- Experience with BMC / BIOS / UEFI debug, IPMI, Redfish.
- Experience with high-speed serial debug — SerDes, equalization, eye diagrams, BER testing.
- Proficient in Python / Bash automation and willing to write production-grade lab tooling.

Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you’re applying in isn’t listed, please contact your Recruiting Partner.

About Amazon

Amazon is guided by four principles: customer obsession rather than competitor focus, passion for invention, commitment to operational excellence, and long-term thinking. We are driven by the excitement of building technologies, inventing products, and providing services that change lives. We embrace new ways of doing things, make decisions quickly, and are not afraid to fail. We have the scope and capabilities of a large company, and the spirit and heart of a small one.

Together, Amazonians research and develop new technologies from Amazon Web Services to Alexa on behalf of our customers: shoppers, sellers, content creators, and developers around the world.

Our mission is to be Earth's most customer-centric company. Our actions, goals, projects, programs, and inventions begin and end with the customer top of mind.

You'll also hear us say that at Amazon, it's always "Day 1." What do we mean? That our approach remains the same as it was on Amazon's very first day: to make smart, fast decisions, stay nimble, invent, and focus on delighting our customers.

Industry: IT & Software
Company Size: 10,000+ employees
Headquarters: Seattle, WA