Job Description
Meta's Lab Infrastructure, Network, Compliance, and Security (LINCS) team is seeking a network engineer to help build and scale the network infrastructure supporting Meta's global engineering labs. Our team is responsible for network design, deployment, and operations across these labs, where we support multiple engineering teams. As new technologies such as the Metaverse and GenAI mature rapidly, there are significant opportunities to rethink traditional networking and iterate quickly in our environment. This role offers an opportunity to work directly with engineering teams that are maturing new hardware and software on the path to production.
Responsibilities
* Own end-to-end frontend and backend network design, deployment, and operations for AI and compute lab clusters
* Serve as a primary networking point of contact for backend fabrics, including scale-out networks built on Arista and internally developed network operating systems that support AI workloads
* Design, deploy, and support high-throughput, low-latency cluster networking, including congestion management (PFC/ECN), RDMA validation, and lossless transport
* Perform hands-on troubleshooting and root-cause analysis across L1–L4 using packet captures, telemetry, and vendor tools to resolve complex lab issues
* Support silicon, hardware, and software bring-ups, ensuring reliable connectivity and on-time validation
* Lead and execute lab network lifecycle activities, including upgrades, migrations, capacity expansions, and decommissioning across regions
* Develop and maintain network automation, configuration templates, and zero-touch provisioning (ZTP) workflows
* Create and maintain MOPs, runbooks, and readiness checklists for internal teams and vendor-executed work
* Provide direct consultation and training to cross-functional partners, enabling teams to operate and troubleshoot lab networks
* Own projects end to end, from requirements definition through customer handoff
* Collaborate closely with hardware, software, systems, and lab operations teams to validate new platforms, optics, and network designs
* Travel occasionally (about 10%) for critical lab builds, migrations, or escalations
Qualifications
* Bachelor's degree in Computer Science, Computer Engineering, a relevant technical field, or equivalent practical experience
* 6+ years of experience designing, deploying, and operating network infrastructure in production or lab environments
* Experience working in multi-vendor environments, including Arista, FBOSS-based platforms, and lab networking hardware
* Experience with configuration management, code repositories, and zero-touch provisioning (ZTP) for network infrastructure
* Experience with IPv4/IPv6 and L2/L3 networking protocols and technologies, including STP, OSPF, BGP, TCP/IP, DHCP, DNS, VLANs, VRRP, LACP, MC-LAG, ACLs, MACsec, and EVPN/VXLAN
* Working knowledge of scripting or programming languages (e.g., Python, shell) for automation and tooling
* Demonstrated ability to work consistently on your own initiative in a global, time-critical environment, seeking feedback and input where appropriate while managing multiple priorities and mission-critical timelines
* Understanding of physical infrastructure design, including structured cabling, space, power, and cooling systems
* Experience adhering to and implementing responsible, ethical AI practices (e.g., risk assessment, bias mitigation, quality and accuracy review)
* Layer 1 (physical-layer) expertise validating multi-vendor optics, including proficiency with the BCM shell and I2C utilities for troubleshooting hardware-level issues
* Experience with network automation, CI/CD pipelines, audit frameworks, and validation tooling
* Hands-on experience with backend cluster networking, including scale-out fabrics, RDMA networks, and congestion management
* Experience supporting AI/ML or high-performance compute clusters in lab or pre-production environments
* Hands-on experience with lab test equipment, optics qualification (e.g., 400G/800G), optical switches, and physical infrastructure
* Networking certifications such as CCIE, JNCIE, or equivalent
* Demonstrated ongoing AI skill development (e.g., prompt/context engineering, agent orchestration) and staying current with emerging AI technologies
* Demonstrated ability to integrate AI tools to optimize or redesign workflows and drive measurable impact (e.g., efficiency gains, quality improvements)
* Hands-on experience with disaggregated networking products and software, such as Meta's open network OS (FBOSS), SONiC, Cumulus Linux, or equivalent open networking platforms