Job Description
Meta's Reality Labs Wearables division is developing future products in augmented reality and virtual reality. Our work in conversational AI, computer vision, advanced optics, eye tracking, machine learning and reasoning will enable and empower consumers and businesses in new ways.
As a Linguistic Engineer (LE), you will focus on helping deliver datasets, models, and knowledge that power the ML systems across all components in the multimodal assistant product, such as ASR/TTS, NLU, NLG, Dialog, LLMs, and Knowledge Graph. The Linguistic Engineer delivers datasets, consistent models and representations across languages and areas, as well as data infrastructure that powers the voice assistant product.
A strong applicant has demonstrated technical, analytical, and collaboration skills, with experience in building datasets for ML applications. Experience as an ML practitioner is useful but not required, as the role focuses on the datasets and provides an opportunity to work directly with ML teams as part of the greater voice assistant organization.
Experience with data tools, pipelines, and analytics is needed for this role. The ideal candidate will be multilingual, as the role involves building and evaluating datasets across multiple languages for the voice assistant product, and will have an interest in NLP and/or conversational AI systems and in keeping up with the latest advancements in their development. Experience in any subfield of computational linguistics is a plus, though it is not a requirement for the role.
Responsibilities
* Build datasets, pipelines, and models for ML applications
* Directly support product development with rules, prompts, and data patches
* Evaluate the quality of models and product experiences and close the feedback loop
* Clearly communicate with project stakeholders
* Identify best practices and improve procedures across data systems
* Drive and deliver projects from conceptualization through launch and beyond with continual improvement and support
* Design and conduct product experiments
* Solve complex problems and embrace ambiguity to drive innovative and impactful solutions
* Manage and prioritize multiple work streams
* Collaborate effectively with cross-functional teams
Qualifications
* Experience as a data scientist, software engineer, computational linguist, or in a similar role
* Experience with programming and data analysis with languages and platforms such as Python, SQL, PHP/Hack
* Experience with text analysis, scripting, relational databases, NoSQL databases, or similar
* Experience shipping multiple products across various platforms
* Degree in Linguistics, Computational Linguistics, Computer Science, Data Science, Information Systems, or related fields, or equivalent experience
* Experience with larger scripting projects that involve combining language data from different sources, computing complex metrics over large datasets, and similar tasks
* 2+ years of work experience as a data scientist, software engineer, or computational linguist, with experience in machine learning and knowledge graph integrations with lexicons or ontologies
* Demonstrated ongoing AI skill development (e.g., prompt/context engineering, agent orchestration) and staying current with emerging AI technologies
* Experience adhering to and implementing responsible, ethical AI practices (e.g., risk assessment, bias mitigation, quality and accuracy reviews)
* Fluency in additional languages to support multilingual dataset development and product localization efforts
* Demonstrated ability to integrate AI tools to optimize/redesign workflows and drive measurable impact (e.g., efficiency gains, quality improvements)
* Practical knowledge of the relationship between data and machine learning models
* Advanced coursework and/or research in Linguistics, Computer Science, Data Science, Computational Linguistics, Information Systems, or related fields
* Experience designing and conducting data experiments
* Experience with version control, unit tests, and other programming best practices