Job Description

Data Engineer (CREQ259206)

We are looking for a skilled and motivated Python Data Engineer with strong expertise in SQL, AWS, and Redshift to design, develop, and optimize scalable data processing solutions. The ideal candidate should also have exposure to AI/LLM data preparation techniques, including text normalization, chunking, embedding generation, and vectorization.

The role involves working on cloud-based data platforms, large-scale data pipelines, and modern AI-enabled search/retrieval solutions.

Key Responsibilities
Design and develop scalable backend and data processing applications using Python.
Build and optimize complex SQL queries, ETL pipelines, and data transformation workflows.
Work with AWS services such as S3, Lambda, Glue, EC2, IAM, and Redshift.
Develop and maintain data warehousing solutions using Redshift.
Process structured and unstructured data for analytics and AI use cases.
Implement text preprocessing techniques, including data normalization, chunking, embedding generation, and vectorization.
Work with vector databases and semantic search concepts (good to have).
Collaborate with cross-functional teams including Data Engineers, AI/ML teams, and Business Analysts.
Ensure data quality, performance optimization, scalability, and security best practices.
Participate in code reviews, debugging, testing, and deployment activities.

Required Skills
Strong programming experience in Python.
Strong SQL knowledge including query optimization and data modeling.
Hands-on experience with AWS cloud services.
Experience working with Amazon Redshift.
Understanding of ETL and data pipeline development.
Experience with APIs and data integration.
Good analytical and problem-solving skills.

Good to Have Skills
Knowledge of Generative AI / LLM data preparation concepts.
Experience in text normalization, chunking strategies, embedding models, and vectorization techniques.
Exposure to vector databases such as Pinecone, FAISS, ChromaDB, or Weaviate.
Familiarity with LangChain or Retrieval-Augmented Generation (RAG) concepts.
Knowledge of Docker, CI/CD, and Git.

Preferred Qualifications
Bachelor’s/Master’s degree in Computer Science, IT, Data Science, or a related field.
Experience working in cloud-native data engineering environments.
Prior experience in AI-enabled analytics or search platforms is a plus.

Primary Location

IN-AP-Hyderabad

Schedule

Full Time

Employee Status

Individual Contributor

Job Type

Experienced

Travel

Job Posting

03/06/2026, 11:28:18 AM

About Virtusa

Virtusa is a global product and platform engineering services company that makes experiences better with technology. We help organizations grow faster, more profitably, and more sustainably by reimagining enterprises through domain-driven solutions. We combine strategy, design, and engineering, backed by unmatched expertise at the intersection of industry, business, and technology, to generate real-world business impact for clients.

Headquartered in Massachusetts with global delivery centers, Virtusa provides a broad range of services, solutions, and assets, including strategy and design, AI advisory and services, digital engineering, data and analytics, digital assurance, cloud and security, CX transformation, and managed services, across industries such as financial services, healthcare, communications, media, entertainment, travel, manufacturing, and technology.

Industry

IT & Software

Company Size

10,000+ employees

Headquarters

Southborough, MA

Year Founded

1996

Website

virtusa.com
