DSA

HPC Support Engineer

DSA  •  Charlottesville, VA (Onsite)  •  2 months ago

Job Description

All hired employees are expected to have experience with Microsoft Copilot and/or an approved equivalent AI solution.

Data Systems Analysts, Inc. (DSA) is seeking a TS/SCI cleared HPC Support Engineer to assist users executing computational workloads within secure High Performance Computing (HPC) environments. The HPC Support Engineer will work directly with engineers, analysts, and researchers to support job execution, troubleshoot workload failures, and improve the performance and efficiency of compute workloads running on HPC clusters.

The Engineer will assist users with scheduler job scripts, application execution, and workload performance troubleshooting while promoting HPC best practices for efficient cluster utilization. This role serves as the primary interface between mission users and HPC platform infrastructure teams.

This position requires strong Linux experience, scripting capability, and familiarity with distributed computing environments supporting scientific or engineering workloads.

This position is onsite in Charlottesville, VA.

Responsibilities:

  • Provide user support for computational workloads running on HPC clusters in classified and unclassified environments.
  • Assist users in developing, submitting, and troubleshooting scheduler job scripts for systems such as Slurm or PBS, including resource allocation for CPU, GPU, and distributed compute workloads.
  • Troubleshoot slow, hanging, or failing HPC jobs, including MPI-based distributed workloads, GPU jobs, and large-scale parallel applications.
  • Support users compiling and executing scientific, modeling, or data processing applications within Linux-based HPC environments.
  • Provide guidance on HPC best practices for job scheduling, compute resource allocation, and workload performance.
  • Monitor workload execution patterns and provide guidance to improve cluster throughput and resource utilization.
  • Develop scripts or tools using Bash or Python to automate common operational tasks.
  • Maintain documentation and knowledge base articles describing system capabilities, job execution procedures, and troubleshooting guidance.
  • Support performance analysis of compute workloads to identify inefficiencies or configuration issues.
  • Coordinate with HPC systems engineers when infrastructure or cluster configuration issues impact workload performance.
  • Provide responsive on-site support for users executing HPC workloads in mission environments.
  • Maintain source-controlled scripts and tools using Git or similar version control platforms.
  • Assist users with environment modules and runtime environments required for executing HPC applications.

Required Education, Certifications and Security Clearance:

  • BS degree in Engineering, Computer Science, or related STEM field
    • Experience may be substituted for degree
  • TS/SCI Clearance
  • Ability to obtain DoD 8140 (8570) IAT Level II certification

Required Experience/Qualifications:

  • Minimum 5 years of Linux experience including command line system usage, scripting, and troubleshooting applications in multi-user server environments.
  • Professional experience administering or supporting command line Linux systems (RHEL derivatives preferred).
  • Experience developing scripts using Bash, Python, or similar scripting languages.
  • Experience troubleshooting software execution issues in distributed computing environments.
  • Working knowledge of job scheduling systems such as Slurm, PBS, Torque, or similar platforms.
  • Experience supporting users in technical computing or engineering environments.
  • Strong troubleshooting and analytical skills.
  • Ability to communicate technical concepts clearly to both technical and non-technical users.
  • Active TS/SCI security clearance.

Preferred Experience/Qualifications:

  • Experience as a user or administrator of HPC clusters.
  • Experience supporting parallel computing frameworks such as MPI, OpenMP, or CUDA-based GPU workloads.
  • Experience supporting scientific or engineering applications requiring large-scale compute resources.
  • Experience using performance monitoring and optimization tools for compute workloads.
  • Experience compiling applications using C, C++, Fortran, or Python-based environments.
  • Experience working in classified computing environments.
  • Experience supporting GPU enabled workloads.


Many of DSA's positions require the ability to obtain a security clearance. Security clearances may only be granted to U.S. citizens. In addition, applicants who accept a conditional offer of employment may be subject to government security investigation(s) and must meet eligibility requirements for access to classified information. DSA is proud to be an Equal Opportunity Employer. DSA is committed to treating all employees and applicants for employment with respect and dignity and maintaining a workplace that is free from unlawful discrimination. All qualified applicants will receive consideration for employment without regard to race, color, sex, sexual orientation, gender identity, religion, national origin, disability, veteran status, or other legally protected status. DSA requires background checks where permitted by law. DSA is an E-Verify Employer.

About DSA

Data Systems Analysts, Inc. (DSA) has been providing mission-essential solutions for Defense, Federal Government, Academia, and Commercial customers since 1963. DSA's employees excel in helping our customers achieve sensitive, mission-critical business goals and objectives. DSA is a 100 percent employee-owned company: every employee has a stake in the success of our company. Since its inception, DSA has played an important role in supporting the U.S. Government. As one of its first major programs, DSA partnered with the U.S. Department of Defense (DoD) to provide a secure and reliable messaging system for America's fighting forces and our allies. Today, we provide high-end solutions across the Federal customer spectrum, including DoD and Civil agencies. We have expanded our customer base to include universities and private industry partners as we provide Critical Infrastructure support and monitoring.

DSA has deep expertise in, and a comprehensive understanding of, the operational, security, collaboration, and identity management challenges our customers must address. We are CMMC Level 2 (C3PAO) certified, ISO 9001:2015 and ISO/IEC 27001:2013 registered, and appraised at CMMI Maturity Level 3 for Service Projects and for Development. DSA's service offerings include Systems Engineering and Integration, Software Development, Data Analytics, Cyber Operations and Security, Systems Modernization, Cloud Solutions, Enterprise Collaboration and Knowledge Management, and Critical Infrastructure Intelligence Systems. DSA is headquartered in Trevose, PA, has major operations in the United States, and supports over 60 locations across the country.

Industry: IT & Software
Company Size: 1,001-5,000 employees
Headquarters: Trevose, PA
Year Founded: 1963