Science Systems and Applications, Inc (SSAI)

Earth Systems Modeling Operations Analyst

Science Systems and Applications, Inc (SSAI)  •  $75k - $105k/yr  •  Greenbelt, MD / Lanham, MD (Onsite)  •  9 days ago

Job Description

Science Systems and Applications, Inc (SSAI) is seeking an Operations Analyst to support the reliable and timely production of near real-time GEOS model products by monitoring operational workflows and executing approved workflow scripts within an HPC environment. This individual will ensure scheduled run cycles complete successfully and near real-time outputs are delivered according to operational timelines.

Key Responsibilities:

  • Operate during scheduled shifts and participate in an on-call rotation to support near real-time product generation.
  • Monitor workflow execution for operational cycles (e.g., data staging/ingest steps, model runs, post-processing, output archive, and product distribution).
  • Execute approved workflow scripts and operational commands according to operational procedures.
  • Monitor job status and system health in scheduling tools such as Slurm, PBS, and Cylc (job states, failures, retries, dependencies) and confirm expected workflow progress.
  • Perform routine operational checks:
    • Validate inputs/paths and confirm required inputs and dependencies exist
    • Inspect key logs for known error signatures
    • Run basic QC “sanity checks” on outputs as defined by operations procedures
  • Diagnose issues at the workflow level (missing or corrupt inputs, scheduler issues, environment/module mismatches, storage/permission problems) and initiate recovery actions per operational procedures.
  • Escalate to model/system specialists when problems exceed operator scope; provide actionable incident details (error logs, job IDs, timestamps, impacted cycles).
  • Maintain operational documentation:
    • Submit and update error tracking tickets.
    • Update web-based documentation of operational procedures.
  • Coordinate with upstream data providers on data outages, file modifications, or network issues.
  • Coordinate with downstream product users/teams to ensure timely near real-time delivery and communicate delays or expected recovery timelines.
  • Note: Operators do not modify or develop the model, but they are responsible for workflow execution, monitoring, and recovery within their authorized procedures.

Required Qualifications:

  • Bachelor's Degree (B.S.) and a minimum of 2 years of related experience and/or training, or an equivalent combination of education and experience.
  • Specifically, 1-3 years of Earth System Modeling operations experience in a production environment with scheduled near real-time workloads.
  • Hands-on Linux operations and troubleshooting in production:
    • Log review and diagnostics
    • Environment/module awareness
    • File system/storage space and permissions checks
    • Comfort using standard admin tools and CLIs for troubleshooting
  • Experience using a job scheduler (Slurm, PBS, Cylc, or equivalent) for monitoring and operational troubleshooting (job states, dependencies, reruns, resource/time failures).
  • Demonstrated experience supporting shift/on-call responsibilities and responding to time-critical incidents.
  • Basic knowledge of scripting/programming for operations:
    • bash/csh: workflow execution, wrapper scripts, log parsing, operational utilities
    • Python: simple tool development for status reporting, log parsing/QC automation, incident summaries
    • Perl: ability to maintain or extend existing operational scripts (at least to the level needed for troubleshooting and minor updates).
  • Workflow operations mindset: ability to follow procedures precisely and to practice safe recovery (e.g., knowing when reruns are permitted and how to avoid data corruption or duplicate outputs).
  • Ability to check output presence, inspect metadata, and perform basic sanity checks (e.g., practical familiarity with netCDF/HDF5).
  • Strong attention to detail and ability to follow procedures under time pressure.
  • Clear communication during escalations (what failed, when, where, which logs/job IDs).
  • Team collaboration during cross-functional troubleshooting (operations ↔ science teams ↔ data providers and users).

Desired Qualifications:

  • Familiarity with numerical weather/climate operations concepts (cycles, near real-time product timing, typical failure modes).
  • Experience integrating lightweight monitoring/alerting (dashboards, alerts, automated status emails/messages).
  • Prior participation in incident management and structured post-incident review.
  • Experience running jobs in an HPC environment.
  • Experience with Sphinx for generating operational documentation from reStructuredText (.rst) files.

Note: The actual salary offered will be determined based on factors including experience, qualifications, tenure, skill set, availability of qualified candidates, geographic location, certifications, and other job-related criteria deemed relevant to the position.

EEO/AA Veterans and Individuals with Disabilities

Physical Requirements: While performing the duties of this job, the employee is regularly required to stand, walk, and use hands to touch, handle or feel objects, tools or controls. The employee frequently is required to talk and hear and occasionally required to reach with hands and arms and stoop, kneel, crouch, or crawl. Must regularly lift and/or move up to 10 pounds, and occasionally lift and/or move up to 25 pounds. Specific vision abilities required by this job include close vision, peripheral vision, depth perception and the ability to adjust focus.


About Science Systems and Applications, Inc (SSAI)

Science Systems and Applications, Inc. (SSAI) is a leading provider of science, engineering, and technology solutions for customers who seek new frontiers. For more than 47 years, we have been by their side, aligning with their vision and goals to provide excellent research and technical support. We support pioneers in science and engineering—such as NASA and NOAA—and we’ve made significant contributions to more than 150 Earth and space science missions. SSAI’s exceptional services are built on our genuine passion for research and innovative solutions. Our expert scientists, engineers, and IT professionals share a commitment to providing solutions for the unique needs of each client.

SCIENCE - We focus our passion for science on exploring important questions to improve the quality of life for all of us.

ENGINEERING - SSAI engineers design and build new technology to gather accurate data that keep us informed.

INFORMATION ANALYTICS - We provide advanced information technology solutions to meet the needs of our customers and their end users.

Industry: Aviation & Aerospace
Company Size: 501-1,000 employees
Headquarters: Lanham, MD
Year Founded: 1977