AgileEngine is an Inc. 5000 company that creates award-winning software for Fortune 500 brands and trailblazing startups across 17+ industries. We rank among the leaders in areas like application development and AI/ML, and our people-first culture has earned us multiple Best Place to Work awards.
WHY JOIN US
If you're looking for a place to grow, make an impact, and work with people who care, we'd love to meet you!
ABOUT THE ROLE
We are looking for a Production Support Engineer to monitor and support production systems across a multi-account AWS environment, serving as the front line of a tiered support model for a fintech platform. You will triage incidents, execute runbooks, manage SLA performance, and coordinate with engineering, help desk, and security partners. The role includes on-call rotation and structured post-incident review with a focus on continuous operational improvement.
WHAT YOU WILL DO
- Monitor production systems and respond to alerts across infrastructure, application, and data layers;
- Perform first-level triage on incidents and support requests; escalate to developers with thorough context and diagnostics;
- Execute patching, operational tasks, and documented runbooks;
- Participate in on-call rotation and support scheduled deployments as needed;
- Conduct post-incident reviews and feed lessons back into runbooks and playbooks;
- Identify recurring issues and systemic risks before they escalate;
- Improve documentation and monitoring coverage between active support activities;
- Contribute to operational reporting and SLA dashboards;
- Manage and track SLA performance across all supported services; surface risks proactively;
- Coordinate with the Help Desk / Deskside Support partner on production tasks affecting employees;
- Escalate security incidents and vulnerabilities to the vCISO partner per documented procedures.
MUST HAVES
- 3+ years in production support, SRE, NOC, or operations engineering;
- Hands-on AWS experience with EC2/ECS, networking (VPC, security groups, ACLs), and IAM;
- Operational proficiency with PostgreSQL and/or Amazon RDS;
- Incident triage across infrastructure and application layers;
- Track record managing SLAs in a ticketed support environment such as Jira;
- Strong written communication for escalation and post-incident reporting;
- Upper-intermediate English level.
NICE TO HAVES
- Experience with structured incident response such as ITIL or NIST;
- Familiarity with Datadog, CloudWatch, or comparable observability platforms;
- Exposure to AWS data services including Glue, S3, Athena, and EventBridge;
- Basic IaC familiarity with CloudFormation, SAM, or Terraform;
- Background in financial services or regulated environments;
- AWS certification such as SysOps Administrator or Solutions Architect;
- Experience with scripting/automation to reduce manual toil.
PERKS AND BENEFITS
- Professional growth: Mentorship, TechTalks, and personalized growth roadmaps.
- Competitive compensation: USD-based pay with education, fitness, and team activity budgets.
- Exciting projects: Modern solutions with Fortune 500 and top product companies.
- Flextime: Flexible schedule with remote and office options.
Meet Our Recruitment Process
Application → Coding Challenge → Video Interview → Technical Interview or Hiring Manager Interview
Each step helps us understand your skills and overall fit.
If it’s a match, you’ll receive an offer.