Job Title: Cloud Operations Lead
Location: Rockville, MD | Princeton, NJ | New York City, NY
Employment Type: Full-Time (Onsite/Hybrid as required)
Note: This is a 100% hands-on technical role. Architect-level profiles will not be considered.
We are seeking a highly skilled and technically hands-on Cloud Operations Lead to manage and optimize our multi-cloud infrastructure, with a primary focus on AWS environments. The ideal candidate will have extensive experience in cloud infrastructure management, automation, and hands-on support for critical cloud services. You will lead day-to-day operations, ensure reliability, drive automation, and enforce best practices across our cloud environments.
Primary Focus Areas
- AWS Control Tower, Organizations, and policy management
- Multi-account deployment and governance
- Detailed expertise in AWS Backup, SSM Patching, and AMI deployments
- Automation for AMI rollout and configuration across accounts
- Deep hands-on experience with AWS core services: EC2, ECS, EKS, RDS, S3, CloudFront, Lambda, SageMaker, etc.
- Managing S3, SFTP, and site externalization
- Infrastructure as Code (IaC) using Terraform, CloudFormation, and Python
- Strong knowledge of IAM, access controls, and resource-based policy enforcement
Key Responsibilities
- Lead and manage cloud infrastructure operations ensuring high availability, security, and performance
- Serve as primary escalation point for cloud operational issues
- Maintain AWS environments following best practices around cost, security, and performance
- Lead and manage incident response, perform RCA, and implement preventative measures
- Design and implement cloud automation using IaC and scripting
- Mentor cloud engineers, review code/configurations, and guide operational best practices
- Implement monitoring and alerting systems for proactive issue resolution
- Ensure regulatory compliance (e.g., GDPR, HIPAA) and enforce cloud governance standards
- Drive cloud cost optimization efforts including tagging, budgeting, and forecasting
- Develop and maintain disaster recovery and business continuity plans
- Create and maintain technical documentation, SOPs, and runbooks
- Collaborate with cross-functional teams including Security, DevOps, and App Engineering
Required Qualifications
- Bachelor's degree in Computer Science, Information Technology, Engineering, or related field
- 7+ years of total IT experience, with 3+ years in cloud operations leadership roles
- Strong hands-on experience in AWS; additional exposure to Azure and Oracle Cloud Infrastructure (OCI) is a plus
- Experience with Terraform, CloudFormation, Python, PowerShell, and related automation tools
- Proven experience in managing multi-account cloud environments, CI/CD pipelines, and backup/recovery setups
- Deep understanding of IAM, network security, encryption, and secure cloud design
- Familiarity with Windows/Linux server administration, VMware, Active Directory, and Azure AD SSO
- Strong networking fundamentals: DNS, DHCP, LAN/WAN, and PKI
Certifications
- AWS Certified Solutions Architect - Associate or Professional (Required)
- Microsoft Certified: Azure Architect Technologies (Preferred)
- OCI Certifications (Preferred)
- ITIL Foundation or related service management experience (Preferred)
Required Technical Skills
- AWS core services: EC2, EKS, ECS, Lambda, S3
- IaC tools: Terraform, CloudFormation
- Scripting/Automation: Python (Required), PowerShell, Bash
- Experience with DevOps tools: Git, Jenkins, Ansible, CI/CD (Preferred)
Soft Skills & Traits
- Strong communication and stakeholder engagement skills
- Proven ability to lead, mentor, and manage technical teams
- Analytical mindset with strong troubleshooting and root cause analysis abilities
- Ability to handle high-pressure incidents with calm, structured responses
- Driven by continuous improvement and automation
Skill Matrix Template
Full Name:
Degree Major with University and Completion Year:
Total Experience in Cloud Infrastructure / IT Operations:
Total Experience in AWS Cloud Operations (Hands-on):
Which AWS Services have you extensively worked with? (e.g., EC2, EKS, ECS, RDS, Lambda, S3, CloudFront, SageMaker, etc.):
Experience with AWS Organizations / Control Tower / SCP Policies:
Experience with Multi-Account Management and Deployment (e.g., Config pushing, AMIs):
Experience with AWS S3, SFTP & Site Externalization Methods:
AWS Backup and SSM Patching Process Experience (Detail your involvement):
Experience with AMI Creation, Deployment, and Configuration across Accounts:
Infrastructure as Code (IaC) Proficiency: (Terraform, CloudFormation - Please describe experience & tools used):
Python Scripting Experience:
Experience in Incident and Problem Management (RCA, Incident Communication):
Cloud Monitoring and Reporting Tools Used (e.g., CloudWatch, Dynatrace, Power BI):
Experience in Leading Cloud Teams or Managing Technical Engineers:
Experience with Security in Cloud (IAM policies, Access Management, Encryption, Compliance):
Experience with Disaster Recovery / Business Continuity Planning in Cloud Environments:
Have you worked in a multi-cloud environment (AWS, Azure, OCI)?
Experience with CI/CD and DevOps Toolchains (e.g., Git, Jenkins, Ansible, etc.):
Experience with VMware, Active Directory, Azure AD SSO Integration:
Networking Experience (DNS, DHCP, LAN/WAN, PKI, etc.):
Cloud Certifications (e.g., AWS Certified Solutions Architect, Azure, OCI):
Motivation/Reason for Interest in This Role:
Motivation/Reason for Relocation (if not local to job location):
Contact Number:
Email ID:
LinkedIn Profile URL:
Full Address (Street, City, State, ZIP Code):
Notice Period (in weeks):
Current Work Authorization Status (e.g., US Citizen, Green Card, H1B, etc.):
Expected Salary:
Are you willing to relocate at your own expense and work hybrid at one of the specified locations (Rockville, MD / Princeton, NJ / New York City, NY)? (Yes/No):