This SysOps Engineer role focuses on ensuring stability, performance, and resilience of large-scale cloud and hybrid systems through continuous monitoring, incident management, and observability framework design. Key responsibilities include configuring monitoring tools, leading incident response, managing backups and disaster recovery, and collaborating with cross-functional teams to maintain high availability. Candidates must have strong Linux/Windows administration skills, experience with AWS/Azure/GCP, and proficiency with monitoring platforms like New Relic, Prometheus, and Grafana.
Job Description
This position is posted by Jobgether on behalf of a partner company. We are currently looking for a SysOps Engineer - Monitoring & Cloud Operations in India.
This role sits at the core of mission-critical infrastructure operations, ensuring the stability, performance, and resilience of large-scale cloud and hybrid systems. You will be responsible for continuously monitoring production environments, identifying and resolving incidents, and maintaining high availability across distributed services. Working within a fast-paced engineering organization, you will collaborate closely with cloud, DevOps, and DataOps teams to safeguard system health and optimize performance. The environment is highly production-driven, requiring strong operational discipline, rapid troubleshooting skills, and a proactive mindset toward risk prevention. You will play a key role in designing and maintaining observability frameworks, ensuring that alerts, dashboards, and monitoring tools provide actionable insights. This is a high-impact position where your work directly supports system uptime, service reliability, and business continuity.
Accountabilities
- Monitor infrastructure and production systems using observability tools such as New Relic, Prometheus, Grafana, or similar platforms, ensuring full visibility into system health.
- Configure and maintain alerts, dashboards, and service-level monitoring to proactively detect anomalies and prevent incidents.
- Lead incident management activities including troubleshooting, root cause analysis (RCA), and post-incident reporting.
- Ensure system uptime, performance, and SLA compliance across cloud and on-premises environments.
- Manage operating system-level tasks (Linux and Windows), including patching, tuning, and service management.
- Oversee backup processes and regularly validate restoration procedures to ensure data reliability.
- Execute and support disaster recovery (DR) plans, including failover/failback testing and DR drills across environments.
- Collaborate with DataOps and infrastructure teams to ensure replication integrity, system resilience, and business continuity readiness.
- Perform capacity planning, performance optimization, and infrastructure health assessments.
- Maintain operational documentation, including runbooks, monitoring guidelines, and incident playbooks.
Requirements
- Bachelor’s degree in Computer Science, Engineering, Information Technology, or equivalent practical experience.
- Proven experience in SysOps, Cloud Operations, SRE, or Infrastructure Support roles in production environments.
- Strong hands-on experience with Linux and Windows system administration.
- Experience using monitoring and observability tools such as New Relic, Prometheus, Grafana, Datadog, or equivalent solutions.
- Solid understanding of incident management, problem management, and root cause analysis methodologies.
- Experience working with cloud platforms such as AWS, Azure, or Google Cloud Platform.
- Strong knowledge of disaster recovery, backup strategies, and business continuity planning.
- Familiarity with infrastructure components such as virtual machines, compute instances, and physical servers.
- Understanding of web and system services such as Nginx, IIS, and systemd.
- Strong analytical and troubleshooting skills with the ability to resolve complex production issues under pressure.
- Excellent communication and collaboration skills for cross-functional coordination.
- Experience in high-availability, mission-critical environments is highly preferred.
Benefits
- Competitive compensation package aligned with experience and market standards.
- Fully remote work environment with flexible arrangements.
- Opportunity to work on large-scale, mission-critical infrastructure systems.
- Exposure to modern cloud technologies and advanced observability platforms.
- Professional growth in a fast-paced, high-impact engineering organization.
- Collaborative and cross-functional team culture.
- Involvement in disaster recovery planning, system resilience design, and cloud operations at scale.
Why Apply Through Jobgether?
We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.
We appreciate your interest and wish you the best!
Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.
We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.