Site Reliability Engineer (Contract)

hire feed • Indonesia
Remote
AI Summary

We are hiring a Site Reliability Engineer to work on a contractor basis, applying expertise to help train next-generation AI systems. The role involves leading deployment, monitoring, and recovery of complex containerized AI training environments, proactively identifying infrastructure bottlenecks, and orchestrating resilient system builds. Key requirements include terminal-native problem-solving skills, Linux and containerized environment mastery, Python proficiency, and strong collaboration abilities.

Key Highlights
Contractor position with $40-$70/hour payout
Terminal-native problem-solving with Linux/container expertise
No prior AI experience required; domain knowledge is key
Lead deployment, monitoring, and recovery of AI training environments
Technical Skills Required
Linux, containerized environments, Python, terminal-native problem-solving
Benefits & Perks
Remote work

Job Description


  • Role: Site Reliability Engineer (Remote)
  • Location: Remote (Work from Anywhere)
  • Payout: $40-$70/hour


Role Overview:

On behalf of one of our clients, we are seeking a Site Reliability Engineer to work on a contract basis. You will apply your expertise to help train next-generation AI systems, shaping how models learn, reason, and perform through high-quality, real-world input. No prior AI experience is required; domain knowledge is the key to success in this role. The client is a leader in the AI industry whose platform connects domain experts with the development of frontier AI models.


Key Responsibilities:

• Lead the deployment, monitoring, and recovery of complex, containerized AI training environments using advanced terminal techniques, ensuring stability and optimal resource utilization.

• Proactively identify, diagnose, and resolve infrastructure bottlenecks and failures in long-running processes, minimizing downtime and ensuring business continuity.

• Orchestrate resilient system builds and infrastructure management, collaborating closely with engineering teams to refine CI/CD pipelines and automate routine operational tasks.

• Collaborate with cross-functional teams to identify and prioritize improvements to system architecture, infrastructure, and processes.

• Manage and optimize filesystem structure to ensure efficient data storage and retrieval, reducing latency and improving overall system performance.


Required Skills & Qualifications:

• Terminal-native problem-solving skills, with a strong understanding of Linux and containerized environments.

• Mastery of dynamic infrastructure recovery and containerized environments, with experience deploying and managing complex systems.

• Proficiency in Python, with a strong understanding of software development and testing principles.

• Strong collaboration and communication skills, with experience working with cross-functional teams to drive business outcomes.

• Ability to adapt to changing priorities and requirements, with a strong focus on delivering high-quality results under tight deadlines.


More About the Opportunity:

This role offers the opportunity to work with a global leader in the AI industry whose platform connects domain experts with the development of frontier AI models. With its focus on continuous improvement, the role will challenge you to think critically and creatively about AI systems.


Equal Opportunity Employer:

We hire based on skills and expertise. All qualified candidates are welcome regardless of background, experience, or prior employment history. Applications are reviewed solely on demonstrated technical ability and qualifications.



