Site Reliability Engineer Opportunity

Aryan Solutions Pte Ltd

Site Reliability Engineer in Singapore

Visa sponsorship & relocation

Role: Site Reliability Engineer (SRE)

Location: Singapore (onsite role)

Long-term contract role


Please read these important points before applying for this onsite, Singapore-based role:


  • This is a long-term, onsite contract role based in Singapore.
  • If you are based in Singapore, you do not need to relocate. If you are applying from another country and will relocate to Singapore, you must arrange your own accommodation and bring enough money to cover your first month's expenses. Our agency, Aryan Solutions, will pay the relocation allowance together with your first month's salary, subject to the client's approval.
  • If you are applying from another country, you must book your own travel tickets to Singapore. Aryan Solutions will reimburse the ticket cost later, subject to the client's approval.
  • Aryan Solutions will apply for your Singapore work visa, and you will be on the Aryan Solutions payroll.


Job Purpose:

  • The Site Reliability Engineer (SRE) combines software development and systems engineering to build and run distributed solutions in a secure, multi-tier, heterogeneous environment. The role safeguards, delivers, and continuously improves the software and systems behind the organization’s cloud platform solutions.


The Job:

  • Keep a vigilant eye on the availability, latency, performance, and capacity of these systems. Ultimately, you will treat software as the primary tool for optimizing systems, building infrastructure, and eliminating repetitive manual work through automation.
  • As part of the Cloud Engineering Team, the SRE engages in and improves the full lifecycle of cloud platform solutions, from design and deployment to operation and refinement, with accuracy and in compliance with organizational policies and security requirements.
  • The SRE treats operations as a software problem and writes code to automate repetitive tasks and optimize cloud operations.
  • Support services before they go live through activities such as system design consultation, software platform development, and launch reviews. Maintain post-live cloud operations by measuring and monitoring availability, latency, and overall system health, and take prompt remediation action where needed.
  • Scale sustainably through mechanisms such as automation, and evolve services and solutions on IaaS, CaaS, and PaaS by pushing for changes that improve reliability and velocity.
  • Deploy product updates as required and implement integrations as they arise. Specify, document, and develop new product features, and write automation scripts.
  • Work with open-source technologies, CI/CD and SCM tools as necessary, and source control such as Bitbucket, and implement containers for the organization (e.g. Docker and Kubernetes). Stay current with industry trends and propose new ways to improve the business.
  • Take accountability for business and regulatory compliance risks, and take appropriate steps to mitigate them.
  • Maintain awareness of industry trends in regulatory compliance, emerging threats, and technologies in order to understand risks and better safeguard the company.
  • Highlight any potential concerns or risks, and proactively share risk management best practices.

                                                                                  

Our Requirements (Must Have):

  • VMware Cloud Foundation (VCF)
  • NSX-T
  • vRealize Suite
  • vSphere/vCenter


Job Scope and Responsibilities

  • Serve as a primary point of responsibility for the overall health, performance, and capacity of the Great Eastern VMware Cloud Foundation platform
  • Work effectively in a fast-paced, rapidly changing environment where issues need to be sorted out as they arise
  • Experience with VMware virtualization is a MUST (vSphere, NSX-T, vSAN, VCF, vROps, vRNI, vRLI)
  • Experience using vROps, vRNI, and vRLI to troubleshoot and analyze incidents
  • Understanding of NSX-T configuration and experience using NSX-T for incident troubleshooting
  • Knowledge of, and ability to use, the NSX-T Load Balancer
  • Knowledge of certificate renewal in NSX-T
  • Able to configure and use hardware alerts with the available tools (VMware/HP/Palo Alto)
  • Able to understand and use VM functions, datastores, and backup applications
  • Knowledge of VM storage functions and the ability to manage the allocation and distribution of presented storage (e.g. vSAN) is required
  • Prior experience with at least one cloud platform: vCA, AWS, or Azure
  • Run the production environment by monitoring availability and taking a holistic view of system health
  • Measure and optimize system performance to push our capabilities forward, get ahead of internal customer needs, and innovate for continual improvement
  • Experience with general performance tuning and optimization across all aspects of platforms and services (systems, network)
  • Gather and analyze metrics from operating systems and applications (via vROps, vRLI, vRNI) to assist in performance tuning and fault finding
  • Enforce best practices for metrics gathering, monitoring, and alerting
  • Participate in platform management, capacity planning, and incident recovery
  • Provide network administration and troubleshooting via vROps and NSX-T
  • Perform deep dives into both systemic and latent reliability issues
  • Create sustainable systems and services through automation and uplift
  • Networking knowledge is a plus

