Machine Learning Operations (MLOps) Engineer (AWS/Azure) Opportunity

The Institute of Clever Stuff (ICS)

MLOps Engineer (AWS/Azure)

Role Type: Contract, full-time (8-hour day).

Location: India (Fully Remote).

Hours: UK business hours (UK time zone).

Start Date: ASAP.

End Date: End of December 2025 (with potential extension). 

Day Rate: To be discussed (GBP £ only).


Process:

1. First interview with ICS (30 minutes).

2. CV shared with the client.

3. Meet the client and complete a technical interview/assessment.


Context of the client

  • The client is a global energy company undergoing a significant transformation to support the energy transition.
  • We work within their Customers & Products (C&P) division, serving both B2C and B2B customers across key markets, including the UK, US, Germany, Spain, and Poland.
  • This business unit includes mobility (fuel and EV), convenience retail, and loyalty.


Context of ICS

At The Institute of Clever Stuff (ICS), we don’t just solve problems... we revolutionise results. Our mission is to empower a new generation of Future Makers today, to revolutionise results and create a better tomorrow. Our vision is to pioneer a better future together. We are a consulting firm with a difference, powered by AI, driving world-leading results from data and change. We partner with visionary organisations to solve their toughest challenges, drive transformation, and deliver high-impact results.


We combine a diverse network of data professionals, designers, software developers, and rebel consultants alongside our virtual AI consultant, fortu.ai, which blends human ingenuity with AI-powered intelligence to deliver smarter, faster, and more effective results.


Essential Requirements

  • 9+ years of relevant professional experience, including 5+ years in platform engineering, designing, deploying, and managing scalable, secure cloud infrastructure across both Azure and AWS.
  • Strong grounding in governance, audit, observability, and compliance for cloud-based GenAI/ML ecosystems.
  • Proven experience setting up and managing CI/CD using Azure DevOps or AWS CodePipeline.
  • Proficiency with infrastructure‑as‑code (ARM/Bicep, Terraform, CloudFormation, CDK) and containerisation (Docker, Kubernetes).
  • Advanced understanding of networking (DNS, load balancing, VPNs, VNets/VPCs) and security (IAM, RBAC, policies, SCPs).
  • Solid programming skills in Python plus scripting (Bash/PowerShell); familiarity with mainstream AI/ML libraries (TensorFlow, PyTorch, scikit‑learn).
  • Experience with cloud data stores and key management (Azure Blob, Cosmos DB, SQL, Key Vault; AWS S3, DynamoDB, RDS, KMS) and their integrations with AI services.


Core Technical Expertise (Must Have):

  • Azure & AWS ML/AI services: Azure ML, Azure AI Services, Azure AI Search; AWS SageMaker, AWS Bedrock, AWS Lambda.
  • GenAI & agentic ecosystems: Exposure to platforms and tooling such as Azure OpenAI, Azure AI Foundry/Hub, Bedrock, Anthropic Claude, OpenAI API, LlamaCloud, LangChain.
  • Security & identity: Azure Policy, Azure RBAC, AWS IAM, AWS SCPs; audit logging; least‑privilege design.
  • IaC & platform automation: ARM/Bicep, Terraform, CloudFormation, CDK.
  • DevOps/CI‑CD: Azure DevOps or AWS CodePipeline; integration and delivery for data science and ML workflows.
  • Data & storage: Azure Blob/Cosmos/SQL/Key Vault; AWS S3/DynamoDB/RDS; understanding of OLTP and OLAP patterns.
  • Containers & orchestration: Docker and Kubernetes (including AKS/EKS patterns and ECR/ACR usage).
  • Monitoring & observability: Grafana, Prometheus, Azure Monitor, Application Insights, Log Analytics Workspaces.
  • Networking: DNS management, load balancing, VPNs, virtual networks (VNets/VPCs).
  • Testing: Unit and integration testing as part of CI/CD (ideally on Azure DevOps).
  • ML tooling: Azure ML Studio, Python SDK (v2), CLI (v2) for monitoring, retraining, and redeployment.
  • AI safety & evaluation: Understanding of token usage; prompt injection/jailbreak risks and mitigations; Azure AI Evaluation SDK; AI red‑teaming and prompt security scans.


Working Methods:

  • Agile, sprint‑based delivery with Azure DevOps (boards, repos, pipelines).
  • Strong DevOps and CI/CD pipeline management across environments.
  • Close collaboration with Data Scientists, Data Analysts, Software Engineers, and platform teams.
  • Clear documentation and communication suited to distributed teams.
  • Stakeholder engagement to troubleshoot ML pipeline issues and support modelling infrastructure needs.


Beneficial Experience:

  • Developer productivity: GitHub Copilot, Cursor, Claude Code.
  • Microsoft/Azure services: Azure Bot Framework, API Management, Application Gateway, M365 Copilot.
  • AWS SDKs & tooling: Boto3, AWS CDK.
  • Notebooks & experimentation: Jupyter Notebook.
  • ML frameworks: PyTorch, TensorFlow, scikit‑learn; practical E2E ML workflow design.


Responsibilities


Platform & Infrastructure

  • Design, deploy, and manage scalable and secure cloud infrastructure across Azure and AWS using IaC (ARM/Bicep/Terraform/CloudFormation/CDK).
  • Implement core networking (DNS, load balancing, VPNs, VNets/VPCs) and platform services for reliability and performance.
  • Build and operate container platforms (Docker, Kubernetes; ACR/AKS and ECR/EKS patterns).
  • Set up comprehensive monitoring and logging (Grafana, Prometheus, Azure Monitor, Application Insights, Log Analytics).


Security & Compliance

  • Apply the principle of least privilege across cloud platforms (Azure RBAC, AWS IAM) and enforce policy (Azure Policy, AWS SCPs).
  • Enable audit logging and controls appropriate for GenAI/ML workloads.
  • Manage secrets and keys with Azure Key Vault and AWS KMS.


CI/CD & Testing

  • Implement CI/CD for data science/ML pipelines with Azure DevOps or AWS CodePipeline.
  • Embed robust unit and integration testing in the pipeline; champion code quality and operational readiness.


Infrastructure as Code (IaC)

  • Define and evolve cloud resources as code; review and maintain standards, patterns, and reusable modules.
  • Use Python or TypeScript where appropriate to codify infrastructure definitions.


Cloud Services (AWS & Azure)

  • AWS: RDS, DynamoDB, Redshift, Aurora; EC2 (scaling), EBS/EFS; serverless (Lambda, SQS, SNS, EventBridge, Step Functions); containers (ECR); Bedrock; SageMaker; CloudFormation/CDK; KMS.
  • Azure: Cosmos DB, Azure SQL (including Serverless); compute (VMs, Scale Sets); serverless and messaging (Functions, Event Grid, Event Hubs, Queue Storage, Service Bus); container services (ACR/AKS); Azure Resource Manager (ARM)/Bicep; Azure Key Vault; Azure Machine Learning; Azure Data Lake Storage.


MLOps & Model Lifecycle

  • Enable production models across the ML lifecycle (deployment, monitoring for drift, retraining, technical evaluation, and business validation).
  • Implement CI/CD orchestration for data science pipelines and support model governance.
  • Collaborate with stakeholders to resolve ML pipeline issues and evolve the modelling platform.
