Infrastructure Engineer (On-Premise) Opportunity

jan company

Infrastructure Engineer (On-Premise) in TAIWAN

Remote 1 year ago

Homebrew is an AI R&D Lab. We train our own models and are the creators and maintainers of popular open-source AI tools:

  • Jan: Desktop Copilot (>1 million downloads)
  • Cortex: Local, open-source alternative to OpenAI Platform
  • Menlo: GPU Training Cluster

We are a fully remote company. In the long term, our objective is to train useful, safe AI that helps improve humanity.


Job Description

Homebrew is looking for an Infrastructure Engineer to help run our GPU Training Cluster (our internal GPU Cloud). Please note that this is an On-Premise role, as we build our own infrastructure.


Responsibilities

  • Design and maintain the organization's infrastructure, including compute and storage nodes, high-bandwidth networking infrastructure, and security and monitoring infrastructure
  • Design and maintain software for infrastructure management and orchestration (e.g. OpenStack, Kubeflow, Proxmox)
  • Participate in incident response and resolution to ensure high availability and performance
  • Develop and maintain solutions for day-to-day operational administration, system/data backup, disaster recovery, and security/performance monitoring
  • Collaborate with the Engineering team to implement DevSecOps practices (e.g. Infrastructure as Code, CI/CD)


Requirements

  • Familiarity with on-premise infrastructure (e.g. racks with power, storage, compute, and network nodes)
  • Ability to do basic to intermediate hardware troubleshooting, servicing, and repairs
  • [Plus] Experience with Slurm, Kubeflow or alternative cluster orchestration tools
  • [Plus] Experience with OpenStack, VMware, Proxmox or alternative cloud orchestrator tools
  • [Plus] Experience with designing GPU Clusters or HPC systems (inter-cluster networking)
  • [Plus] Familiarity with software-defined storage technologies (Ceph, ZFS, NFS, etc.)



Benefits

  • We pay an “all-in” salary, from which you cover your own insurance and medical costs.
  • 14 days leave (and unlimited sick days)
  • Annual equipment budget (once the 2-month probation has been completed)

