Director, AI Research Computing
Empire AI is establishing New York as the national leader in responsible artificial intelligence. The initiative is backed by a consortium of top academic and research institutions, including Columbia, Cornell, NYU, CUNY, RPI, SUNY, Rochester schools, Mount Sinai, the Simons Foundation, and the Flatiron Institute.
By leveraging the state’s rich academic resources and research institutions, Empire AI is driving innovation in fields like medicine, education, energy, and climate change, while giving New York’s researchers access to computing resources that are often prohibitively expensive and otherwise available only to big tech companies. In doing so, it fuels statewide innovation, drives economic growth, and prepares a future-ready AI workforce to tackle society’s most complex challenges.
The initiative is funded by more than $500 million in public and private investment, including a state capital grant and contributions from consortium academic institutions, the Simons Foundation, the Flatiron Institute, and Tom Secunda (co-founder of Bloomberg).
Position Summary
The Director, AI Research Computing oversees the deployment and operations of Empire AI’s high-performance and AI computing infrastructure. This includes coordinating data center operations, ensuring system uptime, leading systems teams, and providing strategic direction for compute platforms.
Reporting to the Consortium Executive Director, the Director is responsible for overseeing the operations and evolution of Empire AI’s multi-institutional compute platforms including GPU clusters, federated storage systems, and hybrid cloud architectures. This role ensures the secure, scalable, and sustainable operation of critical research infrastructure, enabling groundbreaking work in machine learning, large-scale analytics, and responsible AI development. The Director also leads a distributed team of engineers and contributes to infrastructure planning, compliance strategy, and the design of technical standards that serve as the foundation of Empire AI’s mission.
Duties and Responsibilities
- Lead the design and coordination of cross-institutional HPC and AI infrastructure strategy.
- Align technical operations with the Empire AI strategic roadmap, funding goals, and emerging research priorities.
- Represent infrastructure planning in consortium governance forums and SUNY/CUNY technical leadership groups.
- Oversee technical architecture and lifecycle management of core GPU clusters, HPC systems, and federated storage environments across consortium institutions.
- Ensure reliable, scalable, and optimized system operations, including hybrid cloud/on-prem models.
- Coordinate workload distribution, performance tuning, and site-specific system enhancements.
- Design and maintain secure, compliant computing environments in accordance with HIPAA, NIST 800-171, NYS cybersecurity policies, and federal research mandates.
- Liaise with institutional research security officers to support regulated research and ethical AI workflows.
- Implement standardized monitoring, auditing, and access control protocols.
- Provide leadership in platform readiness for large-scale AI workloads, including support for LLM training, model inference, and advanced analytics pipelines.
- Evaluate and implement tools and workflows tailored to machine learning, reinforcement learning, and data assimilation models.
- Supervise a distributed team of systems engineers and research computing professionals across consortium partners.
- Facilitate cross-site collaboration, professional development, and technical standardization.
- Support capital planning for facilities, power, cooling, and network expansion in collaboration with host institutions.
- Guide technical input for RFPs, vendor engagement, grant proposals, and system procurement.
- Develop and track key performance indicators (KPIs) for system availability, usage, energy efficiency, and cost-performance metrics.
- Deliver strategic infrastructure reports to consortium leadership, funders, and advisory bodies.
- Contribute to emerging technical initiatives or federal/state partnerships aligned with Empire AI’s mission.
Minimum Qualifications
- Bachelor’s degree in Computer Science, Engineering, or a related technical discipline
- 10+ years of experience in research computing, systems engineering, or enterprise HPC/AI architecture
- Demonstrated success operating large-scale computing environments, including GPU clusters and distributed storage
- Experience managing or supporting compliant computing environments (HIPAA, NIST, FISMA)
- Proven ability to lead teams, manage cross-institutional projects, and build scalable research platforms
Preferred Qualifications
- Master’s or Ph.D. in a STEM or technical leadership field
- Familiarity with federated data systems and multi-institutional infrastructure governance
- Experience with cloud integration (e.g., OpenStack, Kubernetes, hybrid burst models)
- Strong background in AI/ML platforms (e.g., Slurm, PyTorch, NVIDIA H100 architecture, Singularity/Apptainer)
- Experience contributing to major infrastructure grants (e.g., NSF CC*, NIH GDS, DOE AI initiatives)
Compensation
Our compensation reflects the cost of labor across several US geographic markets. Base pay and target total cash for this position range from $100,000 to $250,000. Pay depends on a number of factors, including market location, and may vary with job-related knowledge, skills, and experience.
Travel Requirements
This role requires 20% regional travel and availability to work from our corporate office when not traveling. Candidates should either live near or be willing to relocate within a reasonable commuting distance of the office. Relocation assistance may be provided.