Remote Machine Learning Evaluation Engineer

emma of torre.ai • Argentina

Remote

AI Summary

Design and implement ML/LLM evaluation tasks, rubrics, and metrics to shape AI model behavior. Grade model/agent outputs and improve evaluation quality through expert training-side judgment. Requires 5+ years of MLE experience with hands-on training/fine-tuning and proficiency in PyTorch, Hugging Face, and RLHF.

Key Highlights

Freelance role with hourly rate $45-$140 based on local market

Design ML/LLM evaluation tasks, rubrics, and metrics

Grade model/agent outputs and improve evaluation quality

Key Responsibilities

Design ML/LLM evaluation tasks, rubrics, and metrics

Grade model/agent outputs and improve evaluation quality through review

Bring training-side judgment (SFT / RLHF / reward modeling) to evaluation design

Technical Skills Required

PyTorch, JAX, Hugging Face, reinforcement learning from human feedback (RLHF), SFT, reward modeling, experiment tracking

Benefits & Perks

Remote work

Flexible 30+ hours/week

Weekly payments via Stripe or Wise

Job Description

I’m helping MC find a top candidate to join their team as a freelancer for the role of Remote Machine Learning Evaluation Engineer.

You'll shape AI model behavior and evaluation quality through expert training-side judgment.

Compensation

USD 45-140/hour

Location:

Remote (for residents of the United States, United Kingdom, Canada, Argentina, Australia, Belgium, Brazil, Chile, Colombia, Czech Republic, Denmark, Finland, France, Germany, Ireland, Italy, Mexico, Netherlands, Norway, Peru, Poland, Portugal, Romania, Spain, Sweden, Switzerland, and Uruguay)

What makes you a strong candidate:

You have 5+ years of experience in machine learning
You are proficient in Hugging Face, PyTorch, and reinforcement learning from human feedback (RLHF)
English - Conversational

Responsibilities and deliverables

Code-Data Eval Author

Machine Learning Engineer (Pilot).

Hourly contract.

Remote.

$45-$140 per hour.

In this role, you will help AI labs build the evaluations their models are trained and measured against. You will design ML/LLM evaluation tasks and rubrics and grade model/agent outputs. Your training-side knowledge directly shapes reward and evaluation signals.

Responsibilities:

Design ML/LLM evaluation tasks, rubrics, and metrics.
Grade model/agent outputs and improve evaluation quality through review.
Bring training-side judgment (SFT / RLHF / reward modeling) to evaluation design.

Qualifications:

5+ years as an MLE at a real product organization, with hands-on experience in training/fine-tuning and evaluations.
Ideally fluent in SFT / RLHF / reward modeling / evaluation metrics (rare, high-leverage here).
Proficient in PyTorch/JAX, Hugging Face, experiment tracking.
Clear written communication.

Engagement & Pay:

Remote contract, flexible 30+ hours/week.
Hourly rate set to your local market (e.g., US/Canada $100-$140/hr; Europe and LatAm scaled to region).

Hiring Process:

A short Mercor Technical Screen.
A live Code Review Session.
A Domain Expert Interview.
You are paid $200 for completing all three, regardless of outcome.
We consider all qualified applicants without regard to legally protected characteristics and provide reasonable accommodations upon request.

Contract and Payment Terms:

You will be engaged as an independent contractor.
This is a fully remote role that can be completed on your own schedule.
Projects can be extended, shortened, or concluded early depending on needs and performance.
Your work will not involve access to confidential or proprietary information from any employer, client, or institution.
Payments are made weekly via Stripe or Wise, based on services rendered.
Please note: We are unable to support H-1B or STEM OPT candidates at this time.

Job Overview

Posted Date: Jun 14, 2026

Employment Type: Contract

Experience Level: Mid-Senior level

Location: Argentina

Category: Machine Learning

Company: emma of torre.ai
