Remote Machine Learning Evaluation Engineer

Emma of torre.ai, Argentina
Remote
AI Summary

Design and implement ML/LLM evaluation tasks, rubrics, and metrics to shape AI model behavior. Grade model/agent outputs and improve evaluation quality through expert training-side judgment. Requires 5+ years of MLE experience with hands-on training/fine-tuning and proficiency in PyTorch, Hugging Face, and RLHF.

Key Highlights
Freelance role with an hourly rate of $45-$140, based on local market
Design ML/LLM evaluation tasks, rubrics, and metrics
Grading model/agent outputs and improving evaluation quality
Key Responsibilities
Design ML/LLM evaluation tasks, rubrics, and metrics
Grade model/agent outputs and improve evaluation quality through review
Bring training-side judgment (SFT / RLHF / reward modeling) to evaluation design
Technical Skills Required
PyTorch, JAX, Hugging Face, reinforcement learning from human feedback (RLHF), SFT, reward modeling, experiment tracking
Benefits & Perks
Remote work
Flexible 30+ hours/week
Weekly payments via Stripe or Wise

Job Description


I’m helping MC find a top candidate to join their team as a freelancer for the role of Remote Machine Learning Evaluation Engineer.


You'll shape AI model behavior and evaluation quality through expert training-side judgment.


Compensation

USD 45-140/hour


Location:

Remote (for residents of the United States, United Kingdom, Canada, Argentina, Australia, Belgium, Brazil, Chile, Colombia, Czech Republic, Denmark, Finland, France, Germany, Ireland, Italy, Mexico, Netherlands, Norway, Peru, Poland, Portugal, Romania, Spain, Sweden, Switzerland, and Uruguay)


What makes you a strong candidate:

  • You have 5+ years of experience in machine learning
  • You are proficient in Hugging Face, PyTorch, and reinforcement learning from human feedback (RLHF)
  • English: conversational


Responsibilities and deliverables

Code-Data Eval Author

Machine Learning Engineer (Pilot).

Hourly contract.

Remote.

$45-$140 per hour.


You will work with AI labs to build the evaluations their models are trained and measured against, designing ML/LLM evaluation tasks and rubrics and grading model/agent outputs. Your training-side knowledge directly shapes reward and evaluation signals.


Responsibilities:

  • Design ML/LLM evaluation tasks, rubrics, and metrics.
  • Grade model/agent outputs and improve evaluation quality through review.
  • Bring training-side judgment (SFT / RLHF / reward modeling) to evaluation design.


Qualifications:

  • 5+ years as an MLE at a real product organization, with hands-on training/fine-tuning and evaluation work.
  • Ideally fluent in SFT / RLHF / reward modeling / evaluation metrics (rare and high-leverage here).
  • Proficient in PyTorch/JAX, Hugging Face, experiment tracking.
  • Clear written communication.


Engagement & Pay:

  • Remote contract, flexible 30+ hours/week.
  • Hourly rate set to your local market (e.g., US/Canada $100-$140/hr; Europe and LatAm scaled to region).


Hiring Process:

  • A short Mercor Technical Screen.
  • A live Code Review Session.
  • A Domain Expert Interview.
  • You are paid $200 for completing all three, regardless of outcome.
  • We consider all qualified applicants without regard to legally protected characteristics and provide reasonable accommodations upon request.


Contract and Payment Terms:

  • You will be engaged as an independent contractor.
  • This is a fully remote role that can be completed on your own schedule.
  • Projects can be extended, shortened, or concluded early depending on needs and performance.
  • Your work will not involve access to confidential or proprietary information from any employer, client, or institution.
  • Payments are weekly on Stripe or Wise based on services rendered.
  • Please note: We are unable to support H-1B or STEM OPT candidates at this time.

