Design and implement ML/LLM evaluation tasks, rubrics, and metrics to shape AI model behavior. Grade model/agent outputs and improve evaluation quality through expert training-side judgment. Requires 5+ years of MLE experience with hands-on training/fine-tuning and proficiency in PyTorch, Hugging Face, and RLHF.
Job Description
I’m helping MC find a top candidate to join their team as a freelancer for the role of Remote Machine Learning Evaluation Engineer.
You'll shape AI model behavior and evaluation quality through expert training-side judgment.
Compensation
USD 45-140/hour
Location:
Remote (for residents of the United States, United Kingdom, Canada, Argentina, Australia, Belgium, Brazil, Chile, Colombia, Czech Republic, Denmark, Finland, France, Germany, Ireland, Italy, Mexico, Netherlands, Norway, Peru, Poland, Portugal, Romania, Spain, Sweden, Switzerland, and Uruguay)
What makes you a strong candidate:
- You have 5+ years of experience in machine learning.
- You are proficient in Hugging Face, PyTorch, and reinforcement learning from human feedback (RLHF).
- You speak conversational English.
Responsibilities and deliverables
Code-Data Eval Author
Machine Learning Engineer (Pilot).
Hourly contract.
Remote.
$45-$140 per hour.
You will work with AI labs to build the evaluations their models are trained and measured against. You will design ML/LLM evaluation tasks and rubrics and grade model/agent outputs. Your training-side knowledge directly shapes reward and evaluation signals.
Responsibilities:
- Design ML/LLM evaluation tasks, rubrics, and metrics.
- Grade model/agent outputs and improve evaluation quality through review.
- Bring training-side judgment (SFT / RLHF / reward modeling) to evaluation design.
Qualifications:
- Roughly 5 or more years as an MLE at a real product organization, with hands-on training/fine-tuning and evaluation experience.
- Ideally fluent in SFT / RLHF / reward modeling / evaluation metrics (rare, high-leverage here).
- Proficient in PyTorch/JAX, Hugging Face, experiment tracking.
- Clear written communication.
Engagement & Pay:
- Remote contract, flexible 30+ hours/week.
- Hourly rate set to your local market (e.g., US/Canada $100-$140/hr; Europe and LatAm scaled to region).
Hiring Process:
- A short Mercor Technical Screen.
- A live Code Review Session.
- A Domain Expert Interview.
- You are paid $200 for completing all three, regardless of outcome.
- We consider all qualified applicants without regard to legally protected characteristics and provide reasonable accommodations upon request.
Contract and Payment Terms:
- You will be engaged as an independent contractor.
- This is a fully remote role that can be completed on your own schedule.
- Projects can be extended, shortened, or concluded early depending on needs and performance.
- Your work will not involve access to confidential or proprietary information from any employer, client, or institution.
- Payments are made weekly via Stripe or Wise, based on services rendered.
- Please note: We are unable to support H-1B or STEM OPT candidates at this time.