Job Description
We are seeking multilingual raters to help create evaluation sets for an LLM by using the product in a given scenario. Raters will simulate real-world research tasks, collect source documents, generate prompts, and evaluate responses to ensure high-quality model outputs.
Responsibilities:
- Review the provided topic/domain and refine it into a specific sub-topic (e.g., from "Law" to "Arbitration").
- Conduct online research to collect relevant source documents, ensuring compliance with the given source type and word count guidelines.
- Identify and collect sources from Google search in the following formats:
  - PDFs
  - Google Docs
  - Markdown files
  - Pasted text from publicly available web pages
- Formulate prompts according to the specified category and input them into the LLM.
- Analyze the responses and validate whether the prompts can be answered by the LLM.
- Generate additional prompts within the same category to deepen the evaluation process.
- Provide feedback on the usability, accuracy, and comprehensiveness of LLM outputs.