Code-literate data labeling: classify 200 software tool definitions (read/write + data-sensitivity)
Budget: $150.00 (fixed price)
United States
python, data-annotation, typescript, code-review, technical-analysis
# Code-literate data labeling: classify 200 software tool definitions (read/write + data-sensitivity)
I need an independent second coder for a small, well-specified labeling task that supports an academic measurement study. The scope is finite and the decision rules are written down, but the work takes care and genuine judgment: you read each tool's code, and where the short excerpt is not enough, you look at the surrounding source in the repository to work out what the tool actually does before you label it.
## The task
You will classify 200 software "tool" definitions taken from open-source developer projects, in two passes of 100 each:
1. **Read vs. write** -- does the tool change persistent state, or just read it? A single "persistent-effect" rule is provided.
2. **Sensitivity** -- would the tool's data or action be subject to access control, audit, or regulation in a well-run system? A single rule is provided.
For each item you get the tool's name, a short description, and a short source-code excerpt (Python, TypeScript/JavaScript, or Go). You read the code and record one label per item in the provided CSV answer sheet. Nothing needs to be installed or run.
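To give a feel for the read-vs-write distinction, here is a minimal sketch in Python. The two functions, `get_note` and `set_note`, and the JSON file store are hypothetical and are not taken from the 200 items; the exact persistent-effect rule in the brief is what governs the real labels.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def get_note(store: Path, note_id: str) -> Optional[str]:
    """Hypothetical read-style tool: returns a note and leaves the file untouched."""
    notes = json.loads(store.read_text())
    return notes.get(note_id)


def set_note(store: Path, note_id: str, text: str) -> None:
    """Hypothetical write-style tool: the change is saved to disk and outlives the call."""
    notes = json.loads(store.read_text())
    notes[note_id] = text
    store.write_text(json.dumps(notes))


def demo() -> Tuple[bool, bool]:
    """Run both tools against a temporary store and report what changed on disk."""
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp) / "notes.json"
        store.write_text("{}")
        before = store.read_text()

        get_note(store, "a")
        unchanged_after_read = store.read_text() == before

        set_note(store, "a", "hello")
        changed_after_write = store.read_text() != before

    return unchanged_after_read, changed_after_write
```

Under this illustrative reading, `get_note` would likely be labeled a read and `set_note` a write, because only the latter alters state that persists after the function returns. Real items can be less clear-cut (for example, a "get" function that also writes a cache or a log), which is where the brief's rule and worked examples come in.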
## What I provide
- Two reference documents with all 200 items and their source excerpts.
- Two answer sheets (CSV) pre-filled with item IDs.
- A one-page brief with the exact decision rules and worked examples.
## What you deliver
- The two completed answer sheets.
- Brief notes on any items you found genuinely ambiguous.
## You are a good fit if you
- Can read basic Python, TypeScript/JavaScript, and Go well enough to tell what a function does (you do not need to be an expert in all three).
- Understand the difference between an operation that reads data and one that changes state.
- Have a sense for what counts as sensitive data (credentials, personal data, financial, medical, legal) versus routine data.
- Work carefully and consistently, following a written rule rather than improvising.
Backgrounds that fit well: software developers, QA engineers, technical analysts, data annotators with a coding background, CS/infosec students.
## Scope and effort
200 items, roughly 1-2 minutes each -- about 3.5-7 hours of focused work. Fixed price.
## Important
Reading the code -- including pulling up the wider source in a repository when an excerpt is not enough -- is exactly the work, and is encouraged. What I need is your *independent* judgment: please do not try to find a pre-existing classification or answer key for these tools, do not use an automated tool or AI to label them for you, and do not collaborate with anyone else on the labels. (The brief explains why: your answers are compared to an existing set, and that comparison is the measurement, so your independent reasoning is the whole point.)