
AI Document Data Extraction — Test Assignment Specification

Budget: $30.00 - $40.00 hourly / part-time ⭐ 5.00 (15) United States

1. Objective

We need to build a solution that can automatically extract specific business data points from complex document packages. The documents may include contracts, leases, amendments, reports, financial documents, due diligence materials, exhibits, schedules, and other supporting files. A document package may contain hundreds of pages and several related documents. The same business field may be mentioned in multiple places, changed by later documents, or appear in sections that look relevant but should not be used. The goal is to return the most accurate, applicable, and business-correct value for each requested data point.

2. Input

The system should receive:
- one or more documents as a document package
- a list of requested data points
- a business description of what each data point means
- an expected output format

The documents may contain:
- original agreements
- amendments or addendums
- exhibits and schedules
- supporting reports
- historical or superseded provisions
- similar but non-applicable clauses
- conflicting or overlapping language

3. Output

For each requested data point, the system should return:
- the extracted value or summary
- a source reference showing where the answer came from
- an indication when no relevant information is available
- an indication when the result is uncertain or requires human review
- a short explanation of why the value was selected

The output should be concise, structured, and suitable for review by an operations or business user.

4. Example Data Point: Landlord Audit Rights

Business Meaning

Determine whether the Landlord has the right to audit, inspect, review, or verify Tenant’s books and records when those records relate to Gross Sales, Percentage Rent, sales reporting, or similar revenue-based rent obligations.

This is an example of a complex data point. It is not enough to simply search for the words “audit”, “records”, “books”, or “sales”.
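Keyword search can still serve as a recall-oriented first stage, provided its hits are treated as candidates rather than answers. The minimal Python sketch below assumes a hypothetical Clause record (document, page, section, text) produced by some upstream parsing step, and uses illustrative term lists; the page 84 text comes from the example output in section 6, while the other two clauses are invented for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Clause:
    """One retrievable unit of a document package (hypothetical model)."""
    document: str
    page: int
    section: str
    text: str


# Illustrative, recall-oriented vocabularies for "Landlord Audit Rights".
AUDIT_TERMS = re.compile(
    r"\b(audit|inspect|examine|review|verify|reconcile|books|records)\b", re.IGNORECASE
)
REVENUE_TERMS = re.compile(
    r"\b(gross sales|percentage rent|sales statements?|sales report(?:s|ing)?)\b",
    re.IGNORECASE,
)


def candidate_clauses(clauses):
    """Keep clauses that mention both audit-like and revenue-like terms.

    This only narrows the search space: it cannot distinguish a clause that
    grants an audit right from one about audit costs, location, or notice.
    """
    return [
        c for c in clauses
        if AUDIT_TERMS.search(c.text) and REVENUE_TERMS.search(c.text)
    ]


clauses = [
    Clause("Lease Agreement", 84, "Percentage Rent / Sales Records",
           "Landlord may audit Tenant's books and records relating to Gross Sales "
           "within two years after receipt of the applicable sales statement."),
    Clause("Lease Agreement", 85, "Percentage Rent / Sales Records",
           "If an audit discloses an understatement of Gross Sales, Tenant shall "
           "reimburse Landlord for the cost of the audit."),
    Clause("Lease Agreement", 12, "Use",
           "Tenant shall keep accurate records of its retail operations."),
]

for c in candidate_clauses(clauses):
    print(c.page, c.section)
```

Both the page 84 grant and the page 85 clause come back. The page 85 clause only concerns who pays for an audit, which the business rules in section 5 exclude, yet the keyword pass cannot tell it apart from the actual grant; a second, meaning-aware stage is still required.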
The system must determine whether the language actually describes an applicable Landlord audit right.

5. Business Rules for This Example

The system should include provisions where Landlord has a right to:
- audit, inspect, examine, review, verify, or reconcile Tenant’s books or records
- review records related to Gross Sales, Percentage Rent, sales statements, or sales reporting
- exercise the audit right within a specific time period or lookback window
- verify reported sales or revenue-based rent obligations

The system should exclude provisions that only describe:
- procedural audit details
- notice requirements
- audit location
- who performs the audit
- business hours
- audit costs or reimbursement
- penalties or remedies
- consequences of audit findings
- default-related rights
- termination-related rights
- expired or historical rights
- finality, waiver, or dispute periods that do not grant an actual audit right

If a later related document changes the original provision, the system should use the currently applicable language.

If the relevant information is not available, the system should return "No information available".

6. Example Output

{
  "datapoint": "Landlord Audit Rights",
  "value": "Landlord may audit Tenant's books and records relating to Gross Sales within two years after receipt of the applicable sales statement.",
  "source": {
    "document": "Lease Agreement",
    "page": 84,
    "section": "Percentage Rent / Sales Records"
  },
  "status": "Extracted",
  "requiresReview": false,
  "explanation": "The selected clause grants Landlord the right to audit Tenant's books and records relating to Gross Sales and includes a two-year audit period."
}

7. Quality and Iteration Expectations

The extraction results should be evaluated against expected business-approved answers. The solution should be designed with the understanding that complex documents may contain edge cases, ambiguous wording, conflicting provisions, superseded language, and similar but non-applicable text.
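One way to keep the section 5 rules editable is to store them as data, apply them in a separate classification step, resolve supersession by effective date, and emit the section 6 shape. The Python sketch below is a rule-based stand-in under stated assumptions: the Clause record with an effective date, the regular expressions, the "NotFound" status value, and the sample clauses (other than the page 84 text from section 6) are hypothetical. In a real system the classify step would more likely be an LLM or trained classifier prompted with the same rules and examples.

```python
import re
from dataclasses import dataclass
from datetime import date

NO_INFO = "No information available"


@dataclass(frozen=True)
class Clause:
    document: str
    effective: date  # hypothetical: effective date of the containing document
    page: int
    section: str
    text: str


# Hypothetical rule set mirroring section 5, kept as data so reviewers can
# refine it without changing pipeline code.
RULES = {
    # Checked first: language that voids an otherwise granting clause.
    "void": {"expired or historical right": r"\b(has expired|is no longer in effect)\b"},
    "grant": [
        r"\bLandlord\s+(may|shall have the right to)\s+"
        r"(audit|inspect|examine|review|verify|reconcile)\b",
    ],
    # Only reached when no granting language was found.
    "procedural": {
        "audit costs": r"\bcost of (the|such|any) audit\b",
        "audit location": r"\baudit shall (take place|be conducted) at\b",
        "notice requirements": r"\bnotice of (the|such|any) audit\b",
    },
}


def classify(clause):
    """Return (applies, reason) for one candidate clause."""
    for reason, pattern in RULES["void"].items():
        if re.search(pattern, clause.text, re.IGNORECASE):
            return False, f"excluded: {reason}"
    for pattern in RULES["grant"]:
        if re.search(pattern, clause.text, re.IGNORECASE):
            return True, "grants Landlord a right to audit or inspect records"
    for reason, pattern in RULES["procedural"].items():
        if re.search(pattern, clause.text, re.IGNORECASE):
            return False, f"excluded: only describes {reason}"
    return False, "no granting language"


def extract(datapoint, clauses):
    """Return one result in the section 6 shape."""
    applicable = sorted((c for c in clauses if classify(c)[0]), key=lambda c: c.effective)
    if not applicable:
        return {"datapoint": datapoint, "value": NO_INFO, "source": None,
                "status": "NotFound",  # hypothetical; the spec only shows "Extracted"
                "requiresReview": False,
                "explanation": "No candidate clause grants an applicable right."}
    latest = applicable[-1]  # later documents supersede earlier ones
    # Different applicable clauses with the same effective date cannot be
    # ordered by supersession, so they are routed to human review.
    conflict = any(c.effective == latest.effective and c.text != latest.text
                   for c in applicable[:-1])
    superseded = [c.document for c in applicable if c.effective < latest.effective]
    explanation = classify(latest)[1]
    if superseded:
        explanation += "; supersedes " + ", ".join(superseded)
    return {
        "datapoint": datapoint,
        "value": latest.text,  # raw clause text; a summarizing step could go here
        "source": {"document": latest.document, "page": latest.page,
                   "section": latest.section},
        "status": "Extracted",
        "requiresReview": conflict,
        "explanation": explanation,
    }


clauses = [
    Clause("Lease Agreement", date(2018, 3, 1), 84, "Percentage Rent / Sales Records",
           "Landlord may audit Tenant's books and records relating to Gross Sales "
           "within two years after receipt of the applicable sales statement."),
    Clause("Lease Agreement", date(2018, 3, 1), 85, "Percentage Rent / Sales Records",
           "If an audit discloses an understatement of Gross Sales, Tenant shall "
           "reimburse Landlord for the cost of the audit."),
    Clause("First Amendment", date(2021, 6, 1), 3, "Section 4",
           "Landlord may audit Tenant's books and records relating to Gross Sales "
           "within three years after receipt of the applicable sales statement."),
]

result = extract("Landlord Audit Rights", clauses)
print(result["source"], result["requiresReview"])
```

Granting language is checked before the procedural exclusions because the rules exclude provisions that only describe costs, location, notice, and similar details; a clause that grants the right and also mentions costs should still count. Void conditions such as an expired right are checked first, and same-date conflicts are flagged for review rather than resolved silently.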
Business rules, extraction instructions, examples, and edge cases should be possible to refine over time based on review results and quality findings. The approach should make it possible to understand:
- which data points were extracted correctly
- which data points were missed
- where similar but non-applicable text was incorrectly selected
- where conflicting or updated document language caused an incorrect result
- which cases require clearer business rules or additional examples

The goal is not only to produce one extraction result, but to demonstrate a path for improving accuracy and consistency over time.

8. Expected Deliverable

Provide a short explanation of the proposed approach and, if possible, a working prototype or pseudo-implementation. The solution should demonstrate how it would:
- process a package of related documents
- identify relevant information for a requested data point
- avoid extracting similar but non-applicable text
- handle conflicts between original documents and later related documents
- return a structured answer with source reference
- identify cases where no reliable answer can be found
- identify cases that require human review
- support evaluation of extraction quality
- allow business rules and extraction behavior to be improved over time

The technical approach, tools, architecture, and implementation details may be chosen freely.
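The evaluation questions in section 7 map fairly directly onto a per-case grading function. A minimal Python sketch, assuming business-approved answers are stored as dicts with hypothetical "value" and "document" keys, keyed by a hypothetical (package_id, datapoint) tuple, and assuming case- and whitespace-insensitive exact matching as a deliberately strict stand-in for reviewer judgment or semantic comparison. The sample answers are invented for illustration.

```python
from collections import Counter

NO_INFO = "No information available"


def norm(text):
    """Case- and whitespace-insensitive key; a strict stand-in for reviewer judgment."""
    return " ".join(text.lower().split())


def grade(predicted, expected):
    """Label one prediction against one business-approved answer."""
    if norm(predicted["value"]) == norm(expected["value"]):
        return "correct"
    if norm(predicted["value"]) == norm(NO_INFO):
        return "missed"
    if norm(expected["value"]) == norm(NO_INFO):
        return "false positive"
    if predicted["document"] != expected["document"]:
        # e.g. a superseded original was used instead of a later amendment
        return "wrong document"
    # Right document, wrong text: likely similar but non-applicable language
    return "wrong clause"


def evaluate(predictions, gold):
    """Grade every approved answer; a missing prediction counts as NO_INFO."""
    missing = {"value": NO_INFO, "document": None}
    labels = {key: grade(predictions.get(key, missing), expected)
              for key, expected in gold.items()}
    return labels, Counter(labels.values())


gold = {
    ("pkg-001", "Landlord Audit Rights"): {
        "value": "Landlord may audit Tenant's books and records relating to Gross Sales "
                 "within three years after receipt of the applicable sales statement.",
        "document": "First Amendment"},
    ("pkg-002", "Landlord Audit Rights"): {"value": NO_INFO, "document": None},
}
predictions = {
    ("pkg-001", "Landlord Audit Rights"): {
        "value": "Landlord may audit Tenant's books and records relating to Gross Sales "
                 "within two years after receipt of the applicable sales statement.",
        "document": "Lease Agreement"},
    ("pkg-002", "Landlord Audit Rights"): {"value": NO_INFO, "document": None},
}

labels, summary = evaluate(predictions, gold)
for key, label in labels.items():
    print(key, label)
print(summary)
```

Here pkg-001 is labeled "wrong document" because the superseded lease text was returned instead of the amendment, which points to a conflict-handling fix. A run of "wrong clause" labels for one data point would instead suggest that its exclusion rules need tightening or more examples.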