Freelance Developer Needed — Local Python / Excel Document Processing Tool
Budget: $3,000 (fixed price)
⭐ 1.67 (2)
Australia
python, microsoft-excel, data-scraping
We are looking for an experienced freelance developer to build a local document-processing tool for an internal legal costing workflow.
The tool will scan a local matter folder containing emails, PDFs, Word documents and attachments, then produce a structured Excel workbook for review.
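As a rough illustration of the scanning step, the sketch below uses Python's standard `pathlib` module to walk a matter folder and all of its subfolders, recording one entry per file. The extension-to-type mapping (`FILE_TYPES`), the function name and the dictionary keys are hypothetical choices for illustration, not requirements from this brief.

```python
from pathlib import Path

# Hypothetical mapping from file extension to the type shown in the Raw Index.
FILE_TYPES = {
    ".pdf": "PDF",
    ".doc": "Word",
    ".docx": "Word",
    ".xls": "Excel",
    ".xlsx": "Excel",
    ".msg": "Outlook email",
}


def scan_matter_folder(root):
    """Walk root and every subfolder, returning one dict per file found."""
    rows = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue  # skip the folders themselves
        rows.append({
            "file_name": path.name,
            "file_path": str(path),
            "relative_path": str(path.relative_to(root)),
            # Extension matching is case-insensitive; anything else is "Unknown".
            "file_type": FILE_TYPES.get(path.suffix.lower(), "Unknown"),
            "size_bytes": path.stat().st_size,
        })
    return rows
```

Classifying by extension is only a first pass. A file with a missing or misleading extension should be flagged for review rather than trusted.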
This is not a web app and does not need to make legal decisions. The purpose is to assist with document organisation, metadata extraction, duplicate flagging, word/page counting, and Excel preparation.
The ideal developer will have strong experience with Python automation, Excel reporting, document parsing, and Outlook email files.
Project objective
We want to build a practical MVP that reduces the repetitive administrative work involved in preparing legal costing files.
The first version should focus on:
indexing matter documents;
extracting useful metadata;
linking emails with attachments;
identifying likely duplicates;
counting words/pages where possible;
generating a clean Excel workbook for manual review.
The tool should support human review and manual override at all stages.
Required features
The tool should:
run locally on a Windows computer;
allow the user to select or nominate a matter folder;
scan subfolders;
identify common file types, including:
  PDF files;
  Word documents;
  Excel files;
  Outlook .msg emails;
  attachments;
extract email metadata where possible, including:
  date;
  sender;
  recipient;
  cc;
  subject;
  body preview;
  attachment names;
extract basic document metadata;
estimate word counts for emails and Word documents where possible;
estimate page counts for PDFs where possible;
link attachments to their parent emails where possible;
flag likely duplicates or possible email-chain duplicates;
export the results into a structured Excel workbook.
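Two of the features above, word counts for Word documents and duplicate flagging, can be sketched with the standard library alone. The sketch assumes .docx input; legacy .doc files would need a different parser.

* **Word counts.** A .docx file is a ZIP archive whose main text lives in `word/document.xml`. The count below joins the text runs within each paragraph and then splits on whitespace. It is only an estimate: headers, footers and footnotes are ignored, and text boxes may be over-counted.
* **Exact duplicates.** These are grouped by SHA-256 hash.
* **Possible email-chain duplicates.** These are grouped by subject, after leading RE:/FW:/FWD: prefixes are stripped.

All function names are hypothetical.

```python
import hashlib
import re
import zipfile
import xml.etree.ElementTree as ET
from collections import defaultdict

# WordprocessingML namespace used by the elements in word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

# One or more leading "RE:", "FW:" or "FWD:" prefixes, case-insensitive.
_REPLY_PREFIX = re.compile(r"^\s*((re|fw|fwd)\s*:\s*)+", re.IGNORECASE)


def docx_word_count(path):
    """Estimate the word count of a .docx body (headers/footers/footnotes excluded)."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    total = 0
    for para in root.iter(W_NS + "p"):
        # Word may split one word across several runs, so join runs first.
        text = "".join(t.text or "" for t in para.iter(W_NS + "t"))
        total += len(text.split())
    return total


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large PDFs are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def exact_duplicate_groups(paths):
    """Return groups of two or more files with byte-for-byte identical contents."""
    groups = defaultdict(list)
    for path in paths:
        groups[sha256_of(path)].append(path)
    return [group for group in groups.values() if len(group) > 1]


def normalise_subject(subject):
    """Strip reply/forward prefixes and case so replies group with their thread."""
    return _REPLY_PREFIX.sub("", subject or "").strip().lower()


def possible_chain_groups(emails):
    """emails: iterable of (file_path, subject); returns subject -> paths (2+)."""
    groups = defaultdict(list)
    for path, subject in emails:
        key = normalise_subject(subject)
        if key:
            groups[key].append(path)
    return {key: paths for key, paths in groups.items() if len(paths) > 1}
```

PDF page counts could come from a parsing library such as pypdf (`len(PdfReader(path).pages)`). A shared subject only suggests a chain, so those groups are better treated as warnings for a person to confirm than as automatic exclusions.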
Excel output required
The Excel workbook should include the following tabs:
Raw Index
A full list of all files and extracted metadata.
Review
A clean working sheet for manual review.
Validation / Warnings
Files that could not be processed, plus duplicate flags, missing dates, unusual file types and similar issues.
Export
A simplified output sheet that can be copied into a final costing document or billing template.
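The Validation / Warnings tab could be derived from the same per-file records used for the Raw Index. This sketch rests on two assumptions made for illustration:

* each record is a dict with hypothetical keys (`file_path`, `file_type`, and optionally `error`, `date`, `duplicate_of`);
* `EXPECTED_TYPES` lists the file types the firm considers normal.

```python
# Hypothetical set of file types treated as normal in a matter folder.
EXPECTED_TYPES = {"PDF", "Word", "Excel", "Outlook email"}


def validation_rows(records):
    """Return (file_path, warning) pairs for the Validation / Warnings tab.

    A single file can produce several warnings, one row each.
    """
    rows = []
    for rec in records:
        path = rec["file_path"]
        if rec.get("error"):
            rows.append((path, f"Could not be processed: {rec['error']}"))
        if rec.get("duplicate_of"):
            rows.append((path, f"Possible duplicate of {rec['duplicate_of']}"))
        if rec.get("date") is None:
            rows.append((path, "Missing date"))
        if rec["file_type"] not in EXPECTED_TYPES:
            rows.append((path, f"Unusual file type: {rec['file_type']}"))
    return rows
```

Writing these rows into an actual worksheet would then be a job for openpyxl or a similar library.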
Review tab columns
The Review tab should include columns such as:
Include / exclude
Date
File name
File path
File type
Document category
Email subject / document title
Sender
Recipient
CC
Attachment names
Parent email
Word count
Page count
Duplicate flag
Suggested category
Scale item
Units
Draft item description
Final item description
Notes
The user must be able to manually edit the Excel workbook after it is generated.
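One simple way to keep the Review tab consistent is to define the column order once and map each record onto it. The column names below come from the list above. Which columns the tool leaves blank for the reviewer (`MANUAL_COLUMNS`) is a hypothetical split, because the brief does not say which ones are filled automatically.

```python
# Column order taken from the Review tab list in the brief.
REVIEW_COLUMNS = [
    "Include / exclude", "Date", "File name", "File path", "File type",
    "Document category", "Email subject / document title", "Sender",
    "Recipient", "CC", "Attachment names", "Parent email", "Word count",
    "Page count", "Duplicate flag", "Suggested category", "Scale item",
    "Units", "Draft item description", "Final item description", "Notes",
]

# Hypothetical: columns the reviewer owns, so the tool always leaves them blank.
MANUAL_COLUMNS = {
    "Include / exclude", "Document category", "Scale item", "Units",
    "Final item description", "Notes",
}


def review_row(record):
    """Map a record keyed by column name to a row in REVIEW_COLUMNS order.

    Manual columns are emitted blank even if the record has a value;
    any other column missing from the record is also left blank.
    """
    return ["" if col in MANUAL_COLUMNS else record.get(col, "")
            for col in REVIEW_COLUMNS]
```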
Suggested categories
The tool may provide basic suggested categories, such as:
email;
letter;
medical report;
counsel correspondence;
court document;
pleading;
invoice;
attachment;
unknown.
These are only suggestions. The tool does not need to make final legal or costing decisions.
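A keyword-based suggestion is probably enough for a first version. The sketch below makes three assumptions:

* the text passed in is the email subject, document title or an early extract of the body;
* .msg files are labelled "Outlook email";
* the keyword lists and rule order are hypothetical and would need tuning on redacted samples.

It uses plain substring matching, so false hits are expected; for example, "court" matches inside "courtesy". That is acceptable only because the output is a suggestion.

```python
# Hypothetical keyword rules, checked in order; the first match wins.
CATEGORY_RULES = [
    ("invoice", ("tax invoice", "invoice")),
    ("medical report", ("medical report", "specialist report")),
    ("pleading", ("statement of claim", "defence", "pleading")),
    ("court document", ("court", "tribunal", "orders")),
    ("counsel correspondence", ("counsel", "barrister", "chambers")),
    ("letter", ("dear ", "yours faithfully", "yours sincerely")),
]


def suggest_category(file_type, text, has_parent_email=False):
    """Return a suggested category from the brief's list; never a final decision."""
    if file_type == "Outlook email":
        return "email"
    lowered = (text or "").lower()
    for category, keywords in CATEGORY_RULES:
        if any(keyword in lowered for keyword in keywords):
            return category
    # An unmatched file that came from an email is at least known to be an attachment.
    return "attachment" if has_parent_email else "unknown"
```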
Important constraints
This project involves confidential legal material, so privacy and security are important.
Requirements:
The tool must run locally.
Client documents must not be uploaded to external services.
No client documents may be used for AI training.
Development and testing should initially use dummy or redacted sample files.
We require clean, readable source code.
We require basic handover documentation.
An NDA may be required before any sample material is provided.
Preferred technical approach
We are open to the developer’s recommendation, but expect the solution may involve:
Python;
openpyxl or similar for Excel generation;
document parsing libraries for Word/PDF;
Outlook .msg parsing libraries;
a simple desktop interface or clear command-line script.
A polished interface is not required for the first version, but the tool should be usable by a non-technical user with clear instructions.
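For the command-line option, a minimal entry point using the standard library's `argparse` might look like the following. The argument names and the default output file name are assumptions for illustration.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the command line; argv=None means read sys.argv."""
    parser = argparse.ArgumentParser(
        description="Index a local matter folder into an Excel review workbook.")
    parser.add_argument("matter_folder", type=Path,
                        help="folder containing the matter's emails and documents")
    parser.add_argument("-o", "--output", type=Path, default=None,
                        help="workbook to create (default: <folder name>_index.xlsx)")
    args = parser.parse_args(argv)
    if not args.matter_folder.is_dir():
        parser.error(f"not a folder: {args.matter_folder}")  # prints usage, exits 2
    if args.output is None:
        # Name the workbook after the matter folder, in the current directory.
        args.output = Path(f"{args.matter_folder.resolve().name}_index.xlsx")
    return args
```

A user would then run something like `python index_matter.py "C:\Matters\Smith" -o Smith.xlsx`, where the script name is hypothetical. If typing a path is too much to ask, Python's built-in `tkinter.filedialog.askdirectory()` can provide a simple folder picker instead.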
Preferred milestone structure
We are open to discussion, but a possible structure is:
Proof of concept
Scan folder and generate a basic Excel file index.
Email and document extraction
Extract metadata from .msg, Word and PDF files.
Excel workbook structure
Create Raw Index, Review, Validation / Warnings and Export tabs.
Attachment linking and duplicate flags
Link attachments to parent emails and flag likely duplicates.
Testing, bug fixes and handover
Test with sample folders, fix issues, provide source code and documentation.
Please respond with
Please include the following in your response:
Relevant experience with Python automation, Excel reporting or document parsing.
Any experience working with Outlook .msg files.
Your proposed technical approach.
What you would include and exclude from the MVP.
Proposed milestone structure and pricing.
Any key risks, limitations or assumptions.
Confirmation that the tool can run locally without uploading confidential documents.
Ideal candidate
The ideal person is a practical automation developer who understands document-heavy workflows and can build a useful, reliable internal tool without overcomplicating the first version.
Experience with legal, accounting, insurance, medical, professional-services or document-heavy workflows would be highly regarded.