IDP with VESTIGAS.
Key facts
- When: Start anytime. Applications are open!
- How to apply: E-mail your CV and grade report to Paul Kaiser (kaiser@vestigas.com).
Project Objective
The objective of the project is to use recent advances in large language models (LLMs) to automatically extract important data fields from construction-industry delivery notes provided as XML, JSON, PDF, or even scans. The model should handle various document structures (ideally even unknown formats) and output either the extracted data in a predefined data format or a suggestion for a VESTIGAS parser configuration.
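To make "extracted data in a predefined data format" more concrete, here is a minimal sketch of what such a target format and a check on the model's reply might look like. The field names and the `DeliveryNote` class are hypothetical illustrations. The actual VESTIGAS data format is not described in this posting.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class DeliveryNote:
    # Hypothetical fields, chosen only for illustration.
    delivery_note_number: str
    supplier: str
    delivery_date: str
    items: list


def parse_model_output(raw: str) -> DeliveryNote:
    """Parse a model's JSON reply and check it against the predefined format."""
    data = json.loads(raw)  # raises ValueError (JSONDecodeError) on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    expected = [f.name for f in fields(DeliveryNote)]
    missing = [name for name in expected if name not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    # Extra keys produced by the model are ignored.
    return DeliveryNote(**{name: data[name] for name in expected})
```

Rejecting incomplete replies early keeps malformed model output from reaching downstream processing. A real implementation would likely also validate types and value formats.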
Who We Are
VESTIGAS is the Supply Chain OS for the construction industry. It allows suppliers, carriers, and construction companies to exchange orders, delivery notes, and invoices fully digitally and process them automatically. This saves vast amounts of money, increases transparency, and makes it possible to control for environmental factors along the whole chain. VESTIGAS already works with some of the largest names in the industry, such as BayWa, Würth, and Wolff & Müller, and supplies some of the largest infrastructure projects in Germany.
Requirements
- Experience using large language models such as GPT-3/4, LLaMA, or similar
- Good overview of the current state of the art in NLP and OCR
- Optional: Experience with fine-tuning LLMs
- Good programming skills in Python
- Knowledge of structured data and PDF processing
- Ability to perform structured data analysis
- Great team spirit
- Motivation to take responsibility for a central feature
Proposed Project Workflow
- Familiarize yourself with the problem and with the document and data structures
- Screen the AI landscape for existing solutions and available models
- Augment the existing delivery note data and create a training (and validation) data set
- Create an LLM plugin (e.g. for GPT-3/4) with the corresponding prompts to extract data from an arbitrary document
- Create an LLM plugin (e.g. for GPT-3/4) with the corresponding prompts to suggest a parser configuration
- Validate the model results and further improve the prompts
- Optional: Fine-tune the LLM to better fit the delivery note data or to retain knowledge of document structures in the long term
- Integrate the developed plugin into the VESTIGAS IT tooling landscape
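The prompt and validation steps above could be prototyped roughly as follows. This is only a sketch under stated assumptions: the field names are hypothetical, and `complete` stands in for whichever LLM client is chosen (any function that takes a prompt string and returns the model's text reply). No specific vendor API is assumed.

```python
import json
from typing import Callable

# Hypothetical field names; the real target fields are not given in this posting.
FIELDS = ["delivery_note_number", "supplier", "delivery_date"]


def build_prompt(document_text: str) -> str:
    """Build an extraction prompt that asks for a single JSON object."""
    field_list = ", ".join(FIELDS)
    return (
        "Extract the following fields from the delivery note below and "
        f"reply with a single JSON object with the keys: {field_list}. "
        "Use null for fields that are not present.\n\n"
        f"Delivery note:\n{document_text}"
    )


def extract(document_text: str, complete: Callable[[str], str]) -> dict:
    """Run one extraction; unparsable replies yield None for every field."""
    reply = complete(build_prompt(document_text))
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    return {name: data.get(name) for name in FIELDS}


def field_accuracy(predictions: list[dict], labels: list[dict]) -> dict:
    """Per field, the share of documents where prediction equals the label."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("need equally sized, non-empty prediction and label lists")
    return {
        name: sum(p.get(name) == l.get(name) for p, l in zip(predictions, labels))
        / len(labels)
        for name in FIELDS
    }


if __name__ == "__main__":
    # Stub model used in place of a real LLM call.
    def stub(prompt: str) -> str:
        return '{"delivery_note_number": "LS-1", "supplier": "ACME", "delivery_date": null}'

    prediction = extract("Lieferschein LS-1, ACME GmbH", stub)
    print(field_accuracy([prediction], [prediction]))
```

Passing the model call in as a function makes it easy to compare different models or prompts on the same validation set. Per-field accuracy shows which fields need prompt improvements.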
Framework Conditions
VESTIGAS can provide an initial set of test data. Industrial partners may also contribute, which can offer insight into potential cooperation.
How to Apply
Interested? Please e-mail your CV and grade report to Paul Kaiser (kaiser@vestigas.com).
We look forward to hearing from you!
More about VESTIGAS: https://vestigas.com