Structured procedural knowledge extraction from industrial documentation using large language models
Elmouss, Imad Eddine (2025)
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:amk-2025121235460
Abstract
Industrial manuals and SOPs are packed with procedural logic: ordered steps, prerequisites, “if–then” rules, safety guards, and parameter thresholds. That logic is usually implicit in prose and hard for machines to use. This thesis introduces IPKE (Industrial Procedural Knowledge Extraction), a local, privacy-preserving pipeline that converts safety-critical documentation into queryable Procedural Knowledge Graphs (PKGs), making workflows and constraints explicit and directly usable by AI agents and human engineers for search, verification, and decision support.
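To make the idea of a queryable PKG concrete, the following is a minimal sketch of one possible in-memory representation. It is not the thesis's actual schema: the class names (`Step`, `Constraint`, `ProceduralGraph`), the field names, and the three-step pump procedure are all hypothetical and invented for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    step_id: str
    text: str
    requires: list[str] = field(default_factory=list)  # IDs of prerequisite steps


@dataclass
class Constraint:
    kind: str               # hypothetical labels, e.g. "safety_guard", "threshold"
    text: str
    applies_to: list[str]   # IDs of the steps this constraint governs


@dataclass
class ProceduralGraph:
    steps: dict[str, Step] = field(default_factory=dict)
    constraints: list[Constraint] = field(default_factory=list)

    def add_step(self, step: Step) -> None:
        self.steps[step.step_id] = step

    def add_constraint(self, constraint: Constraint) -> None:
        # Reject constraints that point at steps the graph does not contain.
        missing = [s for s in constraint.applies_to if s not in self.steps]
        if missing:
            raise KeyError(f"unknown step IDs: {missing}")
        self.constraints.append(constraint)

    def constraints_for(self, step_id: str) -> list[Constraint]:
        return [c for c in self.constraints if step_id in c.applies_to]

    def prerequisites(self, step_id: str) -> list[str]:
        """Direct and transitive prerequisites, in discovery order."""
        seen: list[str] = []
        stack = list(self.steps[step_id].requires)
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.append(current)
                stack.extend(self.steps[current].requires)
        return seen


# Invented example procedure (not taken from the evaluated documents).
pkg = ProceduralGraph()
pkg.add_step(Step("S1", "Isolate pump power."))
pkg.add_step(Step("S2", "Drain the pump housing.", requires=["S1"]))
pkg.add_step(Step("S3", "Remove the housing cover.", requires=["S2"]))
pkg.add_constraint(
    Constraint("safety_guard", "Do not open while the housing is pressurized.", ["S3"])
)
```

With a structure of this kind, an agent or engineer could ask which steps must precede `S3` (`pkg.prerequisites("S3")`) or which safety guards apply to it (`pkg.constraints_for("S3")`) without re-reading the prose.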
IPKE makes two core contributions. First, Dual Semantic Chunking (DSC) segments documents by combining structural cues (headings/sections) with embedding-based cohesion, reducing the “context fragmentation” that breaks multi-step instructions. Second, P3 Two-Stage Decomposition separates step extraction from constraint attachment, forcing each extracted rule to link to concrete step IDs and improving constraint recall on smaller local models.
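The abstract does not give the DSC algorithm itself, so the following is only a rough sketch of the general idea of combining structural cues with a cohesion score. Everything specific here is an assumption: headings are taken to start with `#`, a bag-of-words count vector stands in for a real embedding model, and the `0.2` threshold is arbitrary.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Stand-in for a sentence-embedding model: a bag-of-words count vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def is_heading(line: str) -> bool:
    # Assumed structural cue: headings are marked with a leading "#".
    return line.startswith("#")


def chunk(lines: list[str], threshold: float = 0.2) -> list[list[str]]:
    """Split on headings, and on low similarity between adjacent body lines."""
    chunks: list[list[str]] = []
    current: list[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if is_heading(line):
            boundary = True  # structural cue always opens a new chunk
        elif current and not is_heading(current[-1]):
            # cohesion cue: compare with the previous body line
            boundary = cosine(embed(current[-1]), embed(line)) < threshold
        else:
            boundary = False  # first body line stays with its heading
        if boundary and current:
            chunks.append(current)
            current = []
        current.append(line)
    if current:
        chunks.append(current)
    return chunks


# Invented sample text for illustration.
sample = [
    "# Pump shutdown",
    "Close the inlet valve.",
    "Verify the inlet valve is closed.",
    "# Storage",
    "Store tools in cabinet.",
    "Invoices go to accounting.",
]
chunks = chunk(sample)
```

In this sample the two related valve instructions stay in one chunk under their heading, while the unrelated final line is split off. A real pipeline would replace `embed` with a proper embedding model and would likely compare against more context than the single preceding line.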
Evaluated on real industrial documents (marine repair and safety/manufacturing procedures), IPKE reaches ~75% constraint coverage and Procedural Fidelity Φ = 0.699 using a quantized 7B local model, outperforming a 70B model under standard prompting (Φ = 0.439, 50% constraint coverage). The results highlight a practical takeaway for Industry 5.0: pipeline design and task decomposition can matter more than sheer model scale when extracting logic-dense procedures for safety-critical, human-in-the-loop systems.
