Abstract:
The regulatory clauses of building and construction standards and codes usually involve diverse engineering concepts, implicit common knowledge and complex combinations of rules, which poses great challenges to the automatic learning of and reasoning over standards. Therefore, the authors established a systematic framework for the intelligent interpretation of regulatory texts by integrating a domain-specific large language model (DSLLM) and a knowledge graph of common domain knowledge (comKG). Firstly, a domain-specific corpus is constructed and a large model is pre-trained on it to learn and represent hidden knowledge such as the writing style and grammar of regulatory texts, and a large-scale knowledge graph of common knowledge is constructed to lay the foundation for the intelligent interpretation of rules. Next, based on DSLLM and comKG, core algorithms and technologies are developed for the recognition and structural decomposition of building standards and specifications, semantic labelling, syntax parsing and complex rule handling, creating an end-to-end rule interpretation pipeline that can automatically generate executable code from raw regulatory texts. Moreover, potential applications of the proposed methods in typical scenarios such as clause searching, question answering for building standards, automatic compliance checking and proactive BIM optimization are investigated. Results show that the proposed method can effectively overcome challenging problems such as complex rule interpretation, achieving an interpretation accuracy of over 95%, interpretation five times faster than manual methods, and a roughly 40-fold improvement in BIM compliance checking. This study provides a foundation and stepping stone for the digitalization and intelligent application of standards and regulatory documents in the building and construction domain.