InitialGPT

Introduction

InitialGPT is a ChatGPT-enhanced tool for automatically generating initialization data for prototypes to support rapid requirements validation. The benefits of InitialGPT are as follows:

  1. Automatic generation of prototype initialization data. InitialGPT can automatically refactor the prototype generated by RM2PT so that it supports generating initial data.

  2. Automatic Data Prompts Generation for ChatGPT. We propose a method for automatically generating prompts for large language models from requirement models, which can be used to generate initial data for prototypes.

  3. More efficient requirements validation. Compared with the original prototype generated by RM2PT and InputGen, the enhanced prototype can automatically generate substantial and reasonable initial data, which speeds up the validation process.

InitialGPT_Overview11

A video demonstrating its features is available on YouTube:

InitialGPT Installation

Prerequisites

InitialGPT is an advanced feature of RM2PT. We recommend using InitialGPT within RM2PT. If you don’t have RM2PT, download it here.

Installation

Click here to download InitialGPT. Follow the steps below to install.

2offline

3load

4add

InitialGPT333

6installanyway

InitialGPT Tutorial

Prerequisites

To generate the prototype's initial data, you need a requirements model, that is, an RM2PT project. To create or import an RM2PT project, see the tutorial here. We recommend importing RM2PT projects from Git; they are available at CaseStudies. The tutorial is here.

Input of InitialGPT — Requirements Model

InitialGPT_Overview_10

rm

The input to InitialGPT is a UML requirements model with OCL constraints. The model includes a conceptual class diagram, a use case diagram, system sequence diagrams, and contracts of system operations.

Input of InitialGPT — Prompt Template

A good prompt template is the key to data generation: it prompts the large language model effectively and so makes better use of the model's capabilities.

Because the ChatGPT API limits response time and response length, we designed two sets of templates for ChatGPT: one for generating a general amount of data and one for generating a large amount of data.

InitialGPT_Overview11

General Prompt Template

The general prompt template consists of three main components: the Initial Information Prompt, which is a well-tested general prompt; the Input section, which consists of user interaction information and domain entity information automatically generated from the requirements model; and the Output section, which specifies the format of the output data.

GeneralFarmat
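The three-part structure described above can be sketched in Python as follows. This is only an illustration, not the tool's actual template: the `Entity` class, the `build_general_prompt` function, the sample operation names, the wording of every section, and the choice of YAML as output format are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    """A domain entity from the conceptual class diagram (hypothetical shape)."""
    name: str
    attributes: dict[str, str]  # attribute name -> type name


def build_general_prompt(initial_info: str, operations: list[str],
                         entities: list[Entity], counts: dict[str, int]) -> str:
    """Assemble the three parts: initial information, input, and output format."""
    entity_lines = []
    for e in entities:
        attrs = ", ".join(f"{n}: {t}" for n, t in e.attributes.items())
        entity_lines.append(f"- {e.name}({attrs}), generate {counts.get(e.name, 1)}")
    input_section = (
        "User interactions:\n" + "\n".join(f"- {op}" for op in operations)
        + "\nDomain entities:\n" + "\n".join(entity_lines)
    )
    # Assumed output instruction; the real template may specify another format.
    output_section = "Return the data as YAML, one top-level key per entity."
    return "\n\n".join([initial_info,
                        "Input:\n" + input_section,
                        "Output:\n" + output_section])


prompt = build_general_prompt(
    "You generate realistic initial data for a software prototype.",
    ["makeNewSale", "enterItem"],  # hypothetical system operations
    [Entity("Item", {"barcode": "Integer", "name": "String", "price": "Real"})],
    {"Item": 10},
)
print(prompt)
```

Keeping the three parts as separate strings makes it easy to reuse the same initial information and output section across different requirements models while regenerating only the input section.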

Large Prompt Template

The Large Data Amount Prompt Template is similar to the General Prompt Template, but it is generated for a single entity. Data is requested 40 records at a time and post-processed until the required number has been generated.

largeFarmat
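The batching idea can be sketched as a loop that keeps requesting 40 records for one entity until enough have been collected. This is a minimal sketch under stated assumptions: `request_batch` stands in for the actual ChatGPT call, the post-processing is assumed to be de-duplication on a key field, and `max_rounds` is a safety limit added for the example.

```python
import itertools
from typing import Callable

BATCH_SIZE = 40  # batch size stated in the text


def generate_in_batches(entity: str, required: int,
                        request_batch: Callable[[str, int], list[dict]],
                        key: str = "id", max_rounds: int = 50) -> list[dict]:
    """Request records for one entity in batches until `required` unique ones exist."""
    seen: set = set()
    records: list[dict] = []
    for _ in range(max_rounds):
        if len(records) >= required:
            break
        for rec in request_batch(entity, BATCH_SIZE):
            # Assumed post-processing: skip duplicates and stop at the target size.
            if rec[key] not in seen and len(records) < required:
                seen.add(rec[key])
                records.append(rec)
    return records


def make_fake_llm() -> Callable[[str, int], list[dict]]:
    """Stand-in for the model call; returns records with increasing ids."""
    ids = itertools.count(1)

    def request_batch(entity: str, n: int) -> list[dict]:
        return [{"id": next(ids), "entity": entity} for _ in range(n)]

    return request_batch


items = generate_in_batches("Item", 100, make_fake_llm())
print(len(items))
```

With a target of 100 records, the loop issues three requests and keeps only the first 20 records of the last batch.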

1) Generate a prototype from the requirement model

After you import a requirements model, use RM2PT to generate a prototype from it: right-click cocome.remodel -> RM2PT -> OO Prototype -> Generate Desktop Prototype.

10generateprototype

2) Run the InitialGPT tool to refactor and enhance the prototype

After you generate a prototype, use InitialGPT to refactor it based on the requirements model: right-click cocome.remodel -> RM2PT-dev -> InitialGPT, then update the project.

InitialGPT9refactor

3) Run the refactored prototype

Run the refactored prototype to validate the requirements: right-click COCOMEPrototype -> pom.xml -> run -> maven build.

8runprototype

4) Generate the initial data to validate the requirements.

The Output of InitialGPT

After automatically refactoring and enhancing the generated prototype by the tool InitialGPT, the enhanced prototype contains two advantages as follows:

For example

In the system status, click the initial data generation button, then follow these steps in the data generation screen:

InitialGPT_chushi

  1. Select a model. The models currently supported are ChatGPT’s gpt-4 and gpt-3.5-turbo; more large language models will be added later.

InitialGPT_kaishi2

  2. Click Generate Settings, then choose whether to use a proxy on the pop-up page. If you use one, delete “no” and fill in your own proxy port.

  3. Fill in your OpenAI API key.

  4. Set the number of instances to generate for each entity. You can also add as many prompts as you want in the note box.

InitialGPT_setting_openaikey

  5. Once the settings are complete, click the Generation button and the data will be generated automatically. Here you can see the generation times we measured for different numbers of generated entities.

InitialGPT_generate7

  6. After successful generation, you can view the generated entities in the table view or the YAML view, and you can also modify them to better match your requirements.

  7. Finally, click the load data button to import the initial data into the prototype.

InitialGPT_generate8

  8. The generated data is also saved and can be imported again by clicking the loadfile button.

InitialGPT_load
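The settings collected in steps 1 to 4 can be pictured as a small configuration object that is checked before generation starts. The sketch below is hypothetical: the `GenerationSettings` class, its field names, and the validation rules are assumptions for illustration and do not describe how the tool stores its settings.

```python
from dataclasses import dataclass, field

SUPPORTED_MODELS = ("gpt-4", "gpt-3.5-turbo")  # models named in step 1


@dataclass
class GenerationSettings:
    """Hypothetical container for the values entered in the settings dialog."""
    model: str
    api_key: str
    proxy: str = "no"  # "no", or a proxy port such as "7890"
    counts: dict = field(default_factory=dict)  # entity name -> number to generate
    notes: str = ""  # extra prompts from the note box

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the settings look usable."""
        problems = []
        if self.model not in SUPPORTED_MODELS:
            problems.append(f"unsupported model: {self.model}")
        if not self.api_key.strip():
            problems.append("missing OpenAI API key")
        port_ok = self.proxy.isdecimal() and 0 < int(self.proxy) < 65536
        if self.proxy != "no" and not port_ok:
            problems.append(f"proxy must be 'no' or a port number: {self.proxy}")
        for name, n in self.counts.items():
            if n <= 0:
                problems.append(f"count for {name} must be positive")
        return problems


settings = GenerationSettings("gpt-4", "YOUR_OPENAI_KEY", proxy="7890",
                              counts={"Item": 100, "Cashier": 5})
print(settings.validate())
```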

For more details, please see CaseStudies.

InitialGPT Evaluation

Case Studies

In this section, we first present the case studies and then show the evaluation results based on the case studies.

We reuse four case studies \cite{yang2019automated-6} to demonstrate the validity and functionality of InitialGPT. The case studies cover systems that are widely used in daily life: a supermarket system (CoCoME), a library management system (LibMS), an automated teller machine (ATM), and a loan processing system (LoanPS). More details of the requirements models can be found on GitHub (https://github.com/RM2PT/CaseStudies).

The complexity of these requirements models is shown in Table 1. The experiments were run on a 2.8GHz Intel Core i5 with 16GB DDR3 and JDK 11 (JDK 11 is embedded directly in RM2PT). The large language models were accessed through the ChatGPT API, using gpt-4 and gpt-3.5-turbo. We provided brief training on using CASE tools for requirements validation.

InitialGPT_table1

Evaluation Results

This section presents the evaluation indicators and the evaluation results based on the case studies.

Efficiency Experiments:

InitialGPT refactors the prototype generated by RM2PT and combines it with ChatGPT to automatically generate the initial data. To assess its effectiveness, we compare two versions:

(1) the prototype generated by the RM2PT tool, and (2) the improved prototype refactored by InitialGPT.

We invite four software engineers from industry and academia to participate in our validation experiments. Three of the invitees have master’s degrees and one has a bachelor’s degree in computer science.

The experiments involve validating requirements with initial data that is either generated by InitialGPT or written manually. The results of the experiments are shown in Table 2.

InitialGPT_table2

We calculate the time efficiency of using the prototype for requirements validation. As shown in Table 3, on average it takes a developer 29.37 minutes to write the data for 100 initial entities, while ChatGPT takes only 2.96 minutes to generate it automatically. The enhanced prototype improves the efficiency of requirements validation by 6x over a prototype with manually written data, and the more initial data required, the more time is saved.
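As a worked check of the two timings quoted above, the short calculation below computes the speedup of the data-writing step alone and the time saved. The linear extrapolation to other entity counts is an assumption made for illustration. Note that this ratio covers only the data-writing step; the 6x figure in the text refers to requirements validation as a whole and is taken from Table 3, which is not reproduced here.

```python
MANUAL_MIN_PER_100 = 29.37   # minutes to hand-write 100 entities (from the text)
CHATGPT_MIN_PER_100 = 2.96   # minutes for ChatGPT to generate 100 entities (from the text)

# Speedup of the data-writing step only, not of the whole validation process.
speedup = MANUAL_MIN_PER_100 / CHATGPT_MIN_PER_100
saved_per_100 = MANUAL_MIN_PER_100 - CHATGPT_MIN_PER_100


def minutes_saved(n_entities: int) -> float:
    """Time saved for n entities, assuming the savings scale linearly."""
    return saved_per_100 * n_entities / 100


print(f"data-writing speedup: {speedup:.2f}x")
print(f"minutes saved per 100 entities: {saved_per_100:.2f}")
print(f"minutes saved for 500 entities: {minutes_saved(500):.2f}")
```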

InitialGPT_table3

Quality Evaluation

We also evaluate the quality of the GPT-generated data. We propose 10 evaluation dimensions with 15 evaluation indicators, and we use GPT-4 to score the quality of both the manually written data and the GPT-generated data.

Evaluation indicators for data quality

The detailed evaluation dimensions and evaluation indicators are shown in the figure:

InitialGPT_table5

Evaluation Rule for data quality

The evaluation rule is a 5-point scale, where the score of each indicator is calculated and then the total score is calculated based on the corresponding weights.

Using the “Number of Entities Coverage” indicator as an example, the scores are assigned as follows:

  5 points: The data covers all entities and there are enough of each.

  4 points: The data covers most entities, and the number of each is sufficient.

  3 points: The data covers most of the entities, but there are some omissions in the number of entities generated.

  2 points: The data covers a basic range of entities and there are omissions in the number of entities generated.

  1 point: The data covers very few entities and the number of entities generated is small.

  0 points: The data does not cover any entities or the data is not available at all.

From high to low, a score of 5 indicates that the data fully meets the requirements of the evaluation indicator, a score of 1 indicates that the data rarely meets them, and a score of 0 indicates that the data does not meet them at all and is not usable.
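The weighted total described by the rule above can be sketched as a weighted average of the per-indicator scores. The text does not give the weights or the exact formula, so the normalization by the sum of weights, the two extra indicator names, and all weight values below are assumptions for illustration only.

```python
def total_score(scores: dict[str, int], weights: dict[str, float]) -> float:
    """Weighted average of per-indicator scores, each between 0 and 5."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same indicators")
    for name, s in scores.items():
        if not 0 <= s <= 5:
            raise ValueError(f"score for {name!r} must be between 0 and 5")
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[n] * weights[n] for n in scores) / weight_sum


# Only the first indicator name comes from the text; the others are hypothetical.
scores = {"Number of Entities Coverage": 5,
          "Attribute Completeness": 4,
          "Format Validity": 3}
weights = {"Number of Entities Coverage": 0.5,
           "Attribute Completeness": 0.25,
           "Format Validity": 0.25}
print(total_score(scores, weights))
```

Dividing by the sum of the weights keeps the total on the same 0 to 5 range as the individual indicators, which makes totals comparable even if the weights do not sum to 1.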

Data Evaluation Prompt Template

We design the following prompt template for prompting GPT-4 to evaluate the data. The template is divided into five parts: evaluation prompt steps, evaluation rules, evaluation indicators for data quality, data to be evaluated, and entity information, as shown in the figure.
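The five-part layout can be sketched as a function that joins the parts in a fixed order. The section titles follow the list above, but the `build_evaluation_prompt` function, the separator layout, and the sample contents are assumptions for illustration and are not the template actually used.

```python
SECTION_TITLES = (
    "Evaluation steps",
    "Evaluation rules",
    "Evaluation indicators for data quality",
    "Data to be evaluated",
    "Entity information",
)


def build_evaluation_prompt(*parts: str) -> str:
    """Join the five parts in order, each under its section title."""
    if len(parts) != len(SECTION_TITLES):
        raise ValueError(f"expected {len(SECTION_TITLES)} parts, got {len(parts)}")
    return "\n\n".join(f"{title}:\n{part.strip()}"
                       for title, part in zip(SECTION_TITLES, parts))


evaluation_prompt = build_evaluation_prompt(
    "1. Read the rules. 2. Score each indicator. 3. Report the scores.",
    "Score every indicator from 0 to 5.",
    "Number of Entities Coverage",
    "Item:\n  - barcode: 1001\n    name: Milk",  # sample data, hypothetical
    "Item(barcode: Integer, name: String)",      # sample entity, hypothetical
)
print(evaluation_prompt)
```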

InitialGPT_table6

Results of data quality evaluation

We conducted three experiments, as shown in Table 4. Table 4 shows that the quality of the data generated by GPT-4 is better than that of the data generated by GPT-3.5-turbo, and is broadly similar to the quality of manually written data.

InitialGPT_table4

For more details, the manually written initial CoCoME data and the GPT-generated initial data are here.