
Table 4 Analysis of the reading notes of different LLMs

From: Performance of large language models (LLMs) in providing prostate cancer information

| Characteristic, n (%) | ChatGPT-3.5 | ChatGPT-4 | Google Bard | p-value |
|---|---|---|---|---|
| Overall (n = 52) | | | | < 0.001 |
| Plain English | 2 (3.9%) | 1 (1.9%) | 12 (23.1%) | |
| Fairly easy to read | 0 (0.0%) | 0 (0.0%) | 2 (3.8%) | |
| Difficult to read | 26 (51.0%) | 32 (61.5%) | 19 (36.5%) | |
| Fairly difficult to read | 6 (11.8%) | 9 (17.3%) | 18 (34.6%) | |
| Very difficult to read | 17 (33.3%) | 10 (19.2%) | 1 (1.9%) | |
| General (n = 9) | | | | 0.105 |
| Plain English | 2 (22.2%) | 1 (11.1%) | 3 (33.3%) | |
| Fairly easy to read | 0 (0.0%) | 0 (0.0%) | 2 (22.2%) | |
| Difficult to read | 2 (22.2%) | 5 (55.6%) | 0 (0.0%) | |
| Fairly difficult to read | 3 (33.3%) | 3 (33.3%) | 4 (44.4%) | |
| Very difficult to read | 2 (22.2%) | 0 (0.0%) | 0 (0.0%) | |
| Diagnosis (n = 5) | | | | 0.044 |
| Plain English | 0 (0.0%) | 0 (0.0%) | 2 (40.0%) | |
| Fairly easy to read | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | |
| Difficult to read | 3 (75.0%) | 3 (60.0%) | 0 (0.0%) | |
| Fairly difficult to read | 0 (0.0%) | 1 (20.0%) | 3 (60.0%) | |
| Very difficult to read | 1 (25.0%) | 1 (20.0%) | 0 (0.0%) | |
| Treatment (n = 27) | | | | < 0.001 |
| Plain English | 0 (0.0%) | 0 (0.0%) | 6 (22.2%) | |
| Fairly easy to read | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | |
| Difficult to read | 13 (48.1%) | 19 (70.4%) | 13 (48.1%) | |
| Fairly difficult to read | 2 (7.4%) | 0 (0.0%) | 7 (25.9%) | |
| Very difficult to read | 12 (44.4%) | 8 (29.6%) | 1 (3.7%) | |
| Screening & Prevention (n = 11) | | | | 0.245 |
| Plain English | 0 (0.0%) | 0 (0.0%) | 1 (9.1%) | |
| Fairly easy to read | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | |
| Difficult to read | 8 (72.7%) | 5 (45.5%) | 6 (54.5%) | |
| Fairly difficult to read | 1 (9.1%) | 5 (45.5%) | 4 (36.4%) | |
| Very difficult to read | 2 (18.2%) | 1 (9.1%) | 0 (0.0%) | |
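Each percentage in the table is a column percentage within its subgroup. As a quick consistency check, the Python sketch below recomputes those percentages from the counts. The counts were transcribed by hand from the table above, and the names `COUNTS` and `column_percentages` are illustrative only; they do not come from the study. All of the recomputed percentages match the table. The check also shows that the ChatGPT-3.5 column rests on 51 responses overall and 4 in the Diagnosis subgroup, not the 52 and 5 given in the row labels.

```python
# Counts transcribed by hand from Table 4 (illustrative; not the study's own data file).
CATEGORIES = [
    "Plain English",
    "Fairly easy to read",
    "Difficult to read",
    "Fairly difficult to read",
    "Very difficult to read",
]

# COUNTS[subgroup][model] lists counts in the CATEGORIES order above.
COUNTS = {
    "Overall": {
        "ChatGPT-3.5": [2, 0, 26, 6, 17],
        "ChatGPT-4": [1, 0, 32, 9, 10],
        "Google Bard": [12, 2, 19, 18, 1],
    },
    "General": {
        "ChatGPT-3.5": [2, 0, 2, 3, 2],
        "ChatGPT-4": [1, 0, 5, 3, 0],
        "Google Bard": [3, 2, 0, 4, 0],
    },
    "Diagnosis": {
        "ChatGPT-3.5": [0, 0, 3, 0, 1],
        "ChatGPT-4": [0, 0, 3, 1, 1],
        "Google Bard": [2, 0, 0, 3, 0],
    },
    "Treatment": {
        "ChatGPT-3.5": [0, 0, 13, 2, 12],
        "ChatGPT-4": [0, 0, 19, 0, 8],
        "Google Bard": [6, 0, 13, 7, 1],
    },
    "Screening & Prevention": {
        "ChatGPT-3.5": [0, 0, 8, 1, 2],
        "ChatGPT-4": [0, 0, 5, 5, 1],
        "Google Bard": [1, 0, 6, 4, 0],
    },
}


def column_percentages(counts):
    """Return (denominator, percentages rounded to one decimal) for one model column."""
    total = sum(counts)
    return total, [round(100 * c / total, 1) for c in counts]


# Print the implied denominator and percentages for every subgroup and model.
for subgroup, by_model in COUNTS.items():
    for model, counts in by_model.items():
        n, pcts = column_percentages(counts)
        print(f"{subgroup:<24} {model:<12} n={n:<3} {pcts}")
```

Any transcription error would show up as a printed percentage that disagrees with the table.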
Analysis of the reading notes of different LLMs overall and by category: general knowledge, diagnosis, treatment, and screening and prevention.
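The table gives one p-value per subgroup but does not say which test produced it. The sketch below assumes a Pearson chi-square test of independence between model and readability category and computes it in plain Python. That assumption may not hold: several expected counts are below 5, so the authors may have used Fisher's exact test or another method instead. The function names are hypothetical. Categories that no model used are dropped first. The upper-tail probability uses the closed form for even degrees of freedom, which always applies here because there are three model columns. For the Overall subgroup the sketch gives a p-value well below 0.001, in line with the reported < 0.001. It is not claimed to reproduce the other p-values exactly.

```python
import math

# Overall counts transcribed from Table 4.
# Rows: readability categories; columns: ChatGPT-3.5, ChatGPT-4, Google Bard.
OVERALL = [
    [2, 1, 12],    # Plain English
    [0, 0, 2],     # Fairly easy to read
    [26, 32, 19],  # Difficult to read
    [6, 9, 18],    # Fairly difficult to read
    [17, 10, 1],   # Very difficult to read
]


def drop_empty_rows(table):
    """Remove categories that no model used; they would give zero expected counts."""
    return [row for row in table if sum(row) > 0]


def chi_square_sf_even_df(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with even df.

    For df = 2k the closed form is exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    half = x / 2
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


def chi_square_independence(table):
    """Assumed Pearson chi-square test; returns (statistic, df, p-value)."""
    table = drop_empty_rows(table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df, chi_square_sf_even_df(stat, df)


stat, df, p = chi_square_independence(OVERALL)
print(f"chi2 = {stat:.1f}, df = {df}, p = {p:.2e}")
```

To check another subgroup, pass its 5 × 3 count matrix (for example, the Diagnosis rows) to `chi_square_independence`. The results there are more sensitive to the choice of test because the counts are small.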