Performance of large language models (LLMs) in providing prostate cancer information

BMC Urology

Table 4 Analysis of the reading notes of different LLMs

Characteristic	ChatGPT-3.5	ChatGPT-4	Google Bard	p-value
Overall (n = 52)				< 0.001
Plain English	2 (3.9%)	1 (1.9%)	12 (23.1%)
Fairly easy to read	0 (0.0%)	0 (0.0%)	2 (3.8%)
Difficult to read	26 (51.0%)	32 (61.5%)	19 (36.5%)
Fairly difficult to read	6 (11.8%)	9 (17.3%)	18 (34.6%)
Very difficult to read	17 (33.3%)	10 (19.2%)	1 (1.9%)
General (n = 9)				0.105
Plain English	2 (22.2%)	1 (11.1%)	3 (33.3%)
Fairly easy to read	0 (0.0%)	0 (0.0%)	2 (22.2%)
Difficult to read	2 (22.2%)	5 (55.6%)	0 (0.0%)
Fairly difficult to read	3 (33.3%)	3 (33.3%)	4 (44.4%)
Very difficult to read	2 (22.2%)	0 (0.0%)	0 (0.0%)
Diagnosis (n = 5)				0.044
Plain English	0 (0.0%)	0 (0.0%)	2 (40.0%)
Fairly easy to read	0 (0.0%)	0 (0.0%)	0 (0.0%)
Difficult to read	3 (75.0%)	3 (60.0%)	0 (0.0%)
Fairly difficult to read	0 (0.0%)	1 (20.0%)	3 (60.0%)
Very difficult to read	1 (25.0%)	1 (20.0%)	0 (0.0%)
Treatment (n = 27)				< 0.001
Plain English	0 (0.0%)	0 (0.0%)	6 (22.2%)
Fairly easy to read	0 (0.0%)	0 (0.0%)	0 (0.0%)
Difficult to read	13 (48.1%)	19 (70.4%)	13 (48.1%)
Fairly difficult to read	2 (7.4%)	0 (0.0%)	7 (25.9%)
Very difficult to read	12 (44.4%)	8 (29.6%)	1 (3.7%)
Screening & Prevention (n = 11)				0.245
Plain English	0 (0.0%)	0 (0.0%)	1 (9.1%)
Fairly easy to read	0 (0.0%)	0 (0.0%)	0 (0.0%)
Difficult to read	8 (72.7%)	5 (45.5%)	6 (54.5%)
Fairly difficult to read	1 (9.1%)	5 (45.5%)	4 (36.4%)
Very difficult to read	2 (18.2%)	1 (9.1%)	0 (0.0%)

Analysis of the reading notes of different LLMs across all questions (overall) and within the general knowledge, diagnosis, treatment, and screening and prevention categories.
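
The percentages in Table 4 appear to be column percentages: each count divided by the number of classified responses for that model. The minimal Python sketch below recomputes the Overall rows under that assumption. The counts are copied from the table; the dictionary layout and the function name `column_percentages` are illustrative and not taken from the article. Note that the ChatGPT-3.5 counts sum to 51 rather than 52, and the published percentages match a denominator of 51; the table itself does not state the reason.

```python
# Overall counts copied from Table 4. The data layout and function name
# are illustrative assumptions, not part of the article.
CATEGORIES = [
    "Plain English",
    "Fairly easy to read",
    "Difficult to read",
    "Fairly difficult to read",
    "Very difficult to read",
]

OVERALL_COUNTS = {
    "ChatGPT-3.5": [2, 0, 26, 6, 17],
    "ChatGPT-4": [1, 0, 32, 9, 10],
    "Google Bard": [12, 2, 19, 18, 1],
}


def column_percentages(counts):
    """Return each count as a percentage of the column total, rounded to 0.1."""
    total = sum(counts)
    return [round(100 * c / total, 1) for c in counts]


if __name__ == "__main__":
    for model, counts in OVERALL_COUNTS.items():
        # The column total is the denominator used for that model.
        print(model, "total =", sum(counts))
        percentages = column_percentages(counts)
        for name, count, pct in zip(CATEGORIES, counts, percentages):
            print(f"  {name}: {count} ({pct}%)")
```

The same function can be applied to any subgroup block of the table (for example, the Diagnosis rows) to check which denominator each column uses.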

ISSN: 1471-2490