From: Performance of large language models (LLMs) in providing prostate cancer information
Question topic / readability category | ChatGPT-3.5 | ChatGPT-4 | Google Bard | p-value |
---|---|---|---|---|
Overall (n = 52) | | | | < 0.001 |
Plain English | 2 (3.9%) | 1 (1.9%) | 12 (23.1%) | |
Fairly easy to read | 0 (0.0%) | 0 (0.0%) | 2 (3.8%) | |
Difficult to read | 26 (51.0%) | 32 (61.5%) | 19 (36.5%) | |
Fairly difficult to read | 6 (11.8%) | 9 (17.3%) | 18 (34.6%) | |
Very difficult to read | 17 (33.3%) | 10 (19.2%) | 1 (1.9%) | |
General (n = 9) | | | | 0.105 |
Plain English | 2 (22.2%) | 1 (11.1%) | 3 (33.3%) | |
Fairly easy to read | 0 (0.0%) | 0 (0.0%) | 2 (22.2%) | |
Difficult to read | 2 (22.2%) | 5 (55.6%) | 0 (0.0%) | |
Fairly difficult to read | 3 (33.3%) | 3 (33.3%) | 4 (44.4%) | |
Very difficult to read | 2 (22.2%) | 0 (0.0%) | 0 (0.0%) | |
Diagnosis (n = 5) | | | | 0.044 |
Plain English | 0 (0.0%) | 0 (0.0%) | 2 (40.0%) | |
Fairly easy to read | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | |
Difficult to read | 3 (75.0%) | 3 (60.0%) | 0 (0.0%) | |
Fairly difficult to read | 0 (0.0%) | 1 (20.0%) | 3 (60.0%) | |
Very difficult to read | 1 (25.0%) | 1 (20.0%) | 0 (0.0%) | |
Treatment (n = 27) | | | | < 0.001 |
Plain English | 0 (0.0%) | 0 (0.0%) | 6 (22.2%) | |
Fairly easy to read | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | |
Difficult to read | 13 (48.1%) | 19 (70.4%) | 13 (48.1%) | |
Fairly difficult to read | 2 (7.4%) | 0 (0.0%) | 7 (25.9%) | |
Very difficult to read | 12 (44.4%) | 8 (29.6%) | 1 (3.7%) | |
Screening & Prevention (n = 11) | | | | 0.245 |
Plain English | 0 (0.0%) | 0 (0.0%) | 1 (9.1%) | |
Fairly easy to read | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | |
Difficult to read | 8 (72.7%) | 5 (45.5%) | 6 (54.5%) | |
Fairly difficult to read | 1 (9.1%) | 5 (45.5%) | 4 (36.4%) | |
Very difficult to read | 2 (18.2%) | 1 (9.1%) | 0 (0.0%) | |
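The percentages in each block are column percentages: a model's count in a category divided by the number of responses from that model in that block. The counts show that ChatGPT-3.5 contributes 51 responses in the Overall block (and 4 in Diagnosis), so its percentages use those denominators rather than the block's n. The minimal Python sketch below, assuming the Overall counts are transcribed correctly, recomputes those percentages and runs a Pearson chi-square test of independence across the three models. The table does not state which test produced its p-values (Fisher's exact test is a common choice when expected counts are small), so the chi-square test is an assumption here and its p-values need not match the reported ones. The function names are hypothetical.

```python
from math import exp, factorial

# Counts transcribed from the Overall block of the table.
# Rows: readability category; columns: ChatGPT-3.5, ChatGPT-4, Google Bard.
MODELS = ["ChatGPT-3.5", "ChatGPT-4", "Google Bard"]
OVERALL = {
    "Plain English": [2, 1, 12],
    "Fairly easy to read": [0, 0, 2],
    "Difficult to read": [26, 32, 19],
    "Fairly difficult to read": [6, 9, 18],
    "Very difficult to read": [17, 10, 1],
}


def column_percentages(table):
    """Return per-model totals and each cell as a % of its model's total."""
    rows = list(table.values())
    col_totals = [sum(col) for col in zip(*rows)]
    pct = {
        label: [round(100 * n / t, 1) for n, t in zip(counts, col_totals)]
        for label, counts in table.items()
    }
    return col_totals, pct


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    For df = 2m the survival function has the closed form
    exp(-x/2) * sum_{i=0}^{m-1} (x/2)**i / i!.
    """
    if df % 2 != 0:
        raise ValueError("this closed form requires an even df")
    half = x / 2
    return exp(-half) * sum(half**i / factorial(i) for i in range(df // 2))


def pearson_chi2(table):
    """Pearson chi-square test of independence (categories x models).

    Categories whose counts are all zero are dropped, because their
    expected counts would be zero and they carry no information.
    """
    rows = [r for r in table.values() if sum(r) > 0]
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(col_totals)
    stat = 0.0
    for row in rows:
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / n
            stat += (observed - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, df, chi2_sf_even_df(stat, df)


if __name__ == "__main__":
    totals, pct = column_percentages(OVERALL)
    print(dict(zip(MODELS, totals)))
    for label, values in pct.items():
        print(f"{label:26s}", values)
    stat, df, p = pearson_chi2(OVERALL)
    print(f"chi2 = {stat:.2f}, df = {df}, p = {p:.2e}")
```

The recomputed percentages reproduce the Overall block, and the chi-square p-value also falls below 0.001. The closed-form survival function is used only to stay within the standard library; it works here because every block yields an even df (8 for Overall and General, 6 for the blocks where "Fairly easy to read" is all zeros and is dropped). Applying `pearson_chi2` to a subgroup such as Screening & Prevention is a matter of swapping in that block's counts.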