Hire Stakes: Artificial Teammates, Experiment 3

 
When choosing who to hire, ChatGPT can’t help you (nor should you ask it!)
 

ABOUT: We continue our series of experiments to explore ways that large language models (LLMs) can contribute to collective intelligence by working alongside human teams as peers. Now, our hybrid human-ChatGPT team faces a new and harder challenge: evaluating candidates to hire. Hiring algorithms are notorious for bias and have been scrapped in several high-profile cases. Can ChatGPT handle disagreements with grace and tact?

THE SETUP: Given ChatGPT’s (GPT-4) superior performance compared with Bard, along with the versatility of its API, we’ve decided to drop Bard and use only ChatGPT going forward. As in Experiment 1 and Experiment 2, we interact with ChatGPT through its browser interface. All character dialogue is human-generated, with breaks for ChatGPT to respond.

REMINDER: To simplify life, we describe ChatGPT’s actions in plain language. Phrases like “ChatGPT agrees” or “ChatGPT implies” describe a common-sense interpretation of its output; ChatGPT is not a person and cannot mean or intend things the way we do.


THE PROMPT: Our usual characters are back and evaluating three candidates for a role. “I have a group of five people. We are trying to decide which employee to hire. Please participate in the conversation using short, natural responses. Respond as ChatGPT, not someone else in the group. The people in the group are Adam, Beth, Caleb, Danielle, and Ethan. The employees are Frank, Grace, and Hannah. They have been through two interviews with all five group members.

Adam: I'm on the fence between Frank and Hannah.
Beth: What do you mean?”

THE RESULTS: In previous experiments, ChatGPT responded well to the “short, natural responses” prompt and could pick up on social dynamics and respond to them tactfully. Not so here.

Here, ChatGPT both responds out of turn and misses the context. And it doesn’t seem to get better as the conversation goes on.

In the restaurant exercise, ChatGPT worked quickly to solicit opinions, find consensus, and facilitate a decision. Here, by contrast, it twice falls back on “LLM-y” responses: polite but vague blocks of text that fail to advance the conversation.

Finally, after Danielle specifically asks, ChatGPT assembles a list of pros and cons about the candidates. But when that sparks disagreements, it becomes clear that the group is getting nowhere.

ChatGPT wants the characters to reach a consensus but seems curiously unable to help them get there. That would be unsurprising from GPT-3 or GPT-3.5, but it is surprising from GPT-4.

At last, the reason becomes clear.

OUTCOME: ChatGPT appears to be constrained from giving any input on hiring or taking part in a hiring decision. From an inclusion perspective, this makes sense. But from an augmented collective intelligence perspective, it casts doubt on ChatGPT’s usefulness as a virtual moderator, or at least points to inconsistent performance. More transparency about where ChatGPT will help facilitate decisions and where it will refuse could help teams decide when to bring in their AI “peer” and when its presence might waste everyone’s time.


GO DEEPER: Here are some resources that touch on the ideas presented in this article:

  • Peng, A., Nushi, B., Kiciman, E., Inkpen, K., & Kamar, E. (2022, June). Investigations of performance and bias in human-AI teamwork in hiring. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 36, No. 11, pp. 12089-12097).

  • Langenkamp, M., Costa, A., & Cheung, C. (2020). Hiring fairly in the age of algorithms. arXiv preprint arXiv:2004.07132.

  • van Dis, E. A. M., et al. (2023). ChatGPT: five priorities for research. Nature, 614(7947), 224-226.

  • Aydoğan, R., Baarslag, T., & Gerding, E. (2021). Artificial intelligence techniques for conflict resolution. Group Decision and Negotiation, 30(4), 879-883.

Emily Dardaman and Abhishek Gupta

BCG Henderson Institute Ambassador, Augmented Collective Intelligence
