Doing My Job: Artificial Teammates, Experiment 4

 
ChatGPT takes on a consulting case interview!
 

ABOUT: We’re continuing our augmented collective intelligence (ACI) experiments and kicking up the difficulty quite a bit. Remember, we’re testing the ability of ChatGPT (underpinned by GPT-4) to enhance a human team’s performance by moderating and contributing to conversations like a peer. But first, we take a step back to test its solo performance on a difficult task: a consulting case interview.

THE SETUP: Given the challenge level of the material, we’re temporarily removing the human teammates and letting ChatGPT work independently. This is a “classic ChatGPT” user experience in the web browser.


THE PROMPT: For the consulting case study, our prompts have two parts: first, instructions on how ChatGPT should respond, and second, the problems to solve. As a result, these prompts are long; feel free to skim ChatGPT’s responses below.
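To make that two-part structure concrete, here is a minimal sketch. The experiment itself ran in the ChatGPT web interface, so the OpenAI Python client framing, the instruction text, and the case question below are our own illustrative placeholders rather than the actual prompts we used.

```python
# Illustrative only: the experiment used the ChatGPT web interface; this sketch
# expresses the same two-part prompt structure with the OpenAI Python client.
# The instruction text and case question are hypothetical placeholders.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

instructions = (
    "You are a candidate in a consulting case interview. Structure your answer, "
    "state your assumptions, and flag anything you are uncertain about."
)
case_question = (
    "Our client, a regional grocery chain, has seen margins decline for three "
    "straight quarters. How would you diagnose the problem?"
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": instructions},   # part 1: how to respond
        {"role": "user", "content": case_question},    # part 2: the problem to solve
    ],
)
print(response.choices[0].message.content)
```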

A known problem with LLMs is calibration: getting them to express appropriate levels of uncertainty. For this experiment, it was important that we understand which parts ChatGPT found difficult. But we’ll admit, seeing it stumped so soon made us laugh.

We knew there was more beneath the surface.

The first time I saw the “let’s think step by step” trick was on an early version of Claude in 2022. By now, it’s well known, and for good reason: it works remarkably well.
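For readers who haven’t tried it, the trick is simply to append that phrase so the model lays out its reasoning before answering. A minimal sketch, again assuming the OpenAI Python client and a made-up question:

```python
# Minimal sketch of the "let's think step by step" trick; the question is a
# hypothetical placeholder, and the client setup mirrors the sketch above.
from openai import OpenAI

client = OpenAI()

question = "The client's revenue grew 4% last year while costs grew 9%. What happened to margin?"

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": question + "\n\nLet's think step by step."}],
)
print(response.choices[0].message.content)
```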

We’ll go into more detail on evaluating case interview performance in future experiments (hopefully with the help of a hiring professional who does that full-time), but let’s look at ChatGPT’s responses here. Evaluators look at how effectively candidates can map the problem space. Examining five dimensions with specific examples is an excellent start.

When pushed for solutions, ChatGPT is even more thorough.

From here, we flipped the process on its head and asked ChatGPT to grade itself! We provided the grading criteria and asked how it would improve its answer.
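In practical terms, this amounts to appending a follow-up turn containing the rubric to the existing conversation. A sketch under the same assumptions as above; the grading criteria shown are placeholders, not the rubric we actually provided.

```python
# Sketch of the self-grading follow-up; `case_question` and `first_answer`
# stand in for the earlier exchange, and the rubric is a hypothetical placeholder.
from openai import OpenAI

client = OpenAI()

case_question = "..."  # the original case prompt from the first sketch
first_answer = "..."   # ChatGPT's earlier answer to that prompt

messages = [
    {"role": "user", "content": case_question},
    {"role": "assistant", "content": first_answer},
    {
        "role": "user",
        "content": (
            "Grade your answer above against these criteria: structure, use of "
            "evidence, quantitative reasoning, and clarity of recommendation. "
            "Then explain how you would improve it."
        ),
    },
]

critique = client.chat.completions.create(model="gpt-4", messages=messages)
print(critique.choices[0].message.content)
```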

Finally, ChatGPT reflected on how its answers could have been improved. 

OUTCOME: The consulting case interview is not the most technically demanding task in the world, but it is an effective gauge of business acumen and generalizable problem-solving ability. If you concur, as we do, that ChatGPT’s performance here earned at least a B, it’s time to ask what the implications are for your business, including talent, products, and problem-solving processes.


GO DEEPER: Here are some resources to learn more about ideas in this article:

  • Peng, A., Nushi, B., Kiciman, E., Inkpen, K., & Kamar, E. (2022, June). Investigations of performance and bias in human-AI teamwork in hiring. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 36, No. 11, pp. 12089-12097).

  • Langenkamp, M., Costa, A., & Cheung, C. (2020). Hiring fairly in the age of algorithms. arXiv preprint arXiv:2004.07132.

  • van Dis, E. A. M., et al. (2023). ChatGPT: Five priorities for research. Nature, 614(7947), 224-226.

  • Aydoğan, R., Baarslag, T., & Gerding, E. (2021). Artificial intelligence techniques for conflict resolution. Group Decision and Negotiation, 30(4), 879-883.

Emily Dardaman and Abhishek Gupta

BCG Henderson Institute Ambassador, Augmented Collective Intelligence
