
Can ChatGPT write a proper usability test script?

Running usability tests is a core part of a UX researcher's job.

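It is a key part of user-centered product development, helping to identify and resolve problems or challenges that users may run into. However, preparing a usability test can be a time-consuming and resource-intensive process that requires developing screeners, tasks, follow-up questions, and other materials.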

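One tool that might help lighten the load is generative AI, a type of artificial intelligence that can automatically generate all kinds of content, such as art, film, and music. OpenAI recently released ChatGPT, a generative AI with a chatbot interface that can produce code, screenplays, haiku, and much more.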

But can it write a moderator's guide for a usability test?

Writing a test plan with ChatGPT

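To find out, I asked ChatGPT: "Write an outline for a usability test moderator. The product is an online brokerage platform for trading stocks and investing. The tasks are for prospects considering the service and existing customers managing their accounts."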

Here's what it returned:

1. Introduction

- Purpose of the usability test: To evaluate the usability and user experience of the online brokerage platform for trading stocks and investing.
- Overview of the moderator's role: The moderator will introduce the tasks, ask questions, and observe and record the participants' interactions with the platform.
- Information about the test environment and participants: The test will be conducted remotely, using screen sharing and video conferencing software. The participants will include prospects considering the service and existing customers managing their accounts.

2. Preparation

- Familiarization with the product: The moderator should review the platform's features and functionality, as well as any relevant documentation or user guides.
- Review of the test goals and questions: The moderator should identify the specific goals and questions for each task, and consider how to best observe and measure the participants' performance and satisfaction.
- Creation of a test plan and script: The moderator should create a detailed plan outlining the tasks, questions, and observations for each part of the test. The script should include prompts for introducing the tasks and asking open-ended questions, as well as specific questions to measure performance and satisfaction.
- Selection and setup of equipment: The moderator should ensure that the necessary recording software, cameras, and microphones are set up and working properly.
- Recruitment and scheduling of test participants: The moderator should identify and recruit appropriate test participants, and schedule the test sessions at a convenient time for each participant.

3. Conducting the test

- Welcoming and orienting the participant: The moderator should greet the participant and introduce themselves, explaining the purpose and process of the usability test.
- Introducing the task and goals: The moderator should describe the task and goals, and provide any necessary context or instructions.
- Asking open-ended questions and encouraging participants to think aloud: The moderator should encourage the participant to speak their thoughts and share their reactions as they interact with the platform.
- Observing and taking notes on participant behavior and feedback: The moderator should observe and record the participant's actions and feedback, paying attention to areas of difficulty or confusion.
- Providing assistance as needed, but avoiding influencing the participant's actions: If the participant appears to be struggling with a task, the moderator can offer help or clarification, but should avoid giving direct guidance or influencing the participant's actions.
- Following the script and asking the prepared questions: The moderator should follow the script and ask the prepared questions to gather additional data and insights.

4. Debrief and wrap-up

- Thanking the participant for their time and feedback: The moderator should thank the participant for their participation and feedback.
- Gathering any final thoughts or comments: The moderator should ask the participant if they have any final thoughts or comments about their experience.
- Stopping the recording and completing any necessary post-test tasks: The moderator should stop the recording and complete any necessary post-test tasks, such as transcribing notes or uploading recordings.
- Analyzing the data and sharing the results with relevant team members: The moderator should analyze the data and share the results with relevant team members, including recommendations for improvements or changes based on the findings.

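Of course, any real usability test should be defined in close collaboration with invested stakeholders to make sure the results are useful for their goals. But because I previously worked as a UX researcher in finance, I can at least compare the output with my experience on similar studies.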

And so far, at least... not bad!

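I've never created a document quite like this, since many of these materials usually end up living in different places. Still, one can imagine situations where keeping everything related to a study in one place would be appropriate.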

Outline: A-

When AI writes a recruitment plan

Next, I asked for guidance on sample size: "How many participants should we recruit?"

The number of participants to recruit for a usability study will depend on a variety of factors, including the goals and objectives of the study, the complexity and scope of the product or service being tested, and the resources available for conducting the study.

In general, usability studies typically aim to gather data from a relatively small number of participants, as the goal is to identify and understand common patterns and issues rather than to estimate population-level statistics. For this reason, usability studies are often designed as formative or exploratory studies rather than summative or confirmatory studies.

As a rough guide, usability studies may aim to recruit between 5 and 15 participants, depending on the complexity and scope of the product or service being tested. This is often referred to as the "rule of five," as it is generally believed that recruiting at least 5 participants can help identify the majority of usability issues and provide a sufficient sample size for analysis.

However, it is important to note that the number of participants may vary depending on the specific goals and objectives of the study, as well as the resources available for conducting the study. It may be helpful to consult with usability experts or review relevant literature to determine the most appropriate sample size for a particular study.

So, just like a real UX researcher, ChatGPT said: "It depends."

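It's a bit vague on specifics, and the distinction between formative and summative work is somewhat beside the point here. But it does give 5 to 15 participants as a rule of thumb, which is in line with industry best practice for small qualitative studies. So this answer just barely passes.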
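
The "rule of five" is usually traced to a simple problem-discovery model. If each participant independently uncovers a given problem with probability p, then n participants are expected to uncover a share of 1 - (1 - p)^n of the problems. The sketch below assumes that model. It also assumes, by default, the average p of about 0.31 often cited from Nielsen and Landauer's work. Both are assumptions. A complex product such as a brokerage platform may well have a lower p, which pushes the required sample size up. The function names are illustrative.

```python
def share_of_problems_found(n_participants: int, p_detect: float = 0.31) -> float:
    """Expected share of problems found by n participants under the
    1 - (1 - p)^n model. Assumes every problem has the same
    per-participant detection probability p_detect (an assumption)."""
    if not 0.0 < p_detect < 1.0:
        raise ValueError("p_detect must be between 0 and 1 (exclusive)")
    if n_participants < 0:
        raise ValueError("n_participants must be non-negative")
    return 1.0 - (1.0 - p_detect) ** n_participants


def participants_needed(target_share: float, p_detect: float = 0.31) -> int:
    """Smallest n whose expected share of problems found reaches target_share."""
    if not 0.0 < target_share < 1.0:
        raise ValueError("target_share must be between 0 and 1 (exclusive)")
    n = 0
    # The share climbs toward 1 as n grows, so this loop always terminates.
    while share_of_problems_found(n, p_detect) < target_share:
        n += 1
    return n


if __name__ == "__main__":
    for n in (1, 3, 5, 10, 15):
        print(f"{n:>2} participants -> {share_of_problems_found(n):.0%} of problems")
    print("participants for 85%:", participants_needed(0.85))
```

Under these assumptions, five participants are expected to surface roughly 84% of problems, and reaching 85% takes six. Treat these as rough planning figures, not guarantees.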

Sample size: C-

Next, I asked: "Can you draft a short screener for potential participants to make sure we're recruiting the right people for our study?"

1. Do you currently have an online brokerage account?
   - Yes
   - No
2. If you answered "Yes" to the previous question, which of the following firms do you have an account with? (Select all that apply)
   - Vanguard
   - Charles Schwab
   - Fidelity
   - E*TRADE
   - Other (please specify)
3. Have you used an online brokerage platform in the past?
   - Yes
   - No
4. How frequently do you trade stocks or invest online?
   - Daily
   - Weekly
   - Monthly
   - Rarely
   - Never
5. Do you have any experience with investing or trading stocks in general?
   - Yes, I have a lot of experience
   - Yes, I have some experience
   - No, I am new to investing
6. Are you currently considering opening an online brokerage account?
   - Yes
   - No

It's a start.

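Best practice is to avoid questions that directly reveal what the study is about, so that respondents can't qualify by answering dishonestly. A better approach is to ask respondents to select all the account types they hold from a list of several options, only one of which is a brokerage account.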

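There are other nitpicks. Questions 3 and 6 aren't bad on their own, but they would make more sense if they were conditional and placed next to question 1. The answer options for question 4 may leave participants unsure how to respond.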
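
The two fixes above can be expressed as branching logic. The first is hiding the study's focus inside a broader multi-select question. The second is showing follow-up questions only when they apply. This is a minimal sketch, not the author's screener. The account-type options, the firm name ExampleBroker, and the three-way customer/prospect/screened-out split are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum

# Hypothetical name for the platform under test.
STUDY_FIRM = "ExampleBroker"

# Q1 (select all that apply): brokerage is one option among several, so the
# question itself does not reveal what the study is about. Options are illustrative.
ACCOUNT_TYPES = {"checking", "savings", "credit card", "brokerage", "retirement", "none of these"}


class Segment(Enum):
    CUSTOMER = "existing customer"
    PROSPECT = "prospect"
    SCREENED_OUT = "screened out"


@dataclass
class ScreenerResponse:
    account_types: set[str]
    # Q2, shown only if "brokerage" was selected in Q1.
    brokerage_firms: set[str] = field(default_factory=set)
    # Q3, shown only if the respondent is not already a STUDY_FIRM customer.
    considering_opening: bool = False


def show_firm_question(resp: ScreenerResponse) -> bool:
    """Conditional display: ask which firms only if Q1 included brokerage."""
    return "brokerage" in resp.account_types


def show_considering_question(resp: ScreenerResponse) -> bool:
    """Conditional display: skip the 'considering' question for current customers."""
    return STUDY_FIRM not in resp.brokerage_firms


def classify(resp: ScreenerResponse) -> Segment:
    unknown = resp.account_types - ACCOUNT_TYPES
    if unknown:
        raise ValueError(f"unknown account types: {sorted(unknown)}")
    if show_firm_question(resp) and STUDY_FIRM in resp.brokerage_firms:
        return Segment.CUSTOMER
    if resp.considering_opening:
        return Segment.PROSPECT
    return Segment.SCREENED_OUT


if __name__ == "__main__":
    print(classify(ScreenerResponse({"checking", "brokerage"}, {STUDY_FIRM})).value)
    print(classify(ScreenerResponse({"savings"}, considering_opening=True)).value)
```

In this sketch, someone with a brokerage account at a different firm is treated as a prospect only if they say they are considering opening an account. Whether that matches a real study's definition of "prospect" is a decision for the researcher and stakeholders.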

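But worst of all, the final list of questions took several rounds of prompting from me, which none of the other outputs required. The first version didn't include the specific firms where participants hold accounts, an important detail for distinguishing prospects from customers. The second version used placeholder names (such as "Company A", "Company B"…) instead of real firms. Finally, I asked it to list the top four asset managers. So without human help, ChatGPT failed this test.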

Screener: F

Drafting the moderator's guide

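Before getting to the test itself, I skipped the introductory remarks a researcher typically gives at the start of a session. Many of those points, such as welcoming and orienting the participant and reminding them to think aloud, were already in the outline.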

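So I asked ChatGPT to "draft 5 open-ended questions to ask participants. These should be icebreakers or draw out the broader context of the research topic."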

Here's what came back:

1. Can you tell me a little bit about your experience with online brokerage platforms or investing in general?
2. What attracted you to this platform or service?
3. What do you hope to achieve with this platform or service?
4. Are there any specific features or functionality that you are particularly interested in or looking for in an online brokerage platform?
5. Is there anything else you'd like to share about your expectations or motivations for participating in this study?

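Question 5 is a little odd, but it's good to make sure participants have nothing more to raise before moving on. And since I never specified which platform we were testing, I can forgive the awkward repetition of "platform or service".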

Icebreakers: B

Next, I asked ChatGPT to "prepare a list of 12 potential tasks, 6 for prospects and 6 for existing customers."

Prospects:

1. Searching for and comparing different stock options
2. Setting up a new account and completing the registration process
3. Adding funds to the account and making a first trade
4. Viewing account activity and portfolio performance
5. Using the platform's research and analysis tools
6. Contacting customer support for assistance

Current customers:

1. Viewing account activity and portfolio performance
2. Making a trade
3. Viewing and managing alerts and notifications
4. Accessing and using research and analysis tools
5. Updating account settings and preferences
6. Contacting customer support for assistance or questions

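The tasks for existing customers are very reasonable. But apart from the onboarding flow, the prospect tasks are nearly identical to the existing-customer tasks.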

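Moreover, some of these tasks may not be feasible, depending on the stimulus used in the study. For example, opening a brokerage account can involve waiting hours or days for a manual review required by regulation. Registration also collects a large amount of sensitive personal information, which may not be appropriate to collect in this kind of study. So these tasks could only be completed in a prototype.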

Tasks: C-

Even so, I asked: "Can you draft scenario prompts for participants for the 12 tasks?" Here are the first three. The rest were of similar quality.

1. Searching for and comparing different stock options: "Imagine that you are a beginner investor and are looking to start building a portfolio. You are interested in a range of stocks, but you want to compare different options and make an informed decision. Using the platform, please search for and compare different stock options, and consider factors such as price, performance, and risk."

2. Setting up a new account and completing the registration process: "Imagine that you are new to this platform and are interested in setting up a new account. Using the platform, please complete the registration process and set up a new account, including any necessary steps such as verifying your identity and adding personal or financial information."

3. Adding funds to the account and making a first trade: "Imagine that you have just set up a new account on this platform and are ready to start investing. Using the platform, please add funds to your account and make your first trade, considering factors such as the amount of funds you want to add, the type of asset you want to trade, and any associated fees or risks."

For open-ended tasks, these are fine.

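But if we want comparable task-success data across participants, the instructions need to be specific enough that participants know when they have completed the task. Constraining the scenario further could also help, so that participants don't vary wildly in how much time they spend or how deeply they explore the product.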
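
When each task has a clear end state, success can be recorded as a simple pass/fail per participant and compared across tasks. The sketch below computes a completion rate. It also computes, as one swapped-in option not mentioned in the article, an adjusted Wald (Agresti-Coull) 95% interval, which is often recommended for the small samples typical of usability tests. The function names and the outcome data are hypothetical.

```python
from __future__ import annotations

import math


def completion_rate(outcomes: list[bool]) -> float:
    """Share of participants who completed the task (True = success)."""
    if not outcomes:
        raise ValueError("need at least one outcome")
    return sum(outcomes) / len(outcomes)


def adjusted_wald_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Adjusted Wald (Agresti-Coull) interval for a completion rate.
    z = 1.96 gives an approximate 95% interval; bounds are clipped to [0, 1]."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("require n > 0 and 0 <= successes <= n")
    n_adj = n + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    margin = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)


if __name__ == "__main__":
    # Hypothetical pass/fail results for one task across five participants.
    make_a_trade = [True, True, False, True, True]
    low, high = adjusted_wald_interval(sum(make_a_trade), len(make_a_trade))
    print(f"rate {completion_rate(make_a_trade):.0%}, 95% CI {low:.0%} to {high:.0%}")
```

With these made-up outcomes (4 of 5), the rate is 80%, but the interval spans roughly 36% to 98%. Small-sample success rates are better read as directional than precise.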

Test scenarios: B

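Then I asked it to "create a set of 2 to 5 follow-up questions to ask participants after a task. Include some open-ended questions and rating scales."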

Open-ended questions:

- How did you feel about the task overall?
- What did you find most helpful or useful about the platform during this task?
- Were there any areas of the platform that you found confusing or frustrating?
- Is there anything else you'd like to share about your experience with the task?

Rating scales:

- On a scale of 1 to 5, how satisfied were you with the task? (1=Very Unsatisfied, 5=Very Satisfied)
- On a scale of 1 to 5, how easy or difficult was the task? (1=Very Difficult, 5=Very Easy)
- On a scale of 1 to 5, how likely are you to use this feature in the future? (1=Very Unlikely, 5=Very Likely)

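The second question is a bit leading: it assumes the participant found something helpful or useful on the platform, which may not be the case. The other questions are fairly neutral and open-ended, and similar to ones I've used.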

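I've used variations of these rating-scale questions in UX testing before, and in my experience perceived ease is the most valuable. Satisfaction ratings tend to track ease ratings closely, so they rarely add much information. Asking participants to predict future use is unreliable, although following up on why they gave a rating can yield useful insights.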

Post-task questions: C+

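To wrap up, I asked ChatGPT to "end the study with 4 or 5 questions. These should include some open-ended questions and some rating scales, aimed at assessing the overall experience."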

Open-ended questions:

- What did you like most about the platform or service?
- Were there any features or functionality that you found particularly useful or beneficial?
- What could be improved about the platform or service in your opinion?
- Is there anything else you'd like to share about your experience with the platform or service?
- Would you recommend this platform or service to a friend or colleague? Why or why not?

Rating scales:

- On a scale of 1 to 5, how satisfied were you with the platform or service overall? (1=Very Unsatisfied, 5=Very Satisfied)
- On a scale of 1 to 5, how likely are you to continue using this platform or service? (1=Very Unlikely, 5=Very Likely)

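Again, question 1 assumes the participant liked something about the platform. It could be made more neutral by rephrasing it along the lines of "What, if anything, did you like most…".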

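I've never asked an NPS-style question without the rating scale, but I've found that the follow-up question usually provides the most useful data. Overall CSAT is a common metric at the end of a study, but the other rating scale again asks participants to predict the future. Beyond that, I've used versions of the other open-ended questions in my own studies.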
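
For reference, an NPS-style question is normally scored on a 0-10 scale. A 9 or 10 counts as a promoter, a 7 or 8 as passive, and 0 through 6 as a detractor. The score is the percentage of promoters minus the percentage of detractors. The sketch below applies that standard calculation. It also groups each participant's "why" answer by category, since the follow-up is where the useful data tends to be. The function names and sample data are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def nps_category(score: int) -> str:
    """Standard NPS buckets on a 0-10 likelihood-to-recommend scale."""
    if not 0 <= score <= 10:
        raise ValueError("NPS scores run from 0 to 10")
    if score >= 9:
        return "promoter"
    if score >= 7:
        return "passive"
    return "detractor"


def net_promoter_score(scores: list[int]) -> float:
    """Percentage of promoters minus percentage of detractors (-100 to 100)."""
    if not scores:
        raise ValueError("need at least one score")
    categories = [nps_category(s) for s in scores]
    promoters = categories.count("promoter")
    detractors = categories.count("detractor")
    return 100.0 * (promoters - detractors) / len(scores)


def group_follow_ups(responses: list[tuple[int, str]]) -> dict[str, list[str]]:
    """Collect each participant's 'why or why not' answer under their NPS category."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for score, why in responses:
        grouped[nps_category(score)].append(why)
    return dict(grouped)


if __name__ == "__main__":
    sample = [(10, "Trading was fast"), (6, "Fees were unclear"), (8, "Fine, nothing special")]
    print(net_promoter_score([s for s, _ in sample]))
    print(group_follow_ups(sample))
```

With five participants, each one moves the score by 20 points. That is another reason to lean on the follow-up answers rather than the number itself in a small study.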

Overall study questions: B

As a final test, I asked ChatGPT: "Are there any standard validated scales that would be appropriate to include in this kind of UX study?"

Some examples of standard validated scales that may be appropriate to include in a usability study include:

- The System Usability Scale (SUS): This is a widely used scale that measures overall usability and user satisfaction. It consists of 10 items, each rated on a 5-point scale, with higher scores indicating higher usability and satisfaction.
- The Task-Specific Satisfaction (TSS) scale: This scale measures satisfaction with specific tasks or features within a product or service. It consists of 5 items, each rated on a 5-point scale, with higher scores indicating higher satisfaction.
- The Single Ease Question (SEQ): This is a single item scale that measures overall ease of use and satisfaction. It consists of a single question, rated on a 5-point scale, with higher scores indicating higher ease of use and satisfaction.
- The User Experience Questionnaire (UEQ): This scale measures various aspects of the user experience, including usability, aesthetics, and emotional engagement. It consists of 16 items, each rated on a 7-point scale, with higher scores indicating higher user experience.

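Interesting! SUS and SEQ are both common and appropriate metrics, but I had never heard of the other two. I was interested to learn about the UEQ, but as far as I can tell, ChatGPT made up the TSS out of thin air.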
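
Of the scales ChatGPT listed, SUS has a well-documented scoring procedure:

- Each of the 10 items is answered on a 1-5 scale.
- Odd-numbered (positively worded) items contribute the response minus 1.
- Even-numbered (negatively worded) items contribute 5 minus the response.
- The sum is multiplied by 2.5, giving a score from 0 to 100.

The sketch below assumes responses arrive in questionnaire order, and its function names are illustrative. The other descriptions do not match the published scales either. The SEQ is usually a single 7-point item, and the full UEQ has 26 items.

```python
from __future__ import annotations

from statistics import mean


def sus_score(responses: list[int]) -> float:
    """Score one respondent's System Usability Scale answers on a 0-100 range.
    Assumes responses are in questionnaire order, item 1 first."""
    if len(responses) != 10:
        raise ValueError("SUS has exactly 10 items")
    if any(not 1 <= r <= 5 for r in responses):
        raise ValueError("each SUS item is answered on a 1-5 scale")
    total = 0
    for item_number, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (r - 1) if item_number % 2 == 1 else (5 - r)
    return total * 2.5


def mean_sus(all_responses: list[list[int]]) -> float:
    """Average SUS score across participants."""
    return mean(sus_score(r) for r in all_responses)


if __name__ == "__main__":
    participant = [4, 2, 4, 1, 5, 2, 4, 2, 4, 1]
    print(sus_score(participant))
```

A SUS score is not a percentage, even though it runs from 0 to 100.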

UX scales: F (for dishonesty)

Conclusion

Generative AI is rapidly disrupting how art, code, blog posts, and student essays are produced.

But is it ready to take over UX research? To find out, I asked ChatGPT to produce a UX test script.

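  • **The results were impressive.** At a minimum, ChatGPT produced usable first drafts of a screener, task scenarios, open-ended questions, follow-up questions, rating scales, and metrics. When prompted, it made adjustments and corrections, such as adding specific firms to the screener.
  • But the drafts are still imperfect and sometimes miss the mark. It suggested a UX metric that doesn't exist, some of its phrasing was awkward or too direct, and it didn't always order questions in the most natural or coherent way. It also proposed tasks that might require a prototype to complete.
  • ChatGPT never asked me the clarifying question any human researcher would: what are the goals of the study? UX researchers weigh every decision in designing and running a study against how stakeholders will use the results.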

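Perhaps in the future, generative AI tools will become sophisticated enough to produce flawless test plans, freeing researchers to spend more time on other parts of their role. But for now, anything they generate needs careful review.

That said, all the materials shown here were produced in under 10 minutes. That speed could make ChatGPT a useful tool for ideation and first drafts, as long as the human researcher has enough experience to evaluate the output critically.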

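For now, it's exciting to think of generative AI as a tool that helps researchers work more efficiently. But I don't think ChatGPT will be taking our jobs anytime soon.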

Original article: https://uxdesign.cc/can-chatgpt-write-a-proper-usability-test-script-f8f6f89b5a2e

Copyright notice: This content was contributed by a registered TeHub user; copyright belongs to the original author.

