
Generative AI Literacy

Why Evaluate GenAI Output?

Simple answer: GenAI tools do not always generate credible, reliable information. Your academic integrity and your reputation as an employee can both be harmed by using GenAI output without careful evaluation.

The same principles for evaluating information sources apply to generative AI. Frameworks such as the CRAP/CRAAP test and the SIFT method can help you determine whether the information you've found is reliable.

However, some of the questions we typically ask about sources can be harder to answer for generative AI output, because the process these tools use to arrive at their answers is not public.

Consider the following concerns as good reasons to carefully evaluate GenAI output:

GenAI relies on training data from large datasets, including the web

  • Inaccurate training data can lead to strange and inaccurate results.
  • Look for other reliable sources to corroborate the AI’s claims.
  • Try to find alternative sources that cover the same topic, or even the original context that a claim came from (these are principles F and T of the SIFT test).

GenAI is prone to hallucinations

  • You can ask a generative AI tool to cite its sources, but it is known to create very convincing fake citations.
  • It can even create citations that have the names of real researchers who study the topic you've asked about.
  • However, the article named in the citation might not exist, or might not appear in the cited journal. These invented citations are referred to as “hallucinations.”
  • You’ll need to search to confirm these articles actually exist.
  • See the Doing Research guide, or check in with Research help at the library for tips on how to do this!
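One manual check is to search a library database or scholarly index for the cited title and see whether anything close actually exists. As a rough illustration only (not an official library tool), the sketch below uses fuzzy string matching to flag when none of your search results resemble the title the AI cited; the citation and search results here are examples, not real query output.

```python
from difflib import SequenceMatcher

def title_similarity(claimed: str, found: str) -> float:
    """Return a 0..1 similarity score between two article titles,
    ignoring case and extra whitespace."""
    def norm(s: str) -> str:
        return " ".join(s.lower().split())
    return SequenceMatcher(None, norm(claimed), norm(found)).ratio()

def looks_like_match(claimed: str, candidates: list[str], threshold: float = 0.85) -> bool:
    """True if any candidate title from a database search closely
    matches the title the AI cited."""
    return any(title_similarity(claimed, c) >= threshold for c in candidates)

# Hypothetical example: the AI cited a title; a database search returned these.
cited = "The Curse of Recursion: Training on Generated Data Makes Models Forget"
search_results = [
    "The Curse of Recursion: Training on Generated Data Makes Models Forget",
    "Model Collapse in Large Language Models",
]
print(looks_like_match(cited, search_results))  # an exact hit scores 1.0
```

A low score does not prove a citation is fake (titles get abbreviated, databases differ in coverage), but it is a useful prompt to dig further before citing the source yourself.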

GenAI training data is not always current

  • Currency (when a document was created, edited, updated, or revised) is an important factor in evaluating any information source.
  • If you need recent information on a world event or a new development in research, generative AI may not have that information in its dataset. As of October 2023, if you ask ChatGPT (GPT-3.5) how recent its training data is, it will tell you that its information comes from 2021 and that it cannot pull current information from the internet.

Academic Integrity and Personal Integrity

  • As a student, you're expected to use credible, reliable sources to enhance your work.
  • As an employee, you're depended on to provide credible, reliable sources when asked.
  • Using GenAI-generated content without careful evaluation can undermine your integrity in both roles.

The Curse of Recursion: AI Training on AI-Generated Content

As more AI-generated text is published, this content will eventually enter the training datasets for new generations of AI. This may degrade data quality, as errors made by earlier generations of AI compound over time.

This idea was proposed and tested by Shumailov et al. (2023) in their paper “The Curse of Recursion: Training on Generated Data Makes Models Forget.” They found that the inclusion of AI-generated content in training datasets led to what they call model collapse: "a degenerative process whereby, over time, models forget the true underlying data distribution, even in the absence of a shift in the distribution over time" (p. 2).
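The mechanism can be illustrated with a toy numerical sketch (this is our simplification, not the authors' language-model experiment): treat each "model" as a simple normal distribution fitted to samples drawn from the previous generation's fit, and watch the estimated spread drift away from the true one as fitting errors compound.

```python
import random
import statistics

def simulate_collapse(generations: int = 500, sample_size: int = 20, seed: int = 0):
    """Toy illustration of model collapse: each 'model' is a normal
    distribution fitted to data sampled from the previous model.
    Small-sample estimation error compounds generation after generation."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0          # generation 0: the "true" distribution
    history = [sigma]
    for _ in range(generations):
        samples = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        mu = statistics.fmean(samples)       # refit on generated data
        sigma = statistics.pstdev(samples)   # population std (an MLE-style fit)
        history.append(sigma)
    return history

spread = simulate_collapse()
print(f"true spread: {spread[0]:.2f}; after {len(spread) - 1} generations: {spread[-1]:.2e}")
```

With small samples, the fitted spread tends to shrink dramatically over many generations, mirroring how models trained on their own output lose the tails (the rare, diverse cases) of the original distribution.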

We're already seeing authors and creators using AI in various formats across many sites, including video and text-based content. As this becomes more widespread, we'll need to continue to evaluate content carefully to ensure its authenticity and accuracy.

Further reading:

Shumailov, I., Shumaylov, Z., Zhao, Y., Gal, Y., Papernot, N., & Anderson, R. (2023). The Curse of Recursion: Training on Generated Data Makes Models Forget. arXiv. https://doi.org/10.48550/arXiv.2305.17493

Tools for Reviewing GenAI Content

The CRAP Test is a set of criteria you can use to determine a website's reliability, accuracy, and legitimacy. Read through the information here, and watch the video for a fun look at evaluating websites.

Problems viewing? Watch Evaluating Websites (2:03 min) by Seneca College Libraries on YouTube


Currency

  • Is the date of publication appropriate?
  • Is all of the content up-to-date?
  • Has the information been revised?

Relevance

  • Does the information relate to your topic or answer your question?
  • Who is the intended audience?
  • Have you looked at a variety of resources to determine this is the one that is best suited for your assignment?

Authority & Accuracy

Authority

  • Who is the author/publisher/editor/sponsor?
  • What are the author's credentials or affiliations?
  • Is the author qualified to write on the topic?
  • Is there contact information, such as a publisher or email address?
  • Does the URL reveal anything about the source (e.g., .edu, .gov, .org)?

Accuracy

  • Where does the information come from, and is it backed by evidence?
  • Has the information been reviewed or refereed?
  • Does the language or tone seem unbiased?
  • Are there spelling or grammar mistakes?

Purpose / Point of View

  • What is the purpose of the information?
  • Is it to teach, to inform, to sell, or to entertain?
  • Do the authors/sponsors make their purposes clear?
  • Is the information fact or opinion?
  • Are there any biases?

Source: Adapted from "Evaluating Information: The CRAAP Test" by McMaster University Health Sciences Library, CC BY-NC-SA. / A derivative of the CRAAP test developed by Meriam Library, California State University, Chico, CC BY 4.0.

SIFT is a four-step technique developed by Mike Caulfield of the University of Washington to assess information; it is appropriate for anyone who engages in information-seeking behaviour.

Problems viewing? Watch SIFT The Four Moves: Evaluating Generative AI Content (4:38 min) by The Learning Portal / Le Portail d’Apprentissage on YouTube

Stop

Pause to consider the credibility of the AI-generated content before you copy or share it.

Ask Yourself: Does this output contain relevant information for my assignment? What kinds of sources did the text generator retrieve? Is the generated information based on fact, opinion, or something else? Does the information contain diverse perspectives and viewpoints? How current is the information cited?

Investigate the Source

Artificial intelligence is known to generate fictitious information. Fact-check or double-check the content before using it.

Ask Yourself: Do you recognize any of the authors, publishers, or websites? Are there citations? If yes, can you find the cited sources in library databases? Can you verify the accuracy of the information? Can you identify bias in the information generated by the AI, or sources represented in the text?

Find Better Coverage

Locate the best evidence on your topic by searching for additional sources on library databases and search engines.

Ask Yourself: Is this the best information available on my topic? Can I locate sources that are more current than those generated by AI? Can I search for similar ideas or related topics in other peer-reviewed journals?

Trace Claims, Quotes, and Media to the Original Context

Instead of relying on Artificial Intelligence to provide you with the full picture, track down the source and then review the content to determine if the information is suitable for your assignment.

Ask Yourself: Can I access the books, articles, websites, and other sources cited by the AI? Are the data, facts, and details it generated accurately represented?

Source: The SIFT Method was reused from The Learning Portal, licensed under CC BY-NC 4.0. / A derivative of SIFT (The Four Moves) by Mike Caulfield, CC BY 4.0.
Attribution from source: This content was adapted, with permission, from "Evaluating GenAI Content" by Seneca Polytechnic Libraries, along with information from "Evaluate AI Generated Content" by Sheridan Library & Learning Services, and is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.


Attribution

Evaluating AI Output is adapted from "Evaluating Information Sources" and "Generative AI and ChatGPT" by The University of British Columbia Library, licensed under CC BY-NC 4.0. / Adapted to the Georgian context. Headings updated and information remixed to focus on evaluating content. RADAR test replaced by the CRAP/CRAAP test (openly licensed).

The SIFT Method was reused from The Learning Portal, licensed under CC BY-NC 4.0.
Attribution from source: This content was adapted, with permission, from "Evaluating GenAI Content" by Seneca Polytechnic Libraries, along with information from "Evaluate AI Generated Content" by Sheridan Library & Learning Services, and is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.