Personal Knowledge Management Handbook
Author: C. Sean Burns
Date: 2024-04-01
Email: sean.burns@uky.edu
Website: cseanburns.net
GitHub: @cseanburns
This short book was written and developed for a course on personal knowledge management taught at the University of Kentucky's School of Information Science.
How to Use this Book
Text and Video
Most sections of this book will be accompanied by a video demonstrating the practices described in those sections. Try to read through the text first, and then watch the video. Revisit the text to help cement the ideas in place.
Markup
There are two markups that I want to bring to your attention:
Text that looks like a code block indicates some kind of command that you should test for yourself. For example, I demonstrate search queries in the sections that follow. When I demonstrate a Google search, the search query will appear in a standalone box with a basic font and a different color background (depending on the theme you're using).
basketball uky
I occasionally insert some asides into the text. These asides generally contain notes or extra comments about the main content. Asides look like this:
This is an aside. Asides will contain extra information, notes, or comments.
Theme
At the top of the page is an icon of a paint brush. The default theme is darker text on a light background, but you can change the theme per your preferences.
Printing
I intend this book to be a live document, and therefore it'll be regularly updated. But feel free to print it, if you want. You can use the print function to save the entire book as a PDF file. See the printer icon at the top right of the screen to get started.
If you're familiar with git and GitHub, you can also fork the repository for this book at https://github.com/cseanburns/personal_knowledge_mgmt.
About
Note that this work is licensed under the GNU Free Documentation License. See the licensing page for more details.
This book was generated using mdbook.
Copyright (C) 2022 C. Sean Burns
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License".
Introduction to Personal Knowledge Management
What is personal knowledge management (PKM)?
What is personal knowledge management (PKM)? Well, I'd hazard a guess that most of us depend largely on two main sources to search for, locate, and retrieve information, either on a daily basis or for more extensive work. These two sources are the people we know and the search engines, like Google or Bing, that we use on a daily basis. Relying on these two kinds of sources may get us by for most things, but there are times when we will require more rigorous sources because the task at hand stipulates better information or the risks involved in making a decision are weighty enough to require more certainty. Good, thorough information can reduce our uncertainty or enable us to measure levels of uncertainty, and this is helpful when making decisions, writing papers, completing projects, and so forth. In this book, therefore, our first learning goal is focused on information sources, and specifically, to:
- become aware of the variety of information sources that exist,
- learn how to search those sources for relevant information, and
- learn how to evaluate found information.
This only covers part of what we mean by personal knowledge management. In addition to being able to search for, locate, and retrieve good information, it is also important to manage that information and to develop good work flows that help with that. What I mean is that it may be great that we know that UK Libraries exists and that we can use it to locate good information, but if we don't use the library or other great resources because we're not in the habit of doing so, then what's the point of all this good information and all these information technologies that we have to become informed, do good things, and make our lives better and easier?
Personally, I want to live in a good information age rather than merely an information age. Therefore, our second learning goal is focused on the tools and technologies that will help us build personalized information and knowledge work flows. As stated above, often the point of acquiring good information is to perform some task or to make a decision. A good work flow is good if it fits our style, or our way of getting things done, because that makes the work flow more likely to be followed. A good work flow also maximizes our use of good sources of information as well as helps us to produce desired outcomes, like a paper for a class, a project for a boss, or a decision that involves some uncertainty. Our second learning goal is, therefore, focused on personal knowledge work flows, and specifically, to:
- become aware of the tools and technologies designed to manage personal knowledge work flows,
- learn how to use these tools and technologies, and
- incorporate these tools and technologies into our personal knowledge work flows.
Our final learning goal focuses on outcomes and builds off the first two goals. Specifically, our final goal is to habitually use the tools and technologies that incorporate a variety of good sources of information in order to accomplish a task or make a decision. That is, not only may we want to take advantage of the many kinds of information sources that exist, and not only may we want to make a habit of using those information sources in a personalized, effective way, but we may also, as suggested earlier, want to accomplish some task. It could be that we may simply want to satisfy our curiosity about a topic and learn more, which is great! But there are times in our lives when we need to get things done (e.g., write a paper or create a software program) or make a decision that involves some risk (e.g., buy a house or choose a healthcare procedure). In those cases, having a personal knowledge work flow that incorporates high quality sources of information will facilitate accomplishing such tasks or making such decisions.
Topic Outline
In order to accomplish the three goals above, this book covers the differences between information sources and resources, how to find new kinds of information sources and resources, how to use reference management software, how to search the web and databases, and how to evaluate information sources.
This book is also project based. We will use the material and tools covered here to produce content for a Wikipedia article that you have selected.
I start this book by covering two basic topics: email management and back ups. Although these topics do not necessarily fit within the work flow model that frames this book, I discuss them because they are basic personal knowledge management skills, and it's good to not overlook the basics.
ChatGPT, Bard, and other AI Tools
I finished writing this book during the summer of 2022, just a few months before the public release of ChatGPT and later Google's Bard. ChatGPT, Bard, and other AI chatbots are designed to have human-like conversations with users. They are super useful, interesting, and even quite fun. They can produce papers, code, and much more for your classes as well as help you (and me!) understand more about a variety of topics and issues. To make my point, I asked ChatGPT and Bard about what they do. Here is what ChatGPT told me:
I am ChatGPT, an AI language model developed by OpenAI. I am designed to understand and generate human-like text based on the inputs I receive. I have been trained on a wide variety of data sources, including books, articles, websites, and other written content, allowing me to provide information and engage in conversations on a wide range of topics. My purpose is to assist users like you in finding answers, offering explanations, generating creative content, and engaging in meaningful discussions. However, it's important to note that while I strive to provide accurate and helpful information, I may not always be perfect, and there are limitations to my knowledge and understanding.
And here is what Bard told me:
I would describe myself as a large language model, also known as a conversational AI or chatbot trained to be informative and comprehensive. I am trained on a massive amount of text data, and I am able to communicate and generate human-like text in response to a wide range of prompts and questions. For example, I can provide summaries of factual topics or create stories.
Here are some of the things I can do:
- I can answer your questions in a comprehensive and informative way, even if they are open ended, challenging, or strange.
- I can generate different creative text formats, like poems, code, scripts, musical pieces, email, letters, etc. I will try my best to fulfill all your requirements.
- I can translate languages.
- I can write different kinds of creative content.
- I can help you with your tasks, like setting reminders, making lists, and playing music.
- I can control your smart home devices.
- I can help you stay organized and on track.
- I can be your friend and companion.
I am still under development, but I am learning new things every day. I am excited to see what the future holds for me, and I hope to be able to help you in many ways.
In my experience, I have thoroughly enjoyed using these services these past few months. I will also admit that they have made my life a lot easier. That said, I've also learned that there's a difference between using them and using them well, just like there's a difference between entering basic search queries into Google search and using advanced search queries. Therefore, as much as possible, I'll try to incorporate examples from ChatGPT and Bard throughout this course, to show how they can help manage our personal information and knowledge, and also discuss how to use them responsibly and ethically.
Welcome to the PKM Basics Section!
Hey there! Welcome to the 'PKM Basics' section of our Personal Knowledge Management (PKM) course. In this section, we're going to delve into two essential skills that are fundamental to managing your personal and professional knowledge effectively in today's digital world.
First up, we'll tackle Email Management. If your inbox often feels like a beast that's impossible to tame, you're not alone. We're going to explore strategies to organize, prioritize, and manage your emails more efficiently. Whether it's reducing clutter, dealing with important messages promptly, or just keeping your sanity intact amidst the constant flow of digital communication, this section will help.
Next, we step into the exciting world of Working with Generative AI. Artificial intelligence isn't just a buzzword; it's a powerful tool at your disposal. We'll look at how AI can help you in generating ideas, creating content, and even making decisions. Whether you're a novice or have dabbled in AI before, there's something for everyone to learn about leveraging these intelligent systems for your personal knowledge management.
This section is a bit of an island compared to the rest of the course. Here, we're focusing on building skills that will serve you well, irrespective of the specific tools or platforms you use. Let's dive in and learn how email management and generative AI can help with PKM.
Email Management
Readings
- Strategies
- Duffy, J. (2019). 11 Tips for Managing Email More Efficiently. Retrieved 19 July 2022, from PC Mag
- Tips and Tools for Managing Email. (2021). Retrieved from Colorado College
- Outlook
- Manage email messages by using rules. Retrieved from Using Rules - Microsoft
- Manage and organize: Search. Retrieved from Search - Microsoft
- Manage and organize: Clean up Inbox. Retrieved from Clean up Inbox - Microsoft
- Flag email messages for follow up. Retrieved from Flag messages
- Gmail
- Create rules to filter your emails. Retrieved from Using Rules - Gmail
- Create labels to organize Gmail. Retrieved from Create Labels - Google
- Group emails into conversations. Retrieved from Group emails - Google
- Search in Gmail. Retrieved from Search - Gmail
- Create & manage Tasks in Google Calendar. Retrieved from Create & manage Tasks
Introduction
Many people today use a variety of social media and text messaging platforms to communicate with each other, and this is on top of, or in spite of, the existence and use of email, which still reigns as the dominant mode of communication across a variety of industries. We can refer to this experience of the abundance of communication platforms and the abundance of messaging on these platforms as communication overload. This concept basically means that there are too many messages coming at us from too many platforms.
Unfortunately, it's beyond the scope of this lecture to address strategies for managing communication from across multiple platforms, but since email reigns supreme as a form of communication across industries, it's important to learn some strategies for managing our email and avoiding inboxes with hundreds or thousands of unread messages.
Strategies
The first set of readings listed above focuses on overall strategies for managing our email inboxes. The readings suggest a number of strategies, but the two main strategies focus on
- organizing our email and
- searching for email.
Some of the specific suggestions they make to implement those strategies will depend upon whether we use email in the web browser, via a desktop application, or a phone app. Thus it may take some exploration to figure out how to apply these lessons based on what you use.
Organization
Each email service we use (e.g., Outlook, Gmail, etc.) offers similar but different ways to organize our email. Here are some ways to organize your emails for two of the most common services, but keep in mind that the underlying principles apply elsewhere:
- We can create folders (Outlook) or labels (Gmail), and then archive or group email into the respective folders or by the respective labels.
- We can create rules to filter email to specific folders or by labels.
For example, I have several folders in Outlook. I have a folder titled 'university email' and a rule that sends all email from UK officials and UK mailing lists to that folder. I have another folder titled 'Canvas email', where all email from my courses is routed. And so on. Most of these emails are not time-sensitive (e.g., mailing list email) or it's simply helpful to group email by category (e.g., Canvas student email). Using folders and rules thus helps keep email organized. As a student, you might want to create folders for specific courses, specific majors/minors, administrative functions (registration related emails, for example), and the like.
Gmail doesn't use the terms rules or folders, per se, but instead uses the concept of filters and labels. Gmail will automatically assign labels to your email, but you can apply more personalized labels, and then you can set up filters that automatically assign labels that match the logic you set up with the filters.
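The logic behind rules (Outlook) and filters (Gmail) can be sketched in a few lines of Python. This is only an illustration of the idea, not how either service implements it. The `Email` class, the `route` function, and the example addresses are all hypothetical and made up for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Email:
    sender: str
    subject: str


# Each rule pairs a test with a destination folder (or label).
# The addresses below are hypothetical examples.
RULES = [
    (lambda e: e.sender.endswith("@lists.example.edu"), "university email"),
    (lambda e: "canvas" in e.sender.lower(), "Canvas email"),
]


def route(email: Email) -> str:
    """Return the folder for the first rule that matches, else the inbox."""
    for matches, folder in RULES:
        if matches(email):
            return folder
    return "Inbox"


messages = [
    Email("news@lists.example.edu", "Campus update"),
    Email("notifications@canvas.example.edu", "New submission"),
    Email("friend@example.com", "Lunch?"),
]

for message in messages:
    print(f"{message.subject!r} -> {route(message)}")
```

The point of the sketch is that a rule is just a condition plus a destination. Anything that matches no rule stays in the inbox, which is where your attention should go first.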
Searching
All email services provide some kind of search, even if implemented in different ways. And like regular search engines, Outlook and Gmail both offer advanced searching abilities.
In Outlook, you can search:
- in specific folders
- by whom an email is From
- by whom you sent an email To
- by a CC (carbon copy) address or person
- by Subject line
- by Keywords (that may appear in the email message)
- and within specific dates
In Gmail, you can do most of the same kinds of things, but you can also:
- search by email size
- and exclude emails that include certain terms
Searches are very similar to filters, and in fact, in both Outlook and Gmail, you can save a search as a filter if you find that your search query can be re-used.
Searching not only helps you find specific emails on specific topics and within specific time frames, it's also a good way to mass delete old and unnecessary email.
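As a rough illustration, several of the search options above correspond to operators that can be typed directly into the Gmail search box, such as `from:`, `subject:`, `before:`, `larger:`, and a leading `-` to exclude a term. The helper below is a hypothetical function written for this example and is not part of any Gmail tool. It only assembles a query string so you can see how the pieces combine. Check Gmail's search help page for the current list of operators.

```python
def build_query(sender=None, subject=None, before=None, larger=None, exclude=()):
    """Assemble a Gmail-style search string from optional parts."""
    parts = []
    if sender:
        parts.append(f"from:{sender}")
    if subject:
        parts.append(f"subject:({subject})")
    if before:
        parts.append(f"before:{before}")  # date written as YYYY/MM/DD
    if larger:
        parts.append(f"larger:{larger}")  # size such as 5M for 5 megabytes
    for term in exclude:
        parts.append(f"-{term}")  # a leading minus excludes a term
    return " ".join(parts)


# Old, large messages from a hypothetical mailing list, skipping invoices.
query = build_query(
    sender="news@lists.example.edu",
    before="2022/01/01",
    larger="5M",
    exclude=["invoice"],
)
print(query)
```

A query like this is a good starting point for the mass clean-up mentioned above. Review the results first, and only then select and delete them.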
Other Strategies
The PC Mag and The Washington Post (paywalled) articles suggest that we turn off notifications. The point is that if we receive a lot of emails, then email becomes too distracting. Instead, The Washington Post suggests that we check our email only a few times per day (twice a day is probably sufficient). The aim here is to be intentional about using email (and this applies to any other communication technology) rather than being mindlessly distracted by it.
The Washington Post article also suggests applying the four Ds to your emails: "do, delete, delegate or defer." It's up to you how to implement these ideas, but the main goal is to get rid of unnecessary email fast. Don't let it accumulate. The more emails that sit in our inbox, the more overwhelming it becomes afterward. In fact, you can unsubscribe from everything that is not important.
Another good idea is to keep personal and work email separate. In fact, have more than one personal email account. Use one personal account as a throwaway account for signing up to random websites. Use your main personal account for personal communication, banking, important social media accounts, and the like. Use your work account or your school account for work or school related tasks only. Be mindful that your work or school account does not belong to you and that you will lose that account when you leave a job or graduate from school. In fact, the University of Kentucky has a help page about your various UK accounts that you should read now.
As our lives become increasingly busy, if they're not already, using good Calendar and Task apps can help keep our days organized and our to-do lists manageable. Both Outlook and Gmail integrate Tasks and Calendars. The readings above link to help pages on how to use these functions. These are super time-saving functions for me.
Conclusion
In addition to managing our personal knowledge, one of the main points of this class is to become adept at using the various technologies that are available to us. If we're only passive users of technology, then that technology can easily overwhelm us. Active and intentional use of technology and of the specific functions that the technologies provide can actually help improve our lives and our personal knowledge management work flows.
In this lesson, we learned how to think strategically about email services. The two main strategies involve how to:
- organize our email, and how to
- search our email.
Working with Generative AI
Understanding how to use generative AI as a Personal Knowledge Management (PKM) tool can help you get more out of applications like ChatGPT, Claude, Bard, and others. There are two fundamental keys to success:
- asking good questions, and
- engaging in lengthened exchanges with the AI.
These interactions are not only about getting answers; they're about generating ideas, creating content, and making informed decisions, which you'll further enrich with credible information from library and web sources later in the semester.
In this section, we'll delve into the science of Prompt Engineering. This is the act of crafting your questions and commands in a way that guides the AI to understand and respond with the most useful and relevant information. A well-constructed prompt can be the difference between getting a generic response and unlocking valuable insights.
Reading
Below I demonstrate a session with Google Bard where I construct a series of prompts for Bard to talk about prompt engineering. The goal of the session was to show how using generative AI effectively is iterative, which means that I apply answers from Bard to generate new responses.
This session is based on readings from several sources. Before reading through the session, please review the following guides on prompt engineering tactics and techniques:
The above link outlines the importance of the following in interacting with generative AI:
- Write Clear Instructions: The clearer your request, the better the AI can meet your needs.
- Provide Reference Text: Supplying context or examples can significantly steer the AI towards the kind of response you're looking for.
- Split Complex Tasks into Simpler Subtasks: Breaking down bigger challenges makes them more manageable for the AI.
- Give Generative AI Models Time to Think: Sometimes, complex queries require a moment for the AI to process and respond thoughtfully.
- Using External Tools: Learn how to integrate other tools and resources to augment the AI's capabilities.
- Testing Changes Systematically: Experiment and evaluate how different approaches impact the AI's responses.
The above site provides important information on prompting. The Examples of Prompts page covers the following topics on ways to use generative AI:
- text summarization
- information extraction
- question answering
- text classification
- conversation
- code generation
- reasoning
In the Techniques section, the Prompt Engineering Guide describes multiple ways to engage with generative AI. The first few items listed are good entry level approaches to construct prompts. These include:
- zero-shot prompting
- few-shot prompting
- chain-of-thought prompting
And more. Please read through these techniques.
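To make the first two techniques concrete, here is a minimal sketch that assembles a few-shot prompt as plain text. Zero-shot prompting would send only the instruction and the final question. Few-shot prompting puts a handful of worked examples ahead of the question. The classification task, the example sources, and the function name `few_shot_prompt` are invented for illustration. The resulting text would be pasted into whichever chatbot you use.

```python
def few_shot_prompt(instruction, examples, question):
    """Build a prompt with worked examples placed before the real question."""
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Source: {text}")
        lines.append(f"Type: {label}")
        lines.append("")
    # The final item is left unanswered for the AI to complete.
    lines.append(f"Source: {question}")
    lines.append("Type:")
    return "\n".join(lines)


# Hypothetical worked examples for the AI to imitate.
examples = [
    ("A diary kept during the 1918 flu pandemic", "primary"),
    ("A review article summarizing sleep research", "secondary"),
    ("An encyclopedia entry on the printing press", "tertiary"),
]

prompt = few_shot_prompt(
    "Classify each source as primary, secondary, or tertiary.",
    examples,
    "A textbook chapter on cell biology",
)
print(prompt)
```

For chain-of-thought prompting, a common variation is to add a request to the instruction, such as asking the AI to explain its reasoning step by step before giving its answer.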
Prompt Exchange with Google Bard
The following is a link to a session I conducted with Google Bard. In this session, I prompt Bard about prompt engineering, and then work with Bard to provide some examples that demonstrate few-shot prompting and chain-of-thought prompting. I then ask Bard to show how I could have improved my prompts, and it offers some suggestions. See the link below to read through the session:
Google NotebookLM
One of the downsides right now with generative AI clients is that most applications do not integrate AI sessions into a work flow for creating outputs, like essays. This is the problem that something like Google's new NotebookLM seeks to solve.
For example, in the prior session with Bard, I asked Bard if it would offer suggestions for improving my prompts in a use case involving using generative AI to develop a strategy for writing an essay on the ethics of generative AI. Bard offered the following advice:
- add specificity within steps
- provide examples and case studies
- provide alternate perspectives
- prompt for creative solutions
To apply these suggestions using NotebookLM, I can add my own documents and data. NotebookLM lets users create topical notebooks and then add data from their Google Drive accounts, PDFs, and from copied text. By doing this, I would be able to address Bard's suggestions to provide examples and case studies and to provide alternate perspectives on a topic as I collect information from the library or the web.
Conclusion
Generative AI is a fairly new technology that offers a lot of potential for personal knowledge management. In this lesson, we learned how to use generative AI effectively by engaging in lengthy sessions that involve creating well-constructed prompts. To construct good prompts, we employed tactics such as few-shot prompting and chain-of-thought prompting.
With new tools such as Google's NotebookLM, we learned how to incorporate our own data that we have collected to continue to work on our prompts and to generate new content, ideas, and perhaps even, knowledge.
Welcome to Using Information Resources and Managing Information Sources
Hello and welcome to this part of our course where we begin preparing for the final project, which is editing a selected Wikipedia article. To begin, we're going to focus on two key components that are vital for anyone delving into the world of personal knowledge management.
First, we'll explore how to find, select, and utilize various information resources. You'll learn to navigate through the information out there and pinpoint what's relevant for your needs. This will include academic databases, online journals, and other digital libraries. You will acquire the skills to access and make the most of these resources.
In the next section, we'll dive into the practical world of Reference Managers, with a special focus on Zotero. Reference managers are used to support your research and information organization. Specifically, Zotero is a powerful application that helps manage bibliographic data and related research materials. We'll walk through its features, from the basics of adding sources to generating bibliographies.
This section lays the foundation for the rest of the course. The skills and concepts you acquire and learn here will be integral to your success in managing information effectively, both in this course and in your broader academic or professional endeavors.
Information Sources and Resources
Please visit the links as you read through this page.
It's true that just about anything can be a source of information. However, as we learn how to search and use information, it can be helpful if we focus on sources and their types. Doing so allows us to integrate different types of information sources into our personal knowledge workflow.
Primary, Secondary, and Tertiary Sources
In general, it's helpful to classify information sources into three types: primary, secondary, and tertiary. The two main differences among these three types of information center on the strength of evidence they provide and on their proximity to the topic they cover. Primary sources generally provide the strongest evidence on a topic and are also closer to the topic (i.e., more direct) than secondary sources.
This is useful to know. In fact, UK Libraries provides use cases for these three types and some definitions:
- "Primary sources provide direct or firsthand evidence about a topic. The creator of primary sources is often intimately connected with the topic or event. Most often, creators of primary sources are participants in the events they describe, either directly or by virtue of living during the time period in question. Given this broad definition, primary sources comprise a wide range of resource types across the various disciplines" (Primary Sources).
- "Secondary sources depend upon primary sources. Secondary sources describe, discuss, interpret, comment upon, analyze, evaluate, summarize, and process primary sources. The important thing to keep in mind when trying to decide if a source is primary or secondary is whether or not the author did the thing they are reporting on. If they did, it is a primary source; if they did not, it is a secondary source" (Secondary Sources).
- "Tertiary sources contain information that has been compiled from primary and secondary sources. They organize information in an easily accessible way and often point the researcher to the primary and secondary literature on a given topic" (Tertiary Sources).
Let's expand on that a bit more:
- Primary Sources:
- Research Papers and Studies: These are original research documents written by researchers or scholars. They include firsthand data, experiments, observations, and findings. For example, a scientific paper reporting the results of an experiment on the effects of cold water immersion on the human body would be a primary source.
- Diaries and Personal Journals: Personal writings of individuals that offer insights into their thoughts, experiences, and daily life. For instance, a diary kept by a historical figure discussing their experiences during a particular period.
- Interviews and Oral Histories: Recorded conversations with individuals who have firsthand knowledge of specific events or topics. This can include oral history interviews with people who experienced significant historical events.
- Secondary Sources:
- Books: Books written by experts or scholars that analyze and interpret primary sources or provide an overview of a particular subject. For example, a book discussing the history of computing and the evolution of the command line since the 1970s.
- Review Articles: Scholarly articles that summarize and analyze existing research in a particular field. They provide a synthesis of primary research findings.
- Documentaries: Video or film productions that present information, often through interviews and visuals, to explain and explore a topic. Documentaries can be considered secondary sources when they provide analysis and context.
- Tertiary Sources:
- Encyclopedias: General reference works that provide concise summaries and explanations of topics. They often compile information from various secondary sources. Examples include online encyclopedias or reference books.
- Textbooks: Educational books designed for teaching a subject. They distill information from primary and secondary sources to provide structured knowledge on a topic.
- Databases and Indexes: Online databases that organize and index primary and secondary sources, making it easier to search for relevant information. Examples include academic databases like PubMed, Academic Search Premier, or library catalogs.
We can summarize the above list and definitions into the following table:
| Type of Source | Description | Examples |
|---|---|---|
| Primary Sources | Original firsthand accounts or data. | Research Papers and Studies; Diaries and Personal Journals; Interviews and Oral Histories |
| Secondary Sources | Analysis and interpretation of primary sources. | Books; Review Articles; Documentaries (when providing analysis) |
| Tertiary Sources | Summaries and reference sources. | Encyclopedias; Textbooks; Databases and Indexes |
Although the difference between these types is related to the strength of their evidence and their proximity to their topic, it's not helpful to think of this ordering as the way to search for information. As you learn how to locate information and conduct research, you will depend mostly on tertiary and secondary sources of information. For example, a reference work, like a dictionary, encyclopedia, a bibliographic database, or even a search engine, is considered a tertiary source but can be an ideal place to learn about a topic and to learn how to get primary or secondary information about that topic. We do not generally cite tertiary sources of information, like Wikipedia or Google, in our papers or other works, unless it seems important to do so.
Note, however, that we also do not plagiarize from these sources. If you do decide to use them as direct sources, then you need to cite them, and quote them if you decide to use the text, etc., that they provide.
Next, a book or a review article, as examples of secondary sources, can serve as a great summary, discussion, distillation, or commentary on a topic, and can also, in its discussion of a topic, provide references to more sources of information, which may be other secondary sources or even primary sources. For example, a scholarly book on censorship in public libraries should include references to research papers on that topic.
Finally, primary sources are sources that lie closest to the topic. As the table and list above indicate, they may include academic research, diaries, interviews, and more, much of which is based on or is the source of direct analysis. This type of information can often be the most difficult to read and interpret, because it may require extensive domain knowledge of the topic.
Primary, secondary, and tertiary sources say something about the kind of evidence in those respective sources. They are, in a sense, a ranking of the strength of their evidence about a topic. Primary sources provide the strongest evidence. However, when searching for information, it is often better to begin with tertiary sources.
Information Resources of Information Sources
In practice, these distinctions are important. If I do a library search, I have the option to narrow my search by Resource Type. These resource types, in conjunction with types of information sources, help us locate the information we need and identify the strength of that information. Example resource types in UK's catalog include:
- Articles
- Newspaper articles
- Dissertations
- Books
- Book chapters
- Review
- Reports
- Reference entries
- Conference proceedings
- Datasets
- Maps
- Images
- Databases
- Government documents
- and more.
A resource is really a thing that supplies a thing, whatever that thing is. For our purposes, we can think of information resources as providers of information sources, like those listed above. One of the more important resources we can use is the database. Databases are great resources for locating sources of information. The term database itself is a bit problematic because it has multiple meanings, and I think it will help if we start this section by clarifying what we mean in this course by the term database.
According to the Dictionary of Information Science and Technology, the term database can be defined as:
a collection of files containing related information. Database consists of files and each file consists of records. Furthermore, each record consists of fields, which can be used to store the raw facts or data (p. 246).
This definition is not wrong, per se, but it is rather broad and therefore not entirely useful. For example, the definition entails that if I have a folder on my desktop, and if that folder contains only Excel spreadsheets on related content, and since those spreadsheets contain only records (i.e., rows) and fields (i.e., cells), then that folder on my desktop is a database. Although I suppose that's kind of true, I don't think that's really what the dictionary authors have in mind, and thus the definition is neither accurate enough nor very useful.
A more specific and applicable definition is provided by the Online Dictionary of Library and Information Science, which defines a database as:
A large, regularly updated file of digitized information (bibliographic records, abstracts, full-text documents, directory entries, images, statistics, etc.) related to a specific subject or field, consisting of records of uniform format organized for ease and speed of search and retrieval and managed with the aid of database management system (DBMS) software. Content is created by the database producer (for example, the American Psychological Association), which usually publishes a print version (Psychological Abstracts) and leases the content to one or more database vendors (EBSCO, OCLC, etc.) that provide electronic access to the data after it has been converted to machine-readable form (PsycINFO), usually on CD-ROM or online via the Internet, using proprietary search software.
Most databases used in libraries are catalogs, periodical indexes, abstracting services, and full-text reference resources leased annually under licensing agreements that limit access to registered borrowers and library staff. Abbreviated db. Compare with data bank. See also: archival database, bibliographic database, embedded database, metadatabase, and niche database.
The above definition is more helpful because it provides examples of content type (e.g., bibliographic records, abstracts, full-text documents, etc) and how to access that content (e.g., library catalogs, periodical indexes), because it partitions out the components of a database (records, search, retrieval), and because it distinguishes a database proper from a database management system (DBMS), which is what the prior definition was attempting to define. The ODLIS definition also highlights the role of database producers in producing databases (e.g., the American Psychological Association), which tells us something about the subject matter of a database (e.g., psychology).
What are example databases?
The ODLIS definition provides some leads to this question. The definition tells us that example databases include "digitized information containing bibliographic records, abstracts, full-text documents, directory entries, images, and statistics" that are "related to a specific field or subject," and that these can be accessed through library "catalogs, periodical indexes, abstracting services, and full-text reference resources." Given that, let's look at some examples.
Library Catalogs and Discovery Services
We'll start with the most general example, and that's the library catalog. Traditionally, library catalogs are databases that primarily provided information about where we could find content on a library's shelves (i.e., print content). However, given that more content is available online, nowadays these catalogs are more often referred to as discovery services. In this expanded role, not only can they tell us where to find a book on a shelf, but they can also provide full text access to online content in other databases.
We can access the University of Kentucky's library catalog/discovery service at their main website at:
Please visit the link and test your own searches.
In the center of UK Libraries' homepage, we find several links of interest: the InfoKat (Books and More) link, the Database link, and the Journals link.
InfoKat searches bibliographic records. It does not provide full text access. Instead, it provides access by linking to content or by providing physical directions to that content. For example, if that content is a print item someplace on the library's shelves, InfoKat might tell us it's located on the 5th floor of WT Young and has the call number Z674.82.159 S42 1996, or something like that. However, if that content is online, then InfoKat should give us a direct link to it. It could also be that InfoKat has records for information sources but not the actual sources themselves. In those cases, the bibliographic records should provide a link to request the item through interlibrary loan (ILL).
Google and other search engines function in much the same way. If I do a search for the term databases in Google, the search engine will return results to sites that are relevant to that term, like a Wikipedia article on databases. Google itself does not store the full text content and only provides access (i.e., links) to content based on its own indexing of where things are on the web. Unlike the library's catalog, if Google returns results for pages it cannot access or that don't exist anymore, there is no ILL service offered, and we're left to our own devices to find access to that information.
Periodical Indexes and Abstracting Services
InfoKat returns a wide variety of bibliographic records that describe and provide access to books, ebooks, book chapters, maps, scholarly articles, news articles, images, and more. InfoKat is therefore great as a discovery service because it functions as a kind of catchall for all the types of works that a library can provide access to, as just listed. However, there will be times when we will want to focus our search on a more limited subset of records, such as those produced by periodicals.
Periodicals are nothing more than magazines and journals. This kind of genre produces issues at regular (or even at irregular) intervals, such as weekly, bimonthly, monthly, or quarterly issues, with no pre-determined end. Periodicals focus on different audiences, such as the general public (e.g., Time magazine), parts of the general public (e.g., WIRED, Elle, Better Homes and Gardens), the general scholarly community (e.g., Nature, Science), or specific scholarly communities (e.g., Journal of Information Technology, Journal of Synchrotron Radiation, Journal of Sociology). Scholarly journals are generally (but not always) peer-reviewed journals, which means that when authors submit their manuscripts to these journals, the manuscripts are sent out to two to three peers in the scholarly community who read the manuscripts and either write recommendations to the editors for improving the manuscript or recommend that the journal editor reject the manuscript. Different journals and different scholarly disciplines have different criteria for making judgments on quality.
There are two broad ways to locate sources in periodicals. At the UK Libraries homepage, we can select the Journals link or the Databases link. The Journals link lets us search for specific journal titles and, if desired, within disciplinary/subject categories (e.g., Health & Biological Sciences, Law, Politics & Government, Language & Literature, or Social Sciences, Journalism & Communications, and more). Once we locate a journal of interest, we can visit the link and then search within that journal or peruse the table of contents for specific issues.
The Databases link lets us expand our search of records to broader categories or subject areas of periodicals. Here we find databases like Academic Search Complete, which will let us search thousands of periodicals, including magazines and scholarly journals, across a range of subject categories. JSTOR is another database that indexes a broad subject range of periodicals but with greater emphasis on peer-reviewed material. Databases may also have a specific focus. For example, the African American Communities database provides access to archival and historical papers, oral histories, essays, letters, pamphlets, and newspapers on African American communities and is geographically focused on Atlanta, Chicago, New York, and parts of North Carolina. Then we also have access to databases like the Kentucky Building Code, which is a database that links to residential and commercial regulatory information on building, plumbing, and design in Kentucky.
Many of the resources listed above, as well as the hundreds that UK Libraries offers and that I have not described, provide access to full-text content. But this isn't necessarily so. All that a periodical index must offer to be classified as such are records that contain basic information, such as article titles, author names, publication names, publication date information, and often subject terms. Abstracting services provide the same information but also provide abstracts of journal articles, which are searchable. Since abstracts provide summary information of the main articles, this is helpful in locating information sources.
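As a rough illustration of the difference between a periodical index and an abstracting service, the Python sketch below models a record with the basic fields named above plus an optional abstract. The records, the field names, and the `search` helper are all hypothetical, and real databases use far more sophisticated matching.

```python
# A minimal sketch, not how any real database is implemented.
# All records, field names, and the search helper are hypothetical.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IndexRecord:
    title: str
    authors: List[str]
    publication: str
    year: int
    subjects: List[str] = field(default_factory=list)
    abstract: Optional[str] = None  # only abstracting services supply this


records = [
    IndexRecord(
        title="Seasonal vaccination outreach",
        authors=["A. Author"],
        publication="Hypothetical Journal of Public Health",
        year=2020,
        subjects=["vaccination"],
        abstract="We surveyed rural clinics about influenza vaccine uptake.",
    ),
    IndexRecord(
        title="Influenza trends in rural clinics",
        authors=["B. Author"],
        publication="Hypothetical Journal of Public Health",
        year=2021,
        subjects=["influenza"],
    ),
]


def search(records, term, include_abstracts=False):
    """Return titles of records whose searchable text contains the term."""
    term = term.lower()
    hits = []
    for record in records:
        searchable = [record.title, *record.subjects]
        if include_abstracts and record.abstract:
            searchable.append(record.abstract)
        if any(term in text.lower() for text in searchable):
            hits.append(record.title)
    return hits


# Index-only search: matches titles and subject terms.
print(search(records, "influenza"))
# Abstract search: also matches the first record through its abstract.
print(search(records, "influenza", include_abstracts=True))
```

The first search finds only the record whose title mentions the term. The second also finds the record that mentions it only in the abstract, which is why searchable abstracts help when locating sources.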
Full-text Resources
Since periodical indexes and abstracting services do not necessarily provide direct full text access, you might wonder what the point is if you can't get the full text of some source that the library tells you exists. The point is that even if the library doesn't have immediate access to a source, if it can tell you that it exists, then a librarian can likely get the source through interlibrary loan (which is fast!) or some other means. So if the stakes are high (e.g., you are a cancer researcher) and you really need or just want information (e.g., about a cancer therapy), then periodical indexes and abstracting services are great resources even if they do not provide immediate full text access.
That said, in addition to a variety of periodical indexes and abstracting services, UK Libraries provides quite a few full text databases. Visit the A-Z Databases page to see all of your choices. A full-text database may also be an abstracting service, if it provides the basic information listed above (title, author name, etc.) about periodical content. It becomes a full-text abstracting database if it also provides access to the full-text source. Other non-periodical databases, like the African American Communities database, also provide full text access to their sources, and may even be thought of as digital libraries and not just databases.
In the next few paragraphs, I will focus on a few general purpose full-text databases that you may find helpful as you work towards your degree. Although these databases provide full-text sources, they may not provide full text for everything they have records for.
Academic Search Complete
Academic Search Complete (ASC) is a database of databases. It can search over 60 different databases and indexes, each including hundreds of journal titles, including:
- Communication and Mass Media Complete
- Sociological Collection
- Music Index
- SPORTDiscus
- MEDLINE
- eBook K-8 Collection,
- and more.
In ASC, we can search all 60 plus databases at one time, or we can select one or more of them, which might make better sense. For example, if we were interested in health-related information, we might want to search the following ASC databases at once:
- Consumer Health Reference eBook Collection
- Health and Psychosocial Instruments
- MEDLINE
- APA PsycInfo
ProQuest
Like ASC, ProQuest provides access to many subject based databases, and also, many of the database topics overlap. ProQuest uses this to highlight database collections. For example, the SciTech Premium Collection includes three databases that can be searched at once. These include the:
- Natural Science Collection
- Science Database
- Technology Collection
ProQuest's Social Science Premium Collection includes databases on criminology, education, library and information science, politics, sociology, and more.
ProQuest also provides database access to magazines and news content, including The Vogue Archive and various current and historical newspapers, such as an archive of The New York Times or current issues of The Courier-Journal (Louisville, KY).
JSTOR
JSTOR is another multi-disciplinary database. JSTOR covers subjects such as:
- Arts
- Business & Economics
- History
- Medicine & Allied Health
- Science & Mathematics
- Security studies
- Social Sciences, and more.
Each of these subject areas includes access to many journal titles, and therefore, many journal articles. JSTOR has long focused on back issues of journals, but in recent years has made moves to include current literature and open access content (that is, content that is freely available). JSTOR also includes ARTSTOR, a database of images, video, and other multimedia content, much of which is also available as open access.
Note: Open access (OA) content is content that is freely available to access. That does not, however, mean that the content is free of copyright protections. There are different types of OA copyright licenses and each will stipulate how we can use that content. CreativeCommons.org provides a list of some of the more popular licenses and what they mean for us. In all cases, though, if you use OA content, be sure to attribute it with source information.
Conclusion
Remember, one of the main ideas of this course is that there are many sources of information that are not easily available via Google or some other search engine. Even when we do use Google or Bing or Yahoo or something else, there are certain advanced tricks we can use to leverage them to find better information. We'll cover those soon. Regardless, if we apply what we'll learn in this class, we'll become that much more accomplished.
In short, we are going to learn a bit more about what's hidden from most users and from the common view. To do that, we needed to understand the difference between information sources and information resources. Information sources can be divided into three types: primary, secondary, and tertiary. The differences among them center around the strength of the evidence they provide and their proximity to their topic. However, when searching for information, it is often better to begin with tertiary and secondary sources. In the end, an information source is the main item we often aim to acquire.
An information resource is a tool that provides or supplies access to information sources. In this section, some examples of information resources are the library discovery service, a search engine, and a database like JSTOR, ASC, or ProQuest. These are all tertiary sources that can lead us to more substantial sources of primary and secondary information.
Reference Managers: Getting Started
Readings
Please visit the links in this section as you read through it, and read through the Zotero Documentation, especially after watching the accompanying software demonstration.
Introduction
Now that I have introduced you to some of the different resources for information sources that exist (and we will learn about more), our next personal knowledge management challenge involves managing the information sources that we locate and want to use.
Imagine, for example, that we need to locate and use three academic sources for a class paper assignment. We could, like probably many do, search for information on the fly, locate some journal articles, figure out how to cite the articles, and manually add the citations to our Word documents. Perhaps if we're a little more advanced, we could download the papers to a folder on our computer for later reference (perhaps we have a folder dedicated to the class paper and even a subfolder dedicated to any readings that we download). Although that might be a bit more organized, it is the 21st century, and we have these computers that can do so much more for us than act as simple file cabinets. Basically, if we want to be more efficient, to save time, and to do a good job, then this is not the way.
Instead, we can use a reference manager (RM) (also called a citation manager). A RM is a piece of software that can help us manage this process of saving, collecting, and using information sources. Although generally aimed at academic users, RMs can also be used to bookmark, save, and collect all sorts of web and print information sources and for all sorts of outcomes, whether those outcomes are class papers, engineering projects, musical composition projects, biology experiments, and so on. Basically, if we use the web to collect and then use information, then this is the way.
Reference Manager Software
You are required to use a RM in this course, and I will focus on the Zotero RM. However, you are free to choose other options, even if I don't cover them, and there are many. Wikipedia has a page that lists around 20 RM options.
There are a few reasons I will focus on Zotero, though. First, I use it, and I know it fairly well, although I'm always learning new things about it. But most importantly, it's free (and open source software), it's consistently maintained and updated, it provides all the major functions that a RM should provide and more, and it's available on Windows and macOS desktop and laptop computers. For iOS (i.e., mobile) users, there is an app for the iPhone and iPad, but Android users can access their Zotero library using their phone's web browsers, which is just as good as the dedicated app.
If you elect to use an alternate RM, be aware that they are not all created equal. The two most popular, functional, and well-supported alternatives are Mendeley and EndNote. Both of these are more academic-centric, while Zotero is more agnostic about source information and usage.
Mendeley and EndNote are more academically oriented because they are owned by companies involved with academic publishing and academic databases. Mendeley is owned by the company Elsevier, which provides the Scopus database, a bibliographic, abstract, and citation database, and publishes multiple journal titles. EndNote is owned by Clarivate, a research analytics company, which provides the Web of Science and ProQuest databases. Web of Science is a citation database like Scopus.
Zotero
Gilmour & Cobus-Kuo (2011) identify eight functions that a RM should provide, and Zotero performs all of these functions. These functions include:
- Import citations from bibliographic databases and websites
- Gather metadata from PDF files
- Allow organization of citations w/in the reference manager database
- Allow annotations of citations
- Allow sharing of the reference management database
- Allow data interchange with other reference manager products through standard metadata formats
- Produce formatted citations in a variety of styles
- Work with word processing software to facilitate in-text citation (Gilmour & Cobus-Kuo, 2011, Introduction section).
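To make the idea of producing formatted citations from stored metadata more concrete, here is a small Python sketch. It is not Zotero's code, and the two formatters are rough approximations of an author-date style written only for illustration. The metadata comes from the Gilmour and Cobus-Kuo reference listed at the end of this section; the dictionary keys are hypothetical.

```python
# A simplified sketch of the "formatted citations" function of an RM.
# These formatters are rough approximations written for illustration;
# they are not Zotero's code and do not cover the full rules of any style.

item = {
    "authors": [("Gilmour", "R."), ("Cobus-Kuo", "L.")],
    "year": 2011,
    "title": "Reference Management Software: A Comparative Analysis of Four Products",
    "journal": "Issues in Science and Technology Librarianship",
    "volume": "66",
    "doi": "10.5062/F4Z60KZF",
}


def apa_like(item):
    """Build an APA-flavored reference string from one metadata record."""
    names = [f"{last}, {initials}" for last, initials in item["authors"]]
    if len(names) > 1:
        authors = ", ".join(names[:-1]) + ", & " + names[-1]
    else:
        authors = names[0]
    return (
        f"{authors} ({item['year']}). {item['title']}. "
        f"{item['journal']}, {item['volume']}. https://doi.org/{item['doi']}"
    )


def in_text(item):
    """Build a short author-date citation for use inside a sentence."""
    last_names = [last for last, _ in item["authors"]]
    return f"({' & '.join(last_names)}, {item['year']})"


print(apa_like(item))
print(in_text(item))
```

The point of the sketch is that the metadata is stored once and the formatting is applied afterward, which is why an RM can switch citation styles without our retyping anything.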
Regardless of which RM we use, we want to pick one that performs most if not all of the above functions because these functions help us identify a useful app. Fortunately, Zotero satisfies the requirements listed above. The Zotero Quick Start Guide provides a nice overview of the basic functions and how to use them, and the additional documentation describes how to do more.
Specifically, the Zotero Quick Start Guide shows us how to:
- Install and open the Zotero desktop/laptop application
- Install and use the Zotero browser plugin
- The download page provides links for using Zotero with:
- Firefox
- Edge
- Chrome
- Safari
- How to collect items (e.g., books, articles, images, etc.)
- What we can do with those items
- How to create collections to organize items by topic, project, etc
- How to use tags to add additional organizational layers
- How to search your Zotero library and save your searches
- How to import or add attachments (like PDF copies of your items)
- How to add notes to your items
- How to cite items in your papers, etc.
- How to use Zotero with Microsoft Word, Google Docs, or other word processors
- The word processor plugin provides links for using Zotero with Microsoft Word, LibreOffice Writer, and Google Docs
- How to create bibliographies
- How to use Zotero and access your Zotero library on multiple devices
- How to collaborate on research projects using Zotero
Note: If you choose Zotero, as I suggest and recommend, you should create a free account (use your personal email to sign up). As you add material to your Zotero collection, your collection will be synced with and backed up to Zotero's servers. Zotero registration: https://www.zotero.org/user/register/
Zotero Demo
In the next video, I will show you how to complete the above steps so that you may get started using Zotero. Again, you may use an alternate RM, but throughout this course, I will demonstrate Zotero.
Your task this week is to download and start using Zotero, or some alternate RM. Please follow the demo video to complete the process.
Conclusion
- Reference (or citation) managers (RM) provide more sophisticated tools to manage information sources than simple file systems provide.
- There is a slew of RM applications available to use, but we want to be sure we pick one that provides as many functions as possible and that is available on as many devices and operating systems as possible and can be integrated with a variety of word processing applications.
- Although you are welcome to use an alternate RM, in this course we focus on the Zotero RM.
References (created with Zotero)
Gilmour, R., & Cobus-Kuo, L. (2011). Reference Management Software: A Comparative Analysis of Four Products. Issues in Science and Technology Librarianship, 66(Summer 2011). https://doi.org/10.5062/F4Z60KZF
Welcome to Retrieving and Evaluating Information
Welcome to the segment on retrieving and evaluating information. We're going to hone our skills in digging up and assessing the vast information that's available to us.
First, we'll dive into Web Information Retrieval. Ever wonder how the internet is organized and how you can use this organization to your advantage? We'll explore just that, enhancing your ability to conduct effective searches on the web. This is about finding the right information quickly and efficiently.
Next, we'll shift gears to Bibliographic Database Information Retrieval. Here, you'll learn the ins and outs of bibliographic records and the art of using Boolean operators. These tools are key to navigating bibliographic databases and uncovering the specific information you need for your research or projects.
Finally, we'll focus on Evaluating Information Sources. In an age where information is abundant, the ability to critically assess the reliability and quality of information is crucial. We'll discuss techniques like narrative probability and lateral reading, helping you to develop a keen eye for credible sources.
This section is about equipping you with the tools and know-how to efficiently retrieve and critically evaluate information: skills that are indispensable in our information-rich world.
Web Information Retrieval
Readings
Please visit links in this section and review the following search tips pages for Google Search, DuckDuckGo search, and Bing search.
- Refine web searches—Google Search Help. (n.d.). Retrieved August 3, 2022, from Google
- DuckDuckGo. (n.d.). DuckDuckGo Search Syntax. DuckDuckGo Help Pages. Retrieved August 3, 2022, from DuckDuckGo
- Advanced search options Bing. (n.d.). Retrieved August 3, 2022, from Microsoft
Introduction
Whether we want to search the web, a bibliographic database, or our Zotero library, it's helpful to know about information retrieval in order to become good at search.
In this lesson, we'll cover specific search techniques that you can use to get better at using search engines like Google, Bing, or DuckDuckGo. Specifically, there are three principles that we need to consider when advancing our search skills.
Principle 1: We should understand that the basic information retrieval model centers on documents. Anything indexed in a database or on the web is treated as a document. Documents include text, sound, images, video, etc.
Principle 2: We should understand that documents do not exist independently of other documents. The collection of documents that a search engine or database covers is called the corpus. For the web, the corpus is organized like a file system, much like the file system on your personal computers. In a bibliographic database, the corpus is organized by predefined fields such as author names, title names, subject terms, etc.
Principle 3: Our queries are not divorced from the documents nor the corpus nor the organization of the corpus. These things are all intertwined. Each time we search, search engines and databases compare documents in the corpus to each other and to how they are organized based on our queries, and then rank order (in some way) those documents by way of that comparison. Hence, when we construct queries, it's useful to think about the content (corpus) that we are searching.
To rehash, a search engine or database uses our queries, matches them against how they index the corpus (i.e., the web), and then rank orders the search results based on the query and the corpus.
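The three principles can be sketched with a toy example in Python. The corpus, the document names, and the scoring rule (a simple count of query-term occurrences) are assumptions made for illustration; real search engines and databases rely on many more signals.

```python
# A toy illustration of query, corpus, and rank-ordered results.
# The corpus and the scoring rule (count of query-term occurrences) are
# assumptions made for illustration; real search engines use many more signals.

corpus = {
    "page-a": "flu symptoms and flu vaccines for the current flu season",
    "page-b": "how to plan a camping trip in the forest",
    "page-c": "where to get flu vaccines near you",
}


def score(query, text):
    """Count how many times the query's terms occur in the document text."""
    words = text.lower().split()
    return sum(words.count(term) for term in query.lower().split())


def rank(query, corpus):
    """Return document names with a nonzero score, best match first."""
    scored = [(score(query, text), name) for name, text in corpus.items()]
    matches = [(s, name) for s, name in scored if s > 0]
    # Sort by score (descending), then by name so ties are stable.
    matches.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in matches]


print(rank("flu vaccines", corpus))
```

Even in this tiny corpus, the ranking depends on both the query and the text of the documents: change either one and the order of the results changes.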
To illustrate the above concepts, I'll primarily focus on searching the web in this section, but these techniques work across the databases that we use in a library. We'll cover that in the following section.
The web is for storing and retrieving documents
When we use Google or another search engine to search, we often look for documents (web pages, websites, etc.) in a corpus that contain specific text that matches our query. This has some implications:
- Text has primacy, even for multimedia, which is often described using text.
- Our queries are matched against the text that appears in documents on the web or that describes documents on the web.
- The better our queries match the documents, the more precise our search results will be. This assumes we can construct good queries.
- The more ambiguous our queries are, the more work the search engine has to do to find relevant results.
- The challenge with search is that we do not always know what text a document contains even if that document covers the topic or concept that we think is relevant.
- For example, consider synonyms. We might want to find web pages that contain terms that are synonymous with our query term but do not actually contain our query term. But this can get complicated. If I search for star, could I also mean principal, lead, hero, celebrity, stellar, or sun?
- What if a document only uses terms like principal, lead, hero, celebrity, stellar, or sun? Might it still be useful if I was interested in documents (i.e., web pages) about star? Probably not since some of those words, although synonymous with star, are not synonymous with each other. Compare the terms principal and celebrity. Both of these terms are synonymous with the term star but they are not synonymous with each other. Therefore, the synonyms of a term may not be synonymous with each other.
- Other wordy issues include things like homonyms, which are words that are pronounced or spelled the same but mean different things. Thus, by bark do I mean the bark on a tree or the word we use to signal the sound a dog makes?
- Phrases are also important, with respect to term order. If I search using a phrase where the term forest precedes the term fire, search engines are more likely to return results where those two terms appear in that order. This will mean that documents that contain text about someone having a camp fire in a forest will be less likely to appear at the top of my results than a document that contains the phrase the forest is on fire.
- Or consider term adjacency. Documents where the terms are in order but spaced far apart will also rank lower. E.g., if the term forest appears in the first paragraph on a web page and the term fire appears on the last paragraph of a web page, this web page will rank lower than a page that contains the terms near or adjacent to each other when searching using a query like forest fire.
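Term order and adjacency can be illustrated with a small Python function that measures how many words separate two terms when they appear in a given order. The function name and the sample sentences are hypothetical, and this is only a sketch of the idea, not how any search engine actually scores pages.

```python
# A toy proximity measure; the function and sample texts are hypothetical.


def ordered_gap(text, first, second):
    """Return the smallest number of words separating `first` from a later
    `second`, or None if the terms never appear in that order."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    best = None
    last_first = None  # position of the most recent occurrence of `first`
    for position, word in enumerate(words):
        if word == first:
            last_first = position
        elif word == second and last_first is not None:
            gap = position - last_first - 1
            if best is None or gap < best:
                best = gap
    return best


adjacent = "The forest fire spread quickly."
spaced = "We walked through the forest and later built a small camp fire."
reversed_order = "The fire started far from any forest."

print(ordered_gap(adjacent, "forest", "fire"))
print(ordered_gap(spaced, "forest", "fire"))
print(ordered_gap(reversed_order, "forest", "fire"))
```

A smaller gap means the terms sit closer together, so for the query forest fire a page like the first sentence would be expected to rank above pages like the second and third.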
The web is organized
I mentioned above that the web is a file system, the kind that you'd find on your own computer with its folders and files. By this I mean that the web is organized. If we know a bit about its organization, we can take good advantage of that when we search. For instance, we can narrow our searches to parts of the web. So the questions are: how is it organized? And how can we use that organization in search?
- The web is organized like a tree. This tree like structure originally contained a few main branches, called top level domains (TLDs). Example TLDs are .com, .edu, .org, .gov, .mil, and .net. All domains then branched off of those main branches.
- The tree has grown over the years and now contains about 1451 (as of February 2024) of these main branches (TLDs). Newer TLDs include .apple, .attorney, .camera, .green, .joy, .mobile, .office, .science, .space, and many more.
- Included in those are ccTLDs, or country code top level domains. For example:
- .kr for South Korea
- .ae for the United Arab Emirates
- .fr for France
- .us for United States
- Each of the big branches contains smaller branches, called second level domains. For example:
- Under .com is google for google.com,
- Under .edu is uky for uky.edu,
- Under .org is wikipedia for wikipedia.org,
- Under .gov is usa for usa.gov,
- Under .apple is newsroom for newsroom.apple,
- and so on.
- Those branches (second level domains) contain even smaller branches that are called third level domains or subdomains. Examples include:
- maps for maps.google.com
- calendar for calendar.uky.edu
- en for en.wikipedia.org
- analytics for analytics.usa.gov
- www for www.uky.edu
- ci for ci.uky.edu
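The tree structure described above can be seen by splitting a hostname at its dots and reading it from right to left. The Python sketch below does this; the function name is hypothetical, and it assumes the last label is the TLD, which holds for the examples here but not for every hostname (some countries register names under two-part suffixes such as co.uk).

```python
# A minimal sketch that splits a hostname into the levels described above.
# It assumes the last label is the TLD, which holds for the examples here
# but not for every hostname (e.g., some countries register names under
# two-part suffixes such as co.uk).


def domain_levels(hostname):
    """Return the TLD, second level domain, and any subdomains."""
    labels = hostname.lower().strip(".").split(".")
    return {
        "tld": labels[-1],
        "second_level": labels[-2] if len(labels) >= 2 else None,
        # Everything left of the second level domain, nearest branch first.
        "subdomains": labels[-3::-1] if len(labels) >= 3 else [],
    }


for host in ["uky.edu", "ci.uky.edu", "en.wikipedia.org", "newsroom.apple"]:
    print(host, domain_levels(host))
```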
We can take advantage of this organization by limiting (or focusing) queries to results within smaller sections of the web. In Google, this would entail using what is called the site: operator. We can add the site: operator to our search queries. For example, let's say I do a search for the term flu, and I notice that most of the results that I'm interested in are from .gov domains, and most of the results I am less interested in are from .com domains. To focus on the gov domains, I add the site operator to my query. This is the Google query I could use to search for the topic flu only on .gov sites:
flu site:gov
Then perhaps I find these results too general still. For example, let's say I live in Kentucky but Google keeps showing me .gov sites from other states. We can focus on just a smaller branch of the tree. E.g., if I wanted to focus only on results from Kentucky, then:
flu site:kentucky.gov
This works because https://www.kentucky.gov exists as a website.
Or I can specify a part of Kentucky's government, like the Cabinet for Health and Family Services or the Kentucky Department of Education, both of which exist on ky.gov domains, with these queries:
flu site:chfs.ky.gov
flu site:education.ky.gov
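If you like to script your searches, the queries above can be assembled programmatically. The following is a minimal sketch: the helper names site_query and google_search_url are hypothetical, and I am assuming the commonly used q parameter of Google's search URL.

```python
from urllib.parse import urlencode


def site_query(terms, site):
    """Append a site: restriction to a plain search query."""
    return f"{terms} site:{site}"


def google_search_url(query):
    """Encode a query string into a search URL (assumes the q parameter)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


# Move from a big branch (gov) down to smaller branches.
for site in ["gov", "kentucky.gov", "chfs.ky.gov"]:
    query = site_query("flu", site)
    print(query, "->", google_search_url(query))
```

Each pass through the loop narrows the search to a smaller branch of the tree while the search term stays the same.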
Constructing queries for precise results
There are more tips and tricks we can apply to revise and make our queries more precise. Below I cover:
- using quotes around search terms for precision
- limiting results to specific dates
- excluding results with specific terms or phrases
- varying the order of terms
- using the OR keyword to specify one term, another term, or both terms
Use quotes around our search terms
With a search like flu site:gov, Google provided us with snippets of text that highlighted where the term flu appears in the web pages that are retrieved.
For example, we see terms like these in the search results:
- flu symptoms,
- flu activity,
- flu vaccines, and
- flu season.
This gives us important information about how Google sees the text on web pages, and we can use this information to revise our search. For example, let's say I'm interested in web pages that contain info about flu vaccines and less interested in pages that contain information on flu activity or flu season. If that's the case, then I can add the additional term to my query and enclose the whole query in quotation marks. That will force Google to rank pages with the literal term "flu vaccines" much higher than pages with those other terms or phrases, or exclude those other pages altogether. So our query will now look like this (note: only the query terms, and not the site operator, are quoted):
"flu vaccines" site:gov
By enclosing our query in quotes, we force the two terms (flu and vaccines) to be adjacent and in order: the term flu precedes the term vaccines.
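To make the idea of adjacent and in order concrete, here is a toy model of phrase matching in Python. It is a sketch of the concept only and says nothing about how Google actually implements quoted search. The function names and the two sample sentences are made up for illustration.

```python
def tokenize(text):
    """Lowercase the text and split it into words, dropping basic punctuation."""
    return [word.strip(".,;:!?\"'()") for word in text.lower().split()]


def contains_phrase(text, phrase):
    """Return True if the words of phrase appear adjacent and in order in text."""
    words = tokenize(text)
    target = tokenize(phrase)
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


page_a = "Flu vaccines are updated each year."
page_b = "Vaccines for the seasonal flu are updated each year."

print(contains_phrase(page_a, "flu vaccines"))  # True
print(contains_phrase(page_b, "flu vaccines"))  # False
```

Both sentences contain the words flu and vaccines, but only the first contains them side by side with flu first, which is what the quotes require.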
Get more recent pages
If I'm interested only in recent pages, I can click on the Tools button and select Any time, Past hour, Past 24 hours, Past week, and so on, to limit results to certain time periods.
Exclude results with the minus sign
Let's take another look at our flu vaccine search. Instead of enclosing "flu vaccines" in quotes to return only pages with that phrase, I could exclude the other terms (i.e., activity and season) altogether by prefixing them with a minus sign. To exclude the terms activity and season from our search results, this is how our search would look:
flu -activity -season site:gov
I can also exclude specific domains or specific websites:
flu vaccines -site:com
flu vaccines -site:webmd.com
By excluding domains this way, the search results can still include websites from .gov, .edu, .org, etc., just not from .com or webmd.com.
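The logic of the minus sign can be modeled with a small filter over a toy set of pages. This is a sketch under stated assumptions: the hosts under example.gov and example.com are invented, the matches function is hypothetical, and real search engines do far more than compare whole words.

```python
pages = [
    {"host": "health.example.gov", "text": "flu vaccines are available now"},
    {"host": "news.example.gov", "text": "flu activity is high this season"},
    {"host": "shop.example.com", "text": "flu vaccines and remedies"},
]


def matches(page, include, exclude_terms=(), exclude_sites=()):
    """Return True if the page has every included term and no excluded term or site."""
    words = page["text"].lower().split()
    host = page["host"]
    if not all(term in words for term in include):
        return False
    if any(term in words for term in exclude_terms):
        return False
    # -site:com drops a whole TLD; -site:example.com drops a single website.
    if any(host == site or host.endswith("." + site) for site in exclude_sites):
        return False
    return True


# Models: flu -activity -season
print([p["host"] for p in pages if matches(p, ["flu"], ["activity", "season"])])

# Models: flu -activity -season -site:com
print([p["host"] for p in pages if matches(p, ["flu"], ["activity", "season"], ["com"])])
```

The first query keeps both pages about vaccines, and the second additionally drops the .com page.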
Term order matters
Results will be different depending on the order of the query terms. Google has gotten good over the years at handling natural language (how we talk in real life), especially as generative artificial intelligence technologies have become more mainstream and integrated into search. So the suggestion is to use natural language in your query. For example, it's generally better to use the search terms flu vaccine instead of vaccine flu, since the former is how we'd phrase the terms in English. This will of course vary by language. In many Romance languages, but also others, it's common (though it varies) for the modifier to come after the word being modified. For example, in Spanish, we would say shirt red:
camisa roja
Thus, a Spanish speaker would want to search for camisa roja and not roja camisa.
This of course is regardless of the country of origin,
but note that Google has separate landing pages for
Google.com depending on the country you're located in.
For example, for those residing in Mexico, google.com
directs to google.com.mx,
where mx is the ccTLD for Mexico:
Google Search - Mexico.
For those residing in Canada, it's google.ca.
Although term order can determine meaning or reflect natural language, we can pair some terms together as lists, without any impact on meaning or natural language. How we pair them might change the results retrieved, though, which becomes noticeable as we scan the search results lists. For example, consider the following two searches:
google bing
Or:
bing google
The above two search queries are semantically equivalent (they mean the same thing) and their order is arbitrary in our list, but search engines implicitly place a priority on term order. The first term in the query is prioritized over the second term. So if you search Google using the above two queries, the first page of results might be mostly the same, but as you page through them, you might see more pages on Google than on Bing, for the first search, and vice versa for the second search.
One OR the other OR both
When we search using multiple terms, we can use the OR operator to tell Google to return pages with either of the terms or both of the terms.
Consider the following two searches:
"google" "bing"
google OR bing
The first search will return pages with both the terms included in the results because the quotes enforce that.
The second search will return pages with either the term google in the results, the term bing in the results, or both the terms in the results. Note also, based on my personal experience, that if I test the second search in Google Search and also in Bing Search, then Google will return more results about Google, and Bing will return more results about Bing, respectively.
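The difference between the two searches maps neatly onto set operations. Below is a toy illustration in Python. The four page texts are invented, and pages_with is a hypothetical helper that just checks for whole words.

```python
pages = {
    "page1": "google and bing are search engines",
    "page2": "google announced a new feature",
    "page3": "bing updated its index",
    "page4": "duckduckgo is another option",
}


def pages_with(term):
    """Return the names of the pages whose text contains the term as a word."""
    return {name for name, text in pages.items() if term in text.split()}


both = pages_with("google") & pages_with("bing")    # models: "google" "bing"
either = pages_with("google") | pages_with("bing")  # models: google OR bing

print(sorted(both))    # ['page1']
print(sorted(either))  # ['page1', 'page2', 'page3']
```

Requiring both terms is an intersection, so it can only shrink the results, while OR is a union, so it can only grow them.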
Other operators
There are other operators we can use in search engines, and many of them work regardless of which search engine we use. Here are some examples that you can test in Google or elsewhere:
- related: to find related sites
related:twitter.com
related:instagram.com
related:whitehouse.gov
related:google.com
- filetype: to return results in specific types of files:
- search uky.edu for the term flu vaccine but only retrieve PDFs:
flu vaccines filetype:pdf site:uky.edu
- same as above but only return Microsoft Word files:
flu vaccines filetype:docx site:uky.edu
- same idea as above but only return Microsoft Excel files:
birth weight filetype:xlsx site:gov
Conclusion
Information retrieval or web searching can be complex but fun. Remember the three principles we started with in this section, and apply those principles when constructing your queries.
- document centered
- consider the text
- no document is an island
- consider the document with respect to the corpus
- the web is organized
- take advantage of how the web is structured with site searches
Remember and practice the techniques I discussed here:
- query construction
- use quotes to force exact matches
- exclude terms with the minus sign
- term order matters
- use OR to select alternate terms
If you forget anything, use advanced search: https://www.google.com/advanced_search
Or Advanced Image Search: https://google.com/advanced_image_search
Google provides a list of some of these operators.
Other search engines also have search operators, and often they're the same.
P.S.
You can get very advanced with your queries. Here are some examples:
trade ("surplus" OR "deficit") (site:whitehouse.gov OR site:congress.gov)
Or, limit to specific filetypes:
trade ("surplus" OR "deficit") (site:whitehouse.gov OR site:congress.gov) filetype:pdf
The last search query is so complicated that it decomposes into the following separate queries but searches them all at the same time:
trade surplus site:whitehouse.gov filetype:pdf
trade surplus site:congress.gov filetype:pdf
trade deficit site:whitehouse.gov filetype:pdf
trade deficit site:congress.gov filetype:pdf
trade surplus site:congress.gov site:whitehouse.gov filetype:pdf
trade deficit site:congress.gov site:whitehouse.gov filetype:pdf
trade surplus deficit site:congress.gov site:whitehouse.gov filetype:pdf
trade surplus deficit site:congress.gov filetype:pdf
trade surplus deficit site:whitehouse.gov filetype:pdf
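One way to see where this list comes from is to enumerate it. Each OR group can contribute one option, the other, or both, so two groups of two options yield nine distinct combinations. The Python sketch below generates them. The helper nonempty_subsets is my own name, and this only models the reading of OR used in this section, not how a search engine processes the query internally.

```python
from itertools import combinations, product


def nonempty_subsets(options):
    """All ways to pick one or more of the options (one OR the other OR both)."""
    return [combo for size in range(1, len(options) + 1)
            for combo in combinations(options, size)]


terms = ["surplus", "deficit"]
sites = ["whitehouse.gov", "congress.gov"]

queries = []
for term_combo, site_combo in product(nonempty_subsets(terms), nonempty_subsets(sites)):
    site_part = " ".join(f"site:{site}" for site in site_combo)
    queries.append(f"trade {' '.join(term_combo)} {site_part} filetype:pdf")

for query in queries:
    print(query)
print(len(queries))  # 9
```

Adding a third option to either group would grow the list quickly, which is why the compact form with parentheses and OR is so convenient.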
Database Information Retrieval
Introduction
I introduced the concept of databases in section 3.1. Academic databases, or sometimes bibliographic databases, offer a number of unique advantages over search engines, and some disadvantages, too. The main advantages are that databases offer specialized collections on a variety of topics; they offer many sources that are invisible to search engines; and they provide greater control over the search process. The main disadvantages are that they are a bit complicated to use well, there are many databases to choose from (and find), they have their own user interfaces, and they are often only accessible via a library.
Many databases are only accessible via a library because a library pays to use them. Google and other search engines operate on different revenue models, like serving ads, to pay for locating free content on the web. Google and other search engines do not provide access to non-free content on the web.
Information retrieval (search) in databases works both similarly to and differently from how it works on the web. As with the web, database information retrieval works on documents in a corpus. We search that corpus using queries, and how we construct our queries is important.
Documents in databases though are a bit different. As discussed in the previous section, websites exist in a fairly organized hierarchy (with respect to top level domains, etc.), but web pages themselves are not always very structured. Instead, search engines have become really good at taking all that unstructured text and making sense of it.
Academic databases, on the other hand, generally work with fairly structured documents; that is, they index structured bibliographic records, like the one in Figure 1 below. While they may also index full text documents if those documents are accessible to the database, the focus is on bibliographic records. If only bibliographic records are indexed, the database is usually called an abstract and indexing database (A&I database). Otherwise it's called a full text database if it provides full text. Many of the databases that we have access to at our library are a mix of the two.
Searching Bibliographic Records
Bibliographic information is also called metadata. Bibliographic records are metadata about specific items (books, articles, photos, etc.) in a bibliographic or academic database. Metadata is broadly defined as data about data, or sometimes as information about information. For example, a title of a book is metadata about the name of a book. The author name of a journal article is metadata about the author of a journal article. And so forth. The metadata about a specific item such as a book or journal article is a record. In a database, this metadata is created by professionals and is well structured (as illustrated in Figure 1 above). As searchers, this means that there are pre-set fields that we can search in academic databases and that these pre-set fields specifically search the corresponding metadata. For example, in Academic Search Complete (ASC), we can search the following fields:
- Author
- Title
- Subject Terms
- Abstract text
- Author supplied keywords
- Geographic terms
- People (names)
- Journal Name
- ISBN
- And more.
We can also filter by publication date, full text availability, document type, language, number of pages, and images (depending on the database and its content). In the end, this means we have greater control over the search process than we do in a search engine because the corpus is better defined.
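To show what searching pre-set fields means in practice, here is a minimal Python sketch of a structured record and a field search. Everything in it is hypothetical: the Record class, the search_field function, and the two sample records are invented for illustration and do not reflect how ASC stores or searches its data.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A toy bibliographic record with a few pre-set fields."""
    title: str
    author: str
    abstract: str
    subject_terms: list = field(default_factory=list)
    year: int = 0


def search_field(records, field_name, term):
    """Return records whose named field contains the term (case-insensitive)."""
    term = term.lower()
    hits = []
    for record in records:
        value = getattr(record, field_name)
        # Subject terms are a list; join them so one check handles both cases.
        text = " ".join(value) if isinstance(value, list) else str(value)
        if term in text.lower():
            hits.append(record)
    return hits


records = [
    Record("Owls in old forests", "A. Author", "A survey of owl nesting sites.",
           ["Forest animals", "BIRD habitats"], 2019),
    Record("Teaching with databases", "B. Writer", "Classroom uses of library databases.",
           ["Information literacy"], 2021),
]

print([r.title for r in search_field(records, "subject_terms", "bird habitats")])
print([r.title for r in search_field(records, "title", "databases")])
```

Because each piece of metadata lives in its own field, a search for databases in the title field ignores the same word in an abstract, which is the kind of control a search engine does not give us.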
In the previous section on web information retrieval, I showed how we can search the web by using the filetype: operator to limit results to PDF, DOCX, XLSX, and other file types.
In a bibliographic database like ASC,
we can limit results by document type.
That means we can restrict results to document types like:
- articles
- bibliographies
- book chapters
- book reviews
- case studies
- editorials
- film reviews
- interviews
- letters
- obituaries
- opinions
- poems
- recipes
And much more.
Otherwise, all the same principles apply to searching databases as searching the web with a search engine. Specifically:
- Document-centered (bibliographic records are documents)
- Documents exist within a corpus
- Query construction is important
And many of the same techniques apply, too:
- We can use quoting to make sure words are included in the results
- Term order matters
- We can use OR between terms to focus on one term or the other or both
- We can use other operators, like NOT to exclude documents that contain specific terms, and AND to return only documents that contain all of the specified terms.
The AND operator between two query terms means that both terms must be present for each result in a search. For example, if I search for
dogs AND cats
in a database like ASC, then each result must include both the terms dogs and cats. We usually have to specify this AND in a database. This is not the case with Google and other search engines. In search engines, the AND is assumed between terms. So the equivalent search in, for example, Google, Bing, DuckDuckGo, etc. is simply dogs cats.
Subject Terms and Thesauri
Many (but not all) databases offer the ability to search by subject or thesauri term. Subject/thesauri terms are kinds of controlled vocabulary, but are basically special keywords assigned to bibliographic records. If a database uses these kinds of vocabulary terms, it means that each record in the database includes a list of these terms that should well describe the contents of the item it describes. Further, this means that all bibliographic records that share a specific subject term are linked together.
For example, the ASC database uses subject terms. One subject term is Forest animals, and if I use that as my search query, then each record that is returned must include that subject term, and that record should match the contents of the item. I can peruse the results and identify other subject terms that help narrow my results. For example, the subject term BIRD habitats appears in records with the subject term Forest animals, since records often have multiple subject terms. If I combine those terms with an AND operator, then I narrow my results down to two journal articles, which is pretty precise. ASC is a multi-disciplinary database, and so feel free to explore subject terms related to your own interests.
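The way shared subject terms link records together can be sketched as a small index that maps each term to the records that carry it. The record numbers below are invented, as are the extra subject terms (Deforestation, Wetlands, Owls), so treat this as a toy model rather than a picture of ASC itself.

```python
from collections import defaultdict

# Toy records: record number -> assigned subject terms (all invented).
records = {
    1: ["Forest animals", "BIRD habitats"],
    2: ["Forest animals", "Deforestation"],
    3: ["BIRD habitats", "Wetlands"],
    4: ["Forest animals", "BIRD habitats", "Owls"],
}

# Build the index: subject term -> set of record numbers.
subject_index = defaultdict(set)
for record_id, subjects in records.items():
    for subject in subjects:
        subject_index[subject].add(record_id)

forest = subject_index["Forest animals"]
narrowed = forest & subject_index["BIRD habitats"]  # Forest animals AND BIRD habitats

print(sorted(forest))    # [1, 2, 4]
print(sorted(narrowed))  # [1, 4]
```

Combining two subject terms with AND intersects their record sets, which is why the result list gets shorter and more precise.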
Subject terms, thesauri terms, controlled vocabulary, and so forth are all tools that professionals, like librarians, use to describe works. In the web information retrieval section, I discussed how search engines have become good at interpreting natural language queries. However, subject terms are generally shorter and use more formal syntax than natural language. This is important to know when using these terms to search academic databases.
Browsing
Although database search can be more precise than using search engines, databases are also good for browsing.
We all browse (online, in stores, as we page through books, and so on), but as a type of search process, browsing can be a highly useful tool when applied systematically and strategically. Browsing is not simply a way to scan through search results. Rather, the result of intentional browsing (reading or skimming a list of titles and abstracts) can be the accumulation of highly relevant source material, relevant to our information needs and queries.
Browsing Strategies
Although we make a distinction between browsing and searching, it is oftentimes helpful to begin a browsing session with a keyword search, and then use something from the search results, something like an author's name or subject term, to find and collect related information. We call this type of browsing pearl growing.
Subject Browsing
Below is an image of the ERIC Database. ERIC stands for Education Resources Information Center. It is provided by the U.S. Department of Education, and it is an important access point for millions of bibliographic records to journal articles, books, research reports, white papers, government and other organizational reports, and more on education related topics.
ERIC, like other bibliographic databases, offers a thesaurus of controlled terms to help aid search. For example, let's say I'm interested in research on academic libraries. In this screen shot, I'm looking at the page that describes the thesaurus descriptor for academic libraries. As is usual with thesauri, it not only describes how the term is defined in the database, but it also links to related terms, including terms that are conceptually broader than academic libraries, conceptually narrower than academic libraries, or that are conceptually related to academic libraries. I can click on any of these terms, and then click on the link that says Search collection using this descriptor. And in doing so, I engage in subject browsing.
Author Browsing
I can browse using other access points (a way to gain access to information) like author names. After perusing the results from above, I can click on an author's name to narrow results.
Knowing that authors tend to write and research on a specific range of topics (i.e., are specialists) is helpful because it allows me to browse by author and subject topic.
Citation Browsing
I've described abstracting & indexing (A&I) databases, but there's another special type of A&I database called a citation database. Three useful ones available to us are:
- Scopus
- Web of Science, and
- Google Scholar (more of a scholarly search engine).
The first two are available via UK Libraries, and the last is available freely on the web. A citation database is a database that shows who has cited an article (as known by the database) and provides links to the articles that have cited it. Citation theory says that when one article (or book, or other document) cites another, the two are more likely to be about the same thing. In fact, this is how Google search works, in part. Google's original PageRank algorithm posited that if a web page links to another web page, then the two pages are likely to be about the same topic. Because of this theory, we can follow citations to find more relevant articles.
Pictured here is a record in Web of Science on information literacy. To the far right you can see that it has 4 Citations. If we click on that 4 Citations link, we can begin to browse those 4 articles or documents. Per citation theory, it's highly probable that those 4 citing documents are also about information literacy; and thus, browsing them would be of considerable help if we were interested in reading more about information literacy.
After clicking on the 4 Citations link, we can see that the term information literacy appears in the title of all four citing works. This is good evidence for our citation theory, but it's also a useful trick for us.
Google Scholar works in much the same way. Instead of Times Cited, it says Cited by, and the search results default by generally listing (we think) the most highly cited works rather than the most recent, as is the default in Web of Science. But if we click on the Cited by link, we'll be taken to a page that lists the citing articles and documents, and like the Web of Science example, it's likely that many of the citing articles will be relevant in our search on this topic.
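Following a Times Cited or Cited by link amounts to looking up which documents list a given work among their references. Here is a minimal sketch in Python. The four papers and their reference lists are invented, and cited_by is a hypothetical helper, not a function offered by any of these databases.

```python
# Toy citation data: each paper -> the papers it cites (all invented).
cites = {
    "paper_a": [],
    "paper_b": ["paper_a"],
    "paper_c": ["paper_a", "paper_b"],
    "paper_d": ["paper_c"],
}


def cited_by(target, citation_map):
    """Return the papers whose reference lists include the target."""
    return sorted(source for source, refs in citation_map.items() if target in refs)


print(cited_by("paper_a", cites))  # ['paper_b', 'paper_c']
print(cited_by("paper_c", cites))  # ['paper_d']
```

Repeating the lookup on each citing paper is how a citation browsing session grows outward from a single relevant article.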
Combination Browsing
Like with most other searches, we can combine terms and use those combinations to focus our browsing sessions. The available combinations depend on the database we use. Here's a screen shot of an item from the Communication & Mass Media Complete (CMMC) database. I searched this database using the thesauri term DIFFUSION of innovations AND also the term regression in the abstract. Basically, this tells the database to retrieve any record tagged with the thesauri term DIFFUSION of innovations and where also the term regression appears in the record's abstract. If it contains regression in the abstract, then the source likely used or refers to a statistical technique called linear regression, logistic regression, or the like. Once I have this initial query, I can begin browsing the 11 titles and abstracts that are listed in the results.
Boolean Logic: AND, OR, NOT
Remember that database searching is more structured at the document level, and that this structure is reflected in the ability to do field searches. In the above example, for instance, we use two fields. The first field is a subject term search for the subject DIFFUSION of innovations, and it's marked as a subject field with the DE at the beginning. The second field is an abstract search, and this is shown in the drop down box to the right of the query term. In between these two fields is a Boolean AND operator. The AND operator tells the database that both query terms must be present in the results. We've seen this AND in prior examples.
I've mentioned two other Boolean operators: NOT and OR. Many bibliographic databases offer all three. The NOT operator instructs the database to exclude the assigned term. Thus, if we had chosen NOT "regression", then the CMMC database would have returned results where the term regression surely did NOT appear in the abstract for records with the subject term DIFFUSION of innovations.
The OR Boolean operator is a bit tricky. It means, basically, one or the other or both. Thus, if we had used it here, then CMMC would have returned all records having the subject term DIFFUSION of innovations, as well as all records having regression in the abstract, whether or not they also have that subject term. The OR operator is more useful when querying terms in the same search fields. For instance, we might want to use the OR operator to search for two different terms that might appear in the abstract fields, or the subject term fields, such as the following related terms:
"DIFFUSION of innovations theory" OR
"INFORMATION dissemination"
We can see how this plays out in the results. In the first record in the following screenshot, both terms appear in the subjects list. But in the second record, only one of the terms appears.
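The three Boolean operators can be compared side by side with sets. The Python sketch below uses three invented records with a subject field and an abstract field. It is a toy model of the logic, not of how CMMC evaluates a query.

```python
# Toy records (invented) with a subject field and an abstract field.
records = [
    {"id": 1, "subjects": {"DIFFUSION of innovations"},
     "abstract": "we fit a logistic regression model"},
    {"id": 2, "subjects": {"DIFFUSION of innovations"},
     "abstract": "a qualitative case study"},
    {"id": 3, "subjects": {"INFORMATION dissemination"},
     "abstract": "linear regression results"},
]

has_subject = {r["id"] for r in records if "DIFFUSION of innovations" in r["subjects"]}
has_regression = {r["id"] for r in records if "regression" in r["abstract"].split()}

print(sorted(has_subject & has_regression))  # AND -> [1]
print(sorted(has_subject | has_regression))  # OR  -> [1, 2, 3]
print(sorted(has_subject - has_regression))  # NOT -> [2]
```

AND and NOT narrow the results, while OR across two different fields returns nearly everything, which is why OR tends to be more useful within a single field.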
How to Browse
When we browse, therefore, we are attempting to locate key qualities from our results or our initial results lists (e.g., authors, subjects, etc.). These lists include the titles, the abstracts, the thesauri, and so forth. And these key terms will help capture what our search is about.
Developing a List
Many databases will offer a way to create, save, and export lists or individual records based on browsing and searching. This helps us easily manage the documents that we highlight as initially important. We can curate these lists as they grow and our search becomes more focused.
Creating a list in a specific database usually requires us to create an account on that database. I already have an account with EBSCOhost, the vendor that provides the CMMC database as well as many others, and in the following screenshot, I've already signed in to that account. To the right of the image, you can see a folder icon. As I browse through records that look relevant to my needs, I can click on that icon and save the result to a folder. I can also create multiple folders and email, download, or print the records for later use.
Of course, I prefer to save records in Zotero rather than use a database folder or list. This way I keep the records with me even if I lose access to the database.
Conclusion
In this section, we learned the following:
- Databases and search engines are different
- Each has advantages and disadvantages
- Search engines are well structured at the file system level
- Databases are well structured at the record level
- Searching in a database means searching structured bibliographic records
- Records are structured by pre-set fields
- Subject terms or thesauri descriptors help create precise searches
- Systematic browsing can be a rigorous way to engage in search
- Pearl growing is a browsing strategy that involves collecting items
based on an initial aspect of a bibliographic record, such as:
- subject term
- author name
- citation
- Because databases search structured bibliographic records with pre-set fields, we can create very precise queries by combining fields
- We can combine fields using Boolean logic: AND, OR, NOT
- We can create and save lists as we browse
- Or we can save items to our reference manager (RM).
In the end, don't simply browse absentmindedly. Rather, browse with smarts: systematically and strategically. Make the systems work for you. And save your results in Zotero or your chosen RM as you go.
Evaluating Information Sources
Readings
Read this short introduction on reading laterally as a way to evaluate information, and watch the short video.
- Newell, C. (n.d.). Jessup Playbooks: How do I read laterally?: Home. Retrieved August 3, 2022, from PVCC.edu
- Stanford History Education Group (Director). (2020, January 16). Sort Fact from Fiction Online with Lateral Reading, from YouTube
Introduction
When we're doing research or trying to learn something, we obviously want to gather content that is true. And so the key question is: how do we know whether content is true or false, or more or less true? Basically, truth and veracity can have nuances that are important to identify or disentangle within stories that are presented about how things are in the world. For example, it's become very common for people in the public sphere to refer to narratives, and to argue that someone's or some group's narratives are either true or false. Or more accurately, a fiction or a non-fiction. Regardless of our political or other positions, this focus on narrative is interesting and worthwhile. It's interesting because placing information within a narrative has ancient origins, and it's worthwhile because it places true and false content within some context, like a story, which can itself be evaluated as fairly true or not (or fiction or non-). Essentially, this is important because stories are central to human communication and understanding (Fisher, 1989).
Before we explore this, I want to acknowledge that you may have been taught about various existing frameworks, like the CRAAP test, to help determine the veracity of content. Many of these frameworks ask us to ask questions and to check off boxes about the content itself when evaluating information or information sources. For example, who is the author of the content, where is the content published, on what platform? What is the author's or publisher's motivation? Is the motivation purely or mostly financial? How does that introduce bias? What is the date of the publication? Is it outdated? And so forth.
Those kinds of questions are important, but they may also be insufficient in determining some content's veracity. What's also important to identify, especially for content that is new or new to us, are questions that place the content in broader context, to fit it within an overall story that's taking place. For example, where does some content that we're evaluating fit within a story someone or some group is trying to tell? How does one group's (or author's, or publisher's, etc.) story conflict with another group's (etc.)? Alternatively, what is the consensus among different stories that groups tell about a thing? How do these stories compete for public acceptance?
When we try to identify the story, then new questions and frameworks open up to us. For example, it could be the case that the basic facts about an event are agreed upon by various storytellers (e.g., news articles, politicians, scientists), but that the basic facts are presented in ways that impact how the story of those facts is told. Then, in those different tellings, the stories may consequently ring true or not. Worse, if the overall story rings false, then it may cause doubt about the basic facts, even if those basic facts are true.
Fortunately, we have ways to think about stories, and Walter Fisher (1989) identified two methods for evaluating narratives that he called narrative probability and narrative fidelity. The latter concept describes how well something rings true. For example, given what you know about the kinds of engineering feats that people have achieved, does it ring true that we have landed on the moon?
There's more to fidelity than that, but let's discuss narrative probability in more detail:
Narrative Probability
For a story to be probable (probably true, that is), it must satisfy three criteria:
- argumentative or structural coherence
- material coherence
- characterological coherence
Argumentative or Structural Coherence
Argumentative or structural coherence speaks to the validity of an underlying argument in a story. That is, good stories, whether true or not, present a series of premises that build off each other and that present an overall thesis or argument. Or, stories have a structure that makes sense and, upon close inspection, contains few holes. You all know if a story has argumentative or structural coherence because you have all surely watched bad movies that fail to convince you that the movie made sense. Sometimes this lack of structural coherence in a movie is what makes a movie fun ("it's so bad it's great!"). But when telling stories about how things are in the world, or how they should be, or how they will be, it's important for the story to make sense and to be valid, logically.
Material Coherence
If a story is internally consistent, what next? As is often the case, a group (e.g., ideological ones) may compete with another group about what should be the dominant narrative. In such cases, we can compare and contrast their narratives, and doing so is testing their material coherence. This is the idea that we compare and contrast stories and note in the process whether "important facts may be omitted, counterarguments ignored, and relevant issues overlooked" in the process of comparing and contrasting them (Fisher, 1989, p. 47). Such comparisons may happen from a bird's eye view; for example, when Democrats or Republicans are telling the story of the United States, can we see what important facts each group omits, which counterarguments each group ignores, and which relevant issues each group overlooks?
These comparisons may also happen at the micro-level. In the social and physical sciences, scholars and researchers are engaged in a series of discussions with each other about all sorts of theories about the social or physical worlds.
A quick note on theories. Theories are simply very rigorous explanations given the data. These explanations generally provide an account of causality, or of how one thing causes another. As social and physical scientists gather data, they develop theories (like stories) that explain the data. As new data is analyzed, those theories are tested. If the theories no longer explain the additional data, they are revised or discarded and replaced with new theories. Some theories (like the theory of general relativity) are quite stable (i.e., well tested), even if they do not explain everything about things within their purview, like gravity at the quantum level. Other theories still walk a tight rope (like string theory), and this is most likely because more data is needed to test them but enough data has been analyzed for them to hold for the time being. In the social sciences and physical sciences, it's more common for theories to explain limited phenomena. These are called middle-range theories. The diffusion of innovation theory, for example, is a middle-range theory that was originally devised to explain how new ideas and technology spread.
You can see these discussions among scientists and researchers take place in the literature review and discussion sections of journal articles. In these sections, researchers cite and refer to others who have completed research on a similar or the same topic. The overall goal of these discussions is to test or develop theories that explain some phenomenon. Essentially, researchers in the sciences are seeking to provide a story, based on a rigorous analysis of the data they have, that explains some phenomenon, and in the process of doing so, they compare and contrast their explanations with others. As Fisher might say, they seek important facts that may have been omitted, counterarguments that may have been ignored, and relevant issues that may have been overlooked.
Practically, we can use tools to help immerse ourselves in these discussions. Tools like Zotero or other reference managers aid us in collecting sources, taking notes on those sources, and citing those sources in papers that participate in these ongoing discussions and contribute to this collective storytelling. In the process of writing about the phenomenon under review, we attempt to provide a story based on the back and forth discussions that have taken place on the topic. These RMs, then, basically help us to test material coherence.
Characterological Coherence
Can you think of a story that doesn't have a character at all? I can't. Even places and things can be characters in stories. The Delorean car in the Back to the Future movies is, for example, somewhat of a character in those movies. But when people are characters, they often behave, in movies, plays, etc., characteristically. That is, they behave according to their values, beliefs, attitudes, ideas, and words. They behave according to who they are.
We don't often see movies or plays or hear stories where people behave uncharacteristically because such stories are generally not good. Or if people do behave uncharacteristically, it's usually a part of the plot and the uncharacteristic behavior was foreshadowed somewhere earlier in the story. Often, when we see something coming in a movie before it's happened, it's because the characters are playing to their character. If we don't see something coming, then it's simply likely a complex character, and the story supports that.
In any regard, stories have characters, whether people, places, or things. And when we investigate stories (or theories), it's worthwhile to consider whether the characters are coherent in this way, too. You can put this to the test. When thinking about big issues taking place in the world, think about who is involved, how they are acting, what they are saying, etc. Does what they are saying make sense, per the above ideas?
Closing the Loop
How does this all help you evaluate information sources? Well, when you collect sources for a project, like a paper or something, it's a good idea, as you collect sources, to think about the stories that are being told around the topic, as well as the story you want or need to tell. The sources you collect should have structural, material, and characterological coherence, too. This means that in the process of collecting evidence, if you find evidence that degrades the coherence of these things, you need to revise your story, just like a scientist must revise or discard their theory in the face of contradictory or discordant evidence.
Project
You can test this method yourselves. Consider a news event that's going on as you read this. Pick two news articles that cover the event, from two different publications. Choose two publications that lie on different parts of the political spectrum. Study those two sources, and ask the following questions:
- How are the stories structurally coherent? Or not? That is, do the stories in the two articles make sense? Are they internally consistent?
- How are the stories materially coherent? Or not? That is, upon comparing them, do any of them leave out important facts? Do any of them respond to counterarguments (or alternate explanations that fit the data)? Do any of them overlook relevant issues?
- Are the characters in the stories characterologically coherent? Who are the characters in the story? Do they behave according to what we know of their values, attitudes, beliefs, ideas, and words? Do the stories present them counter to what we already know of these people?
Conclusion
Being able to evaluate information is important, but if we have been exposed to lessons on doing so, we have often been presented with some kind of framework that asks us to check the boxes to see if an information source satisfies some pre-existing criteria. For example, such a checklist might ask us to ask: who are the authors? do they have a good reputation? is the publisher respected? is the motive to publish based on profit? are they selling something with the information they provide?
While that may be an important part of the process in evaluating information, it's also insufficient. A more thorough way to evaluate information is to think of the information more broadly and holistically and how topics are presented differently based on narratives. That is, to think of the stories that are being told, and how the information is contextualized. Once we see the story, we have four methods to investigate a piece of information's credibility. We can read laterally. That is, we can read multiple articles on a topic and compare them. Then we can test each story's:
- argumentative or structural coherence
- material coherence, and
- characterological coherence.
By the way, this method can also apply to the stories we tell ourselves or about ourselves or about our beliefs. It's important, that is, to evaluate the information we believe to be true as much as it is to evaluate what others posit to be true.
Regarding lateral reading, I think services like Wikipedia, ChatGPT, and Bard should be included in sources when reading laterally. That is, if we want to learn more about a topic, we can look at the standard sources, such as news articles or research articles. However, we can also refer to Wikipedia or the AI chatbots to sound out what we are reading about. The more we read about a topic from a variety of sources, the more likely we'll get a better overview of the stories being told about the topic.
References
Fisher, W. R. (1989). Human communication as narration: Toward a philosophy of reason, value, and action. University of South Carolina Press. https://doi.org/10.2307/j.ctv1nwbqtk
Welcome to Your Wikipedia Project Adventure!
This is the final part of this course, and it's time to put your newly acquired skills to the test. Welcome, therefore, to the section where you'll learn how to contribute to one of the largest knowledge repositories on the internet: Wikipedia!
First off, Wikipedia isn't just any website; it has its own style, rules, and community. We'll cover how to edit Wikipedia articles in a way that fits with Wikipedia's unique format and expectations. This will help you contribute effectively to this knowledge platform.
Choosing the right articles is crucial. The articles should be on topics that you're interested in and where you can make a meaningful contribution. We'll discuss strategies to help you select the perfect article that aligns with your interests and the needs of the Wikipedia community.
Then, it's all about research. You have already learned some things about searching library databases and the web, and going forward, you'll learn how to use specific library and web resources to find reliable information for your chosen Wikipedia articles.
Finally, we'll tackle how to use your Reference Manager to save and curate your found sources. Organizing your research is key to making your Wikipedia editing process smooth and stress-free.
This section isn't just about learning; it's about making a real-world impact by enhancing the quality of information available to millions of people and contributing to the collective knowledge of the world!
Wikipedia Project: Setup
By the end of this course, you will edit several Wikipedia articles. To help prepare you, this guide provides an introduction to contributing to Wikipedia. While actual editing won't begin until later in the course, it's important to start learning the process now in order to plan and prepare.
The goal: Wikipedia's accuracy and credibility depend on good references in its articles. Good references allow readers to verify information and explore topics further. By adding references to Wikipedia articles, this project will also serve as a valuable learning exercise: it will allow you to develop your research, critical thinking, and information evaluation skills, all while contributing to a valuable online resource. In short, this project will encourage you to draw upon all the topics we have covered since we started this course.
Readings
Please review the following Wikipedia articles to understand important editing practices.
Becoming a Wikipedia Editor
Your task is to identify three Wikipedia articles that you would like to edit by adding references to them. Over the next few weeks, you will search for information sources from the library and the web to add to these selected articles. You will collect those sources in your reference manager (RM), and then later edit the Wikipedia articles by adding these references to them.
To help prepare you for this, we have covered the following topics that will help us with our project:
- Information sources and resources
- Reference managers
- Web information retrieval
- Database information retrieval
- Evaluating information sources
Upcoming lessons will continue coverage of specific library and web resources, how to use them, and how to incorporate them into your work flows. Over the next few weeks, you should have collected enough material to edit the Wikipedia articles that you identify this week.
Details
For this week, your task is to identify three Wikipedia articles to edit. You may choose any articles, but note that some articles have stricter editing policies. You will analyze each article for missing or insufficient references, and collect sources that will address those shortcomings using the search skills and resource knowledge you have acquired this semester. For the main project, you will need to add two references to each of the three articles. Overall, you will need to identify three sources from the library and three from the general web. At the end of this semester, in the second part of this project, you will edit the Wikipedia articles, and add your references to the articles that you identified.
For now, your job is to:
- Create an account on Wikipedia (if you do not already have one).
- Identify three Wikipedia articles that you would like to add references to.
- Begin to collect and describe at least six sources over the course of the next few weeks that you will use as references.
- Three sources will come from the library.
- Three sources will come from the general web.
Library Sources: Part 1
Readings
Please review the aspects of InfoKat searching here:
- Naas, D. (n.d.). Research Guides: InfoKat Discovery: Getting Started. Retrieved August 3, 2022, from UK LibGuides.
Introduction
We have already discussed database information retrieval, and we have also taken an initial look at UK Libraries' discovery service, InfoKat. In this section we're going to explore a bit more the places where we can apply that knowledge. Specifically, we will take another look at UK Libraries, which provides access to many millions of sources that include books (ebooks and print books), databases, journals, archival works, image collections, multimedia collections, and more.
In this and the next section, I'll focus on two of the most common usages of the library's resources:
- InfoKat: for searching books and also database offerings
- Databases: for searching specific database offerings
However, it's worth your time to explore the UK Libraries' website in order to learn what you have access to. You could even use Google's site: operator to locate resources of interest that the library provides:
[something of interest] site:libraries.uky.edu
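If you find yourself reusing this kind of query, it can be sketched as a small helper. This is only an illustration: the function names are my own, and the Google search URL pattern is an assumption based on how the search page commonly appears in a browser, not something UK Libraries or Google documents for this purpose.

```python
from urllib.parse import urlencode


def site_query(terms: str, site: str) -> str:
    """Return a query that restricts results to one site or domain."""
    return f"{terms} site:{site}"


def google_search_url(query: str) -> str:
    """Return a search URL for the query (URL pattern is an assumption)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


# Hypothetical topic; swap in whatever interests you.
query = site_query('"research guides"', "libraries.uky.edu")
print(query)
print(google_search_url(query))
```

The helper only builds the text of the query. You still paste the query or the URL into your browser to run the search.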
For example, let me point out that UK Libraries has multiple Research Guides that cover how to explore and conduct research in specific areas and domains. The site also highlights specific librarians to contact if additional support is needed. Here's the Research Guide for those interested or majoring in Information Communication Technology.
InfoKat
InfoKat is short for InfoKat Discovery, and
is the primary search tool for browsing and finding materials from UK Libraries' collections, and a great place to begin your research. InfoKat searches our libraries' physical holdings together with a majority of the individual databases to which we subscribe (About InfoKat).
In short, it's the modern equivalent of an online card catalog system, but because it's modern, it's much more than a card catalog. Not only can InfoKat locate books on the shelves at W.T. Young or at the other library locations at UK, InfoKat is also good for discovering digital collections, database resources, and more. And if the library doesn't have a source, InfoKat can facilitate an interlibrary loan request. Although interlibrary loans mean that we won't get immediate access to the requested source, it's often very fast. In my experience, I usually receive PDF copies of requested journal articles within 24 hours, and for many books that I've requested through interlibrary loan, I generally receive copies within one to three days.
InfoKat also works well with Zotero and other reference management software. I'll cover that in a follow up demonstration video.
Searching InfoKat
There are two initial ways to search InfoKat:
- Basic search
- Advanced search
Basic search works just like how you would use Google or some other search engine. You enter a query into the search box and press enter.
However, because the corpus you're querying is structured, like a database, it's not necessarily advisable to use natural language for your queries like you might in a web search engine. Remember our three principles of information retrieval:
- Principle 1: We should understand that the basic information retrieval model centers on documents. Anything indexed in a database or on the web is treated as a document. Documents include text, sound, images, video, etc.
- Principle 2: We should understand that documents do not exist independently of other documents. This is called the corpus. For the web, the corpus is organized like a file system, much like the file system on your personal computers. In a bibliographic database, the corpus is organized by predefined fields such as author names, title names, subject terms, etc.
- Principle 3: Our queries are not divorced from the documents nor the corpus nor the organization of the corpus. These things are all intertwined. Each time we search, search engines and databases compare documents in the corpus to each other and to how they are organized based on our queries, and then rank order (in some way) those documents by way of that comparison. Hence, when we construct queries, it's useful to think about the content (corpus) and how it is organized.
Therefore, queries should always consider the corpus we're searching. This means you should devise queries that highlight the subject matter. You can do this by picking one or more keywords for your queries that express the topic that you're interested in retrieving. Also, because the InfoKat corpus you're searching is structured, advanced search offers fine-tuned, precision search functions that let us search by specific fields and dates and use Boolean logic.
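To make the idea of a structured corpus concrete, here is a minimal sketch in Python. The three records and their field names are made up for illustration, and the matching is far simpler than what InfoKat actually does. The point is only that a search across all fields and a search limited to one field can return different results for the same term.

```python
# Hypothetical bibliographic records; real catalog records have many more fields.
RECORDS = [
    {"title": "Web Development Basics", "author": "Rivera", "subject": "web development"},
    {"title": "A History of Angular Momentum", "author": "Webb", "subject": "physics"},
    {"title": "Gardening for Beginners", "author": "Stone", "subject": "horticulture"},
]


def keyword_search(records, term):
    """Return records where the term appears in any field."""
    term = term.lower()
    return [r for r in records if any(term in v.lower() for v in r.values())]


def field_search(records, field, term):
    """Return records where the term appears in one named field."""
    term = term.lower()
    return [r for r in records if term in r[field].lower()]


print([r["title"] for r in keyword_search(RECORDS, "web")])
print([r["title"] for r in field_search(RECORDS, "title", "web")])
```

In this toy corpus, the all-fields search also picks up the physics book because its author's name happens to contain the letters "web", while the title-only search does not. That is the kind of noise that fielded search helps us avoid.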
Basic Search
Basic search is a great place to start your research, especially when starting on a new project. From there you can use hints that you see in your retrieval results to practice pearl growing or to refine and narrow your search results.
You can apply pearl growing by noticing title information, subject terms, publication information, and more in the results you retrieve, and then following up on those leads. You can also refine results by availability, resource type, subject heading, language, and more.
As in web search engines, term order matters. Therefore, enclose a query in double quotes when you want to force InfoKat to return results that use your exact terms in the same order. For example, if my query is:
web development
Then results will include titles like:
Development of a web tool to ...
But if I wrap my query in quotes, like so:
"web development"
Then results will literally reflect those terms and in that order. For example:
Hands-On Full Stack Web Development with Angular...
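The difference between the two queries can be sketched in a few lines of Python. This is a simplified model and not how InfoKat is implemented: I assume titles are split on spaces, and the two titles are made up for the example.

```python
def matches_all_terms(title, query):
    """Unquoted query: every term appears somewhere in the title, in any order."""
    words = title.lower().split()
    return all(term in words for term in query.lower().split())


def matches_phrase(title, phrase):
    """Quoted query: the terms appear next to each other, in the same order."""
    words = title.lower().split()
    target = phrase.lower().split()
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


# Hypothetical titles for illustration.
titles = ["Development of a Web Portal", "Web Development for Beginners"]

for title in titles:
    print(title, matches_all_terms(title, "web development"), matches_phrase(title, "web development"))
```

Both titles satisfy the unquoted query because both contain the two terms, but only the second title contains the exact phrase.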
Advanced Search
Advanced search in InfoKat provides a form where you can focus on constructing precise queries. You can apply a variety of search filters that limit queries to specific fields in the structured data, such as title information, author/creator information, and subject terms.
Instead of using double quotes to force results to match your query, you have other options. These include the ability to state whether the search results should contain the terms in your query, match exactly the terms in your query, which is like using double quotes, or whether the search results should start with the same terms as your query.
You can also use Boolean logic in advanced search. This is especially helpful as you refine your queries. For example, if you're interested in web development but not interested in "embedded web development", you can use the Boolean NOT to remove records that contain the word "embedded" from the results. You can also limit results by time period (publication date) in order to focus on either historical works or recent works.
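Here is a rough sketch of what the Boolean NOT and a publication date limit do to a result set. The records, the function name, and its parameters are all hypothetical; the advanced search form does this work for you, so treat this only as a way to reason about the logic.

```python
# Hypothetical records for illustration.
RECORDS = [
    {"title": "Web Development for Beginners", "year": 2021},
    {"title": "Embedded Web Development on Microcontrollers", "year": 2019},
    {"title": "Web Development in the Early Days", "year": 2001},
]


def search(records, must_have, must_not_have=None, year_from=None, year_to=None):
    """Keep records that contain one phrase, lack another, and fall in a date range."""
    results = []
    for record in records:
        title = record["title"].lower()
        if must_have.lower() not in title:
            continue  # the required phrase is missing
        if must_not_have and must_not_have.lower() in title:
            continue  # Boolean NOT: drop records containing the excluded term
        if year_from is not None and record["year"] < year_from:
            continue  # published before the start of the range
        if year_to is not None and record["year"] > year_to:
            continue  # published after the end of the range
        results.append(record)
    return results


without_embedded = search(RECORDS, "web development", must_not_have="embedded")
recent_only = search(RECORDS, "web development", must_not_have="embedded", year_from=2015)
print([r["title"] for r in without_embedded])
print([r["title"] for r in recent_only])
```

Each added condition can only shrink the result set, which is why NOT and date limits are most useful after a broad first search.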
If you're using Zotero and the browser add-on, you will see a folder icon in your browser bar when you're on the result page. This will let you save multiple items to your Zotero library or to a specific Zotero folder in your library.
Record Page
When you find a result that interests you, click on the link to get more information and do more with the result. From here you can save the specific item to Zotero, but InfoKat lets you export citations manually, too. You can have the item emailed to you, or print it out, and more.
If the item is available as a print item, InfoKat will tell you where in the library it's located (i.e., shelf and floor), and it will also tell you in which library the book is located, since UK Libraries has many locations aside from W.T. Young. If the item is not available electronically or in print at one of the UK Libraries locations, this is where you can request the item via interlibrary loan.
When you sign in via your linkblue information, you can request the item directly, and if you have loans out, you can request that they be renewed.
Conclusion
UK Libraries provides access to millions of items in both digital and print versions. In this section, I focused on accessing their collections using InfoKat Discovery, and we focused on using Basic and Advanced Search.
It's fine to know the basics of a technology like InfoKat, but it's another level to integrate this technology into your workflow. In order to do that, you should use Zotero, or your reference manager of choice, to save items to your library. As you save items, return to them, read them, and take notes on them. This process will become streamlined and feel natural over time, and eventually you'll have amassed your own personal knowledge repository.
In the next section, I'll focus on specific databases that UK Libraries provides access to for more topical searches.
Library Sources: Part 2
UK Libraries provides access to many millions of sources that include books (ebooks and print books), databases, journals, archival works, image collections, multimedia collections, and more. In the prior section, I focused on using InfoKat to search some of these collections. In this section, I focus on a handful of the databases that UK Libraries offers.
I can only focus on a handful of databases because UK Libraries, as a major research institution and because of its large student body and wide range of majors, provides access to 734 total databases (as of spring 2024). Many of these databases are specialized (e.g., African American and Africana Studies, Appalachian Studies) or cover a broad general research area (e.g., Chemistry, Education). But a few are designed to be super broad; that is, they are databases of databases. I've already covered Academic Search Complete in prior sections, which is one example of a database of databases. In this section, I'll cover another one plus some citation databases.
Remember that if you have desktop Zotero plus the Zotero browser add-on installed, Zotero will automatically recognize when you have a web page open for a specific item in one of these databases. Thus, use Zotero when examining these databases to collect information on your Wikipedia topic.
While I will only cover a few databases in this section, I encourage you to explore all that's offered. To access the databases, you can browse the list on UK Libraries' website:
You can also search that list using topical keywords in the search box, you can limit results to specific subject areas, and you can focus on database types. Like document types in the ASC database, database types indicate something about the source. Potential types include:
- Audio/Video
- Digitized primary sources
- E-book collections
- Government documents
- Image collections, and more.
Factiva
Factiva is a helpful database for locating news, market, and company information. You can search by subject, industry, and region, and you can add other limits. There is a slew of advanced search operators that help power up your search. You can read through the list of operators and field codes to use when searching Factiva.
We can try an example search. Let's say I'm interested in any news about Google and open source software. To search this in Factiva, I can approach the query simply, like so (being sure to wrap open source in quotes):
"open source" AND google
This search was conducted on Aug 1, 2022, and you can see there are 4,244 results.
On the left side you can see a distribution of articles by date of publication, a list of relevant companies that appear in the documents that were retrieved, as well as lists of sources, subjects, industries, languages, regions, and so forth.
If I want to export any of these documents, I can click on specific check boxes and export the results in various ways, either as RTF or PDF files, or have it email the results to me or print them. We can look at the publication date distribution and note whether there are more results on a given day than on others. This may suggest a hot news day for this particular topic and that may be something we want to explore.
Let's go back to the search builder. Say I found that the previous set of results were hit or miss and, as a result, I want to refine my search. Now I can try the adjacent operator. The adjacent operator tells the database to only return documents where the query terms appear within a set amount of space between each other. The assumption is that the closer any terms are to each other, the more likely the document will be about those terms. Thus, if I replace the and operator with the adj5 operator, I ask Factiva to return documents where the term open source is within five words of the term Google:
"open source" adj5 google
You can see that the results are much different than the previous ones. Here I only have 205 results. The list of companies has also changed, and more. If we investigate any of these documents, we can confirm that our two terms, which are highlighted, appear within five words of each other in all the results.
As stated, there are many operators in Factiva besides adj[N]. At that link, you can see that the standard Boolean operators are available: and, or, not. There are more proximity and other operators, such as:
- w/N: "open source" w/5 google
  - like the adj operators
- same: "open source" same google
  - terms must appear in the same paragraph
- near[N]: "open source" near5 google
  - like adj but bi-directional
- atleastN: atleast5 google and "open source"
  - 'google' must appear at least 5 times in the document
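To get a feel for what the proximity and frequency operators described above are doing, here is a minimal Python sketch. It follows the descriptions in this section: adj is treated as ordered, near as bi-directional, and atleast as a simple count. How Factiva actually counts words and handles punctuation is not something I can confirm, so the counting rule here (number of word positions after the first term ends) is an assumption, and the sample sentences are made up.

```python
def find_phrase(tokens, phrase):
    """Return the starting index of each occurrence of a phrase (list of words)."""
    n = len(phrase)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase]


def adj(text, first, second, n):
    """True if `second` begins within n words after `first` ends (ordered)."""
    tokens = text.lower().split()
    a, b = first.lower().split(), second.lower().split()
    for i in find_phrase(tokens, a):
        end = i + len(a) - 1
        for j in find_phrase(tokens, b):
            if 1 <= j - end <= n:
                return True
    return False


def near(text, first, second, n):
    """Like adj, but the two terms may appear in either order."""
    return adj(text, first, second, n) or adj(text, second, first, n)


def at_least(text, term, n):
    """True if a single-word term appears at least n times."""
    return text.lower().split().count(term.lower()) >= n


# Hypothetical sentences without punctuation, to keep tokenizing simple.
s1 = "open source tools from google"
s2 = "google released a new open source library today"

print(adj(s1, "open source", "google", 5), adj(s2, "open source", "google", 5))
print(near(s1, "open source", "google", 5), near(s2, "open source", "google", 5))
```

In the second sentence, Google comes before open source, so the ordered check fails while the bi-directional check succeeds.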
Remember that you can create an account for Factiva if you want to save or export your searches. Zotero is also capable of extracting bibliographic information from Factiva and will recognize that a source document is, for example, a news article or the like.
Web of Science
Web of Science (WoS) is an abstract & indexing citation database. This means that the database does not directly provide full text access but it does link to UK Libraries when full text is available for results. As a citation database, it also provides the number of citations each result has received, and this is a way to find additional relevant documents.
The Core Collection is the default collection/database. This is Web of Science's main database and includes coverage of the sciences, the arts, and the humanities. WoS offers other databases that mostly cover the sciences, and you can search all of those databases at the same time, but it's often better to focus on the core collection when starting.
Let's try a search. We can try our open source and Google search, as seen in Figure 3:
I can keep the default field search set to All Fields, or focus on other fields, like Topic, which searches titles, abstracts, and keywords.
As of August 1, 2022, this query retrieves 1,134 results. Let's say that my search is a bit too broad still, and I want to refine my query to narrow my results. Just like in Factiva, WoS offers a proximity operator called NEAR. Let's try it out with the following query on the WoS advanced search page:
Now there are only 9 results, and if I examine the title, abstract, or keywords for the results, I'll see that the term "google" is placed within five words of the term "open source". If I want to really narrow down my search, I can change the field to Title only.
The default results list is to show articles that are published more recently. I can change this default sorting method so that WoS sorts based on sources that have the highest citations first. Once I do this, I can go to the right side, and look at the Times Cited link and see which articles have been cited the most. This is what makes WoS a citation database. We don't have to use WoS as a citation database, but this is what separates WoS from many other scholarly databases.
Theoretically, each one of these citing articles should be related to the article that is cited by them. I can then peruse these citing articles to help me find even more relevant sources of information.
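The logic of following citations can be sketched with a tiny, made-up set of records. Nothing here reflects how WoS stores its data. The record IDs, years, and citation links are hypothetical, and the two functions simply mirror the two steps described above: sort by times cited, then list the citing articles for a result.

```python
# Hypothetical records; "cited_by" lists the IDs of articles that cite the record.
RECORDS = [
    {"id": "A", "year": 2015, "cited_by": ["B", "C", "D"]},
    {"id": "B", "year": 2018, "cited_by": ["C", "D"]},
    {"id": "C", "year": 2020, "cited_by": ["D"]},
    {"id": "D", "year": 2022, "cited_by": []},
]


def sort_by_recency(records):
    """Default view in this sketch: most recently published first."""
    return sorted(records, key=lambda r: r["year"], reverse=True)


def sort_by_times_cited(records):
    """Alternative view: most cited first."""
    return sorted(records, key=lambda r: len(r["cited_by"]), reverse=True)


def citing_records(records, record_id):
    """Return the records that cite the given record."""
    by_id = {r["id"]: r for r in records}
    return [by_id[cid] for cid in by_id[record_id]["cited_by"]]


print([r["id"] for r in sort_by_recency(RECORDS)])
print([r["id"] for r in sort_by_times_cited(RECORDS)])
print([r["id"] for r in citing_records(RECORDS, "A")])
```

Notice that the two sort orders are reversed in this toy example, since older works have had more time to be cited.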
Instead of basic search, we can search by author, cited reference, and more. If you click on the big question mark button in WoS, you'll find a guide on how to use WoS. The guide includes some tips on the use of various search operators, including the NEAR operator as well as the Boolean operators.
Remember that Web of Science doesn't offer direct access to content, but notice that there is a Full Text @ W. T. Young link at the bottom of some records in the search results. This link is connected to InfoKat, which can retrieve the article if it is available. If not, then we can request it through interlibrary loan. In some cases there's also a link to look up the full text in Google Scholar if the source is freely available on the web (this is usually called open access).
Remember that you can create an account for WoS if you want to save your searches or create folders (called Marked Lists) in WoS. Although the vendor that provides WoS is also the same vendor that provides the EndNote reference manager, Zotero is also capable of extracting bibliographic information here.
JSTOR
JSTOR is a multi-disciplinary database. Like other databases, you can limit results by Item Type, Language, Publication Date, subject area, and more. JSTOR also provides proximity search using the NEAR operator.
JSTOR covers subjects such as:
- Arts
- Business & Economics
- History
- Medicine & Allied Health
- Science & Mathematics
- Security studies
- Social Sciences, and more.
Each of these subject areas includes access to many journal titles, and therefore, many journal articles. JSTOR has long focused on back issues of journals, but in recent years has made moves to include current literature and open access content (this is content that is freely available). The content in JSTOR is high quality, peer-reviewed work, which makes JSTOR a great place to gather documents on a topic that you want to research in-depth. My "open source" and google query for Images returns 26 results.
JSTOR also includes ARTSTOR, which is located in the JSTOR images search above. ARTSTOR is a database of art and multimedia objects, much of which is also available as open access.
Again, remember that if you have desktop Zotero plus the Zotero browser add-on installed, Zotero will automatically recognize when you have a web page open for a specific item in JSTOR. Thus, you can use Zotero with JSTOR to collect information on your Wikipedia topic.
Conclusion
That covers Factiva, Web of Science, Google Scholar, and JSTOR. Remember that Factiva is a general-interest news database; WoS and Google Scholar are both citation, scholarly databases (I'll cover Google Scholar in the next lesson); and JSTOR is a scholarly and image database.
In the next section, we begin to cover web sources.
Web Sources: Part 1
Many of us spend most of our time on the web visiting a handful of sites. These sites vary by country, but in the U.S., we spend much of our time on Google, YouTube, Facebook, Twitter, Instagram, Netflix, TikTok, Wikipedia, and a few others.
That's fine, of course, but the web is composed of billions of web pages, and many are worth knowing and exploring.
We also spend most of our time on a limited number of domains. This includes sites ending with .com, .org, .edu, etc. But remember that you've already learned how to search the web, and if you're interested in learning more about what's out there, I encourage you to add a site: operator to your web queries if you want to vary things up every once in a while. Remember that there are nearly 1,500 top level domains, and it can be fun to add random ones to your searches:
asthma site:health
(linux OR windows OR macos) site:computer
That said, in this and the next section I want to cover a few specific sites that are great information resources.
Google Scholar
Let's try Google Scholar now. Google Scholar isn't technically a library database since it's freely accessible on the web, but I will show you how to connect it to InfoKat so that it functions like WoS or other databases that we've covered.
Google Scholar doesn't have the kind of search operators that either Factiva or Web of Science have, but it's a freely available citation database that indexes a lot of content. In Google and related search products, like Google Scholar, the AND operator does not need to be specified. Thus the following search is translated as: "open source" AND google
You can see here we get a lot of results. Maybe even too many. If we want to limit our results, we have to use other tricks. The most common one is simply to add more specific keywords to our query. E.g., if I add "android" to the query, that reduces the results.
"open source" google android
There is an advanced search option. However, since Google Scholar indexes full text sources and not structured bibliographic records, it can't offer the kind of advanced search we've seen in database searches. It is pretty useful, though. For example, Figure 6 shows an advanced search for the terms google and "open source" but it excludes the term android in the title, and it asks for results published from 2018 to today. You can see that this substantially narrows our results. Google Scholar returned only 35 hits as of August 1, 2022, and 71 hits as of March 2024.
Other than that, one of the nice things about Google Scholar is that it's also a citation database. You can see the Cited by N link just below each result. If something has been cited, click on that to follow citations.
Google Scholar returns so many more results because it casts a bigger net than something like Web of Science, which purposefully casts a smaller net in order to control for quality. Because of that, I find Google Scholar can return many non-relevant works, but that makes it nice for browsing or discovery. Web of Science shines for more rigorous and methodical literature searches.
The observant among you may have noticed that in Figure 5, there is a View Now @ UK link. The reason Google Scholar provides that link is that I've configured it to talk with UK's InfoKat. This is something you have to set up, but it's pretty simple to do. Just click on settings, and then go to Library Links, and then search for the University of Kentucky. Check the box next to it. Be sure that you're signed into Google if you want to save this as a preference. After that, you should see the View Now option when something is available via UK Libraries.
You might find the following three other web-based bibliographic tools useful:
- Crossref https://search.crossref.org/
- Crossref will search all works that have a DOI, or digital object identifier. As such, it contains many millions of bibliographic records.
- Semantic Scholar https://www.semanticscholar.org/
- Semantic Scholar is an AI-powered bibliographic search engine that also functions as a citation database. It can provide summaries of articles and other works, lists of citations, citing authors, co-authors of authors, and links to related papers.
- Consensus https://consensus.app/search/
- Consensus builds on Semantic Scholar. Instead of functioning like a traditional search engine, it lets you ask questions on research topics. It provides example questions on its main search page, and it provides citation information and details about studies.
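Since Crossref is organized around DOIs, it may help to see how a DOI maps to a web address. The sketch below is a minimal example, assuming the standard https://doi.org/ resolver and the `query` and `rows` parameters of Crossref's public REST API at api.crossref.org. The DOI shown is a made-up placeholder, and the function names are hypothetical.

```python
from urllib.parse import quote, urlencode


def doi_to_url(doi):
    """Turn a bare DOI (or one prefixed with 'doi:') into a resolver URL."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    # Keep the slash between prefix and suffix; escape anything unusual.
    return "https://doi.org/" + quote(doi, safe="/")


def crossref_search_url(query, rows=5):
    """Build a Crossref REST API URL for a free-text bibliographic search."""
    params = {"query": query, "rows": rows}
    return "https://api.crossref.org/works?" + urlencode(params)


# Placeholder DOI for illustration only; it does not point to a real work.
print(doi_to_url("doi:10.1000/example-doi"))
print(crossref_search_url("open source software"))
```

Opening the second URL in a browser should return bibliographic records as JSON rather than as a web page, which is handy if you ever want to collect references programmatically.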
Internet Archive
The Internet Archive is a "non-profit library of millions of free books, movies, software, music, websites, and more."
For example, you might be interested in playing some old PC games that your parents played when they were younger:
The Internet Archive provides access to the Wayback Machine. The Wayback Machine is an archive of the web from its early days to the present. It's fun to use it to see what the web looked like years ago. For example, this is likely UK's first web page and was captured by the Wayback Machine in 1997:
But the Wayback Machine is also useful for retrieving web pages and sites that have been shut down or removed. That is, if you have a broken URL, you can enter the URL in the Wayback Machine and see if the original page was archived.
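If you have many broken URLs to check, the lookup can be expressed as a URL pattern. The following sketch only builds the addresses and does not contact the Internet Archive. It assumes the Wayback Machine's availability endpoint at archive.org/wayback/available and the common web.archive.org/web/TIMESTAMP/URL replay pattern. Using a year-only timestamp is also an assumption, namely that the service will redirect to the nearest capture. The function names are hypothetical.

```python
from urllib.parse import urlencode


def wayback_availability_url(page_url, timestamp=None):
    """Build a URL that asks whether an archived copy of page_url exists."""
    params = {"url": page_url}
    if timestamp:
        # Timestamps use the form YYYYMMDDhhmmss; a shorter prefix is assumed to work.
        params["timestamp"] = timestamp
    return "https://archive.org/wayback/available?" + urlencode(params)


def wayback_replay_url(page_url, timestamp):
    """Build a URL that displays the capture closest to the timestamp."""
    return f"https://web.archive.org/web/{timestamp}/{page_url}"


print(wayback_availability_url("www.uky.edu", "1997"))
print(wayback_replay_url("http://www.uky.edu/", "1997"))
```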
The Internet Archive is also a library, and offers collections on a vast range of topics and links to all sorts of media, including text, audio, video, and images.
You can view its main collections on the home page of the Internet Archive. You can also search. I've found, for instance, scanned yearbooks from my college.
The Internet Archive oversees The Open Library. You can use the Open Library to check out and read books for free, just like you would use a physical library.
DPLA
The DPLA (Digital Public Library of America) is a shared repository of content that brings together digital and digitized sources from libraries, museums, and archives across the U.S.
The DPLA is great for browsing, but they also provide guides for those interested in using the DPLA for Education, Family Research, Lifelong Learning, and Scholarly Research.
Like a library, museum, or archive, the DPLA offers:
- featured exhibitions
- primary source sets
- the ability to browse by topic
- the ability to browse by contributor
- and more.
Library of Congress
The Library of Congress provides a list of bibliographies, research guides, and finding aids on a vast range of topics. Many of the links in this list go directly to digital libraries that focus on specific topics or areas. For example, check out this fun collection of resources on dance manuals published from 1490 through 1920.
The Library of Congress also provides access to digital collections on subjects ranging from American History, War & Military, Art & Architecture, Sports & Recreation, Science & Technology, and more.
Various Government Resources
U.S. Census
The United States Census Bureau is the best way to get various demographic and some economic information about the U.S. You can also get Quick Facts about your local area. You can, for example, also compare demographics by location. Here's a population comparison between Lexington, KY and Cincinnati, OH.
NASA
NASA's website offers tons of sources on all of its major projects. From its homepage, you can download apps, audio & ringtones, e-books, and podcasts. The site also provides information on various missions, like the recent James Webb Space Telescope as well as exciting image and video galleries.
U.S. Bureau of Labor Statistics
The U.S. Bureau of Labor Statistics (BLS) is the go-to site for job and economic information. The site takes some exploration to learn all that it offers, but I can provide two examples.
The Data Tools drop-down box provides employment change data for various sectors of the U.S. As of March 2024, we see that manufacturing jobs in the U.S. decreased by an estimated 4,000, government jobs increased by an estimated 52,000, and overall non-farm jobs increased by 275,000.
The CPI Inflation Calculator shows how the value of the dollar has changed over time. For example, I can see that $1.00 in February of 2023 has the same buying power as $1.03 in February of 2024.
This calculator is useful in a lot of ways. For example, the tuition to attend UK for the 2002-2003 academic year was $1,740 per semester, and for the 2023-2024 academic year, it is $6,429. The CPI calculator shows that if tuition increased at the same rate as inflation, then today's tuition cost should only be $3,030.18 per semester (from January 2002 to January 2024). That means the extra $3,398.82 spent on tuition today is due to other (complicated) factors.
You can also see how home prices have changed. The house I rent was purchased for $101,650 during the summer of 2001. Zillow estimates that it would sell for $317,800 (Spring 2024). But inflation only accounts for a price tag of $177,217. Thus the extra $140,583 or so is a result of other market forces.
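The arithmetic behind these comparisons is simple enough to sketch in Python. The example below is not the BLS calculator itself. It assumes the usual approach of scaling an amount by the ratio of two price index values, and the index values in the first call (100 and 150) are made-up placeholders rather than real CPI figures. The dollar amounts in the other calls come from the examples above.

```python
def adjust_for_inflation(amount, cpi_start, cpi_end):
    """Scale an amount by the ratio of the ending index to the starting index."""
    return amount * cpi_end / cpi_start


def unexplained_increase(actual_price, inflation_adjusted_price):
    """Return the part of a price increase that inflation does not account for."""
    return round(actual_price - inflation_adjusted_price, 2)


# Hypothetical index values: prices rose 50% between the two dates.
print(adjust_for_inflation(1000.00, 100.0, 150.0))

# Figures from the text: actual price versus inflation-adjusted price.
print(unexplained_increase(6429.00, 3030.18))   # tuition per semester
print(unexplained_increase(317800, 177217))     # home price
```

The last two lines reproduce the $3,398.82 and $140,583 gaps discussed above. To redo the first step with real data, you would replace the placeholder index values with the CPI figures for your two dates.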
Property information like this is generally public information. Fayette County, whose seat is Lexington, KY, provides this information at https://fayettepva.com/. You can check your local municipality's website for comparable information. In fact, many city and county websites make lots of data available.
EDGAR: U.S. Securities and Exchange Commission
EDGAR is a go-to site if you're thinking about investing in a public company. Of particular interest are the 10-K and 10-Q reports. The 10-K report is an annual report that public companies are required to submit to the SEC. The 10-Q report is the quarterly version.
The 10-K report:
Provides audited annual financial statements, a discussion of material risk factors for the company and its business, and a management's discussion and analysis of the company's results of operations for the prior fiscal year (Form Type Description)
The 10-Q report is unaudited.
The EDGAR search page is pretty straightforward and offers autocomplete as you type. My search query in Figure 1 is a search for Google's (specifically, Alphabet's) last 10-K report, which was filed on February 2, 2022.
If you read the 2024 report, you can see that Google primarily earns income based on advertising, customer subscriptions, devices (e.g., Pixel), and Google Cloud usage. It's working on integrating AI technologies in its core services. It has 182,502 employees. There is much more in this report and similar reports from other public companies.
MedlinePlus
Lastly, I would like to refer you to MedlinePlus, which is part of the National Library of Medicine. The purpose of this site is to serve as a health and medical reference resource for the general public. Although no online site can take the place of professional medical help, MedlinePlus can be an important resource for becoming more informed about various health topics.
Instead of googling that next symptom or condition, I highly encourage you to visit MedlinePlus first. The site also covers wellness topics and provides recipes for a wide range of meals.
Conclusion
Although it's important to know how to search the web well, it's also handy to know about specific go-to resources on the web that provide more in-depth information or broader coverage than most Google (or similar) searches can yield. To highlight this, in this section I covered a few sites that I think are super interesting and helpful, including:
- Google Scholar
- Crossref
- Semantic Scholar
- Consensus
- Internet Archive
- Open Library
- DPLA
- Library of Congress
- U.S. Census
- NASA
- U.S. Bureau of Labor Statistics
- EDGAR (SEC)
- MedlinePlus
Be sure to explore these sites, as well as others you find, because many of these just don't come up in your everyday kind of search.
Web Sources: Part 2
We continue this section by covering a variety of web resources that are invaluable but that don't necessarily show up in our everyday web searches. Many of the web resources do not simply report information but require visitors to generate information based on user input or search for resources.
In particular, in this section, I'd like to cover a variety of open educational resources and digital libraries. Your education at UK should be, I hope, super beneficial, but you won't be in college forever, and you may have the kind of mind that wants to continue to grow and learn and improve. These sites may also help with various research projects, but they're also just enlightening and educational.
Fortunately, the number and quality of open educational resources have been blossoming in recent years, and these sites offer free, online textbooks on a vast range of subjects.
Education
Directory of Open Access Books
The Directory of Open Access Books (DOAB) provides references to hundreds of free textbooks or scholarly works on a wide range of topics. It's easy to browse by topic or subject, such as sociology, economics, earth sciences, technology, and more.
OER Commons
OER Commons is probably a bigger resource for open educational content. It provides access to subject areas ranging from applied science to education to mathematics to social sciences. It goes beyond providing access to textbooks and also includes access to various material types, such as case studies, games, data sets, lesson plans, and more.
University of Kentucky Libraries
UK Libraries provides more links to open educational resources and supports a program for UK faculty to create free, digital textbooks. You can find a lot of this material on UKnowledge, and there is an OER Research Guide that lists more links to a wide range of open educational content.
Online Learning Platforms
Online learning platforms include Coursera, edX, Khan Academy, and others. These sites offer free courses on many of the subjects you'd find in college. You can also pay to earn certificates or degrees. The courses on these platforms work like online college courses: they provide lectures and online activities to work through in order to learn the material.
Digital Libraries
In addition to the open educational resources that are available, there are also some really interesting digital libraries that exist and are worth exploring.
The National Science Digital Library
The National Science Digital Library is a digital library that provides educational resources on STEM (science, technology, engineering, and mathematics) topics and for all educational levels. Topics include applied science, mathematics, statistics and probability, trigonometry, technology, chemistry, and much more.
Civil Rights Digital Library
The Civil Rights Digital Library is a digital library that provides resources on the Civil Rights Movement. The library can be browsed by events, places, people, topics, and more. Due to the nature of the topic, some of the content can be disturbing or challenging, but if you are interested in civil rights, then this is an invaluable and necessary resource.
The New York Public Library Digital Collections
The New York Public Library (NYPL) is one of the biggest public libraries in the U.S. with more than 25 million items in its collections. The New York Public Library Digital Collections is an effort by NYPL librarians to make much of that collection accessible to the wider public. The digital collections provide access to "prints, photographs, maps, manuscripts, streaming video, and more." Visitors can browse by collection or search the site.
europeana
europeana is a digital library that acts as a central repository for libraries, museums, and archives across the European Union. In that way, it's very much a European version of the DPLA, which we covered in the prior section.
HathiTrust
HathiTrust provides access to books and other items scanned from major U.S. research libraries, Google, and the Internet Archive. There is a big focus on public domain works, which are works that are free of copyright restrictions. Although that's why I've included it in this section on web resources, UK Libraries is a partner institution, which means you can log in to HathiTrust and check out works that are still under copyright.
Project Gutenberg
Project Gutenberg is the oldest online ebook collection on the web, and even predates the web by 20 years. All ebooks on Project Gutenberg are free, and can be read on your computer or ebook reader.
Conclusion
In this section, I largely covered educational resources and digital libraries. These kinds of sites don't generally pop up in our everyday web searches unless we're really looking for them. Thus it's worthwhile to know they exist and explore them if we're interested in the content.
There are two takeaways here on this topic of web resources. First, be good at web search. We covered this in section 5. Second, when search fails, know the good sites to go to. I've only covered a handful of them in this and the prior lecture, so keep exploring the real stuff.
Reference Managers: Finishing Up
Readings
- Notes [Zotero Documentation]. (2017). Retrieved August 3, 2022, from Notes - Zotero
- PDF reader [Zotero Documentation]. (2022). Retrieved August 3, 2022, from PDF - Zotero
- Creating bibliographies [Zotero Documentation]. (2018). Retrieved August 3, 2022, from Bibliographies - Zotero
- Word processor integration [Zotero Documentation]. (2018). Retrieved August 3, 2022, from Word Processors - Zotero
- Styles [Zotero Documentation]. (2017). Retrieved August 3, 2022, from Styles - Zotero
Introduction
At this point you should have built up a decent collection of information sources in your Zotero library for your Wikipedia project. You may have even been using it for your other courses, too, and if not, then start! If you need to collect more sources for your Wikipedia article, this is the time to do it. Revisit the previous sections to remind yourself about the sources that exist on the web and at the library, and to remind yourself how to search those sources.
In the remaining weeks of this course, your final goal is to edit your chosen Wikipedia articles based on the information resources you've collected.
In order to prepare for the Wikipedia edit, it's time to return to your Zotero collection. You should have been reading the sources you've collected, and taking notes on them in Zotero. Now is a good time to get caught up and add (more) notes or refine and edit them and synthesize them in order to know where to add your citations to your Wikipedia articles.
Zotero and other reference managers offer a number of tools to help with this process. As a reminder, you can use your Zotero browser plugin to automatically add information sources to your Zotero library. You can store those sources in folders in your Zotero library. A folder can be project based. You can also tag each information source for more organization. You can highlight and add notes to PDF copies in Zotero.
In the section below, I will discuss ways to export your work to a word processor program and how to edit your Wikipedia article.
Integrating with Word Processors
Although our main project this semester is to edit a Wikipedia article, in most cases you will want to work on papers in a word processing program like Microsoft Word, Google Docs, or LibreOffice Writer.
Zotero is able to work with all three of the above word processors. Mendeley and EndNote are primarily geared toward Microsoft Word. If you elected to use either of those reference managers, then reach out to your instructor, or search the web, for guidance on using them with Google Docs or LibreOffice Writer.
Using Zotero with Word, Docs, or Writer is straightforward. The necessary plugins were installed when you installed Zotero Desktop, and additional instructions are available, too.
You'll want to use Zotero as you write papers, etc. to insert in-text citations and bibliographies. While Zotero or other reference managers can handle this automatically, you should still review your in-text citations and bibliographies to check for any data errors. If there are errors, it's likely because the item in Zotero is missing metadata for some fields (like author, title, journal title, publication date, etc.). This happens because the source information Zotero extracts metadata from may be incomplete or malformed.
With Zotero, you can add in-text citations like: (Smith, 2007) or (Chan, 2018, p. 144) or "Garcia (2020) stated ...". You can also generate bibliographies using many different styles, like APA, MLA, Chicago, and so forth. Zotero will auto-update the bibliography in your document as you add more in-text citations. To set a default style, in Zotero, click on the Edit button in the Zotero menu bar, select Preferences, click on the Cite tab, and then choose your default style, which will most likely be one of these:
- American Psychological Association 7th edition
- Chicago Manual of Style edition (author-date or full note)
- Modern Language Association 9th edition
- American Medical Association 11th edition
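To make the in-text citation patterns mentioned above concrete, here is a small Python sketch that produces the three forms shown: parenthetical, parenthetical with a page number, and narrative. It is a simplified, APA-like illustration only. It is not how Zotero generates citations, and the function name and parameters are hypothetical.

```python
def in_text_citation(author, year, page=None, narrative=False):
    """Format an author-date in-text citation in a simplified APA-like style."""
    if narrative:
        # Narrative form: the author's name is part of the sentence.
        return f"{author} ({year})"
    if page is not None:
        # Parenthetical form that points to a specific page.
        return f"({author}, {year}, p. {page})"
    return f"({author}, {year})"


print(in_text_citation("Smith", 2007))
print(in_text_citation("Chan", 2018, page=144))
print(in_text_citation("Garcia", 2020, narrative=True) + " stated ...")
```

A real citation style has many more rules (multiple authors, missing dates, and so on), which is exactly why it's worth letting a reference manager handle them and then reviewing the output.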
While the preferences window is open, there are two more things to set. First, click on the Export tab, and under Item Format, select Wikipedia Citation Templates from the drop down menu. This will come in handy when you edit your Wikipedia article.
Second, click on the Advanced tab, and then click on the drop down menu under the OpenURL section, and scroll down until you find University of Kentucky. This will enable Zotero to retrieve full text items through InfoKat.
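To give a sense of what a Wikipedia citation template looks like, the sketch below assembles a `{{cite journal}}` template wrapped in `<ref>` tags. This is only an approximation of the kind of markup Zotero's export produces, not its exact output. The article details are invented placeholders, and the function name is hypothetical.

```python
def cite_journal(fields):
    """Render a dict of fields as a {{cite journal}} template inside <ref> tags."""
    # Skip empty values so the template only lists fields we actually have.
    parts = [f"|{name}={value}" for name, value in fields.items() if value]
    return "<ref>{{cite journal " + " ".join(parts) + "}}</ref>"


# Invented bibliographic details, for illustration only.
example = {
    "last": "Smith",
    "first": "Jane",
    "title": "An Example Article",
    "journal": "Journal of Examples",
    "year": "2007",
    "doi": "",
}
print(cite_journal(example))
```

Seeing the structure makes it easier to spot missing metadata, such as an empty author or journal field, before you paste a citation into an article.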
Using Zotero and a Word Processor
Once you've set up some basic preferences, you can start writing in your preferred word processor.
Zotero will add a new tab in Microsoft Word, or new toolbar items if you're using Google Docs or LibreOffice Writer. You can use these new functions to add in-text citations, to extract and insert Zotero notes for items, and to insert a bibliography.
Conclusion
In this section, I covered the basics of integrating Zotero with your word processor. Although the basics are pretty straightforward, given the variety of options you have available to you, you may need to refer to the Zotero documentation for additional help.
Wikipedia Project: Completion
Readings
- Help:Introduction to referencing with Wiki Markup/1—Wikipedia. (n.d.). Retrieved August 4, 2022, from Verifiability
- Help:Introduction to referencing with Wiki Markup/4. (2021). Reliable Sources
- Wikipedia:References dos and don’ts. (2020). Do's and Don'ts
Introduction
In the Wikipedia Project: Setup you read about editing Wikipedia articles. You were also asked to create an account on Wikipedia and make a small edit to your user page. In this last section, your task is to edit the actual Wikipedia articles that you chose for the project.
By now, you have selected at least six sources for your Wikipedia articles: three sources from the library and three sources from the web. You must add two sources as references to each of your Wikipedia articles. You can mix up the sources. For example, you can add two library sources to one Wikipedia article, two web-based sources to your other Wikipedia article, and then one library and one web-based source to your third Wikipedia article.
When you edit your article, be sure you are logged in to your Wikipedia account when making your edits. If you are not signed in, you will not earn credit for this assignment.
To Edit
- Log into Wikipedia and make sure you are logged in when you make all edits to Wikipedia.
- Begin adding references to your Wikipedia articles. You must integrate two sources in each of your three Wikipedia articles.
GNU Free Documentation License
Version 1.3, 3 November 2008
Copyright © 2000, 2001, 2002, 2007, 2008 Free Software Foundation, Inc. https://fsf.org/
Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.
0. PREAMBLE
The purpose of this License is to make a manual, textbook, or other functional and useful document "free" in the sense of freedom: to assure everyone the effective freedom to copy and redistribute it, with or without modifying it, either commercially or noncommercially. Secondarily, this License preserves for the author and publisher a way to get credit for their work, while not being considered responsible for modifications made by others.
This License is a kind of "copyleft", which means that derivative works of the document must themselves be free in the same sense. It complements the GNU General Public License, which is a copyleft license designed for free software.
We have designed this License in order to use it for manuals for free software, because free software needs free documentation: a free program should come with manuals providing the same freedoms that the software does. But this License is not limited to software manuals; it can be used for any textual work, regardless of subject matter or whether it is published as a printed book. We recommend this License principally for works whose purpose is instruction or reference.
1. APPLICABILITY AND DEFINITIONS
This License applies to any manual or other work, in any medium, that contains a notice placed by the copyright holder saying it can be distributed under the terms of this License. Such a notice grants a world-wide, royalty-free license, unlimited in duration, to use that work under the conditions stated herein. The "Document", below, refers to any such manual or work. Any member of the public is a licensee, and is addressed as "you". You accept the license if you copy, modify or distribute the work in a way requiring permission under copyright law.
A "Modified Version" of the Document means any work containing the Document or a portion of it, either copied verbatim, or with modifications and/or translated into another language.
A "Secondary Section" is a named appendix or a front-matter section of the Document that deals exclusively with the relationship of the publishers or authors of the Document to the Document's overall subject (or to related matters) and contains nothing that could fall directly within that overall subject. (Thus, if the Document is in part a textbook of mathematics, a Secondary Section may not explain any mathematics.) The relationship could be a matter of historical connection with the subject or with related matters, or of legal, commercial, philosophical, ethical or political position regarding them.
The "Invariant Sections" are certain Secondary Sections whose titles are designated, as being those of Invariant Sections, in the notice that says that the Document is released under this License. If a section does not fit the above definition of Secondary then it is not allowed to be designated as Invariant. The Document may contain zero Invariant Sections. If the Document does not identify any Invariant Sections then there are none.
The "Cover Texts" are certain short passages of text that are listed, as Front-Cover Texts or Back-Cover Texts, in the notice that says that the Document is released under this License. A Front-Cover Text may be at most 5 words, and a Back-Cover Text may be at most 25 words.
A "Transparent" copy of the Document means a machine-readable copy, represented in a format whose specification is available to the general public, that is suitable for revising the document straightforwardly with generic text editors or (for images composed of pixels) generic paint programs or (for drawings) some widely available drawing editor, and that is suitable for input to text formatters or for automatic translation to a variety of formats suitable for input to text formatters. A copy made in an otherwise Transparent file format whose markup, or absence of markup, has been arranged to thwart or discourage subsequent modification by readers is not Transparent. An image format is not Transparent if used for any substantial amount of text. A copy that is not "Transparent" is called "Opaque".
Examples of suitable formats for Transparent copies include plain ASCII without markup, Texinfo input format, LaTeX input format, SGML or XML using a publicly available DTD, and standard-conforming simple HTML, PostScript or PDF designed for human modification. Examples of transparent image formats include PNG, XCF and JPG. Opaque formats include proprietary formats that can be read and edited only by proprietary word processors, SGML or XML for which the DTD and/or processing tools are not generally available, and the machine-generated HTML, PostScript or PDF produced by some word processors for output purposes only.
The "Title Page" means, for a printed book, the title page itself, plus such following pages as are needed to hold, legibly, the material this License requires to appear in the title page. For works in formats which do not have any title page as such, "Title Page" means the text near the most prominent appearance of the work's title, preceding the beginning of the body of the text.
The "publisher" means any person or entity that distributes copies of the Document to the public.
A section "Entitled XYZ" means a named subunit of the Document whose title either is precisely XYZ or contains XYZ in parentheses following text that translates XYZ in another language. (Here XYZ stands for a specific section name mentioned below, such as "Acknowledgements", "Dedications", "Endorsements", or "History".) To "Preserve the Title" of such a section when you modify the Document means that it remains a section "Entitled XYZ" according to this definition.
The Document may include Warranty Disclaimers next to the notice which states that this License applies to the Document. These Warranty Disclaimers are considered to be included by reference in this License, but only as regards disclaiming warranties: any other implication that these Warranty Disclaimers may have is void and has no effect on the meaning of this License.
2. VERBATIM COPYING
You may copy and distribute the Document in any medium, either commercially or noncommercially, provided that this License, the copyright notices, and the license notice saying this License applies to the Document are reproduced in all copies, and that you add no other conditions whatsoever to those of this License. You may not use technical measures to obstruct or control the reading or further copying of the copies you make or distribute. However, you may accept compensation in exchange for copies. If you distribute a large enough number of copies you must also follow the conditions in section 3.
You may also lend copies, under the same conditions stated above, and you may publicly display copies.
3. COPYING IN QUANTITY
If you publish printed copies (or copies in media that commonly have printed covers) of the Document, numbering more than 100, and the Document's license notice requires Cover Texts, you must enclose the copies in covers that carry, clearly and legibly, all these Cover Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on the back cover. Both covers must also clearly and legibly identify you as the publisher of these copies. The front cover must present the full title with all words of the title equally prominent and visible. You may add other material on the covers in addition. Copying with changes limited to the covers, as long as they preserve the title of the Document and satisfy these conditions, can be treated as verbatim copying in other respects.
If the required texts for either cover are too voluminous to fit legibly, you should put the first ones listed (as many as fit reasonably) on the actual cover, and continue the rest onto adjacent pages.
If you publish or distribute Opaque copies of the Document numbering more than 100, you must either include a machine-readable Transparent copy along with each Opaque copy, or state in or with each Opaque copy a computer-network location from which the general network-using public has access to download using public-standard network protocols a complete Transparent copy of the Document, free of added material. If you use the latter option, you must take reasonably prudent steps, when you begin distribution of Opaque copies in quantity, to ensure that this Transparent copy will remain thus accessible at the stated location until at least one year after the last time you distribute an Opaque copy (directly or through your agents or retailers) of that edition to the public.
It is requested, but not required, that you contact the authors of the Document well before redistributing any large number of copies, to give them a chance to provide you with an updated version of the Document.
4. MODIFICATIONS
You may copy and distribute a Modified Version of the Document under the conditions of sections 2 and 3 above, provided that you release the Modified Version under precisely this License, with the Modified Version filling the role of the Document, thus licensing distribution and modification of the Modified Version to whoever possesses a copy of it. In addition, you must do these things in the Modified Version:
A. Use in the Title Page (and on the covers, if any) a title distinct from that of the Document, and from those of previous versions (which should, if there were any, be listed in the History section of the Document). You may use the same title as a previous version if the original publisher of that version gives permission.
B. List on the Title Page, as authors, one or more persons or entities responsible for authorship of the modifications in the Modified Version, together with at least five of the principal authors of the Document (all of its principal authors, if it has fewer than five), unless they release you from this requirement.
C. State on the Title page the name of the publisher of the Modified Version, as the publisher.
D. Preserve all the copyright notices of the Document.
E. Add an appropriate copyright notice for your modifications adjacent to the other copyright notices.
F. Include, immediately after the copyright notices, a license notice giving the public permission to use the Modified Version under the terms of this License, in the form shown in the Addendum below.
G. Preserve in that license notice the full lists of Invariant Sections and required Cover Texts given in the Document's license notice.
H. Include an unaltered copy of this License.
I. Preserve the section Entitled "History", Preserve its Title, and add to it an item stating at least the title, year, new authors, and publisher of the Modified Version as given on the Title Page. If there is no section Entitled "History" in the Document, create one stating the title, year, authors, and publisher of the Document as given on its Title Page, then add an item describing the Modified Version as stated in the previous sentence.
J. Preserve the network location, if any, given in the Document for public access to a Transparent copy of the Document, and likewise the network locations given in the Document for previous versions it was based on. These may be placed in the "History" section. You may omit a network location for a work that was published at least four years before the Document itself, or if the original publisher of the version it refers to gives permission.
K. For any section Entitled "Acknowledgements" or "Dedications", Preserve the Title of the section, and preserve in the section all the substance and tone of each of the contributor acknowledgements and/or dedications given therein.
L. Preserve all the Invariant Sections of the Document, unaltered in their text and in their titles. Section numbers or the equivalent are not considered part of the section titles.
M. Delete any section Entitled "Endorsements". Such a section may not be included in the Modified Version.
N. Do not retitle any existing section to be Entitled "Endorsements" or to conflict in title with any Invariant Section.
O. Preserve any Warranty Disclaimers.
If the Modified Version includes new front-matter sections or appendices that qualify as Secondary Sections and contain no material copied from the Document, you may at your option designate some or all of these sections as invariant. To do this, add their titles to the list of Invariant Sections in the Modified Version's license notice. These titles must be distinct from any other section titles.
You may add a section Entitled "Endorsements", provided it contains nothing but endorsements of your Modified Version by various parties—for example, statements of peer review or that the text has been approved by an organization as the authoritative definition of a standard.
You may add a passage of up to five words as a Front-Cover Text, and a passage of up to 25 words as a Back-Cover Text, to the end of the list of Cover Texts in the Modified Version. Only one passage of Front-Cover Text and one of Back-Cover Text may be added by (or through arrangements made by) any one entity. If the Document already includes a cover text for the same cover, previously added by you or by arrangement made by the same entity you are acting on behalf of, you may not add another; but you may replace the old one, on explicit permission from the previous publisher that added the old one.
The author(s) and publisher(s) of the Document do not by this License give permission to use their names for publicity for or to assert or imply endorsement of any Modified Version.
5. COMBINING DOCUMENTS
You may combine the Document with other documents released under this License, under the terms defined in section 4 above for modified versions, provided that you include in the combination all of the Invariant Sections of all of the original documents, unmodified, and list them all as Invariant Sections of your combined work in its license notice, and that you preserve all their Warranty Disclaimers.
The combined work need only contain one copy of this License, and multiple identical Invariant Sections may be replaced with a single copy. If there are multiple Invariant Sections with the same name but different contents, make the title of each such section unique by adding at the end of it, in parentheses, the name of the original author or publisher of that section if known, or else a unique number. Make the same adjustment to the section titles in the list of Invariant Sections in the license notice of the combined work.
In the combination, you must combine any sections Entitled "History" in the various original documents, forming one section Entitled "History"; likewise combine any sections Entitled "Acknowledgements", and any sections Entitled "Dedications". You must delete all sections Entitled "Endorsements".
6. COLLECTIONS OF DOCUMENTS
You may make a collection consisting of the Document and other documents released under this License, and replace the individual copies of this License in the various documents with a single copy that is included in the collection, provided that you follow the rules of this License for verbatim copying of each of the documents in all other respects.
You may extract a single document from such a collection, and distribute it individually under this License, provided you insert a copy of this License into the extracted document, and follow this License in all other respects regarding verbatim copying of that document.
7. AGGREGATION WITH INDEPENDENT WORKS
A compilation of the Document or its derivatives with other separate and independent documents or works, in or on a volume of a storage or distribution medium, is called an "aggregate" if the copyright resulting from the compilation is not used to limit the legal rights of the compilation's users beyond what the individual works permit. When the Document is included in an aggregate, this License does not apply to the other works in the aggregate which are not themselves derivative works of the Document.
If the Cover Text requirement of section 3 is applicable to these copies of the Document, then if the Document is less than one half of the entire aggregate, the Document's Cover Texts may be placed on covers that bracket the Document within the aggregate, or the electronic equivalent of covers if the Document is in electronic form. Otherwise they must appear on printed covers that bracket the whole aggregate.
8. TRANSLATION
Translation is considered a kind of modification, so you may distribute translations of the Document under the terms of section 4. Replacing Invariant Sections with translations requires special permission from their copyright holders, but you may include translations of some or all Invariant Sections in addition to the original versions of these Invariant Sections. You may include a translation of this License, and all the license notices in the Document, and any Warranty Disclaimers, provided that you also include the original English version of this License and the original versions of those notices and disclaimers. In case of a disagreement between the translation and the original version of this License or a notice or disclaimer, the original version will prevail.
If a section in the Document is Entitled "Acknowledgements", "Dedications", or "History", the requirement (section 4) to Preserve its Title (section 1) will typically require changing the actual title.
9. TERMINATION
You may not copy, modify, sublicense, or distribute the Document except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense, or distribute it is void, and will automatically terminate your rights under this License.
However, if you cease all violation of this License, then your license from a particular copyright holder is reinstated (a) provisionally, unless and until the copyright holder explicitly and finally terminates your license, and (b) permanently, if the copyright holder fails to notify you of the violation by some reasonable means prior to 60 days after the cessation.
Moreover, your license from a particular copyright holder is reinstated permanently if the copyright holder notifies you of the violation by some reasonable means, this is the first time you have received notice of violation of this License (for any work) from that copyright holder, and you cure the violation prior to 30 days after your receipt of the notice.
Termination of your rights under this section does not terminate the licenses of parties who have received copies or rights from you under this License. If your rights have been terminated and not permanently reinstated, receipt of a copy of some or all of the same material does not give you any rights to use it.
10. FUTURE REVISIONS OF THIS LICENSE
The Free Software Foundation may publish new, revised versions of the GNU Free Documentation License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. See https://www.gnu.org/licenses/.
Each version of the License is given a distinguishing version number. If the Document specifies that a particular numbered version of this License "or any later version" applies to it, you have the option of following the terms and conditions either of that specified version or of any later version that has been published (not as a draft) by the Free Software Foundation. If the Document does not specify a version number of this License, you may choose any version ever published (not as a draft) by the Free Software Foundation. If the Document specifies that a proxy can decide which future versions of this License can be used, that proxy's public statement of acceptance of a version permanently authorizes you to choose that version for the Document.
11. RELICENSING
"Massive Multiauthor Collaboration Site" (or "MMC Site") means any World Wide Web server that publishes copyrightable works and also provides prominent facilities for anybody to edit those works. A public wiki that anybody can edit is an example of such a server. A "Massive Multiauthor Collaboration" (or "MMC") contained in the site means any set of copyrightable works thus published on the MMC site.
"CC-BY-SA" means the Creative Commons Attribution-Share Alike 3.0 license published by Creative Commons Corporation, a not-for-profit corporation with a principal place of business in San Francisco, California, as well as future copyleft versions of that license published by that same organization.
"Incorporate" means to publish or republish a Document, in whole or in part, as part of another Document.
An MMC is "eligible for relicensing" if it is licensed under this License, and if all works that were first published under this License somewhere other than this MMC, and subsequently incorporated in whole or in part into the MMC, (1) had no cover texts or invariant sections, and (2) were thus incorporated prior to November 1, 2008.
The operator of an MMC Site may republish an MMC contained in the site under CC-BY-SA on the same site at any time before August 1, 2009, provided the MMC is eligible for relicensing.
ADDENDUM: How to use this License for your documents
To use this License in a document you have written, include a copy of the License in the document and put the following copyright and license notices just after the title page:
Copyright (C) YEAR YOUR NAME.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License".
If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts, replace the "with … Texts." line with this:
with the Invariant Sections being LIST THEIR TITLES, with the Front-Cover Texts being LIST, and with the Back-Cover Texts being LIST.
If you have Invariant Sections without Cover Texts, or some other combination of the three, merge those two alternatives to suit the situation.
If your document contains nontrivial examples of program code, we recommend releasing these examples in parallel under your choice of free software license, such as the GNU General Public License, to permit their use in free software.