Electronic Access
By the end of this section, you should be able to:
- Understand the Principles of Access: Students will learn that access is the foundational principle of librarianship, shaping discussions around censorship, usability, and information retrieval, especially in the context of electronic resources.
- Differentiate Proactive and Reactive Troubleshooting: Students will explore the framework provided by Samples and Healy (2014), learning how to manage electronic access through proactive troubleshooting (preventing issues before they occur) and reactive troubleshooting (addressing issues reported by patrons).
- Analyze the Complex Nature of Electronic Resource Management: Students will critically engage with the challenges of managing constantly shifting electronic resources, from changing URLs to metadata compatibility issues, and how these issues differ from managing physical collections.
- Authentication Technologies: Students will understand the key authentication technologies used by libraries, including IP/proxy-based systems like EZproxy and SAML-based systems like OpenAthens, and their implications for access and user privacy.
- Impact of Workflow Standardization and Organizational Structure: Students will learn the importance of standardized workflows in improving the management of electronic resources and the challenges posed by diverse organizational structures across libraries.
Introduction
Access is the paramount principle of librarianship. All other issues, from censorship to information retrieval to usability, are on some level derived from or framed by that principle of access.
This week we devote ourselves to a discussion of electronic access. Let's begin with Samples and Healy (2014), who provide a nice framework for thinking about managing electronic access. They include two broad categories, proactive troubleshooting and reactive troubleshooting of access.
- proactive troubleshooting of access: "defined as troubleshooting access problems before they are identified by a patron". Some examples include:
- "letting public-facing library staff know about planned database downtime"
- "doing a complete inventory to make sure that every database paid for is in fact 'turned on'"
- reactive troubleshooting of access: "defined as troubleshooting access issues as problems are identified and reported by a patron". Some examples include:
- "fixing broken links"
- "fixing incorrect coverage date ranges in the catalog"
- "patron education about accessing full text"
The goal here, as suggested by Samples and Healy (2014), is to maximize proactive troubleshooting and to minimize reactive troubleshooting. The Samples and Healy (2014) report is a great example of systematic study. The authors identify a problem that had grown "organically." They collected and analyzed data, and then generalized from it by outlining a "detailed workflow" to "improve the timeliness and accuracy of electronic resource work." Practically, studies like this promise to improve productivity and workflows and to foster job and patron satisfaction. Such studies also help librarians identify the kinds of software solutions that align with their own workflows and patron information behaviors.
If interested, I suggest reading Lowe et al. (2021) about the impact of COVID-19 on electronic resource management. Six authors individually describe access issues at their respective institutions. They show how issues of pricing, acquisitions, training, user expectations, and budgets affect electronic access. I suggest reading articles like this in light of the framework provided by Samples and Healy (2014). Stories like these, about the impact of the pandemic on electronic access, can help guide us in developing proactive troubleshooting procedures that minimize future issues, pandemic or otherwise, at our own institutions.
Samples and Healy (2014) say something important against a common assumption about electronic resources, particularly those provided by vendors:
The impression that once a resource is acquired, it is then just 'accessible' belies the actual, shifting nature of electronic resources, where continual changes in URLs, domain names, or incompatible metadata causes articles and ebooks to be available one day, but not the next (The Complexity of ERM section, para. 6).
Hence, unlike a printed work that once cataloged may be shelved for decades or longer without major problems of access, electronic resources require constant and active attention to maintain access. Ebooks, for example, can create metadata problems. Often what's important about scholarly ebooks, in particular, are the chapters they include. Hence metadata describing ebook components is important, along with providing links to those chapters in discovery systems. This difference between item-level cataloging and title-level cataloging, as Samples and Healy describe, can lead to confusing and problematic results when considering different genres and what those genres contain.
Also note that they discuss the series of links involved, from the source of discovery to the retrieval of an item. It can be difficult to determine which of these links and which of those services is broken when access becomes problematic. From our last section, consider all the URLs that are processed and all the technologies used in going from source to target.
Let me highlight key findings from their report:
- Workflows: why does this keep coming up? It's because workflows help automate a process. They simplify and smooth out what needs to be done. This is only possible when things are standardized.
- Staffing: Part of the problem here is that ERM has had a major impact on library organizational structure, but one where different libraries have responded differently. This lack of organizational standardization has its benefits regarding overall management practices and cultures, but it also has drawbacks. The main drawback is the difficulty of establishing effective, generalized workflows that include key participants and minimize dependencies on any one person.
- Tracking: if there's no tracking of usage, there's no method to systematically identify patterns in problems. And if that's not possible, then there's no method to solve those problems proactively. It becomes all reactive troubleshooting. And reactive troubleshooting, as Samples and Healy indicate, results in poor patron experiences. We'll discuss tracking in the section on Evaluation and Measurement.
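As a small illustration of the tracking point, the sketch below tallies problem reports by category so that recurring patterns stand out. The report records, resource names, and the `most_common_problems` helper are all hypothetical; the categories simply reuse the reactive troubleshooting examples quoted earlier.

```python
from collections import Counter

# Hypothetical problem reports, as they might be exported from a ticketing
# system or a shared spreadsheet. None of this is real data.
REPORTS = [
    {"resource": "Database A", "category": "broken link"},
    {"resource": "Database B", "category": "incorrect coverage dates"},
    {"resource": "Database A", "category": "broken link"},
    {"resource": "Ebook platform C", "category": "patron education"},
    {"resource": "Database B", "category": "broken link"},
    {"resource": "Database A", "category": "incorrect coverage dates"},
]


def most_common_problems(reports, n=3):
    """Count reports per category and return the n most frequent."""
    counts = Counter(report["category"] for report in reports)
    return counts.most_common(n)


for category, count in most_common_problems(REPORTS):
    print(f"{category}: {count}")
```

Even a tally this simple shows where a proactive check (for example, a scheduled link check) would likely pay off first.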
We commonly hear that discovery systems are a great solution to all the disparate resources that librarians subscribe to. Or, if we do think about problems with such systems, we are often presented with a basic information retrieval problem: the larger the collection to search, the more likely a relevant item will get lost in the mix. Carter and Traill (2017) point out that these discovery systems also tend to reveal access problems as they are used. The authors provide a checklist to help track issues and improve existing workflows.
Buhler and Cataldo (2016) provide an important reminder that the mission of the electronic resource librarian is to serve the patron. This should remind us that the internet and the web have flattened genres. By that I mean they have made it difficult to distinguish among works like magazine articles, news articles, journal articles, encyclopedia articles, ebooks, etc.
Myself, I grew up learning about the differences between encyclopedia articles, journal articles, magazine articles, newspaper articles, book chapters, handbooks, indexes, and dictionaries because I grew up with the print versions. By definition, these works were tangible things that looked different from each other. Today, a traditional first-year college student was born around the year 2006 and grew up reading sometime in the last decade. The problem this raises is that although electronic resources are electronic or digital, they are still based on genres that originated in the print age. Yet as digital works, they lack the physical characteristics that distinguished one genre from the other. For example, by looking at each web page, what's the difference between a longer NY Times article (traditionally a newspaper article) and an article in the New Yorker (traditionally a magazine article)? Aside from some aesthetic differences, they are both presented as web pages.
Setting aside my years of experience with these sources, it's not altogether obvious from a cursory examination that they belong to entirely different genres. However, there are important informational differences between the two (how they were written, how they were edited, how long they are, and who wrote them) that lead us to consider them different genres. Even Wikipedia articles pose this problem. Citing an article from a general encyclopedia was never an accepted practice. It was generally okay, however, to cite articles from special encyclopedias because they focused on limited subject matters like art, music, science, or culture, and were usually more in-depth in their coverage. Examples include the Encyclopedia of GIS, the Encyclopedia of Evolution, The Kentucky African American Encyclopedia, The Encyclopedia of Virtual Art Carving Toraja--Indonesia, and so forth. Studies show that Wikipedia provides in-depth coverage like some special encyclopedias and short articles like some general encyclopedias, thus helping to flatten the encyclopedia genre (general vs. special), too.
The flattening holds true for things like Google. The best print analogy for Google is that of an index, which was used to locate keywords that would refer users to source material. The main difference between these indexes and Google is that the indexes were produced to cover specific publications, like a newspaper, or specific areas, like the Social Sciences Citation Index or the Science Citation Index. Both of these are actual, documented, historical precursors to Google and to Google Scholar. But today, these search engines are erroneously considered source material (e.g., "I found it on Google"). In comparison, we would not have considered a print index as source material, but rather as a reference item, since it referred users to sources. Nowadays, it's all mixed up, but who can blame anyone?
Example print indexes:
- Photos of the New York Times Index
- Photos of the Newspaper Index for the Washington Post
- Photos of the Reader's Guide to Periodical Literature
Access and Authentication
In this section, we'll delve into the technological frameworks that facilitate access to and authentication of library electronic collections. Given that a significant portion of these resources are behind paywalls, libraries employ specialized software to verify user credentials before granting access. These authentication measures are not just best practices but are often mandated by contractual agreements with content providers.
There are two main technologies used to authenticate users. The first is through an IP / proxy server, and the second is through what is called SAML authentication. We address these two authentication types below.
Proxy Authentication
EZproxy (OCLC) is the main product of the first type. When we access any paywalled work, like a journal article, we may notice something like ezproxy.uky.edu in the string of text in a URL. For example, the following is an EZproxy URL:
https://www-sciencedirect-com.ezproxy.uky.edu/science/article/pii/S030645730500004X
Note that UK Libraries, which I use in these examples, is transitioning away from EZproxy and adopting OpenAthens, which is SAML based. More on that below.
The interesting thing about this URL is that it has a uky.edu address even though the article is in a journal that's hosted in Elsevier's ScienceDirect database. The www-sciencedirect-com part of the address is a simple subdomain of ezproxy.uky.edu. You can tell because the components are separated by dashes instead of periods. As a subdomain, it is no different than the www in www.google.com or the maps in maps.google.com. The original URL is in fact:
https://www.sciencedirect.com/science/article/pii/S030645730500004X
As opposed to the first URL, the interesting thing about the original URL is that it is in fact a sciencedirect.com address. Even though "sciencedirect" appears in the uky.edu URL, it is not a "sciencedirect.com" server. They are two different servers, from two different organizations, and are as different as uky.edu and google.com.
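The relationship between the two URLs above can be sketched in a few lines of Python. This is a simplified illustration, not EZproxy's actual rewriting logic: the function names are hypothetical, and the sketch assumes the original hostname contains no hyphens or port numbers.

```python
from urllib.parse import urlsplit, urlunsplit

# Proxy hostname taken from the example URLs in this section.
PROXY_SUFFIX = "ezproxy.uky.edu"


def to_proxied_url(original_url, proxy_suffix=PROXY_SUFFIX):
    """Rewrite a publisher URL so its host becomes a subdomain of the proxy.

    Simplifying assumption: the original hostname has no hyphens and no port.
    """
    parts = urlsplit(original_url)
    # Dots in the original hostname become dashes, giving a single label.
    label = parts.hostname.replace(".", "-")
    proxied_host = f"{label}.{proxy_suffix}"
    return urlunsplit(
        (parts.scheme, proxied_host, parts.path, parts.query, parts.fragment)
    )


def domain_of(url, labels=2):
    """Return the last `labels` parts of the hostname (a rough domain check)."""
    host = urlsplit(url).hostname
    return ".".join(host.split(".")[-labels:])


original = "https://www.sciencedirect.com/science/article/pii/S030645730500004X"
proxied = to_proxied_url(original)

print(proxied)
print(domain_of(original))  # sciencedirect.com
print(domain_of(proxied))   # uky.edu
```

The last two lines make the point of this passage: the proxied address belongs to uky.edu, not to sciencedirect.com, even though "sciencedirect" appears in it.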
The reason we read an article or some other paywalled content at a uky.edu address and not at, say, a sciencedirect.com address is because of the way proxy servers work. In essence, when we make a request for a resource, like a journal article or a bibliographic database that's provided by a library, our browser makes the request to the proxy server and not to the original server. The proxy server (EZproxy) then makes the resource request to the original server, which returns that content to the proxy server. The proxy server then sends the content to our browser. This means that when we request an article in a journal at sciencedirect.com or jstor.org, our browser never actually makes a connection to those servers. Instead, the proxy server acts as a go-between.
See Day (2017) for a more technical and yet accessible description of the process.
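The go-between role can be modeled without any networking. The following toy simulation uses two hypothetical classes, `OriginServer` and `ProxyServer`, and made-up addresses. It only shows who talks to whom, not how EZproxy is implemented.

```python
class OriginServer:
    """Stand-in for a publisher platform such as ScienceDirect."""

    def __init__(self):
        self.seen_clients = []  # every address that connected to this server

    def handle(self, client_address, path):
        self.seen_clients.append(client_address)
        return f"content of {path}"


class ProxyServer:
    """Stand-in for a library proxy that relays requests to the origin."""

    def __init__(self, address, origin):
        self.address = address
        self.origin = origin
        self.log = []  # the proxy can record what each patron requested

    def handle(self, client_address, path):
        self.log.append((client_address, path))
        # The proxy makes the request under its own address, then relays
        # the origin's response back to the patron.
        return self.origin.handle(self.address, path)


origin = OriginServer()
proxy = ProxyServer("proxy.library.example", origin)

body = proxy.handle("patron-laptop", "/science/article/pii/S030645730500004X")

print(body)
print(origin.seen_clients)  # only the proxy's address appears here
print(proxy.log)            # the patron's request is visible to the proxy
```

In this model the origin server only ever sees the proxy's address, while the proxy's own log records the patron's request.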
Proxy servers provide access either through a login server or based on the user's IP address. If we're on campus, then our authentication is IP based, since all devices attached to the university's network are assigned an IP from a pre-defined range of IP addresses. This attempts to make access to paywalled content seamless...when on campus.
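The IP-based check can be sketched with Python's standard `ipaddress` module. The network ranges below are documentation-reserved placeholders, not any university's real addresses, and the function names are hypothetical.

```python
import ipaddress

# Hypothetical campus ranges. These are reserved documentation networks,
# used here only as placeholders for a university's registered ranges.
CAMPUS_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_on_campus(client_ip):
    """Return True if the address falls inside any registered campus range."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in CAMPUS_NETWORKS)


def access_decision(client_ip):
    """On-campus addresses pass through; everyone else must log in."""
    return "grant access" if is_on_campus(client_ip) else "redirect to login"


print(access_decision("192.0.2.17"))   # inside a campus range
print(access_decision("203.0.113.5"))  # outside every campus range
```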
If we are off-campus, access is authenticated via a login method to the proxy server. When we attempt to access paywalled content from off-campus, we will see an EZproxy login URL. This looks something like this for accessing the ScienceDirect database:
http://ezproxy.uky.edu/login?url=https://www.sciencedirect.com
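The login URL above follows a simple pattern: a login address on the proxy, followed by the target URL in a `url` parameter. A minimal sketch of building and unpacking such links is below. The helper names are hypothetical, and the target is appended as-is to match the example; other configurations may expect different handling.

```python
# Login prefix copied from the example URL above.
LOGIN_PREFIX = "http://ezproxy.uky.edu/login?url="


def make_login_url(target_url, login_prefix=LOGIN_PREFIX):
    """Prefix a target URL with the proxy login address."""
    return login_prefix + target_url


def extract_target(login_url):
    """Return the target URL from a proxy login link, or None if absent."""
    _, marker, target = login_url.partition("login?url=")
    return target if marker else None


link = make_login_url("https://www.sciencedirect.com")
print(link)
print(extract_target(link))
```

Being able to strip the prefix is handy when troubleshooting, since testing the bare target URL helps separate proxy problems from problems on the vendor's side.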
Aside from ScienceDirect, you can see a list of other subscribed content that requires EZproxy authentication here:
https://login.ezproxy.uky.edu/menu
SAML Authentication
The second main technology used to authenticate and provide access is based on what is called SAML authentication. The main product that provides SAML authentication for libraries is OpenAthens.
SAML, or Security Assertion Markup Language, is an XML-based standard for exchanging authentication and authorization data between parties, in particular, between an identity provider (IdP) and a service provider (SP).
Unlike a proxy / IP authentication process, SAML's main function is that of an identity verification system. Under this method, libraries offer a single sign-on process, and once authenticated, patrons have access to all SAML ready content or service providers. The process is similar to the Duo Single Sign-On service universities and other organizations use for authentication. In the OpenAthens case, users are authenticated via an identity provider, which would be the library or the broader institution (and usually via some other software service). The library provides identification by connecting to its organization's identity management system, such as AD FS, or Active Directory Federation Services. Once a patron has been authenticated, a confirmation is sent to the content provider, which then provides access to the content to the patron.
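To make "XML-based" concrete, the sketch below parses a heavily trimmed, hypothetical SAML 2.0 assertion with Python's standard library. The issuer URL, the NameID, and the attribute value are invented for illustration. A real assertion also carries a digital signature, validity conditions, and timestamps, all of which are omitted here.

```python
import xml.etree.ElementTree as ET

# Namespace used by SAML 2.0 assertion elements.
SAML_NS = {"saml": "urn:oasis:names:tc:SAML:2.0:assertion"}

# Hypothetical, heavily trimmed assertion: no signature, no conditions.
SAMPLE_ASSERTION = """\
<saml:Assertion xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion">
  <saml:Issuer>https://idp.example.edu/</saml:Issuer>
  <saml:Subject>
    <saml:NameID>opaque-id-7f3a</saml:NameID>
  </saml:Subject>
  <saml:AttributeStatement>
    <saml:Attribute Name="eduPersonScopedAffiliation">
      <saml:AttributeValue>student@example.edu</saml:AttributeValue>
    </saml:Attribute>
  </saml:AttributeStatement>
</saml:Assertion>
"""


def summarize_assertion(xml_text):
    """Pull out who issued the assertion, about whom, and with what attributes."""
    root = ET.fromstring(xml_text)
    issuer = root.findtext("saml:Issuer", namespaces=SAML_NS)
    name_id = root.findtext("saml:Subject/saml:NameID", namespaces=SAML_NS)
    attributes = {}
    for attr in root.iterfind("saml:AttributeStatement/saml:Attribute", SAML_NS):
        values = [v.text for v in attr.iterfind("saml:AttributeValue", SAML_NS)]
        attributes[attr.get("Name")] = values
    return {"issuer": issuer, "name_id": name_id, "attributes": attributes}


summary = summarize_assertion(SAMPLE_ASSERTION)
print(summary)
```

Note that the subject in this example is an opaque identifier plus an affiliation, rather than a name or email address. How much personal information an assertion releases depends on how the institution configures it.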
For more details, see What is SAML? and this detailed OpenAthens software demo.
One of the benefits of the SAML method is that URLs are not proxied. This means that content is not delivered to the patron from a proxy server like EZproxy. Instead, patrons access the original source directly. From a patron's perspective, this facilitates sharing clean, non-proxied URLs. As far as I can tell, one of the downsides might be privacy related. With a proxy server, users don't access the original source. Instead, the source is delivered through the proxy server, which, by definition, masks the patron's IP address and browser information. This wouldn't be true under the SAML method.
Note: The library would have access to EZproxy logs, which would include much of the user's activity while using the proxy.
In a bit more detail, a SAML-based authentication process is described below:
- User Request: A user tries to access a resource on the service provider (e.g., a paywalled library article).
- Redirection: If the user is not already authenticated, the service provider redirects the user to the identity provider (IdP), often passing along a SAML request.
- Authentication: The IdP challenges the user to provide valid credentials (e.g., username and password). If the user is already authenticated with the IdP (e.g., already logged into a university portal), this step may be skipped.
- Assertion Creation: Upon successful authentication, the IdP generates a SAML assertion, which is an XML document that includes the user's authentication and authorization information.
- Response: The IdP sends this SAML assertion back to the service provider, often as part of a SAML response package.
- Verification: The service provider verifies the SAML assertion (often by checking a digital signature) to ensure it came from a trusted IdP.
- Access Granted: Once the assertion is verified, the service provider grants the user access to the requested resource.
- Session: A session is established for the user, allowing them to access other resources without needing to re-authenticate for a certain period.
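The steps above can be sketched as a toy simulation. To keep it short and runnable, the sketch makes two loud simplifications: the assertion is a JSON string rather than XML, and the signature is an HMAC over a shared demo key rather than the public-key XML signature that real SAML deployments use. All names, credentials, and status strings are hypothetical.

```python
import hashlib
import hmac
import json

# Demo-only shared key standing in for the IdP's signing certificate.
# Real SAML uses XML signatures with public-key cryptography.
TRUST_KEY = b"demo-only-key"

# Hypothetical user directory held by the identity provider.
IDP_USERS = {"patron1": "correct horse"}


def sign(text):
    """Compute the stand-in signature for an assertion string."""
    return hmac.new(TRUST_KEY, text.encode(), hashlib.sha256).hexdigest()


def idp_authenticate(username, password):
    """Authentication, Assertion Creation, Response: done by the IdP."""
    if IDP_USERS.get(username) != password:
        return None
    # JSON stands in for the XML assertion document.
    assertion = json.dumps({"subject": username, "entitled": True}, sort_keys=True)
    return {"assertion": assertion, "signature": sign(assertion)}


class ServiceProvider:
    """Stand-in for a content platform that trusts the IdP."""

    def __init__(self):
        self.sessions = set()

    def request(self, username, resource, saml_response=None):
        # Session: an already verified user is not asked again.
        if username in self.sessions:
            return f"200 {resource}"
        # Redirection: no session and no assertion yet.
        if saml_response is None:
            return "302 redirect to IdP"
        # Verification: recompute the signature and compare.
        expected = sign(saml_response["assertion"])
        if not hmac.compare_digest(expected, saml_response["signature"]):
            return "403 untrusted assertion"
        claims = json.loads(saml_response["assertion"])
        if claims["subject"] != username or not claims["entitled"]:
            return "403 not authorized"
        # Access Granted: remember the session and serve the resource.
        self.sessions.add(username)
        return f"200 {resource}"


sp = ServiceProvider()
print(sp.request("patron1", "/article/123"))            # redirected to the IdP
response = idp_authenticate("patron1", "correct horse")
print(sp.request("patron1", "/article/123", response))  # assertion accepted
print(sp.request("patron1", "/article/456"))            # session reused
```

Notice that the service provider never sees the patron's password. It only sees the signed assertion, which is the core idea behind single sign-on.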
In the context of a library, the IdP could be a university's authentication system, and the service provider could be a database of academic journals. When a patron tries to access an article, they would be redirected to log in through the university's system. Once authenticated, the university's system would send a SAML assertion to the journal database, confirming that the patron is authorized to access the content.
This method is particularly useful for organizations like universities that have multiple service providers (e.g., different databases, internal services, etc.) but want to offer a single sign-on (SSO) experience for their users.
Conclusion
The Samples & Healy (2014) and the Carter & Traill (2017) articles address troubleshooting strategies with electronic resources. One additional thing to note about these readings is how the organizational structure influences workflows and how the continued transition from a print-era model of library processes to an electronic one remains problematic. Even once that transition is complete, both readings make the case that strategy and preparation are needed to deal with these issues.
The Buhler & Cataldo (2016) article shows how confusing e-resources are to patrons and how the move to digital has complicated all genres, or "containers", as the authors name them. Such "ambiguity" has implications not only for how users find and identify electronic resources but also for how librarians manage access to them.
I added the EZproxy and OpenAthens content in order to complete the technical discussions we have had in recent weeks on integrated library systems, electronic resource management systems, link resolvers, and standards. These authentication and access technologies complete these discussions, which cover the major technologies that electronic resource librarians work with to provide access to paywalled content in library collections. Both technologies aim to provide seamless access to paywalled content, nearly as seamless as accessing content via a search engine or other source. Although neither will ever be able to offer completely seamless access to paywalled sources in library collections, the job of an electronic resource librarian is to make sure they work as well as possible. This will often mean working with vendors and colleagues.
Additional Sources
- How SAML works and enables single sign-on
- Differences Between SAML V2.0 and SAML V1.1
- AD FS Overview
Readings / References
Samples, J., & Healy, C. (2014). Making it look easy: Maintaining the magic of access. Serials Review, 40, 105-117. https://doi.org/10.1080/00987913.2014.929483
Carter, S., & Traill, S. (2017). Essential skills and knowledge for troubleshooting e-resources access issues in a web-scale discovery environment. Journal of Electronic Resources Librarianship, 29(1), 1–15. https://doi.org/10.1080/1941126X.2017.1270096
Buhler, A., & Cataldo, T. (2016). Identifying e-resources: An exploratory study of university students. Library Resources & Technical Services, 60, 22-37. https://doi.org/10.5860/lrts.60n1.23
Additional References
Breeding, M. (2008). OCLC Acquires EZproxy. Smart Libraries Newsletter, 28(03), 1–2. https://librarytechnology.org/document/13149
OCLC. (2017, September 22). EZproxy. OCLC Support. https://help.oclc.org/Library_Management/EZproxy
OpenAthens transforms user access to library resources, replacing EZproxy and IP address authentication. (2021, June 2). About UBC Library. https://about.library.ubc.ca/2021/06/02/openathens-transforms-user-access-to-library-resources-replacing-ezproxy-and-ip-address-authentication/
Botyriute, K. (2018). Access to online resources. Springer International Publishing. https://doi.org/10.1007/978-3-319-73990-8
Day, J. M. (2017, April 25). Proxy servers: Basics and resources. Library Technology Launchpad. https://libtechlaunchpad.com/2017/04/25/proxy-servers-basics-and-resources/
Lowe, R. A., Chirombo, F., Coogan, J. F., Dodd, A., Hutchinson, C., & Nagata, J. (2021). Electronic Resources Management in the Time of COVID-19: Challenges and Opportunities Experienced by Six Academic Libraries. Journal of Electronic Resources Librarianship, 33(3), 215–223. https://doi.org/10.1080/1941126X.2021.1949162