Electronic Access

Introduction

Access is the paramount principle of librarianship, and all other issues, from censorship to information retrieval or to usability, are on some level derived from or framed by that principle of Access.

This week we devote ourselves to a discussion of electronic access. We begin with Samples and Healy (2014), who provide a useful framework for thinking about managing electronic access. They describe two broad categories: proactive troubleshooting and reactive troubleshooting of access.

  • proactive troubleshooting of access: "defined as troubleshooting access problems before they are identified by a patron". Some examples include:
    • "letting public-facing library staff know about planned database downtime"
    • "doing a complete inventory to make sure that every database paid for is in fact 'turned on'"
  • reactive troubleshooting of access: "defined as troubleshooting access issues as problems are identified and reported by a patron". Some examples include:
    • "fixing broken links"
    • "fixing incorrect coverage date ranges in the catalog"
    • "patron education about accessing full text"

The goal here, as suggested by Samples and Healy (2014), is to maximize proactive troubleshooting and to minimize reactive troubleshooting. The Samples and Healy (2014) report is a great example of systematic study. The authors identified a problem that had grown "organically," collected and analyzed data, and then generalized from it by outlining a "detailed workflow" to "improve the timeliness and accuracy of electronic resource work." Practically, studies like this promise to improve productivity and workflows and to foster job and patron satisfaction. Such studies also help librarians identify the kinds of software solutions that align with their own workflows and patron information behaviors. If interested, I suggest reading Lowe et al. (2021) about the impact of Covid-19 on electronic resource management. Six authors individually describe access issues at their respective institutions and show how issues of pricing, acquisitions, training, user expectations, and budgets affect electronic access. I suggest reading articles like this in light of the framework provided by Samples and Healy (2014) because stories like these, about the impact of the pandemic on electronic access, can help guide us in developing proactive troubleshooting procedures that minimize future issues, pandemic or otherwise, at our own institutions.

Samples and Healy (2014) say something important against a common assumption about electronic resources, particularly those provided by vendors:

The impression that once a resource is acquired, it is then just 'accessible' belies the actual, shifting nature of electronic resources, where continual changes in URLs, domain names, or incompatible metadata causes articles and ebooks to be available one day, but not the next (The Complexity of ERM section, para. 6).

Hence, unlike a printed work from the print-only era that, once cataloged, may be shelved for decades or longer without major problems of access, electronic resources require constant and active attention to remain accessible. Ebooks, for example, can create metadata problems. Often what's important about scholarly ebooks, in particular, are the chapters they include, and hence metadata describing ebook components is important, along with links to those chapters in discovery systems. This difference between item-level cataloging and title-level cataloging, as Samples and Healy describe, can lead to confusing and problematic results when considering different genres and what those genres contain.

They also discuss how a series of links is involved, starting from the source of discovery (e.g., an OPAC or a discovery layer) and ending at the retrieved item, and how difficult it can be to determine which of these links or services is broken when access becomes problematic.

Let me highlight a few key findings from their report:

  • Workflows: why does this keep coming up? It's because workflows help automate a process, that is, they simplify and smooth out what needs to be done, and because this is only possible when things are standardized.
  • Staffing: we'll discuss this more in another section, but part of the problem here is that ERM has had a major impact on organizational structure, and different libraries have responded differently. This lack of organizational standardization has its benefits regarding overall management practices and cultures, but it also has a huge drawback: it is difficult to establish effective, generalized workflows that include key participants and that minimize dependencies on any one person.
  • Tracking: if there's no tracking, there's no method to systematically identify patterns in problems. And if that's not possible, then there's no method to solve those problems proactively. It all becomes reactive troubleshooting, and reactive troubleshooting, as Samples and Healy indicate, results in poor patron experiences. We'll discuss tracking during the week on Evaluation and Statistics.
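
To make the tracking point concrete, here is a minimal sketch of how logged problem reports could be tallied to surface patterns. The report records, field names, and categories are invented for illustration and are not taken from Samples and Healy (2014).

```python
from collections import Counter

# Hypothetical problem reports; the fields and categories are illustrative only.
reports = [
    {"resource": "Database A", "cause": "broken link"},
    {"resource": "Journal B", "cause": "incorrect coverage dates"},
    {"resource": "Database A", "cause": "broken link"},
    {"resource": "Ebook C", "cause": "patron education"},
    {"resource": "Database A", "cause": "broken link"},
]


def tally_by(reports, field):
    """Count how often each value of `field` appears across the reports."""
    return Counter(report[field] for report in reports)


by_cause = tally_by(reports, "cause")
by_resource = tally_by(reports, "resource")

# The most frequent cause and resource are candidates for proactive checks.
print(by_cause.most_common(1))
print(by_resource.most_common(1))
```

Even a tally this simple shows where reactive fixes cluster, which is the information needed to decide what to check proactively.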

We commonly get the line that discovery systems are a great solution to all the disparate resources that librarians subscribe to. Or, if we do think about problems with such systems, we are often presented with a basic information retrieval problem, such that the larger the collection to search, the more likely a relevant item will get lost in the mix. Carter and Traill (2017) point out that these discovery systems also tend to reveal access problems as they are used. The authors provide a checklist to help track issues and improve existing workflows.

Buhler and Cataldo (2016) provide an important reminder that the mission of the electronic resource librarian is to serve the patron. This should remind us that the internet and the web have flattened genres. By that I mean they have made it difficult to distinguish among works like magazine articles, news articles, journal articles, encyclopedia articles, ebooks, etc. Though the Buhler and Cataldo (2016) reading is student-focused, other studies have hinted at the same issue across other populations. It's important for ERM librarians to recognize these issues and to work to resolve them wherever they can.

Myself, I grew up learning about the differences between encyclopedia articles, journal articles, magazine articles, newspaper articles, book chapters, handbooks, indexes, and dictionaries because I grew up with the print versions, which, by definition, were tangible things that looked different from each other. Today, a traditional first year college student was born around the year 2004 and grew up reading sometime in the last decade. The problem this raises is that although electronic resources are electronic or digital, they are still based on genres that originated in the print age, yet they lack the physical characteristics that distinguished one from the other. E.g., what's the difference between a longer NY Times article (traditionally a newspaper article) and an article in the New Yorker (traditionally a magazine article) today in their online forms? Aside from some aesthetic differences between the two, they are both presented on web pages, and it's not at all obvious from a cursory examination that regular users can tell they're different genres. However, there are important informational differences between the two (how they were written, how they were edited, how long they are, and who they were written by) that might still lead us to consider them as different genres.

Even Wikipedia articles pose this problem. Citing an article from a general encyclopedia was never an accepted practice. It was generally okay, however, to cite articles from special encyclopedias because they focused on limited subject matters like art, music, science, and culture, and were usually more in-depth in their coverage. Examples include the Encyclopedia of GIS, the Encyclopedia of Evolution, The Kentucky African American Encyclopedia, The Encyclopedia of Virtual Art Carving Toraja--Indonesia, and so forth. There are studies that show that Wikipedia provides the same kind of in-depth coverage as some special encyclopedias, thus helping to flatten the encyclopedia genre, too.

The flattening holds true for things like Google. The best print analogy for Google is that of an index, which was used to locate keywords that would refer to source material. The main difference between these indexes and Google is that the indexes were produced to cover specific publications, like a newspaper, or specific areas, like the Social Science Citation Index or the Science Citation Index, both of which are actual, documented, historical precursors to Google and to Google Scholar. But today, these search engines are erroneously considered source material (e.g., "I found it on Google"). Few, I think, would have considered a print index as source material, but rather as a reference item, since it referred users to sources. Nowadays, it's all mixed up, but who can blame anyone?

Example print indexes:

Access and Authentication

In this section, we'll delve into the technological frameworks that facilitate access to and authentication of library electronic collections. Given that a significant portion of these resources are behind paywalls, libraries employ specialized software to verify user credentials before granting access. These authentication measures are not just best practices but are often mandated by contractual agreements with content providers.

There are two main technologies used to authenticate users. The first is through an IP / proxy server, and the second is through what is called SAML authentication. We address these two authentication types below.

Proxy Authentication

EZproxy (OCLC) is the main product of the first type. When we access a paywalled work, like a journal article, we may notice something like ezproxy.uky.edu in the URL. For example, the following is an EZproxy URL:

https://www-sciencedirect-com.ezproxy.uky.edu/science/article/pii/S030645730500004X

Note that UK Libraries, which I use in these examples, is transitioning away from EZproxy and adopting OpenAthens, which is SAML-based. More on that below.

The interesting thing about this URL is that it has a uky.edu address even though the article is in a journal that's hosted in Elsevier's ScienceDirect database. The www-sciencedirect-com part of the address is simply a subdomain of ezproxy.uky.edu (you can tell because its components are separated by dashes instead of periods). As a subdomain, it is no different from the www in www.google.com or the maps in maps.google.com. The original URL is in fact:

https://www.sciencedirect.com/science/article/pii/S030645730500004X

As opposed to the first URL, the interesting thing about the original URL is that it is in fact a sciencedirect.com address. Even though "sciencedirect" appears in the uky.edu URL, it is not a "sciencedirect.com" server. They are two different servers, from two different organizations, and are as different as uky.edu and google.com.
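
The relationship between the two URLs can be sketched in a few lines of Python. This is only an illustration of the pattern visible in the two example URLs, not a description of how EZproxy is implemented. The function name is hypothetical, and the sketch deliberately ignores complications such as hostnames that already contain dashes or port numbers.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the proxy host is the one used in the examples in the text.
PROXY_HOST = "ezproxy.uky.edu"


def to_proxied_url(original_url, proxy_host=PROXY_HOST):
    """Rewrite a URL into the dashed, proxied form shown in the example.

    Simplification: only handles hostnames without dashes or ports.
    """
    parts = urlsplit(original_url)
    # Periods in the original hostname become dashes, forming one subdomain label.
    dashed = parts.hostname.replace(".", "-")
    proxied_netloc = f"{dashed}.{proxy_host}"
    return urlunsplit(
        (parts.scheme, proxied_netloc, parts.path, parts.query, parts.fragment)
    )


original = "https://www.sciencedirect.com/science/article/pii/S030645730500004X"
proxied = to_proxied_url(original)
print(proxied)

# The two URLs point at servers in different domains.
print(urlsplit(original).hostname.endswith("sciencedirect.com"))
print(urlsplit(proxied).hostname.endswith("uky.edu"))
```

The path of the article stays the same. Only the host changes, which is why the proxied address belongs to uky.edu rather than to sciencedirect.com.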

The reason we read an article or some other paywalled content at a uky.edu address, and not at a sciencedirect.com address, for example, is the way proxy servers work. In essence, when we make a request for a resource, like a journal article or a bibliographic database, that's provided by a library, our browser makes the request to the proxy server and not to the original server. The proxy server then requests the resource from the original server, which returns the content to the proxy server (EZproxy), which in turn sends the content to our browser. This means that when we request an article in a journal at sciencedirect.com or jstor.org, our browser never actually makes a connection to those servers. Instead, the proxy server acts as a go-between. See Day (2017) for a more technical and yet accessible description of the process.

Proxy servers provide access either through a login server or based on the user's IP address. If we're on campus, then our authentication is IP based, since all devices attached to the university's network are assigned an IP from a pre-defined range of IP addresses. This makes access to paywalled content fairly seamless, when on campus.
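
The IP-based decision can be sketched with Python's standard ipaddress module. The network ranges below are reserved documentation ranges standing in for a campus network. They are assumptions for illustration, not any real institution's addresses, and the function names are hypothetical.

```python
import ipaddress

# Hypothetical campus ranges: reserved documentation networks (RFC 5737),
# used here as stand-ins for a university's pre-defined IP ranges.
CAMPUS_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_on_campus(ip_string):
    """Return True if the address falls inside any configured campus range."""
    address = ipaddress.ip_address(ip_string)
    return any(address in network for network in CAMPUS_NETWORKS)


def access_path(ip_string):
    """Decide between seamless IP-based access and a login prompt."""
    return "IP-based access" if is_on_campus(ip_string) else "redirect to login"


print(access_path("192.0.2.45"))   # inside the first hypothetical range
print(access_path("203.0.113.9"))  # outside both ranges
```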

If we are off-campus, access is authenticated via a login to the proxy server. When we attempt to access paywalled content from off-campus, we will see an EZproxy login URL. For the ScienceDirect database, it looks something like this:

http://ezproxy.uky.edu/login?url=https://www.sciencedirect.com
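
The pattern in this login URL, a fixed prefix followed by the target address, can be sketched as follows. The prefix is copied from the example above, and the helper names are hypothetical. The sketch simply concatenates the two parts, as the example does, and makes no claim about how a given EZproxy installation handles encoding or additional parameters.

```python
# Assumption: the login prefix is the one shown in the example URL.
LOGIN_PREFIX = "http://ezproxy.uky.edu/login?url="


def to_login_url(target_url, login_prefix=LOGIN_PREFIX):
    """Prepend the proxy login prefix to a target URL."""
    return login_prefix + target_url


def from_login_url(login_url, login_prefix=LOGIN_PREFIX):
    """Recover the target URL from a proxy login URL."""
    if not login_url.startswith(login_prefix):
        raise ValueError("not a login URL for this proxy")
    return login_url[len(login_prefix):]


login_url = to_login_url("https://www.sciencedirect.com")
print(login_url)
print(from_login_url(login_url))
```

Being able to strip the prefix is handy when troubleshooting, since testing the bare target URL helps determine whether a problem lies with the proxy or with the vendor's site.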

Aside from ScienceDirect, you can see a list of other subscribed content that requires EZproxy authentication here:

https://login.ezproxy.uky.edu/menu

SAML Authentication

The second main technology used to authenticate and provide access is based on what is called SAML authentication. The main product that provides SAML authentication for libraries is OpenAthens.

SAML, or Security Assertion Markup Language, is an XML-based standard for exchanging authentication and authorization data between parties, in particular, between an identity provider (IdP) and a service provider (SP).

Unlike a proxy / IP authentication process, SAML's main function is that of an identity verification system. Under this method, libraries offer a single sign-on process, and once authenticated, patrons have access to all SAML-ready content or service providers. The process is similar to the Duo Single Sign-On service many universities use for authentication. In the OpenAthens case, users are authenticated via an identity provider, which would be the library or the broader institution (and usually via some other software service). The library provides identification by connecting to its organization's identity management system, such as ADFS (Active Directory Federation Services). Once a patron has been authenticated, a confirmation is sent to the content provider, which then provides the patron with access to the content. For more details, see What is SAML? and this detailed OpenAthens software demo.

One of the benefits of this method is that URLs are not proxied, which means that content is not delivered to the patron from a proxy server like EZproxy. Instead, patrons access the original source directly. From a patron's perspective, this facilitates sharing clean, unproxied URLs. As far as I can tell, one of the downsides might be privacy related. With a proxy server, users don't access the original source, but instead the source is delivered through the proxy server, which, by definition, masks the patron's IP address and browser information. This wouldn't be true under the SAML method.

Note: The library would have access to EZproxy logs, which would include much of the user's activity while using the proxy.

In a bit more detail, a SAML-based authentication process is described below:

  1. User Request: A user tries to access a resource on the service provider (e.g., a paywalled library article).
  2. Redirection: If the user is not already authenticated, the service provider redirects the user to the identity provider (IdP), often passing along a SAML request.
  3. Authentication: The IdP challenges the user to provide valid credentials (e.g., username and password). If the user is already authenticated with the IdP (e.g., already logged into a university portal), this step may be skipped.
  4. Assertion Creation: Upon successful authentication, the IdP generates a SAML assertion, which is an XML document that includes the user's authorization information.
  5. Response: The IdP sends this SAML assertion back to the service provider, often as part of a SAML response package.
  6. Verification: The service provider verifies the SAML assertion (often by checking a digital signature) to ensure it came from a trusted IdP.
  7. Access Granted: Once the assertion is verified, the service provider grants the user access to the requested resource.
  8. Session: A session is established for the user, allowing them to access other resources without needing to re-authenticate for a certain period.
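
The eight steps above can be walked through with a toy simulation. To be clear about the substitutions: real SAML exchanges signed XML and typically relies on public-key signatures, while this sketch swaps in JSON and a shared-secret HMAC so that it stays short and uses only the standard library. The user store, secret, issuer name, and function names are all hypothetical.

```python
import hashlib
import hmac
import json

# Toy stand-ins: real SAML uses signed XML, not JSON with a shared-secret HMAC.
SHARED_SECRET = b"demo-secret"         # hypothetical trust between IdP and SP
USERS = {"student1": "correct-horse"}  # hypothetical IdP user store


def sign(text):
    """Produce a signature the service provider can later verify."""
    return hmac.new(SHARED_SECRET, text.encode(), hashlib.sha256).hexdigest()


def idp_authenticate(username, password):
    """Steps 3-5: check credentials, then create and return a signed assertion."""
    if username not in USERS or USERS[username] != password:
        return None
    assertion = json.dumps({"subject": username, "issuer": "idp.example.edu"})
    return {"assertion": assertion, "signature": sign(assertion)}


def sp_verify(response):
    """Step 6: confirm the assertion came from the trusted IdP."""
    return hmac.compare_digest(sign(response["assertion"]), response["signature"])


def sp_access(resource, session, saml_response=None):
    """Steps 1-2 and 6-8, from the service provider's side."""
    if session.get("user"):               # step 8: an existing session is reused
        return "ok"
    if saml_response is None:             # step 2: send the user to the IdP
        return "redirect_to_idp"
    if not sp_verify(saml_response):
        return "denied"
    session["user"] = json.loads(saml_response["assertion"])["subject"]
    return "ok"                           # step 7: access granted


def browser_fetch(resource, session, username, password):
    """The patron's browser carries messages between the SP and the IdP."""
    status = sp_access(resource, session)                   # step 1
    if status == "redirect_to_idp":
        response = idp_authenticate(username, password)     # steps 3-5
        if response is None:
            return "denied"
        status = sp_access(resource, session, response)     # steps 6-8
    return status


session = {}
print(browser_fetch("/article/1", session, "student1", "correct-horse"))
print(sp_access("/article/2", session))  # no second login needed
print(browser_fetch("/article/3", {}, "student1", "wrong-password"))
```

Note that the service provider never sees the patron's password. It only sees the signed assertion, which is the point of separating the identity provider from the service provider.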

In the context of a library, the IdP could be a university's authentication system, and the service provider could be a database of academic journals. When a student tries to access an article, they would be redirected to log in through the university's system. Once authenticated, the university's system would send a SAML assertion to the journal database, confirming that the student is authorized to access the content.
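
To give a sense of what such an assertion might carry, here is a heavily simplified, unsigned example parsed with Python's standard XML library. The namespace is the SAML 2.0 assertion namespace, but the issuer, subject, and attribute name and value are hypothetical, and a real assertion would also include a signature, validity conditions, and other elements omitted here.

```python
import xml.etree.ElementTree as ET

NS = {"saml": "urn:oasis:names:tc:SAML:2.0:assertion"}

# Simplified, unsigned assertion for illustration only. All values are hypothetical.
ASSERTION_XML = """\
<saml:Assertion xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion">
  <saml:Issuer>https://idp.example.edu</saml:Issuer>
  <saml:Subject>
    <saml:NameID>student1@example.edu</saml:NameID>
  </saml:Subject>
  <saml:AttributeStatement>
    <saml:Attribute Name="affiliation">
      <saml:AttributeValue>student</saml:AttributeValue>
    </saml:Attribute>
  </saml:AttributeStatement>
</saml:Assertion>
"""


def summarize_assertion(xml_text):
    """Extract the issuer, subject, and attributes from a simplified assertion."""
    root = ET.fromstring(xml_text)
    attributes = {}
    for attribute in root.findall("saml:AttributeStatement/saml:Attribute", NS):
        values = [v.text for v in attribute.findall("saml:AttributeValue", NS)]
        attributes[attribute.get("Name")] = values
    return {
        "issuer": root.findtext("saml:Issuer", namespaces=NS),
        "subject": root.findtext("saml:Subject/saml:NameID", namespaces=NS),
        "attributes": attributes,
    }


summary = summarize_assertion(ASSERTION_XML)
print(summary["issuer"])
print(summary["subject"])
print(summary["attributes"])
```

In this sketch, the journal database would rely on the issuer to decide whether it trusts the statement and on the attributes to decide what the patron may access.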

This method is particularly useful for organizations like universities that have multiple service providers (e.g., different databases, internal services, etc.) but want to offer a single sign-on (SSO) experience for their users.

Conclusion

The Samples & Healy (2014) and the Carter & Traill (2017) articles address troubleshooting strategies with electronic resources. One additional thing to note about these readings is how organizational structure influences workflows and how the continued transition from a print-era model of library processes to an electronic one remains problematic. Even once that transition is complete, both readings make the case that strategy and preparation are needed to deal with these issues. The Buhler & Cataldo (2016) article shows how confusing e-resources are to patrons and how the move to digital has complicated all genres, or "containers", as the authors name them. Such "ambiguity" has implications not only for how users find and identify electronic resources but also for how librarians manage access to them.

I added the EZproxy and OpenAthens content in order to complete the technical discussions we have had in recent weeks on integrated library systems, electronic resource management systems, link resolvers, and standards. Taken together, these discussions cover the major technologies that electronic resource librarians work with to provide access to paywalled content in library collections. Both authentication technologies aim to provide seamless access to paywalled content, nearly as seamless as accessing content via a search engine or other source. Although neither will ever be able to offer completely seamless access as long as there are paywalled sources in library collections, the job of an electronic resource librarian is often to make sure they work as well as possible. This will often mean working with vendors and colleagues.

Additional Sources

Readings / References

Samples, J., & Healy, C. (2014). Making it look easy: Maintaining the magic of access. Serials Review, 40, 105-117. https://doi.org/10.1080/00987913.2014.929483

Carter, S., & Traill, S. (2017). Essential skills and knowledge for troubleshooting e-resources access issues in a web-scale discovery environment. Journal of Electronic Resources Librarianship, 29(1), 1–15. https://doi.org/10.1080/1941126X.2017.1270096

Buhler, A., & Cataldo, T. (2016). Identifying e-resources: An exploratory study of university students. Library Resources & Technical Services, 60, 22-37. https://doi.org/10.5860/lrts.60n1.23

Additional References

Breeding, M. (2008). OCLC Acquires EZproxy. Smart Libraries Newsletter, 28(03), 1–2. https://librarytechnology.org/document/13149

OCLC. (2017, September 22). EZproxy. OCLC Support. https://help.oclc.org/Library_Management/EZproxy

OpenAthens transforms user access to library resources, replacing EZproxy and IP address authentication. (2021, June 2). About UBC Library. https://about.library.ubc.ca/2021/06/02/openathens-transforms-user-access-to-library-resources-replacing-ezproxy-and-ip-address-authentication/

Botyriute, K. (2018). Access to online resources. Springer International Publishing. https://doi.org/10.1007/978-3-319-73990-8

Day, J. M. (2017, April 25). Proxy servers: Basics and resources. Library Technology Launchpad. https://libtechlaunchpad.com/2017/04/25/proxy-servers-basics-and-resources/

Lowe, R. A., Chirombo, F., Coogan, J. F., Dodd, A., Hutchinson, C., & Nagata, J. (2021). Electronic Resources Management in the Time of COVID-19: Challenges and Opportunities Experienced by Six Academic Libraries. Journal of Electronic Resources Librarianship, 33(3), 215–223. https://doi.org/10.1080/1941126X.2021.1949162