Systems Librarianship

Author: C. Sean Burns
Date, version 3: 2025-05-08 Email: sean.burns@uky.edu
Website: cseanburns.net
GitHub: @cseanburns

Introduction

The goal of this book is to provide a technical introduction to the basics of systems librarianship using Linux. The book is used alongside a course on systems librarianship that the author teaches.

The course teaches the following skills:

how to use cloud computing resources and create virtual machines;
how to use the Linux command line in order to become more efficient computer users and more comfortable with using computers in general;
how to document technical information using Git and GitHub;
how to create a LAMP server, websites, and create a bare bones OPAC;
how to install and configure content management systems, and;
how to install and configure an integrated library system.

The main overarching goals of this work are:

to foster self-efficacy with computers and an enthusiasm for foundational computer technologies, and
to provide library science students a starting point in a career in systems librarianship or related positions.

I created and began teaching a Systems Librarianship course in 2023 in order to help librarians become proficient in the kinds of technology used to manage and provide electronic resources. I also want to help library science students see systems librarianship as a potential career path.

Since I use this book for my Systems Librarianship course, which I teach in the spring semesters, this book will be a live document. Each semester that I teach this course, I will update the content in order to address changes in the technology and to edit for clarity when I discover some aspect of the book causes confusion or does not provide enough information.

This book is not a comprehensive introduction to systems librarianship. For example, this book does not cover software coding nor managerial duties, like issuing requests for proposals for software products, or budgeting. It is designed as an entry level course in the technical aspects of systems librarianship and is meant to go hand-in-hand with other courses taught in our library science program. That includes my course on electronic resource management but also other courses that my colleagues teach.

A small part of this book draws from my course on Linux Systems Administration, which I teach in the fall semesters in our undergraduate ICT program.

If you use this work, in whole or in part, please reach out to me to let me know. I accept suggestions for improvement, via email or GitHub.

Technical Note

I write the text for this work in Markdown and use mdBook to build the output. The Markdown source code is in GitHub: Systems Librarianship. Use the search function on this site to search for specific topics or keywords. If the reader desires a PDF copy of this work, the printer icon at the top right of the page will print to PDFs.

The content in this book is open access and licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 license. Feel free to fork it on GitHub and modify it for your own needs.

History of Unix and Linux

An outline of the history of Unix and Linux.

Note: this section is borrowed from my Linux Systems Administration course.

Location: Bell Labs, part of AT&T (New Jersey), late 1960s through early 1970s

Before there was Linux, there was (and still is) Unix. Unix began in the late 1960s and was first released in the early 1970s at Bell Labs, part of AT&T in New Jersey, with an operating system called Multics. Multics was an early time-sharing system; i.e., it allowed more than one person to use it the system. Despite its innovative approach, Multics was fraught with issues and slowly abandoned. In the midst of this abandonment, Ken Thompson stumbled upon an old PDP-7 and started writing what would become UNIX. This specific version of UNIX would later be known as Research Unix. The project caught the attention of Dennis Ritchie, the creator of the C programming language, who joined Thompson's efforts. Together they laid the groundwork for a revolution in computing.

Location: Berkeley, CA (University of California, Berkeley), early to mid 1970s

The evolution of Unix continued through the early to mid-1970s at Bell Labs. Ken Thompson visited the University of California, Berkeley, where he helped install Version 6 of UNIX. This marked a significant moment in the system's history. At Berkeley, several contributors, including Bill Joy, played vital roles in its development. Joy was particularly influential. He created the vi text editor, a precursor of the still popular Vim editor, and many other essential programs. He also co-founded Sun Microsystems. This collaborative effort at Berkeley eventually led to the creation of the Berkeley Software Distribution, or BSD Unix, a landmark in the history of UNIX and computing as a whole.

AT&T

Until its breakup in 1984, AT&T operated under a unique agreement with the U.S. government that restricted the company from profiting off patents not directly related to its telecommunications businesses. This arrangement helped shield AT&T from monopolistic charges, but it also came with a significant limitation: they could not commercialize UNIX. The landscape changed dramatically after the breakup of AT&T. With the constraints lifted, AT&T was allowed to release and sell System V UNIX, which would emerge as the standard bearer of commercial UNIX. This transition marked a turning point in the history of computing, positioning UNIX as a central player in the commercial technology market.

Location: Boston, MA (MIT), early 1980s through early 1990s

In Boston, MA, at MIT during the early 1980s through the early 1990s, a significant shift in the software industry was taking place. In the late 1970s, Richard Stallman observed the growing trend in the commercialization of software. As a result, hardware vendors began to stop sharing the code they developed to make their hardware work. This paradigm change was further solidified by the Copyright Act of 1976, which made software code eligible for copyright protection. Stallman battled against this new direction and responded by creating the GNU project, formalizing and embracing the free software philosophy, and developing influential tools such as GNU Emacs, a popular text editor, and many other programs. The GNU project was an ambitious attempt to create a completely free software operating system that was Unix-like, called GNU. By the early 1990s, Stallman and others had developed all the software needed for a full operating system, except for a kernel. However, this encompassing project included the creation of the Bash shell, written by Brian Fox, reflecting a profound commitment to free and open software.

The GNU philosophy includes several propositions that define free software:

The four freedoms, per GNU Project: 0. The freedom to run the program as you wish, for any purpose (freedom 0).

The freedom to study how the program works, and change it so it does your computing as you wish (freedom 1). Access to the source code is a precondition for this.

The freedom to redistribute copies so you can help others (freedom 2).

The freedom to distribute copies of your modified versions to others (freedom 3). By doing this you can give the whole community a chance to benefit from your changes. Access to the source code is a precondition for this.

The Four Freedoms

The Unix wars and the lawsuit, late 1980s through the early 1990s

During the late 1980s through the early 1990s, the so-called "Unix wars" and an ensuing lawsuit marked a contentious period in the history of computing. Following its breakup, AT&T began to commercialize Unix. This lead to distinct differences between the Unix created by AT&T Unix and the Unix developed as BSD Unix. The former was aimed at commercial markets, while the latter was targeted at researchers and academics. These contrasting objectives led to legal friction, culminating in UNIX Systems Laboratories, Inc. (USL, part of AT&T) suing Berkeley Software Design, Inc. (BSDi, part of the University of California, Berkeley) for copyright and trademark violations. Ultimately, USL lost the case, but not before the lawsuit had created significant obstacles for BSD Unix. The legal battle delayed the adoption of BSD Unix and left a lasting impact on the development and dissemination of Unix systems.

Linux, Linus Torvalds, University of Helsinki, Finland, early 1990s

Meanwhile, on August 25, 1991, at the University of Helsinki in Finland, Linus Torvalds, a young computer science student, announced that he had started working on a free operating system kernel for the 386 CPU architecture. This kernel would later be famously named Linux, a kind of portmanteau of Linus and Unix. It's essential to understand that Linux technically refers only to the kernel, which handles startup, devices, memory, resources, and more, but does not provide user land utilities—the kind of software that people use on their computers.

Torvalds' motivation for this project was both to learn about OS development and to have access to a Unix-like system. He did had access to an Unix-like system called MINIX, but MINIX was limited by technical and copyright restrictions. Interestingly, Torvalds has stated that if a BSD or GNU Hurd operating system were available at that time, he might not have created the Linux kernel at all. However, he and others took the GNU utilities and created what is now widely referred to as Linux or GNU/Linux. This amalgamation of Torvalds' kernel and GNU utilities marked a critical point in the evolution of free and open-source software, fostering a global community of developers and users.

Distributions, early 1990s through today

Soon after the development of Linux in the early 1990s, enthusiasts and developers started creating their own Linux and GNU-based operating systems. They customized these systems to suit various needs and preferences and would then distribute these customized versions to others. As a result of this practice, these Linux operating systems became known as Linux distributions. This phenomenon has led to a rich ecosystem of Linux distributions, catering to different user bases, industries, and interests, and has played a central role in the continued growth and diversification of open-source computing.

The two oldest distributions that are still in active development include:

Short History of BSD, 1970s through today

Unix and Unix-derivatives continue to exist and thrive today. The history of the Berkeley Software Distribution (BSD) of Unix spans from the 1970s to today and is closely intertwined with the evolution of Unix, generally. Early Unix version numbers 1-6 eventually led to the development of BSD versions 1-4. By the time of BSD 4.3, all versions still contained some AT&T code. A desire to remove this proprietary code led to the creation of BSD Net/1.

The effort continued until all AT&T code was successfully removed by BSD Net/2. This version was then ported to the Intel 386 processor, resulting in 386BSD, made available in 1992, a year after the Linux kernel was released.

386BSD eventually split into two distinct projects: NetBSD and FreeBSD. Later, NetBSD split into another project, giving rise to OpenBSD. All three of these BSDs are still in active development today, and each has a unique focus:

NetBSD is known for its focus on portability, finding applications in various environments such as MacOS and even NASA projects.
FreeBSD is recognized for its wide applicability and has been utilized by notable companies and products like WhatsApp, Netflix, PlayStation 4, and MacOS.
OpenBSD emphasizes security and has contributed several essential applications in this domain.

This intricate journey of BSD, marked by splits, adaptations, and varied focuses, has cemented its place in the history of operating systems, and allowed it to cater to a wide range of applications and audiences.

MacOS is based on Darwin, is technically UNIX, and is partly based on FreeBSD with some code coming from the other BSDs. See Why is macOS often referred to as 'Darwin'? for a short history.

Short History of GNU, 1980s through today

The history of GNU, particularly the GNU Hurd kernel, traces back to the 1980s and continues to evolve today. The GNU Hurd, despite its long development process, remains in a pre-production state. The latest release of this kernel was version 0.9, which was released in December 2016. Even though it has not yet reached full maturity, a complete operating system based on the GNU Hurd can be used. For example, Debian GNU/Hurd represents one such implementation. This ongoing work on the GNU Hurd exemplifies the free and open-source community's commitment to innovation and collaboration.

Free and Open Source Licenses

In the free software and open source landscape, there are several important free and/or open source licenses that are used. The two biggest software licenses are based on the software used by GNU/Linux and the software based on the BSDs. They each take very different approaches to free and/or open source software. The biggest difference is this:

Software based on software licensed under the GPL must also be licensed under the GPL.
- This is referred to as copyleft software, and the idea is to propagate free software.
- See: GNU General Public License (GPL)
Software based on software licensed under the BSD license may be closed source and primarily must only attribute the original source code and author.
- BSD License

What is Linux?

The Linux Kernel

Technically, Linux is a kernel, and a kernel is a part of an operating system that oversees CPU activity like multitasking, as well as networking, memory management, device management, file systems, and more. The kernel alone does not make an operating system. It needs user land applications and programs, the kind we use on a daily basis, to form a whole, as well as ways for these user land utilities to interact with the kernel.

Linux and GNU

The earliest versions of the Linux kernel were combined with tools, utilities, and programs from the GNU project to form a complete operating system, without necessarily a graphical user interface. This association continues to this day. Additional non-GNU, but free and open source programs under different licenses, have been added to form a more functional and user friendly system. However, since the Linux kernel needs user land applications to form an operating system, and since user land applications from GNU cannot work without a kernel, some argue that the operating system should be called GNU/Linux and not just Linux. This has not gained wide acceptance, though. Regardless, credit is due to both camps for their contribution, as well as many others who have made substantial contributions to the operating system.

Linux Uses

We use Linux as a server in this course, which means we will use Linux to provide various services, such as web services and database services. Our first focus is to learn to use Linux itself, but by the end of the course, we will also learn how to provide web and database services. Linux can be used to provide other services that we won't cover in this course, such as:

file servers
mail servers
print servers
game servers
computing servers

Although it's a small overall percentage, many people use Linux as their main desktop/laptop operating system. I belong in this camp. Linux has been my main OS since the early 2000s. While our work on the Linux server means that we will almost entirely work on the command line, this does not mean that my Linux desktop environment is all command line. In fact, there are many graphical user environments, often called desktop environments, available to Linux users. Since I'm currently using the Ubuntu Desktop distribution, my default desktop environment is called Gnome. KDE is another popular desktop environment, but there are many other attractive and useful ones. And it's easy to install and switch between multiple ones on the same OS.

Linux has become quite a pervasive operating system. Linux powers the fastest supercomputers in the world. It, or other Unix-like operating systems, are the foundation of most web servers. The Linux kernel also forms the basis of the Android operating system and of Chrome OS. The only place where Linux does not dominate is in the desktop/laptop space.

What is Systems Administration?

Introduction

What is systems administration or who is a systems administrator (or sysadmin)? Let's start off with some definitions provided by the National Institute of Standards and Technology:

An individual, group, or organization responsible for setting up and maintaining a system or specific system elements, implements approved secure baseline configurations, incorporates secure configuration settings for IT products, and conducts/assists with configuration monitoring activities as needed.

Or:

Individual or group responsible for overseeing the day-to-day operability of a computer system or network. This position normally carries special privileges including access to the protection state and software of a system.

See: Systems Administrator @NIST

Specialized Positions

In addition to the above definitions, which broadly define the role, there are a number of related or specialized positions. We'll touch on the first three in this course:

Web server administrator:
- "web server administrators are system architects responsible for the overall design, implementation, and maintenance of Web servers. They may or may not be responsible for Web content, which is traditionally the responsibility of the Webmaster (Web Server Administrator" @NIST).
Database administrator:
- like web admins, and to paraphrase above, database administrators are system architects responsible for the overall design, implementation, and maintenance of database management systems.
Network administrator:
- "a person who manages a network within an organization. Responsibilities include network security, installing new applications, distributing software upgrades, monitoring daily activity, enforcing licensing agreements, developing a storage management program, and providing for routine backups" (Network Administrator @NIST).
Mail server administrator:
- "mail server administrators are system architects responsible for the overall design and implementation of mail servers" (Mail Server Administrators @NIST).

Depending on where a system administrator works, they may specialize in any of the above administrative areas, or if they work for a small organization, all of the above duties may be rolled into one position. Some of the positions have evolved quite a bit over the last couple of decades. For example, it wasn't too long ago when organizations would operate their own mail servers, but this has largely been outsourced to third-party providers, such as Google (via Gmail) and Microsoft (via Outlook). People are still needed to work with these third-party email providers, but the nature of the work is different than operating independent mail servers.

Certifications

It's not always necessary to get certified as a systems administrator to work as one, but there might be cases where it is necessary; for example, some government positions or in large corporations require it. It also might be the case that you can get work as an entry level systems administrator and then pursue certification with the support of your organization.

Some common starting certifications are:

Plus, Google offers, via Coursera, a beginners Google IT Support Professional Certificate that may be helpful.

Associations

Getting involved in associations and related organizations is a great way to learn and to connect with others in the field. Here are few ways to connect.

LOPSA, or The League of Professional System Administrators, is a non-profit association that seeks to advance the field and membership is free for students.

ACM, or the Association for Computing Machinery, has a number of relevant special interest groups (SIGs) that might be beneficial to systems administrators.

NPA, or the Network Professional Association, is an organization that "supports IT/Network professionals."

The Library & Information Technology Association was a division of the American Library Association that served as a home for systems librarians and librarians working in information technology, generally. However, the division was dissolved in 2020 and the ALA Core division now serves the community's needs.

Codes of Ethics

Systems administrators manage computer systems that contain a lot of data about us and this raises privacy and competency issues, which is why some have created code of ethics statements. Both LOPSA and NPA have created such statements that are well worth reviewing and discussing.

LOPSA: Code of Ethics
NPA: Code of Ethics

Keeping Up

Technology changes fast. In fact, even though I teach this course about every year, I need to revise the course each time, sometimes substantially, to reflect changes that have developed over short periods of time. It's also your responsibility, as sysadmins, to keep up, too.

I therefore suggest that you continue your education by reading and practicing. For example, there are lots of books on systems administration. O'Reilly continually publishes on the topic. RedHat, the makers of the Red Hat Linux distribution, and sponsor of Fedora Linux, provides the Enable Sysadmin site, with new articles each day authored by systems administrators in the field. Opensource.com, also supported by Red Hat, publishes articles on systems administration. Command Line Heroes is a fun and informative podcast on technology and sysadmin related topics. Linux Journal publishes great articles on Linux related topics.

For those interested in Systems Librarianship, you can stay up to date by following Marshall Breeding's Systems Librarian column.

Conclusion

In this section I provided definitions of systems administrators and also the related or more specialized positions, such as database administrator, network administrator, and others.

I provided links to various certifications you might pursue as a systems administrator, and links to associations that might benefit you and your career.

Technology manages so much of our daily lives, and computer systems store lots of data about us. Since systems administrators manage these systems, they hold a great amount of responsibility to protect them and our data. Therefore, I provided links to two code of ethics statements that we will discuss.

It's also important to keep up with the technology, which changes fast. The work of a systems administrator is much different today than it was ten or twenty years ago, and that surely indicates that it could be much different in another ten to twenty years. If we don't keep up, we won't be of much use to the people we serve.

What is Systems Librarianship

Introduction

Of course, let's begin with the question, what is systems librarianship? Normally we might go to the literature to answer a question like this. Indeed, the literature is helpful, but it's sparse. The LISTA database only returns 131 results with a 45 year coverage for a search using the thesauri term SYSTEMS Librarians. I can get more results if I expand the search query, but then I get less relevant results, and the main idea is the same: this is an understudied area of librarianship.

It's been that way for a while. Susan K. Martin wrote the following over 35 years ago:

Of the specialist positions that exist in libraries, none is as underexamined as those of the systems librarians—the people who identify the needs of the library for automated systems, cause these systems to be implemented, and analyze the operations of the library (p. 57).

Perhaps as a result of this under-examination, sometimes there is confusion around the requirements and skills needed in this area of librarianship. Martin (1988) captured this tension when she wrote the following in 1988, which is still true today:

Over the years the library world has argued whether systems librarians should be librarians who have learned information technologies, or computer experts who have learned about libraries (p. 61).

The argument is partly a matter of jurisdiction. Abbott (1998), writing on librarianship in the sociology of professions, illustrated how:

The future of librarianship thus hinges on what happens to the perpetually changing work of the profession in its three contexts: the context of larger social and culture forces, the context of other competing occupations, and the context of competing organizations and commodities. To these complex contextual forces, any profession responds with varying policies and internal changes (pp. 434-5).

Essentially, Abbott means that professions, like librarianship, are always changing. The mechanisms for that change are structural and cultural (Abbott, 2010), but a changing profession means that its "link of jurisdiction" (Abbott, 1998, p. 435) changes, too. It not only changes, but professions constantly compete with each other over to adopt new areas of jurisdiction. So when we ask, as Martin (1998) did, whether librarians should learn information technologies or whether computer experts should learn libraries, I find myself thinking the prior is more important for libraries and their patrons. It means that librarians are expanding their jurisdiction by also becoming computer experts rather than computer experts expanding theirs.

That leads us to the next questions: what does it mean to be a computer expert for a systems librarian? What does a systems librarians need to do and know?

The answer is that it is a mix. Some part of the work involves systems administration, but that has broad meanings, and systems librarianship is more specific. Or, it has a more specific domain: the domain of libraries and librarianship.

A systems librarian might thus be considered a library systems administrator. Under this view, they need to be someone who knows about libraries, how libraries work, what they do, about their patrons, what their values are, and then use that knowledge to build and maintain the infrastructure to support that.

Given this, and the technologies involved, such work requires constant learning. Jordan (2003) identified three areas of learning:

pre-service education in library schools
on the job training
professional development in the form of workshops, courses, and conferences (p. 273)

Pre-service, formal education is a small part of any professional's career, regardless if that profession is in medicine, law, or librarianship. Thus the goal of pre-service education is to prepare people to adapt and grow in their fields. Jordan (2003) wrote that:

While formal training is undoubtedly important, the ability to learn new technologies independently lies at the foundation of systems librarians' professional life, because they often have to use technologies, or make planning decisions about specific technologies, before they become common enough to be the subject of formal training sessions (p. 273).

Even though Jordan's article is 20 years old and the technology has changed a lot, the basic duties of the systems librarian remain the same (Fu, 2014; Gonzales, 2020). Wilson (1998), as cited in Jordan (2003), refers to a list of the "typical responsibilities of systems librarians." These responsibilities look different today, because the technology is different, but conceptually, they're the same as they were then. In fact, this work will focus on a subset of this list that includes:

integrated library system management
server management
documentation
technology exploration and evaluation (Jordan, 2003, p. 274)

Gonzales (2020) highlights these and more current areas that include:

content management systems
electronic resource management systems
website redesign
help and support

Other items on Jordan's (2003) list are still relevant, but due to various constraints, this textbook will not cover the following areas:

network design and management
desktop computing
application development
planning and budget
specification and purchasing
miscellaneous technology support
technical risk management (p. 274)

In short, this work specifically focuses on a few of the bigger technical aspects of systems librarianship. Other works (or courses) and other sources will provide learning opportunities on the more managerial and administrative functions of systems librarianship and librarianship, in general.

If you are interested in learning more about network design and administration, then I encourage you to read my chapters on Networking and TCP/IP and DNS and Domain Names in my book on Systems Administration with Linux.

If you are interested in learning about application development, then you can pursue courses in a variety of programming languages, such as R, Python, JavaScript, and PHP, as well as courses on relational databases, such as MySQL or PostgreSQL, and so forth.

As Jordan (2003) identified, there is a lack of formalized training in systems librarianship in LIS schools. This is as true today as it was in 2003. This course was created to address the lack of that training. However, as Jordan (2003) noted, pre-service education is only a start. Technology is constantly changing, and that means we must always embrace learning opportunities, such as through workshops, conferences, and on the job training. LIS programs are only two or so years long (if attending full time), but our careers will hopefully span decades. So all this course can ever be is just a starting point.

It is a big start, though. This course should lay a strong foundation for self-growth and self-education in the variety of technologies that we will learn and use here. Although separate areas of librarianship, my work (and course) on electronic resource management complement this one in many ways. For example, this work supports several parts of the technology section in the NASIG Core Competencies for Electronic Resources Librarians. It is no coincidence these two areas of librarianship often overlap or are assumed in a single librarian position.

Cloud Computing

Lastly, I want to mention cloud computing. This has become a major area of change in the last decade or so. It used to be more common for librarians to install their integrated library system software and store their bibliographic data on their premises. In the last ten years, there has been more migration to the cloud, which means that both the integrated library system software and the bibliographic data are stored off-site. Liu & Cai (2013) highlight the beginning of this trend toward cloud computing that continues to play a large role in systems librarianship (Naveed et al., 2021). As Liu and Cai note:

Systems librarians used to make their livings by managing hosted library systems. This situation is silently changing with the library systems moving onto the cloud (p. 26).

This trend has changed some aspects of systems librarianship. It means that systems librarians, while still a technical area of librarianship, need to work more closely with vendors, who themselves are hosting library systems. This is perhaps why that even though this is a technical area of librarianship, the ability to communicate well is probably the most important requisite for people working as systems librarians. Specifically, in a recent article, Willis, 2025 found in an examination of descriptions of job positions that some of the most in-demand skill sets include:

communication
liaison work
vendor relations

However, the trend does not erase all locally hosted solutions. Many libraries and other information agencies continue to support local collections and will either host those locally or work to get the bibliographic information for those collections ingested into their cloud-based integrated library systems.

Conclusion

The remainder of the course will be more technical and will prepare you to work and understand the systems that support the modern library. We will cover a lot, too! We will begin with setting up virtual machine instances on Google Cloud. We will use a distribution of the Linux operating system for these virtual machines. We will then learn the basics of the Linux command line. Next, we will learn how to use the version control system called git. We will use git to document our work flows and push that documentation to GitHub.com. On our Linux servers, we will create a web server out of what is called a LAMP stack, which stands for Linux, Apache, MySQL, and PHP. We will use the web server to setup a basic website and a bare bones OPAC. Then we will learn how to install and setup two content management systems: Wordpress and Omeka. Lastly, we will spend the final two weeks of the semester installing and setting up the open source Koha ILS.

Let's get started!

References

Abbott, A. (1998). Professionalism and the future of librarianship. Library Trends, 46(3), 430–443. https://www.proquest.com/docview/220452054/abstract/A48FC30B10D94886PQ/1?accountid=11836

Abbott, A. (2010). Varieties of ignorance. The American Sociologist, 41(2), 174–189. https://www.jstor.org/stable/40664150

Gonzales, B. M. (2020). Systems librarianship: A practical guide for librarians. Rowman & Littlefield Publishers. https://rowman.com/ISBN/9781538107133/Systems-Librarianship-A-Practical-Guide-for-Librarians

Fu, P. (2014). Supporting the next-generation ILS: The changing roles of systems librarians. Journal of Library Innovation, 5(1), 30–42. https://digitalcommons.cwu.edu/cgi/viewcontent.cgi?article=1015&context=libraryfac

Jordan, M. (2003). The self‐education of systems librarians. Library Hi Tech, 21(3), 273–279. https://doi.org/10.1108/07378830310494445

Liu, W., & Cai, H. (Heather). (2013). Embracing the shift to cloud computing: Knowledge and skills for systems librarians. OCLC Systems & Services: International Digital Library Perspectives, 29(1), 22–29. https://doi.org/10.1108/10650751311294528

Martin, S. K. (1988). The role of the systems librarian. Journal of Library Administration, 9(4), 57–68. https://doi.org/10.1300/J111v09n04_06

Naveed, M. A., Siddique, N., & Mahmood, K. (2021). Development and validation of core technology competencies for systems librarian. Digital Library Perspectives, 38(2), 189–204. https://doi.org/10.1108/DLP-03-2021-0022

Ratledge, D., & Sproles, C. (2017). An analysis of the changing role of systems librarians. Library Hi Tech, 35(2), 303–311. https://doi.org/10.1108/LHT-08-2016-0092

Willis, S. K. (2025). Systems librarianship preparedness: A comparative analysis of skills needed and taught. Journal of Education for Library and Information Science, e20230080. https://doi.org/10.3138/jelis-2023-0080

Wilson, T. C. (1998). Systems librarian: Designing roles, defining skills. American Library Association. https://www.worldcat.org/title/1038159656

Project Management

This course involves working towards a final project that will lead us to install two content management systems and an integrated library system.

To accomplish this, we will need to set up Linux servers. We will use Google Cloud for this purpose. With Google Cloud, we can create what are called virtual machines that run full-fledged operating systems. We will work with Linux, and in particular, the Ubuntu distribution of Linux, to complete our project.

We will also want to document our work. To do that, we will use git, which is a version control system, and GitHub, an online platform for hosting git repositories. Using git, we will write and share documentation, code, and more.

Using Google Cloud (gcloud)

The first section in this chapter introduces us to Google Cloud, which I'll often refer to as gcloud. We will use this platform to create virtual instances of the Ubuntu Server Linux operating system. Once we create our own Ubuntu virtual machines, we will connect to them via the command line. I have written some helpful software to help you learn the command line language, specifically, the Bash shell. Just about everything we'll do this semester will happen via the Bash shell.

Git and GitHub

The last section in this chapter introduces us to git and GitHub. git and GitHub are primarily used for software management. Every major software project requires managing the codebase, collaborations, documentation, and more. Many people may be involved in these projects, and it takes coordination for them to write the many thousands of lines of software code, which also requires management.

Although git and GitHub are primarily used for this purpose, our goal is to use them to document our work, much like this book, which has its own GitHub repository or repo. This documentation covers the processes involved in learning Google Cloud, git and GitHub, Linux, and more. Therefore, in the next section, we'll learn how to create a new repo on GitHub, add notes, and write our notes using Markdown, an easy-to-understand and use markup language to format our text.

Providing good documentation is key to being able to build on prior work, make adjustments to our workflows, recall the details of some process, and, for students, it can help in retention and reflection. In the remainder of the semester, we will begin to install and configure some complicated pieces of software. In order to better understand what we will be doing, it will be helpful to document our processes.

Attending to Detail

As we begin to work on the more technical aspects of this book and course, it will be important to remain attentive to details. Many people who are new to this kind of work often stumble over the details, like a missing period, incorrect capitalization, and more. To learn how to pay attention to the details, work slowly and read any messages, including error messages, the screen prints out in response to your commands.

Using gcloud for Virtual Machines

Virtual Machines

Our goal in this section is to create a virtual machine (VM) instance. A VM is basically a virtualized operating system that runs on a host operating system. That host operating system may also be Linux, but it could be Windows or macOS. In short, when we use virtual machines, it means instead of installing an operating system (like Linux, macOS, Windows, etc) on a physical machine, we use virtual machine software to mimic the process. The virtual machine, thus, runs on top of our main OS. It's like an app, where the app is a fully functioning operating system.

In this course, we're going to use gcloud (via Google) to provide us with virtual machines. There are other cloud service providers available that you can explore on your own. You can also play with VirtualBox (on your own), which I've used in prior classes, to install virtual machines on your own computers.

Google Cloud / gcloud

Google Account

We need to have a personal Google account to get started with gcloud. I imagine most of you already have a Google account, but if not, go ahead and create one at https://www.google.com.

Google Cloud (gcloud) Project

Next we will need to create a project on the Google Cloud website.

Follow Step 1 at the top of the Install the gcloud CLI page to create a new project. Also, review the page on creating and managing projects.

When you create your project, you can name it anything, but try to name it something to do with this course. E.g., I am using the name syslib-YEAR (replace YEAR with the actual year). Avoid using spaces when naming your project.

Then click on the Create button, and leave the organization field set to No Organization.

Google Billing

The second thing to do is to set up a billing account for your gcloud project. This does mean there is a cost associated with this product, but the good news is that our bills by the end of the semester should only amount to $5 to 10 dollars, at most. Follow Step 2 to enable billing for your new project. See also the page on how to create, modify, or close your self-serve Cloud Billing account.

At the end of the semester, I'll remind you that you may want to delete your virtual machines. If you don't do this, you will continue to be billed for them.

Install the gcloud CLI to connect to our virtual machines

NOTE: We install the gcloud CLI to connect to our virtual machines from our personal computers. I suggest this process because it offers more functionality but it's also a bit advanced. If you think you'd like to skip this process for now and try later, we can use an alternate method to connect to our virtual machines. I'll describe the alternate method to connect below in the Connect to our VM section. If you would prefer to use the alternate method, skip to the gcloud VM Instance section.

After you have set up billing, the next step is to install gcloud on your local machines. The Install the gcloud CLI page provides instructions for different operating systems.

There are installation instructions for macOS, Windows, Chromebooks, and various Linux distributions. Follow these instructions closely for the operating system that you're using. Note that for macOS, you have to choose among three different CPU/chip architectures. If you have an older macOS machine (before November 2020 or so), it's likely that you'll select macOS 64-bit (x86_64). If you have a newer macOS machine, then it's likely you'll have to select macOS 64-bit (arm64, Apple M1 silicon). It's unlikely that any of you are using a 32-bit macOS operating system. If you're not sure which macOS system you have, then let me know and I can help you determine the appropriate platform. Alternatively, follow these instructions to find your processor information:

click on the Apple menu
choose About This Mac
locate the Processor or Chip information

After you have downloaded the gcloud CLI for your particular OS and CPU architecture, you will need to open a command prompt/terminal on your machines to complete the instructions that describe how to install the gcloud CLI. macOS uses the Terminal app, which can located using Spotlight. Windows user can use Command.exe, which can be located by search also.

Windows users will download a regular .exe file, but macOS users will download a .tar.gz file. Since macOS is Unix, you can use the mv command to move that file to your $HOME directory. Then you extract it there using the tar command, and once extracted you can change to the directory that it creates with the cd command. For example, if you are downloading the X86_64 version of the gcloud CLI, then you would run the following commands:

For macOS users, this assumes the .tar.gz file was downloaded to your default Downloads folder:

cd ~/Downloads/
mv google-cloud-cli-392.0.0-darwin-x86_64.tar.gz ~/
cd ~/
tar -xzf google-cloud-cli-392.0.0-darwin-x86_64.tar.gz
cd google-cloud-sdk

Modify the above commands, as appropriate, if you're using the M1 or the M2 version of the gcloud CLI.

Initializing the gcloud CLI

As above, please follow the instructions from the Google Cloud documentation for your operating system.

Once you have downloaded and installed the gcloud CLI program, you need to initialize it on your local machine. Scroll down on the install page to the section titled Initializing the gcloud CLI. In your terminal/command prompt, run the initialization command, per the instructions at the above page:

gcloud init

And continue to follow the instructions from the prompt and from the Google Cloud documentation page above.

gcloud VM Instance

Once you've initialized gcloud, log into Google Cloud Console, which should take you to the Dashboard page.

Our first goal is to create a virtual machine (VM) instance. As a reminder, a VM is basically a virtualized operating system. That means instead of installing an operating system (like Linux, macOS, Windows, etc) on a physical machine, software is used to mimic the process.

gcloud offers a number of Linux-based operating systems to create VMs. We're going to use the Ubuntu operating system and specifically the Ubuntu 22.04 LTS version.

Ubuntu is a Linux distribution. There are many, many distributions of Linux, and most are probably listed on the DistroWatch site. A new version of Ubuntu is released every six months. The 22.04 signifies that this is the April 2022 version. LTS signifies Long Term Support. LTS versions are released every two years, and Canonical LTD, the owners of Ubuntu, provide standard support for LTS versions for five years.

LTS versions of Ubuntu are stable. Non-LTS versions of Ubuntu receive nine months of standard support, and generally apply cutting edge technology, which is not always desirable for server operating systems. Each version of Ubuntu has a code name. 24.10 has the code name Oracular Oriole. You can see a list of versions, code names, release dates, and more on Ubuntu's Releases page.

We will create our VM using the gcloud console. To do so, follow these steps from the Project page:

Click on the hamburger icon (three vertical bars) in the top left corner.
Click on Compute Engine and then VM instances.
Enable Compute Engine API.
Make sure your project is listed.
Next, click on Create Instance.
Provide a name for your instance.
- E.g., I chose main-ubuntu (no spaces) but you are free to use any name you prefer
In the Machine configuration section, make sure E2 is selected.
In the Machine type section, select e2-micro (2 vCPU, 1 core, 1 GB memory)
- This is the lowest cost virtual machine and perfect for our needs.
In the left navigation section, click on OS and storage.
- Click on CHANGE, and a side bar on the right of the screen open up.
- In the sidebar, select Ubuntu from the Operating system drop down box.
- Click on Version and select Ubuntu 22.04 LTS x86/64
- Leave Boot disk type be set to Balanced persistent disk
- Leave disk size to 10 GB.
- Click on the Select button.
In the left navigation section, click Networking.
- Check the Allow HTTP Traffic button
Finally, click on the Create button to create your VM instance.

Later in the semester when we install Koha, we will need to create a virtual machine with more CPUs and memory. We will be charged more for those machines. Since we do not yet need the extra resources, we will start off with fairly low powered machines.

Connect to our VM

After the new VM machine has been created, we need to connect to it.

Using gcloud CLI

We use a ssh command to connect to our VMs. The syntax follows this pattern:

gcloud compute ssh --zone "zone-info" "name-info" --project "project-id"

macOS users will use that command using their Terminal.app. Windows users can connect to it via their command prompt (CMD.exe or PowerShell).

To get the specific connection command for your virtual instance, click on the SSH drop down menu. Select View gcloud command. Copy and paste that command in your OS terminal.

Using the web interface

If you elected not to install the gcloud CLI, you can connect via the web interface. Click on the SSH drop down menu, and select Open in browser window. This will open a terminal window in your browser.

Connection Differences

If you connect using the gcloud CLI method, your username on your virtual machine will be based on the username on your personal machine. However, if you connect using the web interface, your username on your virtual machine will be based on your Google account username. This has important ramifications, especially if you decide to use change connection methods.

The main ramification is that you will have two different home directories on your virtual machine. For example, if my username for my Google account is sean_burns, then my home directory on the virtual machine will be /home/sean_burns if I connect via the web interface.

But if my username on my personal computer is sean, then my home directory on the virtual machine will be /home/sean if I connect via the gcloud CLI.

Keep this in mind.

Quick Shell Information

When you log into your machines, you'll see a command prompt with the following format:

username@machine_name:~$

This is where we type our commands. It's not obvious right away, but the command prompt displays our location in the file system. By default, you will be located in your home directory, which is indicated by the ~ tilde in your prompt.

Update our Ubuntu VM

The VM will include a recently updated version of Ubuntu 22.04, but it may not be completely updated. Therefore the first thing we need to do is update our machines. On Ubuntu, we'll use the following commands, which you should run also:

sudo apt update
sudo apt -y upgrade
sudo apt -y autoremove
sudo apt clean

You should run these commands at least weekly to keep your system updated.

Disconnecting

To exit and disconnect from the remote system, type exit, like so:

exit

If you are using the gcloud CLI, you should exit back to your personal machine's terminal prompt. Pay attention to the messages upon logout. If you get the following message:

Updates are available for some Google Cloud CLI components. To install them, please run:
gcloud components update

Then run that command in your own terminal app and click Y (yes) when prompted:

gcloud components update

Once updated, you can close your terminal app.

If you are using the web interface, the window should close.

Snapshots (Optional)

This is entirely optional at this point. You may want to consider doing this when we begin to install a LAMP stack and content management systems because it'll help you recover your system if something gets messed up. But as of now, if we mess up our system, it's pretty trivial to delete the virtual instance and start a new one.

Lastly, we have installed a pristine version of Ubuntu, but it's likely that we will mess something up as we work on our systems. Or it could be that our systems may become compromised at some point. Therefore, we want to create a snapshot of our newly installed Ubuntu server. This will allow us to restore our server if something goes wrong later.

To get started:

In the left hand navigation panel, click on Snapshots.
At the top of the page, click on Create Snapshot.
Provide a name for your snapshot: e.g., ubuntu-1.
Provide a description of your snapshot: e.g.,

This is a new install of Ubuntu 22.04.
Choose your Source disk, which should be the name of your virtual instance (e.g., main-ubuntu).
Choose a Location to store your snapshot.
- To avoid extra charges, choose Regional.
- From the drop down box, select the same location (zone-info) your VM has
Click on Create

Please monitor your billing for this to avoid costs that you do not want to incur.

Conclusion

Congratulations! You have just completed your first installation of a Linux server.

To summarize, in this section, you learned about and created a VM with gcloud. This is a lot! After this course is completed, you will be able to fire up a virtual machine on short notice and deploy websites and more.

Learn the Command Line Interface (CLI)

Introduction

There are two major interfaces that we use to interact with our computers. The most common interface is the graphical user interface, or GUI. This interface largely emphasizes non-textual interaction, such as the mouse, fingers (touch screens), remote controls (e.g., smart TVs), and more recently, wearable tech such as VR headsets and like. All of the above mechanisms for interacting with our computer systems are worthwhile, but more importantly, they are all suited to specific ranges of engagement with our computers. That is, they afford certain kinds of actions (Dourish, 2001).

The other major way of interfacing with our computers is via the command line interface, or CLI. The CLI is also suited to specific ranges of engagement, and it's the kind of engagement that allows greater control over the fundamental uses of our systems.

One reason the CLI provides greater control over our systems is because the interaction is text-based. Text-based interaction requires more specificity than graphical-based interaction. By that I mean, it requires us to provide written instructions to a computer and to know what instructions to give it when we want the computer to perform some specific action. This means that we have to memorize some common instructions in order to use our systems. This is not necessarily difficult because many of the most common instructions, or commands, are mnemonic, but it does take some getting used to.

A second reason the CLI provides greater control over the system is that because it's text-based, it can be automated. We will not cover programming in this work or course, but know that all the commands that we will learn can be put in a text file, made into an executable file, and run like a program. This makes text-based interaction rather powerful.

The big gotcha with a text-based interface with the computer is that it requires specificity. We have to be fairly exact in our commands. This exactitude requires an attention to detail. Little things like misplaced punctuation, missing punctuation, incorrect capitalization or indentation, and misspelled words can cause errors or prevent the execution of our programs. It's important to proceed slowly on the command line and to pay attention to the messages the screen displays when we run commands.

Basic Commands

In light of that, I have developed two programs that will help you learn and remember basic Linux shell commands. The commands that I'll ask you to learn encompass less than 0.3% of the commands that are available on a Linux system, but they are the most commonly used commands. Many of the other commands that are available are for very specific purposes. I'd estimate that despite having used the Linux command line for over 20 years, I've barely used 20% of them, and I might be stretching my estimate.

The first set of commands that I'll ask you to learn and practice include the following:

list files and directories.................. ls
print name of current/working directory..... pwd
create a new directory...................... mkdir
remove or delete an empty directory......... rmdir
change directory............................ cd
create an empty file........................ touch
print characters to output.................. echo
display contents of a text file............. cat
copy a file or directory.................... cp
move or rename a file or directory.......... mv
remove or delete a file or directory........ rm

You will practice these commands using the program that I wrote called learn-the-cli (I will show you how to install this and the other programs shortly).

I also developed a flashcards program that will help you learn, or at least become familiar, with an additional 45 commands. (This program is based on one created by someone else for a different purpose; see source code link above for credit). I'll explain these additional commands as we proceed through the semester. In the meantime, I'll ask that you periodically run the flashcards program to familiarize yourself with these commands, which includes the ones in the list above but also a few additional ones.

The Filesystem

In addition to the various commands that I'll ask you to learn, you will also have to learn the structure of the Linux filesystem. A filesystem has several meanings, but in this context, I refer to where the directories on the Linux system are placed. I find this to be the most difficult thing that new Linux users have to learn for a couple of reasons. First, modern operating systems tend to hide (abstract away) the filesystem from their users. So even though, for example, macOS is Unix, many macOS users that I have taught are completely unfamiliar with the layout of directories on their system. This is because, per my observations, macOS Finder does not show the filesytem by default these days. Instead it shows its users some common locations for folders. This might make macOS more usable to most users, but it makes learning the system more difficult.

What's common for both macOS and Linux operating systems is a filesytem based on a tree-like structure. These filesystems begin at what's called a root location. The root location is referenced by a forward slash: /. All directories branch off from root. The location to any directory is called a PATH. For example, our home directories on Linux are located at the following PATH:

/home/USER

That PATH begins at the root directory /, proceeds to the directory named home, and then ends in our USER directory, which will share the same name as our usernames. As an example, if my username on a Linux system is sb, then my home directory will be located at:

/home/sb

It is a little different for Windows users. Since Windows is not Unix-like, it uses a different filesystem hierarchy. Many Windows users might be familiar with the basics, such as the C: drive for the main storage device or the D: drive for an added USB stick. As such, the Windows operating system uses multiple root directories (C:, D:, E:, etc.). I encourage you to read the following article on A quick introduction to the Linux filesystem for Windows users. The article is published by Red Hat, which makes its own Linux distribution.

In short, learning the Linux filesystem requires adopting a new mental model about how the operating system organizes its directories and files. Like learning the basic commands, it's not too hard, but it may take time and practice before it sticks. To help learn it, I wrote an additional program that will let you practice navigating around the Linux filesystem and making some changes to it. The program is called learn-the-filesystem. Before you use this program, I would like to encourage you to read another Red Hat article on Navigating your filesystem in the Linux terminal. It includes sections that my program will cover that include:

viewing file lists
opening a folder (aka, a directory)
closing a folder
navigating directories
absolute paths

Bash: The Bourne Again Shell

I should point out that the command line interface that we are using on our Linux servers is provided by a shell. A shell is "both an interactive command language and a scripting language" (see link above). We will use the shell strictly as a command language, but if you're interested someday, I'd encourage you to explore Bash as a scripting language (I personally script in Bash quite a lot, and the learn-the-cli and flashcard programs were written in bash). There are a variety of shells available for Linux and other Unix-like operating systems, but the most popular one and the one we will be using is called Bash.

Bash is an acronym for the Bourne Again Shell because it's based on the original Unix shell called the Bourne shell, written by Stephen Bourne. Bash itself was written by Brian Fox.

I think it's important to know the history of the technologies that we use, and Bash has a super interesting history that pre-exists Linux. Therefore, I highly encourage you listen to the Command Line Heroes episode titled Heroes in a Bash Shell, narrated by Saron Yitbarek. The episode recounts Brian Fox's history with the Bash shell while he worked for the Free Software Foundation in the 1980s.

Conclusion

We will spend the next few weeks practicing these commands and learning the filesystem. We'll do this because knowing these things is integral to accomplishing everything else in this work, including installing and setting up our content management systems and the integrated library system.

In the video for this week, I'll show you how to install the three programs that I wrote or modified. We will use git to download them. The we will move the programs to a specific directory in our executable PATH. This will allow us to run them simply by typing their names.

Installation

To install my practice programs, login to your Linux virtual instances, and run the following commands. You will learn more about these commands shortly.

First, let's take a look at the contents of your home directory (the default directory you're in when you connect to your virtual machine):

ls

Most likely, nothing will be listed.

Now let's retrieve the programs using the git command:

git clone https://github.com/cseanburns/learn-the-commandline.git

Run the ls command again, and you'll see a new directory called learn-the-commandline:

ls

Next, copy the programs to an executable path:

sudo cp learn-the-commandline/* /usr/local/bin

Run the first program and work through it in order to learn some of the basic commands:

learn-the-cli

When ready, run the second program in order to learn about the Linux filesystem:

learn-the-filesystem

Finally, periodically run the flashcards program to refresh your memory of the basic commands, plus some other commands that you'll learn about soon:

flashcards

After working through the learn-the-cli program a few times, you can continue to practice with the learn-the-cli-module program. This is a modified version that allows you to focus specific learning modules.

Resources

Here are some additional resources for learning Bash and Linux shell commands:

explainshell.com : helps explain the parts of a shell command
shellcheck.net : helps debut a shell script
The Art of the Command Line : describes the fundamentals of Bash and the command line

References

Dourish, P. (2001). Where the Action Is: The Foundations of Embodied Interaction. MIT Press. https://doi.org/10.7551/mitpress/7221.001.0001

Text editors

As we learn more about how to work on the command line, we will acquire the need to write in plain text or edit configuration files. Most configuration files for Linux applications exist in the /etc directory and are regular text files. For example, later in the semester we will install the Apache Web Server, and we will need to edit Apache's configuration files in the process.

In order to edit and save text files, like the Apache configuration files, we need a text editor. Programmers use text editors to write programs, but because programmers often work in graphical user environments, they may often use graphical text editors or graphical Integrated Development Environments (IDEs). It might be that if you work in systems librarianship, that you will often use a graphical text editor, but knowing something about how to use command line-based editors can be helpful. For a variety of reasons, a GUI text editor or IDE isn't always available.

What is a Plain Text?

Plain text is the most basic way to store human-readable textual information. Whenever we use a word processor program, like Microsoft Office, we are creating a complex series of files that instruct the Office application how to display the contents of the file as well as how the contents are formatted and arranged. This can easily be illustrated by using an archive manager to extract the contents of a .docx file. Upon examination, most of the files in a single .docx file are plain text that are marked up in XML. The files are packaged as a .docx file and then rendered by an application, commonly Microsoft Word, but any application that can read .docx files will do.

A plain text file only contains plain text. Its only arrangement is from top to bottom. It does not allow for any kind of additional formatting, and it does not include media. It is the closest thing the digital has to output produced by a typewriter, but a typewriter that's connected to the internet.

A lot of content is written in plain text. For example, HTML is written in plain text and the web browser uses the HTML markup to render how a page will look.

<p>This is using a HTML paragraph tag.
The web browser would render this like the other paragraphs on this page.
However, it's written in a code block,
which allows me to display the HTML as source code.</p>

The rendered result is not plain text but HTML, just like the rendered result of all those XML files in a .docx file are not plain text but a .docx file. Software is written in plain text files because programming languages cannot evaluate content that is not just text. Those of you who have learned how to use the R programming language wrote your R code in plain text likely using the RStudio IDE. For our purposes, we need plain text files to modify configuration files for the various programs that we will install later.

Why Edit in Plain Text

Most of the time when we configure software, we might use our mouse to find the settings menu in an application that we are using. Then we'll make a change in those settings. For the most part, all we're really doing is making a change in some text file somewhere. The application's GUI-ness simply obscures that process.

We have to be more direct when we are working on the command line. That is, the setting configurations we will do require editing plain text files that modify how the programs will use work. Often the settings for programs can only be modified by editing their plain text configuration files.

`nano`

The most common text editor on many Linux systems is nano. The nano text editor is a user-friendly command line text editor, but it requires some learning as a new command line user. The friendliest thing about nano is that it is modeless. You are already accustomed to using modeless editors. It means that nano can be used to enter and manipulate text without modes, like insert mode or command mode. It is also friendly because, like many graphical text editors and software, it uses control keys to perform its operations.

A modal text editor has modes such as insert mode or command mode. In insert mode, the user types text as anyone would in any kind of editor or word processor. The user switches to command mode to perform operations on the text, such as find and replace, saving, and cutting and pasting. Switching between modes usually involves pressing specific keys. In Vim and ed(1), my text editors of choice, users start in command mode and switch to insert mode by pressing the letter i or the letter a. The user returns to command mode by pressing the Esc key in Vim or by pressing the period in a new line in ed(1).

The tricky part to learning nano is that the control keys are assigned to different keystroke combinations than what many graphical editors (or word processors) use by convention today. For example, instead of Ctrl-c or Cmd-c to copy text, in nano you press the M-6 key (press Alt, Cmd, or Esc key and 6) to copy. To paste, press Ctrl-u instead of the more common Ctrl-v. Fortunately, nano lists the shortcuts at the bottom of the screen.

nano is a text-editor with old origins. Specifically, it's a fork of the Unix pico editor. The keyboard shortcuts used by nano were carried over from the pico editor. These keyboard shortcuts were designed before the Common User Access guidelines helped standardize the common keyboard shortcuts we use today for opening, saving, closing, etc files.

The shortcuts listed need some explanation. The carat mark is shorthand for the keyboard's Control (Ctrl) key. Therefore to perform the Save As operation on a file, we write out the file by pressing Ctrl-o (although Ctrl-s will work these days, too). The M- key is important. Depending on your keyboard configuration, it may correspond to your Alt, Cmd, or Esc keys. To search for text, you press ^W or Ctrl and W; that is, Ctrl-W (lowercase w will work). If your goal is to copy, then press M-6 to copy a line. Move to where you want to paste the text, and press Ctrl-U to paste.

We start nano simply by typing nano on the command line. This will open a new, unsaved file with no content. Alternatively, we can start nano by specifying a file name after typing nano. For example, if I want to open a file called example.txt, then I type the following command:

nano example.txt

If the file doesn't exist, this will create it. If it does exit, then the command will open it.

One of the other tricky things about nano is that the menu bar (really just a crib sheet, so to speak) is at the bottom of the screen instead of at the top, which is where we are mostly accustomed to finding it these days. Also, the nano program does not provide pop up dialog boxes. Instead, all messages from nano, like what to name a file when we save it, appear at the bottom of the screen.

Lastly, nano also uses distinct terminology for some of its functions. The most important function to remember is the Write Out function, which means to save.

For the purposes of this class, that's all you really need to know about nano. Use it and get comfortable writing in it. Some quick tips:

nano file.txt will open and display the file named file.txt.
nano by itself will open to an empty page.
Save a file by pressing Ctrl-o.
Quit and save by pressing Ctrl-x.
Be sure to follow the prompts at the bottom of the screen.

Other Editors

It's good to be familiar with nano because it's often the default text editor on Linux operating systems nowadays. However, if you are interested in using a command line text editor with familiar keyboard shortcuts, then there are others you may want to try. Specifically, I suggest you investigate the tilde and the micro text editors. Both of these are really quite nice.

tilde

The tilde text editor is a user friendly text editor that uses conventional keybindings (like ctrl-s for saving, etc). tilde also offers a standard menu bar, which you activate by pressing the Alt key and the letter for the menu option. For example, to open the File menu, press Alt-F. Press the Esc key to exit the menu.

You can install tilde via the apt command:

sudo apt install tilde

You can run tilde either by itself or by invoking a pre-existing or new file:

tilde

Or:

tilde newfile.md

micro

The micro text editor is another user friendly editor. Like tilde, it uses conventional key bindings. Unlike tilde, there is no menu bar, but you can press ctrl-g to open a help menu. With the help menu open, use your arrow keys to read through the documentation and learn more about its capabilities and its functions. One of the nice things about micro is that you can open multiple files in tabs. Press ctrl-q to exit the help menu.

You can install it via the apt command and start the program like you can with the other editors:

sudo apt install micro

Editing `.bashrc`

By default, your Bash shell is probably white text on a black background. We can add some color to this by modifying our Bash shell configuration file. To do so, open the .bashrc file with nano or your text editor of choice, like tilde or micro:

nano ~/.bashrc

Scroll to the end of the file and add these two lines:

LS_COLORS='rs=0:di=04;31:fi=00;00:ex=01;93';
export LS_COLORS

Next, go to the line that starts with the text below, which is probably line 46:

# force_color_prompt=yes

Remove the comment character (i.e., the pound sign, #) at the beginning of the line. The result should be:

force_color_prompt=yes

Save the file and exit nano, and then at the shell prompt, type the following command:

source ~/.bashrc

LS_COLORS Note

Although the above modification will enable color in our terminals, we don't have to settle for the defaults. To change the colors when listing files and directories, we modify the LS_COLORS variable. The LS_COLORS setting is a bit complicated. It contains several parameters separated by colons. Let's break it down:

LS_COLORS='rs=0:di=04;31:fi=00;00:ex=01;93';

LS_COLORS: A Bash environmental variable that holds the color values for the ls command.
rs=0: This starting parameter resets text formatting to normal (non-bold, default colors, etc). It's placed at the beginning to ensure that we start from basic values.
di=04;31: This sets the color of directory names (di) to be underlined (04) and red (31). If you'd rather directory names formatted in bold rather than underlined, you can change 04 to 01. If you'd rather directory names to be green rather than red, you can change 31 to 32. See other formatting properties here: Configuring LS_COLORS.
fi=00;00: This sets the color of regular files. Because both values are zero (00;00), the ls command lists these with no special color or style.
ex=01;93: This set the color of executable files (i.e., programs) to be bold (01) and bright yellow (93).

Feel free to play with the colors of the ls command. Remember to run source ~/.bashrc to put the changes into effect.

A note about other text editors: ed, Vi/Vim, Emacs

The traditional Unix and Linux text editors are ed, vim, and emacs. I first started using Linux because I found emacs, but sometime during my early Linux years, I switched to vim. Vim is a descendant of the vi text editor, which itself is a descendant of the ed editor. ed was used to write the first versions of the Unix operating system over 50 years ago when teletypes machines (and not video monitors) were the common input/output devices. It's still available, and I use it often. vi was an extension of ed and was written to take advantage of computer monitors. vim (or Vi Improved) added enhancements to vi and is my main editor. emacs is highly extensible text editor that can do about anything. It's so versatile, the saying goes that emacs is an "operating system posing as a text editor."

These editors are extremely powerful once you learn them. Even though they are quite popular, they are not user-friendly. (ed probably isn't all that popular, but it has a dedicated following.) If interested, there are plenty of online resources that provide tutorials on getting started with these text editors. I won't teach you how to use them because it would take too much time, but they are worth knowing about because all three are important parts of Unix and Linux history.

Conclusion

In the prior lesson, we learned how to use the Bash interactive shell. We will continue to do that, but in the meantime, we begin to learn how to use a command line text editor. These include nano, tilde, and micro. We will use a text editor to edit configuration files and publish text to GitHub. It's your choice what you want to use.

Appendix: My `.nanorc`

You can configure the look and feel of tilde and micro through their menus or help options. You can configure nano to look and behave in certain ways, too. If you decide to use nano and want to mimic the setup I have, then create a file called .nanorc in your home directory:

nano ~/.nanorc

And add the following to the file:

# Syntax:
# set element fgcolor,bgcolor
set titlecolor red,black
set statuscolor blue,white
set errorcolor white,red
set selectedcolor white,black

set numbercolor lightblue
set stripecolor red
set keycolor black,white
set functioncolor white,black 

set speller "aspell -x -c"

## When soft line wrapping is enabled, make it wrap lines at blanks
## (tabs and spaces) instead of always at the edge of the screen.
set atblanks

## Use auto-indentation.
set autoindent

## Back up files to the current filename plus a tilde.
# set backup

## The directory to put unique backup files in.
# set backupdir "~/.backup"

## Use bold text instead of reverse video text.
set boldtext

## Remember the used search/replace strings for the next session.
set historylog

## Display line numbers to the left of the text.
set linenumbers

## Enable vim-style lock-files.  This is just to let a vim user know you
## are editing a file [s]he is trying to edit and vice versa.  There are
## no plans to implement vim-style undo state in these files.
set locking

## Remember the cursor position in each file for the next editing session.
set positionlog

## Do extended regular expression searches by default.
set regexp

## Allow nano to be suspended.
set suspend

## Use this tab size instead of the default; it must be greater than 0.
set tabsize 8

## Convert typed tabs to spaces.
set tabstospaces

You can read about how to make such settings in the man page for the nano configuration file:

man nanorc

Documenting with Git, GitHub, and Markdown

Introduction

Documentation is the cornerstone of effective communication and knowledge sharing. It ensures that processes are understood, tasks are reproducible, and collaborators can contribute to shared goals. In this section, we learn how to use Git, GitHub, and Markdown as tools for managing and presenting documentation efficiently. Specifically, Git with GitHub offer robust version control and collaboration capabilities. Markdown, a markup language with a simple syntax, facilitates clean, professional documentation compatible with multiple platforms.

Create a GitHub Account

Let's start by creating an account on GitHub:

Visit the GitHub Website: GitHub's website.
Sign Up: Click on the "Sign Up" button usually located at the top right corner of the page.
Enter Your Details: You will be prompted to enter some basic information:
- Username: Choose a unique username that will be your identity on GitHub. Select a name that reflects your personal or professional identity. It will be visible publicly.
- Email Address: Provide a valid, personal email address (not university email address). This will be used for account verification and communication.
- Password: Create a strong password. Use a mix of letters, numbers, and symbols for better security.
Choose a Plan: GitHub offers various plans. Select the free option, which is fine for most individual users.

Tips for New Users

Profile Information: After creating your account, consider adding more details to your profile, like a profile picture and bio, to make it more personable.
Security: Set up two-factor authentication for added security.
Learning Resources: GitHub has a wealth of tutorials and guides to help you get started. Utilize these to familiarize yourself with GitHub's features and best practices.

Create a Repository (Repo) on GitHub

Now that you have a GitHub account, your next step is to create a repository (repo) for your documentation project. I outline the steps below, but see the official documentation: Creating a new repository. To get started:

Click the green New button in the upper left corner on your home page.
In the Owner/Repository name field, add a name for your repo:
- Use a descriptive name and avoid spaces and special characters.
Add an optional description:
- This helps later in case you eventually create lots of repos.
Keep it public.
Click to add a README file:
- This serves as the main page of your repository on GitHub.
Choose an open source license, if you want.
Click the Create repository button.

Edit README

You should now see your repository's home page, and you will be viewing the default, empty README.md file. Let's edit this file on GitHub:

Click the pencil icon at the top right of that README.md file.
This opens an editor. Put the cursor after the heading and press Enter.
Add some text that describes the project. You can add a better description later.
Use Markdown code to edit the text you add.

Markdown Basics

Markdown is a simple markup language for formatting plain text, which can later be rendered as HTML or even as a PDF, DOCX, etc. It's a very popular markup language in tech industries, and it's easy to get started.

Here's a quick guide to the most commonly used Markdown syntax:

Headings

Create headings using the pound # symbol before your text. The number of pound # symbols indicates the level of the heading. Heading level 1 indicates the main heading and so forth.

# Heading 1
## Heading 2
### Heading 3
#### Heading 4
##### Heading 5
###### Heading 6

Emphasis

Bold: To make text bold, wrap it in double asterisks or double underscores:
- For example, **bold** or __bold__.
Italic: To italicize text, wrap it in single asterisks or single underscores:
- For example, *italic* or _italic_.

Lists

Unordered Lists: Use asterisks, plus signs, or hyphens to create bullet point lists.

Use indentation to create sub-items in a list.

* Item 1
* Item 2
  * Subitem 1
  * Subitem 2

Ordered Lists: Use numbers followed by periods for an ordered list.
```
1. First item
2. Second item
   1. Subitem 2.1
   2. Subitem 2.2
```

Links and Images

Links: To create a named link, wrap the link text in brackets [ ], and then wrap the URL in parentheses ( ).
- For example, [GitHub](https://github.com) will be GitHub.
- Add a link title: [GitHub](https://github.com "GitHub Code Repo").
- Use reference-style links: [GitHub][github]. Then refer to the full URL elsewhere in the document. For example, I usually add the reference at the end: [github]:https://github.com.
Images: Similar to links, but start with an exclamation mark, followed by the alt text in brackets, and the URL in parentheses.
- For example, ![Alt text](image-url.jpg).
- In this example, the file image-url.jpg must be in the same directory as the Markdown file. It's good practice to organize project files. In this case, I would suggest creating an images directory in the project home and storing images there, with good, descriptive names. Then use a relative path to link to the image: ![Alt text](images/image-url.jpg) where images/ is the directory name and image-url.jpg is the image file name.

Code

Inline Code: Use a single backtick to wrap your code in text.
- For example: `inline code`.
Code Blocks: For larger sections of code, use three backticks or indent with four spaces.

The following code:

```
your code here
```

Will render as:

your code here

Blockquotes

To create a blockquote, use the greater than symbol > at the beginning of a line. For nested blockquotes, use multiple > symbols.

This code:

> This is a blockquote.
>> This is a nested blockquote.

Will render as:

This is a blockquote.

This is a nested blockquote.

Horizontal Rules

Create a horizontal line or rule by using three or more asterisks, dashes, or underscores on a new line.

---

The above will render as:

Additional Tips

Whitespace and Line Breaks: In Markdown, paragraphs are automatically created when text is separated by an empty line. To create a new line without starting a new paragraph, end a line with two or more spaces.
Escaping Markdown: To display a Markdown character, precede it with a backslash (\). For example, \*demo italicizing\*.

Preview and Save

As you edit your README.md file, you can click the Preview tab to see how it will be rendered.

Once you are finished editing, save with the following steps:

Click on the Commit changes... button.
A pop-up will appear. Update the Commit message or leave as-is:
- When you make more substantive edits, you will want to leave descriptive commit messages.
- This helps with with version control.
Press the Commit changes button.
Then click on the repo link to return to your repo's homepage.

File Naming Conventions

README files serve as a de facto standard file. They provide a description of the project, outline its purpose, or provide instructions on using the repository. As you work on your projects, you can return to edit your README file to add more information about your work.

In the process of working on your project, you will create other files, and you want to name them well. Good file names help to organize and maintain a clear and efficient documentation system. They help provide and ensure:

Clarity and Accessibility: To save time and reduce confusion, use well-named files:
- They make it easier to identify and understand your files at a glance.
Ease of Navigation: Use consistent naming to aid navigating through files.
System Compatibility:
- Avoid spaces in file names. They cause issues in URLs and command-line operations.
- Avoid special characters like !, $, #, %, etc. They have specific functions in certain environments or scripts, including shell environments.
- Name files with single words or combine words using:
  - camelCase: serverSetupGuide.md,
  - underscores: server_setup_guide.md, or
  - hyphens: server-setup-guide.md.

The Importance of `.md` Extension for Markdown Files

File name extensions are not always necessary, especially on Linux and Unix systems. However, when it comes to Markdown files, add the .md extension (e.g., README.md rather than just README). This helps in the following ways:

GitHub Rendering: GitHub automatically renders files with a .md extension as formatted Markdown. This means your documentation will be displayed with the intended formatting (like headers, lists, links, etc.) when viewing it on GitHub.
Editor Support: Most code editors use file extensions, like .md, and provide appropriate syntax highlighting.
Consistency and Recognition: Using a file extension, like .md, helps users identify the file type and its intended use.

For instance, naming a file installation_guide.md ensures that GitHub renders the file as a Markdown document and displays all formatting correctly in the browser. This enhances readability and makes the documentation more user-friendly. Your text editor will also recognize the file extension and colorize the syntax appropriately.

Gitting Started

Now that we've set up our GitHub repo, it's time to return to our virtual machines. git is already installed on these machines, but it needs to be configured.

Git Configuration

First, connect to your remote server and run the commands below to begin configuring git. In the example commands below, note the quotes around the Your Name command. Replace Your Name with your name and keep those quotes. You don't need quotes in the commands for setting your github_username and email address. Simply replace your info in the respective places. Use the same information you used when setting up your GitHub account. Run these commands separately:

git config --global user.name "Your Name"
git config --global user.email youremail@example.com

Next, configure git to use the name main as your default branch. The second command instructs git to use nano as your default editor. Run these two commands as-is, but if you are using a different text editor (like tilde or micro), be sure to lookup the appropriate command for that editor (it's just tilde or micro, though). Keep the quotes around the editor name, which should be the name of the executable (i.e., program name) for your text editor.

git config --global init.defaultBranch main
git config --global core.editor "nano"

Verify the above settings with the following command:

git config --list

For additional details, see the Git documentation on getting started:

Getting Started - First-Time Git Setup

Next, we need to configure how git and GitHub work together.

Generate SSH Keys

We need to secure our git and GitHub connection and repositories. We do that first by creating an SSH key.

On the server:

Generate a new ssh key with the following command:
1. ssh-keygen -t ed25519 -C "your_email@example.com"
2. Use the same email that you used when signing up with GitHub.
Copy your SSH public key to your clipboard:
1. View it with this command: cat ~/.ssh/id_ed25519.pub.
2. Then select it with your mouse and copy it.
3. Open GitHub and visit Settings.
4. In the Access section of sidebar, click SSH and GPG keys
5. Click New SSH key or Add SSH key
6. In the Title field, add a descriptive label for the new key:
  1. For example, the name of the machine you used to generate the key.
7. Select the key type: authentication.
8. Paste your SSH public key in the Key field.
9. Click Add SSH key
10. See the official documentation here: Adding a New SSH Key
On your virtual machine, setup the SSH public key as your signing key:
1. git config --global gpg.format ssh
2. git config --global user.signingkey $HOME/.ssh/id_ed25519.pub
3. git config --global commit.gpgsign true
4. See the documentation at: Telling Git About Your Signing Key

Clone Your Repo

Now that you have git configured to work with GitHub, clone your repo to your virtual machine.

Return to GitHub and your repo's homepage.
Click the green Code drop down button.
Make sure the SSH option is selected.
Copy the command, which should have the following syntax:
- git@github.com:repo_user/repo_name.git
- repo_user should be your GitHub username.
- repo_name.git should be your repo's name.
Return to your Linux virtual machine, and run the following command to clone your repo:
- git clone git@github.com:repo_user/repo_name.git
- This command will create a new directory named after your repo.

Stage, Commit, and Push Your Repo

Now use your text editor (e.g., nano) to make changes to your repository. Navigate to your repo's directory on your virtual machine:

cd repo_name

Create and open a new file. I'll use entry_one.md as an example file name, but feel free to choose a different name:

nano entry_one.md

Add whatever you'd like here to get started. When completed, save the file and exit nano.

Now we need to push our changes to our GitHub repo. First, stage the changes with the git add command:

git add entry_one.md

Then commit the changes and add a commit message with the -m option:

git commit -m "commit message here"

Then push the commit to our GitHub URL (i.e., origin) and main branch:

git push origin main

Visit your repo's homepage on GitHub to see the update.

Whenever we add, edit, or delete a file or directory in our local repo, we follow the stage (add), commit, and push steps above. You can monitor the status of your local repository with the following command:

git status

Pull

Your remote repository is located on GitHub. Your local repository is located on your virtual instance. Get used to working on your documentation in your local repository. However, if you mix it up and make edits to files on your remote repository via the GitHub web interface, then you need sync your local and remote repositories before switching back to local work. To do that, you need to run a pull command:

cd repo_name
git pull origin main

If you often switch between local and remote repo work, the repos will quickly grow apart and it will be hard to merge them later.

Git Basics

Now that we've covered the basic practice, let's review some concepts.

Repos

The first Git concept to learn is the repository concept. Git uses two kinds of repositories:

local repository (repo)
remote repository (repo)

The local repo is a project directory (or folder) on your computer. I will use the term directory and not folder since the former term is more commonly used in tech fields. The project directory contains all the project files and any sub-directories for the project.

The remote repo is where we send, retrieve, or sync the files and directories that are contained in the local repo. We can retrieve projects from other repos that other people or organizations have created, if those repos are public.

With Git and GitHub, we can start a project on the local system (i.e., our computers) or start a project by creating a remote repo on GitHub and then cloning it to our local system.

Branches

The second Git concept to learn is:

branches

When you configure a directory on your local system to become a Git project, you create a default branch for your project. For small projects, we might only work in the default branch. The default branch will be named main.

However, since Git is a version control system, we can create additional branches to test or work on different components of our projects without messing with the main branch. For large or complex projects, we would work and switch among different branches. A large project might be a big website, an software application, or even an operating system. Working in non-main branches (e.g., a testing branch), allows us to develop components of our project without interfering with the main branch, which might be at a stable version of our project. And then when we are ready, we can merge a testing branch with our main branch, or we can delete the testing branch if we don't want to use it.

We will primarily work with the default, main branch with our projects, but you should read the Git documentation on branches.

For future reference, here's a nice cheat sheet of Git commands.

Learning the Command Line

It's obviously more common for people today to learn how to use a computer via a graphical user interface (GUI), but there are benefits to learning a command line interface (CLI). In this section, we learn some of the basics of using the Bash shell as our CLI. Our primary goal is to learn how to use the CLI as a file manager and to perform some text editing. However, if you find this interface appealing, know that Bash is a full-fledged programming language, and I encourage you to explore it as a scripting language.

There are three reasons, from a systems administration/librarianship point of view, to prefer the CLI over the GUI.

First, the GUI entails extra software, and the more software we have on a server, the more resources (memory, CPU, storage, etc) that software consumes. We would much rather have our machine's resources being used to provide the services we build them to do than to run irrelevant software.
Second, the extra software a GUI requires means that we expose our systems to additional security risks. That is, every time we install more software on our servers, the server becomes more vulnerable because all software is buggy. This means that we want to be conservative, careful, and protective of our systems. This is especially true for production systems.
Third, graphical user interfaces do not provide a good platform for automation, at least not remotely as well as command line interfaces do. Working on the command line, because it is a text-based environment, in what is known as a shell, is a reproducible process. That is not as easily true in a GUI.

Fortunately, Linux, and many other Unix-like operating systems, have the ability to operate without graphical user interfaces. This is partly the reason why these operating systems have done so well in the server market.

In this section, our focus is learning the command line environment. We will do this using the Bash shell. We will learn how to use the shell, how to navigate around the filesystem, how to perform basic tasks, and explore other functions and utilities the shell has to offer.

Searching with grep

As a systems librarian, you might deal with large amounts of text-based data: logs from library systems, metadata files, MARC records, exported citation data, and configuration files for tools that you manage. Searching these efficiently is crucial when troubleshooting issues, extracting insights, or automating repetitive tasks. Graphical interface-based applications exist for some of these tasks, but they can be slow, inflexible, or unavailable when working on a remote server. Fortunately we have grep, which is a command-line tool that allows for fast and precise searching. Using grep, we can accomplish all of the above.

There are other powerful utilities and programs to process, manipulate, and analyze text files (e.g., awk, sed, and more). However, in this section, we will focus on the grep utility, which offers advanced methods for searching the contents of text files. Specifically, we'll work through an introduction of grep using a small data file that will help us understand how grep works. Then we will use grep to analyze bibliographic data downloaded as a .bib file from Scopus. This will demonstrate how grep can help you filter specific information from a structured dataset—an approach that can also be applied to processing catalog records, debugging system errors, or analyzing usage logs (e.g., see Arneson, 2017).

`grep`

The grep command is one of my most often used commands. The purpose of grep is to "print lines that match patterns" (see man grep). In other words, it searches text, and it's super powerful.

grep works line by line. So when we use it to search a file for a string of text, it will return the whole line that matches the string. This line by line idea is part of the history of Unix-like operating systems, and it's important to remember that most utilities and programs that we use on the commandline are line oriented.

"A string is any series of characters that are interpreted literally by a script. For example, 'hello world' and 'LKJH019283' are both examples of strings" (Computer Hope). More generally, it's a type of data structure.

To visualize how grep works, let's consider a file called operating-systems.csv with content as seen below. It's helps to learn something like grep when working with easy, clear examples.

OS, License, Year
Chrome OS, Proprietary, 2009
FreeBSD, BSD, 1993
Linux, GPL, 1991
macOS, Proprietary, 2001
Windows NT, Proprietary, 1993
Android, Apache, 2008

We can use grep to search for anything in that file. Let's start with a search for the string Chrome. Notice that even though the string Chrome only appears once, and in one part of a line, grep returns the entire line.

Command:

grep "Chrome" operating-systems.csv

Output:

Chrome OS, Proprietary, 2009

Case Matching

Be aware that, by default, grep is case-sensitive, which means a search for the string chrome, with a lower case c, returns no results. However, many Linux command line utilities can have their functionality extended through command line options. grep has an -i option that can be used to to ignore the case of the search string. You can learn about grep's other command line options in its man page: man grep. In the following examples, grep returns nothing in the first search since we do not capitalize the string chrome. However, adding the -i option results in success since grep is instructed to ignore case:

Command:

grep "chrome" operating-systems.csv

Output:

None.

Command:

grep -i "chrome" operating-systems.csv

Output:

Chrome OS, Proprietary, 2009

Invert Matching

grep can do inverse searching. That is, we can search for lines that do not match our string using the -v option. Options can often be combined for additional functionality. We can combine -v to inverse search with -i to ignore the case. In the following example, we search for all lines that do not contain the string chrome:

Command:

grep -vi "chrome" operating-systems.csv

Output:

FreeBSD, BSD, 1993
Linux, GPL, 1991
iOS, Proprietary, 2007
macOS, Proprietary, 2001
Windows NT, Proprietary, 1993
Android, Apache, 2008

Regular Expressions

Sometimes data files, like spreadsheets, contain header columns in the first row. We can use grep to remove the first line of a file by inverting our search and selecting all lines not matching "OS" at the start of a line. Here the carat key ^ is a regex indicating the start of a line. Again, this grep command returns all lines that do not match the string os at the start of a line, ignoring case:

Command:

grep -vi "^os" operating-systems.csv

Output:

Chrome OS, Proprietary, 2009
FreeBSD, BSD, 1993
Linux, GPL, 1991
iOS, Proprietary, 2007
macOS, Proprietary, 2001
Windows NT, Proprietary, 1993
Android, Apache, 2008

Alternatively, since we know that the string Year comes at the end of the first line, we can use grep to invert search for that. Here the dollar sign key $ is a regex indicating the end of a line. Like above, this grep command returns all lines that do not match the string year at the end of a line, ignoring case. The result, in this specific instance, is exactly the same as the last command, indicating that there are sometimes many ways to achieve the same outcome with various commands:

Command:

grep -vi "year$" operating-systems.csv

Output:

Chrome OS, Proprietary, 2009
FreeBSD, BSD, 1993
Linux, GPL, 1991
iOS, Proprietary, 2007
macOS, Proprietary, 2001
Windows NT, Proprietary, 1993
Android, Apache, 2008

The man grep page lists other options, but a couple of other good ones include:

Count Matches

If we're looking for patterns in a data file, we may also be interested in their frequency. Fortunately, we can get a count of the matching lines with the -c option.

In the next example, I get a total count of lines that contain the word Proprietary:

grep -ic "proprietary" operating-systems.csv

More broadly, we can get a total count of rows in our file after excluding the header. In other words, we can get the total number of data rows or records:

grep -vic "year$" operating-systems.csv

Alternate Matching

We can do a sort of Boolean OR search by using the vertical bar |, also called the infix operator. This is called an alternate expression. That is, using alternate matching, we can search for at least one string among multiple options.

Here is an example where only one string returns a true value since the file contains bsd but not atari:

Command:

grep -Ei "(bsd|atari)" operating-systems.csv

Output:

FreeBSD, BSD, 1993

Here's an example where both strings evaluate to true:

Command:

grep -Ei "(bsd|gpl)" operating-systems.csv

Output:

FreeBSD, BSD, 1993
Linux, GPL, 1991

You can use more than two strings:

grep -Ei "(bsd|gpl|apache)" operating-systems.csv

Whole Word Matching

By default, grep will return results where the string appears within a larger word, like OS in macOS.

Command:

grep -i "os" operating-systems.csv

Output:

OS, License, Year
Chrome OS, Proprietary, 2009
iOS, Proprietary, 2007
macOS, Proprietary, 2001

However, we might want to limit results so that we only return results where OS is a complete word. To do that, we can surround the string with special characters:

Command:

grep -i "\<os\>" operating-systems.csv

Output:

OS, License, Year
Chrome OS, Proprietary, 2009

Sometimes I find it hard to remember the backslash and angle bracket combinations because they're too much alike HTML syntax but not exactly like HTML syntax. Fortunately, grep has a -w option to match whole words. This functions as another way of searching for whole words:

Command:

grep -wi "os" operating-systems.csv

Output:

OS, License, Year
Chrome OS, Proprietary, 2009

Context Matches

Sometimes we want the context for a result; that is, we might want to print lines that surround our matches. For example, to print the matching line plus the two lines after the matching line using the -A NUM option, where NUM equals the number of lines to return after the matching line:

Command:

grep -i "linux" -A2 operating-systems.csv

Output:

Linux, GPL, 1991
macOS, Proprietary, 2001
Windows NT, Proprietary, 1993

Or, print the matching line plus the two lines before the matching line using the -B NUM option:

Command

grep -i "linux" -B2 operating-systems.csv

Output:

Chrome OS, Proprietary, 2009
FreeBSD, BSD, 1993
Linux, GPL, 1991

We can combine many of the variations. Here I search for the whole word BSD, case insensitive, and print the line before and the line after the match:

Command:

grep -iw -C1 "bsd" operating-systems.csv

Output:

Chrome OS, Proprietary, 2009
FreeBSD, BSD, 1993
Linux, GPL, 1991

Halt Matching

We can use another option to stop returning results after some number of hits. Here I use grep to return a search for the string "proprietary" and stop after the first hit:

Command:

grep -i -m1 "proprietary" operating-systems.csv

Output:

Chrome OS, Proprietary, 2009

Returning Line Numbers

We can add the -n option to instruct grep to tell us the line number for each hit. Below we see that the string "proprietary" is found on lines 2, 5, and 6.

Command:

grep -in "proprietary" operating-systems.csv

Output:

2:Chrome OS, Proprietary, 2009
5:macOS, Proprietary, 2001
6:Windows NT, Proprietary, 1993

Character Class Matching

We can use grep to search for patterns in strings instead of literal words. Here we use what's called character classes and repetition to search for five letter words that contain any English character a through z:

Command:

grep -Eiw "[a-z]{5}" operating-systems.csv

Output:

Linux, GPL, 1991
macOS, Proprietary, 2001

Or four letter numbers, which highlights the years:

Command:

grep -Eiw "[0-9]{4}" operating-systems.csv

Output:

Chrome OS, Proprietary, 2009
FreeBSD, BSD, 1993
Linux, GPL, 1991
macOS, Proprietary, 2001
Windows NT, Proprietary, 1993
Android, Apache, 2008

grep can also search for words that begin with some letter and end with some letter and with a specified number of letters between. Here we search for words that start with m, end with s, and have three letters in the middle:

Command:

grep -Eiw "m.{3}s" operating-systems.csv

Output:

macOS, Proprietary, 2001

Practice

Let's use the grep command to investigate bibliographic data. Our task is to:

Search Scopus.
Download a BibTeX file from Scopus as a .bib file.
Use the grep command to search the downloaded BibTeX file, which should be named scopus.bib.

Download Data

I'm using Scopus data in this example, but other bibliographic data can be downloaded from other databases.

From your university's website, find Scopus.
In Scopus, perform a search.
Select the documents you want to download.
Click on the Export button.
Click on BibTeX under the listed file types.
Select all Citation Information and Bibliographic Information. Select more in interested.
Click on Export.

The file should be saved to your Downloads folder and titled scopus.bib. The next step is to upload the file to your virtual instance. See the steps below.

Upload to gcloud

There are several methods for uploading and downloading files to your Google Cloud instance. The two main ones I cover below depend on how you connect to your virtual instances, but see the full documentation at: Transfer files to Linux VMs.

`gcloud compute scp`

If you use the gcloud compute command to connect to your virtual instance, you use a similar command to upload and download files. However, there are some differences between the two commands. The gcloud copy command uses scp instead of ssh and then specifies the local file to transfer and the remote location. The following command copies the local file titled file_name to the remote server. Simply replace the file name, server, zone, and project names with those specific to your virtual instances.

gcloud compute scp file_name "server_name":~/ --zone "zone_name" --project "project_name"

SSH-in-browser

To upload to your virtual instance using the web browser, connect via the Open in browser window method from your VM instances console. Once you've established a connection, then click on the UPLOAD FILE. Select the file and proceed. (If you get an error, then try again.)

Investigate

Now that the file is uploaded, the first task is to to get an understanding of the structure of the data. BibTeX (.bib) files are structured files that contain bibliographic data. It's important to understand how files are structured if we want to search them efficiently.

The scopus.bib file begins with information about the source (Scopus) of the records and the date the records were exported. These two lines and the empty line after them can be safely deleted or ignored.

Each bibliographic record in the file begins with an entry type (or document type) preceded by an at @ sign. Example entry types include: article, book, booklet, conference, and more. There is a opening curly brace after the entry or document type. These curly braces mark the beginning and ending of each record.

The cite key follows the opening curly brace. The cite key is an identifier that often refers to the author's name and includes publication date information. For example, a cite key might look as follows and would stand for the author Budd and the date 2020-11-23.

Budd20201123

The rows below the entry type contain the metadata for the record. Each row begins with a tag or field name followed by an equal sign, which is then followed by the values or content for that tag. For example, there's an author tag, an equal sign, and then a list of authors. There is a standard list of BibTeX fields. Example fields include: author, doi, publisher, title, journal, year, and more. The fields are standardized because some programs use BibTeX records to manage and generate bibliographies, in-text citations, footnotes, etc.

The content of each field is enclosed in additional curly braces. Each line ends with a comma, except for the last line. The record ends with a closing curly brace.

Document Types

We can use grep to examine the types of documents in the list of records. In the following command, I use the carat key ^, which is a regular expression to signify the start of a line, to search for lines beginning with the at @ symbol. The following grep command therefore means: return all lines that begin with the at @ symbol:

grep "^@" scopus.bib

The results show, for this particular data, that I have BOOK and ARTICLE entry types. The data I'm using does not contain many records, but if it contained thousands or more, then it would be helpful to filter these results.

Thus, below I use the -E option to extend grep's regular expression engine. I use the (A|B) to tell grep to search for letters after the at sign @ that start with either A or B, for ARTICLE or BOOK. Then I use regular expression character class matching with [A-Z] to match any letters after the initial A or B characters. The -i option turns off case sensitivity, and the -o option returns only matching results from the lines. I pipe the output of the grep command to the sort command to sort the results alphabetically:

grep -Eio "^@(A|B)[A-Z]*" scopus.bib | sort

Tip: Without using the sort command, the grep command returns the results in the order it finds them. To see this, run the above command with and without piping to the sort command to examine how the output changes.

Now let's get a frequency of the document types. Here I pipe | the output from the grep command to the sort command, in order to sort the output alphabetically. Then I pipe the output from the sort command to the uniq command. The uniq command will deduplicate the results, and the -c option will count the number of duplicates. As a result, it will provide an overall count of the document or entry types we searched.

grep -Eio "^@(A|B)[A-Z]*" scopus.bib | sort | uniq -c

Journal Titles

We can parse the data for other information. For example, we can get a list of journal titles by querying for the journal tag:

grep "journal" scopus.bib

Even though that works, the data contains the word Journal in the name of some journals. If we were searching thousands or more records, we might want to construct a more unique grep search.

To rectify this, we can modify our grep search in two ways. First, the rows of data fields begin with a tab character. The regular expression for the tab character is \t. Therefore, we can search the file using this expression with the -P option:

grep -P "\tjournal" scopus.bib

Second, we can simply add more unique terms to our grep search. Since each tag includes a space, an equal sign, followed by another space, we can use that in our grep query:

grep "journal =" scopus.bib

Using either method above, we can extract the journal title information. Here I use two new commands, cut and sed. The cut command takes the results of the grep command, removes the first column based on the comma as the column delimiter. In the first sed command, I remove the space and opening curly brace and replace it with nothing. In the second sed command, I remove the closing curly brace and the comma and replace it with nothing. The result is list of only the journal titles without any extraneous characters. I then pipe the output to the sort command, which sorts the list alphabetically, to the uniq -c command, which deduplicates and counts the results, and again to the sort command, which sorts numerically, since the first character is a number:

grep "journal =" scopus.bib | cut -d"=" -f2 | \
    sed 's/ {//' | sed 's/},//' | \
    sort | uniq -c | sort

Total Citations

There are other things we can do if we want to learn more powerful technologies. While I will not cover awk, I do want to introduce it to you. With the awk command, based on the BibTeX tag that includes citation counts at the time of the download (e.g., note = {Cited by: 2}), we can extract the number from that field for each record and sum the total citations for the records in the file:

 grep -o "Cited by: [0-9]*" scopus.bib | \
    awk -F":" \
    'BEGIN { printf "Total Citations: "} \
    { sum += $2; } \
    END { print sum }'

In the above command, we use the pipe operator to connect a series of commands to each other:

use grep to search for the string "Cited by: " and to include any number of digits
use awk to use the colon as the column or field delimiter
use the awk BEGIN statement to print the words "Total Citations: "
instruct awk to sum the second column, which is the citation numbers
use the awk END statement to print the sum.

If you want to learn more about sed and awk, please see my text processing chapter for my Linux Systems Administration. There are also many tutorials on the web.

Conclusion

grep is very powerful, and there are more options listed in its man page.

The Linux (and other Unix-like OSes) command line offers a lot of utilities to examine data. It's fun to learn and practice these. Despite this, you do not have to become an advanced grep user. For most cases, simple grep searches work well.

There are many grep tutorials on the web if you want to see other examples.

References

Arneson, J. (2017). Determining usage when vendors do not provide data. Serials Review, 43(1), 46–50. doi.org/10.1080/00987913.2017.1281788

Managing Software

Introduction

ManyLinux distributions use a package manager to handle the installation, upgrades, and uninstalls of the software on a system. The Ubuntu distribution uses a package manager called dpkg and a front-end called apt (advanced package tool). We will use apt to install, update, and remove software from our servers.

`sudo`

To use the package manager, we will need the sudo command. The sudo command allows us to execute a command as another user (see man sudo). By default, the sudo command executes a command as the superuser (see man 8 sudo).

The name of the superuser account is root. Th root user can perform administrative tasks that regular users cannot. For security purposes, regular accounts may not add, remove, or update software on a system, nor may they modify most files or directories outside their home directories. Using sudo allows regular users who have administrative privileges to perform maintenance tasks on our systems by using executing commands as the root user. Some consider this safer than logging in as the root user.

Not all regular users can use the sudo command. On regular Ubuntu distributions, users must belong to the sudo group in order to run the sudo command. The groups command will return a list of groups that your account belongs to. On the Ubuntu version used by the Google Cloud Platform (GCP), your user should belong in the google-sudoers group. The difference between the sudo group on regular Ubuntu distributions and the google-sudoers GCP version is that regular users in the google-sudoers group are not prompted for their password.

Down the line, we will use the sudo command to modify files, create directories, and perform other maintenance tasks needed to install and manage software. In this lesson, we will use sudo along with the apt commands to update our systems and install software.

`sudo` syntax

The sudo command is simple to use. When necessary, we use sudo by pre-pending it to the regular commands that we have already learned. In our home directories, for example, we don't need to use sudo to create a new directory with the mkdir command. Instead we type something like mkdir data to create a new directory/folder called data. But our regular user doesn't own the files or directories outside our home directory. For example, when we downloaded my bash scripts to the /usr/local/bin directory, we used sudo since don't own that directory. If I want to create a data directory in /usr/local/bin, then I have to use sudo at the beginning of my command:

cd /usr/local/bin
sudo mkdir data

Or, without changing to that directory, I can just specify the full path:

sudo mkdir /usr/local/bin/data

Or if I want to create a file in some other directory outside my home directory, then I have to use sudo there, too:

cd /srv
sudo touch data.csv

Or, without changing to that directory, I can specify the full path:

sudo touch /srv/data.csv

apt

We will use sudo in the above ways soon enough, but for now, we will use sudo to install, update, and uninstall software on our systems.

Next I'll demonstrate the apt commands that we'll need.

`sudo apt update`

Your system keeps a record of what software is installed on your system and their version numbers. The sudo apt update command updates that list and compares the update to what's installed. That is, if you have a piece of software called acme1.1 on your system, and acme1.2 is available, then running sudo apt update will let you know that you can upgrade to acme1.2. It's good practice to run sudo apt update before installing or upgrading your system. This lets your system upgrade to the most recent version of what you want to install.

In short, the command to download new package information is:

sudo apt update

`sudo apt upgrade`

Once the list of packages have been updated, you can upgrade with the sudo apt upgrade command if there are any upgrades. When you run this command, and if there are any upgrades, you will be prompted to proceed. You can press Y to proceed, or N to cancel.

This command is simply:

sudo apt upgrade

`apt search`

If you want to install a piece of software, then you have to install it using its package name. Sometimes that means we have to search for the name of the package. This apt command does not require the use of sudo. sudo is not required because apt search does not modify the system. It simply helps you search for a package name.

For example, the man pages provide helpful documentation about how to use the commands on our systems, but the man pages can also be dense and not straightforward.

Fortunately, there's an application called tldr. This is a community-driven application that provides simple help pages and examples of how to use some of the most commonly used commands.

To search for the tldr package, we execute the following command:

apt search tldr

This returns a list of results that match the search query. One of those results is the tldr package, which is simply named tldr. Not all packages are simply named, which is why we need to search for the specific name.

Note that sometimes when we search for a package, the list of results is quite long. In those cases, pipe the above command through the less pager to page through the results: apt search <packagename> | less

`apt show`

If we want more specific information about a package, we can use the apt show command along with the package name. Therefore, to get more information about the tldr application, we execute the following command:

apt show tldr

This will return a fuller description of the package (usually), as well as the URL to the application's website, plus other details. We do not need to use sudo because we are not modifying the system. We are only retrieving information.

`sudo apt install`

To install the tldr application, we use the sudo apt install command along with the package name. We want to make sure that the name of the package is exactly what was returned from the apt search command. In the tldr case, it's pretty straightforward. To install:

sudo apt install tldr

`sudo apt remove`

In order to remove a package, we use the sudo apt remove command. I like to add the --purge option because this also removes system configuration files that I probably do not need. That is, some applications install configuration files (configs) in the /etc directory. Adding --purge will remove those configs.

To remove a package and its system configuration files (if any), we run the command with the package name:

sudo apt --purge remove tldr

Some configs are stored in your home directory. Generally only end user applications install configs in our home directories. The --purge option will not remove those configs; instead, we have to remove them manually if we want.

`sudo apt autoremove`

One of the great things about dpkg and apt is that it installs and handles software dependencies really well. Few computer applications are self-contained, and they often require other software to operate. These other software are called dependencies. When we uninstall (or remove) applications, the package manager does not auto uninstall those dependencies that were installed with it. We use the autoremove command to uninstall those, which helps keep our systems clean:

sudo apt autoremove

`sudo apt clean`

When we install packages, some files are installed with them. The sudo apt clean removes those extra files and frees up disk space. It's a simple command:

sudo apt clean

The `dpkg` Command

If you use Windows, then you are likely familiar with downloading and installing .exe files. On macOS, the equivalent are .dmg files. The Ubuntu distribution has an equivalent file. These are .deb files. These files can be installed using the dpkg command:

sudo dpkg -i <file_name.deb>`

Like with exe or dmg files, you want to be careful installing deb files you find on the web. Unlike software managed by the apt system, these files are not monitored and can contain malicious code.

You can generally use apt to remove applications installed with dpkg. Or, you can uninstall an application installed with dpkg with dpkg:

sudo dpkg --purge <application_name>

In most cases, stick with apt.

Conclusion

The apt command makes it quite easy to manage software on our systems. We will use this command to install more complicated software later. Here's a list of commands we covered:

sudo apt update
sudo apt upgrade
apt search
apt show
sudo apt install
sudo apt --purge remove
sudo apt autoremove
sudo apt clean

To locate and install software

sudo apt update
apt search <package_name>
apt show <package_name>
sudo apt install <package_name>

sudo apt --purge remove <package_name>
sudo apt autoremove
sudo apt clean

To keep system up to date

sudo apt update
sudo apt upgrade
sudo apt autoremove
sudo apt clean

Library Search

In this section, we're going to explore the yaz-client. The yaz-client is an information retrieval client that uses the Z39.50/SRU protocols to query bibliographic databases, like library catalogs and repositories. For those unfamiliar, Z39.50 is a standard protocol in libraries for sharing, querying, and retrieving bibliographic information between library databases. Its development in the 1970s pre-dates the web, and its continued use illustrates the evolution of information retrieval systems since the 1970s. The protocol is maintained by the Library of Congress.

The yaz-client is an SRU client, as well. SRU (Search/Retrieve via URL) and SRW (Search/Retrieve Web service) are modern internet and web-based successors to Z39.50. These protocols offer modern flexibility and more simplicity in accessing and sharing bibliographic records than Z39.50. See OCLC's page on SRW/U for more information and The Library of Congress's documentation page: SRU/CQL.

The yaz-client allows us to interact with these protocols directly from the command line. This provides a hands-on opportunity with the underlying mechanics of digital library searches and data retrieval.

However, this exploration is only partly about learning a tool. More so, it's about understanding the history and ongoing development of information retrieval systems. This is a crucial (and fun!) part of library and information science.

In order for us to use the yaz-client, we need to connect to a library database. Fortunately, LSPs (library service platforms) can function as SRU targets for applications like yaz-client. For example, see the ExLibris tutorial on enabling and using SRU in Alma, its LSP product. We will connect to an Alma database in the following tutorial.

Installing `yaz`

First, let's get started by installing the yaz-client. Use the apt instructions from the prior lesson to locate the name of the yaz client.

First search for the name of the software:

apt search yaz

The package name happens to be yaz, but you never know! To get information about the program, we use the apt show command:

apt show yaz

The details help confirm that this is the program we want to install. Note that the output also returns a URL to the program's homepage on the web. Visit that link to read more about the software and its documentation.

To install yaz, run the following command:

sudo apt install yaz

Documentation

The documentation for the yaz-client can be accessed via its manual page or on the web. To access the man page, see:

man yaz-client

yaz is able to search quite a few bibliographic attributes, including many metadata fields. To see which attributes are available to yaz, see:

man bib1-attr

The Library of Congress also provides an overview of the bib1-attr documentation, but it's less comprehensive: Bib-1 Attribute Set

Complete documentation for the yaz-client can be found on its homepage: YAZ

Using `yaz`

The start the yaz program, run the yaz-client command.

yaz-client

This creates a new command line interface with a new prompt:

Z>

In this new interface, we can connect to a library's OPAC or discovery service. To do so, we use the open command followed by the server address. The following open command establishes a connection to the University of Kentucky's library catalog:

Z> open saalck-uky.alma.exlibrisgroup.com:1921/01SAA_UKY

Queries

Queries are constructed using Prefix Query Notation (PQN). In the context of PQN, this is a way of structuring queries where the operator (e.g., AND, NOT, OR) precedes the operands (e.g., search terms, attributes, fields).

Each query begins with a command followed by a search syntax articulated by its PQN. The list of commands are described in man yaz-client in the COMMANDS section. The main command we'll use is the find command, which may be abbreviated as f. Let's see some examples:

Example 1

To search for the term information in the title field and the term library science in the Library of Congress Subject Heading (LCSH) field, we use the following query:

Z> find @and @attr 1=4 "information" @attr 1=21 "library science"

Let's break that down:

find is the command that sends a search request
@and is the operator signifying a Boolean AND search of the next two attributes
@attr 1=4 instructs the query to search for the term in the Title field
"information" is the term for the Title search
@attr 1=21 instructs the query to search for the term in the subject heading field
"library science" is the second search term for the subject heading search

The search does not reveal the results. To peruse the results, we use the show command. To show the first record:

show 1

To show the second record:

show 2

And so forth.

Example 2

Search for works with subject headings library science and philosophy. In this example, I abbreviate the find command as f:

Z> f @and @attr 1=21 "library science" @attr 1=21 "philosophy"

@attr 1=21 instructs the query to search for the term library science in the subject heading field
@attr 1=21 instructs the query to search for the term philosophy in the subject heading field

Example 3

Find where personal name is "mcmurtry, larry".

Z> f @attr 1=1 "mcmurtry, larry"

@attr 1=1 instructs the query to search for the term mcmurtry, larry in the personal name field.

Example 4

Find where the term "c programming language" appears in the Any field.

Z> f @attr 1=1016 "c programming language"

@attr 1=1016 instructs the query to search for the term in Any field.

Finally, we can exit the yaz client with the quit command:

Z> quit

Advanced Usage

Let's open the yaz-client again but with the -m option. According to the yaz-client man page, the -m option option instructs the client to append bibliographic records to a file. In the example below, I arbitrarily name the file records.marc.

$ yaz-client -m records.marc

Again, we use the open command to connect to the library's catalog. Then use the find command to search the catalog. Use the show command to examine some of the retrieved records. Then use the quit command to exit the yaz-client.

Z> open saalck-uky.alma.exlibrisgroup.com:1921/01SAA_UKY
Z> find @and @attr 1=4 "information" @attr 1=21 "library science"
Z> show 1
Z> show 2
Z> show 3
Z> quit

However, this time when we exit the yaz-client, we can examine all the records we retrieved. The default file type isn't human friendly. We can take a look at the first few lines of the file first:

head records.marc

Then we can use the file command to determine its file type:

file records.marc
records.marc: MARC21 Bibliographic

Fortunately, we can convert the MARC file to friendlier formats. For example, using the yaz-marcdump command, we can convert the file to JSON, which is a standard text-based format for representing structured data (JSON).

yaz-marcdump -o json records.marc > records.json

We then use the jq command, a JSON processor, to format the JSON for better readability:

jq . records.json > records-formatted.json

With the records formatted, we can use the less command to scan the file, but the jq command is quite powerful and we can use it to query and examine specific fields in the JSON-formatted MARC records.

Note: learning jq and MARC is beyond the scope of this work. However, if you are new to MARC or need a reminder, see: MARC 21 Format for Bibliographic Data. The jq homepage also provides a nice tutorial: jq Tutorial.

But as an example, the following command extracts the 650 Subject field with the a (Topical term) subfields for our entries:

jq '.fields[] | select(has("650")) | .["650"].subfields[] | select(has("a")) | .a' records-formatted.json

Or we can examine general subdivisions ($x subfield) of the 650 subfields and tabulate the data by piping through sort, uniq -c, and sort:

jq '.fields[] | select(has("650")) | .["650"].subfields[] | select(has("x")) | .x' records-formatted.json | sort | uniq -c | sort

For other fields to examine, see the MARC 21 Reference Materials sheet.

`jq` breakdown

Selects all fields:

jq '.fields[]' records-formatted.json

Selects all 650 fields:

jq '.fields[] | select(has("650"))' records-formatted.json

Selects only the subfields from the 650 fields:

jq '.fields[] | select(has("650")) | .["650"].subfields[]' records-formatted.json

Selects only the x subfields from the 650 fields:

jq '.fields[] | select(has("650")) | .["650"].subfields[] | select(has("x")) | .x' records-formatted.json

Selects only the z subfields (Geographic subdivision) from the 650 fields:

jq '.fields[] | select(has("650")) | .["650"].subfields[] | select(has("z")) | .z' records-formatted.json

Other formats

We can the original MARC data to XML:

yaz-marcdump -o marcxml records.marc > records.xml

We can query the XML data with xmlstarlet command, which is similar to jq but for XML structured data.

Downloading All Results

The process above saved only records we examined with the show command. The following find query locates 120 records and then the show command below allows us to save to file all 120 records.

$ yaz-client
> set_marcdump records.new
> open saalck-uky.alma.exlibrisgroup.com:1921/01SAA_UKY
> find @and @attr 1=4 "technology" @attr 1=21 "library science"
> show 1 +120
> quit

Then we can follow the steps above to convert to JSON and examine the file with jq. One thing we learn with bigger data sets is that data gets messy. In the 120 records, I found differences in capitalization, usage of punctuation, and other variations that are mistakes. The following command helps to clean some of that up. The command is technically a one-liner, but I've broken it up on multiple lines by including a backslash at the end \.

jq '.fields[] | select(has("650")) | .["650"].subfields[] | select(has("a")) | .a' records-formatted.json |\
sort | \
sed 's/\.//g' | \
awk '{ print tolower($0) }' | \
sort | \
uniq -c | \
sort -n

In the following, I add a final sed and awk command on the last two lines. The final sed command deletes the most common subject heading, which is library science. Since these are all library science records, including it in the results is meaningless. The final awk command sums the number of records from the tabulated count.

jq '.fields[] | select(has("650")) | .["650"].subfields[] | select(has("a")) | .a' records-formatted.json |\
sort | \
sed 's/\.//g' | \
awk '{ print tolower($0) }' | \
sort | \
uniq -c | \
sort -n | \
sed '$d' | \
awk '{ sum+=$1 } END{print sum}'

Because these commands are query agnostic, they can be used to examine subject headings in the catalog from other queries. We can even select for other subfields, like the z 650 subfield to get the geographical divisions and use that to map out the geographies reported in the subject headings in a catalog.

Conclusion

Z39.50 is often presented as an abstract information retrieval concept even though it has played a central part of searching online catalogs and database for nearly 50 years. The protocol, using tools like yaz, can be used to build search interfaces to bibliographic data. For example, see:

A Guide to the PHP YAZ Library for Information Retrieval
Fun with bibliographic indexes, bibliographic data management software, and Z39.50 If you are interested in establishing a connection to the Library of Congress's catalog, use the following server address:

Z> open z3950.loc.gov:7090/voyager

Creating a LAMP Server

In this section, we learn how to set up a LAMP (Linux, Apache, MySQL, PHP) stack. This stack enables us to create a web server that provides extra funtionality via PHP and MySQL, both of which are required to run content management systems and integrated library systems.

Installing the Apache Web Server

Introduction

In this section, we focus on a fundamental component of the internet's infrastructure: the web server, or alternatively called the HTTP server.

The web server is the software that makes websites available in your browsers. The basic function is to make files on the web server accessible to others via their web browsers. At a basic level, the web is essentially a world wide file system, and the web browser retrieves and displays files from web servers, much like a file explore does for local files. At a more advanced level, HTTP can also add more complexity beyond simple file access, such as providing dynamic content, APIs, and more.

Knowing how a web server functions is crucial for anyone wanting to manage or deploy web services. There are many web servers available, but in this session, I will guide you through installing and using the Apache web server. Apache is one of the most popular web server applications. We will learn how to install it, configure it, and conduct basic checks to ensure its operation. We will end this lesson by creating your first web page on your first web server.

It's important to understand the basics of an HTTP server, and therefore I ask you to read Apache's Getting Started page before proceeding with the rest of this section. Each of the main sections on that page describe the important elements that make up and serve a website. These elements include:

clients, servers, and URLs
hostnames and DNS
configuration files and directives
web site content
log files and troubleshooting

Installation

Before we install Apache, we need to update our systems first. This ensures we will be downloading and installing the most recent, secure version of the package.

sudo apt update
sudo apt -y upgrade

Once the machine is updated, we can install Apache using apt. First we'll use apt search to identify the specific package name. I know that a lot of results will be returned, so I will pipe | the output from apt search command through the head command to look at the initial results:

apt search apache2 | head

The package that we're interested in happens to be named apache2 on Ubuntu. The name of this package is not a given, though. On other distributions, like Fedora, the Apache package is called httpd. This is why it's important to learn and use apt search <package_name> and apt show <package_name> commands to locate desired packages before installing.

apt show apache2

Once we've confirmed that apache2 is the package that we want, install it with the apt install <package_name> command. Press Y to agree to continue after running the command below:

sudo apt install apache2

Basic checks

Once it is installed, we need to make sure the server is up and running, configure some basic things, and create a basic web site.

To start, I use the systemctl command to acquire status info about apache2 and make sure it is enabled and running:

systemctl status apache2

The output may be overwhelming at first glance, so I advise you to read each line slowly. In particular, look for key lines that show its Active and Loaded status. For example, the output shows that apache2 is enabled, which is the default for this software. The term enabled means that the software starts automatically on reboot. The output should also state that the software is active. This means that the apache2 is running and live.

Creating a web page

Since apache2 is active, let's look at the default web page.

There are (at least) two ways we can look at the default web page. We can use a command line web browser or a graphical web browser, like Firefox, Chrome, etc.

Text Based Web Browser

We have lots of command line browsers to use. I like w3m because it defaults to Vim keybindings, but many like elinks.

To check the site with w3m, we have to install it first:

sudo apt install w3m

Or if you want to try elinks, run:

sudo apt install elinks

Once the text based browser is installed, we can visit our default site using its loopback IP address. The loopback address is named localhost and always points to the local machine. It is useful for testing services, connections, and more locally. From the command line on our server, we can run either of these two commands to view localhost:

w3m 127.0.0.1

Or:

w3m localhost

If you elected to use elinks, just replace w3m with it.

We can also acquire the system's private IP address using the ip a command. There are different address ranges for the private networks. On your home network, your private IP address for a your laptop or phone might begin with 192.168.x.x. On our virtual instances, the address will begin with the number 10 and look like 10.128.0.99. The difference deals with the size of the private networks. In any case, to use the private IP address with w3m from the virtual machine's command line, we run, assuming private IP address for my local machine is 10.128.0.99:

w3m 10.128.0.99

If apache2 installed and started correctly, you should see the following text:

Apache2 Ubuntu Default Page
It works!

To exit w3m or elinks, press q and then y to confirm exit.

Graphical Browser

To view the default web page using a regular web browser, like Firefox, Chrome, Safari, Edge, or etc., you need to get your server's public IP address. To do that, log into the Google Cloud Console click on the Compute Engine link, and then click on VM instances. You should see your External IP address in the table on that page. You can copy that external IP address or simply click on it to open it in a new browser tab. If successful, you should see the graphical version of the Apache2 Ubuntu Default Page.

Note that most browsers nowadays try to force a secure HTTPS mode. If your web page is not loading, make sure your URL is http://IP-ADDRESS and not https://IP-ADDRESS.

Please take a moment to read through the text on the default page. It provides important information about where Ubuntu stores configuration files, what those files do, and the document root, which is where website files are stored.

Create a Web Page

Let's create your first web page on your first web server. The default page described above provides the location of the document root at /var/www/html. The document root may reside at a different location on a different Linux operating system, so it's important to verify that location. Remember that the web is, at its simplest, a filesystem that has been made available to the wide world. The web server is what provides access to part of the filesystem. That point of access is called the document root.

When we navigate to the document root on the command line, we'll see that there is an index.html file located in that directory. This is the Apache2 Ubuntu Default Page that we visited above in our browsers. Most web servers look for a file specifically named index.html and serve that as the default. Let's rename that index.html file, in order to back it up, and create a new one:

cd /var/www/html/
sudo mv index.html index.html.original
sudo nano index.html

Note: we use sudo in this directory because we are working on files and directories outside our home directories. Thus, be careful here about the commands you run. Any mistake may result in deleting necessary files or directories.

If you know HTML, then feel free to write some basic HTML code to get started. Otherwise, you can re-type the content below in nano or like, and then save and exit out.

<html>
<head>
<title>My first web page using Apache</title>
</head>
<body>

<h1>Welcome</h1>

<p>Welcome to my web site.
I created this site using the Apache HTTP server.</p>

</body>
</html>

If you have your site open in your web browser, reload the page, and you should see the new text.

You can still view the original default page by specifying its name in the URL. Remember that web browsers are, at their most basic, simply file viewers. So it makes sense that you simply have to specify the name of the file you want to view. For example, if your public IP address is 55.222.55.222, then you'd specify it like so:

http://55.222.55.222/index.html.original

Conclusion

In this section, we learned about the Apache HTTP server. We learned how to install it on Ubuntu, how to use a systemctl command to check its status, how to create a basic web page in /var/www/html, how to view that web page using the w3m command line browser and in our graphical browser,

In the next section, we will install PHP, which will provide the language needed to connect to the relational database MySQL. This will enable more data driven web sites and begin to transform out sites from basic file viewers to full fledged applications.

Installing and Configuring PHP

Introduction

Client-side programming languages, like JavaScript, are handled entirely by the browser. To do this, browsers like Firefox, Chrome, Safari, Edge, and others include JavaScript engines that use just-in-time compilers to execute JavaScript code (see JavaScript Engine, Mozilla).

From an end user's perspective, you basically install JavaScript when you install a web browser.

PHP, on the other hand, is a server-side programming language. This means it must be installed on the server. Unlike with JavaScript, the browser does not execute PHP directly; instead, the web server processes the PHP and sends the resulting HTML or other content to the browser.

From a system or web administrator's perspective, this means that PHP has be installed and configured to work with the web/HTTP server. In our case, we have to install and configure PHP on our virtual instances to work with the Apache web server software.

One of the primary uses of PHP is to interact with databases, like MySQL, MariaDB, PostgreSQL, etc., in order to create data-based page content. To begin to set this up, we have to:

Install PHP and relevant Apache modules
Configure PHP and relevant modules to work with Apache
Configure PHP and relevant modules to work with MySQL

Install PHP

As usual, we will use apt install to install PHP and relevant modules. Then we will restart Apache using the systemctl command. Use apt show [package_name] to read more about each package we will install. The first command below installs the php and the libapache2-mod-php packages. The latter package is used to create a connection between PHP and the Apache web server.

sudo apt install php libapache2-mod-php
sudo systemctl restart apache2

Once installed, you want to confirm the installed version with the following command. This is because other software (e.g., WordPress, etc.) might require a specific version before that other software to work.

php -v

After we restart Apache, we need to check its status and see if there are any errors in the log output:

systemctl status apache2

Check Install

Next we check that PHP has been installed and that it's working with Apache. We can create a small PHP file in our web document root. To do that, we cd to the document root, /var/www/html/, and create a file called info.php:

cd /var/www/html/
sudo nano info.php

In that file, add the following text, then save and close the file:

<?php
phpinfo();
?>

Now visit that file using the public IP address for your server. If the public IP address for my virtual machine is 55.333.55.33, then in Firefox, Chrome, etc, I would open:

http://55.333.55.333/info.php

Again, be sure to replace the IP below with the IP address of your server and be sure to use http and not https.

You should see a page that provides system information about PHP, Apache, and the server. The top of the page should look like Figure 1 below:

Fig. 1. A screenshot of the title of the PHP install page.

Once you've confirmed that PHP is installed and functioning, you should delete it since it exposes detailed systems information:

sudo rm /var/www/html/info.php

Basic Configurations

By default, when Apache serves a web page, it looks for a file titled index.html and serves that, even if it does not display that file in the URL bar. Thus, http://example.com/ actually resolves to http://example.com/index.html in such cases. However, if our plan is to provide for PHP, we want Apache to default to a file titled index.php instead of index.html file. In these cases, http://example.com/ would actually resolves to http://example.com/index.php.

To configure that, we need to edit the dir.conf file in the /etc/apache2/mods-enabled/ directory. In that file there is a line that starts with DirectoryIndex followed by a list of files. The first file listed in that line is index.html, and then there are a series of other files that Apache looks for in the order listed. Apache checks that list list and prioritizes these in order of appearance on the list. If any of those files exist in the document root, then Apache serves those before proceeding to the next. We want Apache to prioritize the index.php file first and index.html second. Before modifying this file, it's good practice to create a backup of the original. So we will use the cp command to create a copy with a new name, and then we will use nano to edit the file.

cd /etc/apache2/mods-enabled/
sudo cp dir.conf dir.conf.bak
sudo nano dir.conf

Next we change the line to this, so that index.php is first in line:

DirectoryIndex index.php index.html index.cgi index.pl index.xhtml index.htm

Whenever we make a configuration change, we should use the apachectl command to check our configuration:

apachectl configtest

If we get a Syntax Ok message, we can reload the Apache configuration, restart the service, and check its status:

sudo systemctl reload apache2
sudo systemctl restart apache2
systemctl status apache2

Create an index.php File

Now create a basic PHP page. cd back to the Apache document root directory and use nano to create and open an index.php file:

cd /var/www/html/
sudo nano index.php

Let's add some HTML and PHP to it. We will add PHP that functions as a simple browser detector. Add the following code:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Browser Detector</title>
</head>
<body>
    <h1>Browser & OS Detection</h1>
    <p>You are using the following browser to view this site:</p>

    <?php
    $user_agent = $_SERVER['HTTP_USER_AGENT'];

    // Browser Detection
    if (stripos($user_agent, 'Edge') !== false) {
        $browser = 'Microsoft Edge';
    } elseif (stripos($user_agent, 'Firefox') !== false) {
        $browser = 'Mozilla Firefox';
    } elseif (stripos($user_agent, 'Chrome') !== false && stripos($user_agent, 'Chromium') === false) {
        $browser = 'Google Chrome';
    } elseif (stripos($user_agent, 'Chromium') !== false) {
        $browser = 'Chromium';
    } elseif (stripos($user_agent, 'Opera Mini') !== false) {
        $browser = 'Opera Mini';
    } elseif (stripos($user_agent, 'Opera') !== false || stripos($user_agent, 'OPR') !== false) {
        $browser = 'Opera';
    } elseif (stripos($user_agent, 'Safari') !== false && stripos($user_agent, 'Chrome') === false) {
        $browser = 'Safari';
    } else {
        $browser = 'Unknown Browser';
    }

    // OS Detection
    if (stripos($user_agent, 'Windows') !== false) {
        $os = 'Windows';
    } elseif (stripos($user_agent, 'Mac') !== false || stripos($user_agent, 'Macintosh') !== false) {
        $os = 'Mac';
    } elseif (stripos($user_agent, 'Linux') !== false) {
        $os = 'Linux';
    } elseif (stripos($user_agent, 'iOS') !== false || stripos($user_agent, 'iPhone') !== false || stripos($user_agent, 'iPad') !== false) {
        $os = 'iOS';
    } elseif (stripos($user_agent, 'Android') !== false) {
        $os = 'Android';
    } else {
        $os = 'Unknown OS';
    }

    // Output Result
    echo "<p>Your browser is <strong>$browser</strong> and your operating system is <strong>$os</strong>.</p>";
    ?>

</body>
</html>

Next, save the file and exit nano. In your browser, visit your site at its public IP address (again, replace your server's IP address):

http://55.333.55.333/

Although your index.html file still exists in your document root, Apache now returns the index.php file instead. If for some reason PHP fails, then the index.html file would be served next since that's what is listed next in the dir.conf file on the DirectoryIndex line.

Conclusion

In this section, we installed PHP and configured it to work with Apache. We also created a simple PHP test page that reported our browser user agent information on our website.

In the next section, we'll learn how to complete the LAMP stack by adding the MySQL relational database to our setup.

Installing and Configuring MySQL

Introduction

We started our LAMP stack when we installed Apache on Linux. We added extra functionality when we installed and configured PHP to work with Apache. In this section, our objective is to complete the LAMP stack and install and configure MySQL.

If you need a refresher on relational databases, see: Introduction to Relational Databases. However in the next section, we will explore the database basics from the command line.

Install and Set Up MySQL

In this section, we'll learn how to install, setup, secure, and configure the MySQL relational database so that it works with the Apache web server and the PHP programming language.

First, as normal, we make run the following commands to ensure our machines are fully updated:

sudo apt update
sudo apt upgrade
sudo apt autoremove
sudo apt clean

Note that sometimes you will have to reboot your machine, mostly when you get a Linux kernel update. To do so, run sudo reboot now. This command will disconnect you from your machine. Wait a minute or two, and then reconnect.

Then we install the MySQL Community Server package. The MySQL Community Server package is a metapackage that installs the latest, most secure version of MySQL, regardless of the software's version number, as well as its dependencies.

sudo apt install mysql-server

The above install should be fine for us, but note that in some cases you first may want to specifically confirm which versions are available via the apt command since some software (e.g., WordPress) may require specific versions or above. You can check that with the following:

apt policy mysql-server

After installing, you can confirm the version number with the following command to ensure you know which version you are using:

mysql --version

The install process should start and enable the database server, but we can check if it's running and enabled using the systemctl command. Note that you are looking for the lines beginning with Loaded and Active.

systemctl status mysql

Next we need to run a post installation script called mysql_secure_installation. The script performs some security checks and creates a secure, baseline configuration of MySQL. To do that, run the following command:

sudo mysql_secure_installation

When you run the above script, you'll get a series of prompts to respond to. For most responses, you will want to respond with a Y for yes. We will respond with a Y to the first question on validating passwords, but to keep things simple, select LOW when prompted for the password validation policy. This will enforce a weak password policy for testing, but note that in real-world scenarios, we might want to select a more secure policy. In the output below, I show how to respond to each question:

Validate passwords: Y
Password validation policy: 0 (zero) for LOW
Remove anonymous users: Y
Disallow root login remotely: Y
Remove test database and access to it: Y
Reload privilege tables now: Y

We can login to the database now. In order to do so, we use the following command:

sudo mysql -u root

NOTE: we need to distinguish between the regular user prompt of our Linux accounts and the MySQL prompt below. The default prompt for our user accounts has the following syntax: user_name@computer_name:path$ . In the following, I will indicate we are at the MySQL prompt with the following text: mysql>. Do not type that prompt when you are using MySQL.

When I connect to a MySQL server, I like to list the available databases. To do this, we use the show databases; command. Note that MySQL commands end with a semicolon:

mysql> show databases;

The following databases should be returned:

+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| sys                |
+--------------------+
4 rows in set (0.01 sec)

To exit the MySQL server prompt and return to the Bash shell, we use the following command:

mysql> \q

Create and Set Up a Regular User Account

We need to reserve the root MySQL user for special administrative cases and create a regular MySQL user account for regular use cases.

To create a regular MySQL user, we use the MySQL create command. In the command below, I create a new user called opacuser with a complex password. The single quotes are quoting the password and are not the password itself. Thus in the example below, the Xs indicate my password.

First, log back into the MySQL server:

sudo mysql -u root

At the MySQL prompt, create the new user:

mysql> create user 'opacuser'@'localhost' identified by 'XXXXXXXXX';

If the prompt returns a Query OK message, then the new user should have been created without any issues.

Create a Practice Database

As the root database user, we create a new database for the user account we just created. We'll call this database opacdb and set the character encoding to UTF-8 to support international characters. Then we run the MySQL show command to view the new database. Next we grant all privileges on the database to the user account opacuser.

mysql> create database opacdb default character set utf8mb4 collate utf8mb4_unicode_ci;
mysql> show databases;
mysql> grant all privileges on opacdb.* to 'opacuser'@'localhost' with grant option;

Other than granting all privileges, we could limit the user to specific privileges, including:

CREATE
DROP
DELETE
INSERT
SELECT
UPDATE
GRANT OPTION

Such privileges may be called operations or functions. They allow MySQL users to use and modify the databases, where appropriate. For example, we may want to limit the opacuser user account to only be able to use SELECT commands. These decisions depend on the purpose of the database and our security risks.

Exit out of the MySQL database as the root MySQL user, and then exit out of the root Linux user account. You should be back to your normal Linux user account:

mysql> \q

Note: relational database keywords are often written in all capital letters: CREATE, DROP, SELECT, etc. As far as I know, this is simply a convention to make the code more readable. However, in most cases I'll write the keywords in lower case letters. This is simply because, by convention, I'm lazy.

Logging in as Regular User and Creating Tables

We can now start doing MySQL work. Note that when we login to the MySQL server, we leave the bash shell and enter the MySQL command line client. By default, the prompt for the client is bare-bones. We can make it more informative, though. To do so, open your .bashrc file:

nano .bashrc

Scroll to the bottom of the file and add the following:

export MYSQL_PS1="[\d]> "

Then save and exit and source the file:

source ~/.bashrc

As a reminder, we've created a new MySQL user named opacuser and a new database for opacuser that is called opacdb. When we run the show databases command as the opacuser user, we should see the opacdb database. Note below that I use the -p option when logging back into MySQL as the opacuser. This instructs MySQL to request the password for this user, it is required.

mysql -u opacuser -p

From the MySQL prompt, list the available databases and use the use command to switch to the new opacdb database:

mysql> show databases;
mysql> use opacdb;

A database is not worth much without data. In the following code, I create and define a new table for our opacdb database. The table will be called books that will contain data describing, er, some books. We will keep this table very simple and use only four fields: id, author, title, and copyright. The id field will function as a primary key (second to last line in the command below). This key is used as a unique identifier for a record in the field. When we create this key as a field called id, we state that it should be an integer id (or whole number), that it should only be a positive number unsigned, that it should not be empty not null, and that with each record, it should increment by a single integer auto_increment. When we create the author and title fields, we say that these fields can have a maximum length of 150 characters and should not be empty. When we create the copyright field, we limit it to the year data type, which means it has to adhere to a specific syntax YYYY, and should not be empty.

mysql> create table books (
        id int unsigned not null auto_increment,
        author varchar(150) not null,
        title varchar(150) not null,
        copyright year(4) not null,
        primary key (ID)
);

Note: A relational database contains tables. If you are unfamiliar with this, you can think of a database as an overall Excel spreadsheet file and tables as specific sheets in the Excel file. There is quite a bit that goes into creating proper tables in database because the composition dictates how well data is described and how tables connect and interact with (or relate to) each other. However, we are going to keep things rather simple in this exercise.

You can confirm that the table was created by running the following two commands. The MySQL show command lists the tables in a database and the describe command describes a table's structure.

mysql> show tables;
mysql> describe books;

Congratulations! Now create some records for that table.

Adding records into the table

We can populate our opacdb database with some data. (I simply picked the first book listed from the NYTimes best lists of books for the years 2019-2022.) We'll use the MySQL insert command to add our records into our books table. We need to specify three fields when entering data: author, title, and copyright. The copyright field is a date field, and it should conform to the YYYY syntax. We do not need to specify data for the id field because that will be created and will increment automatically.

mysql> insert into books (author, title, copyright) values
('Jennifer Egan', 'The Candy House', '2022'),
('Imbolo Mbue', 'How Beautiful We Were', '2021'),
('Lydia Millet', 'A Children\'s Bible', '2020'),
('Julia Phillips', 'Disappearing Earth', '2019');

Now we can view all the records that we just created with the MySQL select command:

mysql> select * from books;

Success! Now let's test our table.

Testing Commands

We will complete the following tasks to refresh our MySQL knowledge or begin to learn how it works. We will:

retrieve records or parts of records,
delete a record,
alter the table structure so that it will hold more data, and
add a record

Note: each MySQL command ends with a semi-colon. Some of the following MySQL commands are single-line, but others are multi-line. Regardless if a MySQL command is one-line or multi-line, it doesn't end until it ends with a semi-colon.

Please run the following commands, one at a time:

mysql> select author from books;
mysql> select copyright from books;
mysql> select author, title from books;
mysql> select author from books where author like '%millet%';
mysql> select title from books where author like '%mbue%';
mysql> select author, title from books where title not like '%e';
mysql> select * from books;
mysql> alter table books add publisher varchar(75) after title;
mysql> describe books;
mysql> update books set publisher='Simon \& Schuster' where id='1';
mysql> update books set publisher='Penguin Random House' where id='2';
mysql> update books set publisher='W. W. Norton \& Company' where id='3';
mysql> update books set publisher='Knopf' where id='4';
mysql> select * from books;
mysql> delete from books where author='Julia Phillips';
mysql> insert into books
       (author, title, publisher, copyright) values
       ('Emma Donoghue', 'Room', 'Little, Brown \& Company', '2010'),
       ('Zadie Smith', 'White Teeth', 'Hamish Hamilton', '2000');
mysql> select * from books;
mysql> select author, publisher from books where copyright < '2011';
mysql> select author from books order by copyright;
mysql> \q

Install PHP and MySQL Support

The next goal is to complete the connection between PHP and MySQL so that we can use both for our websites.

First install PHP support for MySQL. We're installing some modules alongside the basic support. These may or may not be needed, but I'm installing them to demonstrate some basics.

sudo apt install php-mysql php-mysqli

And then restart Apache and MySQL:

sudo systemctl restart apache2
sudo systemctl restart mysql

Create PHP Scripts

In order for PHP to connect to MySQL, it needs to authenticate itself. To do that, we will create a login.php file in in our document root's parent directory: /var/www. We also need to change the group ownership of the file and its permissions (see note below). This will allow the file to be read by the Apache web server but not by the world. This prevents the password information from being accessible to web users.

cd /var/www
sudo touch login.php
sudo chmod 640 login.php
sudo chown :www-data login.php
ls -l login.php
sudo nano login.php

A Short Aside on File Ownership and Permissions

We haven't covered in detail the concept of file ownership and permissions. In short, files and directories are owned by users and by user groups. When we run a command like ls -l, the output will show the file owner and the group owner: In the output below, the user root is the file owner and the group www-data is the group owner of the file login.php.
-rw-r----- 1 root www-data    0 Feb 14 02:44 login.php
We change the file permissions with the chmod command. We change the file ownership with the chown command. In the above code snippet, the chmod 640 login.php command changes the file permissions to:

user owner: read, write

group owner: read only

other/world: no permissions

For an in-depth tutorial, see Linux Systems Administrations, Chapter 3.3 File Permissions and Ownership

When we created the file using the sudo touch login.php command, the use of sudo creates the file with root as the owner. The chown :www-data login.php command changes the group ownership to www-data. The www:data user is the Apache user.

Many services, like Apache, have corresponding users on the system. Files placed in our document root /var/www/html will be served on the web. Therefore, those files must have read access for other/world. But we don't want the login.php file to be accessible to the world since it contains login information for our MySQL user. Thus, in addition to modifying file ownership and permissions, we also place it in the parent directory /var/www.

In the file, add the following credentials. If for some reason you used a different database name than opacdb and a different username than opacuser, then you need to substitute your names below. You need to use your own password where I have the Xs:

<?php // login.php
$db_hostname = "localhost";
$db_database = "opacdb";
$db_username = "opacuser";
$db_password = "XXXXXXXXX";
?>

Next we create a new PHP file for our website. This file will display HTML but will primarily be PHP interacting with our opacdb database.

Create a file titled opac.php in /var/www/html.

cd /var/www/html
sudo nano opac.php

Then copy over the following text. I suggest you transcribe it, especially if you're interested in learning a bit of PHP.

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>MySQL Server Example</title>
</head>
<body>

    <h1>A Basic OPAC</h1>
    <p>We can retrieve all the data from our database and book table using a couple of different queries.</p>

    <?php
    // Load MySQL credentials securely
    require_once '/var/www/login.php';

    // Enable detailed MySQL error reporting
    mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);

    // Establish database connection
    $conn = new mysqli($db_hostname, $db_username, $db_password, $db_database);

    if ($conn->connect_error) {
        die("Connection failed: " . $conn->connect_error);
    }

    echo "<h2>Query 1: Retrieving Publisher and Author Data</h2>";

    // Query using prepared statement
    $stmt = $conn->prepare("SELECT publisher, author FROM books");
    $stmt->execute();
    $result = $stmt->get_result();

    while ($row = $result->fetch_assoc()) {
        echo "<p>Publisher " . htmlspecialchars($row["publisher"]) .
             " published a book by " . htmlspecialchars($row["author"]) . ".</p>";
    }

    $stmt->close();

    echo "<h2>Query 2: Retrieving Author, Title, and Date Published Data</h2>";

    $stmt2 = $conn->prepare("SELECT author, title, copyright FROM books");
    $stmt2->execute();
    $result2 = $stmt2->get_result();

    while ($row = $result2->fetch_assoc()) {
        echo "<p>A book by " . htmlspecialchars($row["author"]) .
             " titled <em>" . htmlspecialchars($row["title"]) .
             "</em> was released in " . htmlspecialchars($row["copyright"]) . ".</p>";
    }

    $stmt2->close();
    $conn->close();
    ?>

</body>
</html>

Save the file and exit out of nano.

Test Syntax

After you save the file and exit the text editor, we need to test the PHP syntax. If there are any errors in our PHP, these commands will show the line numbers causing errors or leading up to errors. Nothing will output if all is well with the first command. If all is well with the second command, HTML should be outputted:

sudo php -f /var/www/login.php
sudo php -f /var/www/html/opac.php

Now view the site by opening the public IP address for your server in your browser. If all goes well, you should see the data in your opacdb database and books table rendered in your webpage.

Conclusion

Congratulations! If you've reached this far, you have successfully created a LAMP stack. In the process, you have learned how to install and set up MySQL, how to create MySQL root and regular user accounts, how to create a test database with play data for practicing, and how to connect this with PHP for display on a webpage.

In regular applications of these technologies, there's a lot more involved, but completing the above process is a great start to learning more.

DIY Integrated Library Systems

Libraries use integrated library systems (ILS) to manage and add to their collections and to provide online public access catalogs or discovery systems to their users to search those collections. In this chapter, based on the LAMP stack we created in the prior chapter, we will build a basic ILS. Specifically, we will create a bare bones OPAC and a bare bones cataloging module, two cornerstone modules in all ILSs.

We start with an additional lesson in relational database usage. We then proceed to build the OPAC and cataloging modules. By the end of this chapter, you will have a deeper understanding of the fundamentals of these library technologies and how they are used in everyday practice.

Introduction to Relational Databases

In the last section, we installed, configured, and setup a Linux, Apache, MySQL, and PHP (LAMP) stack. While setting up MySQL, we created a basic opacdb database containing a books table. Then we learned a few queries to get a feel for how querying a relational database, like MySQL, works.

In this section, we are going to spend a bit more time with MySQL simply to acquire a greater understanding of relational databases. The real power of relational databases lies in their ability to manage and retrieve data efficiently. This is accomplished by spreading data across multiple tables in order to limit data duplication and increase efficiency.

We'll create a new database for our opacuser. Unlike the opacdb database, our new database, which we'll call DinnerDB, will contain two tables. This will reduce the amount of data we need to add to our database.

Create Database

First, we will create a new database. Our opacuser does not have the privileges to create a new database, so we must login as the root MySQL user:

sudo mysql -u root

Once logged in, we create a database called DinnerDB. We could create another user, but for our purposes, we can simply grant opacuser privileges on the new database.

mysql> create database DinnerDB;
mysql> grant all privileges on DinnerDB.* to 'opacuser'@'localhost';

We can now exit the root MySQL user account:

mysql> \q

Create Tables

Next we login as opacuser:

mysql -u opacuser -p

First, let's check if we can see if the new DinnerDB database is visible to opacuser. If so, we begin using it:

[(none)]> show databases;
[(none)]> use DinnerDB;

Create `Meals` Table

We are going to create two tables in DinnerDB. We will call the first table Meals and the second table Ingredients. The second table will list the ingredients and quantities needed to make the meals named in the Meals table.

The following command creates a table called Meals. The table has five values:

meal_id is an integer that serves as the primary key.
meal_name contains a variable-length string up to 100 characters.
cuisine contains a variable-length string up to 50 characters.
cooking_time is an integer that uses a CHECK constraint so that if a user enters a zero or a negative value, MySQL will reject it, or if a user enters no value, it will default to one.
vegetarian contains a BOOLEAN value, which means it must be TRUE or FALSE:
- technically, BOOLEAN is synonymous with a data type called TINYINT(1), but BOOLEAN better conveys what we mean.

create table Meals (
    meal_id int auto_increment primary key,
    meal_name varchar(100) not null,
    cuisine varchar(50),
    cooking_time int not null default 1 check (cooking_time > 0),
    vegetarian boolean
);

Create `Ingredients` table

The following command creates the Ingredients table. The table has four values:

ingredient_id is the primary key.
meal_id is an integer:
- the last line in the command declares the meal_id value to be a foreign key that references the meal_id primary key in the Meals table
- foreign keys allow for cross-referencing to primary keys.
- since we cross-reference meal_id in the Ingredients table to meal_id in the Meals table, we make the Ingredients table a child of the Meals table, which by entailment functions as the parent table to the Ingredients table.
- the on delete cascade clause instructs MySQL to delete associated ingredients when deleting a meal in the Meals table.
ingredient_name contains a variable-length string up to 100 characters.
quantity contains a variable-length string up to 50 characters.

create table Ingredients (
    ingredient_id int auto_increment primary key,
    meal_id int,
    ingredient_name varchar(100) not null,
    quantity varchar(50),
    foreign key (meal_id) references Meals(meal_id) on delete cascade
);

Insert data

Now that we have created the structure of our two tables, we can begin adding data to them. The first command adds four records to the Meals table:

insert into Meals (meal_name, cuisine, cooking_time, vegetarian) values
    ('Spaghetti Bolognese', 'Italian', 45, FALSE),
    ('Vegetable Stir Fry', 'Chinese', 20, TRUE),
    ('Chicken Curry', 'Indian', 50, FALSE),
    ('Mushroom Risotto', 'Italian', 35, TRUE);

And the second command adds the list of ingredients for the meals we added to the Meals table. The integers we use for meal_id match the values produced in the Meals table, which we can see with the select * from Meals; command. Therefore, 1 refers to Spaghetti Bolognese, 2 refers to Vegetable Stir Fry, and so on.

insert into Ingredients (meal_id, ingredient_name, quantity) values
    (1, 'Spaghetti', '200g'),
    (1, 'Ground Beef', '250g'),
    (1, 'Tomato Sauce', '1 cup'),
    (2, 'Broccoli', '100g'),
    (2, 'Carrots', '50g'),
    (2, 'Soy Sauce', '2T'),
    (3, 'Chicken Breast', '300g'),
    (3, 'Curry Powder', '2T'),
    (3, 'Coconut Milk', '1 cup'),
    (4, 'Arborio Rice', '1 cup'),
    (4, 'Mushrooms', '1 cup'),
    (4, 'Parmesan Cheese', '1/2 cup');

In practice, we might want to create an additional column that would contain units for the quantities (e.g., cups, grams, etc).
This would result in better [database normalization][db_normalization_wiki].

Querying Data

Now that we have created our tables and added records to them, we can begin to query them. The next command is a simple SELECT statement that returns the entire contents of the Meals table:

select * from Meals;

We can filter results with the WHERE clause. In this example, we filter results by whether a meal is vegetarian:

select * from Meals where vegetarian = TRUE;

We can sort the results by descending or ascending order. This works for both alphabetic and numeric characters. The following commands sort the Meals by length of cooking time, which is an integer:

select * from Meals order by cooking_time desc; 
select * from Meals order by cooking_time asc;

In the following command, we select three values:

meal_name from the Meals table and rename the resulting column Meals.
ingredient_name from the Ingredients table and rename the resulting column Ingredients.
quantity from the Ingredients table and rename the resulting column Quantity.

We also use the join action to cross-reference the tables based on the shared meal_id value. Note that I use the table_name.column_name syntax in the query. For example, Meals.meal_name refers to the column meal_name in the Meals table, and so forth. The as in as Meals, as Ingredients, and as Quantity instructs MySQL to rename the columns. This is useful for the presentation of the data:

select Meals.meal_name as Meals,
    Ingredients.ingredient_name as Ingredients,
    Ingredients.quantity as Quantity
    from Meals
    join Ingredients on Meals.meal_id = Ingredients.meal_id;

In the following example, we list the ingredients and their quantities based on the name of a meal. In this case, we are looking to list the ingredients for the Chicken Curry dish:

select ingredient_name as Ingredients,
    quantity as Quantity
    from Ingredients 
    where meal_id = (select meal_id from Meals where meal_name = 'Chicken Curry');

In the following example, we instruct MySQL to provide a count of the Meals by cuisine:

select cuisine, count(*) as meal_count 
    from Meals
    group by cuisine;

And finally, it's been a tough day, and we want to identify Meals that don't take long to cook. The following command returns all Meals where the cooking time is less than or equal to 45 minutes:

select meal_name, cooking_time 
    from Meals 
    where cooking_time <= 45
    order by cooking_time asc;

Once done querying our database, we can logout:

\q

Database Management

When we started this lesson, we logged into MySQL and created a database called DinnerDB. Then we granted opacuser all privileges to this database. There might be times when we want to revoke those privileges. To do so, we first log back in as the root MySQL user:

sudo mysql -u root

Now we can re-review the privileges for opacuser:

mysql> show grants for 'opacuser'@'localhost';

We can take away those privileges with the REVOKE command:

mysql> revoke all privileges on DinnerDB.* from 'opacuser'@'localhost';

To confirm, we can re-run the show grants command.

If we want to track other users in with accounts in the MySQL server, the following command queries the user table in the mysql database and will return all user accounts:

select user, host from mysql.user;

We can also delete the database using the DROP command:

mysql> drop database DinnerDB;

If desired, we can delete user accounts (but don't do this with opacuser). For example, if had a user named sean, then we could use the following command to remove their account:

mysql> drop user 'sean'@'localhost';

Conclusion

In this introduction, we explored the basics of relational databases using MySQL. We created a structured database (DinnerDB), defined two tables (Meals and Ingredients), inserted data, and performed various queries to retrieve information efficiently.

The key takeaways from this exercise include:

Normalization: Breaking data into multiple tables reduces redundancy and improves consistency.
Relationships: Using foreign keys, we linked meals with their ingredients. This allows for more meaningful data retrieval.
Querying: We practiced SELECT, JOIN, WHERE, ORDER_BY, and GROUP_BY to manipulate and filter data results.
Management: We reviewed how to create a database, grant privileges to the database to a specific user, remove those privileges, and delete the database.

This tutorial is only a start. You should experiment with modifying the data, adding constraints, separating quantities from units, or creating additional tables to further your understanding.

Databases are powerful tools for organizing and retrieving structured information. If you understand SQL and its logic, it will help you in other domains, too! So go play!

And if interested, I encourage you to learn other SQL implementation, like SQLite, which you can download and install on your personal machines fairly easily. SQLite is probably the most popular SQL application, but it uses a slightly different command syntax.

Creating a Bare Bones OPAC

In this section, we're going to create a bare bones, very basic OPAC. The idea is simple: to acquire an intuition and understanding of how data from a relational database is retrieved and entered using LAMP technologies.

A real integrated library system (ILS) is much more complex than what we are doing here, but the fundamental ideas are the same: we enter data into a database, and we retrieve data from a database. In practice, a whole slew of other technologies are added to present the data in a user-friendly way: HTML, CSS, and JavaScript.

ILSes provide modules for patron management, acquisitions, circulation, cataloging, serials management, authorities, reporting, and sometimes more (see Koha: About). Those modules rely on some kind of underlying relational database, like MySQL (which is what Koha uses). And this results in a complex, interconnected set of tables. We are working with only one table in our database, the books table. In reality, an ILS will rely on dozens of tables.

In the prior section, we created a MySQL database called opacdb, which has one table, called books. We also created a PHP file to retrieve the data from the books table and present it on a web page.

In this section, we are going to use different PHP code that will allow us to search the books table. In this way, we more closely mimic an OPAC.

Creating the HTML Page and a PHP Search Page

The first thing we do is create a basic HTML page that contains a form for entering queries. We'll call this HTML page with the form mylibrary.html. When a user clicks on the submit button in the form, the form will activate a PHP script called search.php. That search.php will establish a connection to the OPAC database that we already have created. Our PHP script will contain a special MySQL query that will allow us to search all the fields in our books table. Then it will iterate through each row of the books table and return results that match our query. We also add two date fields to our form to limit results by publication dates, which we labeled as copyright in our MySQL books table.

HTML Form

Here is the HTML for our search page, titled mylibrary.html:

<!DOCTYPE html>
<html>
	<head>
		<meta charset="UTF-8">
		<title>MySQL Server Example</title>
	</head>
<body>

	<h1>A Basic OPAC</h1>

	<p>In the form below, <b>optionally</b> enter text in the search field.
	Your search query will search by author, title, or publisher.
	Capitalization is not necessary.
	It's okay to enter partial information, like part of an author's, title's, or publisher's name.</p>

	<p>You can leave the search field empty and only enter dates.
    Regardless, both start and end dates are required for all searches.
	You can use the date fields to limit results, too.
	I added some extra records, which you can view to know what you can query:</p>

	<p><a href="opac.php">OPAC</a></p>

	<p>This is very much a toy, stripped down
	<a href="https://en.wikipedia.org/wiki/Online_public_access_catalog">OPAC</a>.
	The records are basic.
	Not only do they not conform to <a href="https://www.loc.gov/marc/">MARC</a>,
	they don't even conform to something as simple as <a href="https://www.dublincore.org/">Dublin Core</a>.

	<p>I also don't provide options to select different fields, like author, title, or publisher fields.
	Instead the search field below searches all the fields (author, title, publisher) in our <b>books</b> table.</p>

	<p>The key idea is to get a sense of how an OPAC works, though.</p>

	<h2>My Basic Library OPAC</h2>

	<form method="post" action="search.php">
		<label for="search">Search Terms (optional):</label>
		<input type="text" name="search" id="search">
        
		<br>
        
		<label for="start_date">Start Date:</label>
		<input type="date" name="start_date" id="start_date" required>
        
		<br>
        
		<label for="end_date">End Date:</label>
		<input type="date" name="end_date" id="end_date" required>
        
		<br>
        
		<input type="submit" value="Search">
	</form>

</body>
</html>

PHP Search Script

Here is the PHP for our search script, which should be named search.php:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Search Results</title>
<style>
    table {
        border-collapse: collapse;
        width: 100%;
    }
    th, td {
        border: 1px solid black;
        padding: 8px;
        text-align: left;
    }
</style>
</head>
<body>

    <h1>Search Results</h1>

    <?php
    // Load MySQL credentials
    require_once '/var/www/login.php';

    // Enable MySQL error reporting
    mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);

    // Establish connection
    $conn = new mysqli($db_hostname, $db_username, $db_password, $db_database);
    if ($conn->connect_error) {
        die("Connection failed: " . $conn->connect_error);
    }

    if ($_SERVER["REQUEST_METHOD"] == "POST") {
        $search = trim($_POST['search']);
        $start_date = $_POST['start_date'];
        $end_date = $_POST['end_date'];

        // Prepared statement to prevent SQL injection
        $stmt = $conn->prepare("SELECT * FROM books 
                                WHERE (author LIKE ? OR title LIKE ? OR publisher LIKE ?) 
                                AND copyright BETWEEN ? AND ?");

        // Use wildcard search
        $search_param = "%$search%";
        $stmt->bind_param("sssss", $search_param, $search_param, $search_param, $start_date, $end_date);
        $stmt->execute();
        $result = $stmt->get_result();

        if ($result->num_rows > 0) {
            echo "<table>";
            echo "<tr><th>ID</th><th>Author</th><th>Title</th><th>Publisher</th><th>Copyright</th></tr>";

            while ($row = $result->fetch_assoc()) {
                echo "<tr>";
                echo "<td>" . htmlspecialchars($row["id"]) . "</td>";
                echo "<td>" . htmlspecialchars($row["author"]) . "</td>";
                echo "<td>" . htmlspecialchars($row["title"]) . "</td>";
                echo "<td>" . htmlspecialchars($row["publisher"]) . "</td>";
                echo "<td>" . htmlspecialchars($row["copyright"]) . "</td>";
                echo "</tr>";
            }

            echo "</table>";
        } else {
            echo "<p>No results found.</p>";
        }

        $stmt->close();
    }

    $conn->close();
    ?>

    <p><a href="mylibrary.html">Return to search page</a></p>

</body>
</html>

Modifications

Add more records, using MySQL, to your books table, and test your queries. To add records to your books table, recall that we used the insert into MySQL statements. Here's the example from the prior lesson. Use it to add titles that are of interest to you.

First connect to the MySQL server:

mysql -u opacuser -p

Then run the insert command with the data for the new records:

insert into books
(author, title, publisher, copyright) values
('Emma Donoghue', 'Room', 'Little, Brown \& Company', '2010'),
('Zadie Smith', 'White Teeth', 'Hamish Hamilton', '2000');

Conclusion

In this lesson, we created a very bare bones OPAC simply to express the fundamental idea of how data is stored and retrieved on the web. In reality, what separates an OPAC, or a discovery service in a modern integrated library system or library service platform, from other databases on the web is the structure of the records that are stored in the relational database. Such records are structured using MARC. Our records are very simply structured, but still, I hope this helps in creating an intuition about how OPACs and like function. In the next section, we will learn how to enter data into our catalog, thereby mimicking the cataloging module of an integrated library system.

Creating a Bare Bones Cataloging Module

If you have worked with an integrated library system (ILS) or a more modern library service platform (LSP), then you know that an OPAC or discovery system, respectively, is simply one module out of several that makeup an ILS or LSP. Other modules include acquisitions, authority files, circulation, course reserves, patron management, and more. In the prior section, we created one of those modules: a bare bones OPAC. In this section, we are going to create a bare bones cataloging module in the same kind of way.

Up until this point, you have added records to your OPAC using the MySQL command interface. But unless you are a full time database administrator or programmer, it's unlikely that you would add data to your system via that interface. Instead you would use an application via a fancy graphical user interface, i.e., integrated library system. The reason we started off with MySQL is not because you would necessarily use this interface on a daily basis. Rather, it's because I want you to understand the foundations of these technologies and the how they get translated for users when they become web applications.

Creating the HTML Page and a PHP Cataloging Page

Like in the last exercise, the first thing we do is create a basic HTML page that contains a form for entering our bibliographic data. Again, our cataloging module will not be real world like. The goal here is to build an intuition about how these technologies work and to provide some grounding if you do want to pursue a more technical path.

The form that we will create needs to mirror the data structure in the books table that we created in our prior lesson. That means it will only contain four fields:

author
title
publisher
copyright

I'll call this page index.html. I'll create a new directory for this module:

cd /var/www/html
sudo mkdir cataloging

Then I'll use a text editor to create the index.html file and add the content:

cd cataloging
sudo nano index.html

In index.html, we add the following content:

<!DOCTYPE html>
<html>
<head>
	<title>Enter Records</title>
</head>
<body>
	<h1>OPAC Library Administration</h1>

	<p>This is the library administration page for entering records into the OPAC.</p>
	<p>Please do not use this page unless you are an authorized cataloger.</p>

	<form action="insert.php" method="post">
		<label for="author">Author:</label>
		<input type="text" name="author" id="author" required><br><br>

		<label for="title">Book Title:</label>
		<input type="text" name="title" id="title" required><br><br>

		<label for="publisher">Publisher:</label>
		<input type="text" name="publisher" id="publisher" required><br><br>

		<label for="copyright">Copyright:</label>
		<input type="number" name="copyright" id="copyright" min="1000" max="2300" required>

		<input type="submit" value="Submit">
	</form>
</body>
</html>

PHP Insert Script

The index.html page will provide a user interface, that is, a form, for entering our bibliographic data. However, the PHP script is needed to communicate and add the data from our form into our MySQL database and books table.

Also, just as the HTML form has to match the data structure of the books table, the PHP script also needs to match the form from the HTML page and the data structure in the books table.

Here is the PHP script, which I call insert.php, which you'll notice was referenced in the HTML code above:

<!DOCTYPE html>
<html>
<head>
	<meta charset="UTF-8">
	<meta name="viewport" content="width=device-width, initial-scale=1.0">
	<title>Cataloging: Data Entry</title>
</head>
<body>

<h1>Cataloging: Data Entry</h1>

<?php

// Load MySQL credentials
require_once '/var/www/login.php';

// Enable MySQL error reporting
mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);

// Establish connection
$conn = new mysqli($db_hostname, $db_username, $db_password, $db_database);
if ($conn->connect_error) {
    die("Connection failed: " . $conn->connect_error);
}

// Prepare and bind SQL statement
$stmt = $conn->prepare("INSERT INTO books (author, title, publisher, copyright) VALUES (?, ?, ?, ?)");
$stmt->bind_param("ssss", $author, $title, $publisher, $copyright);

// Set parameters and execute statement
$author = $_POST["author"];
$title = $_POST["title"];
$publisher = $_POST["publisher"];
$copyright = $_POST["copyright"];

if ($stmt->execute() === TRUE) {
    echo "New record created successfully";
} else {
    echo "Error: " . $stmt->error;
}

// Close statement and connection
$stmt->close();
$conn->close();
?>

<p><a href='index.html'>Return to Cataloging Page</a></p>
<p><a href='../mylibrary.html'>Return to Library Home Page</a></p>
</body>
</html>

Security

Since our HTML and PHP files allow us to enter data into our MySQL database from a simple web interface, we need to limit access to the module. In a real-world situation, modules like these would have a variety of security measures in place to prevent wrongful data entry. In our case, we will rely on a simple authorization mechanism provided by the Apache2 server called htpasswd.

First, we create an authentication file in our /etc/apache2 directory, which is where the Apache2 web server stores its configuration files. The file will contain a hashed password and a username we give it. In the following command to set the password, I set the username to libcat, but it could be anything:

sudo htpasswd -c /etc/apache2/.htpasswd libcat

Next we need to tell the Apache2 web server that we will use the htpasswd to control access to our cataloging module. To do that, we use a text editor to open the apache2.conf file.

sudo nano /etc/apache2/apache2.conf

In the apache2.conf file, look for the code block / stanza below. We are interested in the third line in the stanza, which is line 172 for me, and probably is for you, too.

<Directory /var/www/>
  Options Indexes FollowSymLinks
  AllowOverride None
  Require all granted
</Directory>

Carefully, we need to change the word None to the word All:

<Directory /var/www/>
  Options Indexes FollowSymLinks
  AllowOverride All
  Require all granted
</Directory>

Next, change to the cataloging directory and use our text editor to create a file called .htaccess (note the leading period in the file name):

cd /var/www/html/cataloging
sudo nano .htaccess

Add the following content to .htaccess:

AuthType Basic
AuthName "Authorization Required"
AuthUserFile /etc/apache2/.htpasswd
Require valid-user

Check that the configuration file is okay:

apachectl configtest

If you get a Syntax OK message, then restart Apache2 and check its status:

sudo systemctl restart apache2
systemctl status apache2

Permissions and Ownership

The Apache2 web server has a user account on your Linux server. The account name is www-data, and it's account details are stored in the /etc/passwd file:

grep "www-data" /etc/passwd
www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin

From the output, we can see that the www-apache user's home directory is /var/www and its default shell is /usr/sbin/nologin. See man nologin for details, but in short, the nologin prevents the www-data account to be able to login to a shell.

You can compare the output of the above grep command with your account information that is stored in /etc/passwd. Use the following command: grep $USER /etc/passwd to do so. You'll see, for example, that your home directory is listed there as well as your default shell, which is bash.

The benefit with having Apache2 a user is that we can limit file permissions and ownership to this user.

The general guidelines for this are as follows:

Static files (like HTML, CSS, JS) might not need to be writable by the Apache server, so they could be owned by a different user (like your own user account) but be readable by www-data.
Directories where Apache needs to write data (like upload directories) or applications that need write access should be owned by www-data.
Configuration files (incl. files like login.php) should be readable by www-data but not writable, to prevent unauthorized modifications.

We can initiate this guidelines with the chown and chmod commands:

Change the group ownership of /var/www/html to www-data:
```
 sudo chown :www-data /var/www/html
```
Set the setgid bit on /var/www/html. This command makes it so that any new files and directories created within /var/www/html will inherit the group ownership of the parent directory (www-data, in this case). While this ensures that group ownership is inherited, the user ownership of new files will still be the user that creates the files. In our case, since we use sudo to work in this directory, that means that the user owner for subsequent files and directories will be the Linux root user.
```
 sudo chmod -R g+s /var/www/html
```

Get Cataloging!

Now visit your cataloging module. You should be required to enter the username and password that you created with htpasswd.

Conclusion

In the last lesson, we created a very bare bones OPAC that would allow patrons to search our catalog. In this lesson, we learned how to create a bare bones cataloging module that would allow librarians to add bibliographic data and records to the OPAC.

Now try this:

Add some records using the above form, and then return to your OPAC and conduct some queries to confirm that the new records have been added.
Use the MySQL command line interface to view the new records, just like we did a couple of lessons ago.

In a production level environment, we would add quite a bit more functionality and security. Our MySQL database would contain many more tables that allow storing data related to the modules listed above. We would also like to make our modules graphically attractive and provide more content. That would mean we would add Cascading Style Sheets (CSS) and JavaScript to create an attractive and usable interface. But that would be a whole other book.

Installing Content Management Systems

Many library websites are compositions of interconnected resources. For example, a library may have a front-facing website that provides basic information about the library, its physical locations, and its services. That front-facing website may be connected to an integrated library system (ILS) or library service platform (LSP) that is itself a different website. The front-facing website may also be connected to other sites that provide access to all sorts of databases. In the end, this means that a library website is not just one place. It is much more like a series of interconnected buildings, each of which has its own entry points.

In this section, we will learn how to build these interconnected resources. First, we learn how to use WordPress to setup a library's front-facing web presence. The basic process is similar to the process we used when building our bare bones OPAC. We will then use the instructions for the WordPress install to install and configure Omeka, which we might imagine is used to build a digital library for our library.

At the end of this section, we will have begun creating a library web presence that is more than a basic front-facing web presence. Instead, it will start the infrastructure for an ecosystem that provides access to all sorts of library resources.

Install WordPress

Introduction

WordPress is a free and open source content management system (CMS). Originally, its focus was on providing a platform for blogging, but it has become a general purpose CMS that can serve as a website builder.

Two main sites exist to provide access to WordPress: WordPress.com and WordPress.org. WordPress.com is a hosting solution, which means that customers can sign up and create a free WordPress site. Since its hosted, customers are only responsible for their content and not for managing the core WordPress installation and its updates. Various paid plans can extend the functionality offered to WordPress.com customers.

WordPress.org is maintained by the WordPress Foundation, which oversees the development of and provides access to the software. When we download the WordPress software, we download it from WordPress.org. Unlike the hosted solution, when we install and setup WordPress on our own servers, we become responsible for administrating its installation and for keeping the software updated.

WordPress is widely used software, and because of that, it's often the focus of attack. Take a moment to read about the developer's efforts to protect WordPress: Security. We will not need to update our WordPress installs during the course of this course, but you should be familiar with the update process in case you decide to maintain your install or an install at a future date: Updating WordPress.

Libraries and WordPress

Many libraries use WordPress as as their main website and a quick web search will reveal them. For example, I quickly found an example of a (beautiful) WordPress library site for the Reading Public Library (RPL) in Massachusetts. These library websites coordinate with additional solutions that provide integrated library systems and other electronic resource services. RPL, for instance, connects their WordPress installation, which serves as their main website page, with the open source Evergreen ILS, which serves their OPAC. Check this by clicking on RPL's Library Catalog link, and you will see that it takes you to a different URL.

Aside: it is this need to coordinate so many services across all these websites that in part drives the need to develop standards for data exchange and work flow processes. This topic is covered in my electronic resource management textbook.

Many library websites are partitioned like this. Thus, when we install WordPress soon, it is as if we are only installing the front entrance to the library. Libraries are generally like this. They have one main website (like https://libraries.uky.edu) but then connect to other sites that provide access to OPACS, discovery systems, eBook vendors, bibliographic databases, and more. This is part of the confusion around how libraries provide electronic resources. There are efforts to make all these components connect more seamlessly (e.g., through discovery systems), but if we were to model this to the walking around world, it would be like having a library that has multiple buildings, where each building provides one thing:

one building for books,
one building for journals,
another building for other journals,
another building for another set of journals,
another building for looking up where to find journals,
another building for special collections, and so on.

I digress.

You can read the announcement about RPL's WordPress launch at: Reading Public Library Launches New WordPress Site. The announcement describes how various plugins were used to offer patrons additional functionality and describes other basic changes that come with the new site. The plugins they added display business hours and help manage events and event attendees.

Plugins are often used with WordPress sites to offer all sorts of additional capabilities. Currently, there are nearly 60 thousand plugins available for WordPress, but some are of higher quality and utility than others. In addition to the thousands of available plugins, there are nearly 12 thousand free themes for WordPress sites. Plus, many businesses offer paid themes or can create customized themes based on customer needs. These themes can drastically alter the appearance and usability of a WordPress site or cater a site for a specific clientele, such as a library.

Installation

So far I have shown you how to install software using two methods:

using the apt command
downloading from GitHub

In this lesson, we are going to install WordPress by downloading the most recent version from WordPress.org and installing it manually. The WordPress application is available via the apt command, but the apt process makes it a bit more confusing than it should be, oddly.

We are going to kind of follow the documentation provided by WordPress.org. You should read through the documentation before following my instructions, but then follow the process I outline here instead because the documentation uses some different tools than we'll use.

Another reason we do this manually is because it builds on what we have learned by building our bare bones ILS. That is, the two processes are similar. In both cases, we create a specific database for our platform, we create a specific user for that database, and we provide login credentials in a specific file.

First, read through but don't follow the following instructions:

How to install WordPress

Customized Installation Process

After you have read through the WordPress.org documentation, follow the steps below to complete the manual install:

Step 1: Requirements

All major software has dependencies. For example, our bare bones OPAC depends on MySQL and PHP to provide the database (MySQL) and the glue (PHP) between our HTML and the database. The same is true for WordPress. However, since WordPress is much more complicated software than our bare bones OPAC, its dependencies are stricter. This means that when we plan to download software outside of the apt ecosystem, we need to make sure that our systems meet the requirements for our installation. The WordPress.org Requirements page states that the WordPress installation requires at least PHP version 7.4 and MySQL version 8.0 or greater. We can check that our systems meet these requirements with the following commands. To check our installed version of PHP:

php --version

To check our installed version of MySQL:

mysql --version

The output from php --version shows that my systems have PHP 8.1.2, which is greater than PHP 7.4. The output from mysql --version show that our systems have MySQL 8.0.41, which is greater than MySQL 8.0. This should be the same for you if you're running the Ubuntu 22.04.5 LTS Linux distribution. You can check that with the following command:

cat /etc/issue.net

Since the system meets the PHP and MySQL requirements, it means I can proceed.

As always, be sure to update your system with the apt commands: sudo apt update etc.

Next, we need to add some additional PHP modules to our system to let WordPress operate at full functionality. We can install these using the apt command:

sudo apt install php-curl php-xml php-imagick php-mbstring php-zip php-intl

Then restart Apache2 and MySQL:

sudo systemctl restart apache2
sudo systemctl restart mysql

Step 2: Download and Extract

The next step is to download and extract the WordPress software, which is downloaded as a zip file. Although we only download one file, when we extract it with the unzip command, the extraction will result in a new directory that contains multiple files and subdirectories. The general instructions include:

Change to the /var/www/html directory.
Download the latest version of WordPress using the wget program.
Extract the package using the unzip program.

Specifically, we do the following on the command line:

cd /var/www/html
sudo wget https://wordpress.org/latest.zip
sudo unzip latest.zip

As noted in the WordPress documentation, this will create a directory called wordpress in the same directory. Therefore the full path of your installation will located at /var/www/html/wordpress.

Step 3: Create the Database and a User

The WordPress documentation describes how to use phpMyAdmin to create the database and a user for WordPress. phpMyAdmin is a graphical front end to the MySQL relational database that you would access through the browser. We are not going to install that because I like to minimize the software that we install on servers to reduce the server's security exposure. Though you can use phpMyAdmin from a different machine and connect to the server, this is a command line class. Therefore, we are going to create the WordPress database and a database user using the same process we used to create a database and user for our bare bones ILS. You already know this, but the general instructions are:

Switch to the root Linux user
Login as the MySQL root user

Specifically, we do the following on the command line:

sudo su
mysql -u root

The mysql -u root command places us in the MySQL command prompt. The next general instructions are to:

Create a new user for the WordPress database
Be sure to replace the Xs with a strong password
Create a new database for WordPress
Grant all privileges to the new user for the new database
Examine the output
Exit the MySQL prompt

Specifically, this means the following (be sure to replaces the Xs with a unique and strong password of your own):

create user 'wordpress'@'localhost' identified by 'XXXXXXXXX';
create database wordpress;
grant all privileges on wordpress.* to 'wordpress'@'localhost';
show databases;
\q

Step 4: Set up `wp-config.php`

When we created our bare bones ILS, we created a file called login.php that contained the name of the database (e.g., opacdb), the name of the database user (e.g., opacuser), and the user's password. WordPress follows a similar process, but instead of login.php, it uses a file called wp-config.php.

Follow these general steps:

Change to the wordpress directory, if you haven't already.
Copy and rename the wp-config-sample.php file to wp-config.php.
Edit the file and add your WordPress database name, user name, and password in the fields for:
- DB_NAME,
- DB_USER, and
- DB_PASSWORD.

This means that we specifically do the following:

cd /var/www/html/wordpress
sudo cp wp-config-sample.php wp-config.php
sudo nano wp-config.php

Using nano, add your database name, user, and password in the appropriate fields, just like we did with our login.php file for our bare bones OPAC.

Additionally, we want to disable FTP uploads to the site for security reasons. To do that, navigate to the end of the file and add the following line:

define('FS_METHOD','direct');

Step 5: Optional

The WordPress files are now installed at /var/www/html/wordpress. This means that your site would be located at a URL like:

http://11.111.111.11/wordpress

If you want to, you can rename your wordpress directory to something else. The WordPress documentation uses blog as an example. But it could be something else, like the name of a fictional library that you might be using WordPress for to build a site. If you decide to change it, be sure to keep the name lowercase and one word (no spaces and only alphabetic characters). For example, if I want to change mine to library, then:

sudo mv /var/www/html/wordpress /var/www/html/library

Step 6: Change File Ownership

WordPress will need to write to files in the base directory. Assuming you are still in your base directory, run the following command, which assumes that my directory is still named /var/www/html/wordpress:

sudo chown -R www-data:www-data /var/www/html/wordpress

Step 7: Run the Install Script

The next part of the process takes place in the browser. The location (URL) that you visit in the browser depends on your specific IP address and also includes the name of the directory in /var/www/html that we extracted WordPress to or that you renamed if you followed Step 5. Thus, if my IP address is 11.111.111.11 and I renamed my directory to library, then I need to visit the following URL:

http://11.111.111.11/library/

Or if that doesn't work (though it should), try:

http://11.111.111.11/library/wp-admin/install.php

IF I kept the directory named wordpress, then this is the URL that I use:

http://11.111.111.11/wordpress/

If you changed the name of your wordpress directory, be sure to substitute that name for wordpress in the URL.

Finishing Installation

From this point forward, the steps to complete the installation are exactly the steps you follow using WordPress's documentation.

Most importantly, you should see a Welcome screen where you enter your site's information. The site Username and Password should not be the same as the username and password you used to create your WordPress database in MySQL. Rather, the username and password you enter here are for WordPress website users; i.e., those who will add content and manage the website.

Two things to note:

We have not setup Email on our servers. It's quite complicated to setup an email server correctly and securely, but it wouldn't work well without having a domain name setup anyway. So know that you probably should enter an email when setting up the user account, but it won't work.

Second, when visiting your site, your browser may throw an error. Make sure that the URL is set to http and that it's not trying to access https. Setting up an https site also generally requires a domain name, but we are not doing that here. So if there are any problems accessing your site in the browser, be sure to check that the URL starts off with http.

Conclusion

Congrats on setting up your WordPress library site. It's now time to explore and build a website. Use free themes and free plugins to alter the look of the site, its usability, and its functionality. Try to create a nice looking website. Generally, your goal for the next week is to create an attractive, yet fictional, front entrance for a library website. It's also a break from the command line!

Install Omeka

Omeka is an Open-source web publishing platforms for sharing digital collections and creating media-rich online exhibits. Most if not all of you have already used Omeka in a prior course. Here our task is not to practice information/knowledge organization, but to learn how to administer the Omeka digital library platform.

The Task

So far we have created a:

bare bones OPAC/ILS, and
downloaded, installed, and configured WordPress on our servers.

We will use the same basic process to download, install, and configure Omeka.

Instead of providing comprehensive instructions, your goal is to take what you learned from the bare bones OPAC/ILS and WordPress assignments, and apply them to the Omeka installation and setup. Below are some additional prerequisites that you should complete first. After you've completed them, move on to the General Steps section to remind yourself of the overall process.

You can do it!

Prerequisites

When we installed WordPress, we installed most of the prerequisites that Omeka needs, but there are a couple of additional things we need to do.

Some prerequisites:

Make sure your system is fully updated first: sudo apt update etc.
Check that you installed versions of PHP and MySQL meet Omeka's system requirements.
Install ImageMagick: this is a suite of utilities to work with photo files. Omeka uses ImageMagick to create thumbnail images of photos uploaded to the digital library. Visit the ImageMagick link above for more information.

sudo apt install imagemagick

Enable Apache mod_rewrite. This is an Apache module used to rewrite URLs. Omeka uses this to create user friendly URLs for items and collections in its digital libraries.

sudo a2enmod rewrite

You should be instructed to restart Apache after enabling mod_rewrite:

sudo systemctl restart apache2

General Steps

Below is a list of the general steps you need to use to install Omeka. Generally, you have already completed these steps when you created a bare bones ILS and installed WordPress. Your task is to apply what you've learned when doing those prior assignments by completing an Omeka installation on your own.

Note: let me emphasize that the process is very similar to what we have already done with our bare bones ILS and our WordPress installations. Use this handbook to remind you of the specific commands.

In short, you are going to complete the following steps:

Create a new user and a new database in MySQL for the Omeka installation (do not re-use the WordPress database, user, etc credentials or names of databases or tables).
Use wget from your server to download Omeka Classic as a Zip file and extract it in /var/www/html:
- https://github.com/omeka/Omeka/releases/download/v3.1.2/omeka-3.1.2.zip
- Unzip it with the unzip command, which you might have to install with the apt command.
- The extracted directory will be named omeka-3.1.2.
- You want to rename it simply omeka or something else of your choosing (like digital_library). Remember that names of files and directories should not have spaces in them.
In the extracted directory, find the db.ini file and add your new database credentials. Replace all values containing XXXXXX with the appropriate information. This is the same thing we did with the login.php file for our bare bones ILS and the wp-config.php file for WordPress.
Use the chown command like we did with WordPress on the files directory in the omeka directory. However, the user AND owner should be owned by www-data. NOTE: This is necessary!!!
Restart Apache2 and MySQL.
In your web browser, go to http://your-ip-address/omeka/ and complete the setup via the web form, just like you did with WordPress.

Helpful Links

Note: The user manual below is helpful, but it does not provide explicit instructions.

Be sure to download Omeka Classic and not Omeka S.

Omeka: https://omeka.org/
Omeka Classic: https://omeka.org/classic/
Omeka Classic User Manual: https://omeka.org/classic/docs/

Conclusion

The purpose of this exercise is to apply what you've learned in creating, installing, and setting up both the bare bones ILS and WordPress on your systems. The same basic logic is used in all these processes, even if the specifics vary. But all in all: Have fun. Go slow. Read the documentation. Pay attention to the details. Use this textbook to search for how we did things in prior installations.

Install the Koha ILS

Introduction

In the prior sections, we built a WordPress site that functions as our library's front-facing presence. Then we built an Omeka site that could serve as our library's digital library. In this section, we complete our library web infrastructure by installing the Koha ILS.

Koha is a free and open source library system that provides modules for patron accounts, circulation, cataloging, serials, an OPAC, and more. The process of installing and using Koha is more complicated than the processes we used to install and use WordPress and Omeka. This is because Koha, like other ILS software, is a complex project that must provide a lot of different functionality for a library and its patrons. Fortunately, the documentation makes the process pretty straightforward. We will rely on that documentation and other resources to install Koha and complete our library's interconnected web presence.

Koha ILS

Koha is an open source library management system, otherwise called an integrated library system (ILS). These systems provide modules that perform specific ranges of functionality. Koha's modules include:

Administration
Patron management
Cash management
Circulation
Cataloging
Course reserves
Serials
Acquisitions
Reports
OPAC

According to Library Technology Guides (April 2025), Koha has been installed in 4,484 libraries [around the world], spanning 6,273 facilities or branches. Most installations are in medium sized or small libraries. Koha is well represented in academic libraries, but the majority of installations are in public libraries.

Although Koha is an open source ILS and free to download, install, and administer without external support, librarians can hire companies that support open source library management solutions, like ByWater Solutions or the Equinox Open Library Initiative These companies support ILS migration, hosting, training, and more. They also provide support for other library software services, such as open source discovery systems and electronic resource management systems.

In addition to Koha, Evergreen is an open source integrated library system. According to Library Technology Guides, Evergreen is primarily installed at small and medium size public libraries, and most installations are in the U.S. and Canada.

There is currently a migration to what has been called library service platforms (LSP) in recent years. The LSP is a next generation ILS designed from the start to integrate electronic resources. For example, the ILS has an OPAC that was designed to search a library's print collections. Modern OPACs have been adapted for electronic resources, but they are still limited because of the older design model. LSPs use a discovery service instead of an OPAC. Discovery services are designed to search a library's entire collection, including the content in third party databases and journals. Example LSPs include Ex Libris Primo (used by UK Libraries), OCLC's WorldCat Discovery Service, and open source solutions like Aspen Discovery and VuFind.

The integration of library systems like the ILS and the LSP is a major aspect of library services. When we visit a library's website, we first interact with a normal website that might be built on WordPress, Drupal, or some other content management system. These websites will link to the public facing components of an ILS or LSP, as well as other services, such as bibliographic databases, journal publishers, ebook services, and more. It may therefore be the systems librarians job to help build and connect these services. In this demo, we will continue that work by installing, configuring, and setting up the Koha ILS.

Google Cloud Setup

Before we begin to install Koha, we need to create a new virtual machine instance and configure the Google firewall to allow HTTP traffic to our Koha install.

New Virtual Instance

The virtual instances we have been using do not meet the memory (RAM) needs required by the Koha integrated library system. We therefore need to create a new virtual instance that has more RAM. I will also use a bigger disk, to be sure, since Koha takes up more disk space. Check Koha System Requirements for details.

As a refresher for creating a VM, see the section titled gcloud VM Instance at Using gcloud Virtual Machines. In this lesson, we will use for the Series an E2 and set the Machine Type to 2 vCPU, 4 GB memory, and up the disk size to 20GB. Under Networking, click on Allow HTTP traffic. In the Network tags box, add the following tag name: koha-8080.

All else, including the operating system (Ubuntu 22.04), should remain the same. Note that this is a more expensive setup. Therefore, feel free to delete this instance at the end of the semester to avoid incurring extra costs.

Google Cloud firewall

Later, after we install Koha, we will need to access the staff interface on a special port for HTTP data. All internet traffic to a server contains metadata that identifies itself by port numbers. The default port for HTTP is 80, and the default port for HTTPS (encrypted) is 443. Since we do not have encryption enabled, this means we will only use port 80, but the staff interface will be identified by port 8080.

How does a server know where to send internet traffic? Internet data is packaged in many forms. One of the most common forms are TCP packets. These packets contain header information that names the source IP, destination IP, source port, and destination port. When TCP packets arrive at a destination server,the operating system inspects the packet header for the port number. The OS looks up the port number in a table that contains a mapping of ports to applications. When the OS makes the match, it sends the TCP packets to the application. In its default setup, the Apache2 web server handles traffic on port 80.

Firewalls are used to control incoming and outgoing traffic via ports. We selected Allow HTTP traffic when we created our virtual instance on Google Cloud, and we instructed the Google Console firewall to allow traffic through port 80. We need to add a firewall rule to allow web traffic through port 8080. We will use port 8080 to access the Koha staff interface.

Please take a moment to read more about ports: What is a computer port? | Ports in networking.

To create a firewall rule to allow traffic to port 8080, go to the Google Cloud Console:

Click on the hamburger icon.
Click on VPN Network.
Click on Firewall.
At the top of the page, choose Create a firewall rule (Do not choose Create a firewall policy):
- Add name: koha-opac
- Add description: Open port 8080 for the OPAC
Next to Targets, click on Specified target tags.
In Target tags, add our tag name: koha-8080.
In the Source IPv4 ranges, add 0.0.0.0/0
Click on Specified protocols and ports
- Click on TCP
- Add 8080 in the Ports box
Click on Create

Install Koha Repo

Server setup

Now let's log onto our new server and prepare it for the Koha installation.

First we need to update our local repositories:

sudo apt update

And then upgrade our servers:

sudo apt upgrade

The next two commands help save disk space. As a reminder, the apt autoremove command is used to remove packages that were automatically installed to satisfy dependencies for other packages and are now no longer needed as dependencies changed or the package(s) needing them were removed in the meantime (see man apt). The apt clean command clears out the local repository of retrieved package files (see man apt-get). In the following example, I combine both commands on one line:

sudo apt autoremove -y && sudo apt clean

Next we need to install gnupg2 and apt-transport-https. gnupg2 is used to create digital signatures, encrypt data, and aid in secure communication.

sudo apt install gnupg2 apt-transport-https

At the time of this demo, the update above downloaded a new Linux kernel. Using the new kernel requires a reboot. The reboot command will disconnect you from the server. If you need to reboot your server, use the command below, then wait a minute or so and re-connect.

sudo reboot now

Add Koha Repository

When you run the sudo apt update command, Ubuntu syncs the local repository database with several remote repositories. These remote repositories contain metadata about the packages they contain. The syncing process identifies if any new software updates are available. The remote repositories are also used to retrieve software.

We can add repositories to sync with and to use to download software, and this includes the Koha ILS. To add the special Koha repository to our system, we use the following command:

Most of the following commands require administrator access. Therefore, I will login as the root user to make it a bit easier. If you do not want to log in as the root user, be sure to use the sudo command.

sudo su

Add the Koha repository to our server:

echo 'deb http://debian.koha-community.org/koha stable main' | sudo tee /etc/apt/sources.list.d/koha.list

Now, download and install the GPG key in the trusted.gpg.d directory:

wget -qO- https://debian.koha-community.org/koha/gpg.asc | gpg --dearmor | sudo tee /etc/apt/trusted.gpg.d/koha.gpg > /dev/null

You can inspect the key and make sure it was signed by the relevant people:

gpg --show-keys /etc/apt/trusted.gpg.d/koha.gpg

The output should include an email from the koha-community.org.

Install Koha

Next we need to update/sync the new repository with the Koha remote repository. This just means that we use apt update again.

apt update

Now we view the package information for Koha:

apt show koha-common

And install it:

apt install koha-common

The above command will download and install a lot of additional software, and therefore the process will take several minutes.

Configure Koha

Next we need to edit some configuration files for Koha. First, create a backup of the default configuration file:

cd /etc/koha/
cp koha-sites.conf koha-sites.conf.backup

Now open the configuration file in nano or your preferred text editor:

nano /etc/koha/koha-sites.conf

In the koha-sites.conf file, change the line that contains the following information:

INTRAPORT="80"

To:

INTRAPORT="8080"

Next install and setup mysql-server:

apt install mysql-server

Next we set the root MySQL password. Replace Xs with your password:

mysqladmin -u root password XXXXXXXX

When we installed Koha, the Apache2 web server was installed with it as a prerequisite. We need to enable URL rewriting and CGI functionality.

a2enmod rewrite
a2enmod cgi

Now we need to restart Apache2 in the normal way:

systemctl restart apache2

Next we create a database for Koha:

koha-create --create-db bibliolib

We need to tell Apache2 to listen on port 8080:

nano /etc/apache2/ports.conf

And under the Listen 80 line, add:

Listen 8080

Make sure Apache configuration changes are valid:

apachectl configtest

If you get an error message, trace the error in the file and line listed.

Let's restart Apache2.

systemctl restart apache2

We'll disable the default Apache2 setup, enable traffic compression using deflate, enable the bibliolib site, and then reload Apache2's configurations and restart again:

a2dissite 000-default
a2enmod deflate
a2ensite bibliolib
systemctl reload apache2
systemctl restart apache2

Koha Web Installer

All the back end work is complete, and like we did with WordPress and Omeka, we can complete the installation through a web installer.

First, get Koha username and password in the following file:

nano /etc/koha/sites/bibliolib/koha-conf.xml

Look for the <config> stanza (line number 252) and the line beginning with <user> (line number 257). The password is on the line after (line number 258). You will need this info to login to the Koha web interface.

Make sure your URL begins with http and not https, and visit the web installer at:

http://IP-ADDRESS:8080

The documentation for the web installer is helpful. Enter the username and password from the koha-conf.xml file. Note that it might take a while to step through the installer.

One thing to do is to add sample libraries and sample patrons during the install. More generally, be sure to follow instructions as you click through each step. Add lots of samples to play around with after the install completes.

When you are on the last page of the install, create an Administrator identity, and be sure to save this information.

Introduction to the Koha installation process

Public OPAC

When the install and setup are complete, you can login with your admin credentials, and you will have access to the staff interface. To view the public facing OPAC, you need to make a setting change. In Koha:

Click on More in the top drop down box
Click on Administration
Click on System Preferences
Click on OPAC in the left hand side bar
Scroll down to the OPACBaseURL line.
Enter the IP address of your server: http://IP-ADDRESS
Click on Save all OPAC Preferences

Once you save these preferences, you should be able to visit your public facing OPAC at the server IP address.

Additional Tasks

Once you've installed and setup Koha, begin to learn the system. Try the following:

Create patron accounts
Create bibliographic records
Check out books to patrons
Delete patron circulation history

Conclusion

In this final section, you learned how to install and setup a Koha ILS installation on a Linux server. Your next step is to use your WordPress install to finalize your public facing library website. This website should include links to your Omeka-based digital library and to your Koha-based OPAC.

Congratulations!

References

Helpful documentation and demos:

Conclusion

Hopefully this textbook helped you build a strong foundation in server-side systems administration and web development for libraries. The skills covered here are applicable to all library types, including academic, public, school, special, and more.

To summarize, we began by learning how to navigate the Google Cloud console and use it to set up an Ubuntu Linux server. From there, we developed fluency with the Linux command line and explored tools like command line text editors, search utilities like grep, and package managers for updating, installing, and managing software.

We gained experience using Git and GitHub for version control and documentation. Then we moved into hands-on projects by building a basic LAMP server and creating a bare bones library system. This led us to develop more complicated library systems using three major platforms—WordPress, Omeka, and Koha—and connecting them to form a unified web presence that mirrors library practice.

As you wrap up your work, don't forget to stop and delete all of your virtual machines in your Google Cloud project to avoid ongoing billing. But I hope you continue to explore how to use virtual machines on Google Cloud or other cloud hosting solutions for library purposes.

&mdashSean Burns