Linux Systems Administration

Author: C. Sean Burns
Date: 2024-08-23
Email: sean.burns@uky.edu
Website: cseanburns.net
GitHub: @cseanburns

Introduction

This book was written for my Linux Systems Administration course. The book and course's goals are to provide students with the skills to use Linux for systems administration, and to teach students:

how to use the command line in order to become more efficient computer users and more comfortable with using computers in general;
how to use command line utilities and programs and to learn what can be accomplished using those programs;
how to administer users and manage software on a Linux server;
how to secure a Linux server; and
the basics of cloud computing;

And finally, this book/course ends on walking students through the process of building a LAMP stack.

All sections of this book will be accompanied by a video demonstrating the practices described in those sections. Your are highly encouraged to read through the text first and then watch the video. Revisit the text to help cement the ideas in place and to work through tasks.

Markup

There are two markup styles that I want to bring to your attention:

Code Blocks

Text that looks like code blocks indicate some kind of command or series of commands. Do not simply copy and paste these commands into your terminal. You can cause errors if you copy and paste multiple lines of code into your terminal. Rather, you should type these commands out. For example, you might copy and paste the following command and be okay in doing so:

ls ~

But copying and pasting multiple lines can cause problems. Here's an example of a series of commands in a code block that can cause problems if you copy and paste them into your terminal:

cd /important/directory
rm -rf *
echo "All files in /important/directory have been deleted."

In the above example, a mistake in copying and/or pasting the cd /important/directory command will result in the deletion of other directories and their files. It's therefore important to understand before executing some code. Typing out the code and seeing the results printed to the terminal will help foster that understanding.

Asides

I occasionally insert asides into the text. These asides generally contain notes or extra comments about the main content. Asides look like this:

This is an aside. Asides will contain extra information, notes, or comments.

Theme

At the top of the page is an icon of a paint brush. The default theme is darker text on a light background, but you can change the theme per your preferences.

Search

Next to the paintbrush is an icon of a magnifying glass. Use this to search this work.

Printing

I intend this book to be a live document, and therefore it'll be regularly updated. But feel free to print it, if you want. You can use the print function to save the entire book as a PDF file. See the printer icon at the top right of the screen to get started.

About This Book

This book works as a live document since I use it for my fall semester Linux Systems Administration course. I will update the content as I teach it in order to address changes in the technology and to edit for clarity.

This book is not a comprehensive introduction to the Linux operating system nor to systems administration. It is designed for an entry level course on these topics. It is focused on a select and small range of those topics that have the specific pedagogical aims described above.

The content in this book is open access and licensed under the GNU GPL v3.0. Feel free to fork this work on GitHub and modify it for your own needs.

I use mdBook to build markdown source files into this final output.

History of Unix and Linux

An outline of the history of Unix and Linux.

Location: Bell Labs, part of AT&T (New Jersey), late 1960s through early 1970s

In the late 1960s through the early 1970s at Bell Labs, part of AT&T in New Jersey, the journey began with an operating system called Multics. Multics was a pioneering time-sharing system, allowing more than one person to use it at once. Despite its innovative approach, Multics was fraught with issues and was slowly abandoned. In the midst of this abandonment, Ken Thompson stumbled upon an old PDP-7 and started writing what would become UNIX. During this time, he created the ed line editor, pronounced e.d., but generally sounded out. This specific version of UNIX would later be known as Research Unix. The project caught the attention of Dennis Ritchie, the creator of the C programming language, who joined Thompson's efforts, and together they laid the groundwork for a revolution in computing.

Location: Berkeley, CA (University of California, Berkeley), early to mid 1970s

In the early to mid-1970s at the University of California, Berkeley, the evolution of UNIX continued. While not classified as 'free software,' UNIX's code was low-cost and easily shared among tech enthusiasts. Ken Thompson visited Berkeley, where he helped install Version 6 of UNIX, marking a significant moment in the system's history. At Berkeley, several contributors, including Bill Joy, played vital roles in its development. Joy was particularly influential, creating the vi text editor, a descendant of the popular Vim editor, and many other essential programs. He also co-founded Sun Microsystems. This installation and collaborative effort at Berkeley eventually led to the creation of the Berkeley Software Distribution, or BSD, a landmark in the history of UNIX and computing as a whole.

AT&T

Until its breakup in 1984, AT&T operated under a unique agreement with the U.S. government that restricted the company from profiting off patents not directly related to its telecommunications businesses. This arrangement helped shield AT&T from monopolistic charges, but it also came with a significant limitation: they could not commercialize UNIX. The landscape changed dramatically after the breakup of AT&T. The constraints lifted, allowing System V UNIX to emerge as the standard bearer of commercial UNIX. This transition marked a turning point in the history of computing, positioning UNIX as a central player in the commercial technology market.

Location: Boston, MA (MIT), early 1980s through early 1990s

In Boston, MA, at MIT during the early 1980s through the early 1990s, a significant shift in the software industry was taking place. In the late 1970s, Richard Stallman observed the growing trend of software becoming commercialized. This commercialization led to hardware vendors ceasing to share the code they developed to make their hardware work. This paradigm change was further solidified by the Copyright Act of 1976, making software code eligible for copyright protection. Stallman, who thrived in a hacker culture, began to battle against this new direction. He responded by creating the GNU project, embracing the free software philosophy, and developing influential tools such as GNU Emacs, a popular text editor, and many other programs. The GNU project was an ambitious attempt to create a completely free software operating system that was Unix-like, called GNU. By the early 1990s, Stallman and others had developed all the utilities needed for a full operating system, except for a kernel, which they named GNU Hurd. This encompassing project included the creation of the Bash shell, written by Brian Fox, reflecting a profound commitment to free and open software.

The GNU philosophy includes several propositions that define free software:

The four freedoms, per GNU Project: 0. The freedom to run the program as you wish, for any purpose (freedom 0).

The freedom to study how the program works, and change it so it does your computing as you wish (freedom 1). Access to the source code is a precondition for this.

The freedom to redistribute copies so you can help others (freedom 2).

The freedom to distribute copies of your modified versions to others (freedom 3). By doing this you can give the whole community a chance to benefit from your changes. Access to the source code is a precondition for this.

The Four Freedoms

The Unix wars and the lawsuit, late 1980s through the early 1990s

During the late 1980s through the early 1990s, the so-called "Unix wars" and an ensuing lawsuit marked a contentious period in the history of computing. Following its breakup, AT&T began to commercialize Unix, leading to distinct differences between AT&T Unix and BSD Unix. The former was aimed at commercial markets, while the latter was targeted at researchers and academics. These contrasting objectives led to legal friction, culminating in UNIX Systems Laboratories, Inc. (USL, part of AT&T) suing Berkeley Software Design, Inc. (BSDi, part of the University of California, Berkeley) for copyright and trademark violations. Ultimately, USL lost the case, but not before the lawsuit had created significant obstacles for BSD Unix. The legal battle delayed the adoption of BSD Unix, leaving a lasting impact on the development and dissemination of Unix systems.

Linux, Linus Torvalds, University of Helsinki, Finland, early 1990s

In the early 1990s at the University of Helsinki in Finland, a significant development in the world of computing unfolded. On August 25, 1991, Linus Torvalds announced that he had started working on a free operating system kernel specifically for the 386 CPU architecture and his hardware. This kernel would later be famously named Linux. It's essential to understand that Linux technically refers only to the kernel, which handles startup, devices, memory, resources, and more, but does not provide user land utilities—the kind of software that people use on their computers.

Torvalds' motivation for this project was both to learn about OS development and to have access to a Unix-like system. He already had access to an Unix-like system called MINIX, but MINIX was limited by technical and copyright restrictions. Interestingly, Torvalds has stated that if a BSD or GNU Hurd operating system were available at that time, he might not have created the Linux kernel at all. However, he and others took the GNU utilities and created what is now widely referred to as Linux or GNU/Linux. This amalgamation of Torvalds' kernel and GNU utilities marked a critical point in the evolution of free and open-source software, fostering a global community of developers and users.

Distributions, early 1990s through today

Soon after the development of Linux in the early 1990s, a trend began to emerge that continues to this day. Enthusiasts and developers started creating their own Linux and GNU-based operating systems, customizing them to suit various needs and preferences. They would then distribute these customized versions to others, sharing their innovations and insights with a wider community. As a result of this practice, these Linux operating systems became known as "distributions." This phenomenon has led to a rich ecosystem of Linux distributions, catering to different user bases, industries, and interests, and has played a central role in the continued growth and diversification of open-source computing.

The two oldest distributions that are still in active development include:

Short History of BSD, 1970s through today

The history of Berkeley Software Distribution (BSD) spans from the 1970s to today and is closely intertwined with the evolution of Unix. Early Unix version numbers 1-6 eventually led to the development of BSD versions 1-4. By the time of BSD 4.3, all versions still contained some AT&T code. A desire to remove this proprietary code led to the creation of BSD Net/1.

The effort continued until all AT&T code was successfully removed by BSD Net/2. This version was then ported to the Intel 386 processor, resulting in 386BSD, made available in 1992, a year after the Linux kernel was released.

386BSD eventually split into two distinct projects: NetBSD and FreeBSD. Later, NetBSD itself split into another project, giving rise to OpenBSD. All three of these BSDs are still in active development today, and each has a unique focus:

NetBSD is known for its focus on portability, finding applications in various environments such as MacOS and even NASA projects.
FreeBSD is recognized for its wide applicability and has been utilized by notable companies and products like WhatsApp, Netflix, PlayStation 4, and MacOS.
OpenBSD emphasizes security and has contributed several essential applications in this domain.

This intricate journey of BSD, marked by splits, adaptations, and varied focuses, has cemented its place in the history of operating systems, allowing it to cater to a wide range of applications and audiences.

MacOS is based on Darwin, is technically UNIX, and is partly based on FreeBSD with some code coming from the other BSDs. See Why is macOS often referred to as 'Darwin'? for a short history.

Short History of GNU, 1980s through today

The history of GNU, particularly the GNU Hurd kernel, traces back to the 1980s and continues to evolve today. The GNU Hurd, despite its long development process, remains in a pre-production state. The latest release of this kernel was version 0.9, which came out in December 2016. Even though it has not yet reached full maturity, a complete operating system based on the GNU Hurd can be downloaded and run. For example, Debian GNU/Hurd represents one such implementation. This ongoing work on the GNU Hurd exemplifies the free and open-source community's commitment to innovation and collaboration, maintaining a spirit of exploration that has driven the software landscape for decades.

Free and Open Source Licenses

In the free software and open source landscape, there are several important free and/or open source licenses that are used. The two biggest software licenses are based on the software used by GNU/Linux and the software based on the BSDs. They each take very different approaches to free and/or open source software. The biggest difference is this:

Software based on software licensed under the GPL must also be licensed under the GPL. This is referred to as copyleft software, and the idea is to propagate free software.
- GNU General Public License (GPL)
Software based on software licensed under the BSD license may be closed source and primarily must only attribute the original source code and author.
- BSD License

What is Linux?

The Linux Kernel

Technically, Linux is a kernel, and a kernel is a part of an operating system that oversees CPU activity like multitasking, as well as networking, memory management, device management, file systems, and more. The kernel alone does not make an operating system. It needs user land applications and programs, the kind we use on a daily basis, to form a whole, as well as ways for these user land utilities to interact with the kernel.

Linux and GNU

The earliest versions of the Linux kernel were combined with tools, utilities, and programs from the GNU project to form a complete operating system, without necessarily a graphical user interface. This association continues to this day. Additional non-GNU, but free and open source programs under different licenses, have been added to form a more functional and user friendly system. However, since the Linux kernel needs user land applications to form an operating system, and since user land applications from GNU cannot work without a kernel, some argue that the operating system should be called GNU/Linux and not just Linux. This has not gained wide acceptance, though. Regardless, credit is due to both camps for their contribution, as well as many others who have made substantial contributions to the operating system.

Linux Uses

We are using Linux as a server in this course, which means we will use Linux to provide various services. Our first focus is to learn to use Linux itself, but by the end of the course, we will also learn how to provide web and database services. Linux can be used to provide other services that we won't cover in this course, such as:

file servers
mail servers
print servers
game servers
computing servers

Although it's a small overall percentage, many people use Linux as their main desktop/laptop operating system. I belong in this camp. Linux has been my main OS since the early 2000s. While our work on the Linux server means that we will almost entirely work on the command line, this does not mean that my Linux desktop environment is all command line. In fact, there are many graphical user environments, often called desktop environments, available to Linux users. Since I'm currently using the Ubuntu Desktop distribution, my default desktop environment is called Gnome. KDE is another popular desktop environment, but there are many other attractive and useful ones. And it's easy to install and switch between multiple ones on the same OS.

Linux has become quite a pervasive operating system. Linux powers the hundreds of the fastest supercomputers in the world. It, or other Unix-like operating systems, are the foundation of most web servers. The Linux kernel also forms the basis of the Android operating system and of Chrome OS. The only place where Linux does not dominate is in the desktop/laptop space.

What is Systems Administration?

Introduction

What is systems administration or who is a systems administrator (or sysadmin)? Let's start off with some definitions provided by the National Institute of Standards and Technology:

An individual, group, or organization responsible for setting up and maintaining a system or specific system elements, implements approved secure baseline configurations, incorporates secure configuration settings for IT products, and conducts/assists with configuration monitoring activities as needed.

Or:

Individual or group responsible for overseeing the day-to-day operability of a computer system or network. This position normally carries special privileges including access to the protection state and software of a system.

See: Systems Administrator @NIST

Specialized Positions

In addition to the above definitions, which broadly define the role, there are a number of related or specialized positions. We'll touch on the first three in this course:

Web server administrator:
- "web server administrators are system architects responsible for the overall design, implementation, and maintenance of Web servers. They may or may not be responsible for Web content, which is traditionally the responsibility of the Webmaster (Web Server Administrator" @NIST).
Database administrator:
- like web admins, and to paraphrase above, database administrators are system architects responsible for the overall design, implementation, and maintenance of database management systems.
Network administrator:
- "a person who manages a network within an organization. Responsibilities include network security, installing new applications, distributing software upgrades, monitoring daily activity, enforcing licensing agreements, developing a storage management program, and providing for routine backups" (Network Administrator @NIST).
Mail server administrator:
- "mail server administrators are system architects responsible for the overall design and implementation of mail servers" (Mail Server Administrators @NIST).

Depending on where a system administrator works, they may specialize in any of the above administrative areas, or if they work for a small organization, all of the above duties may be rolled into one position. Some of the positions have evolved quite a bit over the last couple of decades. For example, it wasn't too long ago when organizations would operate their own mail servers, but this has largely been outsourced to third-party providers, such as Google (via Gmail) and Microsoft (via Outlook). People are still needed to work with these third-party email providers, but the nature of the work is different than operating independent mail servers.

Certifications

It's not always necessary to get certified as a systems administrator to get work as one, but there might be cases where it is necessary; for example, in government positions or in large corporations. It also might be the case that you can get work as an entry level systems administrator and then pursue certification with the support of your organization.

Some common starting certifications are:

Plus, Google offers, via Coursera, a beginners Google IT Support Professional Certificate that may be helpful.

Associations

Getting involved in associations and related organizations is a great way to learn and to connect with others in the field. Here are few ways to connect.

LOPSA, or The League of Professional System Administrators, is a non-profit association that seeks to advance the field and membership is free for students.

ACM, or the Association for Computing Machinery, has a number of relevant special interest groups (SIGs) that might be beneficial to systems administrators.

NPA, or the Network Professional Association, is an organization that "supports IT/Network professionals."

Codes of Ethics

Systems administrators manage computer systems that contain a lot of data about us and this raises privacy and competency issues, which is why some have created code of ethics statements. Both LOPSA and NPA have created such statements that are well worth reviewing and discussing.

LOPSA: Code of Ethics
NPA: Code of Ethics

Keeping Up

Technology changes fast. In fact, even though I teach this course about every year, I need to revise the course each time, sometimes substantially, to reflect changes that have developed over short periods of time. It's also your responsibility, as sysadmins, to keep up, too.

I therefore suggest that you continue your education by reading and practicing. For example, there are lots of books on systems administration. O'Reilly continually publishes on the topic. RedHat, the makers of the Red Hat Linux distribution, and sponsors of Fedora Linux and CentOS Linux, provides the Enable Sysadmin site, with new articles each day, authored by systems administrators, on the field. Opensource.com, also supported by Red Hat, publishes articles on systems administration. Command Line Heroes is a fun and informative podcast on technology and sysadmin related topics. Linux Journal publishes great articles on Linux related topics.

Conclusion

In this section I provided definitions of systems administrators and also the related or more specialized positions, such as database administrator, network administrator, and others.

I provided links to various certifications you might pursue as a systems administrator, and links to associations that might benefit you and your career.

Technology manages so much of our daily lives, and computer systems store lots of data about us. Since systems administrators manage these systems, they hold a great amount of responsibility to protect them and our data. Therefore, I provided links to two code of ethics statements that we will discuss.

It's also important to keep up with the technology, which changes fast. The work of a systems administrator is much different today than it was ten or twenty years ago, and that surely indicates that it could be much different in another ten to twenty years. If we don't keep up, we won't be of much use to the people we serve.

Project Management

In this section, you will learn how to use Google Cloud (gcloud) to install and manage Linux servers as virtual machines. We will specifically use gcloud to create virtual instances of the Ubuntu Server Linux operating system.

This section will also introduce you to Git and GitHub. You will use these technologies to download software but more importantly to document your work.

Using gcloud for Virtual Machines

After this section, you will be able to achieve the following outcomes:

Understand the concept of Virtual Machines (VMs): Learn what a VM is and how it operates on a host operating system.
Set up a Google Cloud project using gcloud: Create a Google Cloud project and enable billing for it.
Create a Virtual Machine instance: Configure and deploy a VM on Google Cloud using the Ubuntu 20.04 LTS operating system.
Install and initialize the gcloud CLI: Set up the gcloud command interface on your local machine for managing cloud resources.
Connect to the VM using SSH: Establish a secure connection to your VM via the command line from your local machine.
Update and manage the VM: Perform essential updates on the VM and create snapshots for backup and recovery.

Virtual Machines

Our goal in this section is to create a virtual machine (VM) instance running a distribution of the Linux operating system. A VM is basically a virtualized operating system that runs on a host operating system. That host operating system may also be Linux, but it could be Windows or macOS. In short, when we use virtual machines, it means instead of installing an operating system ,like Linux, macOS, Windows, etc, on a physical machine, we use virtual machine software to mimic the process. The virtual machine thus runs on top of our main OS. It's like an app, where the app is a fully functioning operating system.

In this book, we're going to use gcloud (via Google) to create and run our virtual machines. There are other options available that you can explore on your own.

Google Cloud / gcloud

Google Account

To create our virtual machines using gcloud, we need a Google account. I imagine you already have a Google account, but if not, then create one at https://www.google.com.

Google Cloud (gcloud) Project

In the following, we will use gcloud to create a Google Cloud project. Once you've created that project, we can enable billing for that project, and then install the gcloud software on personal machines.

To begin, you will need to visit the gcloud documentation that I've inserted throughout this page. Closely follow those instructions. I help explain the steps throughout this page. However, it is imperative that you read through the instructions closely.

First, follow Step 1 at the top of the Install the gcloud CLI page to create a new project. Also, review the page on creating and managing projects.

When you create your project, you can name it anything, but try to name it something to do with this project. E.g., I might use the name sysadmin-418. Avoid using spaces when naming your project.

Click on the Create button, and leave the organization field set to No Organization.

Google Billing

Second, set up a billing account for your gcloud project. This means there is a cost associated with this product, but the machines we'll build require few resources and the cost should be minimal. In the past, I usually pay about $1 per month. Follow Step 2 to enable billing for the new project. See also the page on how to create, modify, or close your self-serve Cloud Billing account

gcloud VM Instance

Next, log into Google Cloud Console. This should take you to the Dashboard page.

Our first goal is to create a virtual machine (VM) instance. As a reminder, a VM is a virtualized operating system. We will use software to mimic the process of installing an operating system on Google's servers.

Google Cloud offers a number of Linux-based operating systems to create VMs. We're going to use the Ubuntu operating system and specifically the Ubuntu 20.04 LTS version.

What is Ubuntu? Ubuntu is a Linux distribution. A new version of Ubuntu is released every six months. The 20.04 signifies that this is the April 2020 version. LTS signifies Long Term Support. LTS versions are released every two years, and Canonical LTD, the owners of Ubuntu, provide standard support for LTS versions for five years. Thus, Ubuntu 20.04 is supported till April 2025.

LTS versions of Ubuntu are more stable. Non-LTS versions of Ubuntu only receive nine months of standard support, and generally use cutting edge technology, which is not always desirable for server operating systems. Each version of Ubuntu has a code name. 20.04 has the code name Focal Fossa. You can see a list of versions, code names, release dates, and more on Ubuntu's Releases page.

We will create our VM using the Google Cloud dashboard. To do so, follow these steps:

Click the three horizontal bars at the top left of the screen.
Hover over the Compute Engine link, and then select VM Instances.
In the window, select the project that you created earlier.
- E.g., for me, I used the project name sysadmin-418.
Next, click on Create Instance.
Change the name for your instance.
- E.g., I chose fall-2024 (no spaces)
Under the Machine Configuration section, make sure E2 (Low cost, day-to-day computing) is selected.
Under the Machine type drop down box, select e2-micro (0.25-2 vCPU (1 shared core), 1 GB memory)
- This is the lowest cost virtual machine and perfect for our needs.
Under Boot disk, click on the Change button.
In the window, select Ubuntu from the Operating system drop down box.
Select Ubuntu 20.04 LTS x86/64
Leave Boot disk type be set to Balanced persistent disk
Disk size should be set to 10 GB.
Click on the Select button.
Check the Allow HTTP Traffic button
Finally, click on the Create button to create your VM instance.

Install the latest gcloud CLI version

The instructions above will be the same for anyone, but the following instructions will differ depending on the operating system for your local machine.

The next step is to install gcloud on your local machines. This will allow us to connect to remote server using our own terminal applications. The Install the gcloud CLI page provides instructions for different operating systems.

There are installation instructions for macOS, Windows, Chromebooks, and various Linux distributions. Follow these instructions closely for the operating system that you're using.

Note that for macOS, you have to choose among three different CPU/chip architectures. If you have an older macOS machine (before November 2020 or so), it's likely that you'll select macOS 64-bit (x86_64). If you have a newer macOS machine, then it's likely you'll have to select macOS 64-bit (arm64, Apple M1 silicon). It's unlikely that any of you are using a 32-bit macOS operating system. If you're not sure which macOS system you have, then let me know and I can help you determine the appropriate platform. Alternatively, follow these instructions to find your processor information:

click on the Apple menu

choose About This Mac

locate the Processor or Chip information

After you have downloaded the gcloud CLI for your particular OS and CPU architecture, you will need to open a command prompt/terminal on your machines to complete the instructions that describe how to install the gcloud CLI. macOS uses the Terminal app, which can be located using Spotlight. Windows users can use Powershell.

Windows Users

Windows users will download a regular .exe file, and launch the installer in the regular Windows way. Please follow the rest of the instructions for Windows.

macOS Users

macOS users may need to complete some setup work before installing Google Cloud. First, open your Terminal.app and run the following code:

xcode-select --install

Once the Xcode developer tools are installed, you need to install the macOS Homebrew package manager. To do so, follow the instructions here:

Homebrew

After Homebrew is installed use the brew command to install pyenv.

brew install pyenv

And then use pyenv to install the latest version of Python. For example, to install the latest release of Python (as of August 2024):

penv install 3.12.5

Finally, you can install the Google Cloud application using the steps outlined below. Or you can use the steps outlined in the Google Cloud Interactive installation.

macOS gcloud Install

macOS users will download a .tar.gz file and extract the .tar.gz using the tar command. The .tar.gz file may have been downloaded to your Downloads folder. First, in your Terminal.app, move that file to your home directory and extract it there. Once extracted, change to home directory with the cd command. For example, if you are running macOS and downloaded the X86_64 version of the gcloud CLI, then open your Terminal.app and run the following commands:

mv ~/Downloads/google-cloud-cli-darwin-x86_64.tar.gz $HOME
cd $HOME
tar -xzf google-cloud-cli-444.0.0-darwin-x86_64.tar.gz 
cd google-cloud-sdk

Modify the file names in the commands above, as appropriate, if you're using the M1 version of the gcloud CLI.

Initializing the gcloud CLI

Regardless if you're using macOS or Windows, you will now initialize your Google Cloud installation the same way. First, scroll down the install page to the section titled Initializing the gcloud CLI. In your terminal, run the initialization command. Per the instructions at the above page, it should be something like so:

gcloud init

And continue to follow the instructions in the documentation.

Connect to our VM

After the new VM machine has been created, you connect to it via the command line. macOS users will connect to it via their Terminal.app. Windows users can connect to it via their command prompt or use PuTTY.

If you have used ssh before, note that we use a slightly different ssh command to connect to our VMs. The syntax follows this pattern:

gcloud compute ssh --zone "zone-info" "name-info" --project "project-id"

You need to replace the values in the double quotes in the above command with the values located in your Google Cloud console and in your VM instances section.

You can select the SSH drop down box to copy the exact gcloud command to connect to your server.

Update our Ubuntu VM

The above command will connect you to the remote Linux virtual machine server. The VM will include a recently updated version of Ubuntu 20.04, but it may not be completely updated. Thus the first thing we need to do is update our machines. Since this is an Ubuntu machine, use the following two commands to update your machines:

sudo apt update
sudo apt -y upgrade

Then reboot the server with the following command:

sudo reboot

You do not always have to reboot when updating your server. It is only necessary when there has been a kernel update. I'm assuming that when we update these machines for the first time after installation, that there will have been a kernel update.

If you reboot your server, you will automatically be disconnected from it. If you do not need to reboot your server, then type exit to logout and quit the connection to the remote server.

exit

Typing exit at the prompt will always close our connection to our remote servers.

Snapshots

You have installed a pristine version of Ubuntu, but mistakes will happen while learning how to use your machines. If you want, you can backup this pristine version of the operating system. This will allow you to restore the server if something goes wrong later.

NOTE: It's pretty easy to create a new VM instance. Therefore, it's okay to create snapshots later when you are working on your projects. This will reduce costs until needed.

To get started:

In the left hand navigation panel, click Compute Engine and then Snapshots.
At the top of the page, click on Create Snapshot.
Provide a name for your snapshot: e.g., ubuntu-1.
Provide a description of your snapshot: e.g., This is a new install of Ubuntu 20.04.
Choose your Source disk.
Choose a Location to store your snapshot.
- To avoid extra charges, choose Regional.
- From the drop down box, select the same location (zone-info) your VM has
Click on Create

Please monitor your billing for this to avoid costs that you do not want to incur.

Conclusion

Congrats! You have successfully completed your first Linux server installation using gcloud. In this section, you learned about virtual machines (VMs) and created one using Google Cloud. Specifically, you created a Ubuntu 20.04 LTS Linux virtual machine. While this may seem like a lot of information, by the end of this book you will be able to effortlessly deploy and manage VMs for various tasks. Such tasks may include setting up websites and more.

Using Git and GitHub for Documentation

Managing and documenting complex systems is a challenge. Effective documentation is a cornerstone of operational efficiency and reliability. This lecture explores how Git and GitHub, traditionally seen as tools for software developers, is equally valuable for systems administrators for documenting various processes and configurations.

Documentation in systems administration encompasses a broad spectrum: from configuration files and scripts to server deployment guides and maintenance procedures. It serves as a crucial reference point for teams, ensuring consistency in system management and easing the onboarding process for new administrators.

Our focus on Git and GitHub arises from their robustness and versatility. Git, a distributed version control system, excels in tracking changes and managing versions of any set of files, be it code or text documents. GitHub, building upon Git, provides a user-friendly interface and additional features like issue tracking and collaborative tools, making it a suitable platform for hosting and sharing documentation.

In this lecture, we will delve into how these tools can be used effectively for maintaining and enhancing documentation in a systems administration context. This will only be an intro to Git and GitHub, but key areas of discussion include:

Utilizing Git for version control of configuration files and scripts.
Leveraging GitHub for centralized documentation storage and access.
Employing best practices for documenting system configurations.

Using Git and GitHub will allow you to gain an understanding of the applications of Git and GitHub. This knowledge will help streamline your documentation processes and improve the overall efficiency and communication within your teams.

Creating a GitHub Account

Creating a GitHub account is a straightforward process. Here's how you can set up your own GitHub account:

Visit the GitHub Website: Start by navigating to GitHub's website.
Sign Up: Click on the “Sign Up” button usually located at the top right corner of the page.
Enter Your Details: You will be prompted to enter some basic information:
- Username: Choose a unique username that will be your identity on GitHub. It’s important to select a name that reflects your personal or professional identity as it will be visible publicly.
- Email Address: Provide a valid email address. This will be used for account verification and communication.
- Password: Create a strong password. It’s recommended to use a mix of letters, numbers, and symbols for better security.
Verify Your Account: Complete the CAPTCHA verification to confirm that you’re not a robot.
Choose a Plan: GitHub offers various plans, including a free option that is quite robust for most individual users. Select the plan that best suits your needs.
Tailor Your Experience (Optional): GitHub may ask a series of questions to customize your experience, like your programming experience and intended use of GitHub. These can be skipped if preferred.
Verify Your Email Address: After completing the sign-up process, GitHub will send a verification email to the address you provided. Click on the verification link in that email to activate your account.
Start Exploring: Once your account is verified, you’re ready to explore GitHub. You can start by creating repositories, collaborating on projects, or exploring other users’ repositories.

Tips for New Users

Profile Information: After creating your account, consider adding more details to your profile, like a profile picture and bio, to make it more personable.
Security: Set up two-factor authentication for added security.
Learning Resources: GitHub has a wealth of tutorials and guides to help you get started. Utilize these to familiarize yourself with GitHub’s features and best practices.

Understanding File Naming Conventions

File naming is a fundamental aspect of organizing and maintaining a clear and efficient documentation system, especially when using platforms like GitHub. Adhering to a consistent naming convention is crucial for several reasons:

Clarity and Accessibility: Well-named files are easier to identify and understand at a glance, saving time and reducing confusion.
System Compatibility: Certain characters in file names can cause issues in different operating systems. Avoiding spaces and special characters ensures broader compatibility.
Ease of Navigation: Consistent naming aids in navigating through files, especially in a large repository.
Version Control: Clear naming helps in tracking changes and managing versions more effectively.

When naming files for GitHub, it's best to follow these guidelines:

Use single words or combine words using camelCase, underscores (_), or hyphens (-). For example, ServerSetupGuide.md or server-setup-guide.md.
Avoid spaces, as they can cause issues in URLs and command-line operations.
Steer clear of special characters like !, $, #, %, etc., as they might have specific functions in certain environments or scripts.

The Importance of `.md` Extension for Markdown Files

When it comes to Markdown files, adding the .md extension is vital for several reasons:

GitHub Rendering: GitHub automatically renders files with a .md extension as formatted Markdown. This means your documentation will be displayed with the intended formatting (like headers, lists, links, etc.) directly in the repository view.
Editor Support: Most code editors recognize .md files and provide appropriate syntax highlighting and formatting previews, making editing more efficient.
Consistency and Recognition: Using the .md extension helps users and systems alike to quickly identify the file type and its intended use.

For instance, naming a file InstallationGuide.md ensures that GitHub will render it as a Markdown document, displaying all formatting correctly in the browser. This enhances readability and makes the documentation more user-friendly.

In summary, adhering to clear file naming conventions and correctly using the .md extension for Markdown files are essential practices for effective documentation on GitHub. These practices facilitate better organization, compatibility, and usability of your documentation, contributing to an overall smoother experience for both maintainers and users.

Basic Markdown Tutorial

Markdown is a lightweight markup language designed to be easy to read and write. It's widely used for formatting files on platforms like GitHub, in forums, and for creating web content. Here's a quick guide to the most commonly used Markdown syntax:

Headings

Headings are created using the # symbol before your text. The number of # symbols indicates the level of the heading:

# Heading 1
## Heading 2
### Heading 3
#### Heading 4
##### Heading 5
###### Heading 6

Emphasis

Bold: To make text bold, wrap it in double asterisks or double underscores. For example, **bold** or __bold__.
Italic: To italicize text, wrap it in single asterisks or single underscores. For example, *italic* or _italic_.

Lists

Unordered Lists: Use asterisks, plus signs, or hyphens to create bullet points.
```
* Item 1
* Item 2
  * Subitem 2.1
  * Subitem 2.2
```
Ordered Lists: Use numbers followed by periods for an ordered list.
```
1. First item
2. Second item
   1. Subitem 2.1
   2. Subitem 2.2
```

Links and Images

Links: To create a link, wrap the link text in brackets [ ], and then wrap the URL in parentheses ( ).
- For example, [GitHub](https://github.com).
Images: Similar to links, but start with an exclamation mark, followed by the alt text in brackets, and the URL in parentheses.
- For example, ![Alt text](image-url.jpg).

Code

Inline Code: For small bits of code, use backticks to wrap your code. For example, `code`.
Code Blocks: For larger sections of code, use three backticks or indent with four spaces:

```language your code here ```

Replace language with the programming language for syntax highlighting (e.g., python, javascript).

Blockquotes

To create a blockquote, use the > symbol before your text. For nested blockquotes, use multiple > symbols.

> This is a blockquote.
>> This is a nested blockquote.

Horizontal Rules

Create a horizontal line or rule by using three or more asterisks, dashes, or underscores on a new line.

---

Additional Tips

Whitespace and Line Breaks: In Markdown, line breaks and spacing can be crucial. To create a new line without starting a new paragraph, end a line with two or more spaces before hitting Enter.
Escaping Markdown: To display a Markdown character, precede it with a backslash (\). For example, \*not italic\*.

Markdown's simplicity and readability make it an excellent choice for documentation and note-taking. As you become more comfortable with these basics, you'll find it a versatile tool for your writing needs.

Conclusion

In summary, Git and GitHub stand as powerful tools for documentation and version control, offering a platform for managing, tracking, and collaborating on projects. Adhering to clear file naming conventions enhances this system's efficiency, ensuring files are accessible and compatible across various environments. Furthermore, the use of Markdown within this framework simplifies the process of creating and maintaining readable and well-structured documentation. Together, these elements form an ecosystem that streamlines project management and fosters a collaborative and organized workspace, which is important for the successful execution of individual and team-based projects in information technology disciplines.

Learning the Command Line

In this section, our focus is learning the command line environment, how to use it, and what it offers.

It's more common for people to learn how to use a computer via a graphical user interface (GUI). Yet, GUIs are not well suited for servers. There are a few reasons for this. First, GUIs entail extra software. The more software we have on a server, the more resources that software consumes, the more software that needs to be managed, and the more we expose our systems to security risks. You could therefore imagine how problematic it would be for a company to manage thousands of servers if all those servers had GUIs installed on them. Second, GUIs do not provide a good platform for automation (aka, scripting), at least not remotely as well as command line interfaces (CLIs) do. It's because of this and for other reasons, we will learn how to use the CLI.

Learning the CLI has other benefits. Using the command line means using a shell, and using a shell means programming a computer. Thus, while this book does not spend much time on programming per se, you will by default be programming just by using the CLI. Using the CLI also removes a lot of abstractions that GUIs introduce. These abstractions, or metaphors, obfuscate how computers function even if they make using computers easier, which is debatable. Thus, by learning the CLI, you'll gain a better understanding how computers work.

Fortunately, Linux and many other Unix-like operating systems have the ability to operate without graphical user interfaces. This is partly the reason why these operating systems have done so well in the server market. Let's get started!

The Linux Filesystem

In this section, we will cover the:

the Linux filesystem and how it is structured and organized, and
the basic commands to navigate around and to work with directories and files

The terms directories and folders are synonymous, but as users of primarily graphical user interfaces, you are more likely familiar with the term folders. I will more often use the term directories since that is the command line (text user interface) convention. I will use the term folders when referring to a graphical environment.

Throughout this demonstration, I encourage you to connect to your remote server using the gcloud compute ssh command and follow along with the commands that I use. See Section 2.1 for details on connecting to the remote server.

Visualizing the Filesystem as a Tree

We will need to work within the filesystem quite a lot in this course, but the term filesystem may refer to different concepts, and it's important to clear that up before we start.

In come cases, a filesystem refers to how data (files) are stored and retrieved on a device like a hard drive, USB drive, etc. For example, macOS uses the Apple File System (APFS) by default, and Windows uses the New Technology File System (NTFS). Linux and other Unix-like operating systems use a variety of filesystems, but presently, the two major ones are ext4 and btrfs. The former is the default filesystem on distributions like Debian and Ubuntu; the latter is the default on the Fedora and openSUSE distributions. Opensource.com has a nice overview of filesystems under this concept.

A filesystem might also be used to refer to the directory structure or directory tree of a system. In graphical user interface parlance, this is simply how the folders are your disk are organized. This concept of a filesystem is related to the prior concept of a filesystem, but it's used here to refer to the location of files and directories on a system. For example, on Windows, the filesystem is identified by a letter, like the C: drive, regardless if the disk has a NTFS filesystem or a FAT filesystem. Additional drives (e.g., extra hard drives, USB drives, DVD drives, etc.), will be assigned their own letters (A:, B:, D:, etc.). macOS adheres to a tree like filesystem like Linux, UNIX, and other Unix-like operating systems, and this is because macOS is a registered UNIX® OS.

In Linux and Unix-like OSes, we have a top-level root directory identified by a single forward slash /, and then subdirectories under that root directory. Additional drives (e.g., extra hard drives, USB drives, DVD drives, etc.) are mounted under that root hierarchy and not separately like on Windows. Linux.com provides a nice overview of the most common directory structure that Linux distributions use along with an explanation for the major bottom level directories. In this section, we will learn about this type of filesystem.

On Linux, we can visualize the filesystem with the tree command. The tree command, like many Linux commands, can be run on its own or with options, like in the second example below:

tree : list contents of directories in a tree-like format
- tree -dfL 1 : directories only, full path, one level
- tree -dfL 1 / : list directories only at root / level

The root Directory and its Base Level Directories

As explained on the Linux.com page, here are the major sub directories under / (root) and a short description of their main purpose:

/bin : binary files needed to use the system
/boot : files needed to boot the system
/dev : device files -- all hardware has a file
/etc : system configuration files
/home : user directories
/lib : libraries/programs needed for other programs
/media : external storage is mounted
/mnt : other filesystems may be mounted
/opt : store software code to compile software
/proc : files containing info about your computer
/root : home directory of superuser
/run : used by system processes
/sbin : like /bin, binary files that require superuser privileges
/srv : contains data for servers
/sys : contains info about devices
/tmp : temp files used by applications
/usr : user binaries, etc that might be installed by users
/var : variable files, used often for system logs

Although there are 18 directories listed above that branch off from the root directory, we will use some more often than others. For example, the /etc directory contains system configuration files, and we will use the contents of this directory, along with the /var directory, quite a bit when we set up our web servers, relational database servers, and more later in the semester. The /home directory is where our default home directories are stored, and if you manage a multi-user system, then this will be an important directory to manage.

Source: Linux Filesystem Explained

Relative and Absolute Paths

macOS users have the Finder app to navigate their filesystem, to move files to different folders, to copy files, to trash them, etc. Window users have File Explorer for these functions. Linux users have similar graphical software options, but all of these functions can be completed on the Linux command line, and generally more efficiently. To get started, we need to learn two things first:

how to specify the locations of files and directories in the filesystem
the commands needed to work with the filesystem

To help specify the locations of files and directories, there are two key concepts to know:

absolute paths
relative paths

Above we learned about the / root directory and its subdirectories. All sorts of commands, especially those that deal with files and directories (like copying, moving, deleting), require us to specify on the command line the locations of the files and directories. It's common to specify the location in two different ways, by specifying their absolute path (or location) on the filesystem, or the relative path (or location).

To demonstrate, we might want to move around the filesystem. When we first log in to our remote system, our default location will be our home directory, sometimes referred to as $HOME. The path (location) to that directory will be.

/home/USER

Where USER is your username. Therefore, since my username is sean, my home directory is located at:

/home/sean

which we can see specified with the pwd (print working directory) command:

pwd
/home/sean

When I write $HOME, I am referring to a default, environmental variable that points to our home directory. It's variable because, depending on which account we're logged in as, $HOME will point to a different location. For me, then, that will be /home/sean, if I'm logged in as sean. For you it'll point to your home directory.

In my home directory, I have a subdirectory called public_html. The path to that is:

/home/sean/public_html

In a program like Finder (macOS) or File Explorer (Windows), if I want to change my location to that subdirectory (or folder), then I'd double click on its folder icon. On the command line, however, I have to write out the command and the path to the subdirectory. Therefore, starting in my home directory, I use the following command to switch to the public_html subdirectory:

cd public_html

Note that files and directories in Linux are case sensitive. This means that a directory named public_html can co-exist alongside a directory named Public_html. Or a file named paper.txt can co-exist alongside a file named Paper.txt. So be sure to use the proper case when spelling out files, directories, and even commands.

The above is an example of using a relative path, and that command would only be successful if I were first in my $HOME directory. That's because I specified the location of public_html relative to my default ($HOME) location.

I could have also specified the absolute location, but this would be the wordier way. Since the public_html directory is in my $HOME directory, and my $HOME directory is a subdirectory in the /home directory, then to specify the absolute path in the above command, I'd write:

cd /home/sean/public_html

Again, the relative path specified above would only work if I was in my home directory, because cd public_html is relative to the location of /home/sean. That is, the subdirectory public_html is in /home/sean. But specifying the absolute path would work no matter where I was located in the filesystem. For example, if I was working on a file in the /etc/apache2 directory, then using the absolute path (cd /home/sean/public_html) would work. But the relative path (cd public_html) command would not since there is no subdirectory called public_html in the /etc/apache2 directory.

Finally, you can use the ls command to list the contents of a directory, i.e., the files and subdirectories in a directory:

ls

We will cover this more next.

Conclusion

Understanding relative and absolute paths is one of the more difficult concepts for new commandline users to learn, but after time, it'll feel natural. So just keep practicing, and I'll go over this throughout the semester.

In this section, you learned the following commands:

tree to list directory contents in a tree-like format
cd to change directory
pwd to print working directory

You learned different ways to refer to the home directory:

/home/USER
$HOME
~

You learned about relative and absolute paths. An absolute path starts with the root directory /. Here's an absolute path to a file named paper.txt in my home directory:

absolute path: /home/sean/paper.txt

If I were already in my home directory, then the relative path would simply be:

relative path: paper.txt

Additional Readings

Read through these two pages for more info on the file system and directories:

Files and Directories

In this section, we will cover:

Basic Directory and File Commands:
- Overview of GNU Coreutils.
- Commands for navigating directories and working with files.
Directory Listing:
- ls: List directory contents.
- Options for ls (e.g., -l, -a, -al).
- pwd: Print working directory.
- Using man pages for command options.
Basic File Operations:
- cp: Copying files and directories.
- mv: Moving and renaming files and directories.
- rm: Removing files and directories.
- touch: Updating file timestamps or creating empty files.
- Options for each command, including interactive modes.
Special File Types:
- mkdir: Creating directories.
- rmdir: Deleting empty directories.
- rm -r: Recursively deleting directories with content.
Printing Text:
- echo: Printing text or variables to standard output.
- cat: Concatenating and printing file contents.
- less: Viewing file contents one page at a time.

Basic Directory and File commands

In order to explore the above directories but also to create new ones and work with files, we need to know some basic terminal commands. A lot of these commands are part of the base system called GNU Coreutils, and in this demo, we will specifically cover some of the following GNU Coreutils:

Directory Listing

I have already demonstrated one command: the cd (change directory) command. This will be one of the most frequently used commands in your toolbox.

In our current directory, or once we have changed to a new directory, we will want to learn its contents (what files and directories it contains). We have a few commands to choose from to list contents (e.g., you have already seen the tree command), but the most common command is the ls (list) command. We use it by typing the following two letters in the terminal:

ls

Again, to confirm that we're in some specific directory, use the pwd command to print the working directory.

Most commands can be combined with options. Options provide additional functionality to the base command, and in order to see what options are available for the ls command, we can look at its man(ual) page:

man ls

From the ls man page, we learn that we can use the -l option to format the output of the ls command as a long-list, or a list that provides more information about the files and directories in the working directory. Later in the semester, I will talk more about what the other parts of output of this option mean.

ls -l

We can use the -a option to list hidden files. In Linux, hidden files are hidden from the base ls command if the files begin with a period. We have a some of those files in our $HOME directories, and we can see them like so:

ls -a

We can also combine options. For example, to view all files, including hidden ones, in the long-list format, we can use:

ls -al

Basic File Operations

Some basic file operation commands include:

cp : copying files and directories
mv : moving (or renaming) files and directories
rm : removing (or deleting) files and directories
touch : change file timestamps (or, create a new, empty file)

These commands also have various options that can be viewed in their respective man pages. Again, command options provide additional functionality to the base command, and are mostly (but not always) prepended with a dash and a letter or number. To see examples, type the following commands, which will launch the manual pages for them. Press q to exit the manual pages, and use your up and down arrow keys to scroll through the manuals:

man cp
man mv
man rm
man touch

The touch command's primary use is to change a file's timestamp; that is, the command updates a file's "access and modification times" (see man touch). For example, let's say we have a file called paper.txt in our home directory. We can see the output here:

ls -l paper.txt
-rw-rw-r-- 1 sean sean 0 Jun 27 00:13 /home/sean/paper.txt

This shows that the last modification time was 12:03AM on June 27.

If I run the touch command on paper.txt, the timestamp will change:

touch paper.txt
-rw-rw-r-- 1 sean sean 0 Jun 27 00:15 /home/sean/paper.txt

This shows an updated modification timestamp of 12:15AM.

The side effect occurs when we name a file with the touch command, but the file does not exist, in which case the touch command will create an empty file with the name we use. Let's say that I do not have a file named file.txt in my home directory. If I run the ls -l file.txt command, I'll receive an error since the file does not exist. But if I then use the touch file.txt command, and then run ls -l file.txt. we'll see that the file now exists, that it has a byte size of zero:

ls -l file.txt
ls: cannot access 'file.txt': No such file or directory
touch file.txt
ls -l file.txt
-rw-rw-r-- 1 sean sean 0 Jun 27 00:18 file.txt

Here are some ways to use the other three commands and their options:

Copying Files and Directories

To copy an existing file (file1.txt) to a new file (file2.txt):

cp file1.txt file2.txt

Use the -i option to copy that file in interactive mode; that is, to prompt you before overwriting an existing file.

We also use the cp command to copy directories.

Moving Files and Directories

The mv command will move an existing file to a different directory, and/or rename the file. For example, from within our home directory (therefore, using relative path names), to move a file named "file.docx" to a subdirectory named "Documents":

mv file.docx Documents/

To rename a file only (keeping it in the same directory), the command looks like this:

mv file.docx newName.docx

To move the file to our Documents/ subdirectory and also rename it, then we'd do this:

mv file.docx Documents/newName.docx

The man page for the mv command also describes an -i option for interactive mode that helps prevent us from overwriting existing files. For example, if we have a file called paper.docx in our $HOME directory, and we have a file named paper.docx in our $HOME/Documents directory, and if these are actually two different papers (or files), then moving the file to that directory will overwrite it without asking. The -i option will prompt us first:

mv -i paper.docx Documents/paper.docx

Remove or Delete

Finally, to delete a file, we use the rm command:

rm file.html

Unlike the trash bin in your graphical user environment, it's very hard to recover a deleted file using the rm command. That is, using rm does not mean the file or directory is trashed; rather, it means it was deleted.

Special File Types

For now, let's only cover two commands here:

mkdir for creating a new directory
rmdir for deleting an empty directory

Like the above commands, these commands also have their own set of options that can be viewed in their respective man pages:

man mkdir
man rmdir

Make or Create a New Directory

We use these commands like we do the ones above. If we are in our $HOME directory, and we want to create a new directory called bin, we do:

mkdir bin

The bin directory in our $HOME directory is a default location to store our personal applications, or applications (programs) that are only available to us.

And if we run ls, we should see that it was successful.

Delete a Directory

The rmdir command is a bit weird because it only removes empty directories. To remove the directory we just created, we use it like so:

rmdir bin

However, if you want to remove a directory that contains files or other subdirectories, then you will have to use the rm command along with the -r (recursive) option:

rm -r directory-with-content/

Printing Text

There a number of ways to print text to standard output, which is our screen by default in the terminal. We could also redirect standard output to a file, to a printer, or to a remote shell. We'll see examples like that later in the semester. Here let's cover two commands:

echo : to print a line of text to standard output
cat : to concatenate and write files
less : to view files one page at a time

Standard output is by default the screen. When we print to standard output, then by default we print to the screen. However, standard output can be redirected to files, programs, or devices, like actual printers.

Print to Screen

To use echo:

echo "hello world"
echo "Today is a good day."

We can also echo variables:

a=4
echo "$a"

Print File to Screen

cat is listed elsewhere in the GNU Coreutils page. The primary use of the cat command is to join, combine, or concatenate files, but if used on a single file, it has this nice side effect of printing the content of the file to the screen:

cat file.html

If the file is very long, we might want to use what's called a pager. There are a few pagers to use, but the less command is a common one:

less file.html

Like with the man pages, use the up and down arrow keys to scroll through the output, and press q to quit the pager.

Conclusion

In this demo, we learned about the filesystem or directory structure of Linux, and we also learned some basic command to work with directories and files. You should practice using these commands as much as possible. The more you use them, the easier it'll get. Also, be sure to review the man pages for each of the commands, especially to see what options are available for each of them.

Basic commands covered in this demo include:

cat : display contents of a file
cp : copy
echo : print a line of text
less : display contents of a file by page
ls : list
man : manual pages
mkdir : create a directory
mv : move or rename
pwd : print name of current/working directory
rmdir : delete an empty directory
rm : remove or delete a file or directory
tree : list contents of directories in a tree-like format

File Permissions and Ownership

In this section, we will cover:

Identifying Ownership and Permissions:
- Overview of file ownership (user and group ownership).
- Understanding file permissions (read, write, execute).
- Using ls -l to view ownership and permissions.
- Explanation of how to interpret the output of ls -l.
Changing File Permissions:
- Using chmod to change file permissions.
- Explanation of octal values for permissions (rwx and their octal equivalents).
- Examples of setting permissions using chmod with octal notation (e.g., chmod 700, chmod 644).
Changing File Ownership:
- Using chown to change file ownership (user and group).
- Examples of changing user ownership and group ownership separately and together.
- Usage of sudo to execute administrative commands when needed.
Additional Commands:
- ls -ld: List directories and their attributes.
- groups: Show group memberships for a user.
- sudo: Run commands as another user with elevated privileges.

Identifying Permissions and Ownership

In the last section, we saw that the output of the ls -l command included a lot extra information besides a listing of file names. The output also listed the owners and permissions for each file and directory.

Each user account on a Linux system has a user name and has at least one group membership. That name and that group membership determine the user and group ownership for all files created under that account.

In order to allow or restrict access to files and directories, ownership and permissions are set in order to manage that kind of access to those files and directories. There are thus two owners for every file and directory:

user owner
group owner

And there are three permission modes that restrict or expand access to each file (or directory) based on user or group membership:

read
write
execute

I am emphasizing the rwx in the above list of modes because we will need to remember what these letters stand for when we work with file and directory permissions.

Consider the output of ls -l in my home directory that contains a file called paper.txt:

-rw-rw-r-- 1 seanburns seanburns 0 Sep  7 14:41 paper.txt

According to the above output, we can parse the following information about the file:

Attributes	`ls -l` output
File permissions	`-rw-rw-r--`
Number of links	1
Owner name	seanburns
Group name	seanburns
Byte size	0
Last modification date	Sep 7 14:41
File name	paper.txt

The Owner and Group names of the paper.txt file are both seanburns because there is a user account named seanburns on the system and a group account named seanburns on the system, and that file exists in the user seanburns's home directory. You can see which groups you belong to on your system with the groups.

The File permissions show:

-rw-rw-r--

Ignore the first dash for now. The remaining permissions can be broken down into three parts:

rw- (read and write only permissions for the Owner)
rw- (read and write only permissions for the Group)
r-- (read-only permissions for the other, or World)

We read the output as such:

User seanburns is the Owner and has read and write permissions on the file but not execute permissions (rw-).
Group seanburns is the Group owner and has read and write permissions on the file but not execute permissions (rw-).
The Other/World can read the file but cannot write to the file nor execute the file (r--).

The word write is a classical computing term that means, essentially, to edit and save edits of a file. Today we use the term save instead of write, but remember that they are basically equivalent terms.

The Other/World ownership allows people to view (read) the file but not write (save) to it nor execute (run) it. Any webpage you view on the internet at least has Other/World mode set to read.

Let's take a look at another file. In our /bin directory, we can see a listing of executable programs on the system. For example, take a look at the scp (secure copy) program as follows:

ls -l /bin/scp
-rwxr-xr-x 1 root   root    117040 Jan  2  2024  zip*

Attributes	`ls -l` output
File permissions	`-rwxr-xr-x`
Number of links	1
Owner name	root
Group name	root
Byte size	117040
Last modification date	Jan 2 2024
File name	scp

Since scp is a computer program used to securely copy files between different machines, it needs to be executable. That is, users on the system need to be able to run it. But notice that the owner and group names of the file point to the user root. We have already learned that there is a root directory in our filesystem. This is the top level directory in our filesystem and is referenced by the forward slash: /. But there is also a root user account. This is the system's superuser. The superuser can run or access anything on the system, and this user also owns most of the system files.

Back to permissions. We read the output of the ls -l /bin/scp command as such:

User root is the Owner and has read, write, and execute (rwx) permissions on the file.
Group root is the Group owner and has read and execute permissions but not write permissions (r-x)
The Other/World has read and execute permissions but not write (r-x). This permissions allows other users (like you and me) to use the scp program.

Finally, let's take a look at the permissions for a directory itself. When I run the following command in my home directory, it will show the permissions for my /home/seanburns directory:

ls -ld

And the output is:

drwxr-xr-x 51 seanburns seanburns 4096 Sep  7 23:35 .

This shows that:

Attributes	`ls -ld` output
File permissions	`drwxr-xr-x`
Number of links	1
Owner name	seanburns
Group name	seanburns
Byte size	4096
Last modification date	Sep 7
File name	.

This is a little different from the previous examples, but let's parse it:

Instead of an initial dash, this file has an initial d that identifies this as a directory. Directories in Linux are simply special types of files.
User seanburns has read, write, and execute (rwx) permissions.
Group seanburns and Other/World have execute (r-x) read and execute permissions.
. signifies the current directory, which happens to be my home directory, since I ran that command at the /home/seanburns path.

If this were a multiuser system, then other people with accounts on the system would be able to read the contents of my home directory. However, they wouldn't be able to modify, create, or delete any files or directories in that directory.

Why does the directory have an executable bit set since it's not a program? The executable bit is required on directories to access them. That is, if we want to cd into a directory, then the executable bit needs to be set on the directory.

Changing File Permissions and Ownership

Changing File Permissions

All the files and directories on a Linux system have default ownership and permissions set. This includes new files that we might create as we use our systems. There will be times when we will want to change the defaults. For example, if I were to create accounts for other people for this system, I might want to disallow them access to my home directory. There are several commands available to do that, and here I'll introduce you to the two most common ones.

The chmod command is used to change file and directory permissions: the -rwxrwxrwx part of a file.
The chown command is used to change a file's and directory's owner and group.

`chmod`

Each one of those bits (the r, the w, and the x) are assigned the following octal values:

permission	description	octal value
r	read	4
w	write	2
x	execute	1
-	no permissions	0

There are octal values for the three set of permissions represented by -rwxrwxrwx. If I bracket the sets (for demonstration purposes only), they look like this:

Owner	Group	Other/World
rwx-	rwx-	rwx-
4210	4210	4210

The first set describes the permissions for the owner. The second set describes the permissions for the group. The third set describes the permissions for the Other/World.

We use the chmod command and the octal values to change a file or directory's permissions. For each set, we add up the octal values. For example, to make a file read (4), write (2), and executable (1) for the owner only, and zero out the permissions for the group and Other/World, we use the chmod command like so:

chmod 700 paper.txt

We use 7 because 4+2+1=7, and we use two zeroes in the second two places since we're removing permissions for group and Other/World.

If we want to make the file read, write, and executable by the owner, the group, and the world, then we repeat this for each set:

chmod 777 paper.txt

More commonly, we might want to restrict ownership. Here we enable rw- for the owner, and r-- for the group and the Other/World:

chmod 644 paper.txt

Because 4+2=6 for owner, and 4 is read only for group and Other/World, respectively.

Changing File Ownership

`chown`

In order to change the ownership of a file, we use the chown command followed by the name of the owner.

I can generally only change the user owner of a file if I have admin access on a system. In such a case, I would have to use the sudo command, which gives me superuser privileges. To change the owner only, say from the user sean to the user root:

sudo chown root paper.txt

In the following example, I make the root user the group owner of my paper.txt file. Note that I include a colon before the name root. This signifies the group membership.

sudo chown :root paper.txt

Look at the output as you go using the ls -l command.

To change both user owner and group owner, we simply specify both names and separate those names by a colon. Thus, since paper.txt now has root as the user owner and root as the group owner, I revert ownership back to me for both user and group:

sudo chown sean:sean paper.txt

Conclusion

In this section, we learned:

how to identify file/directory ownership and permissions
and how to change file/directory ownership and permissions.

The commands we used to change these include:

chmod : for changing file permissions (or file mode bits)
chown : for changing file ownership

We also used the following commands:

ls : list directory contents
- ls -ld : long list directories themselves, not their contents
groups : print the groups a user is in
sudo : execute a command as another user

Text Processing: Part 1

In this section, we will cover

Text processing tools are fundamental: Learning to process and manipulate text is a crucial skill for systems administrators, programmers, and data analysts. Linux provides a variety of tools to examine, manipulate, and analyze text.
Plain text is foundational: Programs and data are often stored in plain text, making it essential to know how to handle text files effectively.
Essential text processing commands: Commands such as cat, cut, head, tail, sort, uniq, and wc allow users to view, manipulate, and analyze text files, even large datasets, with ease.
Power of pipes and redirection: Using pipes (|) and redirection (>, >>), you can chain commands together to create more complex workflows for processing text files.
CSV data manipulation: This lecture shows how to work with CSV (comma-separated value) files, demonstrating how to view, sort, and filter data with tools like cut and uniq.
Practical applications for systems administration: The lecture emphasizes that text processing skills are directly applicable to managing user accounts, security, system configuration, and more in a systems administration context.

Getting Started

One of the more important sets of tools that Linux (as well Unix-like) operating systems provide are tools that aid processing and manipulating text. The ability to process and manipulate text, programmatically, is a basic and essential part of many programming languages, (e.g., Python, JavaScript, etc), and learning how to process and manipulate text is an important skill for a variety of jobs including statistics, data analytics, data science, programming, web programming, systems administration, and so forth. In other words, this functionality of Linux (and Unix-like) operating systems essentially means that to learn Linux and the tools that it provides is akin to learning how to program.

Plain text files are the basic building blocks of programs and data. Programs are written in plain text editors (Vim, NeoVim, VS Code, etc), and data is often stored as plain text. Linux offers many tools to examine, manipulate, process, analyze, and visualize data in plain text files.

In this section, we will learn some of the basic tools to examine plain text (i.e., data). We will do some programming later in this class, but for us, the main objective with learning to program aligns with our work as systems administrators. That means our text processing and programming goals will serve our interests in managing users, security, networking, system configuration, and so forth as Linux system administrators.

In the meantime, the goal of this section is to acquaint ourselves with some of the tools that can be used to process text. In this section, we will only cover a handful of text processing programs or utilities, but here is a fairly comprehensive list, and we'll examine some additional ones from this list later in the semester:

cat : concatenate files and print on the standard output
cut : remove sections from each line of files
diff : compare files line by line
echo : display a line of text
expand : convert tabs to spaces
find : search for files in a directory hierarchy
fmt : simple optimal text formatter
fold : wrap each input line to fit in specified width
grep : print lines that match patterns
head : output the first part of files
join : join lines of two files on a common field
look : display lines beginning with a given string
nl : number lines of files
paste : merge lines of files
printf : format and print data
shuf : generate random permutations
sort : sort lines of text files
tail : output the last part of files
tr : translate or delete characters
unexpand : convert spaces to tabs
uniq : report or omit repeat lines
wc : print newline, word, and byte counts for each file

We will also discuss two types of operators, the pipe and the redirect. The latter has a version that will write over the contents of a file, and a version that will append contents to the end of a file:

| : redirect standard output from command1 to standard input for command2
> : redirect to standard output to a file, overwriting
>> : redirect to standard output to a file, appending

Today I want to cover a few of the above commands for processing data in a file; specifically:

cat : concatenate files and print on the standard output
cut : remove sections from each line of files
head : output the first part of files
sort : sort lines of text files
tail : output the last part of files
uniq : report or omit repeat lines
wc : print newline, word, and byte counts for each file

Let's look at a toy, sample file that contains structured data as a CSV (comma separated value) file. You can download the file to your gcloud virtual machine using the following command:

wget https://raw.githubusercontent.com/cseanburns/linux_sysadmin/master/data/operating-systems.csv

The file contains a list of operating systems (column one), their software license (column two), and the year the OSes were released (column three). We can use the cat command to view the entire contents of this small file:

Command:

cat operating-systems.csv

Output:

Chrome OS, Proprietary, 2009
FreeBSD, BSD, 1993
Linux, GPL, 1991
iOS, Proprietary, 2007
macOS, Proprietary, 2001
Windows NT, Proprietary, 1993
Android, Apache, 2008

It's a small file, but we might want the line and word count of the file. To acquire that, we can use the wc (word count) command. By itself, the wc command will print the number of lines, words, and bytes of a file. The following output states that the file contains seven lines, 23 words, and 165 bytes:

Command:

wc operating-systems.csv

Output:

  7  23 165 operating-systems.csv

We can use the head command to output the first ten lines of a file. Since our file is only seven lines long, we can use the -n option to change the default number of lines. In the following example, I print the first three lines of the file:

Command:

head -n3 operating-systems.csv

Output:

Chrome OS, Proprietary, 2009
FreeBSD, BSD, 1993
Linux, GPL, 1991

Using the cut command, we can select data from file. In the first example, I want to select column two (or field two), which contains the license information. Since this is a CSV file, the fields (aka, columns) are separated by commas. Therefore I use -d option to instruct the cut command to use commas as the separating character. The -f option tells the cut command to select field two. Note that a CSV file may use other characters as the separator character, like the Tab character or a colon. In such cases, it may still be called a CSV file but you might also see .dat files for data files or other variations.

Command:

cut -d"," -f2 operating-systems.csv

Output:

 Proprietary
 BSD
 GPL
 Proprietary
 Proprietary
 Proprietary
 Apache

From there it's trivial to select a different column. In the next example, I select column three to get the release year:

Command:

cut -d"," -f3 operating-systems.csv

Output:

A genius aspect of the Linux (and Unix) commandline is the ability to pipe and redirect output from one program to another program. Output can be further directed to a file. By stringing together multiple programs in this way, we can create small programs that do much more than the simple programs that compose them.

For example, in the following example, I use the pipe operators to send the output of the cut command to the sort command. This sorts the data in alphabetical or numerical order, depending on the character type (lexical or numerical). I then pipe that output to the uniq command, which removes duplicate rows. Finally, I redirect that final output to a new file titled os-years.csv. Since the year 1993 appears twice in the original file, it only appears once in the output because the uniq command removed the duplicate:

Command:

cut -d"," -f3 operating-systems.csv | sort | uniq > os-years.csv

Output:

cat os-years.csv
 1991
 1993
 2001
 2007
 2008
 2009

Data files like this often have a header line at the top row that names the data columns. It's useful to know how to work with such files, so let's add a header row to the top of the file. In this example, I'll use the sed command, which we will learn more about in the next lesson. For now, we use sed with the option -i to edit the file, then 1i instructs sed to insert text at line 1. \OS, License, Year is the text that we want inserted at line 1. We wrap the argument within single quotes:

Command:

sed -i '1i \OS, License, Year' operating-systems.csv
cat operating-systems.csv

Output:

OS, License, Year
Chrome OS, Proprietary, 2009
FreeBSD, BSD, 1993
Linux, GPL, 1991
iOS, Proprietary, 2007
macOS, Proprietary, 2001
Windows NT, Proprietary, 1993
Android, Apache, 2008

I added the header row just to demonstrate how to remove it when processing files with header rows. Say we want the license field data, but we need to remove that first line. In this case, we can use the tail command:

Command:

tail -n +2 operating-systems.csv | cut -d"," -f2 | sort | uniq > license-data.csv
cat license-data.csv

Output:

 Apache
 BSD
 GPL
 Proprietary

The tail command generally outputs the last lines of a file, but the -n +2 option is special. It makes the tail command output a file starting at the second line. We could specify a different number in order to start output at a different line. See man tail for more information.

Conclusion

In this lesson, we learned how to process and make sense of data held in a text file. We used some commands that let us select, sort, de-duplicate, redirect, and view data in different ways. Our data file was a small one, but these are powerful and useful command and operators that would make sense of large data file.

The commands we used in this lesson include:

cat : concatenate files and print on the standard output
cut : remove sections from each line of files
head : output the first part of files
sort : sort lines of text files
tail : output the last part of files
uniq : report or omit repeat lines
wc : print newline, word, and byte counts for each file

We also used two types of operators, the pipe and the redirect:

| : redirect standard output command1 to standard input of command2
> : redirect to standard output to a file, overwriting

Text Processing: Part 2

In this section, we will cover:

Expanding the toolbox: This section introduces more powerful text processing utilities: grep, sed, and awk, which are essential for advanced pattern matching, filtering, and text manipulation on the Linux command line.
grep for pattern matching: The grep command allows you to search for patterns in files and output matching lines. You can perform case-insensitive searches, invert matches, count occurrences, and use regular expressions to refine your searches.
sed for stream editing: sed is a non-interactive text editor designed for filtering and transforming text. You can delete, replace, and manipulate specific lines in a file, making it a powerful tool for batch text processing tasks.
awk for structured data: awk is a complete scripting language for pattern scanning and processing columns of structured data. It can handle arithmetic, generate reports, and perform logical comparisons, making it ideal for working with CSVs and other structured text files.
Efficiency with one-liners: The combination of grep, sed, and awk allows for creating powerful one-liner commands to process large amounts of text quickly, reducing the need for more complex scripts.
Regular expressions are key: Regular expressions (regex) play a significant role in refining searches and manipulations in both grep and sed. Understanding basic regex patterns, such as ^ for line start and $ for line end, is crucial for effective text processing.
Integration with other tools: Like the tools introduced in Part 1, grep, sed, and awk integrate well with pipes and redirection, allowing you to chain them with other commands for flexible and efficient text workflows.

Getting Started

In the last section, we covered the cat, cut, head, sort, tail, uniq, and wc utilities.

We also learned about the | pipe operator. The pipe operator is used to redirect standard output from one command to a second command. Then the second command can process the output from the first command. An example is:

sort file.txt | uniq

This sorts the lines in a file named file.txt and then prints to standard output only the unique lines Note that files must be sorted before piped to uniq.

We learned about the > and >> redirect operators. They work like the pipe operator, but instead of directing output to a new command, they direct output to a file for saving. As a reminder, the single redirect > overwrites a file or creates a file if it does not exist. The double redirect >> appends to a file or creates a file if it does not exist. It's safer to use the double redirect, but if you are processing large amounts of data, it could also mean creating large files really quickly. If that gets out of hand, then you might crash your system. To build on our prior example, we can add >> to send the output to a new file called output.txt:

sort file.txt | uniq >> output.txt

We have available more powerful utilities and programs to process, manipulate, and analyze text files. In this section, we will cover the following three of these:

grep : print lines that match patterns
sed : stream editor for filtering and transforming text
awk : pattern scanning and text processing language

Grep

The grep command is one of my most often used commands. Basically, grep "prints lines that match patterns" (see man grep). In other words, it's search, and it's super powerful.

grep works line by line. So when we use it to search a file for a string of text, it will return the whole line that matches the string. This line by line idea is part of the history of Unix-like operating systems. It's super important to remember that most utilities and programs that we use on the commandline are line oriented.

"A string is any series of characters that are interpreted literally by a script. For example, 'hello world' and 'LKJH019283' are both examples of strings." -- Computer Hope. More generally, it's the literal characters that we type. It's data.

Let's consider the file operating-systems.csv, as seen below:

OS, License, Year
Chrome OS, Proprietary, 2009
FreeBSD, BSD, 1993
Linux, GPL, 1991
macOS, Proprietary, 2001
Windows NT, Proprietary, 1993
Android, Apache, 2008

If we want to search for the string Chrome, we can use grep. Notice that even though the string Chrome only appears once, and in one part of a line, grep returns the entire line.

Command:

grep "Chrome" operating-systems.csv

Output:

Chrome OS, Proprietary, 2009

Be aware that, by default, grep is case-sensitive, which means a search for the string chrome, with a lower case c, would return no results. Fortunately, grep has an -i option, which means to ignore the case of the search string. In the following examples, grep returns nothing in the first search since we do not capitalize the string chrome. However, adding the -i option results in success:

Command:

grep "chrome" operating-systems.csv

Output:

None.

Command:

grep -i "chrome" operating-systems.csv

Output:

Chrome OS, Proprietary, 2009

We can also search for lines that do not match our string using the -v option. We can combine that with the -i option to ignore the string's case. Therefore, in the following example, all lines that do not contain the string chrome are returned:

Command:

grep -vi "chrome" operating-systems.csv

Output:

FreeBSD, BSD, 1993
Linux, GPL, 1991
iOS, Proprietary, 2007
macOS, Proprietary, 2001
Windows NT, Proprietary, 1993
Android, Apache, 2008

I used the tail command in the prior section to show how we might use tail to remove the header (1st line) line in a file. However, it's an odd use of the tail command, which normally just prints the last lines of a file. Instead, we can use grep to remove the first line. To do so, we use what's called a regular expression, or regex for short. Regex is a method used to identify patterns in text via abstractions. They can get complicated, but we can use some easy regex methods.

Let's use a version of the above file with the header line:

Command:

cat operating-systems.csv

Output:

OS, License, Year
Chrome OS, Proprietary, 2009
FreeBSD, BSD, 1993
Linux, GPL, 1991
iOS, Proprietary, 2007
macOS, Proprietary, 2001
Windows NT, Proprietary, 1993
Android, Apache, 2008

To use grep to remove the first line of a file, we can invert our search to select all lines not matching "OS" at the start of a line. Here the carat key ^ is a regex indicating the start of a line. Again, this grep command returns all lines that do not match the string os at the start of a line, ignoring case:

Command:

grep -vi "^os" operating-systems.csv

Output:

Chrome OS, Proprietary, 2009
FreeBSD, BSD, 1993
Linux, GPL, 1991
iOS, Proprietary, 2007
macOS, Proprietary, 2001
Windows NT, Proprietary, 1993
Android, Apache, 2008

Alternatively, since we know that the string Year comes at the end of the first line, we can use grep to invert a search for that. Here the dollar sign key $ is a regex indicating the end of a line. Like the above, this grep command returns all lines that do not match the string year at the end of a line, ignoring case:

Command:

grep -vi "year$" operating-systems.csv

Output:

Chrome OS, Proprietary, 2009
FreeBSD, BSD, 1993
Linux, GPL, 1991
iOS, Proprietary, 2007
macOS, Proprietary, 2001
Windows NT, Proprietary, 1993
Android, Apache, 2008

The man grep page lists other options, but a couple of other good ones include:

Get a count of the matching lines with the -c option:

Command:

grep -ic "proprietary" operating-systems.csv

Output:

Print only the match and not the whole line with the -o option:

Command:

grep -io "proprietary" operating-systems.csv

Output:

Proprietary
Proprietary
Proprietary
Proprietary

We can simulate a Boolean OR search, and print lines matching one or both strings using the -E option. We separate the strings with a vertical bar |. This is similar to a Boolean OR search since there's at least one match in the following string, there is at least one result.

Here is an example where only one string returns a true value:

Command:

grep -Ei "bsd|atari" operating-systems.csv

Output:

FreeBSD, BSD, 1993

Here's an example where both strings evaluate to true:

Command:

grep -Ei "bsd|gpl" operating-systems.csv

Output:

FreeBSD, BSD, 1993
Linux, GPL, 1991

By default, grep will return results where the string appears within a larger word, like OS in macOS.

Command:

grep -i "os" operating-systems.csv

Output:

OS, License, Year
Chrome OS, Proprietary, 2009
iOS, Proprietary, 2007
macOS, Proprietary, 2001

However, we might want to limit results so that we only return results where OS is a complete word. To do that, we can surround the string with special characters:

Command:

grep -i "\<os\>" operating-systems.csv

Output:

OS, License, Year
Chrome OS, Proprietary, 2009

Sometimes we want the context for a result. That is, we might want to print lines that surround our matches. For example, print the matching line plus the two lines after the matching line using the -A NUM option:

Command:

grep -i "chrome" -A 2 operating-systems.csv

Output:

Chrome OS, Proprietary, 2009
FreeBSD, BSD, 1993
Linux, GPL, 1991

Or, print the matching line plus the two lines before the matching line using the -B NUM option:

Command

grep -i "android" -B 2 operating-systems.csv

Output:

macOS, Proprietary, 2001
Windows NT, Proprietary, 1993
Android, Apache, 2008

We can combine many of the variations. Here I search for the whole word BSD, case insensitive, and print the line before and the line after the match:

Command:

grep -i -A 1 -B 1 "\<bsd\>" operating-systems.csv

Output:

Chrome OS, Proprietary, 2009
FreeBSD, BSD, 1993
Linux, GPL, 1991

grep is very powerful, and there are more options listed in its man page.

Note that I enclose my search strings in double quotes. For example: grep "search string" filename.txt It's not always required to enclose a search string in double quotes, but it's good practice because if your string contains more than one word or empty spaces, the search will fail.

Sed

sed is a type of non-interactive text editor that filters and transforms text (man sed). By default sed works on standard output, and edits can be redirected (> or >>) to new files or made in-place using the -i option.

Like the other utilities and programs we've covered, including grep, sed works line by line. But unlike grep, sed provides a way to address specific lines or ranges of lines, and then run filters or transformations on those lines. Once lines in a text file have been identified or addressed, sed offers commands to filter or transform the text at those specific lines.

This concept of the line address is important, but not all text files are explicitly line numbered. Below I use the nl command to number lines in our file, even though the contents of the file do not actually display line numbers:

Command:

nl operating-systems.csv

Output:

     1	OS, License, Year
     2	Chrome OS, Proprietary, 2009
     3	FreeBSD, BSD, 1993
     4	Linux, GPL, 1991
     5	iOS, Proprietary, 2007
     6	macOS, Proprietary, 2001
     7	Windows NT, Proprietary, 1993
     8	Android, Apache, 2008

After we've identified the lines in a file that we want to edit, sed offers commands to filter, transform, or edit the text at the line addresses. Some of these commands include:

a : appending text
c : replace text
d : delete text
i : inserting text
p : print text
r : append text from file
s : substitute text
= : print the current line number

Let's see how to use sed to print line numbers instead of using the nl command. To do so, we use the equal sign = to identify line numbers. Note that it places the line numbers just above each line:

Command:

sed '=' operating-systems.csv

Output:

1
OS, License, Year
2
Chrome OS, Proprietary, 2009
3
FreeBSD, BSD, 1993
4
Linux, GPL, 1991
5
iOS, Proprietary, 2007
6
macOS, Proprietary, 2001
7
Windows NT, Proprietary, 1993
8
Android, Apache, 2008

In the last section, we used the tail command to remove the header line of our file. Above, we used grep to accomplish this task. It's much easier to use sed to remove the header line of the operating-systems.csv. We simply specify the line number (1) and then use the delete command (d). Thus, we delete line 1 with the command below. As long as we don't use the -i operator, we don't change the actual file.

Command:

sed '1d' operating-systems.csv

Output:

Chrome OS, Proprietary, 2009
FreeBSD, BSD, 1993
Linux, GPL, 1991
iOS, Proprietary, 2007
macOS, Proprietary, 2001
Windows NT, Proprietary, 1993
Android, Apache, 2008

Note that I use single apostrophes for the sed command. This is required.

If I wanted to make that a permanent deletion, then I would use the -i option, which means that I would edit the file in-place (see man sed):

Command:

sed -i '1d' operating-systems.csv

To refer to line ranges, I add a comma between addresses. Therefore, to edit lines 1, 2, and 3:

Command:

sed '1,3d' operating-systems.csv

Output:

Linux, GPL, 1991
iOS, Proprietary, 2007
macOS, Proprietary, 2001
Windows NT, Proprietary, 1993
Android, Apache, 2008

I can use sed to find and replace strings. The syntax for this is:

sed 's/regexp/replacement/' filename.txt

The regexp part of the above command is where I place regular expressions. Simple strings like words work here, too, since they are treated as regular expressions themselves.

In the next example, I use sed to search for the string "Linux", and replace it with the string "GNU/Linux":

Command:

sed 's/Linux/GNU\/Linux/' operating-systems.csv

Output:

OS, License, Year
Chrome OS, Proprietary, 2009
FreeBSD, BSD, 1993
GNU/Linux, GPL, 1991
iOS, Proprietary, 2007
macOS, Proprietary, 2001
Windows NT, Proprietary, 1993
Android, Apache, 2008

Because the string GNU/Linux contains a forward slash, and because sed uses the forward slash as a separator, note that I escaped the forward slash with a back slash. This escape tells sed to interpret the forward slash in GNU/Linux literally and not as a special sed character.

If we want to add new rows to the file, we can append a or insert i text after or at specific lines:

To append text after line 3, use a:

Command:

sed '3a FreeDOS, GPL, 1998' operating-systems.csv

Output:

OS, License, Year
Chrome OS, Proprietary, 2009
FreeBSD, BSD, 1993
FreeDOS, GPL, 1998
Linux, GPL, 1991
iOS, Proprietary, 2007
macOS, Proprietary, 2001
Windows NT, Proprietary, 1993
Android, Apache, 2008

To insert at line 3, use i:

Command:

sed '3i CP\/M, Proprietary, 1974' operating-systems.csv

Output:

OS, License, Year
Chrome OS, Proprietary, 2009
CP/M, Proprietary, 1974
FreeBSD, BSD, 1993
Linux, GPL, 1991
iOS, Proprietary, 2007
macOS, Proprietary, 2001
Windows NT, Proprietary, 1993
Android, Apache, 2008

Note that the FreeDOS line doesn't appear in the last output. This is because I didn't use the -i option nor did I redirect output to a new file. If we want to edit the file in-place, that is, save the edits, then the commands would look like so:

sed -i '3a FreeDOS, GPL, 1998' operating-systems.csv
sed -i '3i CP\/M, Proprietary, 1974' operating-systems.csv

Instead of using line numbers to specify addresses in a text file, we can use regular expressions as addresses, which may be simple words. In the following example, I use the regular expression 1991$ instead of specifying line 4. The regular expression 1991$ means lines ending with the string 1991. Then I use the s command to start a find and replace. sed finds the string Linux and then replaces that with the string GNU/Linux. I use the back slash to escape the forward slash in GNU/Linux:

Command:

sed '/1991$/s/Linux/GNU\/Linux/' operating-systems.csv

Output:

OS, License, Year
Chrome OS, Proprietary, 2009
FreeBSD, BSD, 1993
GNU/Linux, GPL, 1991
iOS, Proprietary, 2007
macOS, Proprietary, 2001
Windows NT, Proprietary, 1993
Android, Apache, 2008

Here's an example using sed to simply search for a pattern. In this example, I'm interested in searching for all operating systems that were released on or after 2000:

Command:

sed -n '/20/p' operating-systems.csv

Output:

Chrome OS, Proprietary, 2009
iOS, Proprietary, 2007
macOS, Proprietary, 2001
Android, Apache, 2008

The above would be equivalent to:

grep "20" operating-systems.csv

sed is much more powerful than what I've demonstrated here, and if you're interested in learning more, there are lots of tutorials on the web. Here are a few good ones:

Awk

awk is a complete scripting language designed for "pattern scanning and processing" text. It generally performs some action when it detects some pattern and is particularly suited for columns of structured data. See man awkfor documentation..

awk works on columns regardless if the contents include structured data (like a CSV file) or not (like a letter or essay). If the data is structured, then that means the data will be formatted in some way. In the last few sections, we have looked at a CSV file. This is structured data because the data points in this file are separated by commas.

For awk to work with columns in a file, it needs some way to refer to those columns. In the examples below, we'll see that columns in a text file are referred to by a dollar sign and then the number of the column $n. So, $1 indicates column one, $2 indicates column two, and so on. If we use $0, then we refer to the entire file. In our example text file, $1 indicates the OS Name column, $2 indicates the License column, $3 indicates the release Year column, and $0 indicates all columns.

The syntax for awk is a little different than what we've seen so far. Basically, awk uses the following syntax, where pattern is optional.

awk pattern { action statements }

Let's see some examples.

To print the first column of our file, we do not need the pattern part of the command but only need to state an action statement within curly braces. In the command below, the action statement is '{ print $1 }'.

Command:

awk '{ print $1 }' operating-systems.csv

Output:

OS,
Chrome
FreeBSD,
Linux,
iOS,
macOS,
Windows
Android,

By default, awk considers the first empty space as the field delimiter. That's why in the command above only the term Windows and Chrome appear in the results even though it should be Windows NT and Chrome OS. It's also why we see commas in the output. To fix this, we tell awk to use a comma as the field separator, instead of the default empty space. To specify that we want awk to treat the comma as a field delimiter, we use the -F option, and we surround the comma with single quotes:

Command:

awk -F',' '{ print $1 }' operating-systems.csv

Output:

OS
Chrome OS
FreeBSD
Linux
iOS
macOS
Windows NT
Android

By specifying the comma as the field separator, our results are more accurate, and the commas no longer appear either.

Like grep and sed, awk can do search. In this next example, I print the column containing the string Linux. Here I am using the pattern part of the command: '/Linux/'.

Command:

awk -F',' '/Linux/ { print $1 }' operating-systems.csv

Output:

Linux

Note how awk does not return the whole line but only the match.

With awk, we can retrieve more than one column, and we can use awk to generate reports. This was part of the original motivation to create this language.

In the next example, I select columns two and one in that order, which is something the cut command cannot do. I also add a space between the columns using the double quotes to surround an empty space. Then I modified the field delimiter to include both a comma and a space to get the output that I want:

Command:

awk -F', ' '{ print $2 " " $1 }' operating-systems.csv

Output:

License OS
Proprietary Chrome OS
BSD FreeBSD
GPL Linux
Proprietary iOS
Proprietary macOS
Proprietary Windows NT
Apache Android

I can make output more readable by adding text to print:

Command:

awk -F',' '{ print $1 " was released in" $3 "." }' operating-systems.csv

Output:

OS was released in Year.
Chrome OS was released in 2009.
FreeBSD was released in 1993.
Linux was released in 1991.
iOS was released in 2007.
macOS was released in 2001.
Windows NT was released in 1993.
Android was released in 2008.

Since awk is a full-fledged programming language, it understands data structures, which means it can do math or work on strings of text. Let's illustrate this by doing some math or logic on column 3.

Here I print all of column three:

Command:

awk -F',' '{ print $3 }' operating-systems.csv

Output:

Next I print only the parts of column three that are greater than 2005, and then pipe | the output through the sort command to sort the numbers in numeric order:

Command:

awk -F',' '$3 > 2005 { print $3 }' operating-systems.csv | sort

Output:

 2007
 2008
 2009

If I want to print only the parts of column one where column three equals to 2007, then I would run this command:

Command:

awk -F',' '$3 == 2007 { print $1 }' operating-systems.csv

Output:

iOS

If I want to print only the parts of columns one and three where column 3 equals 2007:

Command:

awk -F',' '$3 == 2007 { print $1 $3 }' operating-systems.csv

Output:

iOS 2007

Or, print the entire line where column three equals 2007:

Command:

awk -F',' '$3 == 2007 { print $0 }' operating-systems.csv

Output:

iOS, Proprietary, 2007

I can print only those lines where column three is greater than 2000 and less than 2008:

Command:

awk -F',' '$3 > 2000 && $3 < 2008 { print $0 }' operating-systems.csv

Output:

iOS, Proprietary, 2007
macOS, Proprietary, 2001

Even though we wouldn't normally sum years, let's print the sum of column three to demonstrate how summing works in awk:

Command:

awk -F',' 'sum += $3 { print sum }' operating-systems.csv

Output:

Here are a few basic string operations. First, print column one in upper case:

Command:

awk -F',' '{ print toupper($1) }' operating-systems.csv

Output:

OS
CHROME OS
FREEBSD
LINUX
IOS
MACOS
WINDOWS NT
ANDROID

Or print column on in lower case:

Command:

awk -F',' '{ print tolower($1) }' operating-systems.csv

Output:

os
chrome os
freebsd
linux
ios
macos
windows nt
android

Or, get the length of each string in column one:

Command:

awk -F',' '{ print length($1) }' operating-systems.csv

Output:

We can add additional logic. The double ampersands && indicate a Boolean/Logical AND. The exclamation point ! indicates a Boolean/Logical NOT. In the next example, I print only those lines where column three is greater than 1990, and the line has the string "BSD" in it:

Command:

awk -F',' '$3 > 1990 && /BSD/ { print $0 }' operating-systems.csv

Output:

FreeBSD, BSD, 1993

Now I reverse that, and print only those lines where column three is greater than 1990 and the line DOES NOT have the string "BSD" in it:

Command:

awk -F',' '$3 > 1990 && !/BSD/ { print $0 }' operating-systems.csv

Output:

Chrome OS, Proprietary, 2009
Linux, GPL, 1991
iOS, Proprietary, 2007
macOS, Proprietary, 2001
Windows NT, Proprietary, 1993
Android, Apache, 2008

The double vertical bar || indicates a Boolean/Logical OR. The next command prints only those lines that contain the string "Proprietary" or the string "Apache", or it would print both if both strings were in the text:

Command:

awk -F',' '/Proprietary/ || /Apache/ { print $0 }' operating-systems.csv

Output:

Chrome OS, Proprietary, 2009
iOS, Proprietary, 2007
macOS, Proprietary, 2001
Windows NT, Proprietary, 1993
Android, Apache, 2008

I can take advantage of regular expressions. If I needed to analyze a large file and wasn't sure that some fields would be upper or lower case, then I could use regular expressions to consider both possibilities. That is, by adding [pP] and [aA], awk will check for both the words Proprietary and proprietary, and Apache and apache.

Command:

awk -F',' '/[pP]roprietary/ || /[aA]pache/ { print $0 }' operating-systems.csv

Output:

Chrome OS, Proprietary, 2009
iOS, Proprietary, 2007
macOS, Proprietary, 2001
Windows NT, Proprietary, 1993
Android, Apache, 2008

awk is full-fledged programming language. It provides conditionals, control structures, variables, etc., and so I've only scratched the surface. If you're interested in learning more, then check out some of these tutorials:

Conclusion

The Linux command line offers a lot of utilities to examine data. Prior to this lesson, we covered a few of them that helped us get parts of a file and then pipe those parts through other commands or redirect output to files. We can use pipes and redirects with grep, sed, and awk. If needed, we may be able to avoid using the basic utilities like cut, wc, etc if want to learn more powerful programs like grep, sed, and awk.

It's fun to learn and practice these. Despite this, you do not have to become a sed or an awk programmer. Like the utilities that we've discussed in prior lectures, the power of programs like these is that they are easy to use as one-liners. If you want to get started, the resources listed above can guide you.

Regular Expressions with `grep`

By the end of this section, you will:

Understand the purpose of grep: Recognize the versatility of grep for searching through text and its use in filtering output, searching for patterns in files, and extracting relevant data.
Perform basic searches using grep: Search for multiword strings and whole words while understanding how to handle case sensitivity and word boundaries.
Utilize regular expressions: Apply regular expressions with grep to search for more complex text patterns, using features like bracket expressions, character classes, and anchoring.
Leverage repetition and OR operators: Use repetition operators (e.g., *, +) and Boolean OR searches to find repetitive patterns or multiple possible matches in your text.
Compare outputs with process substitution: Understand how to compare the output of multiple grep commands using process substitution techniques with the diff command.
Understand broader applications: Gain a foundational understanding of regular expressions that apply across multiple programming languages and tools beyond just grep.

Getting Started

The grep command is a powerful tool used in the Linux command line for searching through text. It scans files or input for lines that match a specified pattern, which can be a simple word or a more complex regular expression. grep is often used to filter output, search for specific data in logs, or find occurrences of certain text patterns within files. Its versatility makes it an essential tool for efficiently locating information in large sets of data or documents.

In this section, we learn how to use grep to search files. We will use simple search strings with grep to search for regular words. But we will use regular expressions to search for more complex patterns.

Download Data File

To follow along in this tutorial, download the following file to your home directory on your Google Cloud VM:

wget https://raw.githubusercontent.com/cseanburns/linux_sysadmin/refs/heads/master/data/cities.md

Multiword strings

It's good habit to include search strings within quotes, but this is especially important if we would search for multiword strings. In these cases, we must enclose them in quotes.

Command:

cat cities.md

Output:

| City              | 2020 Census | Founded |
|-------------------|-------------|---------|
| New York City, NY | 8804190     | 1624    |
| Los Angeles, CA   | 3898747     | 1781    |
| Chicago, IL       | 2746388     | 1780    |
| Houston, TX       | 2304580     | 1837    |
| Phoenix, AZ       | 1624569     | 1881    |
| Philadelphia, PA  | 1576251     | 1701    |
| San Antonio, TX   | 1451853     | 1718    |
| San Diego, CA     | 1381611     | 1769    |
| Dallas, TX        | 1288457     | 1856    |
| San Jose, CA      | 983489      | 1777    |

Command:

grep "San Antonio" cities.md

Output:

| San Antonio, TX | 1451853 | 1718 |

Whole words, case sensitive by default

As a reminder, grep commands are case-sensitive by default. Thus, note that the contents of cities.md are all in lowercase. If I run the above command without the city named capitalized, then grep will return nothing:

Command:

grep "san antonio" cities.md

To tell grep to ignore case, I need to use the -i option. We also want to make sure that we enclose our entire search string withing double quotes.

This is a reminder for you to run man grep and to read through the documentation and see what the various options exit for this command.

Command:

grep -i "san antonio" cities.md

Output:

| San Antonio, TX | 1451853 | 1718 |

Whole words by the edges

To search whole words, we can use special characters to match strings at the start and/or the end of words. For example, note the output if I search for cities in California in my file by searching for the string ca. Since this string appears in Chicago, then that city matches my grep search:

Command:

grep -i "ca" cities.md

Output:

| Los Angeles, CA | 3898747 | 1781 |
| Chicago, IL     | 2746388 | 1780 |
| San Diego, CA   | 1381611 | 1769 |
| San Jose, CA    | 983489  | 1777 |

To limit results to only CA, we can enclose our search in special characters that tell grep to limit by whole words only:

Command:

grep -i "\bca\b" cities.md

Output:

| Los Angeles, CA | 3898747 | 1781 |
| San Diego, CA   | 1381611 | 1769 |
| San Jose, CA    | 983489  | 1777 |

Note: in some cases you might need an extra backslash: grep -i "\\bca\\b" cities.md.

We can reverse that output and look for strings within other words. Here is an example of searching for the string ca within words:

Command:

grep -i "\Bca\B" cities.md

Output:

| Chicago, IL | 2746388 | 1780 |

Bracket Expressions and Character Classes

In conjunction with the grep command, we can also use regular expressions to search for more general patterns in text files. For example, we can use bracket expressions and character classes to search for patterns in the text. Here again using man grep is very important because it includes instructions on how to use these regular expressions.

Bracket expressions

From man grep on bracket expressions:

A bracket expression is a list of characters enclosed by [ and ]. It matches any single character in that list. If the first character of the list is the caret ^ then it matches any character not in the list. For example, the regular expression [0123456789] matches any single digit.

The regular expression [^0123456789] matches the inverse.

Within a bracket expression, a range expression consists of two characters separated by a hyphen. It matches any single character that sorts between the two characters.

To see how this works, let's search the cities.md file for letters matching A, B, or C. Specifically, in the following command I use a hyphen to match any characters in the range A, B, C. The output does not include the cities Houston or Dallas since neither of those lines contain capital A, B, or C characters:

Command:

grep "[A-C]" cities.md

Output:

| City              | 2020 Census | Founded |
| New York City, NY | 8804190     | 1624    |
| Los Angeles, CA   | 3898747     | 1781    |
| Chicago, IL       | 2746388     | 1780    |
| Phoenix, AZ       | 1624569     | 1881    |
| Philadelphia, PA  | 1576251     | 1701    |
| San Antonio, TX   | 1451853     | 1718    |
| San Diego, CA     | 1381611     | 1769    |
| San Jose, CA      | 983489      | 1777    |

Note: Use grep -i "[A-C]" cities.md for a case insensitive search.

Bracket expressions, inverse searches

When placed after the first bracket, the carat key acts as a Boolean NOT. The following command matches any characters not in the range A,B,C:

Command:

grep "[^A-C]" cities.md

The output matches all lines since there are no instances of A, B, and C in all lines:

Output:

| City              | 2020 Census | Founded |
|-------------------|-------------|---------|
| New York City, NY | 8804190     | 1624    |
| Los Angeles, CA   | 3898747     | 1781    |
| Chicago, IL       | 2746388     | 1780    |
| Houston, TX       | 2304580     | 1837    |
| Phoenix, AZ       | 1624569     | 1881    |
| Philadelphia, PA  | 1576251     | 1701    |
| San Antonio, TX   | 1451853     | 1718    |
| San Diego, CA     | 1381611     | 1769    |
| Dallas, TX        | 1288457     | 1856    |
| San Jose, CA      | 983489      | 1777    |

Process substitution

Process substitution allows you to use the output of a command as if it were a file. This is particularly useful when you want to compare the outputs of two commands directly, without having to save them to temporary files.

For example, we can confirm that output from one command does not include Houston or Dallas in a second command by comparing the outputs. Specifically, we compare the outputs of two or more commands using process substitution. This works because the process substitution creates temporary files from the outputs.

Command:

diff <(grep "[A-C]" cities.md) <(grep "[^A-C]" cities.md)

Output:

1a2
> |-----------------|-------------|------|
4a6
> | Houston, TX     | 2304580     | 1837 |
8a11
> Dallas, TX        | 1288457     | 1856

How It Works

<(command) creates a temporary file (or file-like stream) that holds the output of command.
diff can then read from these streams as if they were regular files, comparing their contents without needing you to manually save and load files. The output of the diff command is nicely explained in this Stack Overflow answer.

Without process substitution, you would need to save the outputs of both grep commands to temporary files and then compare them:

grep "[A-C]" cities.md > output1.txt
grep "[^A-C]" cities.md > output2.txt
diff output1.txt output2.txt

This alternative works but is more cumbersome, as it requires managing temporary files. Process substitution simplifies the process by handling this behind the scenes.

Try this command for an alternate output:

diff -y <(grep "[A-C]" cities.md) <(grep "[^A-C]" cities.md)

Our ranges may be alphabetical or numerical. The following command matches any numbers in the range 1,2,3:

Command:

grep "[1-3]" cities.md

Since all single digits appear in the file, the above command returns all lines. To invert the search, we can use the following grep command. This will match all non-integers:

Command:

grep "[^0-9]" cities.md

Bracket expressions, carat preceding the bracket

We saw in a previous section that the carat ^ key indicates the start of line. However, we learned above that it can be used to return the inverse of a string in special circumstances. To use the carat to signify the start of a line, the carat key must precede the opening bracket. For example, the following command matches any lines that start with the upper case letters within the range of N,O,P:

Command:

grep "^| [N-P]" cities.md

Output:

New York City, NY | 8804190 | 1624
Phoenix, AZ       | 1624569 | 1881
Philadelphia, PA  | 1576251 | 1701

And we can reverse that with the following command, which returns all lines that do not start with N,O, or P:

Command:

grep "^| [^N-P]" cities.md

Output:

| City            | 2020 Census | Founded |
| Los Angeles, CA | 3898747     | 1781    |
| Chicago, IL     | 2746388     | 1780    |
| Houston, TX     | 2304580     | 1837    |
| San Antonio, TX | 1451853     | 1718    |
| San Diego, CA   | 1381611     | 1769    |
| Dallas, TX      | 1288457     | 1856    |
| San Jose, CA    | 983489      | 1777    |

Character classes

Character classes are special types of predefined bracket expressions. They make it easy to search for general patterns. From man grep on character classes:

Finally, certain named classes of characters are predefined within bracket expressions, as follows. Their names are self explanatory, and they are [:alnum:], [:alpha:], [:blank:], [:cntrl:], [:digit:], [:graph:], [:lower:], [:print:], [:punct:], [:space:], [:upper:], and [:xdigit:]. For example, [[:alnum:]] means the character class of numbers and letters ...

Below I use the awk command to select the fourth column (or field) using the pipe as the field delimiter. I pipe the output to grep to select lines containing a vertical bar and four digit numbers [[:digit:]]{4} from the results of the awk command:

Command:

awk -F"|" '{ print $4 }' cities.md | grep -Eo "[[:digit:]]{4}"

Output:

I first tested that the awk command selects the appropriate field by running it by itself: awk -F"|" '{ print $4 }' cities.md.

Anchoring

As seen above, outside of bracket expressions and character classes, we use the caret ^ to mark the beginning of a line. We can also use the $ to match the end of a line. Using either (or both) is called anchoring. Anchoring works in many places. For example, to search all lines that start with capital D through L

Command:

grep "^| [D-L]" cities.md

Output:

| Los Angeles, CA | 3898747 | 1781 |
| Houston, TX     | 2304580 | 1837 |
| Dallas, TX      | 1288457 | 1856 |

To show how to anchor the end of a line, let's look at the operating-systems.csv file.

Command:

grep "1993$" operating-systems.csv

Output:

FreeBSD, BSD, 1993
Windows NT, Proprietary, 1993

We can use both anchors in our grep commands. The following searches for any lines starting with capital letters that range from C through F. Then any lines ending with the numbers starting from 3 through 6. The single dot stands for any character, and the asterisk stands for "the preceding character will zero or more times" (man grep).

Command:

grep "^[C-F].*[3-6]$" operating-systems.csv

Output:

CP/M, Proprietary, 1974
FreeBSD, BSD, 1993

Repetition

If we want to use regular expressions to identify repetitive patterns, then we can use repetition operators. As we saw above, the most useful one is the * asterisk. But there are other options:

In come cases, we need to add the -E option to extend grep's regular expression functionality:

Here, the preceding item S is matched one or more times:

Command:

grep -E "S+" cities.md

Output:

| San Antonio, TX | 1451853 | 1718 |
| San Diego, CA   | 1381611 | 1769 |
| San Jose, CA    | 983489  | 1777 |

In the next search, the preceding item l is matched exactly 2 times:

Command:

grep -E "l{2}" cities.md

Output:

| Dallas, TX | 1288457 | 1856 |

Finally, in this example, the preceding item 7 is matched at least two times or at most three times:

Command:

grep -E "7{2,3}" cities.md

Output:

| San Jose, CA | 983489 | 1777 |

OR searches

We can use the vertical bar | to do a Boolean OR search. In a Boolean OR statement, the statement is True if either one part is true, the other part is true, or both are true. In a search statement, this means that at least one part of the search is true.

The following will return lines for each city because they both appear in the file:

Command:

grep -E "San Antonio|Dallas" cities.md

Output:

| San Antonio, TX | 1451853 | 1718 |
| Dallas, TX      | 1288457 | 1856 |

The following will match San Antonio even though Lexington does not appear in the file:

Command:

grep -E "San Antonio|Lexington" cities.md

Output:

| San Antonio, TX | 1451853 | 1718 |

Conclusion

We covered a lot in this section on grep and regular expressions.

We specifically covered:

multiword strings
whole word searches and case sensitivity
bracket expressions and character classes
anchoring
repetition
Boolean OR searches

Even though we focused on grep, many these regular expressions work across many programming languages.

See Regular-Expression.info for more in-depth lessons on regular expressions.

Review

Here is a review of commands and concepts that we have covered so far.

Commands

We have covered the following commands so far:

Command	Example	Explanation
tree	tree -dfL 1	List directories, full path, one level
cd	cd ~	change to home directory
	cd /	change to root directory
	cd bin	change to bin directory from current directory
pwd	pwd	print working / current directory
ls	ls ~	list home directory contents
	ls -al	list long format and hidden files in current directory
	ls -dl	list long format the current directory
man	man ls	open manual page for the ls command
	man man	open manual page for the man command
cp	cp * bin/	copy all files in current directory to bin subdir
mv	mv oldname newname	rename file oldname to newname
	mv oldir bin/newdir	move oldman to bin subdir and rename to newdir
rm	rm oldfile	delete file named oldfile
	rm -r olddir	delete directory olddir and its contents
touch	touch newfile	create a file called newfile
	touch oldfile	modify timestamp of file called oldfile
mkdir	mkdir newdir	create a new directory called newdir
rmdir	rmdir newdir	delete directory called newdir if empty
echo	echo "hello"	print "hello" to screen
cat	cat data.csv	print contents of file called data.csv to screen
	cat data1.csv data2.csv	concatenate data1.csv and data2.csv to screen
less	less file	view contents of file called file
sudo	sudo command	run command as superuser
chown	sudo chown root:root file	change owner and group to root of file file
chmod	chmod 640 file	change permissions of file to -rw-r-----
	chmod 775 somedir	change permissions of of somedir to drwxrwxr-x
groups	groups user	print the groups the user is in
wc	wc -l file	print number of lines of file
	wc -w file	print number of words of file
head	head file	print top ten lines of file
	head -n3 file	print top three lines of file
tail	tail file	print bottom ten lines of file
	tail -n3 file	print bottom three lines of file
cut	cut -d"," -f2 data.csv	print second column of file data.csv
sort	sort -n file	sort file by numerical order
	sort -rn file	sort file by reverse numerical order
	sort -df file	sort file by dictionary order and ignore case
uniq	uniq file	report or omit repeated lines in sorted file
	uniq -c file	report count of duplicate lines in sorted file

In addition to the above commands, we also have pipelines using the |. Pipelines send the standard output of one command to a second command (or more). The following command sorts the contents of a file and then sends the output to the uniq command to remove duplicates:

sort file | uniq

Redirection uses the > or the >> to redirect output of a command to a file. A single > will overwrite the contents of a file. A double >> will append to the contents of a file.

Redirect the output of the ls command to a file called dirlist:

ls > dirlist

Append the date to the end of the file dirlist:

date >> dirlist

Paths

I introduced the concept of absolute and relative paths in section 2.3. In this session, the goal is to revisit paths (locations of files and directories in the filesystem), and provide some examples. This will be important as we proceed to Bash scripting and other tasks going forward.

Change Directories

The cd command is used to change directories. When we login to our systems, we will find ourselves in our $HOME directory, which is located at /home/USER.

To change to the root directory, type:

pwd
/home/sean
cd /
pwd
/

From there, to change to the /bin directory:

cd bin
pwd
/bin

To change to the previous working directory:

cd -
pwd
/

To go home quickly, just enter cd by itself:

cd
pwd
/home/sean

To change to the public_html directory:

cd public_html
pwd
/home/sean/public_html

To change to the directory one level up:

cd ..
pwd
cd /home/sean

Make Directories

Sometimes we'll want to create new directories. To do so, we use the mkdir command.

To make a new directory in our $HOME directory:

pwd
/home/sean
mkdir documents
cd documents
pwd
/home/sean/documents
cd
pwd
/home/sean

To make more than one directory at the same time, where the second or additional directories are nested, use the -p option:

mkdir -p photos/2022

Remove or Delete Files and Directories

To remove a file, we use the rm command. If the file is in a subdirectory, specify the relative path:

pwd
/home/sean
rm public_html/index.html

To remove a file in a directory one level up, use the .. notation. For example, if I'm in my documents directory, and I want to delete a file in my home (parent) directory:

cd documents
pwd
/home/sean/documents
rm ../file.txt

Alternatively, I could the tilde as shorthand for $HOME:

rm ~/file.txt

To remove a file nested in multiple subdirectories, just specify the path (absolute or relative).

rm photos/2022/05/22/IMG_2022_05_22.jpg

Remember that the rm command deletes files and directories. Use it with caution, or with the -i option.

Copy Files or Directories

Let's say I want to copy a file in my $HOME directory to a nested directory:

cp file.txt documents/ICT418/homework/

Or, we can copy a file from one subdirectory to another. Here I copy a file in my ~/bin directory to my ~/documentsdirectory. The ~ (tilde) is shorthand for my $HOME directory.

cp ~/bin/file.txt ~/documents/``

Move or Rename Files or Directories

Let's say I downloaded a file to my ~/Downloads directory, and I want to move it to my ~/documents directory:

mv ~/Downloads/article.pdf ~/documents/

Or, let's say we rename it in the process:

mv ~/Downloads/article.pdf ~/documents/article-2022.pdf

We can also move directories. Since the commandline is case-sensitive, let's say I rename the documents directory to Documents:

mv ~/documents ~/Documents

Conclusion

Use this page as a reference to the commands that we have covered so far.

Scripting the Command Line

Up to this point, we've explored a variety of commands available on the Linux command line. We've learned how to work with files and directories, manage permissions and ownership, and process text in multiple ways.

While these tasks have focused on using the command line prompt, this approach is often temporary and manual. For more complex tasks, or when we want to automate a series of commands, it becomes useful to save them in a file. To achieve this, we need to work with a text editor. Since we're primarily operating in the terminal, we'll focus on terminal-based text editors, which offer several practical options.

In this section, we'll explore some of the most popular terminal-based text editors. We'll start with a historical perspective, beginning with the classic ed editor. From there, we'll move on to more widely-used editors like vim and nano. While I personally prefer vim for its flexibility and power, nano tends to be more accessible for beginners. We'll also touch on a couple of newer, user-friendly editors: micro and tilde.

After we've covered these editors, we'll dive into more advanced text processing techniques using regular expressions. Regular expressions allow us to identify and manipulate patterns in text. Although they're useful directly at the command line, they truly shine when integrated into scripts for automation and complex processing.

With these tools in hand, we'll then move on to creating bash scripts, combining the commands we've learned into efficient, automated workflows.

Text editors

In this section, we will cover and:

Understand the role of text editors: Recognize the importance of text editors in the command-line environment for saving commands, writing scripts, and editing configuration files.
Differentiate between text editors: Identify key differences between line editors like ed and visual editors like vim, nano, micro, and tilde.
Operate a line editor: Use basic commands in ed, such as addressing lines, manipulating text, and editing files.
Explore the functionality of vim: Understand the basic modal operation of vim, including command mode, insert mode, and how to integrate shell commands within the editor.
Utilize beginner-friendly editors: Comfortably use nano, micro, tilde, and edit for straightforward text editing tasks, taking advantage of their user-friendly key bindings and interfaces.
Appreciate historical context: Recognize the historical significance of ed and vim and how their development has influenced modern computing practices.

Getting Started

Working on the command line means writing a lot of commands. There will be times when we want to save some of the commands that we write in order to re-use them later. Or, we might want to develop the commands into a script (i.e., a program) because we might want to automate a process. The shell is great for writing one off commands, so-called one-liners, but it's not a great place to write multi-line or very long commands. Therefore it can be helpful to write and save our commands in a text editor.

In this lesson, we'll learn about several text editors: ed, vim, nano, micro, and tilde. We cover ed primarily for its historical importance, but it's a perfectly fine editor that I use all the time. vim is a more powerful and quite flexible editor that is widely used today, but it has a high learning curve. Even after using it for nearly 25 years, I'm still learning things about it. We could have a whole course on vim, but given the little time we have together, I will encourage you to use nano, micro, tilde, or edit.

I want you to know something about ed and vim because these are historically important editors. I use ed almost often, and vim is my everyday editor. I'm even writing this in vim! Vim is hugely popular, under active development. If you want to use Vim, I'd encourage you to do so. Once you've acquired skill with it, working with text is a joyful experience. But I don't teach it because would take too much time and practice to learn it.

Another thing to keep in mind is that the shell that we are working with is called bash, and bash is a full-fledged programming language. That means that when we write a simple command, like cd public_html, we are programming. It makes sense that the more programming that we do, the better we'll get at it. This requires more sophisticated environments to help manage our programs than the command line prompt can provide. Text editors fulfill that role.

As we learn more about how to do systems administration with Linux, we will need to edit configuration files, too. Most configuration files exist in the /etc directory. For example, later in the semester we will install the Apache Web Server, and we will need to edit Apache's configuration files in the process. We could do this using some of the tools that we've already covered, like sed and awk, but it'll make our lives much easier to use a text editor.

In any case, in order to save our commands or edit text files, a text editor is very helpful. Programmers use text editors to write programs, but programmers often work in graphical user environments, so they often use GUI text editors or IDEs. As systems administrators, it would be unusual to have a graphical user interface installed on a server. The servers that we manage will contain limited or specific software that serves the server's main purpose. Additional software on a server that is not relevant to the main function of a server only takes up extra disk space, consumes valuable computing resources, and poses an additional security footprint.

As stated, although ed and vim are difficult to learn, they are very powerful editors. I believe they are both worth learning; however, for the purposes of this course, it's more important that you are aware of them. If you wish to learn more, there are lots of additional tutorials on the web on how to use these fine, esteemed text editors.

`ed`

ed is a line editor that is installed by default on many Linux distributions. Ken Thompson created ed in the late 1960s to write the original Unix operating system. It was used without computer monitors because those were still uncommon, and instead for teletypewriters (TTYs) and printers. The lack of a visual display, like a monitor, is the reason that ed was written as a line editor. If you visit that second link, you will see the terminal interface from those earlier days. It is the same basic interface you are using now when you use your terminal applications, which are virtualised versions of those old teletypewriters. I think this is a testament of the power of the terminal: that advanced computer users still use the same basic technology today.

In practice, when we use a line editor like ed, the main process of entering text is like any other editor. The big difference is when we need to manipulate text. In a graphical text editor, if we want to delete a word or edit some text, we might backspace over the text or highlight a word and delete it. In a line editor, we manipulate text by referring to lines or across multiple lines and then run commands on the text in those line(s). This is process we followed when we covered grep, sed, and awk, and especially sed, and it should not surprise you that these are related to ed.

To operationalize this, like in sed, each line has an address. The address for line 7 is 7, and so forth. Line editors like ed are command driven. There is no menu to select from at the top of the window. In fact, when we used ed to open an existing file, the text in the file isn't even printed on the screen. If a user wants to delete a word, or print (to screen) some text, the user has to command the line editor to print the relevant line. The do this by specifying the line's address and issuing a command to delete the word on that line, or print the line. Line editors also work on ranges of line, including all the lines in the file, just like sed does.

Many of the commands that ed uses are also used by sed, since sed is based on ed. The following table compares commands between these two programs:

Command	`sed`	`ed`
append text	`a`	`a`
replace text	`c`	`c`
delete text	`d`	`d`
insert text	`i`	`i`
print text	`p`	`p`
substitute text	`s`	`s`
print w/ line #	`=`	`n`

However, there are big differences that mainly relate to the fact that ed is a text editor and sed is not (really). For example, here are some commands that mostly make sense in ed as a text editor. sed can do some of these tasks, where it makes sense (e.g., we don't quit sed), but sometimes in a non-trivial way.

Command	`ed` only
edit file	`e`
join lines	`j`
copies lines	`t`
moves lines	`m`
undo	`u`
saves file	`w`
quits `ed` before saving	`q`
Quits `ed` w/o saving	`Q`

There are other differences, but these are sufficient for our purposes.

Let's see how to use ed to open a file, and print the content with and without line numbers.

ed operating-systems.csv
183
1,$p
OS, License, Year
Chrome OS, Proprietary, 2009
FreeBSD, BSD, 1993
Linux, GPL, 1991
iOS, Proprietary, 2007
macOS, Proprietary, 2001
Windows NT, Proprietary, 1993
Android, Apache, 2008
1,$n
1 OS, License, Year
2 Chrome OS, Proprietary, 2009
3	FreeBSD, BSD, 1993
4	Linux, GPL, 1991
5	iOS, Proprietary, 2007
6	macOS, Proprietary, 2001
7	Windows NT, Proprietary, 1993
8	Android, Apache, 2008

Using ed, another way to remove the header line of the operating-systems.csv file is to specify the line number (1) and then the delete command (d), just like in sed. This becomes a permanent change if I save the file with the w (write) command:

1d
1,$p
Chrome OS, Proprietary, 2009
FreeBSD, BSD, 1993
Linux, GPL, 1991
iOS, Proprietary, 2007
macOS, Proprietary, 2001
Windows NT, Proprietary, 1993
Android, Apache, 2008

To refer to line ranges, I add a comma between addresses. Therefore, to delete lines 1, 2, and 3, and then quit without saving:

1,3d
,p
iOS, Proprietary, 2007
macOS, Proprietary, 2001
Windows NT, Proprietary, 1993
Android, Apache, 2008
Q

Note that with sed, in order to make a change in-place, we need to use the -i option. But with ed, we save changes with the w command.

I can use ed to find and replace strings. The syntax is the same as it is in sed. I'll start with a fresh version of the file:

ed operating-systems.csv
183
1,$s/Linux/GNU\/Linux/

If we want to add new rows to the file, we can append a or insert i text after or at specific lines. To append text after line 3, use a. We enter a period on a newline to leave input mode and return to command mode:

3a
FreeDOS, GPL, 1998
.

Because we enter input mode when using the a, i, or c commands, we enter a period . on a line by itself to revert to command mode.

To insert at line 2, use i:

2i
CP/M, Proprietary, 1974
.

Like sed, we can also find and replace using regular expressions instead of line numbers. I start a new ed session to reload the file to start fresh:

ed operating-systems.csv
183
/Linux/s/Linux/GNU\/Linux/

Of course, ed can be used to write and not simply edit files. Let's start fresh. In the following session, I'll start ed, enter append mode a, write a short letter, exit append mode ., name the file f, write w (save) the file, and quit q:

ed
a
Dear Students,

I hope you find this really interesting.
Feel free to practice and play on the command line,
as well as use tools like ed, the standard editor.

Sincerely,
Dr. Burns
.
f letter.txt
w
q

It's good to know something about ed for historical reasons. But the line editing technology developed for it is still in use today, and is a basic part of the vim text editor.

`vim`

The vim text editor is an improved version of the vi text editor and is in fact called Vi IMproved. The original vi text editor is usually available via the nvi editor these days. nvi is a rewrite of the original. vim is a visual editor. It is multi-modal like ed and is a direct descendant through vi. Due to this genealogy, vim uses many of the same commands as ed does when vim is in command mode. Like ed, we can start vim at the Bash prompt with or without a file name. Here I open the letter.txt file with vim. The default mode is command mode:

vim letter.txt
Dear Students,

I hope you find this really interesting.
Feel free to practice and play on the command line,
as well as use tools like ed, the standard editor.

Sincerely,
Dr. Burns

To enter insert mode, I can type i or a for insert or append mode. There isn't any difference on an empty file, but on a file that has text, i will start insert mode before the cursor position, and a will insert mode after the cursor position. Once in insert mode, you can type text as you normally would and use the arrow keys to navigate around the file.

To return to command mode in vim, you press the Esc key. And then you can enter commands like you would with ed, using the same syntax.

Unlike ed, when in command mode, the commands we type are not placed wherever the cursor is, but at the bottom of the screen. Let's first turn on line numbers to know which address is which, and then we'll replace ed with Ed. Note that I precede these commands with a colon:

:set number
:5s/ed/Ed/

One of the more powerful things about both ed and vim is that I can call Bash shell commands from the editors. Let's say that I wanted to add the date to my letter file. To do that, Linux has a command called date that will return today's date and time. To call the date command within Vim and insert the output into the file: I press Esc to enter command mode (if I'm not already in it), enter a colon, type r for the read into buffer command, then enter the shell escape command, which is an exclamation point !, and then the Bash shell date command:

:r !date
Dear Students,

I hope you find this really interesting.
Feel free to practice and play on the command line,
as well as use tools like ed, the standard editor.
Thu Jun 30 02:44:08 PM EDT 2022

Sincerely,
Dr. Burns

Since the last edit I made was to replace ed with Ed, vim entered the date after that line, which is line 6. To move that date line to the top of the letter, I can use the move m command and move it to line 0, which is the top of the file:

:6m0
Thu Jun 30 02:44:30 PM EDT 2022
Dear Students,

I hope you find this really interesting.
Feel free to practice and play on the command line,
as well as use tools like Ed, the standard editor.

Sincerely,
Dr. Burns

You can use the arrow keys and Page Up/Page Down keys to navigate in vim and vi, But by far the most excellent thing about this editor is to be able to use the j,k,l,h keys to navigate around a file:

j moves down line by line
k moves up line by line
l moves right letter by letter
h moves left letter by letter

Like the other commands, you can precede this with addresses. To move 2 lines down, you type 2j, and so forth. vi and vim have had a powerful impact on software development that programmers have built these keystrokes into applications like Gmail, Facebook, and more.

To save the file and exit vim, return to command mode by pressing the Esc key, and then write and quit:

:wq

The above barely scratches the surface. There are whole books on these editors as well as websites, videos, etc that explore them, and especially vim in more detail. But now that you have some familiarity with them, you might find this funny: Ed, man! !man ed.

`nano`

The nano text editor is the user-friendliest of these text editors but still requires some adjustment as a new commandline user. The friendliest thing about nano is that it is modeless. You're already accustomed to using modeless editors in GUI apps. It simply means that you can add and manipulate text without changing to insert or command mode. It is also friendly because it uses control keys to perform its operations. The tricky part is that the control keys are assigned to different keystroke combinations. For example, instead of Ctrl-c or Cmd-c to copy, in nano you press the M-6 key (press Alt, Cmd, or Esc key and 6) to copy. Then to paste, you press Ctrl-u instead of the more common Ctrl-v. Fortunately, nano lists the shortcuts at the bottom of the screen.

The shortcuts listed need some explanation, though. The carat mark is shorthand for the keyboard's Control (Ctrl) key. Therefore to Save As a file, we write out the file by pressing Ctrl-o. The M- key is also important, and depending on your keyboard configuration, it may correspond to your Alt, Cmd, or Esc keys. To search for text, you press ^W, If your goal is to copy, then press M-6 to copy a line. Move to where you want to paste the text, and press Ctrl-u to paste.

For the purposes of this class, that's all you really need to know about nano. Use it and get comfortable writing in it. Some quick tips:

nano file.txt will open and display the file named file.txt.
nano by itself will open to an empty page.
Save a file by pressing Ctrl-o.
Quit and save by pressing Ctrl-x.
Be sure to follow the prompts at the bottom of the screen.

`micro` and `tilde`

nano is usually installed by default on many Linux distributions, which is why I cover it here. However, if you want to use a more modern terminal editor, then I suggest micro or tilde.

To install the first two, I'll run the following command:

sudo apt install micro tilde

Then we launch them with or without file names. To launch micro:

micro

To launch tilde:

tilde

The micro text editor uses standard key combinations like Ctrl-S to save, Ctrl-O to open, Ctrl-Q to quit. The tilde text editor also uses the standard key combinations, but it also has a menu bar. To access the menu bar, you press the Alt key plus the first letter of the menu bar option. For example, Alt-f opens the File Menu, etc.

`edit`

The edit text editor is a very new, open source editor from Microsoft. It is built for use on Windows but can be installed on Linux. To install, visit the program's [GitHub releases][ms_edit_releases] link and copy the URL for the file named: edit-[version]-x86-64-linux-gnu.tar.zst, where [version] equals the most recent version. Then we'll use wget to download the source file to our servers. For example, to download the most recent version, which at the time of writing this is 1.2.0:

wget https://github.com/microsoft/edit/releases/download/v1.2.0/edit-1.2.0-x86_64-linux-gnu.tar.zst

The file is compressed. To decompress it, we use the following tar command:

tar --use-compress-program=unzstd -xvf edit-1.2.0-x86_64-linux-gnu.tar.zst

This will output a file called edit. Move this file to an executable $PATH, such as /usr/local/bin:

sudo mv edit /usr/local/bin

Once moved, you can edit like you use the other editors. Note that the GitHub page for edit provides instructions for installing this editor on Windows, in case you're interested in doing so.

Conclusion

In prior lessons, we learned how to use the bash command prompt and how to view, manipulate, and edit files from that shell. In this lesson, we learned how to use several command line text editors. Editors allow us to save our commands, create scripts, and in the future, edit configuration files.

The commands we used in this lesson include:

ed : line-oriented text editor
vim : Vi IMproved, a programmer's text editor
nano : Nano's ANOther editor, inspired by Pico
micro: A modern and intuitive terminal-based text editor
tilde: The Tilde Text Editor
edit: Microsoft text editor

Bash Scripting

By the end of this section, you will:

Understand Bash as both a command and scripting language: Recognize the dual functionality of Bash, allowing you to automate tasks and manage scripts efficiently within a Linux environment.
Work with variables and arrays in Bash: Learn to declare and use variables, apply command substitution, and manage arrays for more complex scripting tasks.
Apply conditional expressions for decision-making: Use conditional operators such as &&, ||, and if; then; else statements to control the flow of your scripts based on conditions and outcomes.
Implement loops to automate repetitive tasks: Utilize looping structures, such as for, to automate actions that need to be repeated under certain conditions or across arrays.
Write and execute Bash scripts with the correct structure: Include essential elements like the shebang (#!/usr/bin/env bash) at the start of your scripts, ensuring portability and clarity in execution.
Test conditions in Bash scripts: Understand how to test for specific conditions in scripts, such as file existence or the comparison of variables, to build more reliable and functional scripts.
Validate and improve Bash scripts: Learn how to use tools like shellcheck to check for errors in your Bash scripts and ensure adherence to best practices through style guides.

Getting Started

It's time to get started on Bash scripting. So far, we've been working on the Linux commandline. Specifically, we have been working in the Bash shell. Wikipedia refers to Bash as a command language. This means that Bash is used as a commandline language but also as a scripting language. The main purpose of Bash is to write small applications/scripts that analyze text (e.g., log files) and automate jobs. However, it can be used for a variety of other purposes.

Variables

One of the most important abilities of any programming or scripting language is to be able to declare a variable. Variables enable us to attach some value to a name. That value may be temporary, and it's used to pass information to other parts of a program.

In Bash, we declare a variable with the name of the variable, an equal sign, and then the value of the variable within double quotes. Do not insert spaces between the variable and assignment. In the following code snippet, which can be entered on the commandline, I create a variable named NAME and assign it the value Sean. I create another variable named BACKUP and assign it the value /media. Then I use the echo and cd commands to test the variables:

NAME="Sean"
BACKUP="/media"
echo "My name is ${NAME}"
echo "${BACKUP}"
cd "${BACKUP}"
pwd
cd

Variables may include values that may change given some context. For example, if we want a variable to refer to today's day of week, we can use command substitution. This "allows the output of a command to replace the command name" (see man bash). In the following, I use the date +%A command to assign the current day of the week to the variable named TODAY. The output at the time this variable is set will differ if it is set on a different day.

TODAY="$(date +%A)"
echo "${TODAY}"

By default, variables in Bash are global. If you are working within functions and want a variable to only be available within the function, you can declare it as local using local var_name=value.

Curly braces are not strictly necessary when calling a Bash variable, but they offer benefits when we start to use things like array variables. See:

For example, let's look at basic brace expansion, which can be used to generate arbitrary strings:

echo {1..5}
echo {5..1}
echo {a..l}
echo {l..a}

Another example: using brace notation, we can generate multiple sub-directories at once. Start off in your home directory, and:

mkdir -p homework/{drafts,notes}
cd homework
ls

But more than that, they allow us to deal with arrays (or lists). Here I create a variable named seasons, which holds an array, or multiple values: winter spring summer fall. Bash lets me access parts of that array. In the following the [@] refers to the entire array and the [n] refers to subscript in the array.

seasons=(winter spring summer fall)
echo "${seasons[@]}"
echo "${seasons[1]}"
echo "${seasons[2]}"
echo "${seasons[-1]}"

See Parameter expansions for more advanced techniques.

Conditional Expressions

Whether working on the commandline, or writing scripts in a text editor, it's sometimes useful to be able to write multiple commands on one line. There are several ways to do that. We can include a list of commands on one line in Bash where each command is separated by a semicolon. In the following example, the cd command will run and then the ls -lt command will run.

cd ; ls -lt

We can also use conditional expressions and apply logic with && (Logical AND) or || (Logical OR).

Here, command2 is executed if and only if command1 is successful:

command1 && command2

Here, command2 is executed if and only if command1 fails:

command1 || command2

In essence, && and || are short-circuit operators. This means that if the first command in command1 && command2 fails, command2 will not be executed. Conversely, if the first command in command1 || command2 succeeds, command2 will be executed.

In the example below, lines starting with a # indicate a comment that is not evaluated by bash:

# if documents/ does not exist, then the echo statement will not run
cd documents && echo "success" 
# if documents/ does not exist, then the echo statement will run
cd documents || echo "failed"

We can combine these operators:

cd test && pwd || echo "no such directory"
mkdir test
cd test && pwd || echo "no such directory"

Shebang or Hashbang

When we start to write scripts, the first thing we add is a shebang at line one. The {she,hash}bang tells the shell what program needs to run. We can do declare it a couple of ways. First, we can use the path to env, which runs the program in a modified environment that is named after env. In the following {she,hash}bang, we declare that modified environment to be the bash shell:

#!/usr/bin/env bash

If we were writing a Python script, then we could declare it to be: #!/usr/bin/env python3.

The above is more portable, but alternatively, you could put the direct path to Bash:

#!/usr/bin/bash

On POSIX compliant systems, the env program should always be located at /usr/bin/env. However, even on POSIX compliant systems, bash may be located in different paths. On some Linux distributions, it's located at /usr/bin/bash. On others, it may be located at /bin/bash. On BSD OSes, like FreeBSD, bash might be installed at /usr/local/bin/bash. Thus, by using the #!/usr/bin/env bash shebang, you help ensure that your bash program is portable across different OSes.

Even for small scripts, it's helpful to follow a consistent style. A well-written script is easier to maintain and understand. Consider checking out style guides early on, like the Google Shell Style Guide. This will help ensure your scripts remain clean and readable.

Looping

Looping is a common way to repeat an instruction until some specified condition is met. There are several looping methods Bash that include: : for, while, until, and select. The for loop is often the most useful. In the following toy looping example, we instruct bash to assign the letter i to the sequence 1,2,3,4,5. Each time it assigns i to those numbers, it echos them to standard output:

for i in {1..5} ; do
  echo "${i}"
done

Note that I take advantage of brace expansion in the above for loop.

You might notice in the echo statement above that I use ${i} instead of $i. While the latter is possible, using ${i} ensures proper variable expansion, particularly when using loops within strings or complex expressions. For example, ${} ensures that i is correctly interpreted as a variable and not part of a larger string.

Using the above for loop, we can create a rudimentary timer by calling the sleep command to pause after each count:

for i in {5..1} ; do
  echo "${i}" && sleep 1
done ; echo "BLAST OFF!"

Note that I take advantage of brace expansion again, but this time reversing the ordering, as well as conditional execution.

The sleep command is particularly useful in automation tasks where you want to pause execution between steps, such as monitoring scripts, where you might poll a resource at intervals, or in timed alerts.

We can loop through the variable arrays, too. In the following for loop, I loop through the seasons variable first introduced above:

#!/usr/bin/env bash

seasons=(winter spring summer fall)
for i in "${seasons[@]}" ; do
  echo "I hope you have a nice ${i}."
done

Note that I added the {she,hash}bang in the above example. I do this to make it clear that this is the kind of for loop that I would want to write in a text editor.

Testing

Sometimes we will want to test certain conditions. There are two parts to this, we can use if; then ; else commands, and we can also use the double square brackets: [[. There are a few ways to get documentation on these functions. See the following:

man test
help test
help [
help [[
help if

Between [ and [[, I generally prefer to use the [[ syntax, as demonstrated below. It's less error-prone and allows for more complex conditional checks. However, [[ is specific to bash and thus slightly less portable than [.

We can test integers:

if [[ 5 -ge 3 ]] ; then
  echo "true"
else
  echo "false"
fi

Reverse it to return the else statement:

if [[ 3 -ge 5 ]] ; then
  echo "true"
else
  echo "false"
fi

We can test strings. Run the command nano amihome.sh and type the script below into the file. Save and exit. Change the file's permissions: chmod 766 amihome.sh. Then move the file to /usr/local/bin with the following command: sudo mv amihome.sh /usr/local/bin

#!/usr/bin/env bash

if [[ "$HOME" = "$PWD" ]] ; then
 echo "You are home."
else
 echo "You are not home, but I will take you there."
 cd "$HOME" || exit
 echo "You are now $PWD."
 pwd
fi

Now you can run the file by typing at the command prompt: amihome.sh.

IMPORTANT: Running the above commands in a script won't result in changing your directory outside the script to your home directory. This is because of what Bash calls subshells. Subshells are a forked processes. So the script will do things in those other directories, but once the script exits, you will remain in the directory where you ran the script. If you want to execute a script in the current shell so that changes like cd persist after the script runs, you can use the source or . command to run the script. For example:

source script.sh

We can test file conditions. Let's first create a file called paper.txt and a file called paper.bak. We will add some trivial content to paper.txt but not to the paper.bak. The following if statement will test if paper.txt has a more recent modification date. If so, it'll back up the file with the cp and echo back its success:

if [[ "$HOME/paper.txt" -nt "$HOME/paper.bak" ]] ; then
  cp "$HOME/paper.txt" "$HOME/paper.bak" && echo "Paper is backed up."
fi

Here's a script that prints info depending on which day of the week it is. Let's save it to in a text file and call it schedule.sh:

#!/usr/bin/env bash

day1="Tue"
day2="Thu"
day3="$(date +%a)"

if [[ "$day3" = "$day1" ]] ; then
  printf "\nIf %s is %s, then class is at 9:30am.\n" "$day3" "$day1"
elif [[ "$day3" = "$day2" ]] ; then
  printf "\nIf %s is %s, then class is at 9:30am.\n" "$day3" "$day2"
else
  printf "\nThere is no class today."
fi

Finally, you can check your shell scripts using the shellcheck shell script analysis tool. First you will need to install it:

sudo apt -y install shellcheck

Then use it on shell script files you create. For example, let's say I have a script in a file named backup.sh, I can use the shellcheck command to find any errors:

shellcheck backup.sh

If there are errors, shellcheck will tell you what they are and provide a link to documentation on the error.

If you become seriously interested in bash scripting, then you should check out the various style guides that exist. For example, see the Shell Style Guide that was authored by coders at Google.

Resources

I encourage you to explore some useful guides and cheat sheets on Bash scripting:

Conclusion

In this lecture, we've covered the basics of Bash scripting, including working with variables, loops, and conditionals. These tools form the foundation for automating tasks and creating powerful scripts. Continue practicing by writing small scripts for your own workflow, and explore the resources and style guides provided to deepen your understanding.

Summary

In this demo, we learned about:

creating and referring to variables
conditional expressions with && and ||
adding the shebang or hashbang at the beginning of a script
looping with the for statement
testing with the if statement

Managing the System

Now that we have the basics of the command line interface down, it's time to learn some systems administration. In this section, we learn how to:

expand storage space
create new user and group accounts and manage those accounts
install and remove software, and
manage that software and other processes.

Expanding Storage

By the end of this section, you will be able to:

Understand Virtual Machine Storage Expansion: Gain practical knowledge on how to add additional storage to a virtual machine (VM) by creating and attaching a persistent disk.
Disk Formatting and Mounting: Learn the process of formatting a non-boot disk with the ext4 filesystem and mounting it to a VM for expanded storage capacity.
Filesystem Configuration: Develop skills in configuring a VM's filesystem for automatic mounting using the /etc/fstab file to ensure persistence across reboots.
Efficient Resource Management: Understand how to delete unnecessary disks and manage cloud storage costs, ensuring efficient use of resources.
Command Line Proficiency: Improve proficiency with essential command line tools such as mkfs.ext4, mount, umount, chmod, and editing configuration files with nano.
Storage Pricing Awareness: Gain insight into the cost considerations of adding additional storage in cloud environments, with practical examples of pricing.

Getting Started

I'm sure all or most of you have needed extra disk storage at some point (USB drives, optical disks, floppies???). Such needs are no different for systems administrators, who often are responsible for managing, monitoring, or storing large amounts of data.

The disk that we created for our VM is small (10 GB), and that's fine for our needs, albeit quite small in many real world scenarios. To address this, we can add a persistent disk that is much larger. In this section, we will add a disk to our VM, mount it onto the VM's filesystem, and format it. Extra storage does incur extra cost. So at the end of this section, I will show you how to delete the extra disk to avoid that if you want.

We will essentially follow the Google Cloud tutorial to add a non-boot disk to our VM, but with some modification:

Add a persistent disk to your VM

Note: the main disk used by our VM is the boot disk. The boot disk contains the software required to boot the system. All of our computers (desktops, laptops, tablets, phones, etc.), regardless of which operating system they run, have some kind of boot system.

Creating a Disk

In the Google Cloud console, first make sure you are working in your course project. Then navigate to Compute Engine, and visit the Disks page in the Storage section.

Once there, follow these steps:

Click on CREATE DISK.
Under Name, add a preferred name.
- For example, you can name it backup1.
Under Description, add text to describe your disk.
- For example, you can write Backup.
Under Location, leave or choose Single zone.
- We are not concerned about data safety. If we were, then we would select other options here.
Select the same Region and Zone as your VM instance.
Under Source, select Blank disk.
Under Disk settings, select Balanced persistent disk.
Under Size, change this to 10GB.
- You can actually choose larger sizes, but be aware that disk pricing is $0.10 per GB.
- At that cost, 100 GB = $10 / month.
Click on Enable snapshot schedule.
Under Encryption, make sure Google-managed encryption key is selected.
Click Create to create your disk.

Adding the Disk to our VM

Now that we have created our disk, we need to mount it onto our filesystem so that it's available to our VM. Conceptually, this process is like inserting a new USB drive into our computer.

To add the new disk to our VM, follow these steps:

Visit the VM instances page.
Click on the check box next to your virtual machine.
- That will convert the Name of your VM into a hyperlink if it's not already hyperlinked.
Click on the Name of your VM.
- That will take you to the VM instance details page.
Click on the Edit button at the top of the details page.
Under the Additional disks section, click on + ATTACH EXISTING DISK.
A panel will open on the right side of your browser.
Click on the drop down box and select the disk, by name, you created.
Leave the defaults as-is.
Click on the SAVE button.
Then click on the SAVE button on the details page.

If you return to the Disks page in the Storage section, you will now see that the new disk is in use by our VM.

Formatting and Mounting a Non-Boot Disk

Formatting Our Disk

In order for our VM to make use of the extra storage, the new drive must be formatted and mounted. Different operating systems use different filesystem formats. You may already know that macOS uses the Apple File System (APFS) by default and that Windows uses the New Technology File System (NTFS). Linux is no different, but uses different file systems than macOS and Windows. There are many formatting technologies that we can use in Linux, but we'll use the ext4 (fourth extended filesystem) format. This is recommended for Google Cloud, and it is also a stable and common one for Linux.

In this section, we will closely follow the steps outlined under the Formatting and mounting a non-boot disk on a Linux VM section. I replicate those instructions below, but I highly encourage you to read through the instructions on Google Cloud and here:

Use the gcloud compute ssh command that you have previously used to connect to your VM.
When you have connected to your VM, run the lsblk command.
- Ignore the loop devices. (Or grep invert them: lsblk | grep -v "^loop").
- Instead, you should see sda and sdb under the NAME column outputted by the lsblk command.
- sda represents your main disk.
  - sda1, sda14, sda15 represent the partitions of the sda disk (the sdaN may be different for you).
  - Notice the MOUNTPOINT for sda1 is /. This means that sda1 is mounted at the root level of the filesystem.
  - Since it's mounted, this means it's in use.
- sdb represents the attached disk we just added.
  - After we format this drive, there will be an sdb1, which signifies the drive has been partitioned.
  - After formatting, we will mount this partition on a different mountpoint.

To format our disk for the ext4 filesystem, we will use the mkfs.ext4 (see man mkfs.ext4 for details). The instructions tell us to run the following command (please read the Google Cloud instructions closely; it's important to understand these commands as much as possible and not just copy and paste them):

sudo mkfs.ext4 -m 0 -E lazy_itable_init=0,lazy_journal_init=0,discard /dev/DEVICE_NAME

But replace DEVICE_NAME with the name of our device. My device's name is sdb, which we saw with the output of the lsblk command; therefore, the specific command I run is:

sudo mkfs.ext4 -m 0 -E lazy_itable_init=0,lazy_journal_init=0,discard /dev/sdb

Mounting Our Disk

Now that our disk has been formatted in ext4, I can mount it.

Note: to mount a disk simply means to make the disk's filesystem available so that we can use it for accessing, storing, etc files on the disk. Whenever we insert a USB drive, a DVD drive, etc into our computers, the OS you use should mount that disk automatically so that you can access and use that disk. Conversely, when we remove those drives, the OS unmounts them. In Linux, the commands for these are mount and umount. Note that the umount command is not unmount.

You will recall that we have discussed filesystems earlier and that the term is a bit confusing since it refers to both the directory hierarchy and also the formatting type (e.g., ext4). I discussed how Windows assigns drives letters (A:, B:, etc.) when attaching new drives, like a USB drive. Unlike Windows, I mentioned that in Linux and Unix (e.g., macOS), when we add an additional disk, its filesystem gets added onto our existing system. That is, it becomes part of the directory hierarchy and under the / top level part of the hierarchy. In practice, this means that we have to create the mountpoint for our new disk, and we do that first with the mkdir command. The Google Console documentation instructs us to use the following command:

sudo mkdir -p /mnt/disks/MOUNT_DIR

And to replace MOUNT_DIR with the directory we want to create. Since my added disk is named disk-1, I'll call it that:

sudo mkdir -p /mnt/disks/disk-1

Now we can mount the disk to that directory. Per the instructions on Google Console, and given that my added drive has the device name sdb, I use the following command:

sudo mount -o discard,defaults /dev/sdb /mnt/disks/disk-1

We also need to change the modifications, and grant access for additional users:

sudo chmod 777 /mnt/disks/disk-1

We can test that it exists and is accessible with the lsblk and the cd commands. The lsblk command should show that sdb is mounted at /mnt/disks/disk-1, and we can cd (change directory) to it:

cd /mnt/disks/disk-1

Automounting Our Disk

Our disk is mounted, but if the computer (VM) gets rebooted, we would have to re-mount the additional drive manually. In order to avoid this and automount the drive upon reboot, we need to edit the file /etc/fstab.

Note that the file is named fstab and that it's located in the /etc directory. Therefore the full path is /etc/fstab

The fstab file is basically a configuration file that provides information to the OS about the filesystems the system can mount. The standard information fstab contains includes the name (or label) of the device being mounted, the mountpoint (e.g., /mnt/disks/disk-1), the filesystem type (e.g., ext4), and various other mount options. See man fstab for more details. For devices to mount automatically upon boot, they have to be listed in this file. That means we need to edit this file.

Again, here we're following the Google Cloud instructions:

Before we edit system configuration files, however, always create a backup. We'll use the cp command to create a backup of the fstab file.

sudo cp /etc/fstab /etc/fstab.backup

Next we use the blkid command to get the UUID (universally unique identifier) number for our new device. Since my device is /dev/sdb, I'll use that:

sudo blkid /dev/sdb

The output should look something like this BUT NOTE that your UUID value will be DIFFERENT:

/dev/sdb: UUID="3bc141e2-9e1d-428c-b923-0f9vi99a1123" TYPE="ext4"

We need to add that value to /etc/fstab plus the standard information that file requires. The Google Cloud documentation explicitly guides us here. We can add that directly to our fstab file by redirecting the output of the blkid command to that file. NOTE: you have to use two angle brackets in the following command, or else you will erase the contents of that file!!!

sudo su
blkid /dev/sdb >> /etc/fstab
exit

Alternatively, we can use nano to make the edit by copying and pasting the UUID at the end.

And then edit the file with additional mount information at the bottom:

sudo nano /etc/fstab

And then add the information:

UUID=3bc141e2-9e1d-428c-b923-0f9vi99a1123 /mnt/disks/disk-1 ext4 discard,defaults,nofail 0 2

Save and exit nano. And that's it! If you reboot your VM, or if your VM rebooted for some reason, the extra drive we added should automatically mount upon reboot. If it doesn't, then it may mean that the drive failed, or that there was an error (i.e., typo) in the configuration.

Let's check if it's automounted upon reboot:

sudo reboot now

Wait a minute, and log back in to check. Then run the lsblk command to see if the new drive is recognized and mounted at /mnt/disks/disk-1:

lsblk

Delete the Disk

You are welcome to keep the disk attached to the VM. But if you do not want to incur any charges for it, which would be about $1 / month at 10 GB, then we can delete it.

To delete the disk, first delete the line that we added in /etc/fstab, unmount it, and then delete the disk in the gcloud console.

To unmount the disk, we use the umount command:

sudo umount /mnt/disks/disk-1

Check if it's unmounted:

lsblk

Then we need to delete the disk in gcloud.

Go to the VM instances page.
Click on the check box next to the VM.
Click on the name, which should be a hyperlinked.
This goes to the VM instances detail page.
Click on the Edit button at the top of the page.
Scroll down to the Additional disks section.
Click the edit (looks like a pencil) button.
In the right-hand pane that opens up, select Delete disk under the Deletion rule section.
Scroll back to the Additional disks section.
Click on the X to detach the disk.
Click on Save.
Go the Disk section in the left-hand navigation pane.
Check the disk to delete, and then Delete it.
- Make sure you delete the backup disk and not the main OS disk!
Click on the Snapshots section in the left-hand navigation pane (Compute Engine, Storage, and then Snapshots).
If a snapshot has been taken, check the disk snapshot to delete, and then Delete it.

Conclusion

In this section we learned how to expand the storage of our VM by creating a new virtual drive and adding it to our VM. We also learned how to format the drive in the ext4 filesystem format, and mount the drive at /mnt/disks/disk-1. Finally, we learned how to then edit /etc/fstab to automount the drive.

In addition to using the gcloud console, the commands we used in this section include:

ssh : to connect to the remote VM
sudo : to run commands as the administrator
mkfs.ext4 : to create an ext4 filesystem on our new drive
mkdir -p : to create multiple directories under /mnt
mount : to mount manually the new drive
umount : to unmount manually the new drive
chmod : to change the mountpoint's file permission attributes
cd : to change directories
cp : to copy a file
nano : to use the text editor nano to edit /etc/fstab

Managing Users and Groups

Understanding User and Group Management: Learn how to create, modify, and delete user accounts and groups on a Linux system using essential tools like useradd, userdel, groupadd, and groupdel.
Working with System Files: Gain familiarity with critical system files like /etc/passwd, /etc/shadow, and /etc/group, and understand how user, password, and group information is stored and managed.
Customizing User Account Settings: Modify default settings for new user accounts by editing configuration files such as /etc/skel and /etc/adduser.conf, allowing for customization of new user environments.
Password and Account Security: Develop skills in securing user accounts by setting password expiration policies and managing password lifetimes through the /etc/shadow file and commands like passwd and chage.
Group-Based Permissions and Shared Resources: Learn to create and manage groups, assign users to groups, and configure shared directories with appropriate permissions to facilitate collaborative work among users.
File Permissions and Directory Management: Understand how to change ownership and permissions of directories using tools like chmod and chgrp to control access based on user and group roles.
Practical User Management Tools: Apply hands-on experience using utilities such as gpasswd, su, sudo, and nano to manage users and groups, edit system files, and adjust account settings on a Linux system.

Getting Started

If you're like me, you have user accounts everywhere. I have accounts on my phone and my laptop. I have a Google account, a GitHub account, an account at my public library, an account at Costco. I have a university account that let's me use the same login across multiple university systems, including email and our learning management systems. I have a lot of user accounts, like you probably do.

For many of those accounts, I have access to some things and not to others. For my university, I can submit grades to the registrar for my classes, but I can't submit grades to the registrar for my colleagues' classes. Lots of professors serve on lots of committees. I can access files for the committees I'm a member of, but not for those where I'm not a member. However, various administrators, like my unit director, can access all those files.

In order to define what a user can do with their account, systems implement authentication and authorization mechanisms. Authentication and authorization are fundamental concepts in system security and resource management, and they serve distinct but connected purposes. Authentication is the process of verifying a user's identity. When I attempt to login to my university system, it asks, "who are you?", and I reply with my credentials, which are my username and password.

Authorization determines what an authenticated user is allowed to do. Once I am authenticated on my university system, it asks, "what can you do?" Then the system checks various permissions to allow certain actions and not others or access to certain resources and not others.

These basic concepts are true across all operating systems and services. How these concepts are implemented vary, though. In this section, we will learn about the commands that we use to authenticate and authorize users on a Linux server.

The `man` pages

Before we begin, you need to know about the man pages. The man (short for manual) pages are internal documentation on just about every part of your system. You can read the manual for the commands on your system and for many of the special files on your system. For example, you can read the manual on the ls command with man ls. Or you can read the manual on the chmod command with man chmod. You can also read the manual on the manual, which you'd invoke with the man man command. Much of what I know about the commands in this book, I learned from their man pages, such as:

man date
man glob
man grep
man bash
man regex
and more!

The man pages are categorized by sections, which are explained in the man man page. Each section is denoted by a number. The first section, denoted by the number 1, contain man pages on executable programs or shell commands. The fifth section, denoted by the number 5, contain man pages on file formats and conventions. There are nine total sections. In the case where a command and a system file each have the same name, then we need to specify the section number when invoking man for those pages. For example, use man 1 crontab to read the man page for the crontab executable, which is located at /usr/bin/crontab. Use man 5 crontab to read the man page for the crontab file, which is locate at /etc/crontab.

You can make the man pages easier to read by installing an additional program called bat. The bat program is a drop-in replacement for the cat command but comes with syntax highlighting and more. To install bat, do:

sudo apt install bat

Then use nano to open your $HOME/.bashrc file:

nano $HOME/.bashrc

And add the following line at the end, which will add some color to the man pages:

export MANPAGER="sh -c 'col -bx | batcat -l man -p'"

Once you've closed and saved your $HOME/.bashrc file, you need to source it:

source $HOME/.bashrc

Additionally, since bat is a drop-in replacement for the cat command, you can also use it to view or concatenate files. The full command is batcat [FILE], where [FILE] is the name of the file or files to view.

The passwd file

On every system there will be some place where information about users is stored. On a Linux system, user account information is stored in the file /etc/passwd. You should take a moment to read about this file in its man page. However, if you run man passwd, you will by default get the man page on the /usr/bin/passwd command. We want to read about the passwd file located at /etc/passwd:

man 5 passwd

Let's take a look at a single line of the file. Below I show the output for a fictional user account:

grep "peter" /etc/passwd

And the output:

peter:x:1000:1000:peter,,,:/home/peter:/bin/bash

Per the man 5 passwd page, we know that the line starting with peter is a colon separated line. That means that the line is composed of multiple fields each separated by a colon (which is perfect for awk to parse).

man 5 passwd tells us what each field indicates. The first field is the login name, which in this case is peter. The second field, marked x, marks the password field. This file does not contain the password, though. The passwords, which are hashed and salted, for users are stored in the /etc/shadow file. This file can only be read by the root user (or using the sudo command).

Hashing a file or a string of text is a process of running a hashing algorithm on the file or text. If the file or string is copied exactly, byte for byte, then hashing the copy will return the same value. If anything has changed about the file or string, then the hash value will be different. By implication, this means that if two users on a system use the same password, then the hash of each will be equivalent. Salting a hashed file (or file name) or string of text is a process of adding random data to the file or string. Each password will have a unique and mostly random salt added to it. This means that even if two users on a system use the same password, salting their passwords will result in unique values.

The third column indicates the user's numerical ID, and the fourth column indicates the users' group ID. The fifth column repeats the login name, but could also serve as a comment field. Comments are added using certain commands (discussed later). The fifth field identifies the user's home directory, which is /home/peter. The sixth field identifies the user's default shell, which is /bin/bash.

The user name or comment field merely repeats the login name here, but it can hold specific types of information. We can add comments using the chfn command. Comments include the user's full name, their home and work phone numbers, their office or room number, and so forth. To add a full name to user peter's account, we use the -f option:

sudo chfn -f "peter Burns" peter

The /etc/passwd file is a standard Linux file, but data in the file will change depending on the Linux distribution. For example, the user and group IDs above start at 1000 because peter is the first human account on the system. This is a common starting numerical ID nowadays, but it could be different on other Linux or Unix-like distributions. The home directory could be different on other systems, too; for example, the default could be located at /usr/home/peter. Also, other shells exist besides bash, like zsh, which is now the default shell on macOS; so other systems may default to different shell environments.

The shadow file

The /etc/passwd file does not contain any passwords but a simple x to mark the password field. Passwords on Linux are stored in /etc/shadow and are hashed with sha512, which is indicated by $6$. You need to be root to examine the shadow file or use sudo:

The fields are (see man 5 shadow):

login name (username)
encrypted password
days since 1/1/1970 since password was last changed
days after which password must be changed
minimum password age
maximum password age
password warning period
password inactivity period
account expiration date
a reserved field

The /etc/shadow file should not be edited directly. To set, for example, a warning that a user's password will expire, we could use the passwd command (see man passwd for options), or the chage command. The following command would make it so the user peter is warned that their password will expire in 14 days:

passwd -w 14 peter

The group file

The /etc/group file holds group information about the entire system (see man 5 group). By default the file can be viewed by anyone on a system, but there is also a groups command that will return the groups for a user. See: man groups Running the groups command by itself will return your own memberships.

Management Tools

There are different ways to create new users and groups, and the following list includes most of the utilities to help with this. Note that, based on the names of the utilities, some of them are repetitive.

useradd (8) - create a new user or update default new user information
usermod (8) - modify a user account
userdel (8) - delete a user account and related files
groupadd (8) - create a new group
groupdel (8) - delete a group
groupmod (8) - modify a group definition on the system
gpasswd (1) - administer /etc/group and /etc/gshadow
adduser.conf (5) - configuration file for adduser(8) and addgroup(8) .
adduser (8) - add a user or group to the system
deluser (8) - remove a user or group from the system
delgroup (8) - remove a user or group from the system
chgrp (1) - change group ownership

The numbers within parentheses above indicate the man section. Therefore, to view the man page for the userdel command:

man 8 userdel

Authentication

Modify default new user settings

Let's modify some default user account settings for new users, and then we'll create a new user account.

Before we proceed, let's review some important configurations that establish some default settings:

/etc/skel
/etc/adduser.conf

The /etc/skel directory defines the home directory for new users. Whatever files or directories exist in this directory at the time a new user account is created will result in those files and directories being created in the new user's home directory. We can view what those are using the following command:

ls -a /etc/skel/

The /etc/adduser.conf file defines the default parameters for new users. It's in this file where the default starting user and group IDs are set, where the default home directory is located (e.g., in /home/), where the default shell is defined (e.g., /bin/bash), where the default permissions are set for new home user directories (e.g., 0755) and more.

Let's change some defaults for /etc/skel. We need to use sudo [command] since this directory and its contents are owned by the root user. First, we'll edit the default .bashrc file:

sudo nano /etc/skel/.bashrc

We want to add the following lines at the end of the file. This file is a configuration file for /bin/bash, and will be interpreted by Bash. Lines starting with a hash mark are comments:

# Dear New User,
#
# I have made the following settings
# to make your life a bit easier:
#
# make "c" a shortcut for "clear"
alias c='clear'

Save and exit the file.

Use nano again to create a README file. This file will be added to the home directories of all new users. Add any welcome message you want to add, plus any guidelines for using the system. Then save and exit the file.

sudo nano /etc/skel/README

Add new user account

After writing (saving) and exiting nano, we can go ahead and create a new user named linus.

sudo adduser linus

We'll be prompted to enter a password for the new user, plus comments (full name, phone number, etc). Any of these can be skipped by pressing enter. You can see from the output of the grep command below that I added some extra information:

grep "linus" /etc/passwd
linus:x:1003:1004:Linus Torvalds,333,555-123-4567,:/home/linus:/bin/bash

We may want to set up some password conditions to help keep the new user account secure. To do that, we can modify the minimum days before the password can be changed, the maximum days before the password expires, the number of days before the user gets a warning to change their password, and the number of days of inactivity when the password is locked. The passwd command can set some of these parameters, but the chage command is a bit more powerful:

sudo chage -m 7 -M 90 -W 14 -I 14 linus

See man chage for details, but:

-m 7 sets the minimum password age to 7 days before the user can change their password.
-M 90 sets the maximum age of the password to 90 days.
-W 14 provides a 14 day warning to the user that the password will expire.
-I 14 locks the account after 14 days of inactivity.

You can see these values by grepping the shadow file:

sudo grep "linus" /etc/shadow

To log in as the new user, use the su command and enter the password you used when creating the account:

su linus

To exit the new user's account, use the exit command:

exit

As a sysadmin, you will want to regularly review and audit the /etc/passwd and the /etc/shadow files to ensure only authorized users have access to the system.

Before proceeding, repeat the above process for a user named peter, or use a different username and adjust as necessary as you proceed.

Authorization

Let's say we've created our users and now we want to give them access to some additional resources. For example, we can set up a shared directory on the system that multiple users can access and use. To do that, we will begin to work with groups and file/directory permissions.

Add users to a new group

Because of the default configuration defined in /etc/adduser.conf, the linus user only belongs to a group of the same name. Let's create a new group that both linus and peter belong to. We'll call this developers. Then we'll add both peter and linus to that group. For that, we'll use the gpasswd -a command and option. We'll also make the user peter the group administrator using the -A option (see man gpasswd for more details).

sudo groupadd developers
sudo gpasswd -a peter developers
sudo gpasswd -a linus developers
sudo gpasswd -A peter developers
grep "developers" /etc/group

Note: if a user is logged in when you add them to a group, they need to logout and log back in before the group membership goes into effect.

Create a shared directory

One of the benefits of group membership is that members can work in a shared directory.

Let's make the /srv/developers a shared directory. The /srv directory already exists, so we only need to create the developers subdirectory:

sudo mkdir /srv/developers

Now we change ownership of the directory so that it's group owned by the developers group that we created:

sudo chgrp developers /srv/developers

The directory ownership should now reflect that it's owned by the developers group:

ls -ld /srv/developers

The default permissions are currently set to 0755. To allow group members to read and write to the above directory, we need to use the chmod command in a way we haven't yet. Specifically, we add a leading 2 that sets the group identity. The 770 indicates that the user and group owners of the directory (but not others) have read, write, and execute permissions for the directory:

sudo chmod 2770 /srv/developers

This first digit, the 2 above, is the setgid (set group ID) bit. Setting this ensures that any files or subdirectories created within /srv/developers inherit the group ownership of the parent directory. In this case, that's the developers group. This is useful for group collaboration. By setting this, either linus or peter can add, modify, and delete files in the /srv/developers directory.

User account and group deletion

You can keep the additional user and group on your system, but know that you can also remove them. The deluser and delgroup commands offer great options and may be preferable to the others utilities (see man deluser or man delgroup).

If we want to delete the new user's account and the new group, these are the commands to use. The first command will create an archival backup of linus' home directory and also remove the home directory and any files in it.

deluser --backup --remove-home linus
delgroup developers

Conclusion

Knowing how to manage user accounts and manage passwords are key sysadmin skills. They are needed to provide collaborative environments and to keep our systems secure through authentication and authorization. While the methods to manage these things vary by operating system, the basic concepts are the same across OSes and services.

Although the basic concepts hold true across systems, things get a bit more complex for enterprise systems. On enterprise systems running Windows, Active Directory (AD) is used for both authentication and authorization. On enterprise systems running Linux, the Lightweight Directory Access Protocol (LDAP) system is used to store and manage user credentials. LDAP can be integrated with AD to enable Linux systems to use AD for centralized user management. Other technologies exist that facilitate user and resource management. They include:

In this section, we learned about important user management files like /etc/passwd, /etc/shadow, /etc/group, /etc/skel, and /etc/adduser.conf. We continued to use nano to edit new configuration files, specifically /etc/skel and /etc/adduser.conf. We dove deeper into exploring how the man pages work. We also learned how to create new Linux user accounts, modify those accounts password parameters, assign those accounts to groups, and create a share directory for those accounts for collaboration.

We covered the following new commands:

adduser: add a user or group to the system
chage: change user password expiry information
chfn: change real user name and information
chgrp: change group ownership
delgroup: remove a user or group from the system
deluser: remove a user or group from the system
gpasswd: administer /etc/group and /etc/gshadow
groupadd: create a new group
passwd: the password file
su: run a command with substitute user and group ID

Managing Software

By the end of this section, you should understand:

Package Management: Modern Linux distributions use package managers like apt for software management, similar to app stores on mobile devices.
APT Basics: Essential commands include apt update to refresh the package list, apt upgrade to install available updates, and apt install to add new packages.
Installing Software: Besides apt, software can also be installed from source, as .deb packages with dpkg, or as snap packages for broader compatibility.
Maintaining Your System: Use apt autoremove to clean up unused dependencies, and apt clean or apt autoclean to clear out cached packages to free up disk space.
Package Information: Use apt show for package details, apt policy for version information, and apt search to find packages by name or keyword.
Removal Options: apt remove uninstalls packages, while apt purge also removes configuration files.
Script Automation: Common update and cleanup tasks can be automated with a bash script, run with sudo for ease of use.
APT Logs: Review apt activity by checking /var/log/apt/history.log.
Manual Reading: For deeper understanding, consult the man apt manual page.

Getting Started

Many modern Linux distributions offer some kind of package manager to install, manage, and remove software. These package management systems interact with curated and audited central repositories of software that are collected into packages. They also provide a set of tools to learn about the software that exists in these repositories.

If package management seems like an odd concept to you, it's just a way to manage software installation. It's very similar to the way that Apple and Google distribute software via the App Store and Google Play.

On Debian based systems, which includes Ubuntu, we use apt, apt-get, and apt-cache to manage most software installations. For most cases, you will simply want to use the apt command. It is meant to combine the functionality commonly used with apt-get and apt-cache.

We can also install software from source code or from pre-built binaries. On Debian and Ubuntu, for example, we might want to install (if we trust it) pre-build binaries distributed on the internet as .deb files. These are comparable to .dmg files for macOS and to .exe files for Windows. When installing .deb files, though, we generally use the dpkg command, although it's possible to use apt to install these files, too.

Installing software from source code often involves compiling the software. It's usually not difficult to install software this way. However, it can become complicated to manage software that's installed from source code simply because it means managing dependencies. This means we would need to manually track new patches or versions of the software.

Another way to install software is to use the snap command. This is a newer way of packaging programs that involves packaging all of a program and all of its dependencies into a single container. The main point of snap seems to be aimed at IoT, embedded devices, and desktop/laptop systems. It's perfectly usable and preferable on the desktop because the general aim is end users and not system administrators. See the snap store for examples.

You might also want to know that some programming languages provide their own mechanisms to install packages. In many cases, these packages may be installed with the apt command, but the packages that apt will install tend to be older (but more stable) than the packages that a programming language will install. For example, Python has the pip or pip3 command to install and remove Python libraries. The R programming language has the install.packages(), remove.packages(), and the update.packages() commands to install R libraries.

Despite all these ways to install, manage, remove, and update software, we will focus on using the apt command, which is pretty straightforward.

APT

Let's look at the basic apt commands.

`apt update`

Before installing any software, we need to update the index of packages that are available for the system.

sudo apt update

`apt upgrade`

The above command will also state if there is software on the system that is ready for an upgrade. If any upgrades are available, we run the following command:

sudo apt upgrade

`apt search`

We may know a package's name when we're ready to install it, but we also may not. To search for a package, we use the following syntax:

apt search [package-name]

Package names will never have spaces between words. Rather, if a package name has more than one word, each word will be separated by a hyphen.

In practice, say I'm curious if there are any console based games:

apt search ncurses game

I added ncurses to my search query because the ncurses library is often used to create console-based applications.

`apt show`

The above command returned a list that includes a game called ninvaders, which is a console-based Space Invaders like game. To get additional information about this package, we use the apt show [package-name] command:

apt show ninvaders

For example, if we want to see what a package needs or depends on, then we can use the following command:

apt-cache depends ninvaders

`apt policy`

To get a list of various versions that are available to download, we can use the apt policy command:

apt policy ninvaders

`apt install`

It's quite simple to install the package called ninvaders:

sudo apt install ninvaders

`apt remove` or `apt purge`

To remove an installed package, we can use either the apt remove or the apt purge commands. Sometimes when a program is installed, configuration files get installed with it in the /etc directory. The apt purge command will remove those configuration files but the apt remove command will not. Both commands are offered because sometimes it is useful to keep those configuration files.

sudo apt remove ninvaders

Or:

sudo apt purge ninvaders

`apt autoremove`

All big software requires other software to run. This other software are called dependencies. The apt show [package-name] command will list a program's dependencies, as well as the apt-cache depends command. However, when we remove software when using the apt remove or apt purge commands, the dependencies, even if no longer needed, are not necessarily removed. To remove them and restore disk space, we do:

sudo apt autoremove

`apt clean` or `apt autoclean`

When we install software using the apt command, the installed packages are stored locally. We don't necessarily need those once the binaries have been installed. Or we may want to remove them especially if we're removing the binaries. The apt clean and apt autoclean commands clear up that local cache of packages. We use either of these commands to free up disk space.

Between the two, the apt clean command removes all package files that are stored in /var/cache/apt/archives. But the apt autoclean only removes package files that can no longer be downloaded. I generally use the apt clean command to free up more disk space. I will only use apt autoclean if I want to keep an package to older software, but this is rare.

To use:

sudo apt clean

Or:

sudo apt autoclean

apt history

Unfortunately, the apt command does not provide a way to get a history of how it's been used on a system, but a log of its activity is kept. We can review that log with the following command:

less /var/log/apt/history.log

Daily Usage

This all may seem complicated, but it's really not. For example, to keep my systems updated, I run the following four commands on a daily or near daily basis:

sudo apt update
sudo apt upgrade
sudo apt autoremove
sudo apt clean

You can add these to a script that we could call update.sh and put it in /usr/local/bin:

#!/usr/bin/env bash

apt update && apt upgrade && apt autoremove && apt clean

And then run it like so after making it executable: sudo chmod 700 /usr/local/bin/update.sh:

sudo update.sh

NOTE: Running the script with the sudo command is more convenient, cleaner, and sufficient. However, if you add non-administrative commands to the script later, then you would add sudo to each of the apt commands to prevent running the whole script with elevated permissions. In other words, your script would be:

#!/usr/bin/env bash

sudo apt update && sudo apt upgrade && sudo apt autoremove && sudo apt clean

Conclusion

There are a variety of ways to install software on a Linux or Ubuntu system. The common way to do it on Ubuntu is to use the apt command, which was covered in this section.

We'll come back to this command often because we'll soon install and setup a complete LAMP (Linux, Aapache, MariaDB, and PHP) server. Until then, I encourage you to read through the manual page for apt:

man apt

Using systemd

By the end of the section, you will know how to:

Use systemctl to manage services on a Linux system, including starting, stopping, enabling, and checking the status of services.
Understand the purpose of systemd as an init system and its role in booting, service management, and log handling.
Check the status and logs of specific services with journalctl and use filters to narrow down log results by service, PID, or specific conditions.
Set up and manage systemd timers to automate tasks, such as scheduling scripts to run at specific times.
Explore and utilize additional systemd commands, such as checking enabled services, suspending or rebooting the system, and examining boot times with systemd-analyze.
View and interpret system logs, search through them efficiently, and follow logs in real-time.
Create and configure custom systemd service and timer files for more tailored system automation.
Use systemd commands to troubleshoot system issues, including identifying failed services and examining resource usage with systemd-cgtop.

Getting Started

When computers boot up, obviously some software manages that process. On Linux and other Unix or Unix-like systems, this is usually handled via an init system. For example, macOS uses launchd and many Linux distributions, including Ubuntu, use systemd.

systemd does more than handle the startup process, it also manages various services and connects the Linux kernel to various applications. In this section, we'll cover how to use systemd to manage services, and to review log files.

Manage Services

When we install complicated software, like a web server (e.g., Apache2, Nginx), a SSH server (e.g., OpenSSH), or a database server (e.g., mariaDB or MySQL), then it's helpful to have commands that manage that service: the web service, the SSH service, the database service, etc.

For example, the ssh service is installed by default on our gcloud servers, and we can check its status with the following systemctl command:

systemctl status ssh

The output tells us a few things. The line beginning with Loaded tells us that the SSH service is configured. At the end of that line, it also tells us that it is enabled. Enabled means that the service automatically starts when the system gets rebooted or starts up.

The line beginning with Active tells us that the service is active (running) and for how long. We also can see the process ID (PID) for the service as well as how much memory it's using.

At the bottom of the output, we can see the recent log files. We can view more of those log files using the journalctl command. By default, running journalctl by itself will return all log files. We can specify that we're interested in log files only for the ssh service. We can specify using the PID number. Replace N with the PID number attached to your ssh service:

journalctl _PID=N

Or we can specify by service, or more specifically, its unit name:

journalctl -u ssh

Use Cases

Later we'll install the Apache web server, and we will use systemctl to manage some aspects of this service.

In particular, we will use the following commands to:

check the state of the Apache service,
enable the Apache service to auto start on reboot,
start the service,
reload the service after editing its configuration files, and
stop the service.

In practice, these work out to:

systemctl status apache2
sudo systemctl enable apache2
sudo systemctl start apache2
sudo systemctl reload apache2
sudo systemctl stop apache2

systemctl is a big piece of software, and there are other arguments the command will take. See man systemctl for details.

NOTE: Not all services support systemctl reload [SERVICE]. You can check if a service is reloadable by checking its service file. As an example:

grep "ExecReload" /lib/systemd/system/ssh.service

You can peruse other services in /lib/systemd/system.

Examine Logs

As mentioned, the journalctl command is part of the systemd software suite, and it is used to monitor system logs.

It's important to monitor system logs. Log files help identify problems in the system or with various services. For example, by monitoring the log entries for ssh, I can see all the attempts to break into the server. Or if the Apache2 web server malfunctions for some reason, which might be because of a configuration error, the logs will indicated how to identify the problem.

If we type journalctl at the command prompt, we are be presented with the logs for the entire system. These logs can be paged through by pressing the space bar, the page up/page down keys, or the up/down arrow keys. They can also be searched by pressing the forward slash / and then entering a search keyword. To exit out of the pager, press q to quit.

journalctl

It's much more useful to specify the field and to declare an option when using journalctl, like above with ssh See the following man pages for details:

man systemd.journal-fields
man journalctl

There are many fields and options we can use, but as an example, we see that there is an option to view the more recent entries first:

journalctl -r

Or we view log entries in reverse order, for users on the system, and since the last boot with the following options:

journalctl -r --user -b 0

Or for the system:

journalctl -r --system -b 0

I can more specifically look at the logs files for a service by using the -u option with journalctl:

journalctl -u apache2

I can follow the logs in real-time (press ctrl-c to quit the real-time view):

journalctl -f

Timers (Automation)

Linux and Unix operating systems have long provided a way to automate processes. In the past, and still available on most systems, is the cron service, which I do not cover here. systemd also provides a way to automate jobs using timers.

In our bash exercises, we created a script to examine the auth.log file for invalid IP addresses. Whenever we want to check to see what IP addresses are trying to login into our system, we have to run that command.

What if we could have that script run at specific times? For example, what if we wanted to run that script every morning at 8AM and then log the output to a file for us to read? We can do that with systemd timers.

First, let's modify our auth.sh script. In the example below, I've adjusted the location of the auth.log file, created two additional variables to record the start and end dates of the auth.log file, and then modified the end echo statement to add some additional information and save the output in a file called brute.log in our /srv/developers directory. The ${end_date} and ${start_date} variables were created after closely studying the /var/log/auth.log file.

#!/usr/bin/env bash

#!/usr/bin/env bash

LOG_FILE="/var/log/auth.log"

END_DATE=$(grep -Eo "^[[:alpha:]]{3}[[:space:]]{1,2}[[:digit:]]{1,2}" "${LOG_FILE}" | tail -n1)
START_DATE=$(grep -Eo "^[[:alpha:]]{3}[[:space:]]{1,2}[[:digit:]]{1,2}" "${LOG_FILE}" | head -n1)

TOTAL_INVALID="$(grep -c "Invalid user" ${LOG_FILE})"
INVALID_IPS="$(grep "Invalid user" "${LOG_FILE}" | grep -Eo "[[:digit:]]+\.[[:digit:]]+\.[[:digit:]]+\.[[:digit:]]+" | sort | uniq | wc -l)"

echo "
Log entry created on $(date +%c).
From ${START_DATE} to ${END_DATE}, there were ${TOTAL_INVALID} attempts to login to the system.
These came from ${INVALID_IPS} unique IPs.
" >> "$HOME/brute.log"

Next, we need to create two additional files. First we create a service file. This file defines the service that we want to execute. Navigate to the service directory:

cd /etc/systemd/system

And use sudo nano to create a file called brute.service:

sudo nano brute.service

In the above file, we add the following information under two sections, a Unit section and a Service section. The Unit section includes a description of the service and a list of the service's requirements. The Service section declares the type of service, the location of the script to run, and the user to run the script under. Feel free to use this but be sure to change your User information:

[Unit]
Description="Summarize brute login attempts."
Requires=brute.timer

[Service]
Type=simple
ExecStart=/usr/local/bin/auth.sh
User=seanburns

See man 5 systemd.service for more details.

Next we need to create the timer file. Using sudo nano, run the following command in the same directory as above:

sudo nano brute.timer

In this file, add the following:

[Unit]
Description="Timer for the brute login service."

[Timer]
OnCalendar=*-*-* 08:00:00
Persistent=true

[Install]
WantedBy=timers.target

See man 5 systemd.timer for more details.

Next we need to enable and start the timer. To do that, we run two separate systemctl commands:

sudo systemctl daemon-reload

And then enable the timer:

sudo systemctl enable brute.timer

Start the timer:

sudo systemctl start brute.timer

And finally, check the status of all timers:

systemctl list-timers

Or check the status of our specific timer:

systemctl status brute.timer

You can now check that your script ran after the next time your system's clock reaches 8AM.

Useful Systemd Commands

You can see more of what systemctl or journalctl can do by reading through their documentation:

man systemctl
man journalctl

You can check if a service if enabled:

systemctl is-enabled apache2

You can reboot, poweroff, or suspend a system (suspending a system mostly makes sense for laptops and not servers):

systemctl reboot
systemctl poweroff
systemctl suspend

To show configuration file changes to the system:

systemd-delta

To list real-time control group process, resource usage, and memory usage:

systemd-cgtop

to search failed processes/services:

systemctl --state failed

to list services

systemctl list-unit-files -t service

to examine boot time:

systemd-analyze

Conclusion

This is a basic introduction to systemd, which is composed of a suite of software to help manage booting a system, managing services, and monitoring logs.

We'll put what we've learned into practice when we set up our LAMP servers.

Networking and Security

Even if we do not work as network administrators, system administrators need to know network basics. In this section, we cover TCP/IP and other protocols related to the internet protocol suite, and how to protect our systems locally, from external threats, and how to create backups of our systems in case of disaster.

Networking and TCP/IP

By the end of this section, you should know:

The role of a system administrator in setting up, configuring, and monitoring networks, from small LANs to larger networks that interface with external networks.
The structure and layers of the Internet Protocol Suite, including the Link, Internet, Transport, and Application layers.
How the Address Resolution Protocol (ARP) is used to map IP addresses to MAC addresses, and how to view network information using commands like ip a and ip link.
The distinction between public and private IP addresses, the ranges for private IPs, and the concept of IP subnets.
How the Transmission Control Protocol (TCP) and User Datagram Protocol (UDP) differ in handling data, including typical use cases for each.
The basics of TCP and UDP headers, and how to examine network traffic using tcpdump.
The significance of ports in networking, common port numbers, and how ports help direct traffic to specific services.
The concept of subnetting, including how subnet masks and CIDR notation help define network boundaries and control the number of hosts on a network.
How to convert between binary and decimal for IP addressing, as well as the importance of subnet masks and broadcast addresses.
The process of calculating available hosts in a subnet, including examples for both /24 and /16 subnet masks.

Getting Started

An important function of a system administrator is to set up, configure, and monitor a network. This may involve planning, configuring, and connecting the devices on a local area network, to planning and implementing a large network that interfaces with an outside network, and to monitoring networks for various sorts of attacks, such as denial of service attacks.

In order to prepare for this type of work, we need at least a basic understanding of how the internet works and how local devices interact with the internet. In this section, we will focus mostly on internet addressing, but we will also devote some space to TCP and UDP, two protocols for transmitting data.

Connecting two or more devices together nowadays involves the TCP/IP or the UDP/IP protocols, otherwise part of the Internet protocol suite. This suite is an expression of the more generalized OSI communication model.

The internet protocol suite is framed as a series of layers beginning with a lower layer, called the link layer, that interfaces with internet capable hardware, to the highest layer, called the application layer.

The link layer describes the local area network. Devices connected locally, e.g., via Ethernet cables or local wifi, comprise the link layer. The link layer connects to the internet layer. Data going into or out of a local network must be negotiated between these two layers.

The internet layer makes the internet possible since it provides functionality to transmit data among multiple networks possible. The internet is, in fact, a network of networks. The primary characteristic of the internet layer is the IP address, which currently comes in two versions: IPv4 (32 bit) and IPv6 (128 bit). IP addresses are used to locate hosts on a network.

The transport layer makes the exchange of data on the internet possible. There are two dominant protocols attached to this layer: UDP and TCP. Very generally, UDP is used when the integrity of data is less important than the its ability to reach its destination. For example, streaming video, Voice-over-IP (VOIP), and online gaming are often transported via UDP because the loss of some pixels or some audio is acceptable for end users. TCP is used when the integrity of the data is important. If the data cannot be transmitted without error, then the data won't reach its final destination until the error is corrected.

The application layer provides the ability to use the internet in particular ways. For example, the HTTP protocol enables the web. The web is thus an application of the internet. The SMTP, IMAP, and POP protocols provide the basis for email exchange. DNS is a system that maps IP addresses to domain names. In this book, we use SSH, also part of the application layer, to connect to remote computers.

By application, they simply mean that these protocols provide the functionality for applications. They are not themselves considered user applications, like a web browser.

The Internet Protocol Suite

Link Layer

ARP (Address Resolution Protocol)

ARP (Address Resolution Protocol) is a protocol at the link layer and is used to map network addresses, like an IP address, to ethernet addresses. An ethernet address is more commonly referred to as the MAC or Media Access Control address, or the hardware address. Routers use MAC addresses to enable communication inside networks (w/in subnets or local area networks) so that computers within a local network can talk to each other on these subnets. Networks are designed so that IP addresses are associated with MAC addresses before systems can communicate over a network. Everyone of your internet capable devices, your smartphone, your laptop, your internet connected toaster, have a MAC address.

To get the MAC address for a specific computer, we can use the following commands:

ip a
ip link

In the above command, ip is the command and a or link are considered objects (see man ip for details). Note that ip a and ip link provide slightly different views of network interfaces.

The MAC addresses are reported on the link/ether line.

On my home laptop, the ip link command produces four numbered sections of output. The first section refers to the lo or loopback device. This is a special device that allows the computer to communicate with itself. It always has an MAC address of 00:00:00:00:00:00. The next section on my home machine refers to the ethernet card. This is what I'd use if my computer was connected to the internet via a wired connection. Since there's no physical cable connecting my machine to the router, ip link reports DOWN. The MAC address is reported on the indented line below. The third numbered section on my laptop refers to the wifi card and begins with wl. Since I'm using the wifi connection, ip link reports that it is UP (or active). The fourth numbered section is a bit special and begins with enx followed by this device's MAC address. I'm not sure what device this refers to, but it might be internet over USB-C. See man systemd.net-naming-scheme for more details.

We can get the IP information with the following command:

ip a

For the same numbered devices, the output reports the MAC addresses plus the IP addresses. Here I note that the lo device has the IP address of 127.0.0.1. It always has this device. On my gcloud machine, I get an IP address like 10.X.X.X (where the Xes equals some number). This is a private IP address.

The following two commands help identify parts of the local network (or subnet) and the routing table.

ip neigh
ip route

The ip neigh command produces the ARP cache, basically what other systems your system is aware of on the local network. The ip route command is used to define how data is routed on the network but can also define the routing table. Both of these commands are more commonly used on Linux-based routers.

These details enable the following scenario: A router gets configured to use a specific network address when it's brought online. It searches the sub network for connected MAC addresses that are assigned to wireless cards or ethernet cards. It then assigns each of those MAC addresses an available IP address based on the available network address. Those network addresses are private IP addresses and will fall within a specific range (as discussed below).

Internet Layer

IP (Internet Protocol)

The Internet Protocol, or IP, address is used to uniquely identify a host on a network and place that host at a specific location (its IP address). If that network is subnetted (i.e., routed), then a host's IP address will have a subnet or private IP address. This private IP address will not be directly exposed to the Internet.

Remember this, there are public IP addresses and these are distinct from private IP addresses. Public IP addresses are accessible on the internet. Private IP addresses are not, but they are accessible on subnets or local area networks.

Private IP address ranges are reserved address ranges. This means no public internet device will have an IP address within these ranges. The private address ranges include:

Start Address	End Address
10.0.0.0	10.255.255.255
172.16.0.0	172.31.255.255
192.168.0.0	192.168.255.255

If you have a router at home, and look at the IP address for at any of your devices connected to that router, like your phone or computer, you will see that it will have an address within one of the ranges above. For example, it might have an IP address beginning with 192.168.X.X. This is a common IP address range for a home router. The 10.X.X.X private range can assign many more IP addresses on its network. This is why you'll see that IP range on bigger networks, like a university's network. We'll talk more about subnetwork sizes, shortly.

Example Private IP Usage

Let's say my campus desktop's IP address is 10.163.34.59/24 via a wired connection. And my office neighbor has an IP address of 10.163.34.65/24 via their wired connection. Both IP addresses are private because they fall within the 10.0.0.0 to 10.255.255.255 range. And it's likely they both exist on the same subnet since they share the first three octets: 10.163.34.XX.

However, if we both, using our respective wired connected computers, searched Google for what's my IP address, we will see that we share the same public IP address, which will be something like 128.163.8.25. That is a public IP address because it does not fall within the ranges listed above.

Without any additional information, we know that all traffic coming from our computers and going out to the internet looks like it's coming from the same IP address (128.163.8.25). And in reverse, all traffic coming from outside our network first goes to 128.163.8.25 before it's routed to our respective computers via the router.

Let's say I switch my network connection to the campus's wifi network. When I check with ip a, I find that the computer now has the IP address 10.47.34.150/16. You can see there's a different pattern with this IP address. The reason it has a different pattern is because this laptop is on an different subnet. This wireless subnet was configured to allow more hosts to connect to it since it must allow for more devices (i.e., laptops, phones, etc). When I searched Google for my IP address from this laptop, it reports 128.163.238.148, indicating that UK owns a range of public IP address spaces.

Here's kind of visual diagram of what this network looks like:

Network diagram — Fig. 1. This figure contains a network switch, which is used to route traffic within a subnet. The switch relies solely on MAC addresses and not IP addresses to determine the location of devices on its subnet. The router acts as the interface between the private network and the public network and is managing two subnets: a wired and a wireless one.

Using the `ip` Command

The ip command can do more than provide us information about our network. We can also use it to turn a connection to the network on or off (and more). The commands below show how we disable and then enable a connection on a machine. Note that enp0s3 is the name of my network card/device. Yours might have a different name.

sudo ip link set ens4 down
sudo ip link set ens4 up

Don't run those commands on your gcloud servers otherwise your connection will be dropped and you'll have to reboot the system from the web console.

Transport Layer

The internet (IP) layer does not transmit content, like web pages or video streams. This is the work of the transport layer. As discussed previously, the two most common transport layer protocols are TCP and UDP.

TCP, Transmission Control Protocol

TCP or Transmission Control Protocol is responsible for the transmission of data and for making sure the data arrives at its destination w/o errors. If there are errors, the data is re-transmitted or halted in case of some failure. Much of the data sent over the internet is sent using TCP.

UDP, User Datagram Protocol

The UDP or User Datagram Protocol performs a similar function as TCP, but it does not error check and data may get lost. UDP is useful for conducting voice over internet calls or for streaming video, such as through YouTube, which uses a type of UDP transmission called QUIC that has builtin encryption.

TCP and UDP Headers

The above protocols send data in data TCP packets or UDP datagrams, but these terms may be used interchangeably. Packets for both protocols include header information to help route the data across the internet. TCP includes ten fields of header data, and UDP includes four fields.

We can see this header data using the tcpdump command, which requires sudo or being root to use. The first part of the IP header contains the source address, then comes the destination address, and so forth. Aside from a few other parts, this is the primary information in an IP header.

If you want to use tcpdump, you should use it on your local computer and not on your gcloud instance. I'm not sure how Google will respond to this kind of activity because it might be deemed malicious. But to use it, first we identify the IP number of a host, which we can do with the ping command, and then run tcpdump:

ping -c1 www.uky.edu
sudo tcpdump host 128.163.35.46

While that's running, we can type that IP address in our web browser, or enter www.uky.edu, and watch the output of tcpdump.

TCP headers include port information and other mandatory fields for both source and destination servers. The SYN, or synchronize, message is sent when a source or client requests a connection. The ACK, or acknowledgment, message is sent in response, along with a SYN message, to acknowledge the request for a connection. Then the client responds with an additional ACK message. This is referred to as the TCP three-way handshake. In addition to the header info, TCP and UDP packets include the data that's being sent (e.g., a webpage) and error checking if it's TCP.

Ports

TCP and UDP connections use ports to bind internet traffic to specific IP addresses. Specifically, a port associates a process with an application (and is part of the application layer of the internet suite), such as a web service or outgoing email. That is, ports provide a way to distinguish and filter internet traffic (web, email, etc) through an IP address. For example, port 80 is the default port for unencrypted HTTP traffic. Thus, all traffic going to IP address 10.0.5.33:80 means that this is HTTP traffic for the HTTP web service. Note that the port info is attached to the end of the IP address via a colon.

Other common ports include:

21: FTP
22: SSH
25: SMTP
53: DNS
143: IMAP
443: HTTPS
587: SMTP Secure
993: IMAP Secure

There's a complete list of the 370 default ports/protocols on your Linux systems. It's located in the following file:

less /etc/services

Learning opportunity! We can view the whole file with less /etc/services or if you want to view only non-empty lines and lines not starting with comments, which are lines beginning with the pound sign #, then we can use sed:
sed -n '/^[^$]/p' /etc/services | sed -n '/^[^#]/p' | less
The first sed command prints non-empty lines. The output is piped to the second sed command, which prints lines not starting with the pound sign. This output is piped to the less command for viewing. Instead of piping the output to less, we could pipe it to wc -l to get a total count of the ports. Alternatively, we can invert grep for lines starting with a pound sign or are empty:
grep -Ev "^#|^$" /etc/services | wc -l
There so many ways!

See also the Wikipedia page: List of TCP and UDP port numbers

IP Subnetting

Let's now return to the internet layer and discuss one of the major duties of a systems administrator: subnetting.

Subnets are used to carve out smaller and more manageable subnetworks out of a larger network. They are created using routers that have this capability (e.g., commercial use routers) and certain types of network switches.

Private IP Ranges

When subnetting local area networks, recall that we work with the private IP ranges:

Start Address	End Address
10.0.0.0	10.255.255.255
172.16.0.0	172.31.255.255
192.168.0.0	192.168.255.255

It's important to be able to work with IP addresses like those listed above in order to subnet; and therefore, we will need to learn a bit of IP math along the way.

IP Meaning

An IPv4 address is 32 bits (8 x 4), or four bytes, in size. In human readable context, it's usually expressed in the following, decimal-based, notation style:

192.168.1.6
172.16.3.44

Each set of numbers separated by a dot is referred to as an octet. An octet is a group of 8 bits. Eight bits equals a single byte. By implication, 8 gigabits equals 1 gigabyte, and 8 megabits equals 1 megabyte. We use these symbols to note the terms:

Term	Symbol
bit	b
byte	B
octet	o

Each bit is represented by either a 1 or a 0. For example, the first address above in binary is:

11000000.10101000.00000001.00000110 is 192.168.1.6

Or:

Byte	Decimal Value
11000000	192
10101000	168
00000001	1
00000110	6

IP Math

When doing IP math, one easy way to do it is to simply remember that each bit in each of the above bytes is a placeholder for the following values:

128 64 32 16 8 4 2 1

Alternatively, from low to high:

base-2	Output
2⁰	1
2¹	2
2²	4
2³	8
2⁴	16
2⁵	32
2⁶	64
2⁷	128

In binary, 192 is equal to 11000000. It's helpful to work backward. For IP addresses, all octets are 255 or less (256 total, from 0 to 255) and therefore do not exceed 8 bits or places. To convert the integer 192 to binary:

1 * 2^7 = 128
1 * 2^6 =  64 (128 + 64 = 192)

Then STOP. There are no values left, and so the rest are zeroes. Thus: 11000000

Our everyday counting system is base-10, but binary is base-2, and thus another way to convert binary to decimal is to multiple each bit (1 or 0) by the power of base two of its placeholder:

(0 * 2^0) = 0 +
(0 * 2^1) = 0 +
(0 * 2^2) = 0 +
(0 * 2^3) = 0 +
(0 * 2^4) = 0 +
(0 * 2^5) = 0 +
(1 * 2^6) = 64 +
(1 * 2^7) = 128 = 192

Another way to convert to binary: simply subtract the numbers from each value. As long as there is something remaining or the placeholder equals the remainder of the previous subtraction, then the bit equals 1. So:

192 - 128 = 64 -- therefore the first bit is equal to 1.
Now take the leftover and subtract it:
64 - 64 = 0 -- therefore the second bit is equal to 1.

Since there is nothing remaining, the rest of the bits equal 0.

Subnetting Examples

Subnetting involves dividing a network into two or more subnets. When we subnet, we first identify the number of hosts, aka, the size, we will require on the subnet. For starters, let's assume that we need a subnet that can assign at most 254 IP addresses to the devices attached to it via the router.

In order to do this, we need two additional IP addresses: the subnet mask and the network address/ID. The network address identifies the network and the subnet mask marks the boundary between the network and the hosts. Knowing or determining the subnet mask allows us to determine how many hosts can exist on a network. Both the network address and the subnet mask can be written as IP addresses, but these IP addresses cannot be assigned to computers on a network.

When we have determined these IPs, we will know the broadcast address. This is the last IP address in a subnet range, and it also cannot be assigned to a connected device/host. The broadcast address is used by a router to communicate to all connected devices on the subnet.

For our sake, let's work through this process backwards; that is, we want to identify and describe a network that we are connected to. Let's work with two example private IP addresses that exist on two separate subnets.

Example IP Address 1: 192.168.1.6

Using the private IP address 192.168.1.6, let's derive the network mask and the network address (or ID) from this IP address. First, convert the decimal notation to binary. State the mask, which is /24, or 255.255.255.0. And then derive the network addressing using an bitwise logical AND operation:

11000000.10101000.00000001.00000110 IP              192.168.1.6
11111111.11111111.11111111.00000000 Mask            255.255.255.0
-----------------------------------
11000000.10101000.00000001.00000000 Network Address 192.168.1.0

Note the mask has 24 ones followed by 8 zeroes. The /24 is used as CIDR notation and marks the network portion of the IP address. The remaining 8 bits are for the host addresses.

192.168.1.6/24

For Example 1, we thus have the following subnet information:

Type	IP
Netmask/Mask	255.255.255.0
Network ID	192.168.1.0
Start Range	192.168.1.1
End Range	192.168.1.254
Broadcast	192.168.1.255

Example IP Address 2: 10.160.38.75

For example 2, let's start off with a private IP address of 10.160.38.75 and a mask of /24:

00001010.10100000.00100110.01001011 IP               10.160.38.75
11111111.11111111.11111111.00000000 Mask            255.255.255.0
-----------------------------------
00001010.10100000.00100110.00000000 Network Address   10.160.38.0

Type	IP
Netmask/Mask	255.255.255.0
Network ID	10.160.38.0
Start Range	10.160.38.1
End Range	10.160.38.254
Broadcast	10.160.38.255

Example IP Address 3: 172.16.1.62/24

For example 3, let's start off with a private IP address of 172.16.1.62 and a mask of /24:

10101100 00010000 00000001 00100111 IP                172.16.1.62
11111111 11111111 11111111 00000000 Mask            255.255.255.0
-----------------------------------
10101100 00010000 00000001 00000000 Network Address    172.16.1.0

Type	IP
Netmask/Mask	255.255.255.0
Network ID	172.16.1.0
Start Range	172.16.1.1
End Range	172.16.1.254
Broadcast	172.16.1.255

Determine the Number of Hosts

To determine the number of hosts on a CIDR /24 subnet, we look at the start and end ranges. In all three of the above examples, the start range begins with X.X.X.1 and ends with X.X.X.254. Therefore, there are 254 maximum hosts allowed on these subnets because 1 to 254, inclusive of 1 and 254, is 254.

Example IP Address 4: 10.0.5.23/16

The first three examples show instances where the CIDR is set to /24. This only allows 254 maximum hosts on a subnet. If the CIDR is set to /16, then we can theoretically allow 65,534 hosts on a subnet.

For example 4, let's start off then with a private IP address of 10.0.5.23 and a mask of /16:

00001010.00000000.00000101.00010111 IP Address: 10.0.5.23
11111111.11111111.00000000.00000000 Mask:       255.255.0.0
-----------------------------------------------------------
00001010.00000000.00000000.00000000 Network ID: 10.0.0.0

Type	IP
IP Address	10.0.5.23
Netmask/Mask	255.255.0.0
Network ID	10.0.0.0
Start Range	10.0.0.1
End Range	10.0.255.254
Broadcast	10.0.255.255

Since the last two octets/bytes now vary, we count up by each octet. Therefore, the number of hosts is:

IPs
10.0.0.1
10.0.0.255	= 256
10.0.1.1
10.0.255.255	= 256

Number of Hosts = 256 x 256 = 65536
Subtract Network ID (1) and Broadcast (1) = 2 IP addresses
Number of Usable Hosts = 256 x 256 - 2 = 65534

IPv6 subnetting

We're not going to cover IPv6 subnetting, but if you're interested, this is a nice article: IPv6 subnetting overview

Conclusion

As a systems administrator, it's important to have a basic understanding of how networking works, and the basic models used to describe the internet and its applications. System administrators have to know how to create subnets and defend against various network-based attacks.

In order to acquire a basic understanding, this section covered topics that included:

the internet protocol suite
- link layer
- internet layer
- transport layer
IP subnetting
- private IP ranges
- IP math

In the next section, we extend upon this and discuss the domain name system (DNS) and domain names.

DNS and Domain Names

By the end of this section, you should know:

The purpose of DNS and its role as the "phone book" of the internet.
The structure of domain names, including fully qualified domain names (FQDNs).
How DNS paths work, including root domains, top-level domains (TLDs), second-level domains, and third-level subdomains.
The different types of DNS records and their purposes.
How to use common tools like dig, host, nslookup, and whois for DNS-related tasks.

Getting Started

The DNS (domain name system) is referred to as the phone book of the internet. It's responsible for mapping IP addresses to memorable names. Thus, instead of having to remember:

https://128.163.35.46

We can instead remember this:

https://www.uky.edu

System administrators need to know about DNS because they may be responsible for administering a domain name system on their network, and/or they may be responsible for setting up and administrating web site domains. Either case requires a basic understanding of DNS.

DNS Intro Videos

To help you get started, watch these two YouTube videos and read the text on recursive DNS:

FQDN: The Fully Qualified Domain Name

The structure of the domain name system is like the structure of the UNIX/Linux file hierarchy; that is, it is like an inverted tree.

The fully qualified domain name (FQDN) includes a period at the end of the top-level domain to indicate the root of the DNS hierarchy. Although modern browsers often omit this trailing period, it remains essential for the proper identification of domain names within DNS systems.

Thus, for Google's main page, the FQDN is: www.google.com.

And the parts include:

.           root domain
com         top-level domain (TLD)
google      second-level domain
www         third-level domain

This is important to know so that you understand how the Domain Name System works and how DNS servers are responsible for their part of the network.

Root Domain

The root domain is managed by root name servers. These servers are listed on the IANA (Internet Assigned Numbers Authority) website, but are managed by multiple operators. The root servers manage the root domain, alternatively referred to as the zone, or the . at the end of the .com., .edu., etc.

Alternative DNS Root Systems

It's possible to have alternate internets by using outside root name servers. This is not common, but it happens. Read about a few of them here:

opennic: https://www.opennicproject.org/
alternic: https://en.wikipedia.org/wiki/AlterNIC

As an example, Russia is building its own alternate internet based on a separate DNS root system. When completed, this will create a large, second internet that would be inaccessible to the rest of the world without reconfiguring multiple devices, servers, DNS resolvers, etc. You can read about in this IEEE Spectrum article.

Top Level Domain (TLD)

We are all familiar with top level domains. These generic TLD names are the kinds that include:

.com
.gov
.mil
.net
.org

There are also country coded TLDs (ccTLDs) such as:

.ca (Canada)
.mx (Mexico)
.jp (Japan)
.uk (United Kingdom)
.us (United States)

We can get a total count of current domain names using the command below, which outputs 1,445 (as of October 2024):

curl -s https://data.iana.org/TLD/tlds-alpha-by-domain.txt | sed '1d' | wc -l

The curl command is an alternate to the wget command. They share basic functionality but also have their own use cases. curl by default does not save a retrieved file. Add the -o option to save a file: curl -o [URLs]. Or, visit the iana.org link in the code block above to peruse the list of TLDs.

Second Level Domain Names

In the Google example of www.google.com, the second level domain is google. The second level domain along with the TLD together, along with any further subdomains, forms the fully qualified domain name (FQDN). Other examples of second level domains include:

redhat in redhat.com
debian in debian.org.
wikipedia in wikipedia.org
uky in uky.edu

Third Level Domain Names / Subdomains

When you've purchased (leased) a top and second level domain like ubuntu.com, you can choose whether to add third level domains. For example: www is a third level domain or subdomain. If you owned example.org, you could dedicate a separate server (or a cluster of machines) to www.example.org that resolves to a different location, or www.example.org could resolve to the second-level domain example.org. That is:

The server located at www.debian.org can point to the server located at debian.org.

www.debian.org could be configured to point to a different server than debian.org, meaning that each domain could host separate websites or services. This would be like how maps.google.com points to a different site than mail.google.com. Yet both maps and mail are subdomains of google.com. However, it's a convention that third-level domains marked by www point to the top level domains.

For example:

google.com resolves to www.google.com
but google.com does not resolve to:
- drive.google.com, or
- maps.google.com, or
- mail.google.com

This is because drive.google.com, maps.google.com, and mail.google.com provide different but specific services.

DNS Paths

A recursive DNS server, which is usually managed by an ISP, is the first DNS server to be queried in the DNS system. This is the resolver server in the first video above. This server queries itself (recursive) to check if the domain to IP mapping has been cached (remembered/stored) in its system.

If it hasn't been cached, then the DNS query is forwarded to a root server. There are thirteen root servers.

echo {a..m}.root-servers.net.
a.root-servers.net. b.root-servers.net. c.root-servers.net. d.root-servers.net. e.root-servers.net. f.root-servers.net. g.root-servers.net. h.root-servers.net. i.root-servers.net. j.root-servers.net. k.root-servers.net. l.root-servers.net. m.root-servers.net.

The echo {a..m}.root-servers.net command has nothing to do with DNS. I'm using brace expansion in Bash to simply list the root servers.

When a DNS query is forwarded to a root server, the root server identifies the next server to query, depending on the top level domain (.com, .net, .edu, .gov, etc.). If the site ends in .com or .net, then the next server might be something like: a.gtld-servers.net. Or if the top level domain ends in .edu, then: a.edu-servers.net. might be queried. If the top level domain ends in .gov, then: a.gov-servers.net.. And so forth.

Those top level domains will know where to send the query next. In many cases, the next path is to send the query to a custom domain server. For example, Google's custom name servers are: ns1.google.com to ns4.google.com. UK's custom name servers are: sndc1.net.uky.edu and sndc2.net.uky.edu. Finally, those custom name servers will know the IP address that maps to the domain.

We can use the dig command to query the non-cached DNS paths. Let's say we want to follow the DNS path for google.com. We can start by querying any root server. In the output, we want to pay attention to the QUERY field, the ANSWER field, and the Authority Section. We continue to use the dig command until the ANSWER field returns a number greater than 0. The following commands query one of the root servers, which points us to one of the authoritative servers for .com sites, which points us to Google's custom nameserver, which finally provides an answer, in fact six answers, or six IP address that all map to google.com.

Step by step. First we query the root server and specify that we're interested in the DNS path for google.com:

dig @e.root-servers.net google.com

The output shows that ANSWER: 0 so we keep digging. The ADDITIONAL SECTION points us to the next servers in the DNS path. We can pick one and query that:

dig @a.gtld-servers.net google.com

Again, we see that ANSWER: 0, so we keep digging. The ADDITIONAL SECTION points us to the next servers in the DNS path, which are Google's name servers. We pick one and query that:

dig @ns1.google.com google.com

Here we see ANSWER: 6, which is greater than zero. We now know the DNS path.

The output for the final dig command lists six servers. Google and other major organizations often use multiple servers for load balancing, redundancy, and better geographic distribution of requests. These servers are indicated by the A records in the DNS output.

Many large organizations, especially ISPs, function as autonomous systems (AS). These systems are large collections of IP networks under the control of a single organization and they work to present a common routing policy to the internet. Remember that the internet is an internet of internets!

We can get more information about Google as an autonomous system by locating its AS number or ASN. We do this by using the whois command on one of the IP addresses listed in the final ANSWER SECTION from the last output:

whois 142.250.31.113

The output should include OriginAS: AS15169. This is Google's ASN. Autonomous systems need to communicate with other autonomous systems. This is managed by the Border Gateway Protocol (BGP). This is a core routing protocol that manages how packets are routed across the internet between autonomous systems. BGP's role is to determine the best path for data to travel from one AS to another. BGP therefore functions as the "postal service of the internet." For a humorous (but real) take on BGP, see The Internet's Most Broken Protocol.

Alternatively, we can query UK's:

dig @j.root-servers.net. uky.edu
dig @b.edu-servers.net. uky.edu
dig @sndc1.net.uky.edu. uky.edu

We can also get this path information using dig's trace command:

dig google.com +trace

There are a lot of ways to use the dig command, and you can test and explore them on your own.

DNS Record Types

The A record in the dig output from the above examples shows the mapping between the hostname and the IPv4 address. There are other types of internet records, and we can use the dig command to get information about additional these record types. Some of the more useful records include:

IN: Internet Record
SOA: Start of Authority: describes the site's DNS entries, or the primary name server and the responsible contact information
NS: Name Server: state the name servers that provide DNS resolution
A: Address records: provides mapping hostname to IPv4 address
AAAA: Address records: provides mapping hostname to IPv6 address
TXT: TXT records contain verification data for various services
MX: Mail exchanger: the MX record maps to email servers.
PTR: Pointer record: provides mapping from IP Address to Hostname. This is like the opposite of an A record and allows us to do reverse lookups..
CNAME: Canonical name: this is used to alias one domain name to another, such as www.uky.edu to uky.edu (see discussion above).

To get as much information from the dig command at one time, we use the following dig command:

dig uky.edu ANY

DNS Toolbox

It's important to be able to troubleshoot DNS issues. To do that, we have a few utilities available. Here are examples and you should read the man pages for each one:

`host` Command

The host command is used to perform DNS lookups and returns information about a domain name or IP address. Specifically, the host command resolves hostnames to IP Address; or IP addresses to hostnames.

The following command queries the domain name uky.edu and returns the IP address associated with that domain name:

host uky.edu
uky.edu has address 128.163.35.46

With the -t option, you can get IP address information for specific record types. The following queries the MX records (email servers) for the respective domains:

host -t MX uky.edu
host -t MX dropbox.com
host -t MX netflix.com
host -t MX wikipedia.org

For example, host -t MX uky.edu tells us that UK uses Microsoft Outlook for uky.edu email addresses and host -t MX g.uky.edu tells us that UK uses the Google suite for g.uky.edu email addresses.

`dig` Command

As discussed above, the dig command (Domain Information Groper) is used to retrieve DNS records, providing detailed information about how DNS resolution occurs.

We can use dig to query uky.edu (I've removed extraneous output):

dig uky.edu
;; ANSWER SECTION:
uky.edu.		3539	IN	A	128.163.35.46

ANSWER SECTION: This contains the result of our query.
Domain (uky.edu): The domain name being queried.
TTL (3539): Time to Live, or how long the result is cached.
IN: Internet class.
A: Record type, in this case this refers to an IPv4 address.
IP Address (128.163.35.46): The IP address that corresponds to the queried domain.

We can use dig to examine other record types in the following ways:

dig uky.edu MX
dig www.uky.edu CNAME

`nslookup` Command

The nslookup command queries Internet name servers interactively to find DNS-related information about a domain.

nslookup
> uky.edu
> Server:   127.0.0.53
> Address:  127.0.0.53#53

Non-authoritative answer:
Name:   uky.edu
Address: 128.163.35.46
> exit

Explanation:

Server: The DNS server used for the lookup. In this case, the server is 127.0.0.53, which falls within the loopback address range (127.0.0.0/8). This is used by systemd-resolved, which is a local DNS resolver and caching service that handles DNS lookups on our systems.
Address: The number 53 after the # represents the port number (see /etc/services for list of port numbers). This port number is the standard for DNS queries. So, 127.0.0.53#53 indicates that the DNS server listening at 127.0.0.53 is accepting requests on port 53.
Non-authoritative answer: This indicates that the response is coming from a cache, and not directly from an authoritative DNS server.
NAME (uky.edu) and Address (128.163.35.46): The domain name and the corresponding IP address.

Because of the role that systemd has here, we can use the resolvectl status command to determine which external DNS servers are used behind the scenes.

`whois` Command

The whois command is used to look up information about who owns a particular domain name or IP address.

whois uky.edu | less

Example, abbreviated output:

Domain Name: UKY.EDU

Registrant:
    University of Kentucky
    118 Hardmon Building
    Lexington, KY 40506
    USA

Name Servers:
	SNDC1.NET.UKY.EDU
	SNDC2.NET.UKY.EDU

Domain Name: The domain you queried.
Registrant: The organization that registered the domain.
Name Servers: The authoritative name servers for the domain.

While the whois command is a useful tool for retrieving information about domain ownership, it's important to note that some domain owners choose to keep their information private. Many domain registrars offer privacy protection services, which replace the owner's contact information with generic information to protect their identity. As a result, the output of a whois query may not always reveal the true owner of a domain, showing only the registrar's privacy service instead. This is particularly common for personal or small business domains to protect against spam or unwanted contact.

The `resolve.conf` File

The resolve.conf file contains local resolver info. That is, it contains your your DNS information.

man -f  resolv.conf
resolv.conf (5) - resolver configuration file
cat /etc/resolv.conf
resolvectl status

Conclusion

In the same way that phones have phone numbers to uniquely identify them, servers on the internet use IP addresses to enable communication. Since we're only human, we don't remember every phone number that we dial or every IP address that we visit. In order to make such things human friendly, we use names instead.

Nameservers and DNS records act as the phone book and phone book entries of the internet. Note that I refer to the internet and not the web here. The web is strictly defined or limited to the HTTP/HTTPS protocol, and there are protocols at the OSI application layer. For example, email servers may also have domain names and IP addresses to resolve and use protocols like POP, IMAP, and SMTP.

In this section, we covered the basics of DNS that include:

FQDN: the Fully Qualified Domain Name
Root domains
Top level domains (TLDs) and Country Code TLDS (ccTLDs)
Second level and third level domains/subdomains
DNS paths, and
DNS record types

We also looked at several command line tools to query for information about the domain name system.

For web-based DNS tools, see ViewDNS.info.

Local Security

By the end of this section, you should know:

The purpose of a chroot environment and how it enhances local security by restricting users to a defined directory.
How to create a basic chroot directory and structure it to mimic a limited root file system.
How to set up and test the chroot environment to ensure users are confined to their pseudo-root directory.
The process for identifying and copying necessary binaries and their dependencies into a chroot.
Basic troubleshooting techniques for common issues that may arise when setting up or entering a chroot.
The limitations of chroot for security and how it serves as a foundational concept for containerization technologies like Docker.
How to configure SSH to restrict specific user groups to a chroot environment for added security.

Getting Started

Security challenges predominantly emanate from network interactions; however, safeguarding a system from potential internal threats is equally critical. This can be achieved by enforcing stringent file permissions and ensuring that users lack certain types of access, such as sudo privileges. Take, for instance, the program /usr/bin/gcc. gcc serves as the GNU C and C++ compiler, translating C or C++ source code into executable programs (like exe programs on Windows computers). Unrestricted access to this compiler could potentially allow users to create programs capable of compromising the system (I know, based on personal experience).

In the following section, we will shift focus to external threats and learn about setting up a firewall. In this section, we focus on internal threats and learn how to create a chroot environment. This is a specialized environment that restricts user operations or processes to a defined directory, thereby bolstering system security. By delving into the setup of a chroot environment, we will unveil an effective strategy that can mitigate risks stemming from within the system.

chroot

As we all know, the Linux file system has a root directory /, and under this directory are other directories like /home, /bin, and so forth. A chroot (change root) environment is a way to create a fake root directory at some specific location in the directory tree, and then build an environment in that pseudo root directory that offers some applications. Once that environment is setup, we can confine a user account(s) to that pseudo directory, and when they login to the server, they will only be able to see (e.g., with the cd command) what's in that pseudo root directory and only be able to use the applications that we've made available in that chroot.

Thus, a chroot is a technology used to change the "apparent root / directory for a user or a process" and confine that user to that location on the system. A user or process that is confined to the chroot cannot easily see or access the rest of the file system and will have limited access to the binaries (executables/apps/utilities) on the system. From its man page:

chroot (8) - run command or interactive shell with special root directory

Although it is not security proof, it does have some useful security use cases. Some use chroot to contain DNS or web servers, for example.

chroot is also the conceptual basis for some kinds of virtualization technologies that are common today, like Docker.

Creating a chroot

In this tutorial, we are going to create a chroot.

First, we create a new directory for our chroot. That directory will be located at /mustafar (but it could be elsewhere). Note that the normal root directory is /, but for the chroot, the root directory will be /mustafar even though it will appear as / in the chroot.

Depending on where we create the chroot, we want to check the permissions of the new directory and make sure it's owned by root. If not, use chown root:root /mustafar to set it.

Create directory:
```
sudo mkdir /mustafar
```
Check user and group ownership:
```
ls -ld /mustafar
```
We want to make the bash shell available in the chroot. To do that, we create a /bin directory in /mustafar, and copy bash to that directory.
```
which bash
sudo mkdir /mustafar/bin
sudo cp /usr/bin/bash /mustafar/bin/
```
ALTERNATIVELY: use command substitution to copy bash:
```
sudo mkdir /mustafar/bin
sudo cp $(which bash) /mustafar/bin
```
Large software applications have dependencies, aka libraries. We need to copy those libraries to our chroot directory so applications,like bash, can run.

To identify libraries needed by bash, we use the ldd command:
```
ldd /usr/bin/bash
```
Do not copy!! Output (output may vary depending on your system):
```
linux-vdso.so.1 (0x00007fff2ab95000)
libtinfo.so.6 => /lib/x86_64-linux-gnu/libtinfo.so.6 (0x00007fbec99f6000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fbec97ce000)
/lib64/ld-linux-x86-64.so.2 (0x00007fbec9ba4000)
```
Ignore the first item in the output (linux-vdso.so.1). But we will need the libraries in the last three lines.
Next we create directories for these libraries in /mustafar that match or mirror the directories they reside in. For example, in the above ldd output, two directory paths are highlighted: /lib/x86_64-linux-gnu and /lib64. Therefore, we need to create directories with those names in /mustafar.

To do that, use the mkdir command to create a /mustafar/lib/x86_64-linux-gnu/ directory and a /mustafar/lib64 for the libraries. We need to name the library directories after the originals to stay consistent with the main environment.
```
sudo mkdir -p /mustafar/lib/x86_64-linux-gnu
```
And then:
```
sudo mkdir /mustafar/lib64
```
Then we proceed to copy (not move!) the libraries to their respective directories in the /mustafar directory:
```
cd /mustafar/lib/x86_64-linux-gnu/
sudo cp /lib/x86_64-linux-gnu/libtinfo.so.6 .
sudo cp /lib/x86_64-linux-gnu/libc.so.6
cd /mustafar/lib64/
sudo cp /lib64/ld-linux-x86-64.so.2 .
```
Finally, we can test the chroot:
```
sudo chroot /mustafar
```
If successful, you should see a new prompt like below:
```
bash-5.1#
```
If you try running some commands, that are not part of Bash itself, you'll encounter some errors.

We do have access to some commands, like help, dirs, pwd, cd, and more because these are builtin to bash. Utilities not builtin to bash are not yet available. These include ls, cat, cp, and more. The following is a brief example of interacting in a limited chroot environment with no outside utilities available:
```
bash-5.1# ls
bash: ls: command not found
bash-5.1# help
bash-5.1# dirs
bash-5.1# pwd
bash-5.1# cd bin/
bash-5.1# dirs
bash-5.1# cd ../lib64/
bash-5.1# dirs
bash-5.1# cd ..
bash-5.1# for i in {1..4} ; do echo "$i" ; done
```
To exit the chroot environment, simply type exit:
```
bash-5.1# exit
```

Exercise

Use the ldd command, to add additional binaries. Make the following utilities/binaries available in the /mustafar chroot directory:

ls
cat

Troubleshooting Common chroot Setup Errors

When setting up a chroot environment, you may encounter an error like:

chroot: failed to run command '/bin/bash': No such file or directory

This error often occurs if the chroot environment is missing critical files or directories, such as the bash executable or its required libraries. Here are some steps to resolve this:

Check for the Bash Binary: Ensure that the bash executable has been correctly copied to /mustafar/bin/:
```
sudo ls /mustafar/bin/bash
```
If this file isn't there, go back to the step where you copy bash to the chroot directory.
Verify Library Dependencies: The bash binary requires certain libraries to run. Use the ldd command to list these dependencies and confirm they are copied to /mustafar:
```
ldd /usr/bin/bash
```
Ensure each library listed is copied to the matching directory within /mustafar, such as /mustafar/lib/x86_64-linux-gnu/ and /mustafar/lib64/.
Correct File Structure: Confirm that your chroot directory structure mirrors the actual root structure. The paths within /mustafar should match the paths of the dependencies found using ldd.

After confirming these items, try running chroot again:

sudo chroot /mustafar

If these checks don't resolve the issue, double-check permissions on the chroot directory to ensure root ownership:

sudo chown root:root /mustafar

Conclusion

Systems need to be secure from the inside and out. In order to secure from the inside, system users should be given access and permissions as needed.

In this section, we covered how to create a chroot environment. The chroot confines users and processes to this pseudo root location. It provides them limited access to the overall file system and to the software on the system. We can use this chroot to confine users and processes, like apache2 or human users. Any user listed in /etc/passwd can be chrooted, and most users listed in that file are services.

Restricting a human user to a chrooted environment may not be necessary. On a multi-user system, proper education and training about the policies and uses of the system may be all that's needed. Alternatively, when creating user accounts, we could make their default shell rbash, or restricted bash. rbash limits access to a lot of Bash's main functions, and for added security, it can be used in conjunction with chroot. See man rbash for more details.

In summary, if a stricter environment is needed, you know how to create a basic chroot environment.

Additional Sources:

Appendix A: Non-Google Cloud Systems

Our user accounts and connections to our Google Cloud virtual instances are managed on the Google Cloud console, and we reach these instances using the gcloud compute ssh command. The gcloud command is special software that we installed on our personal systems and authentication happens via our Google accounts. However, on traditional remote systems, we use ssh with its standard syntax, which is: ssh user@domain.com or ssh user@<ip_address>, where user is the account name managed directly on the server and domain.com is the host name of the server.

On those traditional types of systems, we can take advantage of chroot to isolate user accounts to a chrooted environment. There are a number of ways to do this, but below I demonstrate how to isolate users to a chrooted environment based on their group membership.

Let's create a new user. After we create the new user, we will chroot that user going forward.
```
sudo adduser vader
```
Create a new group called mustafar. We can add users to this group that we want to jail in a chrooted environment.
```
sudo groupadd mustafar
sudo usermod -a -G mustafar vader
groups vader
```
Edit /etc/ssh/sshd_config to direct users in the chrootjail group to the chroot directory. Add the following line at the end of the file. Then restart ssh server.
```
sudo nano /etc/ssh/sshd_config
```
Then add:
```
Match group mustafar
            ChrootDirectory /mustafar
```
Exit nano, and restart ssh:
```
systemctl restart sshd
```
Test the ssh connection for the vader user. Here I use ssh on the local system to connect to the local system, simply to test it.
```
ssh vader@localhost
-bash-5.1$ ls
-bash: ls: command not found
exit
```
That works as expected. The user vader is now restricted to a special directory and has limited access to the system or to any utilities on that system.

Appendix B: Additional Tools for Securing Multi-User Shell Systems

In addition to chroot and rbash, other Linux tools can help secure multi-user, shell-accessible systems. These tools can be used to restrict file modifications, monitor system changes, and limit user actions. Together they provide an extra layer of control and protection.

chattr (Change File Attributes)

The chattr command changes file attributes on Linux filesystems. By setting certain attributes on files, you can restrict users—even with superuser permissions—from modifying critical files. This is particularly useful for preventing accidental or malicious deletion or alteration of configuration files and other sensitive data.
- Common Usage: The most frequently used option is the +i (immutable) attribute, which prevents modification or deletion.
```
sudo chattr +i /path/to/important/file
```
Once this attribute is set, the file cannot be modified, renamed, or deleted until the attribute is removed with chattr -i.
- Other Options: There are additional flags to explore, such as +a, which allows only appending to a file, which is useful for log files that should not be altered.
lsattr (List File Attributes)

The lsattr command is used to view the attributes set by chattr. This command shows you which files are immutable or otherwise restricted. It allows administrators to verify that critical files have the appropriate protections.
- Common Usage:
```
lsattr /path/to/important/file
```
This command outputs a list of attributes and helps administrators quickly identify files that are protected or have special restrictions.
sudo and sudoers Configuration

The sudo command grants specific users permission to execute commands as superuser, but the sudoers file can also be configured to limit which commands a user may execute with sudo. By restricting sudo permissions, you can allow users access to only the commands they need to perform their roles.
- Common Usage: Open the sudoers file with visudo to edit user permissions. For example, to allow a user only to use ls and cat with sudo, add:
```
username ALL=(ALL) /bin/ls, /bin/cat
```
ulimit (User Limits)

The ulimit command sets resource limits for user sessions, such as maximum file sizes, number of open files, and CPU time. This is essential in preventing users from consuming excessive resources, which could slow down or crash the system.
- Common Usage: To set a file size limit of 100MB for a session, use:
```
ulimit -f 100000
```
You can make these limits permanent for specific users by adding them to the user’s shell configuration file (e.g., ~/.bashrc) or to the /etc/security/limits.conf file for global settings. If you add them to a user's ~/.bashrc file, you can use the chattr command to prevent the user from editing that file.
faillock and Account Lockout Policies

faillock helps protect against brute-force attacks by locking user accounts after a specified number of failed login attempts. This can prevent unauthorized access to user accounts.
- Common Usage: To set a policy that locks an account for 10 minutes after three failed login attempts, edit /etc/security/faillock.conf:
```
deny = 3
unlock_time = 600
```
Then, restart your authentication services to apply the changes.
iptables for Access Control

While traditionally used for network security, iptables can also be configured to control user access to certain resources or services. For example, you can restrict SSH access to specific IP addresses, reducing the attack surface on multi-user systems.
- Common Usage: To limit SSH access to users coming from a specific IP address:
```
sudo iptables -A INPUT -p tcp --dport 22 -s <allowed_ip> -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 22 -j DROP
```
On Debian-based systems, including Ubuntu, you can use the ufw command (Uncomplicated Firewall) instead of iptables:

To permit SSH access (port 22) only from a specific IP address, use:
```
sudo ufw allow from <allowed_ip> to any port 22
```
To block SSH access from all other IPs, use:
```
sudo ufw deny 22
```
For example, to block SSH traffic from 192.168.1.100, you would write:
```
sudo ufw allow from 192.168.1.100` to any port 22
sudo ufw deny 22
```
Make sure ufw is active and running with these commands:
```
sudo ufw enable
sudo ufw status
```

Combined with chroot and rbash, these create a layered approach to security for multi-user shell systems. Each tool has specific use cases, but together they help administrators establish a secure and controlled environment.

Firewalls and Backups

By the end of this section, you should know:

The Role of Firewalls in Network Security:
- How firewalls control incoming (ingress) and outgoing (egress) traffic at different network layers.
- The difference between link, IP, transport, and application layer rules.
Firewall Configuration in Cloud and Local Environments:
- How to create, manage, and prioritize firewall rules in Google Cloud using the VPC network interface.
- How to use Google Cloud's firewall capabilities to block specific types of traffic, such as ICMP (used by tools like ping and traceroute).
Understanding Google Cloud Firewall Features:
- The concept of default VPC firewall rules and how to override them with higher priority rules.
- How to configure firewall rules to enhance security at the network level.
Configuring Local Firewalls with Ubuntu's ufw:
- How to enable and disable ufw for local firewall management on Ubuntu systems.
- Basic commands to allow or deny access to specific ports or services, such as SSH (port 22) or HTTP (port 80).
- The use of application profiles in /etc/ufw/applications.d to simplify firewall rule creation for services like Apache.
Backup Strategies for Physical and Cloud Systems:
- The difference between backing up virtual machines with Google Cloud snapshots and using local tools for bare metal systems.
- How to create and manage Google Cloud snapshots for disaster recovery and replication purposes.
Using rsync for Backup and Synchronization:
- How to use rsync to copy files and directories locally and to a remote server over SSH.
- The effect of the trailing slash in rsync commands, and how it changes the behavior of the copy operation.
- How to use the --delete option in rsync to synchronize directories by removing files that no longer exist in the source directory.
Understanding the Utility of Different Backup and Security Options:
- When to use Google Cloud firewall versus a local firewall (ufw) and when to use snapshots versus tools like rsync based on your needs.
- The advantages and limitations of each approach to both firewall management and backup strategies.

Getting Started

Most security challenges come from outside of a local network, and the attack vectors are fairly broad. To reduce the vulnerabilities of our systems, a systems administrator must be able to handle the following kinds of tasks:

firewall configuration and management
access control management
network monitoring and intrusion detection
patch management
VPN configuration and management
network segmentation
password and authentication policy enforcement
logging and auditing
security policies and documentation
incident response and disaster recovery planning
security vulnerability assessments
network hardening
encryption implementation
DNS security
endpoint security integration

We have covered some of the tasks above. For example, we learned how to create user accounts and password policies, and both of these are a form of access control management. We learned about sub-networking, which is a form of network segmentation. We learned about the DNS system, which is a fundamental aspect of DNS security. We developed bash scripts to examine log files, which is helpful to understand logging and auditing. We learned how to install software and keep our systems updated, which is a form of patch management. Although we can only cover so much and there is a lot more to learn, in this section we'll begin to learn firewall configuration and management. We will also learn how to create systematic backups of our instances, which is an important part of disaster recovery planning.

Firewalls

A firewall program allows or denies connections for incoming (aka, ingress) or outgoing (aka, egress) traffic. Traffic can be controlled through the:

link layer: a network interface such as an ethernet or wireless card,
IP layer: IPv4 or IPv6 address or address ranges,
transport layer: TCP, UDP, ICMP, etc., or
by the application layer via port numbers: HTTP (port 80), HTTPS (port 443), SSH (port 22), SMTP (port 465), etc.

Firewalls have other abilities. For example, they can be used to:

place limits on the number of attempts to connect,
create virtual private networks, and
throttle bandwidth for certain applications, ports, etc.

As a side note, bare metal servers may have multiple ethernet network interface cards (NICs). Each NIC would, of course, have its own MAC address, and therefore would be assigned different IP addresses. Thus, at the link layer, incoming connections can be completely blocked on one card and outgoing connections can be completely blocked on the other.

To manage these connections, firewalls apply rules. A rule may block all incoming connections, but then allow SSH traffic through port 22, either via TCP or UDP, and then further restrict SSH connections to a specific IP range. And/or, another rule may block all incoming, unencrypted HTTP connections through port 80, but allow all incoming, encrypted HTTPS connections through port 443.

Firewalls on Google Cloud

Let's cover using Google Cloud to create a basic firewall rule. This will prepare us for setting up new rules when we configure our LAMP servers in the next section.

LAMP originally referred to Linux, Apache, MySQL, and PHP. These four technologies together create a web server. Technically, only Linux (or some other OS) and Apache (or some other web server software) are needed to serve a website. At a basic level, all a web server does is open up an operating system's filesystem to the world. But PHP and MySQL provide additional functionality, like the ability for a website to interact with a relational database. The M in LAMP may also refer to MariaDB, which is a fully open source clone of MySQL. Other relational databases are usable, too, such as PostgreSQL or SQLite. We'll use MariaDB later in this course.

First review how our Google Cloud instances are pre-populated with default firewall rules at the network level. Follow that with a review of the firewall documentation, which provides an overview of the rules we'll use.

Block the `ping` application

We'll begin by implementing a basic firewall rule where we block incoming ICMP traffic. ICMP traffic is used by several applications, such as ping and traceroute. The ping command is a simple tool that can test whether a server at an IP address or domain is running. It's useful for error reporting and network diagnostics. The traceroute command is used to display the path between two internet devices. While both are useful, we may want to block that traffic to prevent others from gaining information about our servers. To do so:

In the Google Cloud Console, click on VPC network and select Firewall.
The default VPC firewall rules are listed that allow for HTTP, ICMP, RDP, and SSH traffic.
Priority settings are set for each rule. Lower numbers mean the rules have a higher priority.
The predefined rules allow for incoming ICMP traffic set at a priority level of 65534.
We could delete that, but we should leave these rules in place and create a higher priority rule that will supersede that.
Click on Create Firewall Rule.
For name, enter new-deny-icmp.
Keep priority at 1000.
Under Direction of traffic, keep as Ingress.
Under Action on match, select Deny.
In the Source IPv4 ranges, enter 0.0.0.0/0 for all network traffic.
Under Protocols and ports, select Other, and type in icmp.
Click the Create button to create the rule.

On the VM instances page, you can find the external IP address for your virtual machine. For example, let's say mine is 33.333.333.100. From my laptop, if I try to ping that IP address, I should not get a response.

Once you have tested this rule, feel free to keep or delete it. To delete it, select the check box next to the rule, and then click the Delete button.

Note: Google's Firewall rules, at extra cost, now offer the ability to block specific domains (FQDN-based firewalls) and to block geographical regions.

OS Firewall Applications

In case you are working on firewalls on a specific machine instead of on the cloud, then you would want to become familiar with Linux-based firewall applications.

On Ubuntu, the main firewall application is ufw. On RedHat-based distributions, the main firewall application is firewalld. Both of these firewall applications are user-friendly front-ends of the iptables firewall application, which is built into the Linux kernel. Although we are using an Ubuntu distribution as our virtual machines, Ubuntu's ufw firewall is disabled by default. This is likely because it may be overkill to use both Google Cloud's firewall and Ubuntu's ufw.

FreeBSD and OpenBSD, two non-Linux but Unix-like operating systems, offer pf: pf on FreeBSD and pf on OpenBSD. These BSD OSes are often used to build firewall servers.

ufw (Uncomplicated Firewall) is a user-friendly command-line firewall utility. It simplifies firewall configuration on Debian-based Linux operating systems, such as Ubuntu. It is designed to provide a more user friendly interface to the underlying iptables application. However, it is still powerful enough to secure a system effectively.

Basic Features

Enable/Disable Firewall:
- Enable: sudo ufw enable
- Disable: sudo ufw disable
Allow/Deny Access:
- Allow access to port 22 (ssh): sudo ufw allow 22
- Deny access to port 80 (http): sudo ufw deny 80
- Other services can be set based on the contents of the /etc/services file.
Status and Logging:
- Check ufw status: sudo ufw status
- Log firewall entries: sudo ufw logging on
- Change logging level: sudo ufw logging low
- View log entries: sudo less /var/log/ufw.log

ufw can also control the firewall based on a list of profiles and predefined rules. These profiles and predefined rules are contained in the /etc/ufw/applications.d directory. For example, there are three protocols for the apache2 web server: Apache, Apache Secure, and Apache Full. These are defined in /etc/ufw/applications.d/apache2-utils.ufw.profile file:

[Apache]
title=Web Server
description=Apache v2 is the next generation of the omnipresent Apache web server.
ports=80/tcp

[Apache Secure]
title=Web Server (HTTPS)
description=Apache v2 is the next generation of the omnipresent Apache web server.
ports=443/tcp

[Apache Full]
title=Web Server (HTTP,HTTPS)
description=Apache v2 is the next generation of the omnipresent Apache web server.
ports=80,443/tcp

Based on the above profile, we can set ufw for Apache like so:

sudo ufw allow 'Apache Full'

To see other examples, read: ufw documentation. If you are using a RedHat distribution of Linux, then checkout A Beginner's Guide to firewalld in Linux.

Backups

Catastrophes (natural, physical, criminal, or out of negligence) happen. As a systems administrator, you may be required to have backup strategies to mitigate data loss.

How you backup depends on the machine. If I am managing bare metal, and I want to backup a physical disk to another physical disk, then that requires a specific tool. However, if I am managing virtual machines, like our Google Cloud instance, then that requires a different tool. Therefore, in this section, I will briefly cover both scenarios.

Google Cloud Snapshots

Since our instance on Google Cloud is a virtual machine, we can use the Google Cloud console to create snapshots of our instance. A snapshot is a copy of a virtual machine at the time the snapshot was taken. What's great about taking a snapshot is that the result is basically a file of a complete operating system. Since it's a file, it can itself be used in other projects or used to restore a machine to the time the snapshot was taken.

Snapshots may also be used to document or reproduce work. For example, if I worked with programmers, as a systems administrator, I might help a programmer share snapshots of a virtual machine with other programmers. Those other programmers could then restore the snapshot in their own instances, and see and run the original work in the environment it was created in.

Taking snapshots in Google Cloud is very straightforward, but since it does take up extra storage, it will accrue extra costs. Since we want avoid that for now, please see the following documentation for how to take a snapshot in Google Cloud:

Create and manage disk snapshots

Click on Compute Engine.
Click on Snapshots.
Click on Create Snapshot.
Enter a name for your snapshot.
Select the Source disk.
Select Snapshot under the Type section.
Click on Create.

`rsync`

If we were managing bare metal machines, then we might use a program like rsync to backup physical disk drives. rsync is a powerful program. It can copy disks, directories, and files. It can copy files from one location, and send the copies, encrypted, to a remote server.

For example, let's say I mount an external hard drive to my filesystem at /mnt/backup. To copy my home directory, I'd use:

rsync -av /home/me/ /mnt/backup/

where /home/me/ is the source directory, and /mnt/backup/ is the destination directory.

Syntax matters here. If I include the trailing slash on the source directory, then rsync will copy everything in /home/me/ to /mnt/backup/. However, if I leave the trailing slash off, like so:

rsync -av /home/me /mnt/backup/

then the result will be that the directory me/ will be copied to /mnt/backup/me/.

Let's see this in action. Say I have two directories. In the tmp1/ directory, there are two files: file1 and file2. The tmp2/ directory is empty. To copy file1 and file2 to tmp2, then:

ls tmp1/
file1 file2
rsync -av tmp1/ tmp2/
ls tmp2
file1 file2

However, if I leave that trailing slash off the source directory, then the tmp1/ will get copied to tmp2/:

ls tmp1
file1 file2
rsync -av tmp1 tmp2/
ls tmp2/
tmp1/
ls tmp2/tmp1/
file1 file2

rsync can also send a source directory to a directory on a remote server, and the directory and files being copied will be encrypted on the way. To do this, we use ssh style syntax:

rsync -av tmp1/ USER@REMOTE:~/tmp2/

For example:

rsync -av tmp1 linus@222.22.33.333:~/tmp2/

Delete Option

rsync has a --delete option. Adding this option means that rsync will synchronize the source directory with the destination directory. This means that if I had already created a backup of tmp1 to tmp2, and then delete file1 in tmp1 later, then run rsync with the delete option, then rsync will also delete file1 from tmp2/. This is how that looks:

ls tmp1/
file1 file2
rsync -av tmp1/ tmp2/
ls tmp2/
file1 file2
rm tmp1/file1
ls tmp1/
file2
rsync -av --delete tmp1/ tmp2/
ls tmp2
file2

Backups are no good if we don't know how to restore a backup to a disk. To restore with rsync, we just reverse the destination directory with the source directory:

rsync -av tmp2/ tmp1/

Conclusion

System security involves a multi-vectored approach that includes many tasks, from password management to log audits. In this section, we covered firewalls and backups, and have thus addressed new vectors to protect: firewall configuration and management and disaster recovery.

Since we're running an Ubuntu server on Google Cloud, we have Google Cloud options for creating firewall rules at the network level and for backing up disks as snapshots. We also have Ubuntu options for creating firewall rules at the OS level using ufw and for backing up disks using commands like rsync. How we go about either depends entirely on our needs or on our organization's needs. But knowing these options exist and the different reasons why we have these options, provides quite a bit of utility.

Creating a LAMP Server

In this section, we learn how to set up a LAMP (Linux, Apache, MariaDB, PHP) stack. This stack enables us to create a web server that provides extra functionality via PHP and MariaDB. Knowing how to set up a LAMP stack is a fun and valuable, basic skill to have as a systems administrator.

Installing the Apache Web Server

By the end of this section, you will be able to:

Install the Apache web server on an Ubuntu system and verify its status using basic commands.
Configure Apache to serve a basic web page, using both command line and graphical web browsers to view it.
Understand where Apache stores key configuration and content files, and modify the default document root to create your own custom web content.

Getting Started

Apache is an HTTP server, otherwise called web server software. An HTTP server makes files on a computer available to others who are able to establish a connection to the computer and view the files with a web browser. Other HTTP server software exists and another major product is nginx.

It's important to understand the basics of an HTTP server. Please read Apache's Getting Started page before proceeding, as it covers important basics of HTTP servers. Each of the main sections on that page describe the important elements that make up and serve a website, including:

clients, servers, and URLs
hostnames and DNS
configuration files and directives
web site content
log files and troubleshooting

Installation

Before we install Apache, we need to update our systems first.

sudo apt update
sudo apt -y upgrade

Once the machine is updated, we can install Apache2 using apt. First we'll use apt search to identify the specific package name. I already know that a lot of results will be returned, so let's pipe the apt search command through head to look at the initial results:

sudo apt search apache2 | head

On Ubuntu, the Apache package is named apache2. On other distributions, it may be named differently, such as httpd on Fedora.

apt show apache2

Once we've confirmed that apache2 is the package that we want, we install it with the apt install command.

sudo apt install apache2

Basic checks

Let's check if the server is up and running, configure some basic things, and then create a basic web site. To start, we use systemctl to ensure apache2 is enabled (starts automatically on reboot) and active (currently running):

systemctl list-unit-files apache2.service
systemctl status apache2

The output shows that apache2 is enabled, which means that it will start running automatically if the computer gets rebooted. The output of the second command shows that apache2 is enabled and that it is also active (running).

Creating Your First Web Page

Since apache2 is up and running, let's look at the default web page. The default web page is the landing page of your server, and it is stored in the document root (/var/www/html) as a file named index.html.

There are two ways we can look at the default web page. We can use a command line web browser or your regular graphical browser.. There are a number command line browsers available, such as elinks, links2, lynx, and w3m (which I prefer).

To check with w3m, we have to install it first:

sudo apt install w3m

Once it's installed, we can visit our default site using the loopback IP address (aka, localhost). From the command line on our server, we can run either of these two commands:

w3m 127.0.0.1

Or:

w3m localhost

We can also get the subnet/private IP address using the ip a command, and then use that with w3m. For example, if ip a showed that my NIC has a private IP address of 10.0.1.1, then I could use w3m with that IP address:

w3m 10.0.1.1

If the apache2 installed and started correctly, then you should see the following text at the top of the screen:

Apache2 Ubuntu Default Page
It works!

To exit w3m, press q and then y to confirm exit.

To view the default web page using a regular web browser, like Firefox, Chrome, Safari, Edge, or etc., you need to get the server's public IP address. To do that, log into the Google Cloud Console. In the left hand navigation panel, hover your cursor over the Compute Engine link, and then click on VM instances. You should see your External IP address in the table on that page. You can copy that external IP address or simply click on it to open it in a new browser tab. Then you should see the graphical version of the Apache2 Ubuntu Default Page.

Please take a moment to read through the text on the default page. It provides information about where Ubuntu stores configuration files and document roots, which is where website files go.

Let's create our first web page. The default page described above provides the location of the document root at /var/www/html. When we navigate to that location, we'll see that there is an index.html file located in that directory. This is the Apache2 Ubuntu Default Page that we described above. Let's rename that index.html file, and create a new one:

cd /var/www/html/
sudo mv index.html index.html.original
sudo nano index.html

If you know HTML, then feel free to write some basic HTML code to get started. Otherwise, you can re-type the content below in nano, and then save and exit out.

<html>
<head>
<title>My first web page using Apache2</title>
</head>
<body>

<h1>Welcome</h1>

<p>Welcome to my web site. I created this site using the Apache2 HTTP server.</p>

</body>
</html>

If you have our site open in your web browser, reload the page, and you should see the new text.

You can still view the original default page by specifying its name in the URL. For example, if your external IP address is 55.222.55.222, then you'd specify it like so:

http://55.222.55.222/index.html.original

Conclusion

In this section, we learned about the Apache2 HTTP server. We learned how to install it on Ubuntu, how to use systemd (systemctl) commands to check its default status, how to create a basic web page in /var/www/html, how to view that web page using the w3m command line browser and with our regular graphical browser.

In the next section, we will install PHP to enable dynamic content, which will make our websites more interactive.

Installing and Configuring PHP

By the end of this section, you will be able to:

Install and configure PHP to work with Apache2, which will enable dynamic content on your web server.
Provide an overview of how PHP interacts with a web server.
Modify Apache settings to serve PHP files as the default content.

Getting Started

Client-side programming languages, like JavaScript, are handled by the browser. Major browsers like Firefox, Chrome, Safari, Edge, etc. include JavaScript engines that use just-in-time compilers to execute the JavaScript code (Mozilla has a nice description of the process.) From an end user's perspective, you basically install JavaScript when you install a web browser.

PHP, on the other hand, is a server-side programming language. This means it must be installed on the server in order to be used by the browser. From a system administrator's perspective, this means that not only does PHP have be installed on a server, but it must also be configured to work with the HTTP server, which in our case is Apache2.

The main use of PHP is to interact with databases, like MySQL, MariaDB, PostgreSQL, etc., in order to create dynamic page content. This is our end goal. To accomplish this, we have to:

Install PHP and relevant Apache2 modules
Configure PHP and relevant modules to work with Apache2
Configure PHP and relevant modules to work with MariaDB

Install PHP

PHP allows us to create dynamic content, which means we can customize what is displayed on a web page based on user inputs or data from a database. As normal, we will use apt install to install PHP and relevant modules and then restart Apache2 using the systemctl command:

sudo apt install php libapache2-mod-php
sudo systemctl restart apache2

We can check its status and see if there are any errors:

systemctl status apache2

Check Install

To check that it's been installed and that it's working with Apache2, we can create a small PHP file in our web document root. To do that, we navigate to the /var/www/html/ directory, and create a file called info.php. The info.php file allows us to verify that PHP is correctly installed and configured with Apache2. It displays detailed information about the PHP environment on the server.

cd /var/www/html/
sudo nano info.php

In that file, add the following text, then save and close the file:

<?php
phpinfo();
?>

Now visit that file using the external IP address for your server. For example, in Firefox, Chrome, etc, go to (be sure to replace the IP below with your IP address):

http://55.333.55.333/info.php

You should see a page that provides system information about PHP, Apache2, and the server. The top of the page should look like Figure 1 below:

Fig. 1. A screenshot of the title of the PHP install page.

Basic Configurations

By default, Apache2 is set up to serve index.html files first, but since we are adding PHP to our setup, we want Apache2 to prioritize PHP files to allow for dynamic content creation.

To prioritize PHP files, we need to edit the dir.conf file in the /etc/apache2/mods-enabled/ directory. In that file there is a line that starts with DirectoryIndex. The first file in that line is index.html, and then there are a series of other files that Apache2 will look for in the order listed. If any of those files exist in the document root, then Apache2 will serve those before proceeding to the next. We simply want to put index.php first and let index.html be second on that line.

cd /etc/apache2/mods-enabled/
sudo nano dir.conf

And change the line to this:

DirectoryIndex index.php index.html index.cgi index.pl index.xhtml index.htm

Whenever we make a configuration change, we can use the apachectl command to check our configuration:

apachectl configtest

If we get an Syntax Ok message, you can reload the Apache2 configuration and restart the service:

sudo systemctl reload apache2
sudo systemctl restart apache2

If the configuration test does not return Synax Ok, check the error message for guidance, and ensure all modifications to dir.conf were made correctly.

Now create a basic PHP page. cd back to the document root directory and use nano to create and open and index.php file:

cd /var/www/html/
sudo nano index.php

Creating an index.php File

Let's now create an index.php page, and add some HTML and PHP to it. The PHP can be a simple browser detector. This script will detect and display the browser information of whoever is visiting your site. It uses the PHP global variable $_SERVER['HTTP_USER_AGENT'] to print the user's browser type and version.

First, make sure you are in the /var/www/html/ directory. Use sudo nano to create and edit index.php. Then add the following code:

<html>
<head>
<title>Browser Detector</title>
</head>
<body>
<p>You are using the following browser to view this site:</p>

<?php
echo $_SERVER['HTTP_USER_AGENT'] . "\n\n";

$browser = get_browser(null, true);
print_r($browser);
?>
</body>
</html>

Next, save the file and exit nano. In your browser, visit your external IP address site:

http://55.333.55.333/

Although your index.html file still exists in your document root, Apache2 now returns the index.php file instead. However, if for some reason the index.php was deleted, then Apache2 would revert to the index.html file since that's listed next in the dir.conf DirectoryIndex line.

Conclusion

In this section, we installed PHP and configured it work with Apache2. We also created a simple PHP test page that reported our browser user agent information on our website.

In the next section, we'll learn how to complete the LAMP stack by adding the MariaDB relational database to our setup. By adding a relational database, we will be able to pull data from the database and present it in the browser based on user actions.

Installing and Configuring MariaDB

By the end of this section, you will be able to:

Install and configure MariaDB as part of the LAMP stack. This will enable your server to store and manage data.
Create and secure a MariaDB root user, set up a regular user for day-to-day operations, and understand best practices for database security.
Write basic SQL commands to create tables, insert records, and run queries.
Integrate MariaDB with PHP to build dynamic web pages.

Getting Started

We started our LAMP stack when we installed Apache2, and then we added extra functionality when we installed and configured PHP to work with Apache2. In this section, our objective is to complete the LAMP stack and install and configure MariaDB.

MariaDB is a (so-far) compatible fork of the MySQL relational database. It allows us to store, retrieve, and manage data for our websites. This makes our web applications dynamic and capable of handling complex user interactions. If you need a refresher on relational databases, the MariaDB website can help. See: Introduction to Relational Databases.

It's also good to review the documentation for any technology that you use. MariaDB has good documentation and getting started pages.

Install and Set Up MariaDB

In this section, we'll learn how to install, setup, secure, and configure the MariaDB relational database. The goal it to make it work with the Apache2 web server and the PHP programming language.

First, let's install MariaDB Community Server, and then log into the MariaDB shell under the MariaDB root account.

sudo apt install mariadb-server mariadb-client

This should also start and enable the database server, but we can check if it's running and enabled using the systemctl command:

systemctl status mariadb

Next we need to run a post installation script called mysql_secure_installation that sets up the MariaDB root password and performs security checks. To do that, run the following command, and be sure to save the MariaDB root password you create:

sudo mysql_secure_installation

Again, here is where you create a root password for the MariaDB database server. Be sure to save that and not forget it! When you run the above script, you'll get a series of prompts to respond to like below. Press enter for the first prompt, press Y for the prompts marked Y, and input your own password. Since this server is exposed to the internet, be sure to use a complex password.

Enter the current password for root (enter for none):
Set root password: Y
New Password: [YOUR-PASSWORD-HERE]
Re-enter new password: [YOUR-PASSWORD-HERE]
Remove anonymous users: Y
Disallow root login remotely: Y
Remove test database and access to it: Y
Reload privilege tables now: Y

Removing anonymous users ensures that no one can access the database without credentials. Disallowing remote root login reduces the risk of unauthorized remote access.

We can login to the database to test it. In order to do so, we have to become the root Linux user, which we can do with the following command:

sudo su

Note: we need to generally be careful when we enter commands on the command line, because it's a largely unforgiving computing environment. But we need to be especially careful when we are logged in as the Linux root user. This user can delete anything, including files that the system needs in order to boot and operate. Always use exit immediately after finishing tasks as root to minimize the risk of accidental changes that could affect the entire system.

After we are root, we can login to MariaDB, run the show databases; command, and then exit MariaDB the \q command:

root@hostname:~# mariadb -u root
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 47
Server version: 10.3.34-MariaDB-0ubuntu0.20.04.1 Ubuntu 20.04

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
+--------------------+
3 rows in set (0.002 sec)

Note: If we are logging into the root database account as the root Linux user, we don't need to enter our password.

Create and Set Up a Regular User Account

We need to reserve the root MariaDB user for special use cases. Instead we create a regular MariaDB user. Using a regular user account minimizes the security risks associated with performing everyday operations. Root privileges should be reserved for administrative tasks only!

To create a regular MariaDB user, we use the create command. In the command below, I create a new user called webapp. I use a complex password that I insert within the single quotes at the end:

MariaDB [(none)]> create user 'webapp'@'localhost' identified by '[YOUR-PASSWORD-HERE]';

If the prompt returns a Query OK message, then the new user should have been created without any issues.

Create a Practice Database

As the root database user, let's create a new database for a regular, new user.

The regular user will be granted all privileges on the new database, including all its tables. Other than granting all privileges, we could limit the user to specific privileges, including: CREATE, DROP, DELETE, INSERT, SELECT, UPDATE, and GRANT OPTION. Such privileges may be called operations or functions, and they allow MariaDB users to use and modify the databases, where appropriate. For example, we may want to limit the webapp user to only be able to use SELECT commands. It totally depends on the purpose of the database and our security risks.

MariaDB [(none)]> create database linuxdb;
MariaDB [(none)]> grant all privileges on linuxdb.* to 'webapp'@'localhost';
MariaDB [(none)]> show databases;

Exit out of the MariaDB database as the root MariaDB user. Then exit out of the root Linux user account. You should be back to your normal Linux user account:

MariaDB [(none)]> \q
root@hostname:~# exit

Note: relational database keywords are often written in all capital letters. As far as I know, this is simply a convention to make the code more readable. However, in most cases I'll write the keywords in lower case letters. This is simply because, by convention, I'm super lazy.

Logging in as Regular User and Creating Tables

We can start doing MariaDB work. As a reminder, we've created a new MariaDB user named webapp and a new database for webapp that is called linuxdb. When we run the show databases command as the webapp user, we should see the linuxdb database (and only the linuxdb database). Note below that I use the -p option. This instructs MariaDB to request the password for the webapp user, which is required to log in.

mariadb -u webapp -p
MariaDB [(none)]> show databases;
MariaDB [(none)]> use linuxdb;

A database is not worth much without data. In the following code, I create and define a new table for our linuxdb database. The table will be called distributions, and it will contain data about various Linux distributions. This includes the name of distribution, distribution developer, and founding date. Creating this kind of structure with separate fields to store essential data is a common approach for structuring data that can be easily queried and expanded.

MariaDB [(linuxdb)]> create table distributions
    -> (
    -> id int unsigned not null auto_increment,
    -> name varchar(150) not null,
    -> developer varchar(150) not null,
    -> founded date not null,
    -> primary key (id)
    -> );
Query OK, 0 rows affected (0.07 sec)

MariaDB [(linuxdb)]> show tables;
MariaDB [(linuxdb)]> describe distributions;

Congratulations! Now create some records for that table.

Adding records into the table

We can populate our linuxdb database with some data. We'll use the insert command to add our records into our distribution table.

MariaDB [(linuxdb)]> insert into distributions (name, developer, founded) values
    -> ('Debian', 'The Debian Project', '1993-09-15'),
    -> ('Ubuntu', 'Canonical Ltd.', '2004-10-20'),
    -> ('Fedora', 'Fedora Project', '2003-11-06');
Query OK, 3 rows affected (0.004 sec)
Records: 3  Duplicates: 0  Warnings: 0
MariaDB [(linuxdb)]> select * from distributions;

Success! Now let's test our table.

Testing Commands

We will complete the following tasks to refresh our MySQL/MariaDB knowledge:

retrieve some records or parts of records,
delete a record,
alter the table structure so that it will hold more data, and
add a record:

MariaDB [(linuxdb)]> select name from distributions;
MariaDB [(linuxdb)]> select founded from distributions;
MariaDB [(linuxdb)]> select name, developer from distributions;
MariaDB [(linuxdb)]> select name from distributions where name='Debian';
MariaDB [(linuxdb)]> select developer from distributions where name='Ubuntu';
MariaDB [(linuxdb)]> select * from distributions;
MariaDB [(linuxdb)]> alter table distributions
    -> add packagemanager char(3) after name;
MariaDB [(linuxdb)]> describe distributions;
MariaDB [(linuxdb)]> update distributions set packagemanager='APT' where id='1';
MariaDB [(linuxdb)]> update distributions set packagemanager='APT' where id='2';
MariaDB [(linuxdb)]> update distributions set packagemanager='DNF' where id='3';
MariaDB [(linuxdb)]> select * from distributions;
MariaDB [(linuxdb)]> delete from distributions where name='Debian';
MariaDB [(linuxdb)]> insert into distributions
    -> (name, packagemanager, developer, founded) values
    -> ('Debian', 'APT', 'The Debian Project', '1993-09-15'),
    -> ('CentOS', 'YUM', 'The CentOS Project', '2004-05-14');
MariaDB [(linuxdb)]> select * from distributions;
MariaDB [(linuxdb)]> select name, packagemanager
    -> from distributions
    -> where founded < '2004-01-01';
MariaDB [(linuxdb)]> select name from distributions order by founded;
MariaDB [(linuxdb)]> \q

Install PHP and MySQL Support

The next goal is to complete the connection between PHP and MariaDB so that we can use both for our websites. Adding PHP support for MariaDB allows us to write scripts that can interact with the database. This enables us to dynamically display and modify content in the web browser based on user interactions.

First install PHP support for MariaDB. We're installing some modules alongside the basic support. These may or may not be needed, but I'm installing them to demonstrate some basics.

sudo apt install php-mysql

And then restart Apache2 and MariaDB:

sudo systemctl restart apache2
sudo systemctl restart mariadb

Create PHP Scripts

In order for PHP to connect to MariaDB, it needs to authenticate itself. To do that, we will create a login.php file in /var/www/html. We also need to change the group ownership of the file and its permissions. Since this file contains password information, changing its permissions mean we prevent others from accessing it.

cd /var/www/html/
sudo touch login.php
sudo chmod 640 login.php
sudo chown :www-data login.php
ls -l login.php
sudo nano login.php

In the file, add the following credentials. If you used a different database name than linuxdb and a different username than webapp, then you need to substitute your names below. You need to use your own password where I have the Xs:

<?php // login.php
$db_hostname = "localhost";
$db_database = "linuxdb";
$db_username = "webapp";
$db_password = "[YOUR-PASSWORD-HERE]";
?>

Next we create a new PHP file for our website. This file will display HTML but will primarily be PHP interacting with our MariaDB distributions table in our linuxdb database.

Create a file titled distros.php.

sudo nano distros.php

Then copy over the following text. I suggest you transcribe it, especially if you're interested in learning a bit of PHP, but you can simply copy and paste it into the nano buffer:

<html>
<head>
<title>MySQL Server Example</title>
</head>
<body>

<?php

// Load MySQL credentials
require_once 'login.php';

// Establish connection
$conn = mysqli_connect($db_hostname, $db_username, $db_password) or
  die("Unable to connect");

// Open database
mysqli_select_db($conn, $db_database) or
  die("Could not open database '$db_database'");

// QUERY 1
$query1 = "show tables from $db_database";
$result1 = mysqli_query($conn, $query1);

$tblcnt = 0;
while($tbl = mysqli_fetch_array($result1)) {
  $tblcnt++;
}

if (!$tblcnt) {
  echo "<p>There are no tables</p>\n";
}
else {
  echo "<p>There are $tblcnt tables</p>\n";
}

// Free result1 set
mysqli_free_result($result1);

// QUERY 2
$query2 = "select name, developer from distributions";
$result2 = mysqli_query($conn, $query2);

$row = mysqli_fetch_array($result2, MYSQLI_NUM);
printf ("%s (%s)\n", $row[0], $row[1]);
echo "<br/>";

$row = mysqli_fetch_array($result2, MYSQLI_ASSOC);
printf ("%s (%s)\n", $row["name"], $row["developer"]);

// Free result2 set
mysqli_free_result($result2);

// Query 3
$query3 = "select * from distributions";
$result3 = mysqli_query($conn, $query3);

while($row = $result3->fetch_assoc()) {
  echo "<p>Owner " . $row["developer"] . " manages distribution " . $row["name"] . ".</p>";
}

mysqli_free_result($result3);

$result4 = mysqli_query($conn, $query3);
while($row = $result4->fetch_assoc()) {
  echo "<p>Distribution " . $row["name"] . " was released on " . $row["founded"] . ".</p>";
}

// Free result4 set
mysqli_free_result($result4);

/* Close connection */
mysqli_close($conn);

?>

</body>
</html>

Save the file and exit out of nano.

Test Syntax

After you save the file and exit the text editor, we need to test the PHP syntax. If there are any errors in our PHP, these commands will show the line numbers that are causing errors or leading up to errors. Nothing will output if all is well with the first command. If all is well with the second command, HTML should be outputted:

sudo php -f login.php
sudo php -f distros.php

Conclusion

Congratulations! If you've reached this far, you have successfully created a LAMP stack. In the process, you have learned:

how to install and set up MariaDB
how to create MariaDB root and regular user accounts
how to create a test database with play data for practicing, and
how to connect this with PHP for display on a webpage.

In regular applications of these technologies, there's a lot more involved, but completing the above process is a great start to learning more. In the next section, we will apply what we learned in the PHP and MariaDB sections to install and configure a WordPress installation.

Install WordPress

By the end of this section, you will be able to:

Manually install and configure WordPress on your server, giving you complete control over your installation.
Set up a MySQL database for WordPress and link it through configuration files.
Understand the basics of WordPress file management, security configurations, and user setup for dynamic content management.

Introduction

WordPress is a free and open source content management system (CMS). Originally, its focus was on providing a platform for blogging, but throughout its lifespan it has become a general purpose CMS that functions as a website builder. Two sites exist to provide access to WordPress: WordPress.com and Wordpress.org. WordPress.com is a hosted service where WordPress handles everything, from updates to security. Customers are mainly responsible for content and the appearance of their sites. Various paid plans can extend the functionality offered to WordPress.com customers.

WordPress.org is maintained by the WordPress Foundation, which oversees its development and provides access to the WordPress software. When we download the WordPress software, we download it from WordPress.org. Unlike the hosted solution, WordPress.org is for users who want full control and responsibility over their website installation and maintenance.

WordPress is widely used software, and because of that, it's often the focus of attack. Take a moment to read about the developer's efforts to protect WordPress: Security. We will not need to update our WordPress installs during the course of this course, but you should be familiar with the update process in case you decide to maintain your install or an install at a future date: Updating WordPress.

Plugins are often used with WordPress sites to offer all sorts of additional capabilities. Currently, there are over 60 thousand plugins available for WordPress, but some are of higher quality and utility than others. In addition to the thousands of available plugins, there are over 10 thousand free themes for WordPress sites. Plus, many businesses offer paid themes or can offer customized themes based on customer needs. These themes can drastically alter the appearance and usability of a WordPress site. I encourage you to explore plugins, develop, and add content to your WordPress sites, but the main goal as a systems administrator is to set up the sites and not build out content.

Installation

So far I have shown you how to install software using two methods:

using the apt command
downloading from GitHub

In this lesson, we are going to install WordPress by downloading the most recent version from WordPress.org and installing it manually. The WordPress application is available via the apt command, but the apt installation method often requires additional manual configuration that can be inconsistent or more difficult to troubleshoot compared to the manual process we're using here.

We are going to kind of follow the documentation provided by WordPress.org. You should read through the documentation before following my instructions, but then follow the process I outline here instead because the documentation uses some different tools than we'll use.

Another reason we do this manually is because it builds on what we have learned when we created the login.php and distros.php pages. That is, the two processes are similar. In both cases, we create a specific database for our platform, we create a specific user for that database, and we provide login credentials in a specific file.

First, read through but do not follow the following instructions:

How to install WordPress

Customized Installation Process

After you have read through the WordPress.org documentation, follow the steps below to complete the manual install:

Step 1: Requirements

All major software have dependencies, other software or code that it needs to run. We used the ldd command to discover the dependencies of simple commands like ls when we created our chroot environments. When we install software via apt, the apt program installs the needed dependencies for us. Since our plan is to install WordPress outside of the apt ecosystem, we need to make sure that our systems meet the requirements for our installation. The WordPress.org Requirements page states that the WordPress installation requires at least PHP version 7.4 or greater and MariaDB version 10.5 or greater. We can check that our systems meet these requirements with the following commands:

php --version
mariadb --version

The output from php --version shows that our systems have PHP 8.1, which is greater than PHP 7.4. The output from mariadb --version show that our systems have MariaDB 10.6.18, which is greater than MariaDB 10.5. If your versions are below the required numbers, you will need to update PHP or MariaDB before proceeding.

Next, we need to add some additional PHP modules to our system to let WordPress operate at full functionality. We can install these using the apt command:

sudo apt install php-curl php-xml php-imagick php-mbstring php-zip php-intl

Then restart Apache2 and MariaDB:

sudo systemctl restart apache2
sudo systemctl restart mariadb

Step 2: Download and Extract

The next step is to download and extract the WordPress software, which is comes as a zip file. Although we only download one file, when we extract it with the zip command, the extraction will result in a new directory that contains multiple files and subdirectories. The general instructions include:

Change to the /var/www/html directory.
Download the latest version of WordPress using the wget program.
Extract the package using the unzip program.
Delete the zip file to prevent clutter in the directory. You can wait to delete this file until you've successfully installed WordPress.

Specifically, this means we do the following on the command line:

cd /var/www/html
sudo wget https://wordpress.org/latest.zip
sudo apt install unzip
sudo unzip latest.zip
sudo rm latest.zip

Using the sudo unzip latest.zip command creates a directory called wordpress, as noted in the documentation. If we leave that alone, then the full path of our installations will located at /var/www/html/wordpress.

Step 3: Create the Database and a User

The WordPress documentation describes how to use phpMyAdmin to create the WordPress database and user. phpMyAdmin is a graphical front end to the MySQL/MariaDB relational database that you would access through the browser. However, we are going to create the WordPress database and a database user using the same process we used to create a database and user for our login.php and distros.php pages.

The general instructions are:

Specifically, we do the following on the command line:

sudo su
mariadb -u root

The mariadb -u root command puts us in the MariaDB command prompt. The next general instructions are to:

Create a new user for the WordPress database
Be sure to use a strong password.
Create a new database for WordPress
Grant all privileges to the new user for the new database
Examine the output
Exit the MariaDB prompt

Specifically, this means the following:

create user 'wordpress'@'localhost' identified by '[YOUR-PASSWORD-HERE]';
create database wordpress;
grant all privileges on wordpress.* to 'wordpress'@'localhost';
show databases;
\q

Then exit out of the Linux root account:

exit

By creating a dedicated database user for WordPress, we can limit access to only what's necessary for WordPress to function. This improves security by reducing privileges for this account.

Step 4: Set up wp-config.php

When we created the login.php file that contained the name of the database (e.g. linuxdb), the name of the database user (e.g., webapp), and the user's password, we followed the same general process that WordPress follows. Instead of login.php, WordPress uses a file called wp-config.php. We have to edit that file.

Follow these general steps:

Change to the wordpress directory, if you haven't already.
Copy and rename the wp-config-sample.php file to wp-config.php.
Edit the file and add your WordPress database name, user name, and password in the fields for DB_NAME, DB_USER, and DB_PASSWORD.

This means that we specifically do the following:

cd /var/www/html/wordpress
sudo cp wp-config-sample.php wp-config.php
sudo nano wp-config.php

In nano, add your database name, user, and password in the appropriate fields, just like we did with our login.php file. Double-check your entries in wp-config.php. Incorrect details will prevent WordPress from connecting to the database and will result in errors during setup.

Additionally, we want to disable FTP uploads to the site. To do that, navigate to the end of the wp-config.php file and add the following line:

define('FS_METHOD','direct');

Disabling FTP uploads with the above statement allows WordPress to directly write to your filesystem. This makes it easier to manage themes and plugins without needing FTP credentials every time.

Step 5: Optional

The WordPress files were installed at /var/www/html/wordpress. This means that when our WordPress websites become public, they'll be available at the following URL:

http://[IP ADDRESS]/wordpress

If you want to have a different ending to your URL, then you want to rename your wordpress directory to something else. The WordPress documentation uses blog as an example. But it could be something different, as long as it contains no spaces or special characters. Be sure to keep the directory name lowercase (no spaces and only alphanumeric characters). For example, if I want to change mine to blog, then:

cd /var/www/html
sudo mv wordpress blog

Step 6: Change File Ownership

WordPress will need to write to files in the base directory. Assuming you are still in your base directory, whether that is /var/www/html/wordpress, /var/www/html/blog, or like, run the following command:

sudo chown -R www-data:www-data *

Changing the file ownership ensures that the web server (www-data) can read and write to files as needed. Without this, WordPress might face permission errors during installation or when uploading files.

Step 7: Run the Install Script

The next part of the process takes place in the browser. The location (URL) that you visit in the browser depends on your specific IP address and the name of the directory in /var/www/html that we extracted WordPress to or that you renamed if you followed Step 5. Thus, if my IP address is 11.111.111.11 and I renamed by directory to blog, then I need to visit the following URL:

http://11.111.111.11/blog/wp-admin/install.php

IF I kept the directory named wordpress, then this is the URL that I use:

http://11.111.111.11/wordpress/wp-admin/install.php

Finishing installation

From this point forward, the steps to complete the installation are exactly the steps you follow using WordPress's documentation.

Most importantly, you should see a Welcome screen where you enter your site's information. The site Username and Password should not be the same as the username and password you used to create your WordPress database in MariaDB. Rather, the username and password you enter here are for WordPress users; i.e., those who will add content and manage the website. Make sure you save your password here!!

Two things to note:

We have not setup email on our servers. It's quite complicated to setup an email server correctly and securely, but it wouldn't work well without having a domain name setup anyway. So know that you probably should enter an email when setting up the user account, but it won't work.

Second, when visiting your site, your browser may throw an authenticaion error. Ensure the URL starts with http and not https because some browsers try to force https connections. We have not set up SSL certificates for secure connections, which would require a domain name and further configuration. Note that http connections are less secure and if you want to eventually set up a live site, you could acquire a domain name for your server and configure SSL using tools like Let's Encrypt.

Conclusion

Congrats on setting up your WordPress library site. Feel free to take a moment to modify your site's design or to add content.

Conclusion

I consider this book to be a live document. Perhaps, then, this is version 0.8, or something like that. In any case, it will be continually updated throughout the year but probably more often before and during the fall semesters when I teach my Linux Systems Administration course.

This book in no way is meant to provide a comprehensive overview of systems administration nor of Linux. It's meant to act as a starting point for those interested in systems administration, and it's meant to get students, many of whom grew up using only graphical user interfaces, familiar with command line environments. In that respect, this book, and the course that I teach, is aimed at empowering students to know their technology and become comfortable and more experienced with it, especially the behind the scenes stuff. That said, I'm proud that some of my students have gone on to become systems administrators. Other courses in our program and their own work and internships have probabably contributed more to that motivation, but I know that this course has been a factor.

If you're not a student in our program but have stumbled upon this book, I hope it's helpful to you, too. This is, in fact, why I've made it available on my website and not simply dropped it in my course shell.

C. Sean Burns, PhD
August 13, 2022