Linux Systems Administration
Author: C. Sean Burns
Date: 2022-08-12
Email: sean.burns@uky.edu
Website: cseanburns.net
GitHub: @cseanburns
Introduction
This short book was written for my Linux Systems Administration course. The book and course's goals are to provide the very basics about systems administration using Linux, and to teach students:
- how to use the command line in order to become more efficient computer users and more comfortable with using computers in general;
- how to use command line utilities and programs and to learn what can be accomplished using those programs;
- how to administer users and manage software on a Linux server;
- how to secure a Linux server; and
- the basics of cloud computing.
And finally, this book/course ends by walking students through the process of building a LAMP stack.
About This Book
Since I use this book for my Linux Systems Administration course, which I teach each fall semester, this book will be a live document. I will update the content as I teach it in order to address changes in the technology and to edit for clarity when I discover some aspect of the book causes confusion or does not provide enough information.
This book is not a comprehensive introduction to Linux nor to systems administration. It is designed for an entry level course on these topics and is focused on a select and small range of those topics that have specific pedagogical aims (see above).
The book started off as a series of transcripts and demonstrations. It still has that focus, but I've had a long-term goal to make these transcripts more cohesive. Achieving this became easier when I learned about mdBook.
The content in this book is open access and licensed under the GNU GPL v3.0. Feel free to fork it on GitHub and modify it for your own needs.
History of This Course
I created and started teaching this course in the Fall 2016 semester. I originally used Soyinka's (2016) excellent introduction to Linux administration, and we used VirtualBox and the Fedora Server distribution to practice and learn the material.
However, around 2018 or '19, I moved away from Soyinka's comprehensive book to focus the material on a more limited range of topics. I did this for two reasons. First, most of my students do not become systems administrators, although some have (to my delight). Second, my students have grown up using only graphical user interfaces on one of the two common, commercial operating systems, and consequently have very constrained and limited understandings of how computers work and what can be done with them. In redesigning this course, I wanted to strike a balance between these two problems. I wanted students to acquire enough skills and gain enough confidence to feel comfortable applying for (at least) entry level systems administrator jobs, and more basically, I wanted students to be exposed to a different type of computing environment than what they were used to and that fostered a hacking mentality, in the more benign and playful sense of the word.
I moved us away from using Fedora Server for the Fall 2022 course. Fedora Server is a great and fun operating system, and there's a lot to learn about Linux using it. However, since it is rather bleeding edge, it meant something would break in my demonstrations each semester, and identifying what had changed in Fedora each year made it somewhat of a chore to keep up. I have therefore switched to a less bleeding edge distribution of Linux: a still supported Ubuntu Server LTS release. Based on my personal experience managing servers that run on some version of Ubuntu LTS, I believe this should provide more stability. It helps that Ubuntu Server has a good share of the Linux server market.
The primary reason I moved us away from VirtualBox is because a good number of my students each year use Apple computers, which became a major obstacle when Apple switched to the M1 chip. I originally considered asking those students to use different virtualization software, but it was nice to have all students, regardless of operating system, and myself using the same software. I also considered using something like Docker as a replacement, but decided instead to use Google Cloud. I figured that learning how to use a service like Google Cloud might be more broadly useful to students, and that if we used Docker, we'd have to spend a lot of time installing and configuring that on their laptops. Time is already a constraint in this course, but we'll see how it goes this semester (Fall 2022).
References
Soyinka, W. (2016). Linux administration: A beginner's guide (7th ed.). New York: McGraw-Hill Education. ISBN: 978-0-07-184536-6
History of Unix and Linux
An outline of the history of Unix and Linux.
Location: Bell Labs, part of AT&T (New Jersey), late 1960s through early 1970s
- Starts with an operating system called Multics.
- Multics was a time sharing system
- That is, more than one person could use it at once.
- But Multics had issues and was slowly abandoned
- Ken Thompson found an old PDP-7 and started to write UNIX.
- The ed line editor was written.
- Pronounced as two letters, e-d, rather than as a word.
- This version of UNIX would later be referred to as Research Unix
- Dennis Ritchie, the creator of the C programming language, joined Thompson's efforts.
Location: Berkeley, CA (University of California, Berkeley), early to mid 1970s
- The code for UNIX was not 'free software' but low cost and easily shared.
- Ken Thompson visited Berkeley and helped install Version 6 of UNIX
- Bill Joy and others contributed heavily
- This installation of UNIX would eventually become known as the Berkeley Software Distribution, or BSD.
AT&T
- Until its breakup in 1984, AT&T was not allowed to profit from patents that were not directly related to its telecommunications businesses.
- This agreement with the US government helped protect the company from antitrust action, and as a result, it could not commercialize UNIX.
- This changed after the breakup. System V UNIX became the standard bearer of commercial UNIX.
Location: Boston, MA (MIT), early 1980s through early 1990s
- In the late 1970s, Richard Stallman noticed that software was becoming commercialized.
- As a result, hardware vendors stopped sharing the code they developed to make their hardware work.
- Software code became eligible for copyright protection with the Copyright Act of 1976
- Stallman, who thrived in a hacker culture, began to battle against this turn of events.
- Stallman created the GNU project, the free software philosophy, GNU Emacs, a popular and important text editor, and he wrote many other programs.
- The GNU project is an attempt to create a completely free software, Unix-like operating system called GNU.
- By the early 1990s, Stallman and others had developed all the utilities needed to have a full operating system, except for a working kernel; their kernel project was called GNU Hurd.
- This included the Bash shell, written by Brian Fox.
- The GNU philosophy includes several propositions that define free software. These are the four freedoms, per the GNU Project:
  - The freedom to run the program as you wish, for any purpose (freedom 0).
  - The freedom to study how the program works, and change it so it does your computing as you wish (freedom 1). Access to the source code is a precondition for this.
  - The freedom to redistribute copies so you can help others (freedom 2).
  - The freedom to distribute copies of your modified versions to others (freedom 3). By doing this you can give the whole community a chance to benefit from your changes. Access to the source code is a precondition for this.
The Unix wars and the lawsuit, late 1980s through the early 1990s
- AT&T, after its breakup, began to commercialize Unix, and differences in AT&T Unix and BSD Unix arose.
- The former was aimed at commercialization, and the latter aimed at researchers and academics.
- UNIX Systems Laboratories, Inc. (USL, part of AT&T) sued Berkeley Software Design, Inc. (BSDi), a company formed to commercialize BSD Unix, for copyright and trademark violations.
- USL ultimately lost the case, but the lawsuit delayed adoption of BSD Unix.
Linux, Linus Torvalds, University of Helsinki, Finland, early 1990s
- On August 25, 1991, Linus Torvalds announced that he had started working on a free operating system kernel for the 386 CPU architecture and for his specific hardware.
- This kernel would later be named Linux.
- Linux technically refers only to the kernel.
- An operating system kernel handles startup, devices, memory, resources, etc.
- A kernel does not provide user land utilities---the kinds of software that people use when using computers.
- Torvalds' motivation was to learn about OS development, but also to have access to a Unix-like system.
- He already had access to a Unix-like system called MINIX, but MINIX had technical and copyright restrictions.
- Torvalds has stated that if a BSD or the GNU Hurd operating system had been available, then he might not have created the Linux kernel.
- But Torvalds and others took the GNU utilities and created what is now called Linux or GNU/Linux.
Distributions, early 1990s through today
- Soon after Linux development began, people created their own Linux and GNU based operating systems and distributed them.
- As such, these Linux operating systems became referred to as distributions.
- The two oldest distributions that are still in active development are:
  - Slackware
  - Debian
Short History of BSD, 1970s through today
- Unix version numbers 1-6 eventually led to BSD 1-4.
- Through BSD 4.3, all versions contained some AT&T code.
- Desire to remove this code led to BSD Net/1.
- All AT&T code was removed by BSD Net/2.
- BSD Net/2 was ported to the Intel 386 processor.
- This became 386BSD and was made available in 1992, a year after the Linux kernel was released.
- 386BSD split into two projects:
  - NetBSD
  - FreeBSD
- NetBSD later split into another project: OpenBSD.
- All three of these BSDs are still in active development.
- From a bird's eye point of view, they each have different focuses:
- NetBSD focuses on portability (macOS, NASA)
- FreeBSD focuses on wide applicability (WhatsApp, Netflix, PlayStation 4, macOS)
- OpenBSD focuses on security (has contributed a number of very important applications)
macOS is based on Darwin, is technically UNIX, and is partly based on FreeBSD, with some code coming from the other BSDs. See Why is macOS often referred to as 'Darwin'? for a short history.
Short History of GNU, 1980s through today
- The GNU Hurd is still under active development, but it's in a pre-production state.
- The last release was 0.9, in December 2016.
- A complete OS based on the GNU Hurd can be downloaded and run. For example: Debian GNU/Hurd.
Free and Open Source Licenses
In the free software and open source landscape, there are several important free and/or open source licenses that are used. The two biggest software licenses are based on the software used by GNU/Linux and the software based on the BSDs. They each take very different approaches to free and/or open source software. The biggest difference is this:
- Software based on software licensed under the GPL must also be licensed under the GPL. This is referred to as copyleft software, and the idea is to propagate free software.
- Software based on software licensed under the BSD license may be closed source; the primary requirement is simply to attribute the original source code and author.
What is Linux?
The Linux Kernel
Technically, Linux is a kernel, and a kernel is a part of an operating system that oversees CPU activity like multitasking, as well as networking, memory management, device management, file systems, and more. The kernel alone does not make an operating system. It needs user land applications and programs, the kind we use on a daily basis, to form a whole, as well as ways for these user land utilities to interact with the kernel.
Linux and GNU
The earliest versions of the Linux kernel were combined with tools, utilities, and programs from the GNU project to form a complete operating system, without necessarily a graphical user interface. This association continues to this day. Additional non-GNU, but free and open source programs under different licenses, have been added to form a more functional and user friendly system. However, since the Linux kernel needs user land applications to form an operating system, and since user land applications from GNU cannot work without a kernel, some argue that the operating system should be called GNU/Linux and not just Linux. This has not gained wide acceptance, though. Regardless, credit is due to both camps for their contribution, as well as many others who have made substantial contributions to the operating system.
Linux Uses
We are using Linux as a server in this course, which means we will use Linux to provide various services. Our first focus is to learn to use Linux itself, but by the end of the course, we will also learn how to provide web and database services. Linux can be used to provide other services that we won't cover in this course, such as:
- file servers
- mail servers
- print servers
- game servers
- computing servers
Although it's a small overall percentage, many people use Linux as their main desktop/laptop operating system. I belong in this camp. Linux has been my main OS since the early 2000s. While our work on the Linux server means that we will almost entirely work on the command line, this does not mean that my Linux desktop environment is all command line. In fact, there are many graphical user environments, often called desktop environments, available to Linux users. Since I'm currently using the Ubuntu Desktop distribution, my default desktop environment is called Gnome. KDE is another popular desktop environment, but there are many other attractive and useful ones. And it's easy to install and switch between multiple ones on the same OS.
Linux has become quite a pervasive operating system. Linux powers hundreds of the fastest supercomputers in the world. Linux and other Unix-like operating systems are the foundation of most web servers. The Linux kernel also forms the basis of the Android operating system and of Chrome OS. The only place where Linux does not dominate is in the desktop/laptop space.
What is Systems Administration?
Introduction
What is systems administration or who is a systems administrator (or sysadmin)? Let's start off with some definitions provided by the National Institute of Standards and Technology:
An individual, group, or organization responsible for setting up and maintaining a system or specific system elements, implements approved secure baseline configurations, incorporates secure configuration settings for IT products, and conducts/assists with configuration monitoring activities as needed.
Or:
Individual or group responsible for overseeing the day-to-day operability of a computer system or network. This position normally carries special privileges including access to the protection state and software of a system.
See: Systems Administrator @NIST
Specialized Positions
In addition to the above definitions, which broadly define the role, there are a number of related or specialized positions. We'll touch on the first three in this course:
- Web server administrator:
- "web server administrators are system architects responsible for the overall design, implementation, and maintenance of Web servers. They may or may not be responsible for Web content, which is traditionally the responsibility of the Webmaster (Web Server Administrator" @NIST).
- Database administrator:
- like web admins, and to paraphrase above, database administrators are system architects responsible for the overall design, implementation, and maintenance of database management systems.
- Network administrator:
- "a person who manages a network within an organization. Responsibilities include network security, installing new applications, distributing software upgrades, monitoring daily activity, enforcing licensing agreements, developing a storage management program, and providing for routine backups" (Network Administrator @NIST).
- Mail server administrator:
- "mail server administrators are system architects responsible for the overall design and implementation of mail servers" (Mail Server Administrators @NIST).
Depending on where a system administrator works, they may specialize in any of the above administrative areas, or if they work for a small organization, all of the above duties may be rolled into one position. Some of the positions have evolved quite a bit over the last couple of decades. For example, it wasn't too long ago when organizations would operate their own mail servers, but this has largely been outsourced to third-party providers, such as Google (via Gmail) and Microsoft (via Outlook). People are still needed to work with these third-party email providers, but the nature of the work is different than operating independent mail servers.
Certifications
It's not always necessary to get certified as a systems administrator to get work as one, but there might be cases where it is necessary; for example, in government positions or in large corporations. It also might be the case that you can get work as an entry level systems administrator and then pursue certification with the support of your organization.
Some common starting certifications are:
Plus, Google offers, via Coursera, a beginner's Google IT Support Professional Certificate that may be helpful.
Associations
Getting involved in associations and related organizations is a great way to learn and to connect with others in the field. Here are a few ways to connect.
LOPSA, or The League of Professional System Administrators, is a non-profit association that seeks to advance the field and membership is free for students.
ACM, or the Association for Computing Machinery, has a number of relevant special interest groups (SIGs) that might be beneficial to systems administrators.
NPA, or the Network Professional Association, is an organization that "supports IT/Network professionals."
Codes of Ethics
Systems administrators manage computer systems that contain a lot of data about us and this raises privacy and competency issues, which is why some have created code of ethics statements. Both LOPSA and NPA have created such statements that are well worth reviewing and discussing.
- LOPSA: Code of Ethics
- NPA: Code of Ethics
Keeping Up
Technology changes fast. In fact, even though I teach this course about every year, I need to revise the course each time, sometimes substantially, to reflect changes that have developed over short periods of time. It's also your responsibility, as sysadmins, to keep up, too.
I therefore suggest that you continue your education by reading and practicing. For example, there are lots of books on systems administration. O'Reilly continually publishes on the topic. Red Hat, the maker of the Red Hat Enterprise Linux distribution and a sponsor of Fedora Linux and CentOS Linux, provides the Enable Sysadmin site, with new articles each day, authored by systems administrators, on the field. Opensource.com, also supported by Red Hat, publishes articles on systems administration. Command Line Heroes is a fun and informative podcast on technology and sysadmin related topics. Linux Journal publishes great articles on Linux related topics.
Conclusion
In this section I provided definitions of systems administrators and also the related or more specialized positions, such as database administrator, network administrator, and others.
I provided links to various certifications you might pursue as a systems administrator, and links to associations that might benefit you and your career.
Technology manages so much of our daily lives, and computer systems store lots of data about us. Since systems administrators manage these systems, they hold a great amount of responsibility to protect them and our data. Therefore, I provided links to two code of ethics statements that we will discuss.
It's also important to keep up with the technology, which changes fast. The work of a systems administrator is much different today than it was ten or twenty years ago, and that surely indicates that it could be much different in another ten to twenty years. If we don't keep up, we won't be of much use to the people we serve.
Using Google Cloud (gcloud)
This section introduces us to Google Cloud (gcloud). We will use this platform to create virtual instances of the Ubuntu Server Linux operating system.
Using gcloud for Virtual Machines
Virtual Machines
Our goal in this section is to create a virtual machine (VM) instance. A VM is basically a virtualized operating system that runs on a host operating system. That host operating system may also be Linux, but it could be Windows or macOS. In short, when we use virtual machines, it means instead of installing an operating system (like Linux, macOS, Windows, etc) on a physical machine, we use virtual machine software to mimic the process. The virtual machine, thus, runs on top of our main OS. It's like an app, where the app is a fully functioning operating system.
In past semesters of this course, we used VirtualBox to create virtual machines with Linux as the virtual operating system. This worked despite whether you or I were running Windows, macOS, or Linux as our main operating systems. VirtualBox is freely available virtualization software, and using it let students and myself run Linux as a server on our own desktops and laptops without changing the underlying OS on those machines (e.g., Windows, macOS).
However, even though we virtualize an operating system when we run a VM, the underlying operating system and CPU architecture are still important. When Apple, Inc. launched its new M1 (ARM-based) chip in 2020, it created problems for running operating systems built for other CPU architectures (e.g., x86_64) as virtual machines.
Fortunately, we are able to solve that issue using a third-party virtualization platform. In this course, that means we're going to use gcloud (via Google), but there are other options available that you can explore on your own.
Google Cloud / gcloud
Google Account
We need to have a Google account to get started with gcloud. I imagine most of you already have a Google account, but if not, go ahead and create one at https://www.google.com.
Google Cloud (gcloud) Project
Next, you need to use gcloud to create a Google Cloud project. Once you've created that project, you can enable billing for that project, and then install the gcloud software on your local machine.
Follow Step 1 at the top of the Install the gcloud CLI page to create a new project. Also, review the page on creating and managing projects.
When you create your project, you can name it anything, but try to name it something to do with this course. E.g., I am using the name sysadmin-418. Avoid using spaces when naming your project.
Then click on the Create button, and leave the organization field set to No Organization.
Google Billing
The second thing to do is to set up a billing account for your gcloud project. This does mean there is a cost associated with this product, but the good news is that our bills by the end of the semester should only amount to a couple of dollars, at most. Follow Step 2 to enable billing for your new project. See also the page on how to create, modify, or close your self-serve Cloud Billing account.
Install the latest gcloud CLI version
After you have set up billing, the next step is to install gcloud on your local machines. The Install the gcloud CLI page provides instructions for different operating systems.
There are installation instructions for macOS, Windows, Chromebooks, and various Linux distributions. Follow these instructions closely for the operating system that you're using. Note that for macOS, you have to choose among three different CPU/chip architectures. If you have an older macOS machine (before November 2020 or so), it's likely that you'll select macOS 64-bit (x86_64). If you have a newer macOS machine, then it's likely you'll have to select macOS 64-bit (arm64, Apple M1 silicon). It's unlikely that any of you are using a 32-bit macOS operating system. If you're not sure which macOS system you have, then let me know and I can help you determine the appropriate platform. Alternatively, follow these instructions to find your processor information:
- click on the Apple menu
- choose About This Mac
- locate the Processor or Chip information
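Alternatively, if you already have a terminal open, a quick check works too. On macOS, the following command reports the machine's CPU architecture:
uname -m
This prints arm64 on Apple silicon machines and x86_64 on Intel-based Macs.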
After you have downloaded the gcloud CLI for your particular OS and CPU architecture, you will need to open a command prompt/terminal on your machine to complete the instructions that describe how to install the gcloud CLI. macOS users can use the Terminal app, which can be located using Spotlight. Windows users can use Command.exe, which can also be located via search.
Windows users will download a regular .exe file, but macOS users will download a .tar.gz file. Since macOS is Unix, you can use the mv command to move that file to your $HOME directory. Then you extract it there using the tar command, and once extracted, you can change to the directory that it creates with the cd command. For example, if you are downloading the x86_64 version of the gcloud CLI, then you would run the following commands:
mv google-cloud-cli-392.0.0-darwin-x86_64.tar.gz $HOME
tar -xzf google-cloud-cli-392.0.0-darwin-x86_64.tar.gz
cd google-cloud-sdk
Modify the above commands, as appropriate, if you're using the M1 version of the gcloud CLI.
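For reference, a sketch of what those modified commands might look like for the M1 (arm64) download. The release number here simply mirrors the x86_64 example above; adjust the filename to match the file you actually downloaded.
# move the downloaded archive to your home directory
mv google-cloud-cli-392.0.0-darwin-arm64.tar.gz $HOME
# extract the archive
tar -xzf google-cloud-cli-392.0.0-darwin-arm64.tar.gz
# change into the directory the archive creates
cd google-cloud-sdk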
Initializing the gcloud CLI
Once you have downloaded and installed the gcloud CLI program, you need to initialize it on your local machine. Scroll down on the install page to the section titled Initializing the gcloud CLI. In your terminal/command prompt, run the initialization command, per the instructions at the above page:
gcloud init
And continue to follow the above instructions.
gcloud VM Instance
Once you've initialized gcloud, log into Google Cloud Console, which should take you to the Dashboard page.
Our first goal is to create a virtual machine (VM) instance. As a reminder, a VM is basically a virtualized operating system. That means instead of installing an operating system (like Linux, macOS, Windows, etc) on a physical machine, software is used to mimic the process.
gcloud offers a number of Linux-based operating systems to create VMs. We're going to use the Ubuntu operating system and specifically the Ubuntu 20.04 LTS version.
Ubuntu is a Linux distribution. A new version of Ubuntu is released every six months. The 20.04 signifies that this is the April 2020 version. LTS signifies Long Term Support. LTS versions are released every two years, and Canonical Ltd., the owners of Ubuntu, provide standard support for LTS versions for five years.
LTS versions of Ubuntu are also more stable. Non-LTS versions of Ubuntu only receive nine months of standard support, and generally apply cutting edge technology, which is not always desirable for server operating systems. Each version of Ubuntu has a code name. 20.04 has the code name Focal Fossa. You can see a list of versions, code names, release dates, and more on Ubuntu's Releases page.
We will create our VM using the gcloud console. To do so, follow these steps:
- Click the Select from drop-down list.
- In the window, select the project that you created earlier.
- Next, click on Create a VM.
- Provide a name for your instance.
- E.g., I chose fall-2022 (no spaces)
- Under the Series dropdown box, make sure E2 is selected.
- Under the Machine type dropdown box, select e2-micro (2 vCPU, 1 GB memory)
- This is the lowest cost virtual machine and perfect for our needs.
- Under Boot disk, click on the Change button.
- In the window, select Ubuntu from the Operating system dropdown box.
- Select Ubuntu 20.04 LTS x86/64
- Leave Boot disk type set to Balanced persistent disk
- Disk size should be set to 10 GB.
- Click on the Select button.
- Check the Allow HTTP Traffic button
- Finally, click on the Create button to create your VM instance.
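As an aside, everything we just did in the console can also be done from the gcloud CLI. The following is only a rough sketch of an equivalent command, not something you need to run: the instance name matches my example above, the zone is a placeholder, and flag details may vary by gcloud version (the http-server tag corresponds to the Allow HTTP traffic checkbox).
gcloud compute instances create fall-2022 \
    --zone=us-central1-a \
    --machine-type=e2-micro \
    --image-family=ubuntu-2004-lts \
    --image-project=ubuntu-os-cloud \
    --tags=http-server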
Connect to our VM
After the new VM has been created, we need to connect to it via the command line. macOS users will connect to it via their Terminal.app. Windows users can connect to it via their command prompt.
Unlike our past ssh sessions, we use a slightly different ssh command to connect to our VMs. The syntax follows this pattern:
gcloud compute ssh --zone "zone-info" "name-info" --project "project-id"
The values in the double quotes in the above command can be located in your Google Cloud console and in your VM instances section.
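For example, with hypothetical values filled in (the instance name and project name from my examples above, plus a placeholder zone), the command might look like this:
gcloud compute ssh --zone "us-central1-a" "fall-2022" --project "sysadmin-418"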
Update our Ubuntu VM
The VM will include a recently updated version of Ubuntu 20.04, but it may not be completely updated. Thus the first thing we need to do is update our machines. On Ubuntu, we'll use the following two commands, which you should run also:
sudo apt update
sudo apt -y upgrade
Then type exit to log out and quit the connection to the remote server:
exit
Snapshots
Lastly, we have installed a pristine version of Ubuntu, but it's likely that we will mess something up as we work on our systems. Or it could be that our systems may become compromised at some point. Therefore, we want to create a snapshot of our newly installed Ubuntu server. This will allow us to restore our server if something goes wrong later.
To get started:
- In the left hand navigation panel, click on Snapshots.
- At the top of the page, click on Create Snapshot.
- Provide a name for your snapshot: e.g., ubuntu-1.
- Provide a description of your snapshot: e.g., This is a new install of Ubuntu 20.04.
- Choose your Source disk.
- Choose a Location to store your snapshot.
  - To avoid extra charges, choose Regional.
  - From the dropdown box, select the same location (zone-info) your VM has.
- Click on Create.
Please monitor your billing for this to avoid costs that you do not want to incur.
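As with creating the VM, a snapshot can also be created from the gcloud CLI. Here is a rough sketch, assuming the disk shares the instance's name and using placeholder zone and snapshot names; flag details may vary by gcloud version:
gcloud compute disks snapshot fall-2022 \
    --snapshot-names=ubuntu-1 \
    --zone=us-central1-a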
Conclusion
Congratulations! You have just completed your first installation of a Linux server.
To summarize, in this section, you learned about and created a VM with gcloud. This is a lot! After this course is completed, you will be able to fire up a virtual machine on short notice and deploy websites and more.
Learning the Command Line
It's obviously more common for people today to learn how to use a computer via a graphical user interface, but graphical user interfaces entail extra software, and the more software we have on a server, the more resources that software consumes, and the more we expose our systems to security risks.
Graphical user interfaces also do not provide a good platform for automation, at least not remotely as well as command line interfaces do. Working on the command line, in what is known as a shell, is in fact a form of programming the computer.
Fortunately, Linux, and many other Unix-like operating systems, have the ability to operate without graphical user interfaces. This is partly the reason why these operating systems have done so well in the server market.
In this section, our focus is learning the command line environment, how to use it, and what it offers.
The Linux Filesystem
In this demo, we will cover:
- the Linux filesystem and how it is structured and organized, and
- the basic commands to navigate around and to work with directories and files
The terms directories and folders are synonymous, but as users of primarily graphical user interfaces, you are more likely familiar with the term folders. I will more often use the term directories since that is the command line (text user interface) convention. I will use the term folders when referring to a graphical environment.
Throughout this demonstration, I encourage you to gcloud compute ssh into our remote server and follow along with the commands that I use. See Section 2.1 for details on connecting to the remote server.
Visualizing the Filesystem as a Tree
We will need to work within the filesystem quite a lot in this course, but the term filesystem may refer to different concepts, and it's important to clear that up before we start.
In some cases, a filesystem refers to how data (files) are stored and retrieved on a device like a hard drive, USB drive, etc. For example, macOS uses the Apple File System (APFS) by default, and Windows uses the New Technology File System (NTFS). Linux and other Unix-like operating systems use a variety of filesystems, but presently, the two major ones are ext4 and btrfs. The former is the default filesystem on distributions like Debian and Ubuntu; the latter is the default on the Fedora and openSUSE distributions. Opensource.com has a nice overview of filesystems under this concept.
A filesystem might also be used to refer to the directory structure or directory tree of a system. This concept is related to the prior concept of a filesystem, but it's used here to refer to the location of files and directories on a system. For example, on Windows, the filesystem is identified by a letter, like the C: drive, regardless of whether the disk has an NTFS filesystem or a FAT filesystem. Additional drives (e.g., extra hard drives, USB drives, DVD drives, etc.) will be assigned their own letters (A:, B:, D:, etc.). macOS adheres to a tree-like filesystem, like Linux and other Unix-like operating systems. (This is because macOS is UNIX.) In these operating systems, we have a top-level root directory identified by a single forward slash /, and then subdirectories under that root directory. Additional drives (e.g., extra hard drives, USB drives, DVD drives, etc.) are mounted under that root hierarchy and not separately like on Windows. Linux.com provides a nice overview of the most common directory structure that Linux distributions use, along with an explanation of the major base-level directories. In this section, we will learn about this type of filesystem.
On Linux, we can visualize the filesystem with the tree command. The tree command, like many Linux commands, can be run on its own or with options, as in the second and third examples below:

tree
: list contents of directories in a tree-like format

tree -dfL 1
: directories only, full path, one level

tree -dfL 1 /
: list directories only at root / level
The root Directory and its Base Level Directories
As explained on the Linux.com page, here are the major sub directories under / (root) and a short description of their main purpose:
/bin
: binary files needed to use the system

/boot
: files needed to boot the system

/dev
: device files -- all hardware has a file

/etc
: system configuration files

/home
: user directories

/lib
: libraries/programs needed for other programs

/media
: external storage is mounted

/mnt
: other filesystems may be mounted

/opt
: store software code to compile software

/proc
: files containing info about your computer

/root
: home directory of superuser

/run
: used by system processes

/sbin
: like /bin, binary files that require superuser privileges

/srv
: contains data for servers

/sys
: contains info about devices

/tmp
: temp files used by applications

/usr
: user binaries, etc. that might be installed by users

/var
: variable files, used often for system logs
Although there are 18 directories listed above that branch off from the root directory, we will use some more often than others. For example, the /etc directory contains system configuration files, and we will use the contents of this directory, along with the /var directory, quite a bit when we set up our web servers, relational database servers, and more later in the semester. The /home directory is where our default home directories are stored, and if you manage a multi-user system, then this will be an important directory to manage.
Source: Linux Filesystem Explained
Relative and Absolute Paths
macOS users have the Finder app to navigate their filesystem, to move files to different folders, to copy files, to trash them, etc. Window users have File Explorer for these functions. Linux users have similar graphical software options, but all of these functions can be completed on the Linux command line, and generally more efficiently. To get started, we need to learn two things first:
- how to specify the locations of files and directories in the filesystem
- the commands needed to work with the filesystem
To help specify the locations of files and directories, there are two key concepts to know:
- absolute paths
- relative paths
Above we learned about the / root directory and its subdirectories. All sorts of commands, especially those that deal with files and directories (like copying, moving, deleting), require us to specify on the command line the locations of the files and directories. It's common to specify the location in two different ways, by specifying their absolute path (or location) on the filesystem, or the relative path (or location).
To demonstrate, we might want to move around the filesystem. When we first log in to our remote system, our default location will be our home directory, sometimes referred to as $HOME. The path (location) to that directory will be:
/home/USER
Where USER is your username. Therefore, since my username is sean, my home directory is located at:
/home/sean
which we can see specified with the pwd (print working directory) command:
pwd
/home/sean
When I write $HOME, I am referring to a default environment variable that points to our home directory. It's variable because, depending on which account we're logged in as, $HOME will point to a different location. For me, then, that will be /home/sean, if I'm logged in as sean. For you, it'll point to your home directory. The tilde (~) is another shorthand that refers to your home directory.
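We can confirm what $HOME points to by printing its value with the echo command (covered in more detail later in this section of the book):
echo $HOME
/home/sean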
In my home directory, I have a subdirectory called public_html. The path to that is:
/home/sean/public_html
In a program like Finder (macOS) or File Explorer (Windows), if I want to change my location to that subdirectory (or folder), then I'd double click on its folder icon. On the command line, however, I have to write out the command and the path to the subdirectory. Therefore, starting in my home directory, I use the following command to switch to the public_html subdirectory:
cd public_html
Note that files and directories in Linux are case sensitive. This means that a directory named public_html can co-exist alongside a directory named Public_html. Or a file named paper.txt can co-exist alongside a file named Paper.txt. So be sure to use the proper case when spelling out files, directories, and even commands.
The above is an example of using a relative path, and that command would only be successful if I were first in my $HOME directory. That's because I specified the location of public_html relative to my default ($HOME) location.
I could have also specified the absolute location, but this would be the wordier way. Since the public_html directory is in my $HOME directory, and my $HOME directory is a subdirectory in the /home directory, then to specify the absolute path in the above command, I'd write:
cd /home/sean/public_html
Again, the relative path specified above would only work if I were in my home directory, because cd public_html is relative to the location of /home/sean. That is, the subdirectory public_html is in /home/sean. But specifying the absolute path would work no matter where I was located in the filesystem. For example, if I were working on a file in the /etc/apache2 directory, then using the absolute path (cd /home/sean/public_html) would work. But the relative path command (cd public_html) would not, since there is no subdirectory called public_html in the /etc/apache2 directory.
Finally, you can use the ls command to list the contents of a directory, i.e., the files and subdirectories in a directory:
ls
We will cover this more next.
Conclusion
Understanding relative and absolute paths is one of the more difficult concepts for new command line users to learn, but with time, it'll feel natural. So just keep practicing, and I'll go over this throughout the semester.
In this section, you learned the following commands:
tree
: to list directory contents in a tree-like format

cd
: to change directory

pwd
: to print working directory
You learned different ways to refer to the home directory:
- /home/USER
- $HOME
- ~
You learned about relative and absolute paths. An absolute path starts with the root directory /. Here's an absolute path to a file named paper.txt in my home directory:
- absolute path: /home/sean/paper.txt
If I were already in my home directory, then the relative path would simply be:
- relative path: paper.txt
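Here is a short recap sketch, assuming my username sean and the public_html subdirectory from the examples above:
# absolute path: works from anywhere on the filesystem
cd /home/sean/public_html
# relative path: works here only because we start from /home/sean
cd /home/sean
cd public_html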
Files and Directories
Basic Directory and File commands
In order to explore the above directories, but also to create new ones and work with files, we need to know some basic terminal commands. A lot of these commands are part of the base system called GNU Coreutils, and in this demo, we will cover some of them.
Directory Listing
I have already demonstrated one command: the cd (change directory) command. This will be one of the most frequently used commands in your toolbox.
In our current directory, or once we have changed to a new directory, we will want to learn its contents (what files and directories it contains). We have a few commands to choose from to list contents (e.g., you have already seen the tree command), but the most common command is the ls (list) command. We use it by typing the following two letters in the terminal:
ls
Again, to confirm that we're in some specific directory, use the pwd command to print the working directory. Most commands can be combined with options. Options provide additional functionality to the base command, and in order to see what options are available for the ls command, we can look at its man(ual) page:
man ls
From the ls man page, we learn that we can use the -l option to format the output of the ls command as a long list, or a list that provides more information about the files and directories in the working directory. Later in the semester, I will talk more about what the other parts of this option's output mean.
ls -l
We can use the -a option to list hidden files. In Linux, files are hidden from the base ls command if their names begin with a period. We have some of those files in our $HOME directories, and we can see them like so:
ls -a
We can also combine options. For example, to view all files, including hidden ones, in the long-list format, we can use:
ls -al
Basic File Operations
Some basic file operation commands include:
cp
: copying files and directories

mv
: moving (or renaming) files and directories

rm
: removing (or deleting) files and directories

touch
: change file timestamps (or, create a new, empty file)
These commands also have various options that can be viewed in their respective man pages. Again, command options provide additional functionality to the base command, and are mostly (but not always) prepended with a dash and a letter or number. To see examples, type the following commands, which will launch the manual pages for them. Press q to exit the manual pages, and use your up and down arrow keys to scroll through the manuals:
man cp
man mv
man rm
man touch
The touch command's primary use is to change a file's timestamp; that is, the command updates a file's "access and modification times" (see man touch). For example, let's say we have a file called paper.txt in our home directory. We can see the output here:
ls -l paper.txt
-rw-rw-r-- 1 sean sean 0 Jun 27 00:13 /home/sean/paper.txt
This shows that the last modification time was 12:13 AM on June 27.
If I run the touch command on paper.txt and then list it again, the timestamp will change:
touch paper.txt
ls -l paper.txt
-rw-rw-r-- 1 sean sean 0 Jun 27 00:15 /home/sean/paper.txt
This shows an updated modification timestamp of 12:15 AM.
The side effect occurs when we name a file with the touch command, but the file does not exist; in that case, the touch command creates an empty file with the name we use. Let's say that I do not have a file named file.txt in my home directory. If I run the ls -l file.txt command, I'll receive an error since the file does not exist. But if I then use the touch file.txt command, and then run ls -l file.txt again, we'll see that the file now exists and that it has a byte size of zero:
ls -l file.txt
ls: cannot access 'file.txt': No such file or directory
touch file.txt
ls -l file.txt
-rw-rw-r-- 1 sean sean 0 Jun 27 00:18 file.txt
Here are some ways to use the other three commands and their options:
Copying Files and Directories
To copy an existing file (file1.txt) to a new file (file2.txt):
cp file1.txt file2.txt
Use the -i option to copy that file in interactive mode; that is, to prompt you before overwriting an existing file. We also use the cp command to copy directories, as shown in the sketch below.
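For example, here's a brief sketch with hypothetical file and directory names. The -r (recursive) option is what lets cp copy a directory along with its contents:
# prompt before overwriting file2.txt if it already exists
cp -i file1.txt file2.txt
# copy a directory and everything in it
cp -r Documents/ Documents-backup/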
Moving Files and Directories
The mv command will move an existing file to a different directory, and/or rename the file. For example, from within our home directory (therefore, using relative path names), to move a file named "file.docx" to a subdirectory named "Documents":
mv file.docx Documents/
To rename a file only (keeping it in the same directory), the command looks like this:
mv file.docx newName.docx
To move the file to our Documents/ subdirectory and also rename it, then we'd do this:
mv file.docx Documents/newName.docx
The man page for the mv command also describes an -i option for interactive mode that helps prevent us from overwriting existing files. For example, if we have a file called paper.docx in our $HOME directory, and we have a file named paper.docx in our $HOME/Documents directory, and if these are actually two different papers (or files), then moving the file to that directory will overwrite it without asking. The -i option will prompt us first:
mv -i paper.docx Documents/paper.docx
Remove or Delete
Finally, to delete a file, we use the rm command:
rm file.html
Unlike the trash bin in your graphical user environment, it's very hard to recover a deleted file using the rm command. That is, using rm does not mean the file or directory is trashed; rather, it means it was deleted.
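For that reason, the -i (interactive) option is just as useful here as it is with cp and mv. A small sketch with a hypothetical file name; rm -i prompts before each removal:
rm -i file.html
rm: remove regular file 'file.html'?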
Special File Types
For now, let's only cover two commands here:
mkdir
: for creating a new directory

rmdir
: for deleting an empty directory
Like the above commands, these commands also have their own set of options that can be viewed in their respective man pages:
man mkdir
man rmdir
Make or Create a New Directory
We use these commands like we do the ones above. If we are in our $HOME directory, and we want to create a new directory called bin, we do:
mkdir bin
The bin directory in our $HOME directory is a default location to store our personal applications, or applications (programs) that are only available to us.
And if we run ls, we should see that it was successful.
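One option from the mkdir man page worth highlighting is -p, which creates any missing parent directories along the way. A small sketch with hypothetical directory names:
# creates projects/, projects/2022/, and projects/2022/notes/ in one go
mkdir -p projects/2022/notes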
Delete a Directory
The rmdir command is a bit weird because it only removes empty directories. To remove the directory we just created, we use it like so:
rmdir bin
However, if you want to remove a directory that contains files or other subdirectories, then you will have to use the rm command along with the -r (recursive) option:
rm -r directory-with-content/
Printing Text
There are a number of ways to print text to standard output, which is our screen by default in the terminal. We could also redirect standard output to a file, to a printer, or to a remote shell. We'll see examples like that later in the semester. Here let's cover three commands:
echo
: to print a line of text to standard output

cat
: to concatenate and write files

less
: to view files one page at a time
Standard output is by default the screen. When we print to standard output, then by default we print to the screen. However, standard output can be redirected to files, programs, or devices, like actual printers.
Print to Screen
To use echo:
echo "hello world"
echo "Today is a good day."
We can also echo variables:
a=4
echo "$a"
Print File to Screen
cat is listed elsewhere in the GNU Coreutils page. The primary use of the cat command is to join, combine, or concatenate files, but if used on a single file, it has the nice side effect of printing the content of the file to the screen:
cat file.html
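To see the concatenation use itself, here is a small sketch with hypothetical file names. The first command prints the contents of both files, one after the other, to standard output; the second uses the redirection mentioned above to save that combined output to a new file:
cat file1.html file2.html
cat file1.html file2.html > combined.html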
If the file is very long, we might want to use what's called a pager. There are a few pagers to use, but the less command is a common one:
less file.html
Like with the man pages, use the up and down arrow keys to scroll through the output, and press q to quit the pager.
Conclusion
In this demo, we learned about the filesystem or directory structure of Linux, and we also learned some basic commands to work with directories and files. You should practice using these commands as much as possible. The more you use them, the easier it'll get. Also, be sure to review the man pages for each of the commands, especially to see what options are available for each of them.
Basic commands covered in this demo include:
cat
: display contents of a file

cp
: copy

echo
: print a line of text

less
: display contents of a file by page

ls
: list

man
: manual pages

mkdir
: create a directory

mv
: move or rename

pwd
: print name of current/working directory

rmdir
: delete an empty directory

rm
: remove or delete a file or directory

tree
: list contents of directories in a tree-like format
File Attributes
Identifying Ownership and Permissions
In the last section, we saw that the output of the ls -l command included a lot of extra information besides a listing of file names. The output also listed the owners and permissions for each file and directory.
The output also listed the owners and permissions for each file and directory.
Each user account on a Linux system (like many operating systems) has a user name and has at least one group membership, and that name and that group membership determine the user and group ownership for all files created under that account.
In order to allow or restrict access to files and directories (for example, to allow other users to read, write to, or run your or others' files), ownership and permissions are set in order to manage that kind of access to those files and directories. There are thus two owners for every file (and directory):
- user owner
- group owner
And there are three permission modes that restrict or expand access to each file (or directory) based on user or group membership:
- (r)ead
- (w)rite
- e(x)ecute (as in a program)
I am emphasizing the rwx in the above list of modes because we will need to remember what these letters stand for when we work with file and directory permissions.
Consider the output of ls -l in some public_html directory that contains a single file called index.html:
-rw-rw-r-- 1 sean sean 11251 Jun 20 14:41 index.html
According to the above output, we can parse the following information about the file:
Attributes | ls -l output |
---|---|
File permissions | -rw-rw-r-- |
Number of links | 1 |
Owner name | sean |
Group name | sean |
Byte size | 11251 |
Last modification date | Jun 20 14:41 |
File name | index.html |
What's important for us right now are the File permissions row, the Owner name row, and the Group name row.
The Owner and Group names of the index.html file are sean because there is a user account named sean on the system and a group account named sean on the system, and that file exists in the user sean's home directory.
The File permissions row shows:
-rw-rw-r--
Let's ignore the first dash for now. The remaining permissions can be broken down as:
- rw- (read and write only permissions for the Owner)
- rw- (read and write only permissions for the Group)
- r-- (read-only permissions for the World)
We read the output as such (dashes, other than the initial one, signify no permissions):
- User sean is the Owner and has (r)ead and (w)rite permissions on the file but not e(x)ecute permissions (rw-).
- Group sean is the Group owner and has (r)ead and (w)rite permissions on the file but not e(x)ecute permissions (rw-).
- The World can (r)ead the file but cannot (w)rite to the file nor e(x)ecute the file (r--).
The word write is a classical computing term that means, essentially, to edit and save edits of a file. Today we use the term save instead of write, but remember that they are basically equivalent terms.
Since this is an HTML page for a website, the World ownership allows people to view (read) the file but not write (save) to it nor execute (run) it. Any webpage you view on the internet at least has World mode set to read.
Let's take a look at another file.
In our /bin directory, we can see a listing for this program (note that I specify the absolute path to the zip file):
ls -l /bin/zip
-rwxr-xr-x 1 root root 212K Feb 2 2021 zip*
Attributes | ls -l output |
---|---|
File permissions | -rwxr-xr-x |
Number of links | 1 |
Owner name | root |
Group name | root |
Byte size | 212K |
Last modification date | Feb 2 2021 |
File name | zip* |
Since zip is a computer program used to package and compress files, it needs to be e(x)ecutable. That is, users on the system need to be able to run it. But notice that the owner and group names of the file point to the user root. We have already learned that there is a root level in our filesystem. This is the top level directory in our filesystem and is referenced by the forward slash: /. But there is also a root user account. This is the system's superuser. The superuser can run or access anything on the system, and this user also owns most of the system files.
We read the output as such:
- User root is the Owner and has (r)ead, (w)rite, and e(x)ecute (rwx) permissions on the file.
- Group root is the Group owner and has (r)ead and e(x)ecute permissions but not (w)rite permissions (r-x).
- The World has (r)ead and e(x)ecute permissions but not (w)rite (r-x). These permissions allow other users (like you and me) to use the zip program.
The asterisk at the end of the file name (zip*) simply indicates that this file is an executable; i.e., it is a software program that you can run.
Finally, let's take a look at the permissions for a directory. On my system, I run the following command in my home directory, which will show the permissions for my /home/sean directory:
ls -ld
And the output is:
drwx--x--- 51 sean sean 4.0K Jun 23 18:35 ./
This shows that:
Attributes | ls -ld output |
---|---|
File permissions | drwx--x--- |
Number of links | 51 |
Owner name | sean |
Group name | sean |
Byte size | 4.0K |
Last modification date | Jun 23 |
File name | ./ |
This is a little different from the previous examples, but let's parse it:
- Instead of an initial dash, this file has an initial d that identifies this as a directory. Directories in Linux are simply special types of files.
- User sean has read, write, and execute (rwx) permissions.
- Group sean has execute (--x) permissions only.
- The World has no permissions (---).
- ./ signifies the current directory, which happens to be my home directory, since I ran that command at the /home/sean path.
The takeaway from this set of permissions and the ownership is that only the user sean and those in the group sean, which is just the user sean, can access this home directory.
We might ask why the directory has an e(x)ecutable bit set for the owner and the group if a directory is not an executable file. That is, it's not a program or software. This is so that the owner and the group can access that directory using, for example, the cd (change directory) command. If the directory were not executable, like it's not for the World (---), then it would not be accessible with the cd command, or any other command. In this case, the World (users who are not me) cannot access my home directory.
Changing File Permissions and Ownership
Changing File Permissions
All the files and directories on a Linux system have default ownership and permissions set. This includes new files that we might create as we use our systems. There will be times when we will want to change the defaults, for example, the kinds of defaults described above. There are several commands available to do that, and here I'll introduce you to the two most common ones.
- The chmod command is used to change file (and directory) permissions (or file mode bits).
- The chown command is used to change a file's (and directory's) owner and group.
The chmod command changes the -rwxrwxrwx part of a file's attributes that we see with the ls -l command. Each one of those bits (the r, the w, and the x) is assigned an octal value:
permission | description | octal value |
---|---|---|
r | read | 4 |
w | write | 2 |
x | execute | 1 |
- | no permissions | 0 |
There are three octal values for the three sets of permissions represented by -rwxrwxrwx. If I bracket the sets (for demonstration purposes only), they look like this:
-[rwx][rwx][rwx]
The first set describes the permissions for the owner. The second set describes the permissions for the group. The third set describes the permissions for the World.
We use the chmod command and the octal values to change a file or directory's permissions. For each set, we add up the octal values. For example, to make a file read (4), write (2), and executable (1) for the owner only, and zero out the permissions for the group and World, we use the chmod command like so:
chmod 700 paper.txt
We use 7 because 4+2+1=7
, and
we use two zeroes in the second two places
since we're removing permissions for group and World.
If we want to make the file read, write, and executable by the owner, the group, and the world, then we repeat this for each set:
chmod 777 paper.txt
More commonly, we might want to restrict access. Here we enable rw- for the owner, and r-- for the group and the World:

chmod 644 paper.txt

This works because 4+2=6 gives read and write for the owner, and 4 gives read-only for the group and the World.
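We can verify the result with the `ls -l` command. Here is a quick sketch, assuming a file named paper.txt exists in the current directory; the first column of the output should now read -rw-r--r--:

chmod 644 paper.txt
ls -l paper.txt
-rw-r--r-- 1 sean sean 210 Jun 23 18:35 paper.txt

(The link count, size, and date shown here are just illustrative; yours will vary.)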
Changing File Ownership
In order to change the ownership of a file, we use the `chown` command followed by the name of the new owner. We can optionally change the group ownership by adding a colon (no spaces) and the name of the group. We can see what groups we belong to with the `groups` command. On one system that I have an account on, I am a member of two groups: a group sean (same as my user name on this system), and a group sudo, which signifies that I'm an administrator on this system (more on `sudo` later in the semester):

groups
sean sudo
We can only change the user and group ownership of a file or directory if we have administrative privileges (`sudo` access), or if we share group membership. This means that, unless we have `sudo` (admin) privileges, we are more likely to change the group ownership of a file or directory than its user owner. Later in the semester, you will have to do this kind of work: changing the user and group ownership of files and directories.
In the meantime, let's see some examples:
Imagine that my Linux user account belongs to the group sisFaculty, and that there are other users on the Linux system (my colleagues at work) who are also members of this group. If I want to make a directory or file accessible to them, then I can change the group name of a file I own to sisFaculty. Let's call that file testFile.txt. To change only the group name for the file:
chown :sisFaculty testFile.txt
I can generally only change the user owner of a file if I have admin access on a system. In such a case, I might have to use the `sudo` command (you do not have access to the `sudo` command on our shared server, but you will have it later on your virtual machines). In this case, I don't need the colon. To change the owner only, say from the user sean to the user tmk:
sudo chown tmk testFile.txt
To change both the user owner and the group name, we simply specify both names and separate them with a colon, where the syntax is chown USER:GROUP testFile.txt:
sudo chown tmk:sisFaculty testFile.txt
After using the `chown` command to change either the owner or group, we should double check the file or directory's permissions using the `chmod` command. Here I make it so that the user owner and the group sisFaculty have (r)ead and (w)rite access to the file. I use `sudo` because, as the user sean, I'm changing the file permissions for a file that I no longer own:
sudo chmod 660 testFile.txt
Conclusion
In this section, we learned:

- how to identify file/directory ownership and permissions, and
- how to change file/directory ownership and permissions.

Specifically, we looked at two ways to change the attributes of a file. This includes changing the ownership of a file with the `chown` command, and setting the read, write, and execute permissions of a file with the `chmod` command.

The commands we used to change these attributes include:

- `chmod` : for changing file permissions (or file mode bits)
- `chown` : for changing file ownership

We also used the following commands:

- `ls` : list directory contents
- `ls -ld` : long list directories themselves, not their contents
- `groups` : print the groups a user is in
- `sudo` : execute a command as another user
Text Processing: Part 1
One of the more important sets of tools that Linux (as well as other Unix-like) operating systems provide are tools that aid in processing and manipulating text. The ability to process and manipulate text programmatically is a basic and essential part of many programming languages (e.g., Python, JavaScript, etc.), and learning how to process and manipulate text is an important skill for a variety of jobs including statistics, data analytics, data science, programming, web programming, systems administration, and so forth. In other words, this functionality of Linux (and Unix-like) operating systems essentially means that learning Linux and the tools that it provides is akin to learning how to program.
Plain text files are the basic building blocks of programs and data. Programs are written in plain text editors, and data is often stored as plain text. Linux offers many tools to examine, manipulate, process, analyze, and visualize data in plain text files.
In this section, we will learn some of the basic tools to examine plain text (i.e., data). We will do some programming later in this class, but for us, the main objective with learning to program aligns with our work as systems administrators. That means our text processing and programming goals will serve our interests in managing users, security, networking, system configuration, and so forth as Linux system administrators.
In the meantime, the goal of this section is to acquaint ourselves with some of the tools that can be used to process text. In this section, we will only cover a handful of text processing programs or utilities, but here is a fairly comprehensive list, and we'll examine some additional ones from this list later in the semester:
- `cat` : concatenate files and print on the standard output
- `cut` : remove sections from each line of files
- `diff` : compare files line by line
- `echo` : display a line of text
- `expand` : convert tabs to spaces
- `find` : search for files in a directory hierarchy
- `fmt` : simple optimal text formatter
- `fold` : wrap each input line to fit in specified width
- `grep` : print lines that match patterns
- `head` : output the first part of files
- `join` : join lines of two files on a common field
- `look` : display lines beginning with a given string
- `nl` : number lines of files
- `paste` : merge lines of files
- `printf` : format and print data
- `shuf` : generate random permutations
- `sort` : sort lines of text files
- `tail` : output the last part of files
- `tr` : translate or delete characters
- `unexpand` : convert spaces to tabs
- `uniq` : report or omit repeated lines
- `wc` : print newline, word, and byte counts for each file
We will also discuss two types of operators, the pipe and the redirect. The latter has a version that will write over the contents of a file, and a version that will append contents to the end of a file:

- `|` : redirect standard output from command1 to standard input of command2
- `>` : redirect standard output to a file, overwriting
- `>>` : redirect standard output to a file, appending
Today I want to cover a few of the above commands for processing data in a file; specifically:

- `cat` : concatenate files and print on the standard output
- `cut` : remove sections from each line of files
- `head` : output the first part of files
- `sort` : sort lines of text files
- `tail` : output the last part of files
- `uniq` : report or omit repeated lines
- `wc` : print newline, word, and byte counts for each file
Let's look at a toy sample file that contains structured data as a CSV (comma separated value) file. The file contains a list of operating systems (column one), their software license (column two), and the year they were released (column three). We can use the `cat` command to view the entire contents of this small file:
Command:
cat operating-systems.csv
Output:
Chrome OS, Proprietary, 2009
FreeBSD, BSD, 1993
Linux, GPL, 1991
iOS, Proprietary, 2007
macOS, Proprietary, 2001
Windows NT, Proprietary, 1993
Android, Apache, 2008
It's a small file, but we might want the line and word count of the file. To acquire that, we can use the `wc` (word count) command. By itself, the `wc` command will print the number of lines, words, and bytes of a file. The following output states that the file contains seven lines, 23 words, and 165 bytes:
Command:
wc operating-systems.csv
Output:
7 23 165 operating-systems.csv
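If we only want one of those counts, `wc` has options for each; for example, `-l` prints only the line count and `-w` prints only the word count:

Command:

wc -l operating-systems.csv

Output:

7 operating-systems.csv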
We can use the `head` command to output, by default, the first ten lines of a file. Since our file is only seven lines long, we can use the `-n` option to change the default number of lines:
Command:
head -n3 operating-systems.csv
Output:
Chrome OS, Proprietary, 2009
FreeBSD, BSD, 1993
Linux, GPL, 1991
Using the `cut` command, we can select data from the file. In the first example, I want to select column two (or field two), which contains the license information. Since this is a CSV file, the fields (aka, columns) are separated by commas. The `-d` option tells the `cut` command to use commas as the separator character. The `-f` option tells the `cut` command to select field two. (A CSV file may use other characters as the separator character, like the Tab character or a colon.)
Command:
cut -d"," -f2 operating-system.csv
Output:
Proprietary
BSD
GPL
Proprietary
Proprietary
Proprietary
Apache
From there it's trivial to select a different column. In the next example, I select field (or column) three to get the release year:
Command:
cut -d"," -f3 operating-system.csv
Output:
2009
1993
1991
2007
2001
1993
2008
One of the magical aspects of the Linux (and Unix) commandline is the ability to pipe and redirect output from one program to another program, and then to a file. By stringing together multiple programs with these operators, we can create small programs that do much more than the simple programs that compose them.

In this next example, I use the pipe operator to send the output of the `cut` command to the `sort` command, which sorts the data in alphabetical or numerical order, depending on the character type (lexical or numerical). I then pipe that output to the `uniq` command, which removes duplicate rows, and finally redirect that output to a new file titled os-years.csv. Since the year 1993 appears twice in the original file, it appears only once in the output because the `uniq` command removed the duplicate:
Command:
cut -d"," -f3 operating-system.csv | sort | uniq > os-years.csv
Output:
cat os-years.csv
1991
1993
2001
2007
2008
2009
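As an aside, if we later wanted to add a line to os-years.csv without overwriting its contents, we would use the double redirect instead. A small sketch, where the year 2022 is just hypothetical data:

Command:

echo "2022" >> os-years.csv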
Data files like this often have a header line at the top row that names the data columns. It's useful to know how to work with such files, so let's add a header row to the top of the file. In this example, I'll use the `sed` command, which we will learn more about in the next lesson. For now, we use `sed` with the option `-i` to edit the file in-place. Then `1i` instructs `sed` to insert text at line 1, and \OS, License, Year is the text that we want inserted at line 1. We wrap the argument within single quotes:
Command:
sed -i '1i \OS, License, Year' operating-systems.csv
cat operating-systems.csv
Output:
OS, License, Year
Chrome OS, Proprietary, 2009
FreeBSD, BSD, 1993
Linux, GPL, 1991
iOS, Proprietary, 2007
macOS, Proprietary, 2001
Windows NT, Proprietary, 1993
Android, Apache, 2008
Since the CSV file now has a header line, we want to remove it from the output. Say we want the license field data, but we need to remove that first line. In this case, we need the `tail` command:
Command:
tail -n +2 operating-systems.csv | cut -d"," -f2 | sort | uniq > license-data.csv
cat license-data.csv
Output:
Apache
BSD
GPL
Proprietary
The `tail` command generally outputs the last lines of a file, but the `-n +2` option is special. It makes the `tail` command output the file starting at the second line. We could specify a different number in order to start output at a different line. See `man tail` for more information.
Conclusion
In this lesson, we learned how to process and make sense of data held in a text file. We used some commands that let us select, sort, de-duplicate, redirect, and view data in different ways. Our data file was a small one, but these are powerful and useful commands and operators that would just as easily make sense of large amounts of data in a file.
The commands we used in this lesson include:
- `cat` : concatenate files and print on the standard output
- `cut` : remove sections from each line of files
- `head` : output the first part of files
- `sort` : sort lines of text files
- `tail` : output the last part of files
- `uniq` : report or omit repeated lines
- `wc` : print newline, word, and byte counts for each file
We also used two types of operators, the pipe and the redirect:
- `|` : redirect standard output of command1 to standard input of command2
- `>` : redirect standard output to a file, overwriting
Text Processing: Part 2
Introduction
In the last section, we covered the `cat`, `cut`, `head`, `sort`, `tail`, `uniq`, and `wc` utilities. We also learned about the `|` pipe operator, which we use to redirect standard output from one command to a second command, so that the second command can process the output from the first command. An example is:

sort file.txt | uniq

This sorts the lines in a file named file.txt and then prints to standard output only the unique lines (by the way, input must be sorted before being piped to `uniq`).
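To see why sorting matters, note that `uniq` only collapses duplicate lines that are adjacent to each other. Here is a small sketch using `printf` to generate three lines; without `sort`, the two b lines are not adjacent, so both survive:

Command:

printf "b\na\nb\n" | uniq

Output:

b
a
b

Command:

printf "b\na\nb\n" | sort | uniq

Output:

a
b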
We also learned about the `>` and `>>` redirect operators. They work like the pipe operator, but instead of directing output to a new command for that command to process, they direct output to a file for saving. As a reminder, the single redirect `>` overwrites a file, or creates a file if it does not exist. The double redirect `>>` appends to a file, or creates a file if it does not exist. It's safer to use the double redirect, but if you are processing large amounts of data, it could also mean creating large files really quickly. If that gets out of hand, then you might crash your system.

To build on our prior example, we can add `>>` to send the output to a new file called output.txt:

sort file.txt | uniq >> output.txt
We have available more powerful utilities and programs to process, manipulate, and analyze text files. In this section, we will cover the following three of these:

- `grep` : print lines that match patterns
- `sed` : stream editor for filtering and transforming text
- `awk` : pattern scanning and text processing language
Grep
The `grep` command is one of my most often used commands. Basically, `grep` "prints lines that match patterns" (see `man grep`). In other words, it's search, and it's super powerful.

`grep` works line by line. So when we use it to search a file for a string of text, it will return the whole line that matches the string. This line by line idea is part of the history of Unix-like operating systems, and it's super important to remember that most utilities and programs that we use on the commandline are line oriented.

"A string is any series of characters that are interpreted literally by a script. For example, 'hello world' and 'LKJH019283' are both examples of strings." -- Computer Hope. More generally, it's the literal characters that we type. It's data.
Let's consider the file operating-systems.csv (without its header line, for now), as seen below:

Chrome OS, Proprietary, 2009
FreeBSD, BSD, 1993
Linux, GPL, 1991
iOS, Proprietary, 2007
macOS, Proprietary, 2001
Windows NT, Proprietary, 1993
Android, Apache, 2008
If we want to search for the string Chrome, we can use `grep`. Notice that even though the string Chrome only appears once, and in one part of a line, `grep` returns the entire line:
Command:
grep "Chrome" operating-systems.csv
Output:
Chrome OS, Proprietary, 2009
Be aware that, by default, `grep` is case-sensitive, which means a search for the string chrome, with a lower case c, would return no results. Fortunately, `grep` has an `-i` option, which means to ignore the case of the search string. In the following examples, `grep` returns nothing in the first search since we do not capitalize the string chrome. However, adding the `-i` option results in success:
Command:
grep "chrome" operating-systems.csv
Output:
None.
Command:
grep -i "chrome" operating-systems.csv
Output:
Chrome OS, Proprietary, 2009
We can also search for lines that do not match our string using the `-v` option. We can combine that with the `-i` option to ignore the string's case. Therefore, in the following example, all lines that do not contain the string chrome are returned:
Command:
grep -vi "chrome" operating-systems.csv
Output:
FreeBSD, BSD, 1993
Linux, GPL, 1991
iOS, Proprietary, 2007
macOS, Proprietary, 2001
Windows NT, Proprietary, 1993
Android, Apache, 2008
I used the `tail` command in the prior section to show how we might remove the header (first) line of a file, but that's an odd use of the `tail` command, which normally just returns the last lines of a file. Instead, we can use `grep` to remove the first line. To do so, we use what's called a regular expression, or regex for short. A regex is a method used to identify patterns in text via abstractions. Regexes can get complicated, but we can use some easy ones here. Let's use a version of the above file with the header line:
Command:
cat operating-systems.csv
Output:
OS, License, Year
Chrome OS, Proprietary, 2009
FreeBSD, BSD, 1993
Linux, GPL, 1991
iOS, Proprietary, 2007
macOS, Proprietary, 2001
Windows NT, Proprietary, 1993
Android, Apache, 2008
To use `grep` to remove the first line of a file, we can invert our search to select all lines not matching "OS" at the start of a line. Here the caret `^` is a regex symbol indicating the start of a line. Again, this `grep` command returns all lines that do not match the string os at the start of a line, ignoring case:
Command:
grep -vi "^os" operating-systems.csv
Output:
Chrome OS, Proprietary, 2009
FreeBSD, BSD, 1993
Linux, GPL, 1991
iOS, Proprietary, 2007
macOS, Proprietary, 2001
Windows NT, Proprietary, 1993
Android, Apache, 2008
Alternatively, since we know that the string Year comes at the end of the first line, we can use `grep` to run an inverted search for that. Here the dollar sign `$` is a regex symbol indicating the end of a line. Like the above, this `grep` command returns all lines that do not match the string year at the end of a line, ignoring case:
Command:
grep -vi "year$" operating-systems.csv
Output:
Chrome OS, Proprietary, 2009
FreeBSD, BSD, 1993
Linux, GPL, 1991
iOS, Proprietary, 2007
macOS, Proprietary, 2001
Windows NT, Proprietary, 1993
Android, Apache, 2008
The `man grep` page lists other options, but a couple of other good ones include the following.

Get a count of the matching lines with the `-c` option:
Command:
grep -ic "proprietary" operating-systems.csv
Output:
4
Print only the match and not the whole line with the `-o` option:
Command:
grep -io "proprietary" operating-systems.csv
Output:
Proprietary
Proprietary
Proprietary
Proprietary
We can simulate a Boolean OR search, and print lines matching one or both strings, using the `-E` option. We separate the strings with a vertical bar `|`. This works like a Boolean OR search: as long as at least one of the strings matches, there is at least one result. Here is an example where only one string returns a true value:
Command:
grep -Ei "bsd|atari" operating-systems.csv
Output:
FreeBSD, BSD, 1993
Here's an example where both strings evaluate to true:
Command:
grep -Ei "bsd|gpl" operating-systems.csv
Output:
FreeBSD, BSD, 1993
Linux, GPL, 1991
By default, `grep` will return results where the string appears within a larger word, like OS in macOS:
Command:
grep -i "os" operating-systems.csv
Output:
OS, License, Year
Chrome OS, Proprietary, 2009
iOS, Proprietary, 2007
macOS, Proprietary, 2001
However, we might want to limit our search so that we only return results where OS is a complete word. To do that, we can surround the string with special characters:
Command:
grep -i "\<os\>" operating-systems.csv
Output:
OS, License, Year
Chrome OS, Proprietary, 2009
Sometimes we want the context for a result; that is, we might want to print lines that surround our matches. For example, print the matching line plus the two lines after the matching line using the `-A NUM` option:
Command:
grep -i "chrome" -A 2 operating-systems.csv
Output:
Chrome OS, Proprietary, 2009
FreeBSD, BSD, 1993
Linux, GPL, 1991
Or, print the matching line plus the two lines before the matching line using the `-B NUM` option:

Command:
grep -i "android" -B 2 operating-systems.csv
Output:
macOS, Proprietary, 2001
Windows NT, Proprietary, 1993
Android, Apache, 2008
We can combine many of the variations. Here I search for the whole word BSD, case insensitive, and print the line before and the line after the match:
Command:
grep -i -A 1 -B 1 "\<bsd\>" operating-systems.csv
Output:
Chrome OS, Proprietary, 2009
FreeBSD, BSD, 1993
Linux, GPL, 1991
`grep` is very powerful, and there are more options listed in its `man` page.

Note that I enclose my search strings in double quotes. For example:

grep "search string" filename.txt

It's not always required to enclose a search string in double quotes, but it's good practice, because if your string contains more than one word or any spaces, the search will fail without them.
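To illustrate why, here is a sketch of what happens without the quotes. `grep` takes only the first word as its pattern and treats each remaining word as a file name to search:

Command:

grep Chrome OS operating-systems.csv

Output:

grep: OS: No such file or directory
operating-systems.csv:Chrome OS, Proprietary, 2009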
Sed
`sed` is a type of non-interactive text editor that filters and transforms text (see `man sed`). By default, `sed` prints its results to standard output, and those edits can be redirected (`>` or `>>`) to new files or, more appropriately, made in-place using the `-i` option.

Like the other utilities and programs we've covered, including `grep`, `sed` works line by line. But unlike `grep`, `sed` provides a way to address specific lines or ranges of lines, and then run filters or transformations on those lines. Once the lines in a text file have been identified or addressed, `sed` offers a number of commands to filter or transform the text at those specific lines.
This concept of the line address is important, but text files do not display line numbers by default. Below I use the `nl` command to number the lines in our file, even though the contents of the file do not actually contain line numbers:
Command:
nl operating-systems.csv
Output:
1 OS, License, Year
2 Chrome OS, Proprietary, 2009
3 FreeBSD, BSD, 1993
4 Linux, GPL, 1991
5 iOS, Proprietary, 2007
6 macOS, Proprietary, 2001
7 Windows NT, Proprietary, 1993
8 Android, Apache, 2008
After we've identified the lines in a file that we want to edit, `sed` offers commands to filter, transform, or edit the text at those line addresses. Some of these commands include:

- `a` : append text
- `c` : replace text
- `d` : delete text
- `i` : insert text
- `p` : print text
- `r` : append text from file
- `s` : substitute text
- `=` : print the current line number
Let's see how to use `sed` to print line numbers instead of using the `nl` command. To do so, we use the equal sign `=` to identify line numbers (although note that it places the line numbers just above each line):
Command:
sed '=' operating-systems.csv
Output:
1
OS, License, Year
2
Chrome OS, Proprietary, 2009
3
FreeBSD, BSD, 1993
4
Linux, GPL, 1991
5
iOS, Proprietary, 2007
6
macOS, Proprietary, 2001
7
Windows NT, Proprietary, 1993
8
Android, Apache, 2008
In the last section, we used the `tail` command to remove the header line of our file, and above, we used `grep` to accomplish this task. It's much easier to use `sed` to remove the header line of the operating-systems.csv file. We simply specify the line number (`1`) and then use the delete command (`d`). Thus, we delete line 1:
Command:
sed '1d' operating-systems.csv
Output:
Chrome OS, Proprietary, 2009
FreeBSD, BSD, 1993
Linux, GPL, 1991
iOS, Proprietary, 2007
macOS, Proprietary, 2001
Windows NT, Proprietary, 1993
Android, Apache, 2008
Note that I wrap the `sed` expression in single quotes. This keeps the shell from interpreting any special characters in the expression.

If I wanted to make that deletion permanent, then I would use the `-i` option, which means that I would edit the file in-place (see `man sed`):
Command:
sed -i '1d' operating-systems.csv
To refer to line ranges, I add a comma between addresses. Therefore, to delete lines 1, 2, and 3:
Command:
sed '1,3d' operating-systems.csv
Output:
Linux, GPL, 1991
iOS, Proprietary, 2007
macOS, Proprietary, 2001
Windows NT, Proprietary, 1993
Android, Apache, 2008
I can use `sed` to find and replace strings. The syntax for this is:

sed 's/regexp/replacement/' filename.txt

The regexp part of the above command can take regular expressions, but simple strings like words work here, too, since they are treated as regular expressions themselves.
In the next example, I use `sed` to search for the string "Linux" and replace it with the string "GNU/Linux":
Command:
sed 's/Linux/GNU\/Linux/' operating-systems.csv
Output:
OS, License, Year
Chrome OS, Proprietary, 2009
FreeBSD, BSD, 1993
GNU/Linux, GPL, 1991
iOS, Proprietary, 2007
macOS, Proprietary, 2001
Windows NT, Proprietary, 1993
Android, Apache, 2008
Because the string GNU/Linux contains a forward slash, and because `sed` uses the forward slash as a separator, note that I escaped the forward slash with a backslash. This escape tells `sed` to interpret the forward slash in GNU/Linux literally and not as a special `sed` character.
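One more note on the `s` command: by default, it replaces only the first match on each line. Adding the `g` (global) flag replaces every match on each line. Our substitution above doesn't need it, but here is a small sketch that replaces every comma on each line with a semicolon:

Command:

sed 's/,/;/g' operating-systems.csv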
If we want to add new rows to the file, we can append (`a`) or insert (`i`) text after or at specific lines. To append text after line 3, use `a`:
Command:
sed '3a FreeDOS, GPL, 1998' operating-systems.csv
Output:
OS, License, Year
Chrome OS, Proprietary, 2009
FreeBSD, BSD, 1993
FreeDOS, GPL, 1998
Linux, GPL, 1991
iOS, Proprietary, 2007
macOS, Proprietary, 2001
Windows NT, Proprietary, 1993
Android, Apache, 2008
To insert at line 3, use `i`:
Command:
sed '3i CP\/M, Proprietary, 1974' operating-systems.csv
Output:
OS, License, Year
Chrome OS, Proprietary, 2009
CP/M, Proprietary, 1974
FreeBSD, BSD, 1993
Linux, GPL, 1991
iOS, Proprietary, 2007
macOS, Proprietary, 2001
Windows NT, Proprietary, 1993
Android, Apache, 2008
Note that the FreeDOS line doesn't appear in the last output. This is because I didn't use the `-i` option, nor did I redirect output to a new file. If we want to edit the file in-place, that is, save the edits, then the commands would look like so:
sed -i '3a FreeDOS, GPL, 1998' operating-systems.csv
sed -i '3i CP\/M, Proprietary, 1974' operating-systems.csv
Instead of using line numbers to specify addresses in a text file, we can use regular expressions as addresses, which may be simple words. In the following example, I use the regular expression `1991$` instead of specifying line 4. The regular expression `1991$` means "lines ending with the string 1991". Then I use the `s` command to start a find and replace. `sed` finds the string Linux and then replaces that with the string GNU/Linux. Again, I use the backslash to escape the forward slash in GNU/Linux:
Command:
sed '/1991$/s/Linux/GNU\/Linux/' operating-systems.csv
Output:
OS, License, Year
Chrome OS, Proprietary, 2009
FreeBSD, BSD, 1993
GNU/Linux, GPL, 1991
iOS, Proprietary, 2007
macOS, Proprietary, 2001
Windows NT, Proprietary, 1993
Android, Apache, 2008
Here's an example of using `sed` simply to search for a pattern. In this example, I'm interested in searching for all operating systems that were released on or after 2000:
Command:
sed -n '/20/p' operating-systems.csv
Output:
Chrome OS, Proprietary, 2009
iOS, Proprietary, 2007
macOS, Proprietary, 2001
Android, Apache, 2008
The above would be equivalent to `grep "20" operating-systems.csv`.

`sed` is much more powerful than what I've demonstrated here, and if you're interested in learning more, there are lots of tutorials on the web. Here are a few good ones:
- Learn to use the Sed text editor
- Sed Introduction
- Sed One-Liners Explained, Part I: File Spacing, Numbering and Text Conversion and Substitution
- sed one-liners
- Sed Tutorial
Awk
`awk` is a complete scripting language designed for "pattern scanning and processing" text. It generally performs some action when it detects some pattern, and it is particularly suited for columns of structured data (see `man awk`). `awk` works on columns regardless of whether the contents are structured data (like a CSV file) or not (like a letter or essay).
If the data is structured, then that means the data is formatted in some way. In the last few sections, we have looked at a CSV file. This is structured data because the data points in this file are separated by commas.
For `awk` to work with columns in a file, it needs some way to refer to those columns. In the examples below, we'll see that columns in a text file are referred to by a dollar sign and then the number of the column: `$n`. So, `$1` indicates column one, `$2` indicates column two, and so on. If we use `$0`, then we refer to the entire line. In our example text file, `$1` indicates the OS Name column, `$2` indicates the License column, `$3` indicates the release Year column, and `$0` indicates the whole line (all columns).
The syntax for `awk` is a little different than what we've seen so far. Basically, `awk` uses the following syntax, where the pattern part is optional:

awk pattern { action statements }
Let's see some examples. To print the first column of our file, we do not need the pattern part of the command; we only need to state an action statement within curly braces. In the command below, the action statement is `'{ print $1 }'`:
Command:
awk '{ print $1 }' operating-systems.csv
Output:
OS,
Chrome
FreeBSD,
Linux,
iOS,
macOS,
Windows
Android,
By default, `awk` considers the first empty space as the field delimiter. That's why in the command above only the terms Chrome and Windows appear in the results, even though they should be Chrome OS and Windows NT. It's also why we see commas in the output. To fix this, we tell `awk` to use a comma as the field separator instead of the default empty space. To specify that we want `awk` to treat the comma as a field delimiter, we use the `-F` option, and we surround the comma with single quotes:
Command:
awk -F',' '{ print $1 }' operating-systems.csv
Output:
OS
Chrome OS
FreeBSD
Linux
iOS
macOS
Windows NT
Android
By specifying the comma as the field separator, our results are more accurate, and the commas no longer appear either.
Like `grep` and `sed`, `awk` can do search. In this next example, I print the first column of the line containing the string Linux. Here I am using the pattern part of the command: `'/Linux/'`:
Command:
awk -F',' '/Linux/ { print $1 }' operating-systems.csv
Output:
Linux
Note how `awk` does not return the whole line but only the first field of the matching line, because the action statement prints only `$1`.
With `awk`, we can retrieve more than one column, and we can use `awk` to generate reports, which was part of the original motivation for creating this language. In the next example, I select columns two and one, in that order, which is something the `cut` command cannot do. I also add a space between the columns using double quotes around an empty space, and I modify the field delimiter to include both a comma and a space to get the output that I want:
Command:
awk -F', ' '{ print $2 " " $1 }' operating-systems.csv
Output:
License OS
Proprietary Chrome OS
BSD FreeBSD
GPL Linux
Proprietary iOS
Proprietary macOS
Proprietary Windows NT
Apache Android
I can make output more readable by adding text to print:
Command:
awk -F',' '{ print $1 " was released in" $3 "." }' operating-systems.csv
Output:
OS was released in Year.
Chrome OS was released in 2009.
FreeBSD was released in 1993.
Linux was released in 1991.
iOS was released in 2007.
macOS was released in 2001.
Windows NT was released in 1993.
Android was released in 2008.
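If we want more control over the spacing of output like this, `awk` also provides a C-style `printf` function. Here is a minimal sketch that left-aligns column one in a twelve-character field before printing column three; I use a comma plus a space as the field separator so the fields carry no leading spaces:

Command:

awk -F', ' '{ printf "%-12s %s\n", $1, $3 }' operating-systems.csv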
Since `awk` is a full-fledged programming language, it understands data structures, which means it can do math or work on strings of text. Let's illustrate this by doing some math or logic on column three. Here I print all of column three:
Command:
awk -F',' '{ print $3 }' operating-systems.csv
Output:
Year
2009
1993
1991
2007
2001
1993
2008
Next I print only the values of column three that are greater than 2005, and then pipe (`|`) the output through the `sort` command to sort the numbers in numeric order:
Command:
awk -F',' '$3 > 2005 { print $3 }' operating-systems.csv | sort
Output:
2007
2008
2009
If I want to print only the parts of column one where column three equals 2007, then I would run this command:
Command:
awk -F',' '$3 == 2007 { print $1 }' operating-systems.csv
Output:
iOS
If I want to print only the parts of columns one and three where column three equals 2007:
Command:
awk -F',' '$3 == 2007 { print $1 $3 }' operating-systems.csv
Output:
iOS 2007
Or, print the entire line where column three equals 2007:
Command:
awk -F',' '$3 == 2007 { print $0 }' operating-systems.csv
Output:
iOS, Proprietary, 2007
I can print only those lines where column three is greater than 2000 and less than 2008:
Command:
awk -F',' '$3 > 2000 && $3 < 2008 { print $0 }' operating-systems.csv
Output:
iOS, Proprietary, 2007
macOS, Proprietary, 2001
Even though we wouldn't normally sum years, let's print a running sum of column three to demonstrate how summing works in `awk`:
Command:
awk -F',' 'sum += $3 { print sum }' operating-systems.csv
Output:
2009
4002
5993
8000
10001
11994
14002
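The output above is a running total, printed once per line. If we only want the final sum, we can move the addition into the action statement and print once in an END block, which `awk` runs after it has read the last line:

Command:

awk -F',' '{ sum += $3 } END { print sum }' operating-systems.csv

Output:

14002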
Here are a few basic string operations. First, print column one in upper case:
Command:
awk -F',' '{ print toupper($1) }' operating-systems.csv
Output:
OS
CHROME OS
FREEBSD
LINUX
IOS
MACOS
WINDOWS NT
ANDROID
Or print column one in lower case:
Command:
awk -F',' '{ print tolower($1) }' operating-systems.csv
Output:
os
chrome os
freebsd
linux
ios
macos
windows nt
android
Or, get the length of each string in column one:
Command:
awk -F',' '{ print length($1) }' operating-systems.csv
Output:
2
9
7
5
3
5
10
7
We can add additional logic. The double ampersand `&&` indicates a Boolean/logical AND. The exclamation point `!` indicates a Boolean/logical NOT. In the next example, I print only those lines where column three is greater than 1990 and the line contains the string "BSD":
Command:
awk -F',' '$3 > 1990 && /BSD/ { print $0 }' operating-systems.csv
Output:
FreeBSD, BSD, 1993
Now I reverse that, and print only those lines where column three is greater than 1990 and the line DOES NOT have the string "BSD" in it:
Command:
awk -F',' '$3 > 1990 && !/BSD/ { print $0 }' operating-systems.csv
Output:
Chrome OS, Proprietary, 2009
Linux, GPL, 1991
iOS, Proprietary, 2007
macOS, Proprietary, 2001
Windows NT, Proprietary, 1993
Android, Apache, 2008
The double vertical bar `||` indicates a Boolean/logical OR. The next command prints only those lines that contain the string "Proprietary" or the string "Apache" (a line containing both strings would also match):
Command:
awk -F',' '/Proprietary/ || /Apache/ { print $0 }' operating-systems.csv
Output:
Chrome OS, Proprietary, 2009
iOS, Proprietary, 2007
macOS, Proprietary, 2001
Windows NT, Proprietary, 1993
Android, Apache, 2008
I can take advantage of regular expressions. If the file I was looking at were large, and if I wasn't sure whether some fields would be upper or lower case, then I could use regular expressions to consider both possibilities. That is, by adding [pP] and [aA], `awk` will check for both Proprietary and proprietary, and both Apache and apache:
Command:
awk -F',' '/[pP]roprietary/ || /[aA]pache/ { print $0 }' operating-systems.csv
Output:
Chrome OS, Proprietary, 2009
iOS, Proprietary, 2007
macOS, Proprietary, 2001
Windows NT, Proprietary, 1993
Android, Apache, 2008
`awk` is a full-fledged programming language. It provides conditionals, control structures, variables, etc., and so I've only scratched the surface. If you're interested in learning more, then check out some of these tutorials:
- Awk Command
- Awk One-Liners Explained, Part I: File Spacing, Numbering and Calculations
- Awk Tutorial
- How To Become a 10x Engineer using the Awk Command
- Linux/BSD command line wizardry: Learn to think in sed, awk, and grep
- Understanding AWK
Conclusion
The Linux (and other Unix-like OSes) command line offers a lot of utilities to examine data. Prior to this lesson, we covered a few of them that help us get parts of a file and then pipe those parts through other commands or redirect output to files. We can use pipes and redirects with `grep`, `sed`, and `awk`, too. In fact, if we learn these more powerful programs, we may be able to avoid some of the basic utilities like `cut`, `wc`, and so forth.

It's fun to learn and practice these. Despite this, you do not have to become a `sed` or an `awk` programmer. Like the utilities that we've discussed in prior lectures, the power of programs like these is that they're on hand and easy to use as one-liners. If you want to get started, the resources listed above can guide you.
Review
Here is a review of commands and concepts that we have covered so far.
Commands
We have covered the following commands so far:
Command | Example | Explanation |
---|---|---|
tree | tree -dfL 1 | list directories, full path, one level |
cd | cd ~ | change to home directory |
 | cd / | change to root directory |
 | cd bin | change to bin directory from current directory |
pwd | pwd | print working / current directory |
ls | ls ~ | list home directory contents |
 | ls -al | list long format and hidden files in current directory |
 | ls -dl | list long format the current directory |
man | man ls | open manual page for the ls command |
 | man man | open manual page for the man command |
cp | cp * bin/ | copy all files in current directory to bin subdir |
mv | mv oldname newname | rename file oldname to newname |
 | mv olddir bin/newdir | move olddir to bin subdir and rename it newdir |
rm | rm oldfile | delete file named oldfile |
 | rm -r olddir | delete directory olddir and its contents |
touch | touch newfile | create a file called newfile |
 | touch oldfile | modify timestamp of file called oldfile |
mkdir | mkdir newdir | create a new directory called newdir |
rmdir | rmdir newdir | delete directory called newdir if empty |
echo | echo "hello" | print "hello" to screen |
cat | cat data.csv | print contents of file called data.csv to screen |
 | cat data1.csv data2.csv | concatenate data1.csv and data2.csv to screen |
less | less file | view contents of file called file |
sudo | sudo command | run command as superuser |
chown | sudo chown root:root file | change owner and group of file to root |
chmod | chmod 640 file | change permissions of file to -rw-r----- |
 | chmod 775 somedir | change permissions of somedir to drwxrwxr-x |
groups | groups user | print the groups the user is in |
wc | wc -l file | print number of lines of file |
 | wc -w file | print number of words of file |
head | head file | print top ten lines of file |
 | head -n3 file | print top three lines of file |
tail | tail file | print bottom ten lines of file |
 | tail -n3 file | print bottom three lines of file |
cut | cut -d"," -f2 data.csv | print second column of file data.csv |
sort | sort -n file | sort file by numerical order |
 | sort -rn file | sort file by reverse numerical order |
 | sort -df file | sort file by dictionary order and ignore case |
uniq | uniq file | report or omit repeated lines in sorted file |
 | uniq -c file | report count of duplicate lines in sorted file |
In addition to the above commands, we also have pipelines using the `|`. Pipelines send the standard output of one command to a second command (or more). The following command sorts the contents of a file and then sends the output to the `uniq` command to remove duplicates:
sort file | uniq
Redirection uses the `>` or the `>>` to redirect output of a command to a file. A single `>` will overwrite the contents of a file. A double `>>` will append to the contents of a file.

Redirect the output of the `ls` command to a file called dirlist:
ls > dirlist
Append the date to the end of the file dirlist:
date >> dirlist
Paths
I introduced the concept of absolute and relative paths in section 2.3. In this session, the goal is to revisit paths (locations of files and directories in the filesystem), and provide some examples. This will be important as we proceed to Bash scripting and other tasks going forward.
Change Directories
The `cd` command is used to change directories. When we login to our systems, we will find ourselves in our $HOME directory, which is located at /home/USER.
To change to the root directory, type:
pwd
/home/sean
cd /
pwd
/
From there, to change to the /bin directory:
cd bin
pwd
/bin
To change to the previous working directory:
cd -
pwd
/
To go home quickly, just enter `cd` by itself:
cd
pwd
/home/sean
To change to the public_html directory:
cd public_html
pwd
/home/sean/public_html
To change to the directory one level up:
cd ..
pwd
/home/sean
Make Directories
Sometimes we'll want to create new directories. To do so, we use the `mkdir` command. To make a new directory in our $HOME directory:
pwd
/home/sean
mkdir documents
cd documents
pwd
/home/sean/documents
cd
pwd
/home/sean
To make more than one directory at the same time, where the second or additional directories are nested, use the `-p` option:
mkdir -p photos/2022
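We can confirm that both nested directories were created using the `tree` command from the review section:

tree photos

This should show the photos directory with the 2022 directory nested inside it.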
Remove or Delete Files and Directories
To remove a file, we use the `rm` command. If the file is in a subdirectory, specify the relative path:
pwd
/home/sean
rm public_html/index.html
To remove a file in a directory one level up, use the `..` notation. For example, if I'm in my documents directory and I want to delete a file in my home (parent) directory:
cd documents
pwd
/home/sean/documents
rm ../file.txt
Alternatively, I could use the tilde as shorthand for $HOME:
rm ~/file.txt
To remove a file nested in multiple subdirectories, just specify the path (absolute or relative).
rm photos/2022/05/22/IMG_2022_05_22.jpg
Remember that the `rm` command deletes files and directories. Use it with caution, or with the `-i` option.
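With the `-i` option, `rm` prompts before each deletion, and answering anything other than y (or yes) leaves the file alone:

rm -i file.txt
rm: remove regular file 'file.txt'?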
Copy Files or Directories
Let's say I want to copy a file in my $HOME directory to a nested directory:
cp file.txt documents/ICT418/homework/
Or, we can copy a file from one subdirectory to another. Here I copy a file in my ~/bin directory to my ~/documents directory. The `~` (tilde) is shorthand for my $HOME directory:

cp ~/bin/file.txt ~/documents/
Move or Rename Files or Directories
Let's say I downloaded a file to my ~/Downloads directory, and I want to move it to my ~/documents directory:
mv ~/Downloads/article.pdf ~/documents/
Or, let's say we rename it in the process:
mv ~/Downloads/article.pdf ~/documents/article-2022.pdf
We can also move directories. Since the commandline is case-sensitive, let's say I rename the documents directory to Documents:
mv ~/documents ~/Documents
Conclusion
Use this page as a reference to the commands that we have covered so far.
Scripting the Command Line
We have learned some of the many commands available on the Linux command line as well as how to navigate around the filesystem. Now we can begin to learn how to use command line text editors in order to write Bash scripts.
Text editors
Working on the command line means writing a lot of commands. But there will be times when we want to save some of the commands that we write in order to re-use them later, or we might want to develop the commands into a script (i.e., a program) because we might want to automate a process. The shell is great for writing one-off commands, so-called one-liners, but it's not a great place to write multi-line or very long commands. Therefore it can be helpful to write and save our commands in a text editor.

In this lesson, we'll learn about three text editors: `ed`, `vim`, and `nano`. Of these, I'll encourage you to use `nano`, but I want you to know something about `ed` and `vim`. `ed`, even if not often used, is historically important to the Unix and Linux ecosystem. (I use `ed` almost daily.) Vim, which is my everyday editor, is important, highly used, and under active development to this day. If you want to use Vim, I'd encourage you to do so, but know that it's not required, because it takes some time and consistent practice to get good at it.
Another thing to keep in mind is that the shell we are working with is called `bash`, and `bash` is a full-fledged programming language. That means that when we write a simple command, like `cd public_html`, we are programming. It makes sense that the more programming we do, the better we'll get at it. This requires more sophisticated environments to help manage our programs than the command line prompt can provide. Text editors fulfill that role.
As we learn more about how to do systems administration with Linux, we will need to edit configuration files, too. Most configuration files exist in the /etc directory. For example, later in the semester we will install the Apache Web Server, and we will need to edit Apache's configuration files in the process. We could do this using some of the tools that we've already covered, like `sed` and `awk`, but it'll make our lives much easier to use a text editor.
In any case, in order to save our commands or edit text files, a text editor is very helpful. Programmers use text editors to write programs, but because programmers often work in graphical user environments, they may often use graphical text editors or graphical IDEs. As systems administrators, it would be unusual to have a graphical user interface installed on a server. The servers that we manage will contain limited or specific software that serves the server's main purpose. Additional software on a server that is not relevant to the main function of a server only takes up extra disk space, consumes valuable computing resources, and poses an additional security footprint.
As stated, although `ed` and `vim` are difficult, they are very powerful editors. I use both daily, and am in fact using `vim` to write this. I believe they are both worth learning; however, for the purposes of this course, I think it's more important that you are simply aware of them. If you wish to learn more, there are lots of additional tutorials on the web on how to use these fine, esteemed text editors.
ed
`ed` is a line editor that is installed by default on many Linux distributions. Ken Thompson created `ed` in the late 1960s to write the original Unix operating system. It was used when computer monitors were still uncommon, with teletypewriters (TTYs) and printers instead. The lack of a visual display, like a monitor, is the reason that ed(1) was written as a line editor. The terminal interface from those earlier days is the same basic interface you are using now when you use your terminal applications, which are virtualised versions of those old teletypewriters. I think this is a testament to the power of the terminal: advanced computer users still use the same basic technology today.
In practice, when we use a line editor like `ed`, the main process of entering text is like any other editor. The big difference is when we need to manipulate text. In a graphical text editor, if we want to delete a word or edit some text, we might backspace over the text or highlight a word and delete it. In a line editor, we manipulate text by referring to lines or ranges of lines, and then running commands on the text in those lines. This is much the same process we followed when we covered `grep`, `sed`, and `awk` (especially `sed`), and it should not surprise you that these are related.
To operationalize this, like in `sed`, each line has an address. The address for line 7 is 7, and so forth.
Line editors like `ed` are command driven. There is no menu to select from at the top of the window, and in fact, when we use `ed` to open an existing file, the text in the file isn't even printed on the screen. If a user wants to delete a word, or print (to screen) some text, the user has to command the line editor to address the relevant line, and then issue a command to delete the word on that line, or print the line. Line editors also work on ranges of lines, including all the lines in the file, just like `sed` does.
In fact, many of the commands that `ed` uses are also used by `sed`, since `sed` is based on `ed`. To compare:
Command | sed | ed |
---|---|---|
append text | a | a |
replace text | c | c |
delete text | d | d |
insert text | i | i |
print text | p | p |
substitute text | s | s |
print w/ line # | = | n |
However, there are big differences that mainly relate to the fact that `ed` is a text editor and `sed` is not. For example, here are some commands that mostly make sense in `ed` as a text editor. `sed` can do some of these tasks, where it makes sense (e.g., we don't quit `sed`), but sometimes in a non-trivial way.
Command | ed only |
---|---|
edit file | e |
join lines | j |
copy lines | t |
move lines | m |
undo | u |
save file | w |
quit ed | q |
quit ed w/o saving | Q |
There are other differences, but these are sufficient for our purposes.
Let's see how to use `ed` to open a file, and print the contents with and without line numbers:
ed operating-systems.csv
183
1,$p
OS, License, Year
Chrome OS, Proprietary, 2009
FreeBSD, BSD, 1993
Linux, GPL, 1991
iOS, Proprietary, 2007
macOS, Proprietary, 2001
Windows NT, Proprietary, 1993
Android, Apache, 2008
1,$n
1 OS, License, Year
2 Chrome OS, Proprietary, 2009
3 FreeBSD, BSD, 1993
4 Linux, GPL, 1991
5 iOS, Proprietary, 2007
6 macOS, Proprietary, 2001
7 Windows NT, Proprietary, 1993
8 Android, Apache, 2008
Using `ed`, another way to remove the header line of the operating-systems.csv file is to specify the line number (`1`) and then the delete command (`d`), just like in `sed`. This becomes a permanent change if I save the file with the `w` (write) command:
1d
1,$p
Chrome OS, Proprietary, 2009
FreeBSD, BSD, 1993
Linux, GPL, 1991
iOS, Proprietary, 2007
macOS, Proprietary, 2001
Windows NT, Proprietary, 1993
Android, Apache, 2008
To refer to line ranges, I add a comma between addresses. Therefore, to delete lines 1, 2, and 3, and then quit without saving:
1,3d
,p
iOS, Proprietary, 2007
macOS, Proprietary, 2001
Windows NT, Proprietary, 1993
Android, Apache, 2008
Q
Note that with `sed`, in order to make a change in-place, we need to use the `-i` option. But with `ed`, we save changes with the `w` command.
I can use `ed` to find and replace strings. The syntax is the same as it is in `sed`. I'll start with a fresh version of the file:
ed operating-systems.csv
183
1,$s/Linux/GNU\/Linux/
If we want to add new rows to the file, we can append (`a`) or insert (`i`) text after or at specific lines. To append text after line 3, use `a`. We enter a period on a new line to leave input mode and return to command mode:
3a
FreeDOS, GPL, 1998
.
Because we enter input mode when using the `a`, `i`, or `c` commands, we enter a period (`.`) on a line by itself to revert to command mode. To insert at line 2, use `i`:
2i
CP/M, Proprietary, 1974
.
Like `sed`, we can also find and replace using regular expressions instead of line numbers. I start a new `ed` session to reload the file and start fresh:
ed operating-systems.csv
183
/Linux/s/Linux/GNU\/Linux/
Of course, `ed` can be used to write, and not simply edit, files. Let's start fresh. In the following session, I'll start `ed`, enter append mode (`a`), write a short letter, exit append mode (`.`), name the file (`f`), write (`w`) the file to save it, and quit (`q`):
ed
a
Dear Students,
I hope you find this really interesting.
Feel free to practice and play on the command line,
as well as use tools like ed, the standard editor.
Sincerely,
Dr. Burns
.
f letter.txt
w
q
It's good to know something about `ed` not just for historical reasons, but also because the line editing technology developed for it is still in use today, and is a basic part of the `vim` text editor, which is a very widely used application.
vim
The `vim` text editor is an improved version of the `vi` text editor and is in fact named for Vi IMproved. (The original `vi` text editor is usually available via the `nvi` editor these days; `nvi` is a rewrite of the original.) `vim` is a visual editor. It is multi-modal like `ed` and is a direct descendant of `ed`, by way of `vi`. Due to this genealogy, `vim` can use many of the same commands as `ed` does when `vim` is in command mode.

Like `ed`, we can start `vim` at the Bash prompt with or without a file name. Here I will open the letter.txt file with `vim`. The default mode is command mode:
vim letter.txt
Dear Students,
I hope you find this really interesting.
Feel free to practice and play on the command line,
as well as use tools like ed, the standard editor.
Sincerely,
Dr. Burns
To enter insert mode, I can type `i` or `a` for insert or append mode. There isn't any difference on an empty file, but on a file that has text, `i` will start insert mode where the cursor lies, and `a` will start insert mode right-adjacent to the cursor. Once in insert mode, you can type text as you normally would and use the arrow keys to navigate around the file.
To return to command mode in `vim`, you press the Esc key. And then you can enter commands like you would with `ed`, using the same syntax. Unlike `ed`, when in command mode, the commands we type are not placed wherever the cursor is, but at the bottom of the screen. Let's first turn on line numbers so that we know which address is which, and then we'll replace ed with Ed. Note that I precede these commands with a colon:
One of the more powerful things about both `ed` and `vim` is that I can call Bash shell commands from within the editors. Let's say that I wanted to add the date to my letter file. To do that, Linux has a command called `date` that will return today's date and time. To call the `date` command within Vim and insert the output into the file, I press Esc to enter command mode (if I'm not already in it), enter a colon, type `r` for the read-into-buffer command, then enter the shell escape command, which is an exclamation point (`!`), and then the Bash shell `date` command:
:r !date
Dear Students,
I hope you find this really interesting.
Feel free to practice and play on the command line,
as well as use tools like ed, the standard editor.
Thu Jun 30 02:44:08 PM EDT 2022
Sincerely,
Dr. Burns
Since the last edit I made was to replace ed with Ed, `vim` entered the date after that line, which is line 6. To move that date line to the top of the letter, I can use the move (`m`) command and move it to line 0, which is the top of the file:
:6m0
Thu Jun 30 02:44:30 PM EDT 2022
Dear Students,
I hope you find this really interesting.
Feel free to practice and play on the command line,
as well as use tools like Ed, the standard editor.
Sincerely,
Dr. Burns
Although you can use the arrow keys and Page Up/Page Down keys to navigate in `vim` and `vi`, by far the most excellent thing about this editor is being able to use the j, k, l, and h keys to navigate around a file:

- `j` moves down line by line
- `k` moves up line by line
- `l` moves right letter by letter
- `h` moves left letter by letter
Like the other commands, you can precede these with addresses. To move two lines down, you type `2j`, and so forth.
`vi` and `vim` have had such a powerful impact on software development that you can in fact use these same keystrokes to navigate a number of sites, such as Gmail, Facebook, Twitter, and more.
To save the file and exit `vim`, return to command mode by pressing the Esc key, and then write and quit:

:wq
The above barely scratches the surface. There are whole books on these editors, as well as websites, videos, etc. that explore them, and especially `vim`, in more detail. But now that you have some familiarity with them, you might find this funny: Ed, man! !man ed.
nano
The `nano` text editor is the user-friendliest of these text editors, but it still requires some adjustment as a new commandline user. The friendliest thing about `nano` is that it is modeless, which is what you're already accustomed to using, because it can be used to enter and manipulate text without changing to insert or command mode. It is also friendly because, like many graphical text editors and software, it uses control keys to perform its operations.

The tricky part is that the control keys are assigned to different keystroke combinations than what many graphical editors (or word processors) use by convention today. For example, instead of Ctrl-c or Cmd-c to copy, in `nano` you press the `M-6` key combination (press the Alt, Cmd, or Esc key and `6`) to copy. Then to paste, you press `Ctrl-u` instead of the more common `Ctrl-v`. Fortunately, `nano` lists the shortcuts at the bottom of the screen.
The shortcuts listed need some explanation, though. The caret mark (`^`) is shorthand for the keyboard's Control (Ctrl) key. Therefore, to save a file, we write it out by pressing `Ctrl-o`. The `M-` key is also important, and depending on your keyboard configuration, it may correspond to your Alt, Cmd, or Esc keys. To search for text, you press `^W`. If your goal is to copy, then press `M-6` to copy a line, move to where you want to paste the text, and press `Ctrl-u` to paste.
For the purposes of this class, that's all you really need to know about `nano`. Use it and get comfortable writing in it. Some quick tips:

- `nano file.txt` will open and display the file named file.txt.
- `nano` by itself will open to an empty page.
- Save a file by pressing `Ctrl-o`.
- Quit and save by pressing `Ctrl-x`.
- Be sure to follow the prompts at the bottom of the screen.
Conclusion
In prior lessons, we learned how to use the Bash interactive shell and how to view, manipulate, and edit files from that shell. In this lesson, we learned how to use several command line text editors. Editors allow us to save our commands, create scripts, and in the future, edit configuration files.
The commands we used in this lesson include:

- `ed` : line-oriented text editor
- `vim` : Vi IMproved, a programmer's text editor
- `nano` : Nano's ANOther editor, inspired by Pico
Regular Expressions
Oftentimes, as systems administrators, we will need to search the contents of a file, like a log file. One of the commands that we use to do that is the `grep` command. We have already discussed using the `grep` command, which is not unlike doing any kind of search, such as in Google. The command simply involves running `grep` along with the search string and against a file.
Multiword strings
It's a good habit to include search strings within quotes, but this is especially important when we search for multiword strings. In these cases, we must enclose the entire string in quotes.
Command:
cat cities.csv
Output:
City | 2020 Census | Founded
New York City, NY | 8804190 | 1624
Los Angeles, CA | 3898747 | 1781
Chicago, IL | 2746388 | 1780
Houston, TX | 2304580 | 1837
Phoenix, AZ | 1624569 | 1881
Philadelphia, PA | 1576251 | 1701
San Antonio, TX | 1451853 | 1718
San Diego, CA | 1381611 | 1769
Dallas, TX | 1288457 | 1856
San Jose, CA | 983489 | 1777
Command:
grep "San Antonio" cities.csv
Output:
San Antonio, TX | 1451853 | 1718
Whole words, case sensitive by default
As a reminder, grep commands are case-sensitive by default. Since the city names in cities.csv are capitalized, if I run the above command without the city name capitalized, then grep will return nothing:
Command:
grep "san antonio" cities.csv
In order to tell grep to ignore case, I need to use the -i option. We also want to make sure that we enclose our entire search string within double quotes. This is a reminder for you to run man grep and read through the documentation to see what various options exist for this command.
Command:
grep -i "san antonio" cities.csv
Output:
San Antonio, TX | 1451853 | 1718
Whole words by the edges
To search whole words, we can use special characters to match strings at the start and/or the end of words. For example, note the output if I search for cities in California in my file by searching for the string ca. Since this string also appears in Chicago, that city matches my grep search:
Command:
grep -i "ca" cities.csv
Output:
Los Angeles, CA | 3898747 | 1781
Chicago, IL | 2746388 | 1780
San Diego, CA | 1381611 | 1769
San Jose, CA | 983489 | 1777
To limit results to only CA, we can enclose our search within \b characters, which match the empty string at the edge of a word:
Command:
grep -i "\bca\b" cities.csv
Output:
Los Angeles, CA | 3898747 | 1781
San Diego, CA | 1381611 | 1769
San Jose, CA | 983489 | 1777
We can reverse that output and look for strings within other words. Here is an example of searching for the string ca within words:
Command:
grep -i "\Bca\B" cities.csv
Output:
Chicago, IL | 2746388 | 1780
Bracket Expressions and Character Classes
In conjunction with the grep command, we can also use regular expressions to search for more general patterns in text files. For example, we can use bracket expressions and character classes to search for patterns in the text. Here again, using man grep is very important because it includes instructions on how to use these regular expressions.
Bracket expressions
From man grep
on bracket expressions:
A bracket expression is a list of characters enclosed by [ and ]. It matches any single character in that list. If the first character of the list is the caret ^ then it matches any character not in the list ... For example, the regular expression [0123456789] matches any single digit.
Within a bracket expression, a range expression consists of two characters separated by a hyphen. It matches any single character that sorts between the two characters
To see how this works, let's search the cities.csv file for letters matching A, B, or C. Specifically, in the following command I use a hyphen to match any characters in the range A, B, C. The output does not include the cities Houston or Dallas, since neither of those lines contains a capital A, B, or C character:
Command:
grep "[A-C]" cities.csv
Output:
City | 2020 Census | Founded
New York City, NY | 8804190 | 1624
Los Angeles, CA | 3898747 | 1781
Chicago, IL | 2746388 | 1780
Phoenix, AZ | 1624569 | 1881
Philadelphia, PA | 1576251 | 1701
San Antonio, TX | 1451853 | 1718
San Diego, CA | 1381611 | 1769
San Jose, CA | 983489 | 1777
Bracket expressions, inverse searches
When placed after the first bracket, the caret acts as a Boolean NOT. The following command matches any characters not in the range A, B, C:
Command:
grep "[^A-C]" cities.csv
However, the output matches all lines, since every line contains at least one character that is not an A, B, or C:
Output:
City | 2020 Census | Founded
New York City, NY | 8804190 | 1624
Los Angeles, CA | 3898747 | 1781
Chicago, IL | 2746388 | 1780
Houston, TX | 2304580 | 1837
Phoenix, AZ | 1624569 | 1881
Philadelphia, PA | 1576251 | 1701
San Antonio, TX | 1451853 | 1718
San Diego, CA | 1381611 | 1769
Dallas, TX | 1288457 | 1856
San Jose, CA | 983489 | 1777
Process substitution
We can confirm that the output from the first command does not include Houston or Dallas by comparing the outputs of the two commands using process substitution. This technique lets the standard output of one or more commands be treated as input by another command. Here I use the diff command to compare the output of both grep commands:
Command:
diff <(grep "[A-C]" cities.csv) <(grep "[^A-C]" cities.csv)
The diff output shows that the second grep command includes the two lines below that are not in the output of the first grep command:
Output:
4a5
> Houston, TX | 2304580 | 1837
8a10
> Dallas, TX | 1288457 | 1856
The output of the diff command is nicely explained in this Stack Overflow answer.
Try this command for an alternate output:
diff -y <(grep "[A-C]" cities.csv) <(grep "[^A-C]" cities.csv)
Our ranges may be alphabetical or numerical. The following command matches any numbers in the range 1,2,3:
Command:
grep "[1-3]" cities.csv
Since every line in the file contains at least one digit in that range, the above command returns all lines. To invert the search, we can use the following grep command, which matches lines containing any non-digit character (again, every line in this file):
Command:
grep "[^0-9]" cities.csv
Bracket expressions, caret preceding the bracket
We saw in a previous section that the caret ^ indicates the start of a line; however, we learned above that within a bracket expression it returns the inverse of a match. To use the caret to signify the start of a line, it must precede the opening bracket. For example, the following command matches any lines that start with the upper case letters within the range of N, O, P:
Command:
grep ^[N-P] cities.csv
Output:
New York City, NY | 8804190 | 1624
Phoenix, AZ | 1624569 | 1881
Philadelphia, PA | 1576251 | 1701
And we can reverse that with the following command, which returns all lines that do not start with N,O, or P:
Command:
grep ^[^N-P] cities.csv
Output:
City | 2020 Census | Founded
Los Angeles, CA | 3898747 | 1781
Chicago, IL | 2746388 | 1780
Houston, TX | 2304580 | 1837
San Antonio, TX | 1451853 | 1718
San Diego, CA | 1381611 | 1769
Dallas, TX | 1288457 | 1856
San Jose, CA | 983489 | 1777
Character classes
Character classes are special types of predefined bracket expressions. They make it easy to search for general patterns. From man grep on character classes:
Finally, certain named classes of characters are predefined within bracket expressions, as follows. Their names are self explanatory, and they are [:alnum:], [:alpha:], [:blank:], [:cntrl:], [:digit:], [:graph:], [:lower:], [:print:], [:punct:], [:space:], [:upper:], and [:xdigit:]. For example, [[:alnum:]] means the character class of numbers and letters ...
Here I search for anything that matches the founding year column. Specifically, I search for an empty space [[:blank:]], a four digit string [[:digit:]]{4}, and an end of line $. The {4} means "the preceding item is matched exactly 4 times" (man grep), and the number 4 can be replaced with any relevant number.
Command:
grep -Eo "[[:blank:]][[:digit:]]{4}$" cities.csv
Output:
1624
1781
1780
1837
1881
1701
1718
1769
1856
1777
In the above command, the [[:blank:]] can be excluded and we'd still retrieve the desired results because we've included the dollar sign to mark the end of the line, but I include it here for demonstration purposes. Note that I also added the -E option, which enables extended regular expressions and is needed here for the {4} repetition syntax, and the -o option, which prints only the matching part of each line.
Anchoring
As seen above, outside of bracket expressions and character classes, we use the caret ^ to mark the beginning of a line. We can also use the $ to match the end of a line. Using either (or both) is called anchoring. Anchoring works in many places. For example, to search all lines that start with capital D through L:
Command:
grep "^[D-L]" cities.csv
Output:
Los Angeles, CA | 3898747 | 1781
Houston, TX | 2304580 | 1837
Dallas, TX | 1288457 | 1856
And all lines that end with the numbers 4, 5, or 6:
Command:
grep "[4-6]$" cities.csv
Output:
New York City, NY | 8804190 | 1624
Dallas, TX | 1288457 | 1856
We can use both anchors in our grep commands. The following searches for any lines starting with capital letters in the range D through L and ending with the numbers 4 through 6. The single dot stands for any character, and the asterisk means "the preceding item will be matched zero or more times" (man grep).
Command:
grep "^[D-L].*[4-6]$" cities.csv
Output:
Dallas, TX | 1288457 | 1856
Repetition
If we want to use regular expressions to identify repetitive patterns, then we can use repetition operators. As we saw above, the most useful one is the asterisk *. But there are other options:
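From man grep, the repetition operators include:
- ? : the preceding item is optional and matched at most once
- * : the preceding item will be matched zero or more times
- + : the preceding item will be matched one or more times
- {n} : the preceding item is matched exactly n times
- {n,} : the preceding item is matched n or more times
- {,m} : the preceding item is matched at most m times
- {n,m} : the preceding item is matched at least n times, but not more than m times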
In some cases, we need to add the -E option to use grep's extended regular expression functionality.
Here, the preceding item S is matched one or more times:
Command:
grep -E "S+" cities.csv
Output:
San Antonio, TX | 1451853 | 1718
San Diego, CA | 1381611 | 1769
San Jose, CA | 983489 | 1777
In the next search, the preceding item l is matched exactly 2 times:
Command:
grep -E "l{2}" cities.csv
Output:
Dallas, TX | 1288457 | 1856
Finally, in this example, the preceding item 7 is matched at least two times and at most three times:
Command:
grep -E "7{2,3}" cities.csv
Output:
San Jose, CA | 983489 | 1777
OR searches
We can use the vertical bar | to do a Boolean OR search. In a Boolean OR statement, the statement is true if either one part is true, the other part is true, or both are true. In a search statement, this means that at least one part of the search must be true.
The following will return lines for each city because they both appear in the file:
Command:
grep -E "San Antonio|Dallas" cities.csv
Output:
San Antonio, TX | 1451853 | 1718
Dallas, TX | 1288457 | 1856
The following will match San Antonio even though Lexington does not appear in the file:
Command:
grep -E "San Antonio|Lexington" cities.csv
Output:
San Antonio, TX | 1451853 | 1718
Conclusion
We covered a lot in this section on grep and regular expressions.
We specifically covered:
- multiword strings
- whole word searches and case sensitivity
- bracket expressions and character classes
- anchoring
- repetition
- Boolean OR searches
Even though we focused on grep, many of these regular expressions work across many programming languages. See Regular-Expressions.info for more in-depth lessons on regular expressions.
Bash Scripting
It's time to get started on Bash scripting. So far, we've been working on the Linux commandline. Specifically, we have been working in the Bash shell. Wikipedia refers to Bash as a command language, and by that it means that Bash is used as a commandline language but also as a scripting language. The main purpose of Bash is to write small applications/scripts that analyze text (e.g., log files) and automate jobs, but it can be used for a variety of other purposes.
Variables
One of the most important abilities of any programming or scripting language is to be able to declare a variable. Variables enable us to attach some value to a name. That value may be temporary, and it's used to pass information to other parts of a program.
In Bash, we declare a variable with the name of the variable, an equal sign, and then the value of the variable within double quotes. Do not insert spaces. In the following code snippet, which can be entered on the commandline, I create a variable named name and assign it the value Sean. I create another variable named backup and assign it the value /media. Then I use the echo and cd commands to test the variables:
name="Sean"
backup="/media"
echo "My name is ${name}"
echo "${backup}"
cd "${backup}"
pwd
cd
Variables may include values that may change given some context. For example, if we want a variable to refer to today's day of week, we can use command substitution, which "allows the output of a command to replace the command name" (see man bash). Thus, the output at the time this variable is set will differ if it is set on a different day.
today="$(date +%A)"
echo "${today}"
The curly braces are not strictly necessary, but they offer benefits when we start to use things like array variables.
For example, let's look at basic brace expansion, which can be used to generate arbitrary strings:
echo {1..5}
echo {5..1}
echo {a..l}
echo {l..a}
Another example: using brace notation, we can generate multiple sub-directories at once. Start off in your home directory, and:
mkdir -p homework/{drafts,notes}
cd homework
ls
But more than that, they allow us to deal with arrays (or lists). Here I create a variable named seasons, which holds an array of multiple values: winter spring summer fall. Bash lets me access parts of that array:
seasons=(winter spring summer fall)
echo "${seasons[@]}"
echo "${seasons[1]}"
echo "${seasons[2]}"
echo "${seasons[-1]}"
See Parameter expansions for more advanced techniques.
Conditional Expressions
Whether working on the commandline, or writing scripts in a text editor, it's sometimes useful to be able to write multiple commands on one line. There are several ways to do that. We can include a list of commands on one line in Bash where each command is separated by a semicolon:
cd ; ls -lt
But we can also use conditional expressions and apply logic with && (logical AND) or || (logical OR). Here, command2 is executed if and only if command1 is successful:
command1 && command2
Here, command2 is executed if and only if command1 fails:
command1 || command2
Example:
cd documents && echo "success"
cd documents || echo "failed"
# combine them:
cd test && pwd || echo "no such directory"
mkdir test
cd test && pwd || echo "no such directory"
Shebang or Hashbang
When we start to write scripts, the first thing we add is a shebang at line one. We can do so a couple of ways:
#!/usr/bin/env bash
The first one should be more portable, but alternatively, you could put the direct path to Bash:
#!/usr/bin/bash
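As a minimal sketch of a complete script that combines the shebang with a variable (the filename hello.sh is just an example):
#!/usr/bin/env bash
# hello.sh : print a greeting using a variable
name="world"
echo "Hello, ${name}!"
After saving the file, we make it executable and then run it:
chmod u+x hello.sh
./hello.sh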
Looping
There are several looping methods in Bash, including for, while, until, and select. The for loop is often very useful.
for i in {1..5} ; do
echo "${i}"
done
With that, we can create a rudimentary timer:
for i in {1..5} ; do
echo "${i}" && sleep 1
done
We can loop through our seasons variable:
seasons=(winter spring summer fall)
for i in ${seasons[@]} ; do
echo "I hope you have a nice ${i}"
done
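The other looping methods work similarly. As a sketch, here is a while loop version of the rudimentary timer above, counting down instead of up:
n=5
while [[ "${n}" -gt 0 ]] ; do
  echo "${n}" && sleep 1
  n=$((n - 1))
done
echo "Done!"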
Testing
Sometimes we will want to test certain conditions. There are two parts to this: we can use the if ; then ; else keywords, and we can also use the double square brackets: [[.
There are a few ways to get documentation on these functions.
See the following:
man test
help test
help [
help [[
help if
We can test integers:
if [[ 5 -ge 3 ]] ; then
echo "true"
else
echo "false"
fi
Reverse it to return the else statement:
if [[ 3 -ge 5 ]] ; then
echo "true"
else
echo "false"
fi
We can test strings:
if [[ "$HOME" = "$PWD" ]] ; then
echo "You are home."
else
echo "You are not home, but I will take you there."
cd "$HOME"
pwd
fi
We can test file conditions. Let's first create a file called paper.txt and a file called paper.bak. We will add some trivial content to paper.txt but not to paper.bak. The following if statement will test whether paper.txt has a more recent modification date, and if so, it'll back up the file with the cp command and echo back its success:
if [[ "$HOME/paper.txt" -nt "$HOME/paper.bak" ]] ; then
cp "$HOME/paper.txt" "$HOME/paper.bak" && echo "Paper is backed up."
fi
Here's a script that prints info depending on which day of the week it is:
day1="Tue"
day2="Thu"
day3="$(date +%a)"
if [[ "$day3" = "$day1" ]] ; then
printf "\nIf %s is %s, then class is at 9:30am.\n" "$day3" "$day1"
elif [[ "$day3" = "$day2" ]] ; then
printf "\nIf %s is %s, then class is at 9:30am.\n" "$day3" "$day2"
else
printf "\nThere is no class today."
fi
Resources
I encourage you to explore some useful guides and cheat sheets on Bash scripting:
- Advanced Bash-Scripting Guide
- Bash scripting cheatsheet
- Bash shellcheck
- Shell Scripting for Beginners
- Bash Shell Scripting for Beginners
- Introduction to Bash
Summary
In this demo, we learned about:
- creating and referring to variables
- conditional expressions with && and ||
- adding the shebang or hashbang at the beginning of a script
- looping with the for statement
- testing with the if statement
These are the basics. I'll cover more practical examples in upcoming demos, but note that mastering the basics requires understanding a lot of the commands and paths that we have covered so far in class. So keep practicing.
Managing the System
Now that we have the basics of the command line interface down, it's time to learn some systems administration. In this section, we learn how to expand storage space, create new user and group accounts and manage those accounts, install and remove software, and manage that software and other processes.
Expanding Storage
I'm sure all or most of you have needed extra disk storage at some point (USB drives, optical disks, floppies???). Such needs are no different for systems administrators, who often are responsible for managing, monitoring, or storing large amounts of data.
The disk that we created for our VM is small (10 GB), and that's fine for our needs, albeit quite small in many real world scenarios. To address this, we can add a persistent disk that is much larger. In this section, we will add a disk to our VM, mount it onto the VM's filesystem, and format it. Extra storage does incur extra cost. So at the end of this section, I will show you how to delete the extra disk to avoid that if you want.
We will essentially follow the Google Cloud tutorial to add a non-boot disk to our VM, but with some modification.
Add a persistent disk to your VM
Note: the main disk used by our VM is the boot disk. The boot disk contains the software required to boot the system. All of our computers (desktops, laptops, tablets, phones, etc.), regardless of which operating system they run, have some kind of boot system.
Creating a Disk
In the Google Cloud console, visit the Disks page in the Storage section, and then follow these steps:
- Under Name, add a preferred name or leave the default.
- Under Description, add text to describe your disk.
- Under Location, leave or choose Single zone.
- We are not concerned about data safety.
- If we were, then we would select other options here.
- Under Source, select Blank disk.
- Under Disk settings, select Balanced persistent disk.
- Under Size, change this to 10GB.
- You can actually choose larger sizes, but be aware that disk pricing is $0.10 per GB.
- At that cost, 100 GB = $10 / month.
- Click on Enable snapshot schedule.
- Under Encryption, make sure Google-managed encryption key is selected.
- Click Create to create your disk.
Adding the Disk to our VM
Now that we have created our disk, we need to mount it onto our filesystem so that it's available to our VM. Conceptually, this process is like inserting a new USB drive into our computer.
To add the new disk to our VM, follow these steps:
- Visit the VM instances page.
- Click on the check box next to your virtual machine.
- That will convert the Name of your VM into a hyperlink.
- Click on that Name.
- That will take you to the VM instance details page.
- Click on the Edit button at the top of the details page.
- Under the Additional disks section, click on + ATTACH EXISTING DISK.
- A panel will open on the right side of your browser.
- Click on the drop down box and select the disk, by name, you created.
- Leave the defaults as-is.
- Click on the SAVE button.
- Then click on the SAVE button on the details page.
If you return to the Disks page in the Storage section, you will now see that the new disk is in use by our VM.
Formatting and Mounting a Non-Boot Disk
Formatting Our Disk
In order for our VM to make use of the extra storage, the new drive must be formatted and mounted. Different operating systems use different filesystem formats. You may already know that macOS uses the Apple File System (APFS) by default and that Windows uses the New Technology File System (NTFS). Linux is no different, but uses different file systems than macOS and Windows, by default. There are many formatting technologies that we can use in Linux, but we'll use the ext4 (fourth extended filesystem) format, since this is recommended by Google Cloud and is also a stable and common one for Linux.
In this section, we will closely follow the steps outlined under the Formatting and mounting a non-boot disk on a Linux VM section. I replicate those instructions below, but I highly encourage you to read through the instructions on Google Cloud and here:
- Use the gcloud compute ssh command that you have previously used to connect to your VM.
  - Alternatively, you can ssh to your VM via your browser:
    - Click on the VM instances page.
    - Under the Connect column, select Open in browser window next to SSH.
- When you have connected to your VM's command line, run the lsblk command.
  - Ignore the loop devices.
  - Instead, you should see sda and sdb under the NAME column outputted by the lsblk command.
  - sda represents your main disk.
  - sda1, sda14, sda15 (may be slightly different for you) represent the partitions of the sda disk.
    - Notice the MOUNTPOINT for sda1 is /, or the root level of our filesystem.
  - sdb represents the attached disk we just added.
    - After we format this drive, there will be an sdb1, and this partition will also have a mountpoint.
To format our disk for the ext4 filesystem, we will use the mkfs.ext4 command (see man mkfs.ext4 for details). The instructions tell us to run the following command (please read the Google Cloud instructions closely; it's important to understand these commands as much as possible and not just copy and paste them):
sudo mkfs.ext4 -m 0 -E lazy_itable_init=0,lazy_journal_init=0,discard /dev/DEVICE_NAME
But replace DEVICE_NAME with the name of our device. My device's name is sdb, which we saw in the output of the lsblk command; therefore, the specific command I run is:
sudo mkfs.ext4 -m 0 -E lazy_itable_init=0,lazy_journal_init=0,discard /dev/sdb
Mounting Our Disk
Now that our disk has been formatted in ext4, I can mount it.
Note: to mount a disk simply means to make the disk's filesystem available so that we can use it for accessing, storing, etc. files on the disk. Whenever we insert a USB drive, a DVD drive, etc. into our computers, the OS we use should mount that disk automatically so that we can access and use it. Conversely, when we remove those drives, the OS unmounts them. In Linux, the commands for these are mount and umount. Note that the umount command is not spelled unmount.
You will recall that we discussed filesystems earlier, and that the term is a bit confusing since it refers to both the directory hierarchy and the formatting type (e.g., ext4). In that prior section, I discussed how in Windows, when we attach a new drive, whether it's a USB drive, a DVD drive, an additional disk drive, or an external drive, Windows gives the new drive a letter, like A:, B:, D:, etc. Unlike Windows, I mentioned that in Linux and Unix (e.g., macOS), when we add an additional disk, its filesystem gets added onto our existing one. That is, it becomes part of the directory hierarchy under the / top level of the hierarchy.
In practice, this means that we have to create the mountpoint for our new disk, and we do that with the mkdir command. The Google Cloud documentation instructs us to use the following command:
sudo mkdir -p /mnt/disks/MOUNT_DIR
And to replace MOUNT_DIR with the directory we want to create. Since my added disk is named disk-1, I'll call it that:
sudo mkdir -p /mnt/disks/disk-1
Now we can mount the disk to that directory. Per the instructions on Google Cloud, and given that my added drive has the device name sdb, I use the following command:
sudo mount -o discard,defaults /dev/sdb /mnt/disks/disk-1
We also need to change the directory's permissions to grant read and write access to additional users:
sudo chmod 777 /mnt/disks/disk-1
We can test that it exists and is accessible with the lsblk and cd commands. The lsblk command should show that sdb is mounted at /mnt/disks/disk-1, and we can cd (change directory) to it:
cd /mnt/disks/disk-1
Automounting Our Disk
Our disk is mounted, but if the computer (VM) gets rebooted, we would have to re-mount the additional drive manually. In order to avoid this and automount the drive upon reboot, we need to edit the file /etc/fstab. Note that the file is named fstab and is located in the /etc directory; therefore, the full path is /etc/fstab.
The fstab file is basically a configuration file that provides information to the OS about the filesystems the system can mount. The standard information fstab contains includes the name (or label) of the device being mounted, the mountpoint (e.g., /mnt/disks/disk-1), the filesystem type (e.g., ext4), and various other mount options. See man fstab for more details. For devices to mount automatically upon boot up, they have to be listed in this file. That means we need to edit this file on our VM. Again, here we're following the Google Cloud instructions.
Before we edit system configuration files, however, we should always create a backup. We'll use the cp command to create a backup of the fstab file:
sudo cp /etc/fstab /etc/fstab.backup
Next we use the blkid command to get the UUID (universally unique identifier) of our new device. Since my device is /dev/sdb, I'll use that:
sudo blkid /dev/sdb
The output should look something like this BUT NOTE that your UUID value will be DIFFERENT:
/dev/sdb: UUID="3bc141e2-9e1d-428c-b923-0f9vi99a1123" TYPE="ext4"
We need to add that value to /etc/fstab, plus the standard information that file requires. The Google Cloud documentation explicitly guides us here. We'll use nano to make the edit:
sudo nano /etc/fstab
And then add this line at the bottom:
UUID=3bc141e2-9e1d-428c-b923-0f9vi99a1123 /mnt/disks/disk-1 ext4 discard,defaults,nofail 0 2
And that's it! If you reboot your VM, or if your VM rebooted for some reason, the extra drive we added should automatically mount upon reboot. If it doesn't, then it may mean that the drive failed, or that there was an error (i.e., typo) in the configuration.
Delete the Disk
You are welcome to keep the disk attached to the VM, but if you do not want to incur any charges for it, which would be about $1 / month at 10 GB, then we can delete it.
To delete the disk, first delete the line that we added in /etc/fstab, unmount the disk, and then delete the disk in the gcloud console. To unmount the disk, we use the umount command:
sudo umount /mnt/disks/disk-1
Then we need to delete the disk in gcloud.
- Go to the VM instances page.
- Click on the check box next to the VM.
- Click on the name, which should be a hyperlink.
- This goes to the VM instances detail page.
- Click on the Edit button at the top of the page.
- Scroll down to the Additional disks section.
- Click the edit (looks like a pencil) button.
- In the right-hand pane that opens up, select Delete disk under the Deletion rule section.
- Scroll back to the Additional disks section.
- Click on the X to detach the disk.
- Click on Save.
- Go to the Disks section in the left-hand navigation pane.
- Check the disk to delete, and then Delete it.
- Click on the Snapshots section in the left-hand navigation pane.
- Check the disk snapshot to delete, and then Delete it.
- Be sure you don't delete your VM here but just your disk.
Conclusion
In this section, we learned how to expand the storage of our VM by creating a new virtual drive, adding it to our VM, formatting the drive in the ext4 filesystem format, mounting the drive at /mnt/disks/disk-1, and then editing /etc/fstab to automount the drive.
In addition to using the gcloud console, the commands we used in this section include:
- ssh : to connect to the remote VM
- sudo : to run commands as the administrator
- mkfs.ext4 : to create an ext4 filesystem on our new drive
- mkdir -p : to create multiple directories under /mnt
- mount : to mount the new drive manually
- umount : to unmount the new drive manually
- chmod : to change the mountpoint's file permission attributes
- cd : to change directories
- cp : to copy a file
- nano : to use the nano text editor to edit /etc/fstab
Managing Users and Groups
In some cases we'll want to provide user accounts on the servers we administrate, or we'll want to set up servers for others to use. The process of creating accounts is fairly straightforward, but there are a few things to know about how user accounts work.
The passwd file
The /etc/passwd file contains information about the users on your system. There is a man page that describes this file, but man pages are divided into sections (see man man), and the man page for the passwd file is in section 5. Therefore, in order to read the man page for the /etc/passwd file, we run the following command:
man 5 passwd
Before we proceed, let's take a look at a single line of the file. Below I'll show the output for a made up user account:
grep "peter" /etc/passwd
peter:x:1000:1000:peter,,,:/home/peter:/bin/bash
The line starting with peter is a colon-separated line, which means the line is composed of multiple fields, each separated by a colon. man 5 passwd tells us what each field indicates. The first field is the login name, which in this case is peter. The second field, marked x, is the password field. This file does not contain the password, though. The passwords, which are hashed and salted, are stored in the /etc/shadow file, which can only be read by the root user (or by using the sudo command).
Hashing a file or a string of text is a process of running a hashing algorithm on the file or text. If the file or string is copied exactly, byte for byte, then hashing the copy will return the same value. If anything has changed about the file or string, then the hash value will be different. By implication, this means that if two users on a system use the same password, then the hash of each will be equivalent. Salting a hashed file (or file name) or string of text is a process of adding random data to the file or string. Each password will have a unique and mostly random salt added to it. This means that even if two users on a system use the same password, salting their passwords will result in unique values.
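As a quick illustration of both ideas, here is a sketch that assumes the sha256sum and openssl commands, both of which ship with most Linux distributions:
# hashing the same input always produces the same value:
echo -n "password123" | sha256sum
echo -n "password123" | sha256sum
# hashing with a random salt produces a different value each time:
openssl passwd -6 "password123"
openssl passwd -6 "password123"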
The third field indicates the user's numerical ID, and the fourth field indicates the user's group ID. The fifth field repeats the login name here, but it can also serve as a comment field; comments are added using certain commands (discussed below). The sixth field identifies the user's home directory, which is /home/peter. The seventh field identifies the user's default shell, which is /bin/bash.
The user name or comment field merely repeats the login name here, but it can hold specific types of information. We can add comments using the chfn command. Comments include the user's full name, their home and work phone numbers, their office or room number, and so forth. To add a full name to user peter's account, we use the -f option:
sudo chfn -f "Peter Parker" peter
The /etc/passwd file is a standard Linux file, but
some things will change depending on the Linux distribution.
For example, the user and group IDs above start at 1000 because
peter is the first human account on the system.
This is a common starting numerical ID nowadays,
but it could be different on other Linux or Unix-like distributions.
The home directory could be different on other systems, too;
for example, the default could be located at /usr/home/peter.
Also, other shells exist besides bash, like zsh, which is now the default shell on macOS; so other systems may default to different shell environments.
The shadow file
The /etc/passwd file does not contain any passwords, but rather a simple x to mark the password field. Passwords on Linux are stored in /etc/shadow and are hashed with sha512, which is indicated by $6$. You need to be root to examine the shadow file, or you need to use sudo:
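sudo less /etc/shadow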
The fields are (see man 5 shadow
):
- login name (username)
- encrypted password
- days since 1/1/1970 since password was last changed
- days after which password must be changed
- minimum password age
- maximum password age
- password warning period
- password inactivity period
- account expiration date
- a reserved field
The /etc/shadow file should not be edited directly. To set, for example, a warning that a user's password will expire, we would use the passwd command (see man passwd for options). The following command would warn the user peter 14 days before their password expires:
sudo passwd -w 14 peter
The group file
The /etc/group file holds group information about the entire system (see man 5 group). By default, the file can be viewed by anyone on a system, but there is also a groups command (see man groups) that will return the groups for a user. Running the groups command by itself will return your own memberships.
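For example, to see the memberships for the user peter, we can run the following; the exact list returned depends on how the account was set up:
groups peter
peter : peter sudo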
Management Tools
There are different ways to create new users and groups, and the following list includes most of the utilities to help with this. Note that, based on the names of the utilities, some of them are repetitive.
- useradd (8) - create a new user or update default new user information
- usermod (8) - modify a user account
- userdel (8) - delete a user account and related files
- groupadd (8) - create a new group
- groupdel (8) - delete a group
- groupmod (8) - modify a group definition on the system
- gpasswd (1) - administer /etc/group and /etc/gshadow
- adduser.conf (5) - configuration file for adduser(8) and addgroup(8)
- adduser (8) - add a user or group to the system
- deluser (8) - remove a user or group from the system
- delgroup (8) - remove a user or group from the system
- chgrp (1) - change group ownership
The numbers within parentheses above indicate the man section. Therefore, to view the man page for the userdel command:
man 8 userdel
Practice
Modify default new user settings
Let's modify some default user account settings for new users, and then we'll create a new user account.
Before we proceed, let's review several important configuration files that establish some default settings:
- /etc/skel
- /etc/adduser.conf
The /etc/skel directory defines the home directory for new users. Whatever files or directories exist in this directory at the time a new user account is created will result in those files and directories being created in the new user's home directory. We can view what those are using the following command:
ls -a /etc/skel/
The /etc/adduser.conf file defines the default parameters for new users. It's in this file where the default starting user and group IDs are set, where the default home directory is located (e.g., in /home/), where the default shell is defined (e.g., /bin/bash), where the default permissions are set for new home user directories (e.g., 0755), and more.
Let's change some defaults for /etc/skel. We need to use sudo [command] or use su to become the root user. I prefer to use sudo [command] since this is a bit safer than becoming root. Let's edit the default .bashrc file:
sudo nano /etc/skel/.bashrc
We want to add these lines at the end of the file. This file is a configuration file for /bin/bash and will be interpreted by Bash. Therefore, lines starting with a hash mark are comments:
# Dear New User,
#
# I have made the following settings
# to make your life a bit easier:
#
# make "c" a shortcut for "clear"
alias c='clear'
Use nano again to create a README file. This file will be added to the home directories of all new users. Add any welcome message you want, plus any guidelines for using the system.
sudo nano /etc/skel/README
Add new user account
After writing (saving) and exiting nano, we can go ahead and create a new user named linus:
sudo adduser linus
We'll be prompted to enter a password for the new user, plus comments (full name, phone number, etc.). Any of these can be skipped by pressing enter. You can see from the output of the grep command below that I added some extra information:
grep "linus" /etc/passwd
linus:x:1003:1004:Linus Torvalds,333,555-123-4567,:/home/linus:/bin/bash
Let's modify the minimum days before the password can be changed, and the maximum days of the password's lifetime:
sudo passwd -n 90 linus
sudo passwd -x 180 linus
You can see these values by grepping the shadow file:
sudo grep "linus" /etc/shadow
To log in as the new user, use the su command:
su linus
To exit the new user's account, use the exit command:
exit
Add users to a new group
Because of the default configuration defined in /etc/adduser.conf, the linus user only belongs to a group of the same name. Let's create a new group that both linus and peter belong to. For that, we'll use the -a option of the gpasswd command. We'll also make the user peter the group administrator using the -A option (see man gpasswd for more details).
sudo groupadd developers
sudo gpasswd -a peter developers
sudo gpasswd -A peter developers
sudo gpasswd -a linus developers
grep "developers" /etc/group
Note: if a user is logged in when you add them to a group, they need to logout and log back in before the group membership goes into effect.
Create a shared directory
One of the benefits of group membership is that members can work in a shared directory.
Let's make the /srv/developers a shared directory. The /srv directory already exists, so we only need to create the developers subdirectory:
sudo mkdir /srv/developers
We'll have to change the default permissions, which are currently set to 0755:
ls -ld /srv
ls -ld /srv/developers
Now we can change ownership of the directory:
sudo chgrp developers /srv/developers
The directory ownership should now reflect that it's owned by the developers group:
ls -ld /srv/developers
In order to allow group members to read and write to the above directory, we need to use the chmod command in a way we haven't yet. Specifically, adding a leading 2 sets the group identity (setgid) bit, which means new files created in the directory will belong to the developers group. The 770 indicates that the user and group owners of the directory have read, write, and execute permissions for the directory:
sudo chmod 2770 /srv/developers
Now either linus or peter can add, modify, and delete files in the /srv/developers directory.
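We can verify the change with ls; the s in the group permissions column marks the setgid bit (your timestamp and other details will differ):
ls -ld /srv/developers
drwxrws--- 2 root developers 4096 Aug 12 14:00 /srv/developers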
User account and group deletion
You can keep the additional user and group on your system, but know that you can also remove them. The deluser and delgroup commands offer great options and may be preferable to the other utilities (see man deluser or man delgroup).
If we want to delete the new user's account and the new group, these are the commands to use. The first command will create an archival backup of linus' home directory and also remove the home directory and any files in it.
sudo deluser --backup --remove-home linus
sudo delgroup developers
Managing Software
Introduction
Many modern Linux distributions offer some kind of package manager to install, manage, and remove software. These package management systems interact with curated and audited central repositories of software that are collected into packages. They also provide a set of tools to learn about the software that exists in these repositories.
If package management seems like an odd concept to you, it's just a way to manage software installation, and it's very similar to the way that Apple and Google distribute software via the App Store and Google Play.
On Debian based systems, which includes Ubuntu, we use apt, apt-get, and apt-cache to manage most software installations. In most cases, you will simply want to use the apt command, as it is meant to combine the functionality commonly used with apt-get and apt-cache.
We can also install software from source code or from pre-built binaries. On Debian and Ubuntu, for example, we might want to install (if we trust it) pre-built binaries distributed on the internet as .deb files. These are comparable to .dmg files for macOS and to .exe files for Windows. When installing .deb files, though, we need to use the dpkg command.
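For example, to install a downloaded .deb file (the filename here is hypothetical):
sudo dpkg -i some-program.deb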
Installing software from source code often involves compiling the software. It's usually not difficult to install software this way, but it can become complicated to manage software that's installed from source code simply because it means managing dependencies and keeping a close eye on new versions of the software.
Another way to install software (I know, there's a lot) is to use the snap command. This is a newer way of packaging programs that involves bundling a program and all of its dependencies into a single container. The main point of snap seems to be aimed at IoT and embedded devices, but it's perfectly usable and preferable (in some scenarios) on the desktop, because the general aim is end users and not system administrators.
See the snap store for examples.
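As a quick sketch, the workflow mirrors apt; hello-world is a small test snap published in the snap store:
snap find hello
sudo snap install hello-world
snap list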
You might also want to know that some programming languages provide their own mechanisms to install packages. In many cases, these packages may be installed with the apt command, but the packages that apt will install tend to be older (but more stable) than the packages that a programming language will install. For example, Python has the pip or pip3 command to install and remove Python libraries. The R programming language has the install.packages(), remove.packages(), and update.packages() commands to install, remove, and update R libraries.
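For example, a Python library can be installed and removed with pip3; the requests library here is just an example, and on Ubuntu the pip3 command itself is provided by the python3-pip package:
pip3 install requests
pip3 uninstall requests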
Despite all these ways to install, manage, remove, and update software, we will focus on using the apt command, which is pretty straightforward.
APT
Let's look at the basic apt commands.
apt update
Before installing any software, we need to update the index of packages that are available for the system.
sudo apt update
apt upgrade
The above command will also state if there is software on the system that is ready for an upgrade. If any upgrades are available, we run the following command:
sudo apt upgrade
apt search
We may know a package's name when we're ready to install it, but we also may not. To search for a package, we use the following syntax:
apt search [package-name]
Package names will never have spaces between words. Rather, if a package name has more than one word, each word will be separated by a hyphen.
In practice, say I'm curious if there are any console based games:
apt search ncurses game
I added ncurses to my search query because the ncurses library is often used to create console-based applications.
apt show
The above command returned a list that includes a game called ninvaders, which seems to be a console-based Space Invaders-like game. To get additional information about this package, we use the apt show [package-name] command:
apt show ninvaders
apt install
It's quite simple to install the package called ninvaders:
sudo apt install ninvaders
apt remove or apt purge
To remove an installed package, we can use either the apt remove or the apt purge commands. Sometimes when a program is installed, configuration files get installed with it in the /etc directory. The apt purge command will remove those configuration files but the apt remove command will not. Both commands are offered because sometimes it is useful to keep those configuration files.
sudo apt remove ninvaders
Or:
sudo apt purge ninvaders
apt autoremove
All big software requires other software to run. This other software is called a dependency. The apt show [package-name] command will list a program's dependencies. However, when we remove software with the prior two commands, the dependencies, even if no longer needed, are not necessarily removed. To remove them (which restores disk space), we do:
sudo apt autoremove
apt history
Unfortunately, the apt command does not provide a way to get a history of how it's been used on a system, but a log of its activity is kept. We can review that log with the following command:
less /var/log/apt/history.log
Daily Usage
This all may seem complicated, but it's really not. For example, to keep my systems updated, I run the following two commands on a daily or near daily basis:
sudo apt update
sudo apt upgrade
Conclusion
There are a variety of ways to install software on a Linux or Ubuntu system. The common way to do it on Ubuntu is to use the apt command, which was covered in this section. We'll come back to this command often because we'll soon install and set up a complete LAMP (Linux, Apache, MariaDB, and PHP) server. Until then, I encourage you to read through the manual page for apt:
man apt
Using systemd
Introduction
When computers boot up, obviously some software manages that process. On Linux and other Unix or Unix-like systems, this is usually handled via an init system. For example, macOS uses launchd and many Linux distributions, including Ubuntu, use systemd.
systemd does more than handle the startup process; it also manages various services and connects the Linux kernel to various applications. In this section, we'll cover how to use systemd to manage services and to review log files.
Manage Services
When we install complicated software, like a web server (e.g., Apache2, Nginx), a SSH server (e.g., OpenSSH), or a database server (e.g., mariaDB or MySQL), then it's helpful to have commands that manage that service (the web service, the SSH service, the database service, etc).
For example, the ssh service is installed by default on our gcloud servers, and we can check its status with the following systemctl command:
systemctl status ssh
The output tells us a few things. The line beginning with Loaded tells us that the SSH service is configured. At the end of that line, it also tells us that it is enabled, which means that the service will automatically start when the system gets rebooted or starts up.
The line beginning with Active tells us that the service is active (running) and for how long. It has to say this since I'm connecting to the machine using ssh; if the service was not active (running), then I wouldn't be able to log in remotely. We also can see the process ID (PID) for the service as well as how much memory it's using.
At the bottom of the output, we can see the recent log entries. We can view more of those logs using the journalctl command. By default, running journalctl by itself will return all logs, but we can specify that we're interested in logs only for the ssh service. One way to specify is by the PID number. Replace NNN with the PID number attached to your ssh service:
journalctl _PID=NNN
Or we can specify by service, or more specifically, its unit name:
journalctl -u ssh
Use Cases
Later we'll install the Apache web server, and we will use systemctl to manage some aspects of this service.
In particular, we will use the following commands to:
- check the state of the Apache service,
- configure the Apache service to auto start on reboot,
- start the service,
- reload the service after editing its configuration files, and
- stop the service.
In order, these work out to:
systemctl status apache2
sudo systemctl enable apache2
sudo systemctl start apache2
sudo systemctl reload apache2
sudo systemctl stop apache2
systemctl is a big piece of software, and there are other arguments the command will take. See man systemctl for details.
Examine Logs
As mentioned, the journalctl command is part of the systemd software suite, and it is used to monitor system logs. It's really important to monitor system logs, since they help identify problems in the system or with various services. For example, by monitoring the log entries for ssh, I can see all the attempts to break into the server. Or if the Apache2 web server malfunctions for some reason, which might be because of a configuration error, the logs will indicate how to identify the problem.
If we type journalctl at the command prompt, we are presented with the logs for the entire system. These logs can be paged through by pressing the space bar, the page up/page down keys, or the up/down arrow keys, and they can also be searched by pressing the forward slash / and then entering a search keyword. To exit the pager, press q to quit.
journalctl
It's much more useful to specify the field and to declare an option when using journalctl.
See the following man pages for details:
man systemd.journal-fields
man journalctl
There are many fields and options we can use, but as an example, we see that there is an option to view the more recent entries first (which is not the default):
journalctl -r
Or we can view log entries in reverse order, for users on the system, and since the last boot, with the following options:
journalctl -r --user -b 0
Or for the system:
journalctl -r --system -b 0
I can look more specifically at the logs for a service by using the -u option with journalctl:
journalctl -u apache2
I can follow the logs in real-time (press ctrl-c to quit the real-time view):
journalctl -f
Useful Systemd Commands
You can see more of what systemctl or journalctl can do by reading through their documentation:
man systemctl
man journalctl
You can check if a service is enabled:
systemctl is-enabled apache2
You can reboot, poweroff, or suspend a system (suspending a system mostly makes sense for laptops and not servers):
systemctl reboot
systemctl poweroff
systemctl suspend
To show configuration file changes to the system:
systemd-delta
To list real-time control group process, resource usage, and memory usage:
systemd-cgtop
To search for failed processes/services:
systemctl --state failed
To list services:
systemctl list-unit-files -t service
To examine boot time:
systemd-analyze
Conclusion
This is a basic introduction to systemd, which is composed of a suite of software to help manage booting a system, managing services, and monitoring logs.
We'll put what we've learned into practice when we set up our LAMP servers.
Networking and Security
Even if we do not work as network administrators, system administrators need to know network basics. In this section, we cover TCP/IP and other protocols in the internet protocol suite, how to protect our systems both locally and from external threats, and how to create backups of our systems in case of disaster.
Networking and TCP/IP
An important function of a system administrator is to set up, configure, and monitor a network. This may range from planning, configuring, and connecting the devices on a local area network, to planning and implementing a large network that interfaces with an outside network, to monitoring networks for various sorts of attacks, such as denial of service attacks.
In order to prepare for this type of work, we need at least a basic understanding of how the internet works and how local devices interact with the internet. In this section, we will focus mostly on internet addressing, but we will also devote some space to TCP and UDP, two protocols for transmitting data.
Connecting two or more devices together nowadays involves the TCP/IP or the UDP/IP protocols, otherwise part of the Internet protocol suite. This suite is a kind of expression of the more generalized OSI communication model.
The internet protocol suite is generally framed as a series of layers beginning with a lower layer, the link layer, that interfaces with internet capable hardware, to the highest layer, the application layer.
The link layer describes the local area network. Devices connected locally, e.g., via Ethernet cables, comprise the link layer. The link layer connects to the internet layer. Data going into or out of a local network must be negotiated between these two layers.
The internet layer makes the internet possible by basically making the ability to transmit data among multiple networks possible. (The internet is, in fact, a network of networks). The primary characteristic of the internet layer is the IP address, which currently comes in two versions: IPv4 (32 bit) and IPv6 (128 bit). IP addresses are used to locate hosts on a network.
The transport layer makes the exchange of data on the internet possible. There are two dominant protocols attached to this layer: UDP and TCP. Very generally, UDP is used when the integrity of data is less important than its ability to reach its destination. For example, streaming video, VOIP, and online gaming are often transported via UDP because the loss of some pixels or some audio is acceptable. TCP is used when the integrity of the data is important. If the data cannot be transmitted without error, then the data won't reach its final destination until the error is corrected.
The application layer provides the ability to use the internet in particular ways. For example, the HTTP protocol enables the web, which is simply an application on the internet. The SMTP, IMAP, and POP protocols control email exchange. DNS is a system that maps IP addresses to domain names. In this book, we use SSH, also part of the application layer, to connect to remote computers.
By application, the model simply means that these protocols provide the functionality for applications. They are not themselves user applications, like a web browser.
The Internet Protocol Suite
Link Layer
ARP (Address Resolution Protocol)
ARP (Address Resolution Protocol) is a protocol at the link layer that is used to map network addresses, like an IP address, to ethernet addresses, also called MAC (Media Access Control) or hardware addresses. Routers use MAC addresses to enable communication inside networks (within subnets or local area networks) so that computers within a local network can talk to each other. Networks are designed so that IP addresses are associated with MAC addresses before systems can communicate over a network. Every one of your internet-capable devices, your smartphone, your laptop, your internet connected toaster, has a MAC address.
To get ARP info for a system, we use the ip command, which takes regular options (like -b) and names specific objects. To get the MAC address for a specific computer, we can use the following command, where ip is the command and a or link are considered objects (see man ip for details):
ip a
On my home system, the above command produces three numbered sections of output. The first section refers to the lo or loopback device. This is a special device that allows the computer to communicate with itself. It always has an IPv4 address of 127.0.0.1. The next section on my home machine refers to the ethernet card. Currently, I'm connected via wifi, and so this section reports the MAC address for that ethernet card plus some other information, such as whether the device is down or up. Since there's no physical cable connecting my machine to the router, this section reports DOWN. The third section on my home system refers to the wifi card. Since this is UP (or active), it reports the internal IP address (e.g., 192.168.0.4), plus the MAC address, and other details. The internal address is different from the machine's external address, which might be something like 159.3.45.2.
We can get just the link object information with the following command:
ip link
The following two commands help identify parts of the local network (or subnet) and the routing table.
ip neigh
ip route
The ip neigh command produces the ARP cache: basically, the other systems on the local network that your system is aware of. The ip route command is used to define how data is routed on the network, but it can also display the routing table. Both of these commands are more commonly used on Linux-based routers.
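On a home machine like the one described above, the output of ip route might look something like this; the addresses and the device name wlp2s0 are illustrative:
default via 192.168.0.1 dev wlp2s0 proto dhcp metric 600
192.168.0.0/24 dev wlp2s0 proto kernel scope link src 192.168.0.4
The first line says that traffic bound for other networks goes through the router at 192.168.0.1, and the second line describes the local subnet.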
These details enable the following scenario: A router gets configured to use a specific network address when it's brought online. It searches the sub network for connected MAC addresses that are assigned to wireless cards or ethernet cards. It then assigns each of those MAC addresses an available IP address based on the network address.
Internet Layer
IP (Internet Protocol)
The Internet Protocol, or IP, address is used to uniquely identify a host on a network and place that host at a specific location (its IP address). If that network is subnetted (i.e., routed), then a host's IP address will have a subnet or private IP address that will not be directly exposed to the internet. Remember this: there are public IP addresses, and these are distinct from private IP addresses. Public IP addresses are accessible on the internet. Private IP addresses are not, but they are accessible on subnets or local area networks.
Private IP address ranges are reserved address ranges, which means no public internet device will have an IP address within these ranges. The private address ranges include:
Start Address | End Address |
---|---|
10.0.0.0 | 10.255.255.255 |
172.16.0.0 | 172.31.255.255 |
192.168.0.0 | 192.168.255.255 |
If you have a router at home and look at the IP address for any of your devices connected to that router, like your phone or computer, you will see that it has an address within one of the ranges above. For example, it might have an IP address beginning with 192.168.X.X. This is a standard IP address range for a home router. The 10.X.X.X private range can assign many more IP addresses on its network, which is why you'll see that IP range on bigger networks, like a university's network. We'll talk more about subnetwork sizes shortly.
Example Private IP Usage
Let's say my office computer's IP address is 10.163.34.59/24 via a wired connection. My office neighbor has an IP address of 10.163.34.65/24 via their wired connection. Both IP addresses are private because they fall within the 10.0.0.0 to 10.255.255.255 range. And it's likely they both exist on the same subnet since they share the first three octets: 10.163.34.XX.
However, if we both, using our respective wired connected computers, searched Google for what's my IP address, we will see that we share the same public IP address, which will be something like 128.163.8.25. That is a public IP address because it does not fall within the ranges listed above.
Without any additional information, therefore, we know that all traffic coming from our computers and going out to the internet looks like it's coming from the same IP address (128.163.8.25). And in reverse, all traffic coming from outside our network first goes to 128.163.8.25 before it's routed to our respective computers via the router.
Let's say I also have a laptop in my office, and that it has a wireless connection. When I check with ip a, I find that the laptop has the IP address 10.47.34.150/16. You can see there's a different pattern with this IP address. The reason is that this laptop is on a different subnet, even though it's physically sitting next to the wired computer. This wireless subnet was configured to allow more hosts to connect to it, since it must accommodate more devices (i.e., laptops, phones, etc.). When I searched Google for my IP address from this laptop, it reported 128.163.238.148, indicating that UK owns a range of public IP address spaces.
Here's a visual diagram of what this network looks like:

Using the ip Command
The ip command can do more than provide us information about our network. We can also use it to turn a connection to the network on or off (and more). Here we disable and then enable a connection on a machine. Note that enp0s3 is the name of my network card/device. Yours might have a different name.
sudo ip link set enp0s3 down
sudo ip link set enp0s3 up
Transport Layer
The internet layer does not transmit content, like web pages or video streams. This is the work of the transport layer. As discussed previously, the two most common transport layer protocols are TCP and UDP.
TCP, Transmission Control Protocol
TCP, or Transmission Control Protocol, is responsible for the transmission of data and for making sure the data arrives at its destination without errors. If there are errors, the data is re-transmitted, or the transmission is halted in case of some failure. Much of the data sent over the internet is sent using TCP.
UDP, User Datagram Protocol
The UDP, or User Datagram Protocol, performs a similar function as TCP, but it does not error check, and data may get lost. UDP is useful for conducting voice over internet calls or for streaming video, such as through YouTube, which uses a type of UDP transmission called QUIC that has built-in encryption.
TCP and UDP Headers
The above protocols send data in TCP packets or UDP datagrams, though these terms may be used interchangeably. Packets for both protocols include header information to help route the data across the internet. TCP includes ten fields of header data, and UDP includes four fields.
We can see this header data using the tcpdump command, which requires sudo or being root to use. The first part of the IP header contains the source address, then comes the destination address, and so forth. Aside from a few other parts, this is the primary information in an IP header. You should use tcpdump on your local computer and not on your gcloud instance. First we identify the IP address of a host, which we can do with the ping command, and then run tcpdump:
ping -c1 www.uky.edu
sudo tcpdump host 128.163.35.46
While that's running, we can type that IP address in our web browser, or enter www.uky.edu, and watch the output of tcpdump.
TCP headers include port information and other mandatory fields for both source and destination servers. The SYN, or synchronize, message is sent when a source or client requests a connection. The ACK, or acknowledgment, message is sent in response, along with a SYN message, to acknowledge the request for a connection. Then the client responds with an additional ACK message. This is referred to as the TCP three-way handshake. In addition to the header info, TCP and UDP packets include the data that's being sent (e.g., a webpage) and error checking if it's TCP.
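If you want to watch the handshake itself, tcpdump can filter on TCP flags. Here's a minimal sketch using the packet filter syntax documented in man pcap-filter; press Ctrl+C to stop it:

# Show only packets with the SYN or ACK flags set:
sudo tcpdump -n 'tcp[tcpflags] & (tcp-syn|tcp-ack) != 0'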
Ports
TCP and UDP connections use ports to bind internet traffic to specific IP addresses. Specifically, a port associates a process with a specific network service (and is part of the application layer of the internet suite), such as a web service or outgoing email. That is, ports provide a way to distinguish and filter internet traffic (web, email, etc.) through an IP address. For example, all traffic going to the IP address 10.0.5.33:80 is HTTP traffic for the web service, since HTTP is commonly associated with port 80. Note that the port info is attached to the end of the IP address via a colon.
Common ports include:
- 21: FTP
- 22: SSH
- 25: SMTP
- 53: DNS
- 143: IMAP
- 443: HTTPS
- 587: SMTP Secure
- 993: IMAP Secure
There's a complete list of the 318 default ports/protocols on your Linux system. It's located in the following file:
less /etc/services
And to get a count of the ports, we can use grep to exclude lines that start with a pound sign or are empty:
grep -Ev "^#|^$" /etc/services | wc -l
See also the Wikipedia page: List of TCP and UDP port numbers
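Relatedly, we can see which ports are actually in use on our own systems. The ss command, which ships in the same iproute2 package as the ip command, lists open sockets. For example:

# List listening (-l) TCP (-t) sockets with numeric ports (-n)
# and owning processes (-p):
sudo ss -tlnp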
IP Subnetting
Let's now return to the internet layer and discuss one of the major duties of a systems administrator: subnetting.
Subnets are used to carve out smaller and more manageable subnetworks out of a larger network. They are created using routers that have this capability (e.g., commercial use routers) and certain types of network switches.
Private IP Ranges
When subnetting local area networks, we work with the private IP ranges:
Start Address | End Address |
---|---|
10.0.0.0 | 10.255.255.255 |
172.16.0.0 | 172.31.255.255 |
192.168.0.0 | 192.168.255.255 |
It's important to be able to work with IP addresses like those listed above in order to subnet. Therefore, we will need to learn a bit of IP math along the way.
IP Meaning
An IPv4 address is 32 bits (8 x 4), or four bytes, in size. In human readable context, it's usually expressed in the following, decimal-based, notation style:
- 192.168.1.6
- 172.16.3.44
Each set of numbers separated by a dot is referred to as an octet. An octet is a group of 8 bits. Eight bits equal a single byte. By implication, 8 gigabits equals 1 gigabyte, and 8 megabits equals 1 megabyte. We use these symbols to note the terms:
Term | Symbol |
---|---|
bit | b |
byte | B |
octet | o |
Each bit is represented by either a 1 or a 0. For example, the first address above in binary is:
- 11000000.10101000.00000001.00000110 is 192.168.1.6
Or:
Byte | Decimal Value |
---|---|
11000000 | 192 |
10101000 | 168 |
00000001 | 1 |
00000110 | 6 |
IP Math
When doing IP math, one easy way to do it is to simply remember that each bit in each of the above bytes is a placeholder for the following values:
128 64 32 16 8 4 2 1
Alternatively, from low to high:
base-2 | Output |
---|---|
2^0 | 1 |
2^1 | 2 |
2^2 | 4 |
2^3 | 8 |
2^4 | 16 |
2^5 | 32 |
2^6 | 64 |
2^7 | 128 |
In binary, 192 is equal to 11000000. It's helpful to work backward. For IP addresses, all octets are 255 or less (256 total, from 0 to 255) and therefore do not exceed 8 bits or places. To convert the integer 192 to binary:
1 * 2^7 = 128
1 * 2^6 = 64 (128 + 64 = 192)
Then STOP. There are no values left, and so the rest are zeroes. Thus: 11000000
Our everyday counting system is base-10, but binary is base-2, and thus another way to convert binary to decimal is to multiply each bit (1 or 0) by the power of two at its placeholder:
(0 * 2^0) = 0 +
(0 * 2^1) = 0 +
(0 * 2^2) = 0 +
(0 * 2^3) = 0 +
(0 * 2^4) = 0 +
(0 * 2^5) = 0 +
(1 * 2^6) = 64 +
(1 * 2^7) = 128 = 192
Another way to convert to binary: work through the placeholder values from largest to smallest, subtracting each placeholder that fits into what remains. If a placeholder fits (the subtraction leaves zero or a positive remainder), then its bit equals 1; otherwise its bit equals 0. So:
- 192 - 128 = 64 -- therefore the first bit is equal to 1.
- Now take the leftover and subtract it:
- 64 - 64 = 0 -- therefore the second bit is equal to 1.
Since there is nothing remaining, the rest of the bits equal 0.
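We can check this kind of arithmetic on the command line. For example, the bc calculator (install it with sudo apt install bc if it's not already present) converts between bases:

# Convert decimal 192 to binary:
echo 'obase=2; 192' | bc
# Returns: 11000000
# Convert binary 11000000 back to decimal:
echo 'ibase=2; 11000000' | bc
# Returns: 192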
Subnetting Examples
Subnetting involves dividing a network into two or more subnets. When we subnet, we first identify the number of hosts, aka, the size, we will require on the subnet. For starters, let's assume that we need a subnet that can assign at most 254 IP addresses to the devices attached to it via the router.
In order to do this, we need two additional IP addresses: the subnet mask and the network address/ID. The network address identifies the network and the subnet mask marks the boundary between the network and the hosts. Knowing or determining the subnet mask allows us to determine how many hosts can exist on a network. Both the network address and the subnet mask can be written as IP addresses, but these IP addresses cannot be assigned to computers on a network.
When we have determined these IPs, we will know the broadcast address. This is the last IP address in a subnet range, and it also cannot be assigned to a connected device/host. The broadcast address is used by a router to communicate to all connected devices on the subnet.
For our sake, let's work through this process backwards; that is, we want to identify and describe a network that we are connected to. Let's work with two example private IP addresses that exist on two separate subnets.
Example IP Address 1: 192.168.1.6
Using the private IP address 192.168.1.6, let's derive the network mask and the network address (or ID) from this IP address. First, convert the decimal notation to binary. State the mask, which is /24, or 255.255.255.0. And then derive the network address using a bitwise logical AND operation:
11000000.10101000.00000001.00000110 IP 192.168.1.6
11111111.11111111.11111111.00000000 Mask 255.255.255.0
-----------------------------------
11000000.10101000.00000001.00000000 Network Address 192.168.1.0
Note the mask has 24 ones followed by 8 zeroes. That 24 is used as CIDR notation:
- 192.168.1.6/24
For Example 1, we thus have the following subnet information:
Type | IP |
---|---|
Netmask/Mask | 255.255.255.0 |
Network ID | 192.168.1.0 |
Start Range | 192.168.1.1 |
End Range | 192.168.1.254 |
Broadcast | 192.168.1.255 |
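As a quick sanity check, Bash's arithmetic expansion supports the bitwise AND operator (&), so we can AND each octet of the IP address with the corresponding octet of the mask and confirm the network address:

# Each octet of 192.168.1.6 ANDed with the mask 255.255.255.0:
echo "$(( 192 & 255 )).$(( 168 & 255 )).$(( 1 & 255 )).$(( 6 & 0 ))"
# Returns: 192.168.1.0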
Example IP Address 2: 10.160.38.75
For example 2, let's start off with a private IP address of 10.160.38.75 and a mask of /24:
00001010.10100000.00100110.01001011 IP 10.160.38.75
11111111.11111111.11111111.00000000 Mask 255.255.255.0
-----------------------------------
00001010.10100000.00100110.00000000 Network Address 10.160.38.0
Type | IP |
---|---|
Netmask/Mask | 255.255.255.0 |
Network ID | 10.160.38.0 |
Start Range | 10.160.38.1 |
End Range | 10.160.38.254 |
Broadcast | 10.160.38.255 |
Example IP Address 3: 172.16.1.62/24
For example 3, let's start off with a private IP address of 172.16.1.62 and a mask of /24:
10101100.00010000.00000001.00111110 IP 172.16.1.62
11111111.11111111.11111111.00000000 Mask 255.255.255.0
-----------------------------------
10101100.00010000.00000001.00000000 Network Address 172.16.1.0
Type | IP |
---|---|
Netmask/Mask | 255.255.255.0 |
Network ID | 172.16.1.0 |
Start Range | 172.16.1.1 |
End Range | 172.16.1.254 |
Broadcast | 172.16.1.255 |
Determine the Number of Hosts
To determine the number of hosts on a CIDR /24 subnet, we look at the start and end ranges. In all three of the above examples, the start range begins with X.X.X.1 and ends with X.X.X.254. Therefore, there are 254 maximum hosts allowed on these subnets because 1 to 254, inclusive of 1 and 254, is 254.
Example IP Address 4: 10.0.5.23/16
The first three examples show instances where the CIDR is set to /24. This only allows 254 maximum hosts on a subnet. If the CIDR is set to /16, then we can theoretically allow 65,534 hosts on a subnet.
For example 4, let's start off then with a private IP address of 10.0.5.23 and a mask of /16:
00001010.00000000.00000101.00010111 IP Address: 10.0.5.23
11111111.11111111.00000000.00000000 Mask: 255.255.0.0
-----------------------------------------------------------
00001010.00000000.00000000.00000000 Network ID: 10.0.0.0
Type | IP |
---|---|
IP Address | 10.0.5.23 |
Netmask/Mask | 255.255.0.0 |
Network ID | 10.0.0.0 |
Start Range | 10.0.0.1 |
End Range | 10.0.255.254 |
Broadcast | 10.0.255.255 |
Since the last two octets/bytes now vary, we count up by each octet. The fourth octet can take 256 values (0 through 255), and the third octet can also take 256 values. Therefore:

- Number of Hosts = 256 x 256 = 65536
- Subtract the Network ID (1) and Broadcast (1) addresses = 2 IP addresses
- Number of Usable Hosts = 65536 - 2 = 65534
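Once you can do this math by hand, a tool can confirm your work. For example, the ipcalc utility (not installed by default; sudo apt install ipcalc) summarizes a subnet from its CIDR notation. The exact output format may vary by version:

# Summarize the subnet for example 4:
ipcalc 10.0.5.23/16
# Reports the network (10.0.0.0/16), the broadcast address
# (10.0.255.255), and the number of usable hosts (65534).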
IPv6 subnetting
We're not going to cover IPv6 subnetting, but if you're interested, this is a nice article: IPv6 subnetting overview
Conclusion
As a systems administrator, it's important to have a basic understanding of how networking works, and the basic models used to describe the internet and its applications. System administrators have to know how to create subnets and defend against various network-based attacks.
In order to acquire a basic understanding, this section covered topics that included:
- the internet protocol suite
- link layer
- internet layer
- transport layer
- IP subnetting
- private IP ranges
- IP math
In the next section, we extend upon this and discuss the domain name system (DNS) and domain names.
DNS and Domain Names
The DNS (domain name system) is referred to as the phone book of the internet, and it's responsible for mapping IP addresses to memorable names. Thus, instead of having to remember:
https://128.163.35.46
We can instead remember this:
https://www.uky.edu
System administrators need to know about DNS because they may be responsible for administering a domain name system on their network, and/or they may be responsible for setting up and administering web site domains. Either case requires a basic understanding of DNS.
DNS Intro Videos
To help you get started, watch these two YouTube videos. The first one provides an overview of the DNS system:
How a DNS Server (Domain Name System) works
The second video illustrates how to use a graphical user interface to create and manage DNS records.
And here is a nice intro to recursive DNS:
https://www.cloudflare.com/learning/dns/what-is-recursive-dns/
FQDN: The Fully Qualified Domain Name
The structure of the domain name system is like the structure of the UNIX/Linux file hierarchy; that is, it is like an inverted tree.
The fully qualified domain name includes a period at the end of the top-level domain. Your browser is able to supply that dot since we often don't use it when typing website addresses.
Thus, for Google's main page, the FQDN is:
FQDN: www.google.com.
And the parts include:
Part | Domain Level |
---|---|
. | root domain |
com | top-level domain |
google | second-level domain |
www | third-level domain |
This is important to know so that you understand how the Domain Name System works and which DNS servers are responsible for their part of the network.
Root Domain
The root domain is managed by root name servers.
These servers are listed on the IANA,
the Internet Assigned Numbers Authority, website, but
are managed by multiple operators.
The root servers manage the root domain,
alternatively referred to as the zone, or
the . at the end of the .com.
, .edu.
, etc.
Alternative DNS Root Systems
It's possible to have alternate internets by using outside root name servers. This is not common, but it happens. Read about a few of them here:
- sdf: https://web.archive.org/web/20081121061730/http://www.smtpnic.org/
- opennic: https://www.opennicproject.org/
- alternic: https://en.wikipedia.org/wiki/AlterNIC
Russia, as an example, has threatened to use its own alternate internet based on a different DNS root system. This would essentially create a large, second internet. You can read about it in this IEEE Spectrum article.
Top Level Domain (TLD)
We are all familiar with top level domains. Specific examples include:
- generic TLD names:
- .com
- .gov
- .mil
- .net
- .org
- and ccTLD, country code TLDs
- .ca (Canada)
- .mx (Mexico)
- .jp (Japan)
- .uk (United Kingdom)
- .us (United States)
We can download a list of those top level names from IANA, and get a total count of 1,487 (as of August 2022):
wget https://data.iana.org/TLD/tlds-alpha-by-domain.txt
sed '1d' tlds-alpha-by-domain.txt | wc -l
Second Level Domain Names
In the Google example, the second level domain is google. The second level domain together with the TLD, along with any further subdomains, form the fully qualified domain name. Other examples include:
- redhat in redhat.com
- debian in debian.org
- wikipedia in wikipedia.org
- uky in uky.edu
- twitter in twitter.com
Third Level Domain Names / Subdomains
When you've purchased (leased) a top and second level domain like ubuntu.com, you can choose whether to add third level domains. For example: www is a third level domain or subdomain. If you owned example.org, you could dedicate a machine or a cluster of machines to www.example.org that resolves to a different location, or www.example.org could resolve to the second-level domain itself.
That is:
- www.debian.org can point to debian.org
It could also point to a separate server, such that debian.org and www.debian.org would be two separate servers with two separate websites or services, just like maps.google.com points to a different site than mail.google.com. Both maps and mail are subdomains of google.com. Although this is not common with third-level domains that start with www, it is common with others.
For example, with hostnames that are not www:
- google.com resolves to www.google.com
- google.com does not resolve to:
- drive.google.com, or
- maps.google.com, or
- mail.google.com
This is because those other three provide different, but specific services.
DNS Paths
A recursive DNS server is the first DNS server to be queried in the DNS system, and it is usually managed by an ISP. This is the resolver server in the first video above. This server checks whether the domain to IP mapping has already been cached (remembered/stored) in its system, and it is called recursive because it queries the rest of the DNS system on the client's behalf when the answer is not cached.
If it hasn't been cached, then the DNS query is forwarded to a root server. There are thirteen root servers.
echo {a..m}.root-servers.net.
Those root servers will identify the next server to query, depending on the top level domain (.com, .net, .edu, .gov, etc.). If the site ends in .com or .net, then the next server might be something like a.gtld-servers.net. If the top level domain ends in .edu, then a.edu-servers.net. If the top level domain ends in .gov, then a.gov-servers.net. And so forth.
Those top level domains should know where to send the query next. In many cases, the next path is to send the query to a custom domain server. For example, Google's custom name servers are: ns1.google.com to ns4.google.com. UK's custom name servers are: sndc1.net.uky.edu and sndc2.net.uky.edu. Finally, those custom name servers will know the IP address that maps to the domain.
We can use the dig command to query the non-cached DNS paths. Let's say we want to follow the DNS path for google.com. We can start by querying any root server. In the output, we want to pay attention to the QUERY field, the ANSWER field, and the Authority Section. We keep digging until the ANSWER field returns a number greater than 0. The following commands query one of the root servers, which points us to one of the authoritative servers for .com sites, which points us to Google's custom nameserver, which finally provides an answer, in fact six answers, or six IP addresses that all map to google.com.
dig @e.root-servers.net google.com
dig @a.gtld-servers.net google.com
dig @ns1.google.com google.com
Alternatively, we can query UK's:
dig @j.root-servers.net. uky.edu
dig @b.edu-servers.net. uky.edu
dig @sndc1.net.uky.edu. uky.edu
We can also get this path information using dig's +trace option:
dig google.com +trace
There are a lot of ways to use the dig command, and you can test and explore them on your own.
DNS Record Types
In the dig command output above, you will see various fields:
- SOA: Start of Authority: describes the site's DNS entries
- IN: Internet Record
- NS: Name Server: states which name server provides DNS resolution
- A: Address record: maps a hostname to an IPv4 address
- AAAA: Address record: maps a hostname to an IPv6 address
dig google.com
google.com. IN A 142.251.32.78
Other record types include:
- PTR: Pointer Record: maps an IP address to a hostname
- MX: Mail exchanger: the MX record identifies the mail server for a domain.
- CNAME: Canonical name: used so that a domain name may act as an alias for another domain name. Thus, say someone visits www.example.org, but if no subdomain is set up for www, then the CNAME can point to example.org.
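For example, since PTR records provide the reverse mapping, dig's -x option performs a reverse lookup on an IP address. Whether you get an answer depends on whether a PTR record has been published for that address:

# Reverse (PTR) lookup for an IP address:
dig -x 128.163.35.46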
DNS Toolbox
It's important to be able to troubleshoot DNS issues. To do that, we have a few utilities available. Here are examples, and you should read the man pages for each one:
host: resolve hostnames to IP addresses, or IP addresses to hostnames
man -f host
host (1) - DNS lookup utility
host uky.edu
host 128.163.35.46
host -t MX uky.edu
host -t MX dropbox.com
host -t MX netflix.com
host -t MX wikipedia.org
dig: domain information groper; get info on DNS servers
man -f dig
dig (1) - DNS lookup utility
dig uky.edu
dig uky.edu MX
dig www.uky.edu CNAME
nslookup: query internet name servers
man -f nslookup
nslookup (1) - query Internet name servers interactively
nslookup
> uky.edu
> yahoo.com
> exit
whois: determine ownership of a domain
man -f whois
whois (1) - client for the whois directory services
whois uky.edu | less
resolv.conf: local resolver info; what's your DNS info
man -f resolv.conf
resolv.conf (5) - resolver configuration file
cat /etc/resolv.conf
resolvectl status
Conclusion
In the same way that phones have phone numbers, servers on the internet have IP addresses. Since we're only human, we don't remember every phone number that we dial or every IP address that we visit. In order to make such things human friendly, we use names instead.
Nameservers and DNS records act as the phone book and phone book entries of the internet. Note that I refer to the internet and not the web here. There is more at the application layer than the HTTP/HTTPS protocols, and so other types of servers, e.g., mail servers, may also have domain names and IP addresses to resolve.
In this section, we covered the basics of DNS that include:
- FQDN: the Fully Qualified Domain Name
- Root domains
- Top level domains (TLDs) and Country Code TLDS (ccTLDs)
- Second level and third level domains/subdomains
- DNS paths, and
- DNS record types
We'll come back to this material when we set up our websites.
Local Security
Introduction
Most security issues come from the network, but we also need to secure a system from inside attacks. We can do that by setting appropriate file permissions and by making sure users on a system do not have certain kinds of access (e.g., sudo access). For example, the /usr/bin/gcc program is the GNU C and C++ compiler; that is, it's used to compile C or C++ source code into executable programs. If users have unrestricted access to that compiler, then it's possible for them to compile programs that compromise the system.
In the next section, we'll cover how to set up a firewall, but in this section, we'll learn how to set up a chroot jail.
chroot
As we all know, the Linux file system has a root directory /, and under this directory are other directories like /home, /bin, and so forth. A chroot (change root) jail is a way to create a pseudo root directory at some specific location in the directory tree, and then build an environment in that pseudo root directory that offers some applications. Once that environment is set up, we can then confine a user account(s) to that pseudo directory, and when they log in to the server, they will only be able to see (e.g., with the cd command) what's in that pseudo root directory and only be able to use the applications that we've made available in that chroot.

Thus, a chroot jail is a technology used to change the "apparent root / directory for a user or a process" and confine that user to that location on the system. A user or process that is confined to the chroot jail cannot easily see or access the rest of the file system and will have limited access to the binaries (executables/apps/utilities) on the system.
From its man page:
chroot (8) - run command or interactive shell with special root directory
Although it is not a foolproof security measure, it does have some useful security use cases. Some use chroot to contain DNS servers, for example. chroot is also the conceptual basis for some kinds of virtualization technologies that are common today, like Docker.
Creating a chroot
In this tutorial,
we are going to create a chroot
.
First, we create a new directory for our jail. That directory will be located at /mustafar (but it could be elsewhere). Note that the normal root directory is /, but for the chroot, the root directory will be /mustafar, even though it will appear as / in the chroot. Depending on where we create the jail, we want to check the permissions of the new directory and make sure it's owned by root. If not, use chown root:root /mustafar to set it.

sudo mkdir /mustafar
ls -ld /mustafar
Next, we want to make the bash shell available in the jail. To do that, we'll create a /bin directory in /mustafar, and copy bash to that directory.

which bash
sudo mkdir /mustafar/bin
sudo cp /usr/bin/bash /mustafar/bin/
Large software applications have dependencies, aka libraries. We need to copy those libraries to our jail directory so applications, like Bash, can run. To identify the libraries needed by bash, we use the ldd command:

ldd /usr/bin/bash

Output (may vary depending on your system):

linux-vdso.so.1 (0x00007fff2ab95000)
libtinfo.so.6 => /lib/x86_64-linux-gnu/libtinfo.so.6 (0x00007fbec99f6000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fbec97ce000)
/lib64/ld-linux-x86-64.so.2 (0x00007fbec9ba4000)

We can ignore the first item in the output, but we will need the libraries in the last three lines.
Next, we create directories for these libraries in /mustafar that match or mirror the directories they reside in. To do that, use the mkdir command to create a /mustafar/lib/x86_64-linux-gnu/ directory and a /mustafar/lib64 directory for the libraries. We need to name the library directories after the originals to stay consistent with the main environment.

sudo mkdir -p /mustafar/lib/x86_64-linux-gnu
sudo mkdir /mustafar/lib64

Then we proceed to copy (not move!) the libraries to their respective directories in /mustafar (note the trailing dots, which mean "copy to the current directory"):

cd /mustafar/lib/x86_64-linux-gnu/
sudo cp /lib/x86_64-linux-gnu/libtinfo.so.6 .
sudo cp /lib/x86_64-linux-gnu/libc.so.6 .
cd /mustafar/lib64/
sudo cp /lib64/ld-linux-x86-64.so.2 .
Finally, we can test the chroot:

sudo chroot /mustafar
bash-5.1# ls
bash: ls: command not found
bash-5.1# help
bash-5.1# dirs
bash-5.1# pwd
bash-5.1# cd bin/
bash-5.1# dirs
bash-5.1# cd ../lib64/
bash-5.1# dirs
bash-5.1# cd ..
bash-5.1# for i in {1..4} ; do echo "$i" ; done
bash-5.1# exit

We get a Bash prompt, which is great, but we do not have the main utilities that we normally use. If you type in help, you will, however, find that you have some commands available, like pwd, dirs, cd, help, for, and more.
Exercise
Use the ldd command to add additional binaries. Make the following utilities/binaries available in the /mustafar chroot directory (a sketch of the general workflow follows this list):

- ls
- cat
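To get you started, here is a sketch of the general workflow for ls. The library names and paths below come from my system and will likely differ on yours, so copy whatever your own ldd output reports:

# Identify the libraries that ls depends on:
ldd /usr/bin/ls
# Copy the binary into the jail:
sudo cp /usr/bin/ls /mustafar/bin/
# Copy each library that ldd reported into the matching directory
# under /mustafar; for example (your paths may vary):
sudo cp /lib/x86_64-linux-gnu/libselinux.so.1 /mustafar/lib/x86_64-linux-gnu/
# Repeat for cat, then test with: sudo chroot /mustafar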
Conclusion
Systems need to be secure from the inside and out. In order to secure from the inside, system users should be given access and permissions as needed.
In this section, we covered how to create a chroot jail. The jail confines users and processes to this pseudo root location. It provides them limited access to the overall file system and to the software on the system. We can use this jail to confine users and processes, like apache2 or a human user. Any user listed in /etc/passwd can be jailed, and most users listed in that file are services. Jailing a human user may not be necessary. On a multi-user system, proper education and training about the policies and uses of the system may be all that's needed.
Alternatively, when creating user accounts, we could make their default shell rbash, or restricted bash. rbash limits access to a lot of Bash's main functions, and for added security, it can be used in conjunction with chroot. In summary, if a stricter environment is needed, now you know how to create a basic chroot jail.
Additional Sources:
- How to automatically chroot jail selected ssh user logins.
- BasicChroot
- How to Use chroot for Testing on Ubuntu
- How To Setup Linux Chroot Jails
Firewalls and Backups
Google Cloud Firewall and Ubuntu's UFW
A firewall program allows or denies connections for incoming (ingress) or outgoing (egress) traffic. Traffic can be controlled by the link layer (e.g., a network interface such as an ethernet or wireless card); by the IP layer (e.g., an IPv4 or IPv6 address or address range); by the transport layer (e.g., TCP, UDP); or by the application layer via port numbers, e.g., HTTP (port 80), HTTPS (port 443), SSH (port 22), SMTPS (port 465). Firewalls have other abilities, too. For example, they can also place limits on the number of attempts to connect.
As a side note, physical, bare metal servers may have multiple ethernet network interface cards (NICs). Each NIC would, of course, have its own MAC address, and therefore would be assigned a different IP address. Thus, at the link layer, incoming connections can be completely blocked on one card while outgoing connections are completely blocked on the other. This is a made up scenario; in practice, the firewall rules in place would be those that make sense for the person or organization creating them.
To control these types of connections, firewalls apply rules. A rule may block all incoming connections, but then allow SSH traffic through port 22, either via TCP or UDP, and then further restrict SSH connections to a specific IP range. And/or, another rule may block all incoming, unencrypted HTTP connections through port 80, but allow all incoming, encrypted HTTPS connections through port 443.
Let's briefly cover two ways to define firewall rules. When we set up our LAMP servers in the last part of this course, we'll need to implement some rules to allow outside connections to our server.
LAMP originally referred to Linux, Apache, MySQL, and PHP; these four technologies create a web server. Technically, only Linux (or some other OS) and Apache (or some other web server software) are needed to serve a website. PHP and MySQL provide additional functionality, like the ability for a website to interact with a relational database. The M in LAMP may also refer to MariaDB, which is a fully open source clone of MySQL. We'll use MariaDB later in this course.
First, our Google Cloud instance is pre-populated with default firewall rules at the network level, and the documentation provides an overview of these rules. Second, Ubuntu uses a firewall called ufw, which can be used to control additional connections at the operating system level. (Please read the documentation at those three links.)
It's important to know that these two firewalls provide protection at different traffic stops, so to speak. By that I mean, a Google Cloud firewall rule may allow SSH (port 22) traffic to a server instance, but if Ubuntu's ufw firewall blocks port 22 connections at the server level, then SSH traffic won't pass through. In other words, incoming connections must pass through the network firewall first, and then pass through the server firewall second. Outgoing connections must pass through the server firewall first, and then the network firewall second.
It's also important to know that Ubuntu's ufw firewall is disabled by default. In fact, it may be overkill to use both Google Cloud's firewall and Ubuntu's ufw, or it may not. It simply depends on our needs and our circumstances.
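As a preview, a minimal ufw session might look like the following. Note the order of operations: we allow SSH before enabling the firewall, or else we risk locking ourselves out of a remote server:

# Allow incoming SSH connections on port 22:
sudo ufw allow 22/tcp
# Turn the firewall on, then confirm the active rules:
sudo ufw enable
sudo ufw status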
We'll return to firewalls and put some rules into practice when we work on our LAMP setup.
Backups
Catastrophes (natural, physical, criminal, or out of negligence) happen, and as a systems administrator, you may be required to have backup strategies to mitigate data loss.
How you backup depends on the machine. If I am managing physical hardware, for instance, and I want to backup a physical disk to another physical disk, then that requires a specific tool. However, if I am managing virtual machines, like our Google Cloud instance, then that requires a different tool. Therefore, in this section, I will briefly cover both scenarios.
rsync
If we were managing bare metal machines, then we might use a program like rsync to back up physical disk drives. rsync is a powerful program. It can copy disks, directories, and files. It can copy files from one location and send the copies, encrypted, to a remote server.
For example, let's say I mount an external hard drive to my filesystem at /mnt/backup. To copy my home directory, I'd use:
rsync -av /home/me/ /mnt/backup/
where /home/me/ is the source directory, and /mnt/backup/ is the destination directory.
Syntax matters here. If I include the trailing slash on the source directory, then rsync will copy everything in /home/me/ to /mnt/backup/. However, if I leave the trailing slash off, like so:
rsync -av /home/me /mnt/backup/
then the result will be that the directory me/ will be copied to /mnt/backup/me/.
Let's see this in action. Say I have two directories. In the tmp1/ directory, there are two files: file1 and file2. The tmp2/ directory is empty. To copy file1 and file2 to tmp2, then:
ls tmp1/
file1 file2
rsync -av tmp1/ tmp2/
ls tmp2
file1 file2
However, if I leave that trailing slash off the source directory, then the tmp1/ will get copied to tmp2/:
ls tmp1
file1 file2
rsync -av tmp1 tmp2/
ls tmp2/
tmp1/
ls tmp2/tmp1/
file1 file2
rsync can also send a source directory to a directory on a remote server, and the directory and files being copied will be encrypted on the way. To do this, we use ssh style syntax:
rsync -av tmp1/ USER@REMOTE:~/tmp2/
For example:
rsync -av tmp1 linus@222.22.33.333:~/tmp2/
In fact, not only do I use rsync to back up my desktop computer to external hard drives, I also use a command like the above to copy local web projects to remote servers.
Delete Option
rsync has a --delete option. Adding this option means that rsync will synchronize the source directory with the destination directory. This means that if I had already created a backup of tmp1 to tmp2, and then later deleted file1 from tmp1, running rsync with the --delete option will also delete file1 from tmp2/. This is how that looks:
ls tmp1/
file1 file2
rsync -av tmp1/ tmp2/
ls tmp2/
file1 file2
rm tmp1/file1
ls tmp1/
file2
rsync -av --delete tmp1/ tmp2/
ls tmp2
file2
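Because the --delete option removes files, it's wise to preview a synchronization before running it for real. rsync's -n (or --dry-run) option reports what would be transferred or deleted without changing anything:

# Preview the sync; nothing is actually copied or deleted:
rsync -avn --delete tmp1/ tmp2/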
Backups are no good if we don't know how to restore a backup to a disk. To restore with rsync, we just swap the destination directory with the source directory:
rsync -av tmp2/ tmp1/
Google Cloud
Since our instance on Google Cloud is a virtual machine, we can use the Google Cloud console to create snapshots of our instance. A snapshot is a copy of a virtual machine at the time the snapshot was taken. What's great about taking a snapshot is that the result is basically a file of a complete operating system. Since it's a file, it can itself be used in other projects or used to restore a machine to the time the snapshot was taken.
Snapshots may also be used to document or reproduce others' work. For example, if I worked with programmers, as a systems administrator, I might help a programmer share snapshots of a virtual machine with other programmers. Those other programmers could then restore the snapshot in their own instances, and see and run the original work in the environment it was created in.
Taking snapshots in Google Cloud is very straightforward, but since it does take up extra storage, it will accrue extra costs. Since we want to avoid that for now, please see the following documentation for how to take a snapshot in Google Cloud:
Create and manage disk snapshots
Conclusion
In this section, we covered firewalls and backups. Since we're running an Ubuntu server on Google Cloud, we have Google Cloud options for creating firewall rules at the network level and for backing up disks as snapshots, and we have Ubuntu options for creating firewall rules at the OS level and for backing up disks using commands like rsync.
How we go about either depends entirely on our needs or on our organization's needs. But knowing that these options exist, and the different reasons why we have them, provides quite a bit of utility.
Creating a LAMP Server
In this section, we learn how to set up a LAMP (Linux, Apache, MariaDB, PHP) stack. This stack enables us to create a web server that provides extra functionality via PHP and MariaDB. Even if we do not become web server administrators, knowing how to set up a LAMP stack is not only fun, but it's also a valuable skill to have.
Installing the Apache Web Server
Introduction
Apache is an HTTP server, otherwise called web server software. Other HTTP server software exists. Another big one is nginx. An HTTP server essentially makes files on a computer available to others who are able to establish a connection to the computer and view the files with a web browser.
It's important to understand the basics of an HTTP server, and therefore I ask you to read Apache's Getting Started page before proceeding with the rest of this section. Each of the main sections on that page describes the important elements that make up and serve a website, including:
- clients, servers, and URLs
- hostnames and DNS
- configuration files and directives
- web site content
- log files and troubleshooting
Installation
Before we install Apache, we need to update our systems first.
sudo apt update
sudo apt -y upgrade
Once the machine is updated, we can install Apache2 using apt. First we'll use apt search to identify the specific package name. I already know that a lot of results will be returned, so let's pipe the apt search command through head to look at the initial results:
sudo apt search apache2 | head
The package that we're interested in happens to be named apache2 on Ubuntu. This is not a given. On other distributions, like Fedora, the Apache package is called httpd. To learn more about the apache2 package, let's examine it with the apt show command:
apt show apache2
Once we've confirmed that apache2 is the package that we want, we install it with the apt install command. Press Y to agree to continue after running the command below:
sudo apt install apache2
Basic checks
One of the things that makes Apache2, and some other web servers, powerful is the library of modules that extend Apache's functionality. We'll come back to modules soon. For now, we're going to make sure the server is up and running, configure some basic things, and then create a basic web site.
To start, let's use systemctl to acquire some info about apache2 and make sure it is enabled and running:
systemctl list-unit-files apache2.service
systemctl status apache2
The output of the first command shows that apache2 is enabled, which means that it will start running automatically if the computer gets rebooted. The output of the second command shows that apache2 is enabled and also active (running).
Creating a web page
Since apache2 is up and running, let's look at the default web page.
There are two ways we can look at the default web page. We can use a command line web browser. There are a number available, but I like w3m.
We can also use our regular web browsers and view the site by entering the IP address of the server in our browser URL bar.
To check with w3m, we have to install it first:
sudo apt install w3m
Once it's installed, we can visit our default site using the loopback IP address (aka, localhost). From the command line on our server, we can run either of these two commands:
w3m 127.0.0.1
w3m localhost
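If you prefer, curl offers another quick check from the command line (install it with sudo apt install curl if it's missing). Unlike w3m, it prints the raw HTML rather than rendering it:

# Fetch the default page and show the first few lines of HTML:
curl -s http://localhost/ | head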
We can also get the subnet/private IP address using the ip a command, and then use that with w3m. For example, if ip a showed that my NIC has an IP address of 10.0.1.1, then I could use w3m with that IP address:
w3m 10.0.1.1
If apache2 installed and started correctly, then you should see the following text at the top of the screen:
Apache2 Ubuntu Default Page
It works!
To exit w3m, press q and then y to confirm exit.
To view the default web page using a regular web browser, like Firefox, Chrome, Safari, Edge, etc., you need to get your server's public IP address. To do that, log into the Google Cloud Console. In the left hand navigation panel, hover your cursor over the Compute Engine link, and then click on VM instances. You should see your External IP address in the table on that page. You can copy that external IP address or simply click on it to open it in a new browser tab. Then you should see the graphical version of the Apache2 Ubuntu Default Page.
Please take a moment to read through the text on the default page. It provides important information about where Ubuntu stores configuration files, what those files do, and document roots, which is where website files go.
Create a Web Page
Let's create our first web page. The default page described above provides the location of the document root at /var/www/html. When we navigate to that location, we'll see that there is already an index.html file located in that directory. This is the Apache2 Ubuntu Default Page that we described above. Let's rename that index.html file, and create a new one:
cd /var/www/html/
sudo mv index.html index.html.original
sudo nano index.html
If you know HTML, then feel free to write some basic HTML code to get started. Otherwise, you can re-type the content below in nano, and then save and exit out.
<html>
<head>
<title>My first web page using Apache2</title>
</head>
<body>
<h1>Welcome</h1>
<p>Welcome to my web site. I created this site using the Apache2 HTTP server.</p>
</body>
</html>
If you have our site open in your web browser, reload the page, and you should see the new text.
You can still view the original default page by specifying its name in the URL. For example, if your external IP address is 55.222.55.222, then you'd specify it like so:
http://55.222.55.222/index.html.original
User Directories
You may have visited sites in the past that have a tilde in the URL and look like this:
http://example.com/~user/
These are called user directories, and they provide an additional document root located in users' home directories, in a directory called public_html. This is the default document root for user directories, but the default can be changed to a different location. Please read the documentation on what's called the Apache Module mod_userdir before proceeding.
By default, users with accounts on the server need to have a public_html directory in their home directories, and Apache2 needs to be configured to serve sites from those directories. For example, for the user linus, they should have the following file path available:
/home/linus/public_html/
Enable mod_userdir
The configuration file for mod_userdir is located in /etc/apache2/mods-available/ and is named userdir.conf. Files in this directory are modules that are available to Apache2 but that are not enabled (i.e., they're turned off) by default. We can view the userdir.conf file with the less command:
less /etc/apache2/mods-available/userdir.conf
The default configuration does not need to be modified. Therefore, all we need to do is enable this module. To do that, we use the a2enmod Apache2 command (see man a2enmod for details):
sudo a2enmod userdir
After enabling the module, we need to restart the HTTP service, and we can also check its status:
sudo systemctl restart apache2
systemctl status apache2
Create a User Directory Website
Let's say I am logged in as the user linus on the system and will use that account to test if the user directory is working. First, let's go home. For me, as the user linus, that would be /home/linus/, and I just have to type in the cd command and press Enter:
cd
Now I need to create a public_html directory in my home directory (make sure you're in your home directory!), and change into that directory:
mkdir public_html
cd public_html
By default, Apache2 looks for a file named index.html in the document root. I'll create that and add some basic HTML to it:
nano index.html
And in that file:
<html>
<head>
<title>My home site</title>
</head>
<body>
<p>This is my home site.</p>
</body>
</html>
Now simply add /~linus/ to your external IP address in your browser's URL bar. Like so (of course, replace the external IP address with your external IP address and the username with the username that you're using):
http://55.222.55.222/~linus/
Note that this process is pretty easy but that it will be different on other distributions. For example, the Fedora distribution has different Apache2 defaults. Also, on some distributions, we might need to change the directory permissions before this will work. By default, Ubuntu sets the permissions on our home directories to:
drwxr-xr-x
That means that any user can view the contents of our home directories. And Ubuntu sets directories created with mkdir in the home directory to these permissions by default:
drwxrwxr-x
These default settings make those directories world readable, but other distributions do not default to those permissions. If the last r-x were set to ---, then we would need to use the chmod command to make these directories executable and readable before files in our public_html directory could be accessed in a browser.
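On such systems, commands along these lines would open the needed paths. This is only a sketch; the exact modes depend on local policy, and you'd replace linus with the relevant username:

# Let other users traverse (but not list) the home directory:
chmod 711 /home/linus
# Let other users read and traverse the document root:
chmod 755 /home/linus/public_html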
Conclusion
In this section, we learned about the Apache2 HTTP server. We learned how to install it on Ubuntu, how to use systemd (systemctl) commands to check its default status, how to create a basic web page in /var/www/html, how to view that web page using the w3m command line browser and with our regular graphical browser, how to enable the user directory module, and how to repeat the steps above to create a website in our home directories.
In the next section, we will learn how to make our sites applications by installing PHP and enabling the relevant PHP modules.
Installing and Configuring PHP
Introduction
Client-side programming languages, like JavaScript, are handled by the browser. Major browsers like Firefox, Chrome, Safari, Edge, etc. all include JavaScript engines that use just-in-time compilers to execute the JavaScript code (Mozilla has a nice description of the process.) From an end user's perspective, you basically install JavaScript when you install a web browser.
PHP, on the other hand, is a server-side programming language, which means it must be installed on the server in order to be used. From a system or web administrator's perspective, this means that not only does PHP have to be installed on a server, but it must also be configured to work with the HTTP server, which in our case is Apache2.
The main use of PHP is to interact with databases, like MySQL, MariaDB, PostgreSQL, etc., in order to create dynamic page content. This is our goal in the last part of this class. To accomplish this, we have to:
- Install PHP and relevant Apache2 modules
- Configure PHP and relevant modules to work with Apache2
- Configure PHP and relevant modules to work with MariaDB
Install PHP
As usual, we will use apt install to install PHP and the relevant modules, and then restart Apache2 using the systemctl command:
sudo apt install php libapache2-mod-php
sudo systemctl restart apache2
We can check its status and see if there are any errors:
systemctl status apache2
Check Install
To check that it's been installed and that it's working with Apache2, we can create a small PHP file in our web document root. To do that, we change to the /var/www/html/ directory, and create a file called info.php:
cd /var/www/html/
sudo nano info.php
In that file, add the following text, then save and close the file:
<?php
phpinfo();
?>
Now visit that file using the external IP address for your server. For example, in Firefox, Chrome, etc., go to (be sure to replace the IP below with your IP address):
http://55.333.55.333/info.php
You should see a page that provides system information about PHP, Apache2, and the server. The top of the page should look like Figure 1 below:

Basic Configurations
By default, when Apache2 serves a web page, it looks for and serves a file titled index.html, even if it does not display that file name in the URL bar. Thus, http://example.com/ actually resolves to http://example.com/index.html in such cases.
However, if our plan is to provide for PHP, we want Apache2 to default to a file titled index.php instead and to the index.html file as backup. To configure that, we need to edit the dir.conf file in the /etc/apache2/mods-enabled/ directory. In that file there is a line that starts with DirectoryIndex. The first file in that line is index.html, and then there are a series of other files that Apache2 will look for in the order listed. If any of those files exist in the document root, then Apache2 will serve those before proceeding to the next. We simply want to put index.php first and let index.html be second on that line.
cd /etc/apache2/mods-enabled/
sudo nano dir.conf
And change the line to this:
DirectoryIndex index.php index.html index.cgi index.pl index.xhtml index.htm
Whenever we make a configuration change, we can use the apachectl command to check our configuration:
apachectl configtest
If we get a Syntax OK message, we can reload the Apache2 configuration and restart the service:
sudo systemctl reload apache2
sudo systemctl restart apache2
Now let's create a basic PHP page. cd back to the document root directory and use nano to create and open an index.php file:
cd /var/www/html/
sudo nano index.php
Creating an index.php File
Let's now create an index.php page and add some HTML and PHP to it. The PHP can be a simple browser detector. Change to the /var/www/html/ directory, and use sudo nano to create and edit index.php. Then add the following code:
<html>
<head>
<title>Browser Detector</title>
</head>
<body>
<p>You are using the following browser to view this site:</p>
<?php
echo $_SERVER['HTTP_USER_AGENT'] . "\n\n";
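// Note: get_browser() relies on PHP's browscap setting pointing
// to a browscap.ini file; if that's not configured on your server,
// this call returns false and the capability list will be empty.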
$browser = get_browser(null, true);
print_r($browser);
?>
</body>
</html>
Next, save the file and exit nano. In your browser, visit your external IP address site:
http://55.333.55.333/
Although your index.html file still exists in your document root, Apache2 now returns the index.php file instead. However, if for some reason the index.php was deleted, then Apache2 would revert to the index.html file since that's what's next in the dir.conf DirectoryIndex line.
Conclusion
In this section, we installed PHP and configured it to work with Apache2. We also created a simple PHP test page that reported our browser user agent information on our website.
In the next section, we'll learn how to complete the LAMP stack by adding the MariaDB relational database to our setup.
Installing and Configuring MariaDB
Introduction
We started our LAMP stack when we installed Apache2 on Linux, and then we added extra functionality when we installed and configured PHP to work with Apache2. In this section, our objective is to complete the LAMP stack and install and configure MariaDB, a (so-far) compatible fork of the MySQL relational database.
If you need a refresher on relational databases, the MariaDB website can help. See: Introduction to Relational Databases.
It's also good to review the documentation for any technology that you use. MariaDB has good documentation and getting started pages.
Install and Set Up MariaDB
In this section, we'll learn how to install, setup, secure, and configure the MariaDB relational database so that it works with the Apache2 web server and the PHP programming language.
First, let's install MariaDB Community Server, and then log into the MariaDB shell under the MariaDB root account.
sudo apt install mariadb-server mariadb-client
This should also start and enable the database server, but we can check if it's running and enabled using the systemctl command:
systemctl status mariadb
Next we need to run a post installation script called mysql_secure_installation (that's not a typo) that sets up the MariaDB root password and performs some security checks. To do that, run the following command, and be sure to save the MariaDB root password you create:
sudo mysql_secure_installation
Again, here is where you create a root password for the MariaDB database server. Be sure to save that and not forget it! When you run the above script, you'll get a series of prompts to respond to like below. Press enter for the first prompt, press Y for the prompts marked Y, and input your own password. Since this server is exposed to the internet, be sure to use a complex password.
Enter the current password for root (enter for none):
Set root password: Y
New Password: XXXXXXXXX
Re-enter new password: XXXXXXXXX
Remove anonymous users: Y
Disallow root login remotely: Y
Remove test database and access to it: Y
Reload privilege tables now: Y
We can login to the database to test it. In order to do so, we have to become the root Linux user, which we can do with the following command:
sudo su
Note: we need to generally be careful when we enter commands on the command line, because it's a largely unforgiving computing environment. But we need to be especially careful when we are logged in as the Linux root user. This user can delete anything, including files that the system needs in order to boot and operate.
After we are root, we can login to MariaDB, run the show databases; command, and then exit with the \q command:
root@hostname:~# mariadb -u root
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 47
Server version: 10.3.34-MariaDB-0ubuntu0.20.04.1 Ubuntu 20.04
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| mysql |
| performance_schema |
+--------------------+
3 rows in set (0.002 sec)
Note: If we are logging into the root database account as the root Linux user, we don't need to enter our password.
Create and Set Up a Regular User Account
We need to reserve the root MariaDB user for special use cases and instead create a regular MariaDB user, or more than one MariaDB user, as needed.
To create a regular MariaDB user, we use the create user command. In the command below, I'll create a new user called webapp with a complex password within the single quotes at the end (marked with a series of Xs here for demo purposes):
MariaDB [(none)]> create user 'webapp'@'localhost' identified by 'XXXXXXXXX';
If the prompt returns a Query OK message, then the new user should have been created without any issues.
Create a Practice Database
As the root database user, let's create a new database for a regular, new user.
The regular user will be granted all privileges on the new database, including all its tables. Other than granting all privileges, we could limit the user to specific privileges, including: CREATE, DROP, DELETE, INSERT, SELECT, UPDATE, and GRANT OPTION. Such privileges may be called operations or functions, and they allow MariaDB users to use and modify the databases, where appropriate. For example, we may want to limit the webapp user to SELECT commands only (see the example after the next code block). It totally depends on the purpose of the database and our security risks.
MariaDB [(none)]> create database linuxdb;
MariaDB [(none)]> grant all privileges on linuxdb.* to 'webapp'@'localhost';
MariaDB [(none)]> show databases;
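As an example of the more restrictive approach mentioned above, here is how we might grant the webapp user SELECT privileges only, plus a statement to review a user's current privileges. This is an alternative sketch, not part of our setup, so don't run the grant if you've already granted all privileges above:
MariaDB [(none)]> grant select on linuxdb.* to 'webapp'@'localhost';
MariaDB [(none)]> show grants for 'webapp'@'localhost';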
Exit the MariaDB prompt as the root MariaDB user, and then exit out of the root Linux user account, and you should be back in your normal Linux user account:
MariaDB [(none)]> \q
root@hostname:~# exit
Note: relational database keywords are often written in all capital letters. As far as I know, this is simply a convention to make the code more readable. However, in most cases I'll write the keywords in lower case letters. This is simply because, by convention, I'm super lazy.
Logging in as Regular User and Creating Tables
Now we can start doing some MariaDB work. As a reminder, we've created a new MariaDB user named webapp and a new database for webapp called linuxdb. When we run the show databases command as the webapp user, we should see the linuxdb database (plus information_schema, which MariaDB exposes to all users) but not the other system databases.
Note below that I use the -p option. This instructs MariaDB to request the password for the webapp user, which is required to log in.
mariadb -u webapp -p
MariaDB [(none)]> show databases;
MariaDB [(none)]> use linuxdb;
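For the webapp user, the output of the show databases command should look roughly like this (information_schema is a virtual database that MariaDB makes visible to every account):
+--------------------+
| Database           |
+--------------------+
| information_schema |
| linuxdb            |
+--------------------+
2 rows in set (0.000 sec)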
A database is not worth much without data. In the following code, I create and define a new table for our linuxdb database. The table will be called distributions, and it will contain data about various Linux distributions (name of distribution, distribution developer, and founding date).
MariaDB [linuxdb]> create table distributions
-> (
-> id int unsigned not null auto_increment,
-> name varchar(150) not null,
-> developer varchar(150) not null,
-> founded date not null,
-> primary key (id)
-> );
Query OK, 0 rows affected (0.07 sec)
MariaDB [linuxdb]> show tables;
MariaDB [linuxdb]> describe distributions;
Congratulations! Now create some records for that table.
Adding Records into the Table
We can populate our linuxdb database with some data. We'll use the insert command to add our records to our distributions table:
MariaDB [linuxdb]> insert into distributions (name, developer, founded) values
-> ('Debian', 'The Debian Project', '1993-09-15'),
-> ('Ubuntu', 'Canonical Ltd.', '2004-10-20'),
-> ('Fedora', 'Fedora Project', '2003-11-06');
Query OK, 3 rows affected (0.004 sec)
Records: 3 Duplicates: 0 Warnings: 0
MariaDB [linuxdb]> select * from distributions;
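If the insert succeeded, that last select statement should return something like the following (your timing will differ):
+----+--------+--------------------+------------+
| id | name   | developer          | founded    |
+----+--------+--------------------+------------+
|  1 | Debian | The Debian Project | 1993-09-15 |
|  2 | Ubuntu | Canonical Ltd.     | 2004-10-20 |
|  3 | Fedora | Fedora Project     | 2003-11-06 |
+----+--------+--------------------+------------+
3 rows in set (0.001 sec)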
Success! Now let's test our table.
Testing Commands
We will complete the following tasks to refresh our MySQL/MariaDB knowledge:
- retrieve some records or parts of records,
- delete a record,
- alter the table structure so that it will hold more data, and
- add a record:
MariaDB [linuxdb]> select name from distributions;
MariaDB [linuxdb]> select founded from distributions;
MariaDB [linuxdb]> select name, developer from distributions;
MariaDB [linuxdb]> select name from distributions where name='Debian';
MariaDB [linuxdb]> select developer from distributions where name='Ubuntu';
MariaDB [linuxdb]> select * from distributions;
MariaDB [linuxdb]> alter table distributions
-> add packagemanager char(3) after name;
MariaDB [linuxdb]> describe distributions;
MariaDB [linuxdb]> update distributions set packagemanager='APT' where id='1';
MariaDB [linuxdb]> update distributions set packagemanager='APT' where id='2';
MariaDB [linuxdb]> update distributions set packagemanager='DNF' where id='3';
MariaDB [linuxdb]> select * from distributions;
MariaDB [linuxdb]> delete from distributions where name='Debian';
MariaDB [linuxdb]> insert into distributions
-> (name, packagemanager, developer, founded) values
-> ('Debian', 'APT', 'The Debian Project', '1993-09-15'),
-> ('CentOS', 'YUM', 'The CentOS Project', '2004-05-14');
MariaDB [linuxdb]> select * from distributions;
MariaDB [linuxdb]> select name, packagemanager
-> from distributions
-> where founded < '2004-01-01';
MariaDB [linuxdb]> select name from distributions order by founded;
MariaDB [linuxdb]> \q
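As an aside, if you ever want to wipe the practice data and start over, the drop command removes tables or entire databases. These statements are destructive and irreversible, so I'm showing them only for reference; don't run them now, since we'll use this database in the next section:
MariaDB [linuxdb]> drop table distributions;
MariaDB [linuxdb]> drop database linuxdb;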
Install PHP and MySQL Support
The next goal is to complete the connection between PHP and MariaDB so that we can use both for our websites.
First, install PHP support for MariaDB. The php-mysql package pulls in the modules (such as mysqli and pdo_mysql) that allow PHP to talk to a MySQL or MariaDB server; we'll use the mysqli functions in the scripts below.
sudo apt install php-mysql
And then restart Apache2 and MariaDB:
sudo systemctl restart apache2
sudo systemctl restart mariadb
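Before moving on, we can confirm that PHP's MySQL-related modules were actually installed and loaded. The php -m command lists PHP's modules, and we can filter that list with grep; on a typical php-mysql install you should see names like mysqli, mysqlnd, and pdo_mysql, though the exact list can vary by PHP version:
php -m | grep -i mysql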
Create PHP Scripts
In order for PHP to connect to MariaDB, it needs to authenticate itself. To do that, we will create a login.php file in /var/www/html. We also need to change the group ownership of the file and its permissions so that the file can be read by the Apache2 web server but not by the world, since this file will store password information.
cd /var/www/html/
sudo touch login.php
sudo chmod 640 login.php
sudo chown :www-data login.php
ls -l login.php
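The ls -l output should look roughly like the line below (your file size and date will differ). The important parts are the rw-r----- permissions, which correspond to 640, and the www-data group:
-rw-r----- 1 root www-data 0 Aug 12 14:00 login.php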
sudo nano login.php
In the file, add the following credentials. If you used a different database name than linuxdb and a different username than webapp, then you need to substitute your names below. You need to use your own password where I have the Xs:
<?php // login.php
$db_hostname = "localhost";
$db_database = "linuxdb";
$db_username = "webapp";
$db_password = "XXXXXXXXX";
?>
Next we create a new PHP file for our website. This file will display HTML but will primarily be PHP interacting with our MariaDB distributions database.
Create a file titled distros.php.
sudo nano distros.php
Then copy over the following text (I suggest you transcribe it, especially if you're interested in learning a bit of PHP, but you can simply copy and paste it into the nano buffer):
<html>
<head>
<title>MySQL Server Example</title>
</head>
<body>
<?php
// Load MySQL credentials
require_once 'login.php';
// Establish connection
$conn = mysqli_connect($db_hostname, $db_username, $db_password) or
die("Unable to connect");
// Open database
mysqli_select_db($conn, $db_database) or
die("Could not open database '$db_database'");
// QUERY 1
$query1 = "show tables from $db_database";
$result1 = mysqli_query($conn, $query1);
$tblcnt = 0;
while($tbl = mysqli_fetch_array($result1)) {
$tblcnt++;
}
if (!$tblcnt) {
echo "<p>There are no tables</p>\n";
}
else {
echo "<p>There are $tblcnt tables</p>\n";
}
// Free result1 set
mysqli_free_result($result1);
// QUERY 2
$query2 = "select name, developer from distributions";
$result2 = mysqli_query($conn, $query2);
$row = mysqli_fetch_array($result2, MYSQLI_NUM);
printf ("%s (%s)\n", $row[0], $row[1]);
echo "<br/>";
$row = mysqli_fetch_array($result2, MYSQLI_ASSOC);
printf ("%s (%s)\n", $row["name"], $row["developer"]);
// Free result2 set
mysqli_free_result($result2);
// QUERY 3
$query3 = "select * from distributions";
$result3 = mysqli_query($conn, $query3);
while($row = $result3->fetch_assoc()) {
echo "<p>Owner " . $row["developer"] . " manages distribution " . $row["name"] . ".</p>";
}
// Free result3 set
mysqli_free_result($result3);
// QUERY 4: rerun query 3 so we can loop over the results a second time
$result4 = mysqli_query($conn, $query3);
while($row = $result4->fetch_assoc()) {
echo "<p>Distribution " . $row["name"] . " was released on " . $row["founded"] . ".</p>";
}
// Free result4 set
mysqli_free_result($result4);
/* Close connection */
mysqli_close($conn);
?>
</body>
</html>
Save the file and exit out of nano.
Test Syntax
After you save the file and exit the text editor, we need to test the PHP syntax. If there are any errors in our PHP, these commands will show the line numbers that are causing errors or leading up to them. If all is well with the first command, nothing will be output. If all is well with the second command, the HTML should be output:
sudo php -f login.php
sudo php -f distros.php
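Once both checks pass, you can view the page in a web browser by visiting your server's IP address or domain name followed by the file name. The address below is just a placeholder; substitute your own:
http://your_server_ip/distros.php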
Conclusion
Congratulations! If you've made it this far, you have successfully created a LAMP stack. In the process, you have learned how to install and set up MariaDB, how to create MariaDB root and regular user accounts, how to create a test database with play data for practicing, and how to connect all of this with PHP for display on a webpage.
In regular applications of these technologies, there's a lot more involved, but completing the above process is a great start to learning more.
Conclusion
I consider this book to be a live document. Perhaps, then, this is version 0.8, or something like that. In any case, it will be continually updated throughout the year but probably more often before and during the fall semesters when I teach my Linux Systems Administration course.
This book is in no way meant to provide a comprehensive overview of systems administration or of Linux. It's meant to act as a starting point for those interested in systems administration, and it's meant to get students, many of whom grew up using only graphical user interfaces, familiar with command line environments. In that respect, this book and the course I teach are aimed at empowering students to know their technology and to become comfortable and more experienced with it, especially the behind-the-scenes stuff. That said, I'm proud that some of my students have gone on to become systems administrators. Other courses in our program, along with their own work and internships, have probably contributed more to that outcome, but I know this course has been a factor.
If you're not a student in our program but have stumbled upon this book, I hope it's helpful to you, too. This is, in fact, why I've made it available on my website and not simply dropped it in my course shell.
C. Sean Burns, PhD
August 13, 2022