Introduction to Semantic Web Development

Author: C. Sean Burns
Date: 2024-07-26
Email: sean.burns@uky.edu
Website: cseanburns.net
GitHub: @cseanburns

Introduction

This book will serve as an introduction to semantic web development. A more advanced semantic web development course would introduce students to RDFS, SKOS, OWL, SPARQL, and more. But this book is written for undergraduate students who are new to any kind of web development, and it will therefore focus on more entry-level topics: HTML5 and its semantic elements, CSS3, and JSON-LD with schema.org (and perhaps other vocabularies).

This book is a work in progress. The goal is to complete an initial draft by January 2025 and then polish it as I teach my spring course on semantic web development.

The Semantic Web

The Semantic Web is an extension of the current web. Whereas the current web was designed for people to read, the semantic web is designed to help machines process, interpret, and reuse data. That is, the goal of the semantic web is to allow machines (AI, etc.) to understand the words on a webpage, with respect to however machines understand anything. The outcome of this goal is a web of data that can be more easily processed by machines, including AI, and that allows for enhanced data sharing, integration, and reuse across applications.

The vision of the semantic web originates with Tim Berners-Lee, the inventor of the web. He described the comparison between the web that we know and the semantic web as early as 2000:

"If you think of the web today as turning all the documents in the world into one big book, then think of the Semantic Web as turning all the data into one big database ..." (Berners-Lee et al., 2000).

There are two broad ways to study the semantic web. The more advanced way is to learn how to use semantic web technologies such as RDFS, SKOS, OWL, and SPARQL.

However, the above technologies focus more on underlying data models, and on querying those models, than on website development. If we want to focus on web development, in the sense of creating actual websites, then the semantic web technologies we are interested in are a bit different. Specifically, in this book, we will learn about:

  • HTML5, which includes semantic elements.
  • CSS3, which plays an important role in separating content from presentation.
  • JSON-LD (JavaScript Object Notation for Linked Data), which is actually JavaScript agnostic and is used to create linked data for machines to parse and generate.
  • Schema.org, which provides a collection of shared vocabularies (or terms and definitions) to annotate web content.

The Use of The Semantic Web

There are several important reasons to make data semantic on the web.

  1. First, semantic data enhances searchability. Recipes are a common example. When you search for a recipe, semantic data on recipe web pages helps search engines understand the ingredients, cooking times, and other factors.
  2. Second, the semantic web enables data integration. This works through shared vocabularies: Schema.org is one, and other vocabularies can be used alongside it at the same time. This is a common use case in more advanced applications of the semantic web. For example, scientific databases can be enhanced with semantic data that integrates data from multiple scientific domains.
  3. Third, the semantic web helps machines process data better and thus begin to automate complex tasks. Personal assistants like Siri and Google Assistant rely on semantic data.
  4. Fourth, semantic data fosters interoperability between systems and applications. For example, a weather application can integrate data from multiple weather stations in order to provide a comprehensive forecast.
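To make the recipe example above more concrete, here is a minimal sketch of what semantic data on a recipe page might look like, using JSON-LD with the Schema.org Recipe type. The recipe name, times, and ingredients are invented for illustration, and a real page would likely describe more properties.

```html
<!-- Hypothetical recipe data: the values below are invented for illustration. -->
<!-- This script block can be placed in the <head> or <body> of a recipe page. -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Recipe",
  "name": "Simple Pancakes",
  "prepTime": "PT10M",
  "cookTime": "PT15M",
  "recipeIngredient": [
    "2 cups flour",
    "2 eggs",
    "1.5 cups milk"
  ]
}
</script>
```

The times use the ISO 8601 duration format (PT10M means ten minutes), which gives a machine an unambiguous value rather than a phrase it has to guess at.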

Our Goals

In this book, we will gain hands-on experience with HTML5, CSS3, and JSON-LD. These technologies will help you apply semantic data to websites and make your websites not only more searchable but also more accessible. Our tasks will include:

  1. We will learn how to structure web pages using HTML5 semantic elements. This will make our web content more meaningful and accessible to users and machines.
  2. We will learn how to style and present our web pages using CSS3. Cascading Style Sheets (CSS) help us create visually appealing webpages and sites but, importantly, keep content separated from presentation.
  3. We will embed structured, linked data into our webpages using JSON-LD and Schema.org. This will further enhance our content for machine-readability.
  4. Finally, throughout this book, we will work on creating a final website using all of these technologies.

Conclusion

While you may not have heard of The Semantic Web, it is an important part of the evolution of the web and internet. When we make our data (content) machine-readable, we add context and meaning to that data for machines to use. This opens up possibilities for data integration, searchability, and automation.

Tools, Setup, and Workflow

Install Software for Web Development

Web developers, and other programmers, rely on a suite of applications to conduct their work. Thus, in order to learn how to become web developers, we will need to learn how to use some of these applications.

To learn to use these applications, we need to install them. Hence, our first task this week is to install the software we need to start web development. For this course, we will use the following tools:

  • Text editor
  • Vector graphics editor
  • Version control software

The tricky part about this week is that I cannot show you how to install this software since the installation process depends on the operating system you use. But whether you use Windows, macOS, or Linux (as I do), you should already know the basics of installing software. And fortunately, once these tools are up and running on your systems, we'll all be in sync, and what you'll see on my screen going forward will be what you see on your screen.

Text Editor

Let's start with the most important application: a text editor. Text editors are the bread and butter for programmers, and there's a long history, and even funny cultural wars, about text editors. Personally, I use the Vim editor, which is a command line editor. Vim can be difficult to learn, and we don't have the time to spend on learning how to use an advanced text editor like Vim (even though it's worth it). Therefore, for this course we can use a GUI (graphical user interface) text editor.

We use a text editor, and not a word processor like Microsoft Word or Google Docs, because when we write any kind of code, the code needs to be saved as plain text. A word processor saves a file in its own document format, which wraps the text in formatting information that we do not see or control. A text editor saves only the characters we type, which lets us mark up the text directly in the file, e.g., by using HTML and CSS. Thus text editors are more powerful in that way.

Also, text editors these days offer a number of functions that are designed to help us write better programs. And fortunately, many are free and open source software.

For this course, you will use the free VS Code text editor. We'll learn more about how to use VS Code as we progress through the course, but first you need to install it. You are welcome to use another text editor, especially if you are used to working with one and know how to use it. But if you are new to this tool, then I'd encourage you to use what I'll use for this course. Plus, all my text editor examples and demos will involve VS Code.

Download and install VS Code from:

Vector Graphics Editor

When we begin a web development project, it is a pretty bad idea to just sit down and start coding a website without first thinking about its architecture, how it looks, its design, what it contains, who the audience is or are, and so forth. The good idea is to start with a plan.

The same is true for any professional who designs anything in this world, like an architect. Imagine having the money to build your own home and then hiring a builder who goes out and just starts assembling a bunch of lumber and pipes and wires with only a vague idea of what they want to accomplish. That would be foolish as well as a waste of money and time. The same is true for any profession that builds or develops anything, and if you want to build a website, then you should start with a plan.

For this course, for now I mainly want you to think about how the website you will build will look on a desktop/laptop browser and also on mobile. We could easily hand draw this, but in some settings, you will want to share your plans with others, and thus it makes more sense to use a drawing tool in order to share native, digital files with your colleagues or customers.

Enter Inkscape, a vector graphics editor. Vector graphics editors are often used to design things like logos because, unlike images made in raster-based editors like Photoshop or GIMP, vector graphics scale to any size and still maintain quality. Adobe Illustrator is a commonly used vector graphics editor, but Inkscape is as advanced as Illustrator and is also free and open source software.

If you have and are comfortable using a vector graphics editor, like Illustrator, you are free to use that, but for this course, I will use and demonstrate Inkscape.

Download and install Inkscape:

Version Control

Finally, one of the other most important tools for actual web development (and for any kind of programming) is version control. Version control is about project management, such as keeping track of our work and the history of our work, and collaboration, such as working with other developers. A number of version control software systems are available, but one of the most popular ones is Git, and we'll use that in this course.

Version control systems often require a version control repository, which is used to send, store, and share code and other work. We could set up our own Git repository, but for this course, we will use GitHub.

I'll show you the basics of using Git and GitHub in this course, but for now, we need to download Git and also create an account on GitHub, if you don't already have one.

First, create an account on GitHub. Use a personal email address and not your UK one when setting up the account:

Second, download and install Git per the instructions for your operating system:

Install Git

WINDOWS USERS

For Windows users, I advise you to follow the instructions in this video and on this page:

Git on Windows

NOTE: Two modifications to the instructions at the above link.

First, in addition to running the following command in the video:

git config --global user.name <github_username>

Also run the following command. Use the email address that you used to sign up on GitHub with:

git config --global user.email <email_address>

Second, in the example video above, the narrator uses the branch "master" to push her commit to GitHub. You will not use that. Instead, you will use the branch "main". So your command will look like:

git push origin main

MACOS USERS

Create an account on GitHub, and then follow the instructions in the macOS section here to install Git. After that, follow the instructions in the Windows video above to create a repo on GitHub that syncs to a local folder on your computer. Be sure to run the git config commands above, too, before creating your local Git folder.

Git, GitHub, and VS Code

Git is a free and open source version control system. It is developed and maintained by a community of software developers but, like the Linux kernel, was originally written by Linus Torvalds. Since its creation in 2005, it has become the most widely used version control system for software development and other kinds of coding work.

The basics of Git are fairly straightforward but can become reasonably complex. It is used for single person or simple projects and multi-person or complex projects. In cases involving multi-person and/or complex projects, Git can be difficult to use (or grasp) because big software projects, with multiple contributors, perhaps from multiple organizations, are complicated and may involve complicated work flows. Our goals for this class are not that ambitious, and so our use of Git will be more straightforward and introductory.

Whether straightforward or complex, it is important to get a handle on the basics of Git. Once you have the basics down, it becomes much easier to use Git in more advanced ways. We'll cover the basics in a moment, and we'll keep practicing the basics throughout the remainder of the course.

It is important to recognize that GitHub is not Git. GitHub is a hosting site for projects that use the Git version control system. It does not, however, merely host (store) code and other text that is managed with Git. It also builds on Git by integrating it into its web platform and adding features of its own. There are other Git hosting services, such as GitLab, and self-hosted options, such as Gitea. Any server, e.g., a Linux one, can also serve as a personal or private hosting site.

VS Code is a text editor developed and maintained by Microsoft, which owns GitHub. Perhaps because of this ownership, as well as the popularity of Git and GitHub, Git and GitHub are integrated into the VS Code text editor. VS Code users can augment the text editor with a variety of community created extensions and themes, and you're welcome to explore these for your personal use. Many other text editors can be augmented in this way. Personally, my main text editor is Vim, which is also highly customizable, but for the purposes of this class, I will use VS Code, which is easier to learn.

Git Basics

Repos

The first Git concept to learn is the repository concept. Git uses two kinds of repositories:

  • local repository (repo)
  • remote repository (repo)

The local repo is a project directory (or folder) on your computer. Henceforth I will use the term directory and not folder since the former term is more commonly used in tech fields. The project directory contains all the project files and any sub-directories for the project.

The remote repo is where we send, retrieve, or sync the files and directories that are contained in the local repo. We can also retrieve projects from other repos that other people or organizations have created, if those repos are public.

With Git and GitHub, we can start a project on the local system (i.e., our computers) or start a project by creating a remote repo on GitHub and then copying it to our local system.

Branches

The second Git concept to learn is:

  • branches

When you configure a directory on your local system to become a Git project, you create a default branch for your project. For small projects, we might only work in the default branch. The default branch will either be named main or master.

However, since Git is a version control system, we can create additional branches to test or work on different components of our projects without messing with the main branch. For large or complex projects, we would definitely work and switch among different branches. A large project might be a big website, a software application, or even an operating system. Working in non-main branches, like a testing branch, allows us to develop components of our project without interfering with the main branch, which might be at a stable version of our project. And then when we are ready, we can merge a testing branch with our main branch, and thus create a new version, or we can delete the testing branch if we don't want to use it.

We will primarily work with the default, main branch with our projects, but you should read the Git documentation on branches.

Important note: If we create a new repository on our local machines using Git, the default branch will be called master, unless Git has been configured otherwise. We will reconfigure this, though. However, if we create a new repository on GitHub, the default branch will be called main. (Git itself is expected to switch to main as its default branch name in a future release.)

There is a long history of using terms like master and slave in various technologies, and the technology industry is beginning to come to terms with this and to use more inclusive terms. You can read more about the reasons here:

Gitting Started

Git Configuration

Although you have already installed and configured Git on your local machines, I would like to rehash what we have done.

Before using Git to work with local and remote repos, we configure our local operating system to use Git. We do that by giving Git our name and email address plus some other details. Here we need to be sure to use the same email address that we used to create our accounts on GitHub.

To get started, open a command shell or terminal on your computer (e.g., CMD.exe on Windows or Terminal.app on macOS) and run the following commands. Note the quotes around the name but not around the email address. Use YOUR NAME AND YOUR EMAIL ADDRESS.

git config --global user.name "Your Name"
git config --global user.email youremail@example.com

Here are some new configurations to make. We can configure Git to use VS Code as our default Git editor and to use main as our default branch name. Run these two commands as-is, but if you are using a different text editor, then be sure to look up the appropriate command for that editor:

git config --global core.editor "code --wait"
git config --global init.defaultBranch main

For additional details, see the Git documentation on getting started:

Git Fork and Clone

Once we have configured Git to use our information, we can start coding. To help us begin to work on our websites, I have created a repository that we will all use to start. The remote repo is located on my GitHub account and is called cseanburns/web2023, and it contains a basic website template that you can use for your websites. The default index.html file will be the homepage for our site. (For those of you who took my Linux Systems Administration course, you should know why that file is the default home page.)

You are going to fork that repo to your GitHub account, and then clone the fork to your local system. We only have to do this one time, but this will allow you to use my template for yourselves. Going forward, we will use other Git commands. You can do all of this within VS Code.

As an example, in the steps that follow, I am going to fork and clone the octocat/Spoon-Knife repo. This is an example repo used to teach people how to fork on GitHub. You don't have to fork and clone this repo, but you should use my demo of it as a guide to fork and clone my cseanburns/web2023 repo.

Be sure to configure Git on your system first, as described above. After that, follow these steps.

Steps to Fork, Clone and then Modify a Repo:

  1. Visit the Spoon-Knife repo, fork it to your GitHub account, and copy the URL of your fork.
  2. In VS Code, click on the Source Control icon, click on Clone Repository, and paste the URL when prompted.
  3. In the pop-up box, respond Open to the question, "Would you like to open the cloned repository?"
  4. Click on index.html to open it, make a change, and save the file.
  5. At the bottom of the screen, click on the Synchronize icon and at the prompt, click on Ok.
  6. Click on the Source Control icon again, and then add a Message and click on Commit.
  7. At the Stage message, click on Yes to stage the changes and commit them directly.
  8. Then click on Sync Changes and click Ok to push.

Once you've completed the steps for the cseanburns/web2023 repository, you should have the web2023 repository on your computer. This is the local repo for that remote repo.

Going forward, we will use VS Code to:

  1. Edit and write HTML, CSS, JSON-LD in our local repo.
  2. Save the edits and new code.
  3. Stage the changes so that Git tracks the new changes.
  4. Commit the changes with a meaningful commit message.
  5. And push the changes to the remote repo.

For future reference, here's a nice cheat sheet of Git commands. Most of these commands are to be used from the command line (Windows, macOS, or Linux), and so if we explore any command line usage of Git, these will be good to have on hand.

User Experience and Accessibility

Accessibility

Note: Readings for this course are linked to in these transcripts. It is important to visit and review the information that is provided at these links.

Hi Class, this week we're studying general web accessibility and how web semantics helps make sites more accessible. I use the term accessibility as a kind of shorthand: I also include, broadly, things like usability and inclusion.

One of our readings makes important distinctions among these terms. That is, accessibility has a distinct definition, as do usability and inclusion. Our reading from w3.org defines accessibility as that which:

addresses discriminatory aspects related to equivalent user experience for people with disabilities, including people with age-related impairments. For the web, accessibility means that people with disabilities can perceive, understand, navigate, and interact with websites and tools, and that they can contribute equally without barriers.

And then usability is related to user experience design or UX design. Usability is:

about designing products to be effective, efficient, and satisfying. Specifically, ISO (International Organization for Standardization) defines usability as the "extent to which a product can be used by specified users to achieve specified goals effectively, efficiently and with satisfaction in a specified context of use"

Finally, inclusion is referred to as:

Inclusive design, universal design, and design for all involves designing products, such as websites, to be usable by everyone to the greatest extent possible, without the need for adaptation. Inclusion addresses a broad range of issues including access to and quality of hardware, software, and Internet connectivity; computer literacy and skills; economic situation; education; geographic location; and language — as well as age and disability.

Things related to inclusion are pretty broadly defined. This can be related to the responsiveness of a website to various displays, such as mobile phones, and then to the whole range of mobile phones that exist, as well as to desktops, laptops, and tablets, and even to smart watches.

In class, we're going to discuss these issues. You're going to identify some websites and rank and judge these sites according to the principles of accessibility, usability, and inclusion by enabling accessibility mode on your laptop/desktop browser and on your phones.

It will help to watch a short video about accessibility on YouTube, which does a really nice job demonstrating all the above issues.

Semantics and Accessibility

Semantics refers to meaning, or what something means. Humans are pretty good (most of the time) at interpreting meaning, but computers are not. For example, you and I know what a person is, but a computer does not. Yet despite that, if you search for a person on the web, you will likely get accurate results back. This works because information retrieval algorithms use other methods to identify what is relevant based on what we search.

Thus when we talk about semantics and accessibility, we talk about how to make it so that the structure and the content on our websites can be understood by computers. The result of this is better information search and information retrieval, via search engines and artificial intelligence assistants.

It makes sense then that the meaning of a thing relates to its accessibility. Therefore, one way that we will address accessible web development is through HTML5, which introduced a set of semantic HTML elements. These elements help computer systems (like search engines and screen readers) interpret the meaning of a document's structure better than prior versions of HTML could, and in turn they provide better accessibility options to people. Later we'll learn and apply JSON-LD to provide meaning to the content on a page, so that a search technology can have a better understanding of what a webpage is about.

Do More with Less

There is another aspect of HTML and accessibility that we need to consider. You can think of HTML as the least common denominator among the languages that we use to develop websites. That is, at its very basic, a web page can be nothing more than an HTML page since HTML offers elements that are used to structure and provide content to a website. All other scripting additions to a web page primarily work with HTML.

This also means that sometimes the things you can do in JavaScript, PHP, Python, Ruby, etc. can be done with HTML alone. One of the goals in meeting accessibility, usability, and inclusion issues is to take advantage of the least common denominator aspect of HTML as much as possible. By this I mean that if you have a choice between a JavaScript (or PHP, Ruby, etc.) solution that will perform some task, and you can do the same thing in HTML, then do it in HTML. The more complicated and sophisticated we make things, the more likely it becomes that we will break something, and that in turn is likely to make a site less accessible, usable, and inclusive.
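As a small illustration of doing more with less, a section that expands and collapses is often built with JavaScript, but the HTML <details> and <summary> elements provide that behavior on their own. The sketch below is only an example; the label and the office hours are invented.

```html
<!-- A disclosure widget with no JavaScript: the browser handles open/close. -->
<!-- The label and hours below are invented for illustration. -->
<details>
  <summary>Office hours</summary>
  <p>Tuesdays and Thursdays, 2:00 to 3:00 PM.</p>
</details>
```

Because the browser implements the behavior itself, keyboard users and assistive technologies can work with it without any extra scripting.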

The Semantic Structure

I mentioned this above, but we'll talk about semantics in two ways. When we start creating our sites, we will use HTML5 to add semantics to a web page, and later we will use JSON-LD to add additional semantics that describe the content on a webpage. Some example semantic HTML5 elements include:

  • <article>
  • <aside>
  • <figure>
  • <footer>
  • <header>
  • <nav>
  • <section>
  • <summary>
  • <time>

Below are links to two lists of HTML5 elements. You should begin to review them and save or bookmark these pages for constant reference:

Attempts to provide semantic data were made in previous versions of HTML, like XHTML and HTML4, and through hacks that were not all that ideal, such as excessive use of the <div> element, or through scripting languages like JavaScript. However, HTML5's great benefit is that it provides semantic elements directly. For example, the <article> element was created to describe an article, like a blog entry, on a web page, and the <section> element was created to delineate a generic section on a web page. By using these elements, we provide semantic information not just to the user of the site, or to other developers, but also to machines that parse that website for data and information, such as web crawlers from search engines or screen readers for people who are visually impaired in some way. These semantic elements also provide text-to-speech advantages. These are a few reasons why HTML5 semantic elements are important. You won't have to memorize all of the elements, but there will be ones that you'll use more often, like the ones that I listed above.
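To see the difference that semantic elements make, compare the two hypothetical fragments below. Both would look the same in a browser, but only the second tells a machine which part is the article, which part is its header, and when it was published. The class names, title, and text are invented for illustration.

```html
<!-- Before: generic elements only. The class names carry meaning for people, not machines. -->
<div class="post">
  <div class="post-header">
    <h2>Notes on the Semantic Web</h2>
    <p>Posted on July 26, 2024</p>
  </div>
  <div class="post-body">
    <p>Text of the blog entry.</p>
  </div>
</div>

<!-- After: semantic elements describe the same content. -->
<article>
  <header>
    <h2>Notes on the Semantic Web</h2>
    <p>Posted on <time datetime="2024-07-26">July 26, 2024</time></p>
  </header>
  <section>
    <h3>Background</h3>
    <p>Text of the blog entry.</p>
  </section>
</article>
```

The <time> element's datetime attribute is a good example of the idea: people read "July 26, 2024", while a machine reads the unambiguous value 2024-07-26.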

All right, please watch the video linked to in this transcript and complete the task assigned in the discussion prompt. See you on the board.

HTML5 and The Web

Document Structure and Metadata

W3C, working with WHATWG, is the organizing body for managing the HTML standard (among other web related technologies), its specifications, and documentation. The documentation for HTML5 should be constantly referenced throughout this course:

  • HTML5 at W3C or the most current draft: https://html.spec.whatwg.org/
  • Specifically the section on semantics: https://html.spec.whatwg.org/#semantics

Another nice reference source is maintained by Mozilla:

The Root Element

The <html> element represents the root of an HTML document (w3org, Root Element section).

This element establishes the root (base) of the document, and it follows the DOCTYPE (document type) declaration. The <html> element should be accompanied by a language attribute (more on attributes soon).

<!DOCTYPE html>
<html lang="en">

Document Metadata

The <head> element contains the document's metadata plus other information (w3org, Document Metadata section).

The document metadata section of the specification describes five elements that belong to the <head> section. They include:

  • the <title> element: states the document's title
  • the <base> element: declares the base URL
  • the <link> element: establishes links to other resources, such as a CSS stylesheet
  • the <meta> element: provides additional metadata
  • the <style> element: enables internal styling with CSS
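Putting the five elements above together, a <head> section might look something like the following sketch. The title, the example.com base URL, and the css/style.css path are placeholders that I have assumed for illustration, and most pages will not need every one of these elements (the <base> element in particular is often left out).

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <!-- meta: additional metadata, here the character encoding and mobile viewport -->
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <!-- title: the document's title, shown in the browser tab -->
    <title>My Example Site</title>
    <!-- base: the base URL for relative links (placeholder address) -->
    <base href="https://example.com/">
    <!-- link: an external resource, here a CSS stylesheet (placeholder path) -->
    <link rel="stylesheet" href="css/style.css">
    <!-- style: internal CSS -->
    <style>
      body { font-family: sans-serif; }
    </style>
  </head>
  <body>
    <p>Page content goes here.</p>
  </body>
</html>
```

Note that <base> is placed before <link> so that the relative stylesheet path is resolved against the base URL.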

HTML Section Elements

There are ten main HTML section elements, or elements used to define the sections of an HTML document. These will be some of the main and most often used elements in our HTML documents. They include:

  • the <body> element
  • the <section> element
  • the <nav> element
  • the <article> element
  • the <aside> element
  • the <h1>, <h2>, ..., <h6> elements
  • the <hgroup> element
  • the <header> element
  • the <footer> element
  • the <address> element

See: HTML Sections
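The section elements listed above can be combined into a page skeleton like the one sketched below. This is only one possible arrangement, and the site name, file names, and email address are invented placeholders.

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>Example Site</title>
  </head>
  <body>
    <header>
      <!-- hgroup pairs a heading with related text, such as a tagline -->
      <hgroup>
        <h1>Example Site</h1>
        <p>A hypothetical site tagline</p>
      </hgroup>
      <nav>
        <ul>
          <li><a href="index.html">Home</a></li>
          <li><a href="about.html">About</a></li>
        </ul>
      </nav>
    </header>
    <article>
      <h2>First Post</h2>
      <section>
        <h3>Introduction</h3>
        <p>Text of the introduction.</p>
      </section>
    </article>
    <aside>
      <h2>Related Links</h2>
      <p>Supplementary content goes here.</p>
    </aside>
    <footer>
      <!-- address holds contact information for the page or its author -->
      <address>Contact: <a href="mailto:author@example.com">author@example.com</a></address>
    </footer>
  </body>
</html>
```

Reading the skeleton top to bottom, the heading levels (h1, h2, h3) follow the nesting of the sections, which helps both people and screen readers follow the outline of the page.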

Grouping Content Elements

The other set of common elements that we'll use quite often are the HTML grouping elements. There are more than a dozen elements categorized as grouping elements. These include:

  • the <p> element
  • the <hr> element
  • the <pre> element
  • the <blockquote> element
  • the <ol> element
  • the <ul> element
  • the <li> element
  • the <dl> element
  • the <dt> element
  • the <dd> element
  • the <figure> element
  • the <figcaption> element
  • the <div> element

Some of these elements must be used together. For example, the