How do Websites on the Internet Work?
There is a lot happening when you visit a website. Technologists split this complex process up into different "Layers of Abstraction" in order to understand it.
Typically the digital machinery of one layer enables the next layer, and so on, and they're stacked up like Legos to form a whole cohesive tool or experience like a website.
This document is intended to be sort of like a textbook: You can go through the "exercises" and gain exposure to many different concepts, but you can also return here and search for things as a quick reference.
Table of Contents:
- To summarize the layers of a typical website
- Simplified Sequence Diagram
- What is all this acronym soup?!!?
- But how can I see these things / interact with them?
- Where can I learn more?
To summarize the layers of a typical website:
Starting from the things that you see and interact with, and diving in deeper towards the basis of its creation:
- CSS and JavaScript
- HTML
- Web Browser Software
- HTTP
- TLS
- DNS
- TCP
- IP/NAT (The Public Internet)
- Computer Operating Systems
- Computer Hardware
Simplified Sequence Diagram:
A sequence diagram shows how multiple different components of a process interact over time. Time starts at the top and proceeds downwards.
See footnote 2 below for a fuller (overly detailed) diagram.
What is all this acronym soup?!!?
- The CSS (Cascading Style Sheets) and JavaScript of the website
  - CSS: Color, fonts, layout, animation
  - JavaScript: Interaction, buttons, transformation, live chats, progress bars, graphs
- The HTML (HyperText Markup Language) of the website
  - Structure, content, text, images, accessibility features, forms
- The Web Browser
  - Program which runs on the web-surfer's computer or phone 🏄
  - Provides a platform for HTML, CSS, and JavaScript to run on
- HTTP (HyperText Transfer Protocol)
  - World Wide Web protocol for communication between web browsers and web servers
  - The two lifelong friends, always seen together: Request and Response
  - Transfers files: HTML, CSS, and JavaScript, or any other file.
    - HTML, CSS, JavaScript, and all other "source code" files are just plain text files, like `.txt`.
- TLS (Transport Layer Security)
  - Used to be called SSL (Secure Sockets Layer)
  - Encryption. Created to facilitate buying things online, now almost ubiquitous.
  - HTTP running on top of TLS is referred to as HTTPS 🔒
- DNS (Domain Name System)
- TCP (Transmission Control Protocol)
- IP/NAT (Internet Protocol and Network Address Translation)
  - IP defines networks and gives each computer an IP Address on 1 or more networks.
    - IPv4 addresses look like this: `12.34.56.78`
    - IPv6 addresses look like this: `2600:1406:bc00:53::b81e:94ce`
  - There is only one Public network: The Internet.
  - Unlimited number of private networks, or LANs (Local Area Networks)
  - 99.9% of computers we interact with are on LANs. They can only talk to the Internet through a NAT, also referred to as a Router.
  - NATs are like one-way valves: You can talk to the internet, and the internet can respond. But someone on the internet can't directly connect to your computer on a LAN. (footnote 1)
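The "one-way valve" behaviour of a NAT can be sketched as a small lookup table. This is a toy model and not real networking code: the `ToyNat` class, the starting port `40000`, and the LAN address `192.168.1.20` are all made up for illustration, and real routers keep more state than this.

```javascript
// Toy model of a NAT: only the bookkeeping, no real packets.
class ToyNat {
  constructor(publicIp) {
    this.publicIp = publicIp;
    this.nextPort = 40000;   // arbitrary starting port for this sketch
    this.table = new Map();  // public port -> { ip, port } on the LAN
  }

  // A LAN computer sends something out to the internet.
  outbound(lanIp, lanPort) {
    const publicPort = this.nextPort++;
    this.table.set(publicPort, { ip: lanIp, port: lanPort });
    // The internet only ever sees the NAT's public address.
    return { ip: this.publicIp, port: publicPort };
  }

  // Something arrives from the internet addressed to one of our public ports.
  inbound(publicPort) {
    // Known port: forward the response. Unknown port: drop it (null).
    return this.table.get(publicPort) ?? null;
  }
}

const nat = new ToyNat('12.34.56.78');
const seenByInternet = nat.outbound('192.168.1.20', 51515);
console.log(seenByInternet);                    // { ip: '12.34.56.78', port: 40000 }
console.log(nat.inbound(seenByInternet.port));  // { ip: '192.168.1.20', port: 51515 }
console.log(nat.inbound(22));                   // null
```

The last line is the "valve": nobody on the LAN opened a connection that maps to port 22, so the NAT has nowhere to send the unsolicited traffic.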
But how can I see these things / interact with them?
HTML (HyperText Markup Language) and CSS (Cascading Style Sheets)
- Try right-clicking on this text and choosing `Inspect Element` or `Inspect (Q)`. This should open the web browser developer tools!
  - If you don't see that option, look up "how to open developer tools in xyz browser"
- You can modify all of this stuff!
  - You can modify existing CSS rules or add new ones.
    - Try making the text red and bold by clicking on `element { ...` in the CSS area and adding: `color: red;` `font-weight: bold;`
  - You can modify any of the HTML: Just right-click it and choose `Edit as HTML`
    - Try modifying the text inside the `<li>` tag (List Item tag).
JavaScript
- Once you have the web browser developer tools open, you can:
  - Try running some JavaScript on the current page using the Console.
    - Navigate to the `Console` Tab and type in `alert("Hello!")`, then press enter.
    - It should pop up an alert box on the page!
    - This kind of console is often called a REPL (Read-Evaluate-Print Loop), pronounced "reh pl".
    - If you make a mistake and the code contains invalid syntax or throws an error when executed, the Console will throw an error:
      - In this case the quotation marks around `"Hello"` were missing, so instead of it being interpreted as a literal value of a string of characters, it was interpreted as a reference to a variable named `Hello`. Thus the `ReferenceError: Hello is not defined`.
  - You can also use a different tool, the Step-by-Step Debugger, to watch the page's JavaScript execute one line at a time.
    - I thought this post did a good job of explaining what it's like to use a debugger 😆
    - Navigate to the `Debugger` Tab
    - In the `Sources` pane on the left, open up `Main Thread` -> `git.cyberia.club` -> `QueerCoded/how-...` -> `main`
      - You may have to refresh the page.
    - Scroll down to Line 22, right below the opening `<script>` tag.
    - Click on the line number to add a `Breakpoint`
    - Refresh the page again, and the debugger should have paused the rendering and execution of code on the page. The page should be all white, and if it's wide enough you will see this debugger widget:
    - In the Debugger tab of the developer tools, you should see the current execution state paused on your breakpoint.
    - This code doesn't do anything exciting, it's just setting up error handlers and configuration for other JavaScript which will load later, but we can still step through it one line at a time and examine things as we go.
    - To use the Debugger and the Console at the same time, you'll need to show the split console (Under the `...` menu in the top right):
    - Try running `window.config` in the Console. The resulting value that gets printed should be `undefined`.
    - Now press the Step Over button a few times, until the line which sets `window.config` has been executed.
    - If you run `window.config` again, now it should show you that there is an object there.
    - Don't forget to scroll back up and click on the breakpoint on line 22 again to clear it so it won't trigger every time you visit this page in the future.
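The difference between a string and a variable name in the Console exercise above can also be reproduced outside the browser. The following is a rough sketch of a single Read-Evaluate-Print step; the `evaluate` helper is made up for this example, and it assumes a JavaScript runtime such as Node.js rather than the browser Console.

```javascript
// Run a piece of text as JavaScript and report what happened,
// like one turn of a very small REPL.
function evaluate(source) {
  try {
    // Indirect eval runs the text as JavaScript in the global scope.
    return { ok: true, value: (0, eval)(source) };
  } catch (err) {
    return { ok: false, error: `${err.name}: ${err.message}` };
  }
}

// With quotation marks, "Hello" is a literal string of characters.
console.log(evaluate('"Hello".length'));

// Without them, Hello is treated as a variable name, and none exists.
console.log(evaluate('Hello'));

// An unterminated string is invalid syntax, so it never starts running.
console.log(evaluate('alert("Hello'));
```

The second call reports a `ReferenceError` and the third a `SyntaxError`; the exact wording of the messages depends on the JavaScript engine.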
Combining Inspect Element and the Console
- There is a neat trick built into web browsers' developer tools which makes it easy to use JavaScript on any element you can see on the page.
- The last element you used `right-click` -> `Inspect Element` on will be stored in a special variable called `$0`
- So you can use JavaScript to modify elements as well: Just run something like `$0.style.color = 'blue';` in the Console.
  - Or, if horror is more your thing, you can make your HTML scream endlessly with: `setInterval(() => $0.textContent += 'A', 10);`
- The developer console may require you to type `'allow pasting'` before you can paste these code snippets in.
  - This is a small warning that browser developers put there to help prevent scammers from manipulating people into running malicious code via the developer tools.
HTTP (HyperText Transfer Protocol)
- Basic anatomy of an HTTP Request/Response pair:
  - Request
    - Request Method: commonly `GET`, `POST`, `PUT`, or `DELETE`
    - Request Path: `/files/index.html`
    - Headers. For example:
      - `Host: example.com`
        - `Host` is the only required header for all requests.
      - `Authorization: Basic dXNlcm5hbWU6cGFzc3dvcmQK`
    - Body
      - `GET` and `DELETE` requests are not supposed to have request bodies.
      - Request bodies are optional for `POST` and `PUT` requests.
      - Usually used for uploading files or other information, like when you submit a form or send a message.
      - A request body can be any data, it's essentially a file.
  - Response
    - Status Code. Common examples:
      - `200` (Ok / Success)
      - `302` (Redirect)
      - `404` (File not Found)
      - `500` (Internal Server Error)
    - Headers. For example:
      - `Content-Type: text/plain`
        - This header would instruct the browser to render the response as if it was a plain text file.
      - `Location: https://example.com/`
        - This header would be used in conjunction with a `302` Redirect status to tell the browser to redirect to a different URL.
    - Body
      - Almost all responses have a body, altho technically the body is still optional.
      - Again, the body is essentially just a file, the contents of a file. It could be anything.
- ℹ️ NOTE: The total size of the HTTP Request and Response Headers, including the request path etc, typically must fit within a limit set by the server (often somewhere between 4 and 16 kilobytes). The HTTP protocol itself puts no limit on the size of the request and response body, although individual servers may enforce their own.
  - Usually a `Content-Length` header is provided on any request or response which has a body, which informs the client/server about how many bytes of data to expect.
- Once you have the web browser developer tools open, you can see HTTP Requests and Responses in the `Network` tab.
- The network tab only records and displays HTTP traffic that happens after it was opened. So you probably want to refresh the page after opening it.
- It should show a list of requests like this:
- Clicking on one of them will open up a panel on the right which shows details about the request.
  - The request headers and response headers are both shown on this panel.
  - ℹ️ NOTE: the `Authorization` and `Cookie` headers may contain sensitive information -- these are where your login session is stored!
  - I cut off the following screenshot right before the part where it displayed the value of the `i_like_gitea` cookie which contains my login session 😉
- There is a lot of useful information on this `Headers` tab of the request/response details pop-out pane.
  - For Requests: Login / Auth info like `Authorization` and `Cookie`
  - Date and Time of the Request/Response.
  - Metadata, like `Content-Type`, `Content-Length`, etc.
- Each of the Request Body / Response Body get their own tab:
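To make the request anatomy described in this section concrete, here is a minimal sketch that builds an HTTP/1.1 request as plain text and splits it back into method, path, headers, and body. The `parseRequest` helper, the `/messages` path, and the form body are invented for this example, and it assumes Node.js (for `Buffer`). Real HTTP parsers handle many more edge cases.

```javascript
// Split a raw HTTP/1.1 request into the parts described above.
function parseRequest(raw) {
  const splitAt = raw.indexOf('\r\n\r\n');  // a blank line ends the headers
  const head = raw.slice(0, splitAt).split('\r\n');
  const body = raw.slice(splitAt + 4);
  const [method, path, version] = head[0].split(' ');
  const headers = {};
  for (const line of head.slice(1)) {
    const colon = line.indexOf(':');
    // Header names are case-insensitive, so store them lowercased.
    headers[line.slice(0, colon).toLowerCase()] = line.slice(colon + 1).trim();
  }
  return { method, path, version, headers, body };
}

const body = 'message=hello';
const raw = [
  'POST /messages HTTP/1.1',  // hypothetical path
  'Host: example.com',
  'Authorization: Basic dXNlcm5hbWU6cGFzc3dvcmQK',
  'Content-Type: application/x-www-form-urlencoded',
  `Content-Length: ${Buffer.byteLength(body)}`,  // counted in bytes
  '',
  body,
].join('\r\n');

const request = parseRequest(raw);
console.log(request.method, request.path);       // POST /messages
console.log(request.headers['content-length']);  // 13

// "Basic" auth is only base64-encoded, not encrypted: anyone can decode it.
const encoded = request.headers.authorization.split(' ')[1];
console.log(JSON.stringify(Buffer.from(encoded, 'base64').toString('utf8')));
// "username:password\n"
```

Decoding the example `Authorization` value shows why that header counts as sensitive: the username and password are right there for anyone who can read the request.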
TLS (Transport Layer Security)
- Click on the Lock icon in the URL bar of your web browser. (The way this works is different for every browser)
  - In Firefox, it's 🔒 -> `Connection Secure` -> `More Information` -> `View Certificate`
    - The three tabs you see here represent the cryptographic chain of trust that goes from our `git.cyberia.club` certificate all the way to the `ISRG Root X1` certificate, which comes pre-installed in the trusted root store of most up-to-date operating systems and web browsers.
- When connecting to a server, a TLS client validates the following:
  - That the server's certificate was signed by a CA whose chain of signatures leads back to a trusted CA root certificate.
  - That the server's certificate has the domain name that the user requested.
  - That the server's certificate isn't expired.
  - There is a cool website called `badssl.com` which maintains examples of all of the failure cases of all of these validations!
    - A self-signed certificate: https://self-signed.badssl.com/
    - A certificate with the wrong domain name: https://wrong.host.badssl.com/
    - An expired certificate: https://expired.badssl.com/
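The three validations listed above can be sketched as a single function. This is only a toy model: a real TLS client verifies cryptographic signatures, while here a certificate is just a plain object and "trusted" means the last name in its `chain` appears in a set of root names. The `validateCertificate` function, the object shape, and the `wrong.host.example` hostname are assumptions made up for this example; the dates and names are taken from the curl output later in this document.

```javascript
// Toy model of certificate validation. No real cryptography happens here.
function validateCertificate(cert, requestedHost, trustedRoots, now) {
  const root = cert.chain[cert.chain.length - 1];
  if (!trustedRoots.has(root)) return 'untrusted root';
  if (!cert.names.includes(requestedHost)) return 'wrong host';
  if (now < cert.notBefore || now > cert.notAfter) return 'outside validity period';
  return 'ok';
}

const trustedRoots = new Set(['ISRG Root X1']);
const cert = {
  names: ['cyberia.club'],
  chain: ['E6', 'ISRG Root X1'],  // stand-in for the chain of signatures
  notBefore: new Date('2025-03-14T16:13:25Z'),
  notAfter: new Date('2025-06-12T16:13:24Z'),
};
const now = new Date('2025-04-08T07:09:49Z');
const later = new Date('2025-07-01T00:00:00Z');
const selfSigned = { ...cert, chain: ['cyberia.club'] };

console.log(validateCertificate(cert, 'cyberia.club', trustedRoots, now));        // ok
console.log(validateCertificate(selfSigned, 'cyberia.club', trustedRoots, now));  // untrusted root
console.log(validateCertificate(cert, 'wrong.host.example', trustedRoots, now));  // wrong host
console.log(validateCertificate(cert, 'cyberia.club', trustedRoots, later));      // outside validity period
```

The three failing calls line up with the three badssl.com examples: self-signed, wrong host, and expired.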
DNS (Domain Name System)
- Open up your terminal and run `nslookup example.com`

  ```
  forest@debian:~$ nslookup example.com
  Server: 8.8.8.8
  Address: 8.8.8.8#53

  Non-authoritative answer:
  Name: example.com
  Address: 23.192.228.84
  Name: example.com
  Address: 23.192.228.80
  ```
TCP (Transmission Control Protocol)
- TCP connections have the following elements:
  - Source IP Address:
    - Who's trying to connect?
  - Source Port Number:
    - This is generated randomly by the OS.
    - The source port is used by NATs, routers, and other computers to tell connections apart and route responses back to the correct one.
  - Destination IP Address:
    - What computer are you trying to connect to?
  - Destination Port Number:
    - What service on that computer are you trying to talk to?
    - Different ports are used for different services by convention. Here are some common ones:
      - `22` SSH (Secure Shell)
      - `53` DNS
      - `80` Plain Unencrypted HTTP
      - `443` HTTP inside TLS (https)
- Open up your terminal and run `curl -v localhost`
  - Since you probably aren't running a web server on your laptop (YET! 😉), curl won't be able to connect to anything.
  - You will see a TCP error: `Connection refused`

  ```
  forest@debian:~$ curl -v localhost
  * Trying 127.0.0.1:80...
  * connect to 127.0.0.1 port 80 failed: Connection refused
  ```
- There are multiple ways for TCP to fail. It can also time out (Connect Timeout or Read/Write Timeout, Idle Timeout, etc)
  - We can see a Connect Timeout by trying to connect to something which is blocked by a firewall.
  - Try connecting to example.com on a different port like `8080`. It should just keep trying to connect and appear to hang.
  - I added the `-4` flag to my command to force it to use IPv4 only, just because it makes the output cleaner and easier to see.

  ```
  forest@debian:~$ curl -v -4 example.com:8080
  * Trying 23.192.228.84:8080...
  ```
  - If you waited long enough, you would see: `connect to 23.192.228.84 port 8080 failed: Connection timed out`
  - Press Ctrl-C to cancel the command and get your command prompt back.
- Yet another TCP failure mode: You can try to connect to an address which neither your computer, nor your router, nor the internet know how to get to.
  - This address might be on a different private network, and you can't get there from here.
  - In this case you will see a `no route to host` error:

  ```
  forest@debian:~$ curl -v 10.69.4.20
  * Trying 10.69.4.20:80...
  * connect to 10.69.4.20 port 80 failed: No route to host
  ```
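The four elements of a TCP connection listed earlier in this section work together as one identifier. As a small sketch of that idea (the `connectionKey` helper, the LAN address `192.168.1.20`, and the source ports are made up for illustration), two connections from the same computer to the same server stay separate because their source ports differ:

```javascript
// Well-known destination ports from the list in this section.
const WELL_KNOWN_PORTS = { 22: 'SSH', 53: 'DNS', 80: 'HTTP', 443: 'HTTPS' };

// A connection is identified by all four values together.
function connectionKey({ srcIp, srcPort, dstIp, dstPort }) {
  return `${srcIp}:${srcPort} -> ${dstIp}:${dstPort}`;
}

const connections = new Map();

// Two connections from the same laptop to the same server:
for (const srcPort of [51000, 51001]) {
  const conn = { srcIp: '192.168.1.20', srcPort, dstIp: '23.192.228.84', dstPort: 443 };
  connections.set(connectionKey(conn), WELL_KNOWN_PORTS[conn.dstPort]);
}

console.log(connections.size);            // 2
console.log([...connections.keys()][0]);  // 192.168.1.20:51000 -> 23.192.228.84:443
console.log([...connections.values()]);   // [ 'HTTPS', 'HTTPS' ]
```

Only the source port differs between the two keys, which is exactly the detail a NAT or an operating system relies on to route each response back to the right connection.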
Seeing it all together at once with curl
- First, we see that the `cyberia.club` domain name resolves to `69.61.2.178`.

  ```
  forest@debian:~$ curl -v https://cyberia.club
  * Trying 69.61.2.178:443...
  ```
- We see that TCP starts, and it succeeds in connecting to `69.61.2.178`. It uses port `443` because that's the default port for HTTPS.

  ```
  * Connected to cyberia.club (69.61.2.178) port 443 (#0)
  ```
- Next, we see that TLS starts up inside the TCP connection, and the server's certificate is verified:

  ```
  * ALPN: offers h2,http/1.1
  * TLSv1.3 (OUT), TLS handshake, Client hello (1):
  * CAfile: /etc/ssl/certs/ca-certificates.crt
  * CApath: /etc/ssl/certs
  * TLSv1.3 (IN), TLS handshake, Server hello (2):
  * TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
  * TLSv1.3 (IN), TLS handshake, Certificate (11):
  * TLSv1.3 (IN), TLS handshake, CERT verify (15):
  * TLSv1.3 (IN), TLS handshake, Finished (20):
  * TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
  * TLSv1.3 (OUT), TLS handshake, Finished (20):
  * SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256
  * ALPN: server accepted h2
  * Server certificate:
  * subject: CN=cyberia.club
  * start date: Mar 14 16:13:25 2025 GMT
  * expire date: Jun 12 16:13:24 2025 GMT
  * subjectAltName: host "cyberia.club" matched cert's "cyberia.club"
  * issuer: C=US; O=Let's Encrypt; CN=E6
  * SSL certificate verify ok.
  ```
- Finally, the HTTP Request is fired off (wrapped inside the TLS session, which is wrapped inside the TCP connection)
  - We see the Method `GET`, the request path `/`, and three headers, the required `Host` header plus `user-agent` and `accept`.

  ```
  > GET / HTTP/2
  > Host: cyberia.club
  > user-agent: curl/7.88.1
  > accept: */*
  >
  ```
- And the server's HTTP Response comes back. First the HTTP Status Code (`200`, aka "OK"), and the Headers:

  ```
  < HTTP/2 200
  < accept-ranges: bytes
  < alt-svc: h3=":443"; ma=2592000
  < content-type: text/html; charset=utf-8
  < etag: "d8xhd0qredlh240"
  < last-modified: Fri, 04 Apr 2025 01:57:55 GMT
  < server: Caddy
  < vary: Accept-Encoding
  < content-length: 2736
  < date: Tue, 08 Apr 2025 07:09:49 GMT
  <
  ```
- After the headers, we see the Response Body which contains our HTML.

  ```
  <!doctype html>
  <html lang="en">
  <head>
  <title>Cyberia Computer Club</title>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width,initial-scale=1.0">
  <meta name="Description" content="Cyberia Computer Club">
  <link rel="stylesheet" href="/cyberia.css">
  <link rel="icon" href="/favicon.ico">
  </head>
  <body>
  <nav>
  <a href="/">Cyberia</a>
  <a href="/culture">Culture</a>
  <a href="https://blog.cyberia.club/read">Blog</a>
  <a href="/calendar">Calendar</a>
  <a href="/donate">Donate</a>
  <a href="/matrix">Matrix (Chat)</a>
  <a href="/mumble">Mumble</a>
  <br>
  <a class="external" href="https://capsul.org">Capsul</a>
  <a class="external" href="https://nullhex.com">Nullhex</a>
  <a class="external" href="https://git.cyberia.club">Git</a>
  <a class="external" href="https://wiki.cyberia.club">Wiki</a>
  <a class="external" href="https://layerze.ro/">Layer Zero (Twin Cities HQ)</a>
  </nav>
  <main><h1>Cyberia Computer Club</h1>
  <p>A kind and amazing hacker collective centered in Minnesota, with global friends.</p>
  <pre aria-label="ASCII art of Iwakura Lain saying 'close the old world, open the next'">
  _..--------.._ ,-'' `-. ,' `. , \ / \ / '. \ ' / || ; ; n /| |/ | | | / v /\/`-'vv\'.|\ , : /v`,--- ----.^. ; ' | / .`, , .`\| ; | n| '.__/ \ ___/|\ ; ` | ` | \/| \ \ \ | /\/ '; `-\ `' /|/ |' ` \ - /| | ` `. .' | | v,_ `;._ _.-; | / `'`\|-_`'-''__/^'^' | | \-v-/ | | cL0s3 th3 o1d w0rld | / 0p3n th3 n3xt || || |,
  </pre>
  ....
  ```
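The walk-through above noted that curl picked port `443` because the URL started with `https`. A minimal sketch of that rule, using the standard `URL` class available in browsers and Node.js, might look like this. The `effectivePort` helper and the `DEFAULT_PORTS` table are made up for this example and only cover the two schemes discussed in this document.

```javascript
// Default ports for the two schemes discussed in this document.
const DEFAULT_PORTS = { 'http:': 80, 'https:': 443 };

function effectivePort(urlString) {
  const url = new URL(urlString);
  // url.port is an empty string when the URL uses the scheme's default port.
  return url.port === '' ? DEFAULT_PORTS[url.protocol] : Number(url.port);
}

console.log(effectivePort('https://cyberia.club'));     // 443
console.log(effectivePort('http://localhost'));         // 80
console.log(effectivePort('http://example.com:8080'));  // 8080
```

Note that `new URL` needs the scheme spelled out. curl is more forgiving: as the earlier output shows, a bare `localhost` was tried on port 80, the default for plain HTTP.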
How can I make this stuff myself?
I haven't written this section yet.
Where can I learn more?
I've only just barely scratched the surface, but hopefully opened up enough questions that you can find room to explore.
In my opinion, MDN (Mozilla Developer Network) has the best documentation for these things:
- https://developer.mozilla.org/en-US/docs/Web/HTML
- https://developer.mozilla.org/en-US/docs/Web/CSS
- https://developer.mozilla.org/en-US/docs/Web/JavaScript
I often just search Google for "MDN <insert name of thing here>" and almost never regret it 🙂
We use Let's Encrypt as our CA (Certificate Authority) for TLS. They have some nice docs:
Wikipedia has a neat diagram which explains the fundamentals behind the asymmetric (public/private key) encryption that TLS uses.
If you are curious about the intersection between the networking stuff like TCP, the Operating System, and application code (like Python, Go, Java, C, Rust, etc), you may be interested in a couple blog articles I wrote about it:
- When It Does Not Listen for Thee, Ask for Whom the Server Listens (Understanding Listening Addresses)
- What is a Process? (Operating Systems)
Footnotes
- footnote 1 History of IPv4, IPv6, and NATs:
  - It wasn't supposed to be this way.
  - When the internet was originally created, it was planned to always be public: every device would get a public address.
  - However, there are only about 4.3 billion IPv4 addresses total (2^32), and not all of those can be used on the public internet.
    - A lot of them end up being unoccupied for various techno-social reasons, even though they are in short supply.
  - IPv6 was created to try to remedy the situation. It has enough address space to give each grain of sand on earth a unique address.
    - However, IPv6 was never fully deployed, and much of the internet still depends on IPv4.
    - NATs were created and deployed instead because they were cheaper and easier, and the people deploying them didn't care about the idealistic vision of the internet: They just wanted it to work for consumers and make money right now.
- footnote 2 Here is the full (overly detailed) diagram: