How a Browser Works: A Beginner-Friendly Guide to Browser Internals

Most people think that after they type a URL and press Enter, the webpage simply opens. Behind the scenes, though, the browser has to do a lot of work to show that page. In this blog we will walk through that process.
What a browser actually is
Most people think browsers are just used to show websites. That is not wrong, but it is like saying a kitchen gives food. A kitchen takes raw vegetables and handles the processing, cooking and serving, and that is what browsers do with web content. Browsers are complex application software that take the raw data and files from the internet and turn them into a usable, visual page for humans.
When we enter a URL or domain name, the browser first checks its caches in case the address was stored recently. If it is not found, the request goes to a recursive DNS resolver, which finds the IP address for the domain by querying the root servers, then the TLD servers, then the authoritative DNS server, and returns the IP to the browser. A connection is then made over TCP using the 3-way handshake, followed by a TLS handshake for HTTPS sites. Once the connection is ready, the browser can send HTTP or HTTPS requests and receive the raw files as responses. For example, when you open youtube.com, the browser downloads HTML, CSS and JavaScript, then turns them into the YouTube page you see, with its videos, buttons and menus.
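To make the first step a bit more concrete, here is a small Python sketch of how a URL can be split into the pieces needed before any connection is made. It is only an illustration, not how a real browser is written: the function name describe_navigation and the dictionary keys are made up for this example, and no network request is actually sent.

```python
from urllib.parse import urlsplit


def describe_navigation(url: str) -> dict:
    """Split a URL into the pieces needed before connecting (illustrative helper)."""
    parts = urlsplit(url)
    scheme = parts.scheme or "https"
    # Default ports: 443 for HTTPS, 80 for plain HTTP.
    port = parts.port or (443 if scheme == "https" else 80)
    return {
        "scheme": scheme,
        "host": parts.hostname,     # the name that is handed to DNS
        "port": port,               # the port the TCP connection targets
        "path": parts.path or "/",  # what is asked for in the HTTP request
    }


info = describe_navigation("https://www.youtube.com/")
print(info)

# A real lookup would come next, for example with
# socket.getaddrinfo(info["host"], info["port"]),
# but that needs network access, so it is left out of this sketch.
```

The host is what goes to the DNS resolver, the port is where the TCP connection is opened, and the path is what the HTTP request asks for.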
Main parts of a browser

Since a browser is so complex, it divides the work among several components to keep the experience smooth. The components are User Interface, Browser Engine, Rendering Engine, Networking, JS Interpreter and Storage.
User Interface: what you see on the screen, like buttons, the address bar, tabs, the refresh button etc.
Browser Engine: the manager that controls what happens next. For example, when the user clicks on a video, it coordinates loading whatever data is needed to play it. The browser engine acts as a bridge between the User Interface and the rendering engine.
Rendering Engine: whatever you see visually inside the page is done by it. In simple terms, it turns HTML and CSS (plus any changes made by JavaScript) into visuals by converting HTML to the DOM and CSS to the CSSOM, then merging them. For example, Chromium uses Blink and Firefox uses Gecko.
Networking: handles the internet communication between client and server, from the DNS query that finds the IP address to fetching raw files like HTML, CSS, JavaScript, images and JSON.
JS Interpreter: as the name suggests, it runs the JavaScript code that provides the page's logic. For example, Chrome uses V8 and Firefox uses SpiderMonkey.
Storage: the browser's storage facilities, like local storage, session storage and cookies.
User Interface
So, the User Interface is the part the user interacts with. The search button, the address bar where you type URLs, the controls for refreshing and going back or forward, and the tabs are all User Interface. When you search for something, the user interface captures the input URL and sends it to the browser engine. The browser engine then tells networking to look up the domain and fetch the files. Once the files arrive, they are parsed and the rendering engine shows the design with the data.
Browser Engine vs Rendering Engine
So, think of the Browser Engine as a senior developer and the rendering engine as a junior one. The junior developer does exactly what the senior developer asks. In the same way, the browser engine decides what to load, when to reload, which page to load etc. It is the connecting bridge between the User Interface and the rendering engine. The rendering engine, on the other hand, is responsible for converting HTML and CSS code into visual pixels on the screen while following the web standards. It does not decide which page to load or when. That is where the browser engine helps.
Networking
So, to communicate with the server, the client (the browser) sets up a new connection. HTTP/1.1 and HTTP/2 run over TCP, while HTTP/3 runs over QUIC, which is built on UDP. After a successful connection, the browser sends a request to the server. After checking that the client is authorized (if necessary) and trustworthy, the server responds with the HTML. The browser then requests the CSS, JS and images that the HTML refers to, and each one arrives as a separate response. The browser may check its cache for these files before contacting the server. After that, the browser engine and rendering engine take over. So, networking covers client-server communication, DNS lookup, etc.
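Here is a rough Python sketch of what a request and a response look like as raw text in HTTP/1.1. The helper names build_get_request and parse_response are invented for this example, and the response is a hand-written sample, not something fetched from a real server.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Build the raw text of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",  # blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw: bytes):
    """Split a raw HTTP/1.1 response into status code, headers and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, status, _reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(status), headers, body


request = build_get_request("example.com")

# Hand-written sample response (an assumption for this sketch).
sample = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 14\r\n"
    b"\r\n"
    b"<h1>Hello</h1>"
)
status, headers, body = parse_response(sample)
print(status, headers["content-type"], body)
```

The body is still just bytes at this point. Turning it into something visible is the job of the parsing and rendering steps.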
HTML parsing and DOM creation
When a browser fetches data from a server, it downloads raw HTML files (along with CSS and JavaScript).
These files are just plain text, so the browser needs to parse them to understand their structure. From the HTML, the browser builds a structured representation called the DOM (Document Object Model).
The DOM is like a tree structure:
Each HTML element becomes a node
Nested elements become child nodes
Elements at the same level become siblings
For example, given this HTML:
<body>
  <h1>Hello</h1>
  <div>
    <h2>Mehtab</h2>
  </div>
</body>
The browser creates the following DOM tree (the HTML parser adds the missing html and head elements automatically):
HTML
├── HEAD
└── BODY
    ├── H1
    └── DIV
        └── H2
This tree structure helps the browser understand:
Which elements are inside others
How content is organized
How styles and JavaScript should be applied
Once the DOM is created, the rendering engine uses it, together with the styles, to calculate layout and display the page on the screen.
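The idea of turning HTML text into a tree can be tried out with Python's built-in html.parser module. This is only a sketch: the Node and TreeBuilder classes are made up for this example, void tags like br are not handled, and unlike a real browser this parser does not add the missing html and head elements.

```python
from html.parser import HTMLParser


class Node:
    """One node of a very simplified DOM-like tree (illustrative class)."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root  # the element we are currently inside

    def handle_starttag(self, tag, attrs):
        # A new element becomes a child of the current one.
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        # Walk up to the nearest open element with this tag and close it.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        text = data.strip()
        if text:  # ignore whitespace-only text between tags
            self.current.children.append(Node(f'"{text}"', self.current))


def dump(node, depth=0):
    """Return the tree as indented lines, one line per node."""
    lines = ["  " * depth + node.tag]
    for child in node.children:
        lines.extend(dump(child, depth + 1))
    return lines


builder = TreeBuilder()
builder.feed("<body><h1>Hello</h1><div><h2>Mehtab</h2></div></body>")
builder.close()
tree_lines = dump(builder.root)
print("\n".join(tree_lines))
```

The printed indentation shows the same parent, child and sibling relationships as the DOM tree above, with the text inside each element as an extra leaf node.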
CSS parsing and CSSOM creation
Just like HTML, CSS is also parsed by the browser. When CSS is downloaded, the browser reads it and builds a structured representation called the CSSOM (CSS Object Model).
The CSSOM stores all styling information such as colors, fonts, spacing and layout rules.
For example, consider this CSS:
h1 {
  color: red;
}
The browser converts it into an internal CSSOM rule similar to:
Selector: h1
Property: color
Value: red
The CSSOM is not plain text — it is a structured model that allows the browser to:
Match styles to DOM elements
Apply cascading and inheritance rules
Recalculate styles when changes occur
Once the CSSOM is ready, it is combined with the DOM to continue the rendering process.
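As a rough illustration of going from CSS text to structured rules, here is a tiny Python sketch. It assumes only very simple CSS (no comments, no nested blocks, no at-rules), and the parse_css function and its output shape are invented for this example. Real CSS parsers are far more involved.

```python
import re


def parse_css(css: str):
    """Turn very simple CSS text into a list of rule dictionaries."""
    rules = []
    # Each rule is assumed to look like: selector { prop: value; ... }
    for selector, block in re.findall(r"([^{}]+)\{([^{}]*)\}", css):
        declarations = {}
        for decl in block.split(";"):
            if ":" in decl:
                prop, _, value = decl.partition(":")
                declarations[prop.strip()] = value.strip()
        rules.append({"selector": selector.strip(), "declarations": declarations})
    return rules


rules = parse_css("h1 { color: red; }")
print(rules)
```

Once the rules are structured like this, looking up "which color applies to h1" becomes a data lookup instead of a text search.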
How DOM and CSSOM come together

After parsing the HTML, the browser has a DOM. Meanwhile, the CSS is parsed and a CSSOM is prepared. The browser then combines the DOM and the CSSOM to create something called the Render Tree, which contains only the elements that take part in rendering. Elements with display: none are skipped, because there is no point in laying out something that will never appear. Elements with visibility: hidden are different: they stay in the Render Tree because they still take up space, they are just not painted. The Render Tree is then ready for the layout step.
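Here is a simplified Python sketch of that combining step. It is built on assumptions: the DOM is a nested dictionary, styles are matched by tag name only, and build_render_tree is a made-up helper. Real engines match full selectors and apply cascading and inheritance.

```python
def build_render_tree(node, styles):
    """Copy the tree, leaving out every display: none element and its children."""
    style = styles.get(node["tag"], {})
    if style.get("display") == "none":
        return None  # this element and all its descendants are skipped
    children = []
    for child in node.get("children", []):
        rendered = build_render_tree(child, styles)
        if rendered is not None:
            children.append(rendered)
    return {"tag": node["tag"], "style": style, "children": children}


# Hypothetical DOM and styles for the example.
dom = {"tag": "body", "children": [
    {"tag": "h1", "children": []},
    {"tag": "div", "children": [{"tag": "h2", "children": []}]},
    {"tag": "p", "children": []},
]}
styles = {"h1": {"color": "red"}, "div": {"display": "none"}}

render_tree = build_render_tree(dom, styles)
visible_tags = [child["tag"] for child in render_tree["children"]]
print(visible_tags)
```

The div is dropped together with the h2 inside it, while h1 keeps the style that was matched to it.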
Layout (reflow), painting, and display

Layout (also called Reflow) handles all the mathematical calculations, like working out the exact position and size of every element. After Layout, the browser performs Painting, which means filling pixels with colors, borders, text etc. Finally, the browser displays everything on the screen. If something changes later, the browser does not restart the whole process. It reruns only the steps that are affected: a change in size or position needs Layout and Paint again, while a change in color only needs Paint. This optimization helps browsers keep pages fast and smooth.
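To get a feel for the kind of math Layout does, here is a very small Python sketch that stacks block elements from top to bottom. It is heavily simplified and based on assumptions: heights are given as inputs instead of being computed from content, every block takes the full viewport width, and layout_blocks is a made-up name.

```python
def layout_blocks(blocks, viewport_width, x=0, y=0):
    """Stack block boxes vertically; each one takes the full available width."""
    boxes = []
    for name, height in blocks:
        boxes.append({
            "name": name,
            "x": x,
            "y": y,
            "width": viewport_width,
            "height": height,
        })
        y += height  # the next block starts where this one ends
    return boxes


boxes = layout_blocks([("h1", 40), ("div", 100), ("p", 20)], viewport_width=800)
for box in boxes:
    print(box)

# If the h1 grows, every block after it gets a new y position.
# This is why a size change triggers layout (reflow) again.
taller = layout_blocks([("h1", 60), ("div", 100), ("p", 20)], viewport_width=800)
```

Changing only a color would not move any of these boxes, which is why such a change can skip Layout and go straight to Paint.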
Parsing
Parsing means breaking something into smaller, meaningful parts. In browsers, parsing converts plain HTML text into structured objects that the browser can work with.
Math expression:
2 + 3 * 4
Parsed structure (not calculation):
  +
 / \
2   *
   / \
  3   4
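The same tree can be produced by a tiny parser. The Python sketch below is a minimal example that assumes only whole numbers, + and * (no brackets, no error handling), and the function names are made up for it. It builds the structure without calculating the result.

```python
import re


def tokenize(text):
    """Split the text into number and operator tokens."""
    return re.findall(r"\d+|[+*]", text)


def parse_expression(tokens):
    # expression := term ('+' term)*
    node = parse_term(tokens)
    while tokens and tokens[0] == "+":
        tokens.pop(0)
        node = ("+", node, parse_term(tokens))
    return node


def parse_term(tokens):
    # term := number ('*' number)*
    node = int(tokens.pop(0))
    while tokens and tokens[0] == "*":
        tokens.pop(0)
        node = ("*", node, int(tokens.pop(0)))
    return node


tree = parse_expression(tokenize("2 + 3 * 4"))
print(tree)  # ('+', 2, ('*', 3, 4))
```

Because terms are parsed before the + is handled, the multiplication ends up deeper in the tree, just like in the diagram.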
HTML parsing helps the browser:
Understand the hierarchy and nesting of tags
Build the DOM (structure model)
Read the tags, attributes and text




