Modern Browser Architecture
Before introducing the rendering pipeline, we first need some background on Chromium's browser architecture and its process model.
Two Formulas
Formula 1: Browser = Browser Engine + Services
- Safari = WebKit + Other components, libraries, services
- Chrome = Chromium + Google service integration
- Microsoft Edge (Chromium) = Chromium + Microsoft service integration
- Yandex Browser = Chromium + Yandex service integration
- 360 Safe Browser = Trident + Chromium + 360 service integration
- Chromium = Blink + V8 + Other components, libraries, services
Formula 2: Engine = Rendering Engine + JavaScript Engine + Others
Browser | Rendering Engine | JavaScript Engine |
---|---|---|
Internet Explorer | Trident (MSHTML) | JScript/Chakra |
Microsoft Edge | EdgeHTML → Blink | Chakra → V8 |
Firefox | Gecko | SpiderMonkey |
Safari | KHTML → WebKit | JavaScriptCore |
Chrome | WebKit → Blink | V8 |
Opera | Presto → WebKit → Blink | Carakan → V8 |
Here we can see that, except for Firefox and the now-defunct IE, most browsers on the market have evolved towards the Blink + V8 or WebKit + JavaScriptCore route.
Rendering Engine
The rendering engine is responsible for parsing HTML and CSS, coordinating JavaScript execution, and rendering pages.
Taking Firefox as an example, its classic architecture consists of the following components:
- Document parser (handles HTML and XML)
- Layout engine with content model
- Style system (handles CSS, etc.)
- JavaScript runtime (SpiderMonkey)
- Image library
- Networking library (Necko)
- Platform-specific graphics rendering and widget sets for Win32, X, and Mac
- User preferences library
- Mozilla Plug-in API (NPAPI) to support the Navigator plug-in interface
- Open Java Interface (OJI), with Sun Java 1.2 JVM
- RDF back end
- Font library
- Security library (NSS)
Next, let's look at the development history of WebKit.
Apple forked KHTML in 2001 to create WebKit, the engine behind Safari. In 2008, Google released Chromium, which was built on WebKit, so Chrome's rendering engine was WebKit at the time. In 2010, Apple restructured WebKit into WebKit2, the multi-process architecture that today underlies WKWebView and Safari. In 2013, Google forked WebKit to create its own rendering engine, Blink, which is now Chromium's rendering engine. Because of this shared history (and the copyright notices that the open-source licenses require to be kept), many traces of Apple and WebKit can still be found in the Blink source code today.
The evolutionary path of WebKit is roughly illustrated in the following diagram:
According to the test reports from Web Platform Tests, the compatibility of the Chromium rendering engine is also excellent:
JavaScript Engine
In a browser, the JavaScript engine is usually embedded in the rendering engine as a built-in module, but it is highly self-contained and can be ported elsewhere as a standalone engine.
Here are a few well-known JavaScript engines in the industry:
- SpiderMonkey: Mozilla's JavaScript engine, written in C/C++, serves as the JavaScript engine for Firefox.
- Rhino: Mozilla's open-source JavaScript engine, written in Java.
- Nashorn: The JavaScript engine built into Oracle's Java Development Kit (JDK) 8, written in Java.
- JavaScriptCore: The JavaScript engine built into WebKit and exposed to developers as a system framework. iOS apps can use JavaScriptCore directly without increasing their package size (but JIT is not available in this scenario).
- ChakraCore: Microsoft's open-source JavaScript engine. Since Microsoft has fully switched Edge to Chromium, every version of Edge except Edge on iOS (which uses JavaScriptCore) now uses V8 instead.
- V8: Google's open-source JavaScript engine, written in C++. It is the built-in JavaScript engine of Chromium (or more precisely, Blink) and of the Android system WebView (since Android WebView is also based on Chromium). Its performance is excellent: with JIT enabled it outperforms many other engines. Its ES syntax compatibility is also quite good (as the support table later shows).
- JerryScript: An open-source JavaScript engine from Samsung, used by IoT.js.
- Hermes: Facebook's open-source JavaScript engine, designed for hybrid UI frameworks such as React Native. It can load precompiled bytecode directly, which reduces JS loading time and improves TTI (Time to Interactive). Its bytecode is also optimized and supports incremental loading, making it friendlier to mid-range and low-end devices. However, because it is designed as an interpreter for glue code, it does not support JIT (mobile JS engines tend to limit JIT because it can significantly increase warm-up time, delaying the first screen, and it also increases package size and memory usage).
- QuickJS: Developed by Fabrice Bellard, the author of FFmpeg. It is very small (about 210 KB of x86 code for a simple program) and highly compatible with the ECMAScript standard. It compiles JavaScript directly to bytecode, supports importing native C modules, and performs well. Its startup time is very low (the full life cycle of a runtime instance completes in under 300 μs on a single core), its memory usage is low, and it manages memory with reference counting plus cycle removal. QuickJS is well suited to hybrid architectures, game scripting systems, and other embedded systems.
The performance of each engine is shown in the following diagram:
ECMAScript standard support status:
Chromium Process Model
Chromium has 5 types of processes:
- Browser Process: 1
- Utility Process: 1
- Viz Process: 1
- Plugin Process: multiple
- Render Process: multiple
Leaving aside the Plugin Process, which hosts plugins, the processes most closely related to rendering are the Browser Process, the Render Process, and the Viz Process. We will focus on these three.
Render Process
- Number: multiple
- Responsibilities: Handles rendering, animations, scrolling, input events, etc. for a single site within a single tab (note the special case of cross-site iframes).
- Threads:
  - Main thread × 1
  - Compositor thread × 1
  - Raster thread × 1
  - Worker thread × N
The area managed by the Render Process is WebContent:
Main thread
Responsibilities:
- Execute JavaScript
- Event Loop
- Document lifecycle
- Hit-testing
- Event scheduling
- Parsing data formats such as HTML, CSS, etc.
Compositor Thread
Responsibilities:
- Input Handler & Hit Tester
- Scrolling and animations within Web Content
- Calculating the optimal layering of Web Content
- Coordinating image decoding, drawing, and rasterization tasks (helpers)
The number of Compositor thread helpers depends on the number of CPU cores.
Browser Process
- Number: 1
- Responsibilities: Handles everything about the browser UI (excluding the WebContent area), including rendering, animations, routing, input events, etc.
- Threads:
  - Render & Compositing Thread
  - Render & Compositing Thread Helpers
Viz Process
- Number: 1
- Responsibilities: Receives the viz::CompositorFrames produced by the Render Processes and the Browser Process, aggregates them, and finally uses the GPU to display the aggregated result.
- Threads:
  - GPU main thread
  - Display Compositor Thread
Chromium's Process Models
- Process-per-site-instance: The default strategy in older versions. If a page opens a new page and the two belong to the same site (same registrable domain and scheme), the two pages share a Render Process.
- Process-per-site: All pages belonging to the same site share a single Render Process, even when they are opened independently.
- Process-per-tab: The default strategy in current versions, where each tab starts its own Render Process. Note, however, that cross-site iframes within a tab also start their own Render Process (Site Isolation). See the Example below.
- Single Process: Runs everything in one process; it is enabled with a startup flag (--single-process) and used for debugging.
Example:
Suppose there are 3 tabs open on foo.com, bar.com, and baz.com respectively. bar.com and baz.com contain no iframes, but foo.com does; its code is as follows:
```html
<html>
  <iframe id=one src="https://foo.com/other-url"></iframe>
  <iframe id=two src="https://bar.com"></iframe>
</html>
```
Then according to the Process-per-tab model, the final process model is illustrated in the following diagram:
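The allocation in this example can be reproduced with a small Python sketch. It is a hypothetical model of the policy as described above (one Render Process per tab, plus one for each cross-site iframe in that tab), not Chromium's real allocator; the helper `site_of` approximates a site as scheme plus the last two host labels, whereas Chromium consults the Public Suffix List.

```python
from urllib.parse import urlsplit


def site_of(url: str) -> str:
    """Approximate a site as scheme + last two host labels (simplification)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    return f"{parts.scheme}://" + ".".join(host.split(".")[-2:])


def assign_processes(tabs: dict) -> dict:
    """Map (tab main-frame URL, site) -> render process id.

    `tabs` maps each tab's main-frame URL to the list of its iframe URLs.
    Same-site frames inside one tab share a process; every other
    (tab, site) pair gets a new process.
    """
    processes = {}
    for main_url, iframe_urls in tabs.items():
        for url in [main_url, *iframe_urls]:
            key = (main_url, site_of(url))
            if key not in processes:
                processes[key] = len(processes) + 1
    return processes


tabs = {
    "https://foo.com": ["https://foo.com/other-url", "https://bar.com"],
    "https://bar.com": [],
    "https://baz.com": [],
}
for (tab, site), pid in assign_processes(tabs).items():
    print(f"process {pid}: tab {tab}, site {site}")
```

Under this model four processes result: foo.com's main frame and iframe one share process 1, iframe two gets process 2, and the bar.com and baz.com tabs get processes 3 and 4.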
Chromium Rendering Pipeline
Now that the background has been covered, let's move on to the core of this article: the Chromium rendering pipeline.
The rendering pipeline is the process of taking the bytes received from the network and progressively transforming them into pixels on the screen. It consists of the following 13 stages:
- Parsing
- Style
- Layout
- Pre-paint
- Paint
- Commit
- Compositing
- Tiling
- Raster
- Activate
- Draw
- Aggregate
- Display
After organizing the respective modules and process threads, the final pipeline is illustrated in the following diagram:
Next, we will look at each step one by one.
Note: This article is an Overview, so it aims for brevity and does not include source code, but will link to parts involving source code for readers to index and read themselves. Additionally, I have written more detailed process analysis articles for some steps, which will be linked at the beginning of the corresponding sections for interested readers to click and read in detail.
Parsing
For an in-depth look at Parsing, see the article series “Chromium Rendering Pipeline - Parsing”.
- Module: blink
- Process: Render Process
- Thread: Main thread
- Responsibilities: Parse the bytes sent from the Browser Process network thread, process them, and generate the DOM Tree
- Input: bytes
- Output: DOM Tree
The data flow for this step is: bytes → characters → tokens → nodes → object model (DOM Tree).
Breaking down each transition in this data flow gives the following 5 steps:
- Loading: Blink receives bytes from the network thread
- Conversion: HTMLParser converts bytes to characters
- Tokenizing: Converts characters to W3C standard tokens
- Lexing: Through lexical analysis, converts tokens to Element objects
- DOM construction: Uses the constructed Element objects to build the DOM Tree
Loading
Responsibilities: Blink receives bytes from the network thread.
Process:
- The Browser Process downloads the page content
- The content is passed to the Content module of the Render Process
- blink::DocumentLoader receives it
- blink::DocumentLoader hands it to blink::HTMLDocumentParser
Conversion
Responsibilities: Convert bytes to characters.
Core stack:
```
#0 0x00000002d2380488 in blink::HTMLDocumentParser::Append(WTF::String const&) at /Users/airing/Files/code/chromium/src/third_party/blink/renderer/core/html/parser/html_document_parser.cc:1037
#1 0x00000002cfec278c in blink::DecodedDataDocumentParser::UpdateDocument(WTF::String&) at /Users/airing/Files/code/chromium/src/third_party/blink/renderer/core/dom/decoded_data_document_parser.cc:98
#2 0x00000002cfec268c in blink::DecodedDataDocumentParser::AppendBytes(char const*, unsigned long) at /Users/airing/Files/code/chromium/src/third_party/blink/renderer/core/dom/decoded_data_document_parser.cc:71
#3 0x00000002d2382778 in blink::HTMLDocumentParser::AppendBytes(char const*, unsigned long) at /Users/airing/Files/code/chromium/src/third_party/blink/renderer/core/html/parser/html_document_parser.cc:1351
```
Tokenizing
Responsibilities: Convert characters to tokens.
Core functions:
Note that during this step, when link, script, or img tags are parsed, network requests for their resources are issued. When a script tag without async or defer is parsed, HTML parsing must pause until the script has been fetched and executed, because JavaScript may change the structure of the DOM tree (e.g., via document.write()).
Lexing
Responsibilities: Convert tokens to Elements.
Core functions:
Note that during this step, a stack (the stack of open elements) stores Nodes (HTML tags) for the subsequent construction of the DOM Tree. For example, for tokens of type HTMLToken::StartTag, ProcessStartTag is called to perform a push, while for tokens of type HTMLToken::EndTag, ProcessEndTag is called to perform a pop.
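For the following HTML:
```html
<div>
  <p>
    <div></div>
  </p>
  <span></span>
</div>
```
(Strictly speaking, the HTML parsing rules make a <div> start tag implicitly close an open <p>, so a real parser produces a slightly different tree; the illustration below focuses only on the push/pop mechanics.)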
The push and pop process for each Node is as follows:
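A naive version of this push/pop bookkeeping can be sketched in Python. The `(kind, tag)` token tuples are a made-up stand-in for Blink's HTMLToken, and the sketch ignores the HTML standard's error-recovery rules, so it simply mirrors the nesting of the source.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str
    children: list = field(default_factory=list)


def build_tree(tokens):
    """Build a tree from ("start" | "end", tag) tokens with a stack of open elements."""
    root = Node("#document")
    stack = [root]  # the top of the stack is the current insertion point
    log = []
    for kind, tag in tokens:
        if kind == "start":
            node = Node(tag)
            stack[-1].children.append(node)  # attach to the current open element
            stack.append(node)               # push: the new node becomes current
            log.append(f"push <{tag}>")
        else:
            if stack[-1].tag != tag:
                raise ValueError(f"unexpected </{tag}>")
            stack.pop()                      # pop: return to the parent
            log.append(f"pop </{tag}>")
    return root, log


tokens = [("start", "div"), ("start", "p"), ("start", "div"), ("end", "div"),
          ("end", "p"), ("start", "span"), ("end", "span"), ("end", "div")]
root, log = build_tree(tokens)
print(log)
```

Each start tag attaches to whatever is on top of the stack, which is why the stack alone is enough to recover parent/child relationships.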
DOM construction
Responsibilities: Instantiate Elements into the DOM Tree.
The final data structure of the DOM Tree can be previewed from blink::TreeScope:
We can use DevTools to view the Parsing process of the page:
However, this flame graph does not show the C++ call stacks. To dig into the engine side, you can record and analyze the page with Perfetto: it shows the C++ stacks and the thread each call runs on, and it also links the sending and receiving functions of inter-process messages.
After analyzing Parsing, we can improve our flowchart:
Style
- Module: blink
- Process: Render Process
- Thread: Main thread
- Responsibilities: The Style Engine traverses the DOM and matches it against the CSSOM, performing style resolution and style recalculation (recalc) to construct the Render Tree
- Input: DOM Tree
- Output: Render Tree
The Render Tree consists of Render Objects, each corresponding to a DOM node, which will have ComputedStyle (computed style) information attached to it.
ComputedStyle can be viewed directly through DevTools, which is often used during CSS debugging.
Core function: Document::UpdateStyleAndLayout (the Layout part can be ignored for now)
The logic of this function is illustrated in the following diagram; we refer to this step of generating ComputedStyle as style recalc:
The complete Style process is illustrated in the following diagram:
We can break it down into 3 steps:
- CSS loading
- CSS parsing
- CSS calculation
CSS Loading
Core log printout:
```
[DocumentLoader.cpp(558)] "<!DOCType html>\n<html>\n<head>\n<link rel=\"stylesheet\" href=\"demo.css\"> \n</head>\n<body>\n<div class=\"text\">\n <p>hello, world</p>\n</div>\n</body>\n</html>\n"
[HTMLDocumentParser.cpp(765)] "tagName: html |type: DOCTYPE|attr: |text: "
[HTMLDocumentParser.cpp(765)] "tagName: |type: Character |attr: |text: \n"
[HTMLDocumentParser.cpp(765)] "tagName: html |type: startTag |attr: |text: "
…
[HTMLDocumentParser.cpp(765)] "tagName: html |type: EndTag |attr: |text: "
[HTMLDocumentParser.cpp(765)] "tagName: |type: EndOfFile|attr: |text: "
[Document.cpp(1231)] readystatechange to Interactive
[CSSParserImpl.cpp(217)] received and parsing stylesheet: ".text{\n font-size: 20px;\n}\n.text p{\n color: #505050;\n}\n"
```
It is important to note that after the DOM is constructed, the page is not rendered immediately; it must wait for the CSS to be processed. This is because style recalc and the subsequent stages can only run once the CSS has loaded, and rendering an unstyled DOM would be pointless.
"The browser blocks rendering until it has both the DOM and the CSSOM." (Render Blocking CSS)
CSS Parsing
The data flow involved in CSS parsing is: bytes → characters → tokens → StyleRule → RuleMap. The processing of bytes has been discussed earlier, so we will focus on the subsequent processes.
First: characters → tokens.
The tokens involved in CSS are shown in the following diagram:
Note that FunctionTokens require extra computation. For example, Blink stores colors as RGBA32 (see CSSColor::Create). In my micro-benchmarks, parsing a hex color into RGBA32 was about 15% faster than parsing rgb().
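One plausible reason hex is cheaper: it is a single token converted in one step, whereas rgb() is a FunctionToken whose arguments must each be split out and converted. A minimal Python sketch of the two paths, assuming a 0xAARRGGBB packing and handling only the #rrggbb form and integer rgb() arguments (this is not Blink's CSSColor::Create):

```python
def pack_argb(r: int, g: int, b: int, a: int = 255) -> int:
    """Pack 8-bit channels into one 32-bit 0xAARRGGBB value (assumed layout)."""
    return (a << 24) | (r << 16) | (g << 8) | b


def parse_hex(color: str) -> int:
    # "#rrggbb": a single base-16 conversion yields all three channels at once.
    return (0xFF << 24) | int(color[1:], 16)


def parse_rgb_function(color: str) -> int:
    # "rgb(r, g, b)": strip the function name, then split and convert each argument.
    args = color.strip()[len("rgb("):-1].split(",")
    r, g, b = (int(arg) for arg in args)
    return pack_argb(r, g, b)


print(hex(parse_hex("#505050")), hex(parse_rgb_function("rgb(80, 80, 80)")))
```

Both calls produce the same packed value; only the amount of work to get there differs.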
The second step is: tokens → StyleRule.
A StyleRule = selectors + property set.
It is worth noting that selectors are stored, and later matched, from right to left: the rightmost compound selector comes first, and the selectors to its left are chained behind it.
For example, for this CSS:
```css
.text .hello {
  color: rgb(200, 200, 200);
  width: calc(100% - 20px);
}
#world {
  margin: 20px;
}
```
The parsing results are as follows:
```
selector text = ".text .hello"
  value = "hello"  matchType = "Class"  relation = "Descendant"
  tag history selector text = ".text"
    value = "text"  matchType = "Class"  relation = "SubSelector"
selector text = "#world"
  value = "world"  matchType = "Id"  relation = "SubSelector"
```
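Storing the rightmost compound first fits how matching works: the engine first checks the element itself against .hello, which rejects most elements immediately, and only then walks up the ancestors looking for .text. A minimal Python sketch under simplifying assumptions (a toy Element class, class selectors only, descendant combinators only; Blink's CSSSelector handles far more):

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Element:
    classes: set
    parent: Element | None = None


def matches(element: Element, chain: list) -> bool:
    """Right-to-left match of a descendant-only class selector.

    `chain` lists class selectors rightmost first, mirroring the parsed form
    above: ".text .hello" becomes [".hello", ".text"].
    """
    rightmost, *ancestors = chain
    if rightmost.lstrip(".") not in element.classes:
        return False  # cheap rejection: most elements fail right here
    node = element.parent
    for selector in ancestors:
        # Walk up until some ancestor carries this class.
        while node is not None and selector.lstrip(".") not in node.classes:
            node = node.parent
        if node is None:
            return False
        node = node.parent  # the next selector must match further up
    return True


text_div = Element({"text"})
wrapper = Element({"wrapper"}, text_div)
hello = Element({"hello"}, wrapper)
print(matches(hello, [".hello", ".text"]))     # True
print(matches(text_div, [".hello", ".text"]))  # False
```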
Additionally, Blink applies its default styles in a fixed order: html.css (default styles) → quirks.css (quirks-mode styles) → android/linux/mac.css (platform-specific styles) → other.css (styles for other features).
For more built-in CSS loading order, refer to blink_resources.grd configuration.
Finally: StyleRule → RuleMap.
All StyleRules are stored in different maps according to the type of their rightmost selector. This lets the engine quickly retrieve every rule whose first (rightmost) selector can match the element being styled, after which each rule checks whether its next selector also matches.
- RuleMap id_rules_: RuleMap for id selectors
- RuleMap class_rules_: RuleMap for class selectors
- RuleMap attr_rules_: RuleMap for attribute selectors
- RuleMap tag_rules_: RuleMap for tag selectors
- RuleMap ua_shadow_pseudo_element_rules_: RuleMap for UA shadow pseudo-element selectors
Recommended reading: blink/renderer/core/css/rule_set.h
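The bucketing can be sketched in Python. The bucket names echo the fields above, but the class itself, the catch-all universal bucket, and the simplification that each rule is keyed by a single rightmost simple selector are assumptions for illustration:

```python
from __future__ import annotations

from collections import defaultdict


class RuleSet:
    """Toy index of rules bucketed by their rightmost simple selector."""

    def __init__(self) -> None:
        self.id_rules = defaultdict(list)
        self.class_rules = defaultdict(list)
        self.tag_rules = defaultdict(list)
        self.universal_rules = []  # hypothetical catch-all bucket

    def add(self, rightmost: str, rule: str) -> None:
        # `rightmost` is a single simple selector such as "#world", ".hello" or "p".
        if rightmost.startswith("#"):
            self.id_rules[rightmost[1:]].append(rule)
        elif rightmost.startswith("."):
            self.class_rules[rightmost[1:]].append(rule)
        elif rightmost.isalpha():
            self.tag_rules[rightmost].append(rule)
        else:
            self.universal_rules.append(rule)

    def candidates(self, tag: str, elem_id: str | None, classes: set) -> list:
        """Rules worth a full right-to-left match; all other rules are skipped."""
        found = list(self.universal_rules)
        if elem_id is not None:
            found += self.id_rules.get(elem_id, [])
        for cls in sorted(classes):  # sorted only to keep the order deterministic
            found += self.class_rules.get(cls, [])
        found += self.tag_rules.get(tag, [])
        return found


rules = RuleSet()
rules.add(".hello", ".text .hello { color }")
rules.add("#world", "#world { margin }")
rules.add("p", ".text p { color }")
print(rules.candidates("p", None, {"intro"}))
```

A <p class="intro"> only has to be checked against ".text p"; the .hello and #world rules are never looked at.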
CSS Calculation
- Output: ComputedStyle
Why compute styles? Because multiple selectors may match the same DOM node, and the node also needs to inherit properties from its parent and receive the defaults provided by the UA.
Steps:
- Find the matching selectors
- Set styles
It is important to note the priority order for applying styles:
- Cascade layer order
- Selector specificity
- Scope proximity
- Declaration order
Source code: ElementRuleCollector::CompareRules:
It is commonly said that style priority comes from adding up selector specificity, but specificity is only the second criterion in this list, and it is compared component by component (ids, then classes, then types) rather than summed. If the first three criteria are identical, the winner depends on declaration order: the later declaration takes priority.
As shown in the figure:
In this case, whether the h1's class attribute is written as main-heading2 main-heading or in the reverse order, the title is blue, because .main-heading2 is declared later in the stylesheet and therefore wins.
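The comparison order above can be sketched as a Python sort key. The field names are hypothetical, the sketch ignores origin and !important, and the "red" value is made up for the example; only the fact that .main-heading2 is declared later and renders blue comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchedRule:
    selector: str
    value: str
    layer_order: int    # higher wins (later layers; unlayered styles ranked highest)
    specificity: tuple  # (ids, classes/attributes/pseudo-classes, types)
    proximity: int      # @scope hops to the scoping root; fewer = closer
    position: int       # order of appearance across the style sheets


def cascade_key(rule: MatchedRule) -> tuple:
    """Sort key in the order listed above; the largest key wins.

    Specificity is a tuple, so Python compares it component by component
    instead of adding the components up. Proximity is negated because the
    closer scope should win.
    """
    return (rule.layer_order, rule.specificity, -rule.proximity, rule.position)


def winning_value(rules: list) -> str:
    return max(rules, key=cascade_key).value


h1_rules = [
    MatchedRule(".main-heading", "red", 0, (0, 1, 0), 0, 1),
    MatchedRule(".main-heading2", "blue", 0, (0, 1, 0), 0, 2),
]
print(winning_value(h1_rules))  # blue: tie on everything except position
```

Because specificity is compared as a tuple, one id selector (1, 0, 0) beats any number of class selectors such as (0, 11, 0), which a plain sum would get wrong.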
Layout
- Module: blink
- Process: Render Process
- Thread: Main thread
- Responsibilities: Handle the geometric properties of Elements, namely position and size
- Input: Render Tree
- Output: Layout Tree
Layout Objects record the geometric properties of Render Objects.
A LayoutObject has an attached LayoutRect property, including:
- x
- y
- width
- height
However, it is important to note that LayoutObjects and DOM Nodes do not have a 1:1 relationship, as shown in the following diagram:
The core function of the Layout stage is Document::UpdateStyleAndLayout. After this step, the DOM tree becomes the Layout Tree. Take the following code as an example:
```html
<div style="max-width: 100px">
  <div style="float: left; padding: 1ex">F</div>
  <br>The <b>quick brown</b> fox
  <div style="margin: -60px 0 0 80px">jumps</div>
</div>
```
Each LayoutObject node records position and size information:
We know that avoiding Layout (reflow) improves page performance. So how can we reduce reflows? The main idea is to batch multiple changes into a single reflow before the result is fed back to the render tree. Specific measures include:
- Change the className instead of individual style properties → several style changes are applied together rather than one at a time
- Take frequently reflowing elements "offline" (for example, hide them with display: none or modify a detached DocumentFragment, then put the result back)
- Replace properties that trigger reflow (for example, animate transform instead of top/left)
- Confine the impact range of reflow to a separate layer
To find out which CSS properties trigger Layout (reflow), Paint (repaint), and Composite, both on the first change and on subsequent changes, refer to CSS Triggers:
We can see that each browser engine handles properties differently. If performance optimization is needed, we can refer to this table to see if there are any CSS properties that can be optimized.
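Batching works because layout is computed lazily: writes only mark the tree dirty, and layout runs when a geometry read (such as offsetWidth) or the next frame needs it. A toy Python model, hypothetical and unrelated to Blink's actual classes, counts how many layouts interleaved reads and writes (layout thrashing) cost compared with batching them:

```python
class ToyDocument:
    """Toy model: style writes only mark layout dirty; geometry reads force it."""

    def __init__(self) -> None:
        self.widths = {}
        self.layout_dirty = False
        self.layout_count = 0

    def _ensure_layout(self) -> None:
        if self.layout_dirty:
            self.layout_count += 1  # stands in for a full reflow
            self.layout_dirty = False

    def set_width(self, element: str, width: int) -> None:
        self.widths[element] = width
        self.layout_dirty = True  # cheap: just invalidate

    def offset_width(self, element: str) -> int:
        self._ensure_layout()  # a read needs up-to-date geometry
        return self.widths.get(element, 0)

    def render_frame(self) -> None:
        self._ensure_layout()  # the next frame lays out whatever is still dirty


def interleaved(doc: ToyDocument, elements: list) -> None:
    for e in elements:
        doc.set_width(e, doc.offset_width(e) + 10)  # read, write, read, write...


def batched(doc: ToyDocument, elements: list) -> None:
    current = [doc.offset_width(e) for e in elements]  # all reads first
    for e, width in zip(elements, current):
        doc.set_width(e, width + 10)                    # then all writes


def count_layouts(strategy) -> int:
    doc = ToyDocument()
    for e in ("a", "b", "c"):
        doc.set_width(e, 100)
    strategy(doc, ["a", "b", "c"])
    doc.render_frame()
    return doc.layout_count


print(count_layouts(interleaved), count_layouts(batched))  # 4 2
```

With three elements, interleaving forces a layout on every read after a write, while batching needs one layout for the reads and one for the frame.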
Pre-paint
- Module: blink
- Process: Render Process
- Thread: Main thread
- Responsibilities: Generate Property trees for use by the Compositor thread, avoiding redundant Raster of certain resources
- Input: Layout Tree
- Output: Property Tree
Based on the property tree, Chromium can operate on a specific node's transformations, clipping, effects, and scrolling without affecting its child nodes.
Core functions:
Newer versions of Chromium have switched to CAP (Composite After Paint) mode.
Property trees include the following four trees: the transform, clip, effect, and scroll trees.
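A property tree stores each node's parent as an index, so updating one node (for example, a transform animation) only changes that node, and anything that needs screen-space values recomputes them by walking up the tree. A minimal Python sketch of a transform tree, assuming 2D translation plus uniform scale only (real transform nodes hold full 4x4 matrices):

```python
from dataclasses import dataclass


@dataclass
class TransformNode:
    parent: int         # index of the parent node; -1 for the root
    tx: float = 0.0     # local translation
    ty: float = 0.0
    scale: float = 1.0  # local uniform scale


def to_screen(tree: list, node_id: int, x: float, y: float) -> tuple:
    """Map a point from node_id's local space to screen space."""
    while node_id != -1:
        node = tree[node_id]
        # Apply this node's local transform, then continue with its parent.
        x, y = x * node.scale + node.tx, y * node.scale + node.ty
        node_id = node.parent
    return x, y


tree = [
    TransformNode(-1),                     # root (viewport)
    TransformNode(0, tx=100.0),            # e.g. an animated element
    TransformNode(1, ty=50.0, scale=2.0),  # a child of the animated element
]
print(to_screen(tree, 2, 10.0, 10.0))  # (120.0, 70.0)
tree[1].tx = 130.0                     # animation tick: update one node only
print(to_screen(tree, 2, 10.0, 10.0))  # (150.0, 70.0)
```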
Paint
- Module: blink
- Process: Render Process
- Thread: Main thread
- Responsibilities: Blink interfaces with the cc drawing interface to perform Paint, generating data sources for the cc module cc::Layer
- Input: Layout Object
- Output: PaintLayer (cc::Layer)
Note: cc = content collator (content arranger), not Chromium Compositor.
Core functions:
The Paint stage converts Layout Objects in the Layout Tree into drawing instructions and encapsulates these operations in cc::DisplayItemList, which are then injected into cc::PictureLayer.
The process of generating the display item list is also a stack structure traversal:
For example, for the following HTML:
```html
<style>
  #p {
    position: absolute; padding: 2px;
    width: 50px; height: 20px;
    left: 25px; top: 25px;
    border: 4px solid purple;
    background-color: lightgrey;
  }
</style>
<div id=p> pixels </div>
```
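Recording for the #p box above can be sketched in Python. The op names and tuple types are hypothetical (Blink records Skia paint operations into a cc::DisplayItemList), the text color is assumed to be the default black, and left/top are taken as page coordinates. The point is that Paint only records commands and rectangles in paint order; no pixels are produced until Raster.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: int
    y: int
    w: int
    h: int


class DisplayItem(NamedTuple):
    op: str          # hypothetical op name
    rect: Rect
    color: str
    stroke: int = 0  # border width for "stroke_rect" items


def paint_box(left, top, width, height, border, padding,
              background, border_color, text_color):
    """Record, without executing, the drawing commands for one box in paint order."""
    inset = border + padding
    border_box = Rect(left, top, width + 2 * inset, height + 2 * inset)
    content_box = Rect(left + inset, top + inset, width, height)
    return [
        DisplayItem("fill_rect", border_box, background),              # background
        DisplayItem("stroke_rect", border_box, border_color, border),  # border on top
        DisplayItem("draw_text", content_box, text_color),             # " pixels " text
    ]


items = paint_box(left=25, top=25, width=50, height=20, border=4, padding=2,
                  background="lightgrey", border_color="purple", text_color="black")
for item in items:
    print(item)
```

The border box comes out as 62x32 at (25, 25), and the text is positioned at the content box origin (31, 31).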
The corresponding generated display items are shown in the following diagram:
Finally, let's introduce cc::Layer. cc::Layer lives on the main thread, and there is only one cc::Layer tree per Render Process.
A cc::Layer represents a UI within a rectangular area, and the following subclasses represent different types of UI data:
- cc::PictureLayer: Used to implement self-drawing UI components, allowing external implementations through the cc::ContentLayerClient interface to provide a cc::DisplayItemList object, which represents a list of drawing operations. After passing through the cc pipeline, it is converted into one or more viz::TileDrawQuad stored in viz::CompositorFrame.
- cc::TextureLayer: Corresponds to viz's viz::TextureDrawQuad; any UI component that wants to use its own logic for Raster can use this layer, such as Flash plugins, WebGL, etc.
- cc::UIResourceLayer/cc::NinePatchLayer: Similar to TextureLayer, used for software rendering.
- cc::SurfaceLayer/cc::VideoLayer (deprecated): Corresponds to viz's viz::SurfaceDrawQuad, used to embed other CompositorFrames. Blink's iframes and video players can use this layer for implementation.
- cc::SolidColorLayer: Used to display pure color UI components.
Commit
- Module: cc
- Process: Render Process
- Thread: Compositor thread
- Responsibilities: Submit the data produced during the Paint stage (cc::Layer) to the Compositor thread
- Input: cc::Layer (main thread)
- Output: LayerImpl (compositor thread)
Core function: PushPropertiesTo
The core logic is to commit the data of LayerTreeHost to LayerTreeHostImpl. We can set a breakpoint where we receive the Commit message, and the stack is as follows:
```
libcc.so!cc::PictureLayer::PushPropertiesTo(cc::PictureLayer * this, cc::PictureLayerImpl * base_layer)
libcc.so!cc::PushLayerPropertiesInternal<std::__Cr::__wrap_iter<cc::Layer**> >(std::__Cr::__wrap_iter<cc::Layer**> source_layers_begin, std::__Cr::__wrap_iter<cc::Layer**> source_layers_end, cc::LayerTreeHost * host_tree, cc::LayerTreeImpl * target_impl_tree)
libcc.so!cc::TreeSynchronizer::PushLayerProperties(cc::LayerTreeHost * host_tree, cc::LayerTreeImpl * impl_tree)
libcc.so!cc::LayerTreeHost::FinishCommitOnImplThread(cc::LayerTreeHost * this, cc::LayerTreeHostImpl * host_impl)
libcc.so!cc::SingleThreadProxy::DoCommit(cc::SingleThreadProxy * this)
libcc.so!cc::SingleThreadProxy::ScheduledActionCommit(cc::SingleThreadProxy * this)
libcc.so!cc::Scheduler::ProcessScheduledActions(cc::Scheduler * this)
libcc.so!cc::Scheduler::NotifyReadyToCommit(cc::Scheduler * this, std::__Cr::unique_ptr<cc::BeginMainFrameMetrics, std::__Cr::default_delete<cc::BeginMainFrameMetrics> > details)
libcc.so!cc::SingleThreadProxy::DoPainting
libcc.so!cc::SingleThreadProxy::BeginMainFrame(cc::SingleThreadProxy * this, const viz::BeginFrameArgs & begin_frame_args)
```
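The idea behind PushPropertiesTo can be sketched in Python: a main-thread layer flags itself when a property changes, and commit copies only flagged layers into their compositor-thread twins. The classes, fields, and the needs_push flag are hypothetical stand-ins for cc::Layer and cc::LayerImpl.

```python
from dataclasses import dataclass


@dataclass
class LayerImpl:  # compositor-thread copy (stand-in for cc::LayerImpl)
    layer_id: int
    bounds: tuple = (0, 0)
    opacity: float = 1.0


@dataclass
class Layer:  # main-thread layer (stand-in for cc::Layer)
    layer_id: int
    bounds: tuple = (0, 0)
    opacity: float = 1.0
    needs_push: bool = True  # a new layer must be pushed at least once

    def set_opacity(self, value: float) -> None:
        if value != self.opacity:
            self.opacity = value
            self.needs_push = True  # the impl-side copy is now stale

    def push_properties_to(self, impl: LayerImpl) -> None:
        impl.bounds = self.bounds
        impl.opacity = self.opacity
        self.needs_push = False


def commit(layers: list, impl_tree: dict) -> int:
    """Synchronize the impl tree with the main-thread layers; return how many were pushed."""
    pushed = 0
    for layer in layers:
        impl = impl_tree.get(layer.layer_id)
        if impl is None:  # first commit of this layer: create its twin
            impl = impl_tree[layer.layer_id] = LayerImpl(layer.layer_id)
        if layer.needs_push:
            layer.push_properties_to(impl)
            pushed += 1
    return pushed


layers = [Layer(1, (800, 600)), Layer(2, (100, 100))]
impl_tree = {}
print(commit(layers, impl_tree))  # 2: first commit pushes everything
print(commit(layers, impl_tree))  # 0: nothing changed
layers[1].set_opacity(0.5)
print(commit(layers, impl_tree))  # 1: only the changed layer
```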
Compositing
- Module: cc
- Process: Render Process
- Thread: Compositor thread
- Responsibilities: Divide the entire page into multiple independent layers according to certain rules to facilitate isolation of updates
- Input: PaintLayer(cc::Layer)
- Output: GraphicsLayer
Core function:
Why is a Compositor thread needed? Suppose this step were skipped and the output of Paint were rasterized directly to the screen. What would happen?
If the data needed for Raster were not ready by the time the vertical sync (VSync) signal arrived, frames would be dropped, causing jank.
Of course, to avoid jank, Chromium also applies the standard optimization at each stage: caching. As shown in the following diagram, caching is applied in the Style, Layout, Paint, and Raster stages to skip unnecessary work and thus reduce the likelihood of jank:
However, even with all these caches, a simple scroll can force every pixel to be re-painted and re-rastered!
The layers produced by the Compositing stage let Chromium re-render only the layers that actually need it, while the other layers merely take part in composition, which improves rendering efficiency:
As shown in the following diagram:
If the element with the wobble class has a transform animation, that div becomes an independent GraphicsLayer, and the animation only needs to update this layer.
We can also use the Layers tool in DevTools to view all layers. It shows why each layer was created, how much memory it uses, and how many times it has been painted so far, which helps us optimize memory and rendering efficiency.
This also explains why CSS animations perform so well: thanks to the Compositor thread and Property Trees, animations of compositor-friendly properties such as transform and opacity can be handled entirely on the Compositor thread. We can also use will-change to tell the browser in advance to promote an element to its own layer. However, this is not a universal fix, as each layer consumes a certain amount of memory.
The Compositor thread can also handle input events; as shown in the following diagram, it listens for various events from the Browser Process:
However, note that if the event lands in a region where an event listener has been registered in JavaScript, the Compositor thread must forward the input event to the main thread for processing.
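The routing decision can be sketched in Python. Each rectangle stands for a page region covered by a JavaScript event listener (Chromium tracks regions like this, for example the non-fast-scrollable region); the function and its two possible results are hypothetical simplifications.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: int
    y: int
    w: int
    h: int

    def contains(self, px: int, py: int) -> bool:
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h


def route_input(px: int, py: int, listener_regions: list) -> str:
    """Decide which thread handles an input event at (px, py)."""
    if any(region.contains(px, py) for region in listener_regions):
        # A JS listener might call preventDefault(), so the main thread must see it.
        return "main"
    # No listener involved: e.g. the scroll can be handled on the compositor alone.
    return "compositor"


regions = [Rect(0, 0, 100, 50)]
print(route_input(10, 10, regions), route_input(10, 60, regions))  # main compositor
```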
Tiling
- Module: cc
- Process: Render Process
- Thread: Compositor thread
- Responsibilities: Split a cc::PictureLayerImpl into multiple cc::TileTask tasks for processing at different scale levels and sizes.
- Input: LayerImpl (compositor thread)
- Output: cc::TileTask (raster thread)
Tiles are the basic working unit of Raster. In this stage, the Layer (LayerImpl) is split into tiles. After the Commit completes, tile tasks (cc::RasterTaskImpl) are created as needed and posted to the Raster thread for execution.
Core function: PrepareTiles
Recommended reading: cc/tiles/tile_manager.h
This step mainly submits cc::TileTask tasks to the raster thread for tiled rendering. Tiled rendering divides a layer's content into small blocks, usually 256x256 or 512x512 pixels, and renders them block by block.
The necessity of tiled rendering is reflected in the following two aspects:
- GPU composition is usually implemented with OpenGL ES textures, so the cache here is actually a texture (GL texture). Many GPUs limit texture size, so a cache of arbitrary size cannot be supported.
- Tiled caches let the browser manage memory through a unified buffer pool. Small tile buffers are shared by all WebViews: opening a page requests tiles from the pool, and closing the page returns them.
If the layering in the previous stage improves rendering efficiency at the macro level, tiling improves it at the micro level.
Chromium's strategy for tiled rendering also includes the following optimizations:
- Prioritize tiles close to the viewport: Raster orders tiles by their distance from the visible viewport; closer tiles are rastered first, while distant ones are deprioritized.
- When tiles are first composited, a lower resolution is used to reduce the time spent on texture composition and upload.
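Both ideas, splitting a layer into fixed-size tiles and rastering the tiles nearest the viewport first, can be sketched in Python. The 256-pixel tile size is one of the sizes mentioned above; the distance metric (the larger of the horizontal and vertical gaps) and the function names are assumptions, not cc::TileManager's actual priority computation.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: int
    y: int
    w: int
    h: int


def make_tiles(layer_w: int, layer_h: int, tile: int = 256) -> list:
    """Cover the layer with tile x tile rects in row-major order; edge tiles are clipped."""
    return [Rect(x, y, min(tile, layer_w - x), min(tile, layer_h - y))
            for y in range(0, layer_h, tile)
            for x in range(0, layer_w, tile)]


def distance_to_viewport(t: Rect, vp: Rect) -> int:
    """Gap in pixels between a tile and the viewport (0 if they touch or overlap)."""
    dx = max(vp.x - (t.x + t.w), t.x - (vp.x + vp.w), 0)
    dy = max(vp.y - (t.y + t.h), t.y - (vp.y + vp.h), 0)
    return max(dx, dy)


def raster_order(tiles: list, viewport: Rect) -> list:
    # Closest tiles first; sorted() is stable, so ties keep row-major order.
    return sorted(tiles, key=lambda t: distance_to_viewport(t, viewport))


tiles = make_tiles(600, 1000)
viewport = Rect(0, 0, 600, 300)
for t in raster_order(tiles, viewport):
    print(t, distance_to_viewport(t, viewport))
```

For a 600x1000 layer this yields 12 tiles; the two rows overlapping the 300-pixel-tall viewport come first, and the bottom row comes last.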
We can set a breakpoint at the position where TileTask is submitted to see the complete stack of this step:
```
libcc.so!cc::SingleThreadTaskGraphRunner::ScheduleTasks(cc::TestTaskGraphRunner * this, cc::NamespaceToken token, cc::TaskGraph * graph)
libcc.so!cc::TileTaskManagerImpl::ScheduleTasks(cc::TileTaskManagerImpl * this, cc::TaskGraph * graph)
libcc.so!cc::TileManager::ScheduleTasks(cc::TileManager * this, cc::TileManager::PrioritizedWorkToSchedule work_to_schedule)
libcc.so!cc::TileManager::PrepareTiles(cc::TileManager * this, const cc::GlobalStateThatImpactsTilePriority & state)
libcc.so!cc::LayerTreeHostImpl::PrepareTiles(cc::LayerTreeHostImpl * this)
libcc.so!cc::LayerTreeHostImpl::NotifyPendingTreeFullyPainted(cc::LayerTreeHostImpl * this)
libcc.so!cc::LayerTreeHostImpl::UpdateSyncTreeAfterCommitOrImplSideInvalidation(cc::LayerTreeHostImpl * this)
libcc.so!cc::LayerTreeHostImpl::CommitComplete(cc::LayerTreeHostImpl * this)
libcc.so!cc::SingleThreadProxy::DoCommit(cc::SingleThreadProxy * this)
```
Raster
- Module: cc
- Process: Render Process
- Thread: Raster thread
- Responsibilities: The Raster stage executes each TileTask, ultimately producing a resource recorded in LayerImpl (cc::PictureLayerImpl). It plays back the drawing operations from the DisplayItemList into each tile's pixel buffer; the resulting resources are later referenced by viz's CompositorFrame.
- Input: cc::TileTask
- Output: LayerImpl (cc::PictureLayerImpl)
Recommended reading: cc/raster/
The resulting bitmaps of color values are usually stored in GPU memory (the GPU can also perform rasterization itself, i.e., hardware acceleration).
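Playback can be sketched in Python: each tile replays only the recorded items that intersect it, shifted into tile-local coordinates, into its own pixel buffer. Only a solid-fill op is supported, and the item type is a hypothetical simplification of what a cc::DisplayItemList holds.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: int
    y: int
    w: int
    h: int


class FillRect(NamedTuple):
    rect: Rect
    color: int  # any pixel value, e.g. a packed color


def raster_tile(items: list, tile: Rect, clear: int = 0) -> list:
    """Replay the display list into a tile.w x tile.h buffer (row-major)."""
    pixels = [[clear] * tile.w for _ in range(tile.h)]
    for item in items:
        r = item.rect
        # Intersect the item with the tile; empty ranges skip items outside it.
        x0, x1 = max(r.x, tile.x), min(r.x + r.w, tile.x + tile.w)
        y0, y1 = max(r.y, tile.y), min(r.y + r.h, tile.y + tile.h)
        for y in range(y0, y1):
            for x in range(x0, x1):
                # Later items paint over earlier ones, preserving paint order.
                pixels[y - tile.y][x - tile.x] = item.color
    return pixels


items = [FillRect(Rect(0, 0, 4, 4), 1), FillRect(Rect(2, 2, 4, 4), 2)]
for row in raster_tile(items, Rect(2, 0, 4, 4)):
    print(row)
```

Because every tile only touches its own buffer, different tiles can be rastered independently, for example on several raster worker threads.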
In addition, Raster also includes the ability to decode images:
The core class of Raster cc::RasterBufferProvider has several key subclasses:
- cc::GpuRasterBufferProvider: Uses GPU for Raster, and the results of Raster are stored directly in SharedImage.
- cc::OneCopyRasterBufferProvider: Uses Skia for Raster, saving the results first to GpuMemoryBuffer, and then copying the data from GpuMemoryBuffer to the resource's SharedImage using CopySubTexture.
- cc::ZeroCopyRasterBufferProvider: Uses Skia for Raster, saving the results to GpuMemoryBuffer, and then directly creating SharedImage from GpuMemoryBuffer.
- cc::BitmapRasterBufferProvider: Uses Skia for Raster, saving the results in shared memory.
GPU Shared Image
The SharedImage mechanism essentially abstracts the GPU's data storage capabilities, letting applications store data directly in GPU memory and read it back directly from the GPU, even across share-group boundaries. Earlier versions of Chromium used the Mailbox mechanism, but most modules have since been migrated to GPU Shared Image.
GPU Shared Image has a Client side and a Service side. There can be multiple clients, such as the Browser, Render, and GPU processes, but only one service, which runs in the GPU process. The architecture diagram is as follows:
Some scenarios where the SharedImage mechanism is used in Chromium include:
- CC module: First raster the scene to SharedImage, then send it to Viz for composition.
- OffscreenCanvas: First raster the content of the Canvas to SharedImage, then send it to Viz for composition.
- Image processing/rendering: One thread decodes the image into the GPU, while another thread uses the GPU to modify or render the image.
- Video playback: One thread decodes the video into the GPU, while another thread renders it.
Rasterization Strategy
Depending on whether the Compositor and Raster stages run synchronously (which does not necessarily mean on the same thread) or asynchronously, rasterization can be divided into synchronous rasterization and asynchronous rasterization. Asynchronous rasterization works tile by tile, so it is also called asynchronous tiled rasterization.
Synchronous rasterization is used by Android, iOS, and Flutter, and they also support additional pixel buffers for indirect rasterization.
The rendering pipeline for synchronous rasterization is simple, as shown in the following diagram:
Asynchronous rasterization is the strategy currently adopted by browsers and WebViews. Except for some special layers (such as Canvas and Video), every layer is rasterized in tiles: each raster task executes the drawing commands that fall within one tile of one layer and writes the result into that tile's pixel buffer. Moreover, rasterization and compositing run on different threads and are not synchronized, so if a tile has not finished rasterizing when a frame is composited, that area stays blank or is shown as a checkerboard pattern.
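The tiling and the checkerboard fallback can be sketched as two small functions: one finds the tiles of a layer that intersect the visible area, and one "composites" a frame from whichever of those tiles are already rasterized. This is an illustration under assumptions, not cc code: `RectSketch`, `TilesCovering`, `CompositeFrame`, and the fixed 256-pixel tile size are all hypothetical (real tile sizes vary by platform and configuration).

```cpp
#include <algorithm>
#include <set>
#include <utility>
#include <vector>

struct RectSketch { int x, y, w, h; };

// Hypothetical fixed tile size, for illustration only.
constexpr int kTileSize = 256;

using TileIndex = std::pair<int, int>;  // (column, row), relative to the layer origin

// Returns the tiles of `layer` that intersect `visible`, row by row.
std::vector<TileIndex> TilesCovering(const RectSketch& layer, const RectSketch& visible) {
  std::vector<TileIndex> tiles;
  const int left = std::max(layer.x, visible.x);
  const int top = std::max(layer.y, visible.y);
  const int right = std::min(layer.x + layer.w, visible.x + visible.w);
  const int bottom = std::min(layer.y + layer.h, visible.y + visible.h);
  if (left >= right || top >= bottom) return tiles;  // no overlap
  for (int row = (top - layer.y) / kTileSize; row <= (bottom - 1 - layer.y) / kTileSize; ++row) {
    for (int col = (left - layer.x) / kTileSize; col <= (right - 1 - layer.x) / kTileSize; ++col) {
      tiles.emplace_back(col, row);
    }
  }
  return tiles;
}

enum class TileContent { kRasterized, kCheckerboard };

// Compositor side: uses whatever tiles are ready and never waits for raster
// workers; tiles still in flight are shown as a checkerboard.
std::vector<TileContent> CompositeFrame(const std::vector<TileIndex>& needed,
                                        const std::set<TileIndex>& rasterized) {
  std::vector<TileContent> frame;
  frame.reserve(needed.size());
  for (const TileIndex& tile : needed) {
    frame.push_back(rasterized.count(tile) ? TileContent::kRasterized
                                           : TileContent::kCheckerboard);
  }
  return frame;
}
```

The key design point is that `CompositeFrame` takes the set of finished tiles as a snapshot instead of blocking on raster tasks, which is what keeps compositing smooth at the cost of occasional checkerboarding.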
Both rasterization strategies have their pros and cons, roughly summarized in the following table:
Aspect | Synchronous Rasterization | Asynchronous Rasterization |
---|---|---|
Memory Usage | Excellent | Poor |
First-Screen Performance | Good | Average |
Rendering Efficiency for Dynamic Content | High | Low |
Layer Animation | Average | Absolute Advantage for Inertial Animation |
Rasterization Performance | Slightly Weak on Low-end Machines | Good |
In terms of memory usage, synchronous rasterization has a clear advantage, while asynchronous rasterization consumes a lot of memory. Put another way, much of the browser engine's performance is bought at the cost of memory.
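To get a feel for why tiling is memory-hungry, the sketch below estimates the pixel storage for one layer that covers the viewport plus a pre-raster margin. The numbers rest on illustrative assumptions only: square 256-pixel tiles, 4 bytes per pixel (RGBA8888), and a caller-chosen margin. `EstimateTileMemory` is a hypothetical helper, and every additional composited layer would add its own tiles on top.

```cpp
#include <cstdint>

// Assumptions for illustration only: square tiles, 4 bytes per pixel (RGBA8888),
// and a pre-raster margin on every side of the viewport. Real tile sizes,
// pixel formats and margins vary by platform and configuration.
struct TileMemoryEstimate {
  std::int64_t tile_count;
  std::int64_t bytes;
};

constexpr std::int64_t CeilDiv(std::int64_t a, std::int64_t b) { return (a + b - 1) / b; }

TileMemoryEstimate EstimateTileMemory(std::int64_t viewport_w, std::int64_t viewport_h,
                                      std::int64_t margin_px, std::int64_t tile_size,
                                      std::int64_t bytes_per_pixel = 4) {
  // Partial tiles at the edges still occupy a whole tile's buffer.
  const std::int64_t cols = CeilDiv(viewport_w + 2 * margin_px, tile_size);
  const std::int64_t rows = CeilDiv(viewport_h + 2 * margin_px, tile_size);
  const std::int64_t tiles = cols * rows;
  return {tiles, tiles * tile_size * tile_size * bytes_per_pixel};
}
```

With these assumptions, a single viewport-sized layer on a 1080x1920 screen needs 40 tiles, or 10 MiB. Adding a 256-pixel margin on every side raises that to 70 tiles, or 17.5 MiB, partly because edge tiles are rounded up to whole tiles.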
In terms of first-screen performance, the synchronous rasterization pipeline is more streamlined and has no complex task scheduling, so it can put content on screen sooner. The gain is limited, though: in theory, synchronous rasterization finishes only one or two frames ahead of asynchronous rasterization, which may amount to just 20 milliseconds or so (assuming the asynchronous pipeline also loads its resources locally).
For dynamic content, if the page content changes constantly, most of the intermediate caches built by asynchronous rasterization are invalidated and must be re-rasterized. Because its pipeline is more streamlined, synchronous rasterization re-renders more efficiently in this case.
For layer animations, asynchronous rasterization has a decisive advantage. As mentioned earlier, property trees and compositing limit how many layers need to be re-rendered, which makes animation very efficient. Asynchronous rasterization does spend extra time on tiling, but the overhead is small, usually around 2 ms, and the more complex the page animation, the more pronounced the advantage becomes. For inertial scrolling, asynchronous rasterization pre-rasterizes areas outside the viewport to improve the experience. Synchronous rasterization has its own strengths here: at the application-code level, iOS, Android, and Flutter emphasize cell-level reuse to optimize scrolling.
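Pre-rasterizing outside the viewport during inertial scrolling can be sketched as computing an "interest rect": pad the viewport by a fixed margin, then stretch it further in the direction of travel by the distance the fling is expected to cover. `PrerasterRect`, its parameters, and the linear look-ahead are hypothetical simplifications; Chromium's real tile prioritization is considerably more involved.

```cpp
struct RectSketch { int x, y, w, h; };

// Hypothetical interest-rect computation for fling scrolling. Velocity is the
// viewport's movement in layer space (px per ms). `margin_px` pads the viewport
// on all sides; `lookahead_ms` is how far ahead in time to pre-raster along the
// current velocity. Both are illustrative knobs, not Chromium settings.
RectSketch PrerasterRect(const RectSketch& viewport, float velocity_x_px_per_ms,
                         float velocity_y_px_per_ms, int margin_px, float lookahead_ms) {
  RectSketch r{viewport.x - margin_px, viewport.y - margin_px,
               viewport.w + 2 * margin_px, viewport.h + 2 * margin_px};
  const int dx = static_cast<int>(velocity_x_px_per_ms * lookahead_ms);
  const int dy = static_cast<int>(velocity_y_px_per_ms * lookahead_ms);
  // Stretch only toward the direction of travel: a positive delta grows the far
  // edge; a negative delta moves the near edge left/up and grows by the same amount.
  if (dx >= 0) { r.w += dx; } else { r.x += dx; r.w -= dx; }
  if (dy >= 0) { r.h += dy; } else { r.y += dy; r.h -= dy; }
  return r;
}
```

Tiles inside this rect would be queued for rasterization ahead of time, so content scrolling into view has a better chance of already being rasterized; skewing the rect by velocity spends the extra memory where the user is heading rather than evenly on all sides.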
Finally, in terms of rasterization performance, synchronous rasterization is more demanding because it requires a lot of CPU work, which makes it prone to dropped frames on low-end devices. However, as mobile CPUs get faster, the advantages of synchronous rasterization become more apparent: it uses far less memory than asynchronous rasterization and can also handle inertial animations through reuse mechanisms, so its overall advantages are quite clear.
In addition, asynchronous rasterization has some unavoidable issues, such as blank (white) areas during fast scrolling and DOM updates that fall out of sync with scrolling.
Activate#
- Module: cc
- Process: Render Process
- Thread: Compositor thread
- Responsibilities: Implements a buffering mechanism that ensures rasterized data is ready before the Draw stage runs. The process of copying layers from the Pending Tree to the Active Tree is called Activation.
Core function: LayerTreeHostImpl::ActivateSyncTree
The Compositor thread has three cc::LayerImpl trees:
- Pending tree: Receives commits; its LayerImpls are rasterized here.
- Active tree: The rasterized LayerImpls used by Draw are taken from here.
- Recycle tree: To avoid repeatedly creating LayerImpl objects, the Pending tree is not destroyed after activation; it is demoted to the Recycle tree instead.
```cpp
// Tree currently being drawn.
std::unique_ptr<LayerTreeImpl> active_tree_;
// In impl-side painting mode, tree with possibly incomplete rasterized
// content. May be promoted to active by ActivateSyncTree().
std::unique_ptr<LayerTreeImpl> pending_tree_;
// In impl-side painting mode, inert tree with layers that can be recycled
// by the next sync from the main thread.
std::unique_ptr<LayerTreeImpl> recycle_tree_;
```
The Commit phase actually writes into the Pending tree, and Raster results are stored in the Pending tree as well. Thanks to Activation, tiles from the latest commit can be rasterized while the previous commit is still being drawn.
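The interplay of the three trees can be modeled as a small state machine. This is a hypothetical sketch rather than cc code: `TreeSketch` stands in for cc::LayerTreeImpl and only records which commit it holds. Following the description above, activation copies the pending tree's content into the active tree and then demotes the pending tree to the recycle tree; the method name `ActivateSyncTree` is borrowed from the core function mentioned earlier purely for orientation.

```cpp
#include <memory>
#include <utility>

// Hypothetical stand-in for cc::LayerTreeImpl: only tracks which commit it holds.
struct TreeSketch {
  int source_frame = -1;
  bool raster_complete = false;
};

class TreeHostSketch {
 public:
  // Commit targets the pending tree, reusing the recycle tree's allocation if any.
  void Commit(int frame) {
    if (!pending_) {
      if (recycle_) {
        pending_ = std::move(recycle_);
      } else {
        pending_ = std::make_unique<TreeSketch>();
      }
    }
    pending_->source_frame = frame;
    pending_->raster_complete = false;
  }

  // Raster results are stored in the pending tree.
  void FinishRaster() {
    if (pending_) pending_->raster_complete = true;
  }

  // Activation: only a fully rasterized pending tree is copied into the active
  // tree; the pending tree is then demoted to the recycle tree, not destroyed.
  bool ActivateSyncTree() {
    if (!pending_ || !pending_->raster_complete) return false;
    if (!active_) active_ = std::make_unique<TreeSketch>();
    *active_ = *pending_;
    recycle_ = std::move(pending_);
    return true;
  }

  // Draw always reads from the active tree.
  int DrawnFrame() const { return active_ ? active_->source_frame : -1; }
  bool HasPending() const { return pending_ != nullptr; }
  bool HasRecycle() const { return recycle_ != nullptr; }

 private:
  std::unique_ptr<TreeSketch> active_;
  std::unique_ptr<TreeSketch> pending_;
  std::unique_ptr<TreeSketch> recycle_;
};
```

In this model, `DrawnFrame()` keeps returning the previous commit until the new pending tree has finished rasterizing and been activated, which is the overlap between Raster and Draw that Activation makes possible.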
Draw#
This stage can also be called Submit; in this article, we will uniformly refer to it as Draw.
- Module: cc
- Process: Render Process
- Thread: Compositor thread
- Responsibilities: Generates draw quads from the rasterized tiles (Tiling).
- Input: cc::LayerImpl (Tiling)
- Output: viz::DrawQuad
The Draw stage does not perform any actual drawing. Instead, it traverses the cc::LayerImpl objects in the Active Tree and calls cc::LayerImpl::AppendQuads on each one to create the appropriate viz::DrawQuad objects, which are placed in the RenderPass of the CompositorFrame.
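The traversal can be sketched with a virtual `AppendQuads` on a simplified layer class: each layer only describes what should be drawn by appending quads to a render pass, and nothing is rasterized or drawn at this point. All types here (`LayerImplSketch`, `DrawQuadSketch`, `RenderPassSketch`, and the two subclasses) are hypothetical stand-ins; the real cc::LayerImpl::AppendQuads and viz::DrawQuad carry far more state, such as transforms, clips, and shared quad state.

```cpp
#include <memory>
#include <vector>

// Hypothetical, simplified quad: just enough fields to tell quads apart.
struct DrawQuadSketch {
  enum class Material { kTile, kSolidColor };
  Material material;
  int layer_id;
  int tile_col;    // only meaningful for kTile
  int tile_row;    // only meaningful for kTile
  unsigned color;  // only meaningful for kSolidColor
};

struct RenderPassSketch { std::vector<DrawQuadSketch> quads; };

class LayerImplSketch {
 public:
  explicit LayerImplSketch(int id) : id_(id) {}
  virtual ~LayerImplSketch() = default;
  // Plays the role of cc::LayerImpl::AppendQuads: describe, don't draw.
  virtual void AppendQuads(RenderPassSketch& pass) const = 0;
  int id() const { return id_; }

 private:
  int id_;
};

class SolidColorLayerSketch : public LayerImplSketch {
 public:
  SolidColorLayerSketch(int id, unsigned color) : LayerImplSketch(id), color_(color) {}
  void AppendQuads(RenderPassSketch& pass) const override {
    pass.quads.push_back({DrawQuadSketch::Material::kSolidColor, id(), 0, 0, color_});
  }

 private:
  unsigned color_;
};

class TiledLayerSketch : public LayerImplSketch {
 public:
  TiledLayerSketch(int id, int cols, int rows) : LayerImplSketch(id), cols_(cols), rows_(rows) {}
  // One quad per tile, assuming every tile has already been rasterized.
  void AppendQuads(RenderPassSketch& pass) const override {
    for (int r = 0; r < rows_; ++r)
      for (int c = 0; c < cols_; ++c)
        pass.quads.push_back({DrawQuadSketch::Material::kTile, id(), c, r, 0});
  }

 private:
  int cols_, rows_;
};

// The Draw stage in miniature: walk the active tree's layers and collect quads.
RenderPassSketch BuildRenderPass(
    const std::vector<std::unique_ptr<LayerImplSketch>>& active_layers) {
  RenderPassSketch pass;
  for (const auto& layer : active_layers) layer->AppendQuads(pass);
  return pass;
}
```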
Core functions:
Viz#
Now let's introduce an important module for screen display in Chromium — viz.
viz = visuals
In Chromium, the core logic of viz runs in the Viz Process. It receives the viz::CompositorFrame objects (abbreviated as CF) generated by other processes, composites these CFs, and finally renders the composited result to the window.
The core classes of the viz module are illustrated in the following diagram:
A CF object represents a frame of content in a rectangular display area. The viz::CompositorFrame internally stores the following types of data:
- Metadata: CompositorFrameMetadata
- Referenced resources: TransferableResource
- Drawing operations: RenderPass/DrawQuad
viz::CompositorFrameMetadata records metadata about the CF, such as the frame's scale factor, scroll area, and referenced surfaces:
- device_scale_factor
- latency_info: ui::LatencyInfo
- referenced_surfaces
- begin_frame_ack
viz::TransferableResource describes a resource referenced by this CF; a resource can be thought of as an image. Resources come in two forms:
- Software resources stored in memory
- Textures stored in the GPU
If hardware-accelerated rendering is not enabled, only software resources can be used; if hardware acceleration is enabled, only hardware-accelerated resources can be used.
The CF's drawing operations, viz::RenderPass, each consist of a series of related viz::DrawQuad objects. Effects, transformations, mipmaps, caching, and screenshots can be applied to a RenderPass individually. There are many types of DrawQuad:
- viz::TextureDrawQuad: References a resource.
- viz::TileDrawQuad: Represents a tile; like TextureDrawQuad, it references a resource. A DisplayItemList is rasterized into TileDrawQuads.
- viz::PictureDrawQuad: Stores a DisplayItemList internally, but can currently only be used by Android WebView.
- viz::SolidColorDrawQuad: Represents a solid color block.
- viz::RenderPassDrawQuad: References another RenderPass by its Id.
- viz::SurfaceDrawQuad: Stores a viz::SurfaceId whose content is produced by another CompositorFrameSinkClient; used for viz nesting, such as OOPIF and OffscreenCanvas.
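The three kinds of data a CF carries can be summarized as plain structs, with the quad types modeled as a `std::variant`. This is a simplified sketch, not viz's real definitions: only four quad types are shown, and every type name here is hypothetical. The `IsConsistent` helper checks two rules drawn from the text: resources are either all software or all GPU, and every texture quad should reference a resource listed in the frame.

```cpp
#include <cstdint>
#include <variant>
#include <vector>

// Hypothetical, heavily simplified mirrors of what a viz::CompositorFrame carries.
struct SurfaceIdSketch {
  std::uint64_t frame_sink_id;
  std::uint32_t local_id;
};

// 1. Metadata (cf. viz::CompositorFrameMetadata).
struct MetadataSketch {
  float device_scale_factor = 1.0f;
  std::vector<SurfaceIdSketch> referenced_surfaces;
};

// 2. Referenced resources (cf. viz::TransferableResource).
struct ResourceSketch {
  std::uint32_t id;
  bool is_software;  // true: bitmap in memory; false: GPU texture
};

// 3. Drawing operations: render passes holding quads (a subset of DrawQuad kinds).
struct SolidColorQuad { std::uint32_t color; };
struct TextureQuad { std::uint32_t resource_id; };
struct RenderPassQuad { std::uint64_t render_pass_id; };
struct SurfaceQuad { SurfaceIdSketch surface; };
using QuadSketch = std::variant<SolidColorQuad, TextureQuad, RenderPassQuad, SurfaceQuad>;

struct RenderPassSketch {
  std::uint64_t id;
  std::vector<QuadSketch> quads;
};

struct CompositorFrameSketch {
  MetadataSketch metadata;
  std::vector<ResourceSketch> resources;
  std::vector<RenderPassSketch> render_passes;
};

// Rule 1: resources are all software or all GPU.
// Rule 2: every texture quad references a resource listed in this frame.
bool IsConsistent(const CompositorFrameSketch& frame) {
  for (const ResourceSketch& r : frame.resources) {
    if (r.is_software != frame.resources.front().is_software) return false;
  }
  for (const RenderPassSketch& pass : frame.render_passes) {
    for (const QuadSketch& quad : pass.quads) {
      const auto* texture = std::get_if<TextureQuad>(&quad);
      if (!texture) continue;
      bool found = false;
      for (const ResourceSketch& r : frame.resources) {
        if (r.id == texture->resource_id) found = true;
      }
      if (!found) return false;
    }
  }
  return true;
}
```

Modeling quads as a closed variant makes each consumer (aggregation, drawing) handle every quad kind explicitly; viz itself uses a class hierarchy with a material tag instead, so treat this only as a reading aid.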
After introducing the basic knowledge of the Viz module, let's move our pipeline into the Viz Process.
Aggregate#
- Module: Viz
- Process: Viz Process
- Thread: Display Compositor thread
- Responsibilities: Surface aggregation, i.e. accepting multiple CFs submitted by different processes and compositing them.
Core class: SurfaceAggregator.
The display compositor (the compositor thread of the Viz process) accepts multiple CFs from different processes and calls functions in SurfaceAggregator to composite them.
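Surface aggregation can be pictured as flattening a tree of frames: wherever a frame contains a quad that refers to another surface (the role of viz::SurfaceDrawQuad), the referenced surface's latest quads are spliced in. The sketch below is a hypothetical simplification of that idea, not SurfaceAggregator's real algorithm: a surface is named by a plain integer, only solid-color and surface-reference quads exist, and unknown surfaces or reference cycles are simply skipped, whereas the real aggregator also deals with render passes, transforms, clipping, and damage.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <variant>
#include <vector>

// Hypothetical simplifications: a surface is named by a plain integer and a
// frame holds only two quad kinds.
using SurfaceKey = std::uint32_t;
struct ColorQuad { std::uint32_t color; };
struct SurfaceRefQuad { SurfaceKey surface; };  // plays the role of a SurfaceDrawQuad
using Quad = std::variant<ColorQuad, SurfaceRefQuad>;

// Latest quads submitted for each surface, by whichever client owns it.
using SurfaceTable = std::map<SurfaceKey, std::vector<Quad>>;

// Appends the quads of `key` to `out`, splicing in referenced surfaces in place.
// Unknown surfaces and reference cycles are skipped.
void AggregateInto(const SurfaceTable& surfaces, SurfaceKey key,
                   std::set<SurfaceKey>& in_progress, std::vector<ColorQuad>& out) {
  auto it = surfaces.find(key);
  if (it == surfaces.end() || !in_progress.insert(key).second) return;
  for (const Quad& quad : it->second) {
    if (const auto* color = std::get_if<ColorQuad>(&quad)) {
      out.push_back(*color);
    } else {
      AggregateInto(surfaces, std::get<SurfaceRefQuad>(quad).surface, in_progress, out);
    }
  }
  in_progress.erase(key);
}

// Flattens the tree of surfaces rooted at `root` into one quad list.
std::vector<ColorQuad> Aggregate(const SurfaceTable& surfaces, SurfaceKey root) {
  std::vector<ColorQuad> out;
  std::set<SurfaceKey> in_progress;
  AggregateInto(surfaces, root, in_progress, out);
  return out;
}
```

For example, surface 1 could be the browser UI embedding a renderer's surface 2, which in turn embeds an OOPIF or OffscreenCanvas surface 3; aggregating from surface 1 yields a single quad list in drawing order.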
Display#
- Module: Viz
- Process: Viz Process
- Thread: GPU main thread
- Responsibilities: Once the aggregated CF is ready, viz issues GL commands to output its draw quads to the screen.
- Input: The composited viz::CompositorFrame (specifically, the DrawQuads it contains)
- Output: Draw pixels
First, let's introduce the rendering targets of Viz. viz::DirectRenderer and viz::OutputSurface are used to manage rendering targets. Based on their different subclass combinations, there are three different rendering schemes:
- Software rendering: viz::SoftwareRenderer + viz::SoftwareOutputSurface + viz::SoftwareOutputDevice
- Skia rendering: viz::SkiaRenderer + viz::SkiaOutputSurface(Impl) + viz::SkiaOutputDevice
- OpenGL rendering: viz::GLRenderer + viz::GLOutputSurface
The first is software rendering: when hardware acceleration is turned off, SoftwareRenderer performs pure software rendering.
The second is Skia rendering, where SkiaOutputSurface controls the rendering target through SkiaOutputDevice, which has many subclasses, including SkiaOutputDeviceOffscreen for off-screen rendering and SkiaOutputDeviceGL for GL rendering.
SkiaRenderer draws DrawQuads onto the canvas provided by SkiaOutputSurfaceImpl, but this canvas does not actually draw anything. Instead, it records the drawing operations through Skia's DDL (deferred display list) mechanism, using SkDeferredDisplayListRecorder. Once all RenderPasses have been drawn, the recorded operations are sent to SkiaOutputSurfaceImplOnGpu, which performs the actual drawing.
Skia rendering offers the greatest flexibility, supporting GL rendering, Vulkan rendering, off-screen rendering, etc.
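The record-then-replay flow described for SkiaRenderer can be sketched with a canvas that captures draw calls as closures instead of executing them, plus a separate function that replays them later. This only illustrates the pattern and does not use Skia: `RecordingCanvasSketch`, `RecordedOp`, and `ReplayOnGpu` are hypothetical names. In the real pipeline the replay happens on the GPU thread via SkiaOutputSurfaceImplOnGpu; here it is an ordinary function call, and a string log stands in for GPU commands.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// A recorded drawing operation; the string log stands in for real GPU commands.
using RecordedOp = std::function<void(std::vector<std::string>& gpu_commands)>;

// Hypothetical recording canvas: draw calls are captured, not executed, which is
// the role the text gives to Skia's DDL recorder.
class RecordingCanvasSketch {
 public:
  void DrawRect(int x, int y, int w, int h) {
    ops_.push_back([x, y, w, h](std::vector<std::string>& gpu_commands) {
      gpu_commands.push_back("rect " + std::to_string(x) + "," + std::to_string(y) + " " +
                             std::to_string(w) + "x" + std::to_string(h));
    });
  }

  // Ends the recording and hands the ops over; the canvas starts empty again.
  std::vector<RecordedOp> Detach() { return std::exchange(ops_, {}); }

 private:
  std::vector<RecordedOp> ops_;
};

// Hypothetical GPU-side replay: executes the recorded ops in order.
std::vector<std::string> ReplayOnGpu(const std::vector<RecordedOp>& ops) {
  std::vector<std::string> gpu_commands;
  for (const RecordedOp& op : ops) op(gpu_commands);
  return gpu_commands;
}
```

Separating recording from execution is what lets the thread that walks RenderPasses finish quickly while the expensive GPU work is batched and performed elsewhere.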
The core functions and related classes are summarized as follows:
- viz::SkiaRenderer
- viz::SkiaOutputSurface
- viz::SkiaOutputDevice
- SkiaOutputSurfaceImplOnGpu
- [SkSurface::draw](https://source.chromium.org/chromium/chromium/src/+/main:third_party/skia/src/image/SkSurface.cpp;bpv=1;bpt=1;l=243?q=SkSurface::draw&ss=chromium&gsn=draw&gs=kythe%3A%2F%2Fchromium.googlesource.com%2Fchromium%2Fsrc%3Flang%3Dc%252B%252B%3Fpath%3Dthird_party%2Fskia%2Finclude%2Fcore%2FSkSurface.h%23JJqTiW3_ZrLk4BWPX-Bz-0r49wyE1npszhYd3P3EtTo&gs=kythe%3A%2F%2Fchromium.googlesource.com%2Fchromium%2Fsrc%3Flang%3Dc%252B%252B%3Fpath%3Dthird_party%2Fskia%2Fsrc%2Fimage%2FSkSurface.cpp%23Qao2aijSUuzq