
News from the world of web design and SEO

Below are syndicated news items from several leading sites in the field of web design and SEO (search engine optimization).

A List Apart: The Full Feed
Articles for people who make web sites.
  • Progressive Web Apps: The Case for PWAs

    A note from the editors: We’re pleased to share an excerpt from Chapter 2 of Jason Grigsby’s Progressive Web Apps, from A Book Apart.

    Now that you know what a progressive web app is, you’re probably wondering if your organization would benefit from one. To determine if it makes sense for your organization, ask yourself two questions:

    1. Does your organization have a website? If so, you would probably benefit from a progressive web app. This may sound flippant, but it’s true: nearly every website should be a progressive web app, because they represent best practices for the web.
    2. Does your organization make money on your website via ecommerce, advertising, or some other method? If so, you definitely need a progressive web app, because progressive web apps can have a significant impact on revenue.

    This doesn’t mean that your site needs to have every possible feature of progressive web apps. You may have no need to provide offline functionality, push notifications, or even the ability for people to install your website to their homescreen. You may only want the bare minimum: a secure site, a service worker to speed up the site, and a manifest file—things that benefit every website.

    Of course, you may decide that your personal website or side project doesn’t warrant the extra effort to make it into a progressive web app. That’s understandable—and in the long run, even personal websites will gain progressive web app features when the underlying content management systems add support for them. For example, both Magento and WordPress have already announced their plans to bring progressive web apps to their respective platforms. Expect other platforms to follow suit.

    But if you’re running any kind of website that makes money for your organization, then it would behoove you to start planning for how to convert your website to a progressive web app. Companies that have deployed progressive web apps have seen increases in conversion, user engagement, sales, and advertising revenue. For example, Pinterest saw core engagement increase by 60 percent and user-generated ad revenue increase by 44 percent (Fig 2.1). West Elm saw a 15 percent increase in average time spent on their site and a 9 percent lift in revenue per visit.

    Comparing old mobile web to the progressive web version of Pinterest, the time spent that was greater than 5 minutes increased by 40%, the user-generated ad revenue increased by 44%, ad clickthroughs increased by 50%, and core engagement metrics improved by 60%. Even comparing to the native app, most of these same metrics increased between 2-5%.
    Fig 2.1: Addy Osmani, an engineering manager for Google, wrote a case study about Pinterest’s progressive web app, comparing it to both their old mobile website and their native app.

    The success stories for progressive web apps are so abundant that my company, Cloud Four, started a website called PWA Stats to keep track of them (Fig 2.2). There’s a good chance that we’ve collected a case study from an organization similar to yours that you can use to convince your coworkers that building a progressive web app makes sense.

    A screenshot of the PWA Stats homepage, showing case studies for Uber, Trivago, Petlove, and the Grand Velas Riviera Maya resort.
    Fig 2.2: PWAstats.com collects statistics and stories documenting the impact of progressive web apps.

    And convincing them may be necessary. Despite the clear benefits of progressive web apps, many businesses still haven’t converted—often because they simply don’t know about PWAs yet. (So if you start building one now, you may get a jump on your competition!)

    But there is also a lot of confusion about what progressive web apps are capable of, where they can be used, and how they relate to native apps. This confusion creates fear, uncertainty, and doubt (FUD) that slow the adoption of progressive web apps.

    If you advocate for progressive web apps in your organization, you’ll likely find some confusion and possibly even encounter some resistance. So let’s equip you with arguments to cut through the FUD and convince your colleagues.

    Native apps and PWAs can coexist

    If your organization already has a native app, stakeholders may balk at the idea of also having a progressive web app—especially since the main selling point of PWAs is to enable native app features and functionality.

    It’s tempting to view progressive web apps as competition to native apps—much of the press coverage has adopted this storyline. But the reality is that progressive web apps make sense irrespective of whether a company has a native app.

    Set aside the “native versus web” debate, and focus on the experience you provide customers who interact with your organization via the web. Progressive web apps simply make sense on their own merits: they can help you reach more customers, secure your site, generate revenue, provide more reliable experiences, and notify users of updates—all as a complement to your native app.

    Reach more customers

    Not all of your current customers—and none of your potential customers—have your native app installed. Even your average customer is unlikely to have your app installed, and those customers who do have your app may still visit your site on a desktop computer.

    Providing a better experience on the website itself will increase the chances that current and future customers will read your content or buy your products (or even download your native app!). A progressive web app can provide that better experience.

    Despite what the tech press might have you believe, the mobile web is growing faster than native apps. comScore compared the top one thousand apps to the top one thousand mobile web properties and found that “mobile web audiences are almost 3x the size and growing 2x as fast as app audiences”.

    And while it’s true that people spend more time in their favorite apps than they do on the web, you may have trouble convincing people to install your app in the first place. Over half of smartphone users in the United States don’t download any apps in a typical month.

    Having a native app in an app store doesn’t guarantee that people will install it. It costs a lot to advertise an app and convince people to try it. According to app marketing company Liftoff, the average cost to get someone to install an app is $4.12, and that shoots up to $8.21 per install if you want someone to create an account in your app.

    If you’re lucky enough to get someone to install your app, the next hurdle is convincing them to continue to use it. When analyst Andrew Chen analyzed user retention data from 125 million mobile phones, he found that “the average app loses 77% of its DAUs [daily active users] within the first 3 days after the install. Within 30 days, it’s lost 90% of DAUs. Within 90 days, it’s over 95%” (Fig 2.3).

    Chart: The average retention curve for Android apps drops precipitously within the first three days and continues to drop more slowly to near 0 over the next 90 days.
    Fig 2.3: App loyalty remains a big issue for native apps. The average app loses over 95 percent of its daily active users within 90 days.

    Progressive web apps don’t have those same challenges. They’re as easy for people to discover as your website is, because they are your website. And the features of a progressive web app are available immediately. There’s no need to jump through the hoops of visiting an app store and downloading the app. Installation is fast: it happens in the background during the first site visit, and can literally be as simple as adding an icon to the home screen.

    As Alex Russell wrote in a 2017 Medium post:

    The friction of PWA installation is much lower. Our internal metrics at Google show that for similar volume of prompting for PWA banners and native app banners — the closest thing to an apples-to-apples comparison we can find — PWA banners convert 5–6x more often. More than half of users who chose to install a native app from these banners fail to complete installing the app whereas PWA installation is near-instant.

    In short, a large and growing percentage of your customers interact with you on the web. Progressive web apps can lead to more revenue and engagement from more customers.

    Secure your website

    If you’re collecting credit cards or private information, providing a secure website for your web visitors is a must. But even if your website doesn’t handle sensitive data, it still makes sense to use HTTPS and provide a secure experience. Even seemingly innocuous web traffic can provide signals that can identify individuals and potentially compromise them. That’s not to mention the concerns raised by revelations of government snooping.

    It used to be that running a secure server was costly, confusing, and (seemingly) slower. Things have changed. SSL/TLS certificates used to cost hundreds of dollars, but now certificate provider Let’s Encrypt gives them out for free. Many hosting providers have integrated with certificate providers so you can set up HTTPS with a single click. And it turns out that HTTPS wasn’t as slow as we thought it was.

    Websites on HTTPS can also move to a new version of HTTP called HTTP/2. The biggest benefit is that HTTP/2 is significantly faster than HTTP/1. For many hosting providers and content delivery networks (CDNs), the moment you move to HTTPS, you get HTTP/2 with no additional work.

    If that wasn’t enough incentive to move to HTTPS, browser makers are using a carrot-and-stick approach for pushing websites to make the change. For the stick, Chrome has started warning users when they enter data on a site that isn’t running HTTPS. By the time you read this, Google plans to label all HTTP pages with a “Not secure” warning (Fig 2.4). Other browsers will likely follow suit and start to flag sites that aren’t encrypted to make sure users are aware that their data could be intercepted.

    The eventual treatment of all HTTP pages in Chrome will be to show a red yield icon with the words 'Not secure'.
    Fig 2.4: Google has announced its intention to label any website that isn’t running HTTPS as not secure. Different warning styles will be rolled out over time, until the label reaches the final state shown here.

    For the HTTPS carrot, browsers are starting to require HTTPS to use new features. If you want to utilize the latest and greatest web tech, you’ll need to be running HTTPS. In fact, some features that used to work on nonsecure HTTP that are considered to contain sensitive data—for example, geolocation—are being restricted to HTTPS now. On second thought, perhaps this is a bit of a stick as well. A carrot stick?

    With all that in mind, it makes sense to set up a secure website for your visitors. You’ll avoid scary nonsecure warnings. You’ll get access to new browser features. You’ll gain speed benefits from HTTP/2. And: you’ll be setting yourself up for a progressive web app.

    In order to use service workers, the core technology for progressive web apps, your website must be on HTTPS. So if you want to reap the rewards of all the PWA goodness, you need to do the work to make sure your foundation is secure.

    Generate more revenue

    There are numerous studies that show a connection between the speed of a website and the amount of time and money people are willing to spend on it. DoubleClick found that “53% of mobile site visits are abandoned if pages take longer than 3 seconds to load.” Walmart found that for every 100 milliseconds of improvement to page load time, there was up to a one percent increase in incremental revenue.

    Providing a fast web experience makes a big difference to the bottom line. Unfortunately, the average load time for mobile websites is nineteen seconds on 3G connections. That’s where a progressive web app can help.

    Progressive web apps use service workers to provide an exceptionally fast experience. Service workers allow developers to explicitly define what files the browser should store in its local cache and under what circumstances the browser should check for updates to the cached files. Files that are stored in the local cache can be accessed much more quickly than files that are retrieved from the network.
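
As a rough illustration of the caching decision described above, here is a minimal sketch of a cache-first lookup. It is a synchronous model that runs in plain Node.js: the `Map` stands in for the browser's cache storage, and `cacheFirst` and `fakeNetwork` are hypothetical names invented for this example. In a real service worker the equivalent logic would sit in a `fetch` event handler and use the promise-based Cache API.

```javascript
// Simplified, synchronous model of a cache-first strategy.
// `cache` (a Map) and `fetchFromNetwork` are illustrative stand-ins,
// not the real browser Cache API or fetch().
function cacheFirst(url, cache, fetchFromNetwork) {
  if (cache.has(url)) {
    // Served locally: no network round trip needed.
    return { body: cache.get(url), source: 'cache' };
  }
  // Not cached yet: go to the network and remember the result.
  const body = fetchFromNetwork(url);
  cache.set(url, body);
  return { body, source: 'network' };
}

// Usage: the first request hits the network, the second is served locally.
const cache = new Map();
let networkCalls = 0;
const fakeNetwork = (url) => {
  networkCalls += 1;
  return `contents of ${url}`;
};

const first = cacheFirst('/styles.css', cache, fakeNetwork);
const second = cacheFirst('/styles.css', cache, fakeNetwork);
console.log(first.source, second.source, networkCalls);
```

The second request never touches the network, which is where the speed benefit comes from.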

    When someone requests a new page from a progressive web app, most of the files needed to render that page are already stored on the local device. This means that the page can load nearly instantaneously because all the browser needs to download is the incremental information needed for that page.

    In many ways, this is the same thing that makes native apps so fast. When someone installs a native app, they download the files necessary to run the app ahead of time. After that occurs, the native app only has to retrieve any new data. Service workers allow the web to do something similar.

    The impact of progressive web apps on performance can be astounding. For example, Tinder cut load times from 11.91 seconds to 4.69 seconds with their progressive web app—and it’s 90 percent smaller than their native Android app. Hotel chain Treebo launched a progressive web app and saw a fourfold increase in conversion rates year-over-year; conversion rates for repeat users saw a threefold increase, and their median interactive time on mobile dropped to 1.5 seconds.

    Ensure network reliability

    Mobile networks are flaky. One moment you’re on a fast LTE connection, and the next you’re slogging along at 2G speeds—or simply offline. We’ve all experienced situations like this. But our websites are still primarily built with an assumption that networks are reliable.

    With progressive web apps, you can create an app that continues to work when someone is offline. In fact, the technology used to create an offline experience is the same technology used to make web pages fast: service workers.

    Remember, service workers allow us to explicitly tell the browser what to cache locally. We can expand what is stored locally—not only the assets needed to render the app, but also the content of pages—so that people can continue to view pages offline (Fig 2.5).

    Three screens from the housing.com site show how the design adapts to show when it is offline and that it can continue to show saved results even when offline.
    Fig 2.5: The header in housing.com’s progressive web app changes from purple (left) to gray when offline (middle). Content the user has previously viewed or favorited is available offline (right), which is important for housing.com’s home market in India, where network connectivity can be slow and unreliable.

    Using a service worker, we can even precache the shell of our application behind the scenes. This means that when someone visits a progressive web app for the first time, the whole application could be downloaded, stored in the cache, and ready for offline use without requiring the person to take any action to initiate it. For more on when precaching makes sense, see Chapter 5.
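
To make the precaching idea concrete, here is a small sketch under stated assumptions. `APP_SHELL`, `precache`, `respond`, and the `Map`-based cache are all hypothetical names for this example, and the code is synchronous so it can run in plain Node.js. A real service worker would do the precaching in its `install` event and use promise-based APIs.

```javascript
// Sketch of precaching an application shell, then serving pages offline.
// APP_SHELL, the Map-based cache, and `download` are hypothetical.
const APP_SHELL = ['/', '/app.css', '/app.js'];

// Store every shell file up front, before the user asks for it.
function precache(urls, cache, download) {
  for (const url of urls) {
    cache.set(url, download(url));
  }
}

function respond(url, cache, download, isOnline) {
  if (isOnline) {
    const body = download(url); // network-first while a connection exists
    cache.set(url, body);       // keep a copy for later offline use
    return body;
  }
  // Offline: fall back to whatever was stored earlier.
  return cache.has(url) ? cache.get(url) : 'offline fallback page';
}

const shellCache = new Map();
const download = (url) => `body of ${url}`;
precache(APP_SHELL, shellCache, download);

const offlineShell = respond('/app.js', shellCache, download, false);
const offlineUnknown = respond('/never-visited', shellCache, download, false);
console.log(offlineShell, '|', offlineUnknown);
```

Pages that were never cached get a generic fallback here; what to show in that case is a design decision for each site.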

    Keep users engaged

    Push notifications are perhaps the best way to keep people engaged with an application. They prompt someone to return to an app with tantalizing nuggets of new information, from breaking news alerts to chat messages.

    So why limit push notifications to those who install a native application? For instance, if you have a chat or social media application, wouldn’t it be nice to notify people of new messages (Fig 2.6)?

    Two screens: On the left, a list of system notifications including one from the Twitter website. On the right, the notification opened on the Twitter site to a funny tweet about WiFi passwords in a bar.
    Fig 2.6: Twitter’s progressive web app, Twitter Lite, sends the same notifications that its native app sends. They appear alongside other app notifications (left). Selecting one takes you directly to the referenced tweet in Twitter Lite (right).

    Progressive web apps (specifically our friend the service worker) make push notifications possible for any website to use. Notifications aren’t required for something to be a progressive web app, but they are often effective at increasing re-engagement and revenue.

    We’ll talk more about push notifications in Chapter 6. For now, it can be helpful to know that progressive web apps can send push notifications, just like a native app—which may help you make the case to your company.

    Whether you have a native app or not, a progressive web app is probably right for you. Every step toward a progressive web app is a step toward a better website. Websites should be secure. They should be fast. They would be better if they were available offline and able to send notifications when necessary.

    For your customers who don’t have or use your native app, providing them with a better website experience is an excellent move for your business. It’s really that simple.

  • var to JIT

    In our previous article we described how the browser uses CSS to render beautiful pixels to the user’s screen. Although modern CSS can (and should!) be used to create highly interactive user experiences, for the last mile of interactivity, we need to dynamically update the HTML document. For that, we’re going to need JavaScript.

    Bundle to bytecode

    For a modern web application, the JavaScript that the browser first sees will typically not be the JavaScript written by a developer. Instead, it will most likely be a bundle produced by a tool such as webpack. And it will probably be a rather large bundle containing a UI framework such as React, various polyfills (libraries that emulate new platform features in older browsers), and an assortment of other packages found on npm. The first challenge for the browser’s JavaScript engine is to convert that big bundle of text into instructions that can be executed on a virtual machine. It needs to parse the code, and because the user is waiting on JavaScript for all that interactivity, it needs to do it fast.

    At a high level, the JavaScript engine parses code just like any other programming language compiler. First, the stream of input text is broken up into chunks called tokens. Each token represents a meaningful unit within the syntactic structure of the language, similar to words and punctuation in natural written language. Those tokens are then fed into a top-down parser that produces a tree structure representing the program. Language designers and compiler engineers like to call this tree structure an AST (abstract syntax tree). The resulting AST can then be analyzed to produce a list of virtual machine instructions called bytecode.

    Diagram: JavaScript source is parsed into an abstract syntax tree, which is then used to produce bytecode.
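
The pipeline from text to tokens to AST to bytecode can be sketched for a toy language. The example below handles only integer arithmetic with `+`, `*`, and parentheses; the node shapes and the opcode names (`PUSH`, `ADD`, `MUL`) are invented for illustration and do not correspond to any real engine.

```javascript
// Toy pipeline for arithmetic such as "1 + 2 * 3":
// text -> tokens -> AST -> bytecode. All shapes and names are illustrative.
function tokenize(source) {
  // Numbers, operators, and parentheses; anything else is silently skipped.
  return source.match(/\d+|[+*()]/g) || [];
}

// Top-down (recursive descent) parser for this grammar:
//   expression := term ('+' term)*
//   term       := factor ('*' factor)*
//   factor     := NUMBER | '(' expression ')'
function parse(tokens) {
  let position = 0;

  function parseExpression() {
    let node = parseTerm();
    while (tokens[position] === '+') {
      position += 1;
      node = { type: 'Binary', operator: '+', left: node, right: parseTerm() };
    }
    return node;
  }

  function parseTerm() {
    let node = parseFactor();
    while (tokens[position] === '*') {
      position += 1;
      node = { type: 'Binary', operator: '*', left: node, right: parseFactor() };
    }
    return node;
  }

  function parseFactor() {
    const token = tokens[position];
    position += 1;
    if (token === '(') {
      const node = parseExpression();
      position += 1; // skip the closing ')'
      return node;
    }
    return { type: 'Number', value: Number(token) };
  }

  return parseExpression();
}

// Walk the AST and emit instructions for a simple stack machine.
function emitBytecode(node, out = []) {
  if (node.type === 'Number') {
    out.push(['PUSH', node.value]);
  } else {
    emitBytecode(node.left, out);
    emitBytecode(node.right, out);
    out.push([node.operator === '+' ? 'ADD' : 'MUL']);
  }
  return out;
}

const bytecode = emitBytecode(parse(tokenize('1 + 2 * 3')));
console.log(bytecode);
```

Because multiplication sits deeper in the grammar than addition, `2 * 3` is grouped first, and the emitted instructions multiply before they add.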

    The process of generating an AST is one of the more straightforward aspects of a JavaScript engine. Unfortunately, it can also be slow. Remember that big bundle of code we started out with? The JavaScript engine has to parse and build syntax trees for the entire bundle before the user can start interacting with the site. Much of that code may be unnecessary for the initial page load, and some may not even be executed at all!

    Fortunately, our compiler engineers have invented a variety of tricks to speed things up. First, some engines parse code on a background thread, freeing up the main UI thread for other computations. Second, modern engines will delay the creation of in-memory syntax trees for as long as possible by using a technique called deferred parsing or lazy compilation. It works like this: if the engine sees a function definition that might not be executed for a while, it will perform a fast, “throwaway” parse of the function body. This throwaway parse will find any syntax errors that might be lurking within the code, but it will not generate an AST. Later, when the function is called for the first time, the code will be parsed again. This time, the engine will generate the full AST and bytecode required for execution. In the world of JavaScript, doing things twice can sometimes be faster than doing things once!

    The best optimizations, though, are the ones that allow us to bypass doing any work at all. In the case of JavaScript compilation, this means skipping the parsing step completely. Some JavaScript engines will attempt to cache the generated bytecode for later reuse in case the user visits the site again. This isn’t quite as simple as it sounds. JavaScript bundles can change frequently as websites are updated, and the browser must carefully weigh the cost of serializing bytecode against the performance improvements that come from caching.

    Bytecode to runtime

    Now that we have our bytecode, we’re ready to start execution. In today’s JavaScript engines, the bytecode that we generated during parsing is first fed into a virtual machine called an interpreter. An interpreter is a bit like a CPU implemented in software. It looks at each bytecode instruction, one at a time, and decides what actual machine instructions to execute and what to do next.
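
The "CPU implemented in software" idea can be sketched as a loop over instructions. The stack-machine design and the opcodes (`PUSH`, `ADD`, `MUL`) below are assumptions made for this example; real engines use their own, far richer instruction sets.

```javascript
// Minimal stack-machine interpreter. The opcodes are invented for this
// sketch and do not match any real engine's bytecode.
function interpret(bytecode) {
  const stack = [];
  // Look at each instruction, one at a time, and decide what to do.
  for (let pc = 0; pc < bytecode.length; pc += 1) {
    const [opcode, operand] = bytecode[pc];
    switch (opcode) {
      case 'PUSH':
        stack.push(operand);
        break;
      case 'ADD': {
        const right = stack.pop();
        const left = stack.pop();
        stack.push(left + right);
        break;
      }
      case 'MUL': {
        const right = stack.pop();
        const left = stack.pop();
        stack.push(left * right);
        break;
      }
      default:
        throw new Error(`Unknown opcode: ${opcode}`);
    }
  }
  return stack.pop();
}

// Bytecode for 1 + 2 * 3.
const result = interpret([['PUSH', 1], ['PUSH', 2], ['PUSH', 3], ['MUL'], ['ADD']]);
console.log(result); // 7
```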

    The structure and behavior of the JavaScript programming language is defined in a document formally known as ECMA-262. Language designers like to call the structure part “syntax” and the behavior part “semantics.” The semantics of almost every aspect of the language is defined by algorithms that are written using prose-like pseudo-code. For instance, let’s pretend we are compiler engineers implementing the signed right shift operator (>>). Here’s what the specification tells us:

    ShiftExpression : ShiftExpression >> AdditiveExpression

    1. Let lref be the result of evaluating ShiftExpression.
    2. Let lval be ? GetValue(lref).
    3. Let rref be the result of evaluating AdditiveExpression.
    4. Let rval be ? GetValue(rref).
    5. Let lnum be ? ToInt32(lval).
    6. Let rnum be ? ToUint32(rval).
    7. Let shiftCount be the result of masking out all but the least significant 5 bits of rnum, that is, compute rnum & 0x1F.
    8. Return the result of performing a sign-extending right shift of lnum by shiftCount bits. The most significant bit is propagated. The result is a signed 32-bit integer.

    In the first six steps we convert the operands (the values on either side of the >>) into 32-bit integers, and then we perform the actual shift operation. If you squint, it looks a bit like a recipe. If you really squint, you might see the beginnings of a syntax-directed interpreter.
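
Read as a recipe, the steps translate fairly directly into code. The sketch below is one possible rendering, not how any particular engine implements the operator. The function names `toInt32`, `toUint32`, and `signedRightShift` are chosen here to mirror the specification's abstract operations, and `Number()` is used as an approximation of the spec's numeric conversion (BigInt and Symbol operands are out of scope).

```javascript
const TWO_32 = 2 ** 32;
const TWO_31 = 2 ** 31;

// Wrap any number into the range [0, 2^32).
function toUint32(value) {
  const n = Number(value);
  if (!Number.isFinite(n)) return 0; // NaN and infinities become 0
  const int = Math.trunc(n);
  return ((int % TWO_32) + TWO_32) % TWO_32;
}

// Same wrap, then reinterpret the top half of the range as negative.
function toInt32(value) {
  const wrapped = toUint32(value);
  return wrapped >= TWO_31 ? wrapped - TWO_32 : wrapped;
}

// Steps 1 to 4 (evaluating operands and reading their values) are assumed
// to have happened already: lval and rval are plain values.
function signedRightShift(lval, rval) {
  const lnum = toInt32(lval);      // step 5
  const rnum = toUint32(rval);     // step 6
  const shiftCount = rnum & 0x1f;  // step 7: keep the low 5 bits
  return lnum >> shiftCount;       // step 8: sign-extending shift
}

console.log(signedRightShift(-8, 1)); // -4
console.log(signedRightShift(1, 33)); // 0, because 33 & 0x1f is 1
```

The masking in step 7 explains why shifting by 33 behaves like shifting by 1.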

    Unfortunately, if we implemented the algorithms exactly as they are described in the specification, we’d end up with a very slow interpreter. Consider the simple operation of getting a property value from a JavaScript object.

    Objects in JavaScript are conceptually like dictionaries. Each property is keyed by a string name. Objects can also have a prototype object.

    A JavaScript object with a prototype, an arrow pointing to an object.prototype, an arrow pointing to obj, an arrow pointing to obj2

    If an object doesn’t have an entry for a given string key, then we need to look for that key in the prototype. We repeat this operation until we either find the key that we’re looking for or get to the end of the prototype chain.

    That’s potentially a lot of work to perform every time we want to get a property value out of an object!
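
The walk along the prototype chain can be sketched as a loop. `lookupProperty` is a name invented for this example, and it ignores getters and other details of the real lookup algorithm; the `hops` counter is added only to show how much work each lookup took.

```javascript
// Naive property lookup that walks the prototype chain one object at a time.
function lookupProperty(object, key) {
  let current = object;
  let hops = 0;
  while (current !== null) {
    if (Object.prototype.hasOwnProperty.call(current, key)) {
      return { found: true, value: current[key], hops };
    }
    // Not here: move on to the prototype and try again.
    current = Object.getPrototypeOf(current);
    hops += 1;
  }
  // Reached the end of the chain without finding the key.
  return { found: false, value: undefined, hops };
}

const base = { greeting: 'hello' };
const obj = Object.create(base);
obj.x = 1;

console.log(lookupProperty(obj, 'x'));        // own property, 0 hops
console.log(lookupProperty(obj, 'greeting')); // found on the prototype, 1 hop
console.log(lookupProperty(obj, 'missing'));  // not found after 3 hops
```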

    The strategy used in JavaScript engines for speeding up dynamic property lookup is called inline caching. Inline caching was first developed for the language Smalltalk in the 1980s. The basic idea is that the results from previous property lookup operations can be stored directly in the generated bytecode instructions.

    To see how this works, let’s imagine that the JavaScript engine is a towering gothic cathedral. As we step inside, we notice that the engine is chock full of objects swarming around. Each object has an identifiable shape that determines where its properties are stored.

    Now, imagine that we are following a series of bytecode instructions written on a scroll. The next instruction tells us to get the value of the property named “x” from some object. You grab that object, turn it over in your hands a few times to figure out where “x” is stored, and find out that it is stored in the object’s second data slot.

    It occurs to you that any object with this same shape will have an “x” property in its second data slot. You pull out your quill and make a note on your bytecode scroll indicating the shape of the object and the location of the “x” property. The next time you see this instruction you’ll simply check the shape of the object. If the shape matches what you’ve recorded in your bytecode notes, you’ll know exactly where the data is located without having to inspect the object. You’ve just implemented what’s known as a monomorphic inline cache!

    But what happens if the shape of the object doesn’t match our bytecode notes? We can get around this problem by drawing a small table with a row for each shape we’ve seen. When we see a new shape, we use our quill to add a new row to the table. We now have a polymorphic inline cache. It’s not quite as fast as the monomorphic cache, and it takes up a little more space on the scroll, but if there aren’t too many rows, it works quite well.

    If we end up with a table that’s too big, we’ll want to erase the table, and make a note to remind ourselves to not worry about inline caching for this instruction. In compiler terms, we have a megamorphic callsite.
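
The quill-and-scroll procedure can be modeled in a few lines. Everything here is a simplified assumption: objects are represented as a shape plus an array of slots, the helper names are invented, and the limit of four table rows is arbitrary. Real engines represent shapes and caches quite differently.

```javascript
const MAX_POLYMORPHIC_ENTRIES = 4; // arbitrary limit chosen for this sketch

// A shape records which slot each property name lives in.
function createShape(propertyNames) {
  const offsets = new Map(propertyNames.map((name, index) => [name, index]));
  return { offsets };
}

function createObject(shape, values) {
  return { shape, slots: values };
}

// One "instruction" that reads a fixed property name, with its own cache.
function createGetPropertySite(name) {
  const site = { name, entries: [], megamorphic: false, slowLookups: 0 };

  site.get = (object) => {
    if (!site.megamorphic) {
      // Fast path: compare shapes, then read the slot directly.
      for (const entry of site.entries) {
        if (entry.shape === object.shape) return object.slots[entry.offset];
      }
    }
    // Slow path: inspect the object to find where the property lives.
    site.slowLookups += 1;
    const offset = object.shape.offsets.get(name);
    if (!site.megamorphic) {
      if (site.entries.length < MAX_POLYMORPHIC_ENTRIES) {
        site.entries.push({ shape: object.shape, offset }); // add a table row
      } else {
        site.entries = [];       // erase the table...
        site.megamorphic = true; // ...and stop caching at this site
      }
    }
    return object.slots[offset];
  };
  return site;
}

const pointShape = createShape(['y', 'x']); // "x" lives in the second slot
const site = createGetPropertySite('x');
const firstRead = site.get(createObject(pointShape, [10, 20]));  // slow, then cached
const secondRead = site.get(createObject(pointShape, [30, 40])); // same shape: hit
console.log(firstRead, secondRead, site.slowLookups);
```

With one row the site is monomorphic, with a few rows it is polymorphic, and once the limit is exceeded it gives up and becomes megamorphic.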

    In general, monomorphic code is very fast, polymorphic code is almost as fast, and megamorphic code tends to be rather slow. Or, in haiku form:

    One shape, flowing wind
    Several shapes, jumping fox
    Many shapes, turtle

    Interpreter to just-in-time (JIT)

    The great thing about an interpreter is that it can start executing code quickly, and for code that is run only once or twice, this “software CPU” performs acceptably fast. But for “hot code” (functions that are run hundreds, thousands, or millions of times) what we really want is to execute machine instructions directly on the actual hardware. We want just-in-time (JIT) compilation.

    As JavaScript functions are executed by the interpreter, various statistics are gathered about how often the function has been called and what kinds of arguments it is called with. If the function is run frequently with the same kinds of arguments, the engine may decide to convert the function’s bytecode into machine code.

    Let’s step once again into our hypothetical JavaScript engine, the gothic cathedral. As the program executes, you dutifully pull bytecode scrolls from carefully labeled shelves. For each function, there is roughly one scroll. As you follow the instructions on each scroll, you record how many times you’ve executed the scroll. You also note the shapes of the objects encountered while carrying out the instructions. You are, in effect, a profiling interpreter.

    When you open the next scroll of bytecode, you notice that this one is “hot.” You’ve executed it dozens of times, and you think it would run much faster in machine code. Fortunately, there are two rooms full of scribes that are ready to perform the translation for you. The scribes in the first room, a brightly lit open office, can translate bytecode into machine code quite fast. The code that they produce is of good quality and is concise, but it’s not as efficient as it could be. The scribes in the second room, dark and misty with incense, work more carefully and take a bit longer to finish. The code that they produce, however, is highly optimized and about as fast as possible.

    In compiler-speak, we refer to these different rooms as JIT compilation tiers. Different engines have different numbers of tiers depending on the tradeoffs they’ve chosen to make.

    You decide to send the bytecode to the first room of scribes. After working on it for a bit, using your carefully recorded notes, they produce a new scroll containing machine instructions and place it on the correct shelf alongside the original bytecode version. The next time you need to execute the function, you can use this faster set of instructions.

    The only problem is that the scribes made quite a few assumptions when they translated our scroll. Perhaps they assumed that a variable would always hold an integer. What happens if one of those assumptions is invalidated?

    In that case we must perform what’s known as a bailout. We pull the original bytecode scroll from the shelf, and figure out which instruction we should start executing from. The machine code scroll disappears in a puff of smoke and the process starts again.
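
The profile, compile, and bail out cycle can be simulated without any real machine code. In this sketch the "optimized" version is just another JavaScript function that checks its assumption (a guard) before taking a fast path. `createTieredFunction`, `HOT_THRESHOLD`, and the result format `{ ok, value }` are all invented for illustration.

```javascript
const HOT_THRESHOLD = 3; // arbitrary; real engines use their own heuristics

function createTieredFunction(genericImplementation, makeOptimized) {
  const state = { calls: 0, optimized: null, bailouts: 0 };

  function run(...args) {
    state.calls += 1; // profiling: count how often the function runs
    if (state.optimized) {
      const outcome = state.optimized(...args);
      if (outcome.ok) return outcome.value;
      // An assumption failed: discard the fast code and start over.
      state.optimized = null;
      state.bailouts += 1;
      state.calls = 0;
    } else if (state.calls >= HOT_THRESHOLD) {
      // The function is "hot": produce the specialized version for next time.
      state.optimized = makeOptimized();
    }
    return genericImplementation(...args);
  }

  run.state = state;
  return run;
}

const add = createTieredFunction(
  (a, b) => a + b, // generic: works for numbers, strings, and more
  () => (a, b) =>
    Number.isInteger(a) && Number.isInteger(b)
      ? { ok: true, value: a + b } // guard holds: fast path
      : { ok: false }              // guard fails: request a bailout
);

add(1, 2);
add(3, 4);
add(5, 6); // third call crosses the threshold
const compiled = add.state.optimized !== null;
const fast = add(7, 8);     // runs the specialized version
const text = add('a', 'b'); // assumption broken: bailout to generic code
console.log(compiled, fast, text, add.state.bailouts);
```

After the bailout the call counter restarts, so the function can become hot and be specialized again.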

    To infinity and beyond

    Today’s high-performance JavaScript engines have evolved far beyond the relatively simple interpreters that shipped with Netscape Navigator and Internet Explorer in the 1990s. And that evolution continues. New features are incrementally added to the language. Common coding patterns are optimized. WebAssembly is maturing. A richer standard module library is being developed. As developers, we can expect modern JavaScript engines to deliver fast and efficient execution as long as we keep our bundle sizes in check and try to make sure our performance-critical code is not overly dynamic.


  • Braces to Pixels

    Doesn’t CSS seem like magic? Well, in this third installment of “URL to Interactive” we’ll look at the journey that your browser goes through to take your CSS from braces to pixels. As a bonus, we’ll also quickly touch on how end-user interaction affects this process. We have a lot of ground to cover, so grab a cup of <insert your favorite drink’s name here>, and let’s get going.


    Similar to what we learned about HTML in “Tags to DOM,” once CSS is downloaded by the browser, the CSS parser is spun up to handle any CSS that it encounters. This can be CSS within individual documents, inside of <style> tags, or inline within the style attribute of a DOM element. All the CSS is parsed out and tokenized in accordance with the syntax specification. At the end of this process, we have a data structure with all the selectors, properties, and properties’ respective values.

    For example, consider the following CSS:

    .fancy-button {
        background: green;
        border: 3px solid red;
        font-size: 1em;
    }

    That will result in the following data structure for easy utilization later in the process:

    Selector         Property            Value
    .fancy-button    background-color    rgb(0,128,0)
    .fancy-button    border-width        3px
    .fancy-button    border-style        solid
    .fancy-button    border-color        rgb(255,0,0)
    .fancy-button    font-size           1em

    One thing that is worth noting is that the browser exploded the shorthands of background and border into their longhand variants, as shorthands are primarily for developer ergonomics; the browser only deals with the longhands from here on.
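
As a small illustration of what "exploding" a shorthand means, the sketch below expands a `border` value into its longhands. `expandBorder` is a hypothetical helper, and it assumes exactly three space-separated parts in width, style, color order; real CSS accepts the parts in any order and allows some to be omitted.

```javascript
// Sketch of expanding the `border` shorthand into longhand properties.
// Assumes "width style color" order, which real CSS does not require.
function expandBorder(value) {
  const [width, style, color] = value.trim().split(/\s+/);
  return {
    'border-width': width,
    'border-style': style,
    'border-color': color,
  };
}

const longhands = expandBorder('3px solid red');
console.log(longhands);
```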

    After this is done, the engine continues constructing the DOM tree, which Travis Leithead also covers in “Tags to DOM”; so go read that now if you haven’t already, I’ll wait.


    Now that we have parsed out all styles within the readily available content, it’s time to do style computation on them. All values have a standardized computed value that we try to reduce them to. When leaving the computation stage, any dimensional values are reduced to one of three possible outputs: auto, a percentage, or a pixel value. For clarity, let’s take a look at a few examples of what the web developer wrote and what the result will be following computation:

    Web Developer                 | Computed Value
    font-size: 1em                | font-size: 16px
    width: 50%                    | width: 50%
    height: auto                  | height: auto
    width: 506.4567894321568px    | width: 506.46px
    line-height: calc(10px + 2em) | line-height: 42px
    border-color: currentColor    | border-color: rgb(0,0,0)
    height: 50vh                  | height: 540px
    display: grid                 | display: grid
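
    A few rows of that table can be reproduced with a small sketch. The computeLength function and its ctx argument are assumptions made for illustration; the 16px font size and 1080px viewport height are simply the values implied by the 1em and 50vh rows. It handles only px, em, and vh, and leaves auto and percentages untouched because those are resolved later, during layout.

```javascript
// Sketch of reducing a few length units to pixels.
// `ctx` is a hypothetical object: fontSize and viewportHeight, both in px.
function computeLength(value, ctx) {
  // auto and percentages survive computation and are resolved during layout.
  if (value === "auto" || value.endsWith("%")) return value;
  const match = /^(-?\d*\.?\d+)(px|em|vh)$/.exec(value);
  if (!match) throw new Error(`Unsupported value: ${value}`);
  const number = parseFloat(match[1]);
  const factor = { px: 1, em: ctx.fontSize, vh: ctx.viewportHeight / 100 }[match[2]];
  // Round to two decimals, mirroring the 506.46px row in the table.
  const px = Math.round(number * factor * 100) / 100;
  return `${px}px`;
}

const ctx = { fontSize: 16, viewportHeight: 1080 };
console.log(computeLength("1em", ctx));                 // "16px"
console.log(computeLength("50vh", ctx));                // "540px"
console.log(computeLength("506.4567894321568px", ctx)); // "506.46px"
```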

    Now that we’ve computed all the values in our data store, it’s time to handle the cascade.


    Since the CSS can come from a variety of sources, the browser needs a way to determine which styles should apply to a given element. To do this, the browser uses a formula called specificity, which counts the number of tags, classes, ids, and attribute selectors utilized in the selector, as well as the number of !important declarations present. Styles on an element via the inline style attribute are given a rank that wins over any style from within a <style> block or external style sheet. And if a web developer utilizes !important on a value, the value will win over any CSS no matter its location, unless there is a !important inline as well.

    Graphic showing a hierarchy for determining CSS priority

    To make this clear, let’s show a few selectors and their resulting specificity scores:

    Selector                | Specificity Score
    li                      | 0 0 0 0 1
    li.foo                  | 0 0 0 1 1
    #comment li.foo.bar     | 0 0 1 2 1
    <li style="color: red"> | 0 1 0 0 0
    color: red !important   | 1 0 0 0 0
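
    The last three columns of those scores (ids, classes, and tags) can be counted with a short sketch. The specificity function below is a hypothetical simplification: it only understands type, class, id, and attribute selectors, and it assumes attribute values contain no spaces.

```javascript
// Simplified specificity counter returning [ids, classes, tags].
// Pseudo-classes, pseudo-elements, and :not() are deliberately not handled.
function specificity(selector) {
  // Drop combinators so only compound selectors remain.
  const compounds = selector.split(/[\s>+~]+/).filter(Boolean);
  let ids = 0;
  let classes = 0;
  let tags = 0;
  for (const compound of compounds) {
    ids += (compound.match(/#[\w-]+/g) || []).length;
    classes += (compound.match(/\.[\w-]+|\[[^\]]+\]/g) || []).length;
    // A leading identifier is a type selector such as `li` or `div`.
    if (/^[a-zA-Z][\w-]*/.test(compound)) tags += 1;
  }
  return [ids, classes, tags];
}

console.log(specificity("li"));                  // [0, 0, 1]
console.log(specificity("li.foo"));              // [0, 1, 1]
console.log(specificity("#comment li.foo.bar")); // [1, 2, 1]
```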

    So what does the engine do when the specificity is tied? Given two or more selectors of equal specificity, the winner will be whichever one appears last in the document. In the following example, the div would have a blue background.

    div {
    	background: red;
    }
    div {
    	background: blue;
    }

    Let’s expand on our .fancy-button example a little bit:

    .fancy-button {
    	background: green;
    	border: 3px solid red;
    	font-size: 1em;
    }
    div .fancy-button {
    	background: yellow;
    }

    Now the CSS will produce the following data structure. We’ll continue building upon this throughout the article.

    Selector          | Property         | Value          | Specificity Score | Document Order
    .fancy-button     | background-color | rgb(0,128,0)   | 0 0 0 1 0         | 0
    .fancy-button     | border-width     | 3px            | 0 0 0 1 0         | 1
    .fancy-button     | border-style     | solid          | 0 0 0 1 0         | 2
    .fancy-button     | border-color     | rgb(255,0,0)   | 0 0 0 1 0         | 3
    .fancy-button     | font-size        | 16px           | 0 0 0 1 0         | 4
    div .fancy-button | background-color | rgb(255,255,0) | 0 0 0 1 1         | 5

    Understanding origins

    In “Server to Client,” Ali Alabbas discusses origins as they relate to browser navigation. In CSS, there are also origins, but they serve different purposes:

    • user: any styles set globally within the user agent by the user;
    • author: the web developer’s styles;
    • and user agent: anything that can utilize and render CSS (to most web developers and users, this is a browser).

    The cascade power of each of these origins ensures that the greatest power lies with the user, then the author, and finally the user agent. Let’s expand our dataset a bit further and see what happens when the user sets their browser’s font size to a minimum of 2em:

    Origin | Selector          | Property         | Value          | Specificity Score | Document Order
    Author | .fancy-button     | background-color | rgb(0,128,0)   | 0 0 0 1 0         | 0
    Author | .fancy-button     | border-width     | 3px            | 0 0 0 1 0         | 1
    Author | .fancy-button     | border-style     | solid          | 0 0 0 1 0         | 2
    Author | .fancy-button     | border-color     | rgb(255,0,0)   | 0 0 0 1 0         | 3
    Author | .fancy-button     | font-size        | 16px           | 0 0 0 1 0         | 4
    Author | div .fancy-button | background-color | rgb(255,255,0) | 0 0 0 1 1         | 5
    User   | *                 | font-size        | 32px           | 0 0 0 0 1         | 0

    Doing the cascade

    When the browser has a complete data structure of all declarations from all origins, it will sort them in accordance with specification. First it will sort by origin, then by specificity, and finally, by document order.

    Origin ⬆ | Selector          | Property         | Value          | Specificity Score ⬆ | Document Order ⬇
    User     | *                 | font-size        | 32px           | 0 0 0 0 1           | 0
    Author   | div .fancy-button | background-color | rgb(255,255,0) | 0 0 0 1 1           | 5
    Author   | .fancy-button     | background-color | rgb(0,128,0)   | 0 0 0 1 0           | 0
    Author   | .fancy-button     | border-width     | 3px            | 0 0 0 1 0           | 1
    Author   | .fancy-button     | border-style     | solid          | 0 0 0 1 0           | 2
    Author   | .fancy-button     | border-color     | rgb(255,0,0)   | 0 0 0 1 0           | 3
    Author   | .fancy-button     | font-size        | 16px           | 0 0 0 1 0           | 4

    This results in the “winning” properties and values for the .fancy-button (the higher up in the table, the better). For example, from the previous table, you’ll note that the user’s browser preference settings take precedence over the web developer’s styles. Now the browser finds all DOM elements that match the denoted selectors, and hangs the resulting computed styles off the matching elements, in this case a div for the .fancy-button:

    Property         | Value
    font-size        | 32px
    background-color | rgb(255,255,0)
    border-width     | 3px
    border-color     | rgb(255,0,0)
    border-style     | solid
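
    The sorting described in this section can be sketched as a three-key comparison. This is an illustration of the model used in this article (user over author over user agent), not how any particular engine stores its data; the cascade function, the record shape, and the ORIGIN_RANK table are all assumptions, and color values are kept as keywords for brevity.

```javascript
// Origin rank follows this article's ordering: user > author > user agent.
const ORIGIN_RANK = { "user": 2, "author": 1, "user agent": 0 };

// Compare two specificity tuples column by column, left to right.
function compareSpecificity(a, b) {
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// Returns the winning value for each property.
function cascade(declarations) {
  const sorted = [...declarations].sort((a, b) =>
    ORIGIN_RANK[a.origin] - ORIGIN_RANK[b.origin] ||
    compareSpecificity(a.specificity, b.specificity) ||
    a.order - b.order
  );
  // Sorted weakest first, so later entries overwrite earlier ones.
  const winners = {};
  for (const declaration of sorted) winners[declaration.property] = declaration.value;
  return winners;
}

const winners = cascade([
  { origin: "author", property: "background-color", value: "green", specificity: [0, 0, 0, 1, 0], order: 0 },
  { origin: "author", property: "font-size", value: "16px", specificity: [0, 0, 0, 1, 0], order: 4 },
  { origin: "author", property: "background-color", value: "yellow", specificity: [0, 0, 0, 1, 1], order: 5 },
  { origin: "user", property: "font-size", value: "32px", specificity: [0, 0, 0, 0, 1], order: 0 },
]);
console.log(winners); // { 'background-color': 'yellow', 'font-size': '32px' }
```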

    If you wish to learn more about how the cascade works, take a look at the official specification.

    CSS Object Model

    While we’ve done a lot up to this stage, we’re not done yet. Now we need to update the CSS Object Model (CSSOM). The CSSOM resides within document.styleSheets; we need to update it so that it represents everything that has been parsed and computed up to this point.

    Web developers may utilize this information without even realizing it. For example, when calling into getComputedStyle(), the same process denoted above is run, if necessary.


    Now that we have a DOM tree with styles applied, it’s time to begin the process of building up a tree for visual purposes. This tree is present in all modern engines and is referred to as the box tree. In order to construct this tree, we traverse down the DOM tree and create zero or more CSS boxes, each having a margin, border, padding and content box.

    In this section, we’ll be discussing the following CSS layout concepts:

    • Formatting context (FC): there are many types of formatting contexts, most of which web developers invoke by changing the display value for an element. Some of the most common formatting contexts are block (block formatting context, or BFC), flex, grid, table-cells, and inline. Some other CSS can force a new formatting context, too, such as position: absolute, using float, or utilizing multi-column.
    • Containing block: this is the ancestor block that you resolve styles against.
    • Inline direction: this is the direction in which text is laid out, as dictated by the element’s writing mode. In Latin-based languages this is the horizontal axis, and in CJK languages this is the vertical axis.
    • Block direction: this behaves exactly the same as the inline direction but is perpendicular to that axis. So, for Latin-based languages this is the vertical axis, and in CJK languages this is the horizontal axis.

    Resolving auto

    Remember from the computation phase that dimension values can be one of three values: auto, percentage, or pixel. The purpose of layout is to size and position all the boxes in the box tree to get them ready for painting. As a very visual person myself, I find examples can make it easier to understand how the box tree is constructed. To make it easier to follow, I will not be showing the individual CSS boxes, just the principal box. Let’s look at a basic “Hello world” layout using the following code:

    <p>Hello world</p>
    	body {
    		width: 50px;
    	}
    Diagram showing an HTML body, a CSS box, and a property of width with a value of 50 pixels
    The browser starts at the body element. We produce its principal box, which has a width of 50px, and a default height of auto.
    Diagram showing a tree with a CSS box for the body and a CSS box for a paragraph
    Now the browser moves on to the paragraph and produces its principal box, and since paragraphs have a margin by default, this will impact the height of the body, as reflected in the visual.
    Diagram showing a tree with a CSS box for the body and a CSS box for a paragraph, and now a line box appended to the end
    Now the browser moves onto the text of “Hello world,” which is a text node in the DOM. As such, we produce a line box inside of the layout. Notice that the text has overflowed the body. We’ll handle this in the next step.
    Diagram showing a tree with a CSS box for the body and a CSS box for a paragraph, and now a line box appended to the end, which has an arrow pointing back to the paragraph CSS box
    Because “world” does not fit and we haven’t changed the overflow property from its default, the engine reports back to its parent where it left off in laying out the text.
    Diagram showing a tree with a CSS box for the body and a CSS box for a paragraph, and now two line boxes appended to the end
    Since the parent has received a token that its child wasn’t able to complete the layout of all the content, it clones the line box, which includes all the styles, and passes the information for that box to complete the layout. Once the layout is complete, the browser walks back up the box tree, resolving any auto or percentage-based values that haven’t been resolved. In the image, you can see that the body and the paragraph now encompass all of “Hello world” because their heights were set to auto.
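
    The line-by-line behavior in this walkthrough resembles greedy line breaking, which can be sketched as follows. The breakIntoLines function and its measure parameter are hypothetical; the default measure pretends every character is 8px wide, which is not how real font metrics work.

```javascript
// Greedy line breaking: keep adding words to the current line until the
// next word no longer fits, then start a new line.
// `measure` is a hypothetical text-measuring function (8px per character).
function breakIntoLines(text, maxWidth, measure = (s) => s.length * 8) {
  const lines = [];
  let current = "";
  for (const word of text.split(/\s+/).filter(Boolean)) {
    const candidate = current ? `${current} ${word}` : word;
    if (current && measure(candidate) > maxWidth) {
      lines.push(current); // close this line and continue on a new one
      current = word;
    } else {
      current = candidate;
    }
  }
  if (current) lines.push(current);
  return lines;
}

console.log(breakIntoLines("Hello world", 50)); // [ 'Hello', 'world' ]
```

    With a 50px-wide body, “Hello” fits on the first line and “world” is pushed to a second one, which is why the body’s auto height ends up covering two lines.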

    Dealing with floats

    Now let’s get a little bit more complex. We’ll take a normal layout where we have a button that says “Share It,” and float it to the left of a paragraph of Latin text. The float itself is what is considered to be a “shrink-to-fit” context. It is called “shrink-to-fit” because the box will shrink down around its content if the dimensions are auto. Float boxes are one type of box that matches this layout type, but there are many others, such as absolutely positioned boxes (including position: fixed elements) and table cells with auto-based sizing.

    Here is the code for our button scenario:

    	<button>SHARE IT</button>
    	<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nullam pellentesq</p>
    	article {
    		min-width: 400px;
    		max-width: 800px;
    		background: rgb(191, 191, 191);
    		padding: 5px;
    	}
    	button {
    		float: left;
    		background: rgb(210, 32, 79);
    		padding: 3px 10px;
    		border: 2px solid black;
    		margin: 5px;
    	}
    	p {
    		margin: 0;
    	}
    Diagram of a box tree with a CSS box for an article, a CSS box for a button floated left, and a line box
    The process starts off by following the same pattern as our “Hello world” example, so I’m going to skip to where we begin handling the floated button.
    Diagram of a box tree with a CSS box and a line box that calculates the maximum and minimum width for the button
    Since a float creates a new block formatting context (BFC) and is a shrink-to-fit context, the browser does a specific type of layout called content measure. This mode is identical to regular layout, with one important difference: it is done in infinite space. During this phase the browser lays out the tree of the BFC at both its largest and smallest widths. In this case, it is laying out a button with text, so its narrowest size, including all other CSS boxes, will be the size of the longest word. At its widest, it will be all of the text on one line, with the addition of the CSS boxes. Note: The color of the buttons here is not literal. It is for illustrative purposes only.
    Diagram of a box tree with a CSS box for an article, a CSS box for a button floated left, and a line box, with the CSS box for the button now communicating the min and max width back up to the CSS box for the article
    Now that we know that the minimum width is 86px, and the maximum width is 115px, we pass this information back to the parent box for it to decide the width and to place the button appropriately. In this scenario, there is space to fit the float at max size so that is how the button is laid out.
    Diagram of a box tree with a CSS box for an article with two branches: a CSS box for a button floated left and a CSS box for a paragraph. The CSS box for the article is communicating the min and max width for the button to the paragraph.
    In order to ensure that the browser adheres to the standard and the content wraps around the float, the browser changes the geometry of the article BFC. This geometry is passed to the paragraph to use during its layout.
    Diagram of a box tree with a CSS box for an article with two branches: a CSS box for a button floated left and a CSS box for a paragraph. The paragraph has not been parsed yet and is on one line overflowing the parent container.
    From here the browser follows the same layout process as it did in our first example—but it ensures that any inline content’s inline and block starting positions are outside of the constraint space taken up by the float.
    Diagram of a box tree with a CSS box for an article with two branches: a CSS box for a button floated left and a CSS box for a paragraph. The paragraph has now been parsed and broken into four lines, and there are four line boxes in the diagram to show this.
    As the browser continues walking down the tree and cloning nodes, it moves past the block position of the constraint space. This allows the final line of text (as well as the one before it) to begin at the start of the content box in the inline direction. And then the browser walks back up the tree, resolving auto and percentage values as necessary.
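
    The content measure step for the floated button can be sketched in two small functions. Both are hypothetical helpers: contentWidths again assumes 8px per character, so it will not reproduce the 86px and 115px figures from the diagrams, while shrinkToFit applies the shrink-to-fit rule from CSS 2.1 (the available width, clamped between the minimum and maximum content widths).

```javascript
// Narrowest width is the longest word; widest is all text on one line.
// `extra` stands for the horizontal padding and border of the box.
function contentWidths(text, extra, measure = (s) => s.length * 8) {
  const words = text.split(/\s+/).filter(Boolean);
  const min = Math.max(...words.map(measure)) + extra;
  const max = measure(text) + extra;
  return { min, max };
}

// Shrink-to-fit: min(max(minimum content width, available width), maximum content width).
function shrinkToFit(minContent, maxContent, available) {
  return Math.min(Math.max(minContent, available), maxContent);
}

console.log(contentWidths("SHARE IT", 24)); // { min: 64, max: 88 }
console.log(shrinkToFit(86, 115, 400));     // 115
console.log(shrinkToFit(86, 115, 100));     // 100
console.log(shrinkToFit(86, 115, 50));      // 86
```

    The first shrinkToFit call matches the scenario above: there is room for the float at its maximum size, so that is the width it gets.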

    Understanding fragmentation

    One final aspect to touch on for how layout works is fragmentation. If you’ve ever printed a web page or used CSS Multi-column, then you’ve taken advantage of fragmentation. Fragmentation is the logic of breaking content apart to fit it into a different geometry. Let’s take a look at the same example utilizing CSS Multi-column:

    		<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Cras nibh orci, tincidunt eget enim et, pellentesque condimentum risus. Aenean sollicitudin risus velit, quis tempor leo malesuada vel. Donec consequat aliquet mauris. Vestibulum ante ipsum primis in faucibus
    	body {
    		columns: 2;
    		column-fill: auto;
    		height: 300px;
    	}
    Diagram of a box tree showing a CSS box for a body and a multicol box for a div
    Once the browser reaches the multicol formatting context box, it sees that it has a set number of columns.
    Diagram of a box tree showing a CSS box for a body and a multicol box for a div, now with a fragmentainer CSS box created under the div
    It follows a cloning model similar to the one from before, and creates a fragmentainer with the correct dimensions to adhere to the author’s desire for their columns.
    Diagram of a box tree showing a CSS box for a body and a multicol box for a div, now with a CSS box for each column and a line box for each line within each column
    The browser then lays out as many lines as possible by following the same pattern as before. Then the browser creates another fragmentainer and continues the layout to completion.
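
    As a rough sketch of that process, the function below fills one fragmentainer with as many lines as fit and then moves on to the next. The fragment function and its fixed lineHeight are assumptions for illustration; real engines also honor break rules such as widows, orphans, and break-inside.

```javascript
// Distribute already laid-out lines into fragmentainers of a fixed height.
function fragment(lines, lineHeight, fragmentainerHeight) {
  // Always place at least one line per fragmentainer to guarantee progress.
  const linesPerFragment = Math.max(1, Math.floor(fragmentainerHeight / lineHeight));
  const fragments = [];
  for (let i = 0; i < lines.length; i += linesPerFragment) {
    fragments.push(lines.slice(i, i + linesPerFragment));
  }
  return fragments;
}

console.log(fragment(["a", "b", "c", "d", "e"], 20, 40));
// [ [ 'a', 'b' ], [ 'c', 'd' ], [ 'e' ] ]
```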


    OK, so let’s recap where we are at this point. We’ve taken all the CSS content, parsed it, cascaded it onto the DOM tree, and completed layout. But we haven’t applied color, borders, shadows, and similar design treatments to the layout; adding these is known as painting.

    Painting is roughly standardized by CSS, and to put it concisely (you can read the full breakdown in CSS 2.2 Appendix E), you paint in the following order:

    • background;
    • border;
    • and content.

    So if we take our “SHARE IT” button from earlier and follow this process, it will look something like this:

    Graphic showing progressive passes of a box: first the background, then the border, then the content

    Once this is completed, it is converted to a bitmap. That’s right—ultimately every layout element (even text) becomes an image under the hood.

    Concerning the z-index

    Now, most of our websites don’t consist of a single element. Moreover, we often want to have certain elements appear on top of other elements. To accomplish this, we can harness the power of the z-index to superimpose one element over another. This may feel like how we work with layers in our design software, but the only layers that exist are within the browser’s compositor. It might seem as though we’re creating new layers using z-index, but we’re not—so what are we doing?

    What we’re doing is creating a new stacking context. Creating a new stacking context effectively changes the order in which you paint elements. Let’s look at an example:

    <div id="one">
    	Item 1
    </div>
    <div id="two">
    	Item 2
    </div>
    body {
    	background: lightgray;
    }
    div {
    	width: 300px;
    	height: 300px;
    	position: absolute;
    	background: white;
    	z-index: 2;
    }
    #two {
    	background: green;
    	z-index: 1;
    }

    Without z-index utilization, the document above would be painted in document order, which would place “Item 2” on top of “Item 1.” But because of the z-index, the painting order is changed. Let’s step through each phase, similar to how we stepped through our earlier layouts.

    Diagram of a box tree with a basic layout representing a root stacking context. One box has a z-index of one, another box has a z-index of 2.
    The browser starts with the root box; we paint in the background.
    The same layout, but the box with the z-index of 1 is now rendering.
    The browser then traverses, out of document order, to the lower-level stacking context (which in this case is “Item 2”) and begins to paint that element following the same rules from above.
    The same layout, but the box with the z-index of 2 is now rendering on top of the previous box
    Then it traverses to the next highest stacking context (which in this case is “Item 1”) and paints it according to the order defined in CSS 2.2.

    The z-index has no bearing on color, just on which element is visible to users and, hence, which text and color are visible.
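
    The painting order in this example can be sketched as a sort by z-index, with document order as the tie-breaker. The paintOrder function and the box records are hypothetical, and the sketch ignores nested stacking contexts and negative z-index handling.

```javascript
// Sort sibling stacking contexts by z-index; ties keep document order.
function paintOrder(boxes) {
  return boxes
    .map((box, documentOrder) => ({ ...box, documentOrder }))
    .sort((a, b) => a.zIndex - b.zIndex || a.documentOrder - b.documentOrder)
    .map((box) => box.id);
}

// #one gets z-index: 2 from the div rule; #two overrides it with z-index: 1.
console.log(paintOrder([
  { id: "one", zIndex: 2 },
  { id: "two", zIndex: 1 },
])); // [ 'two', 'one' ]
```

    Whatever is painted last ends up on top, so “Item 1” covers “Item 2” here.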


    At this stage, we have a minimum of a single bitmap that is passed from painting to the compositor. The compositor’s job is to create a layer, or layers, and render the bitmap(s) to the screen for the end user to see.

    A reasonable question to ask at this point is, “Why would any site need more than one bitmap or compositor layer?” Well, with the examples that we’ve looked at thus far, we really wouldn’t. But let’s look at an example that’s a little bit more complex. Let’s say that in a hypothetical world, the Office team wants to bring Clippy back online, and they want to draw attention to Clippy by having him pulsate via a CSS transform.

    The code for animating Clippy could look something like this:

    <div class="clippy"></div>
    .clippy {
    	width: 100px;
    	height: 100px;
    	animation: pulse 1s infinite;
    	background: url(clippy.svg);
    }
    @keyframes pulse {
    	from {
    		transform: scale(1, 1);
    	}
    	to {
    		transform: scale(2, 2);
    	}
    }

    When the browser reads that the web developer wants to animate Clippy on an infinite loop, it has two options:

    • It can go back to the repaint stage for every frame of the animation, and produce a new bitmap to send back to the compositor.
    • Or it can produce two different bitmaps, and allow the compositor to do the animation itself on only the layer that has this animation applied.

    In most circumstances, the browser will choose option two and produce the following (I have purposefully simplified the number of layers Word Online would produce for this example):

    Diagram showing a root composite layer with Clippy on his own layer

    Then it will re-compose the Clippy bitmap in the correct position and handle the pulsating animation. This is a great win for performance as in many engines the compositor is on its own thread, and this allows the main thread to be unblocked. If the browser were to choose option one above, it would have to block on every frame to accomplish the same result, which would negatively impact performance and responsiveness for the end user.

    A diagram showing a layout with Clippy, with a chart of the process of rendering. The Compose step is looping.

    Creating the illusion of interactivity

    As we’ve just learned, we took all the styles and the DOM, and produced an image that we rendered to the end user. So how does the browser create the illusion of interactivity? Let’s take a look at an example using our handy “SHARE IT” button:

    button {
        float: left;
        background: rgb(210, 32, 79);
        padding: 3px 10px;
        border: 2px solid black;
    }
    button:hover {
        background: teal;
        color: black;
    }

    All we’ve added here is a pseudo-class that tells the browser to change the button’s background and text color when the user hovers over the button. This raises the question: how does the browser handle this?

    The browser constantly tracks a variety of inputs, and while those inputs are moving it goes through a process called hit testing. For this example, the process looks like this:

    A diagram showing the process for hit testing. The process is detailed below.
    1. The user moves the mouse over the button.
    2. The browser fires an event that the mouse has been moved and goes into the hit testing algorithm, which essentially asks the question, “What box(es) is the mouse touching?”
    3. The algorithm returns the box that is linked to our “SHARE IT” button.
    4. The browser asks the question, “Is there anything I should do since a mouse is hovering over you?”
    5. It quickly runs style/cascade for this box and its children and determines that, yes, there is a :hover pseudo-class with paint-only style adjustments inside of the declaration block.
    6. It hangs those styles off of the DOM element (as we learned in the cascade phase), which is the button in this case.
    7. It skips past layout and goes directly to painting a new bitmap.
    8. The new bitmap is passed off to the compositor and then to the user.
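
    Steps 2 and 3 above can be sketched as a point-in-rectangle search. The hitTest function and the box records (with made-up coordinates) are assumptions for illustration; real engines also account for transforms, clipping, and pointer-events.

```javascript
// Return the topmost box containing the point (x, y), or null.
// `boxes` is assumed to be in paint order, so the last match is on top.
function hitTest(boxes, x, y) {
  let hit = null;
  for (const box of boxes) {
    const inside =
      x >= box.x && x < box.x + box.width &&
      y >= box.y && y < box.y + box.height;
    if (inside) hit = box;
  }
  return hit;
}

const boxes = [
  { name: "article", x: 0, y: 0, width: 400, height: 100 },
  { name: "button", x: 10, y: 10, width: 115, height: 30 },
];
console.log(hitTest(boxes, 20, 20).name);  // "button"
console.log(hitTest(boxes, 300, 50).name); // "article"
console.log(hitTest(boxes, 500, 50));      // null
```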

    To the user, this effectively creates the perception of interactivity, even though the browser is just swapping a red image for a teal one.

    Et voilà!

    Hopefully this has removed some of the mystery from how CSS goes from the braces you’ve written to rendered pixels in your browser.

    In this leg of our journey, we discussed how CSS is parsed, how values are computed, and how the cascade actually works. Then we dove into a discussion of layout, painting, and composition.

    Now stay tuned for the final installment of this series, where one of the designers of the JavaScript language itself will discuss how browsers compile and execute our JavaScript.

  • Tags to DOM

    In our previous segment, “Server to Client,” we saw how a URL is requested from a server and learned all about the many conditions and caches that help optimize delivery of the associated resource. Once the browser engine finally gets the resource, it needs to start turning it into a rendered web page. In this segment, we focus primarily on HTML resources, and how the tags of HTML are transformed into the building blocks for what will eventually be presented on screen.

    To use a construction metaphor, we’ve drafted the blueprints, acquired all the permits, and collected all the raw materials at the construction site; it’s time to start building!


    Once content gets from the server to the client through the networking system, its first stop is the HTML parser, which is composed of a few systems working together: encoding, pre-parsing, tokenization, and tree construction. The parser is the part of the construction project metaphor where we walk through all the raw materials: unpacking boxes; unbinding pallets, pipes, wiring, etc.; and pouring the foundation before handing off everything to the experts working on the framing, plumbing, electrical, etc.


    The payload of an HTTP response body can be anything from HTML text to image data. The first job of the parser is to figure out how to interpret the bits just received from the server. Assuming we’re processing an HTML document, the decoder must figure out how the text document was translated into bits in order to reverse the process.

    Binary-to-text representation
    Characters    | D        | O        | M
    ASCII Values  | 68       | 79       | 77
    Binary Values | 01000100 | 01001111 | 01001101
    Bits          | 8        | 8        | 8

    (Remember that ultimately even text must be translated to binary in the computer. An encoding, in this case ASCII, defines that a binary value such as “01000100” means the letter “D,” as shown in the figure above.) Many possible encodings exist for text, and it’s the browser’s job to figure out how to properly decode the text. The server should provide hints via Content-Type headers, and the leading bits themselves can be analyzed (for a byte order mark, or BOM). If the encoding still cannot be determined, the browser can apply its best guess based on heuristics. Sometimes the only definitive answer comes from the (encoded) content itself in the form of a <meta> HTML tag. Worst case scenario, the browser makes an educated guess and then later finds a contradicting <meta> tag after parsing has started in earnest. In these rare cases, the parser must restart, throwing away the previously decoded content. Browsers sometimes have to deal with old web content (using legacy encodings), and a lot of these systems are in place to support that.
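
    The character-to-binary mapping in the table can be reproduced with a few lines of JavaScript. The toBinary helper is hypothetical; it relies on the standard TextEncoder, which produces UTF-8 bytes, and UTF-8 is identical to ASCII for these characters.

```javascript
// Encode text as UTF-8 bytes and show each byte as eight binary digits.
function toBinary(text) {
  const bytes = new TextEncoder().encode(text);
  return Array.from(bytes, (byte) => byte.toString(2).padStart(8, "0"));
}

console.log(toBinary("DOM")); // [ '01000100', '01001111', '01001101' ]
```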

    When saving your HTML documents for the web today, the choice is clear: use UTF-8 encoding. Why? It nicely supports the full Unicode range of characters, has good compatibility with ASCII for single-byte characters common to languages like CSS, HTML, and JavaScript, and is likely to be the browser’s fallback default. You can tell when encoding goes wrong, because text won’t render properly (you will tend to get garbage characters or boxes where legible text is usually visible).


    Once the encoding is known, the parser starts an initial pre-parsing step to scan the content with the goal of minimizing round-trip latency for additional resources. The pre-parser is not a full parser; for example, it doesn’t understand nesting levels or parent/child relationships in HTML. However, the pre-parser does recognize specific HTML tag names and attributes, as well as URLs. For example, if you have an <img src="https://somewhere.example.com/images/dog.png" alt=""> somewhere in your HTML content, the pre-parser will notice the src attribute, and queue a resource request for the dog picture via the networking system. The dog image is requested as quickly as possible, minimizing the time you need to wait for it to arrive from the network. The pre-parser may also notice certain explicit requests in the HTML such as preload and prefetch directives, and queue these up for processing as well.
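
    To give a feel for how shallow this scan is, here is a minimal sketch that looks for src attributes on <img> tags with a regular expression. The preScan function is hypothetical and only handles quoted attribute values; real pre-parsers recognize many more tags and attributes.

```javascript
// Collect image URLs without building any tree structure.
function preScan(html) {
  const urls = [];
  const pattern = /<img\b[^>]*?\bsrc\s*=\s*["']([^"']+)["']/gi;
  for (const match of html.matchAll(pattern)) {
    urls.push(match[1]); // a real engine would queue a network request here
  }
  return urls;
}

const html = '<p>Hi</p><img src="https://somewhere.example.com/images/dog.png" alt="">';
console.log(preScan(html)); // [ 'https://somewhere.example.com/images/dog.png' ]
```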


    Tokenization is the first half of parsing HTML. It involves turning the markup into individual tokens such as “begin tag,” “end tag,” “text run,” “comment,” and so forth, which are fed into the next stage of the parser. The tokenizer is a state machine that transitions between the different states of the HTML language, such as “in tag open state” (<|video controls>), “in attribute name state” (<video con|trols>), and “after attribute name state” (<video controls|>), doing so iteratively as each character in the HTML markup text document is read.

    (In each of those example tags, the vertical pipe illustrates the tokenizer’s position.)
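
    A heavily reduced tokenizer shows the state-machine idea. This sketch uses four made-up states instead of the full set in the HTML spec, ignores attributes, comments, and error handling, and the tokenize function and token shapes are assumptions for illustration.

```javascript
// Tiny tokenizer: reads one character at a time and switches states.
function tokenize(html) {
  const tokens = [];
  let state = "data";
  let buffer = "";
  for (const ch of html) {
    if (state === "data") {
      if (ch === "<") {
        if (buffer) tokens.push({ type: "text", data: buffer });
        buffer = "";
        state = "tagOpen";
      } else {
        buffer += ch;
      }
    } else if (state === "tagOpen") {
      if (ch === "/") {
        state = "endTagName";
      } else {
        buffer = ch;
        state = "tagName";
      }
    } else if (ch === ">") {
      // In "tagName" or "endTagName": emit the tag, dropping any attributes.
      const name = buffer.trim().split(/\s+/)[0].toLowerCase();
      tokens.push({ type: state === "tagName" ? "startTag" : "endTag", name });
      buffer = "";
      state = "data";
    } else {
      buffer += ch;
    }
  }
  if (state === "data" && buffer) tokens.push({ type: "text", data: buffer });
  return tokens;
}

console.log(tokenize("<p>Hi</p>"));
```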

    Diagram showing HTML tags being run through a tokenizer to create tokens

    The HTML spec (see “12.2.5 Tokenization”) currently defines eighty separate states for the tokenizer. The tokenizer and parser are very adaptable: both can handle and convert any text content into an HTML document—even if code in the text is not valid HTML. Resiliency like this is one of the features that has made the web so approachable by developers of all skill levels. However, the drawback of the tokenizer and parser’s resilience is that you may not always get the results you expect, which can lead to some subtle programming bugs. (Checking your code in the HTML validator can help you avoid bugs like this.)

    For those who prefer a more black-and-white approach to markup language correctness, browsers have an alternate parsing mechanism built in that treats any failure as a catastrophic failure (meaning any failure will cause the content to not render). This parsing mode uses the rules of XML to process HTML, and can be enabled by sending the document to the browser with the “application/xhtml+xml” MIME type (or any XML-based MIME type that uses elements in the HTML namespace).

    Browsers may combine the pre-parser and tokenization steps together as an optimization.

    Parsing/tree construction

    The browser needs an internal (in-memory) representation of a web page, and the DOM standard defines exactly what shape that representation should take. The parser’s responsibility is to take the tokens created by the tokenizer in the previous step, and create and insert the objects into the Document Object Model (DOM) in the appropriate way (specifically using the twenty-three separate states of its state machine; see “The rules for parsing tokens in HTML content”). The DOM is organized into a tree data structure, so this process is sometimes referred to as tree construction. (As an aside, Internet Explorer did not use a tree structure for much of its history.)

    Diagram showing tokens being turned into the DOM

    HTML parsing is complicated by the variety of error-handling cases that ensure that legacy HTML content on the web continues to have compatible structure in today’s modern browsers. For example, many HTML tags have implied end tags, meaning that if you don’t provide them, the browser auto-closes the matching tag for you. Consider, for instance, this HTML:

    <p>sincerely<p>The authors</p>

    The parser has a rule that will create an implied end tag for the paragraph, like so:

    <p>sincerely</p><p>The authors</p>

    This ensures the two paragraph objects in the resulting tree are siblings, rather than producing a single paragraph object by ignoring the second open tag. HTML tables are perhaps the most complicated case, as the parser’s rules attempt to ensure that tables have the proper structure.
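
    The implied end tag rule for paragraphs can be sketched with a stack of open elements. The buildTree function and the token shapes are hypothetical, and this handles only the single <p> rule described above, not the many other cases in the spec.

```javascript
// Build a tree from tokens, closing an open <p> when another <p> starts.
function buildTree(tokens) {
  const root = { name: "#root", children: [] };
  const stack = [root]; // stack of open elements
  for (const token of tokens) {
    const current = stack[stack.length - 1];
    if (token.type === "startTag") {
      // Implied end tag: a new <p> closes a <p> that is still open.
      if (token.name === "p" && current.name === "p") stack.pop();
      const element = { name: token.name, children: [] };
      stack[stack.length - 1].children.push(element);
      stack.push(element);
    } else if (token.type === "endTag") {
      if (current.name === token.name) stack.pop();
    } else {
      current.children.push({ name: "#text", data: token.data });
    }
  }
  return root;
}

// Tokens for: <p>sincerely<p>The authors</p>
const tree = buildTree([
  { type: "startTag", name: "p" },
  { type: "text", data: "sincerely" },
  { type: "startTag", name: "p" },
  { type: "text", data: "The authors" },
  { type: "endTag", name: "p" },
]);
console.log(tree.children.map((child) => child.name)); // [ 'p', 'p' ]
```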

    Despite all the complicated parsing rules, once the DOM tree is created, all of the parsing rules that try to create a “correct” HTML structure are no longer enforced. Using JavaScript, a web page can rearrange the DOM tree in almost any way it likes, even if it doesn’t make sense! (For example, it can add a table cell as the child of a <video> tag.) The rendering system becomes responsible for figuring out how to deal with any weird inconsistencies like that.

    Another complicating factor in HTML parsing is that JavaScript can add more content to be parsed while the parser is in the middle of doing its job. <script> tags contain text that the parser must collect and then send to a scripting engine for evaluation. While the script engine parses and evaluates the script text, the parser waits. If the script evaluation includes invoking the document.write API, a second instance of the HTML parser must start running (reentrantly). To quickly revisit our construction metaphor, <script> and document.write require stopping all in-progress work to go back to the store to get some additional materials that we hadn’t realized we needed. While we’re away at the store, all progress on the construction is stalled.

    All of these complications make writing a compliant HTML parser a non-trivial undertaking.


    When the parser finishes, it announces its completion via an event called DOMContentLoaded. Events are the broadcast system built into the browser that JavaScript can listen and respond to. In our construction metaphor, events are the reports that various workers bring to the foreman when they encounter a problem or finish a task. Like DOMContentLoaded, there are a variety of events that signal significant state changes in the web page such as load (meaning parsing is done, and all the resources requested by the parser, like images, CSS, video, etc., have been downloaded) and unload (meaning the web page is about to be closed). Many events are specific to user input, such as the user touching the screen (pointerdown, pointerup, and others), using a mouse (mouseover, mousemove, and others), or typing on the keyboard (keydown, keyup, and keypress).

    The browser creates an event object in the DOM, packs it full of useful state information (such as the location of the touch on the screen, the key on the keyboard that was pressed, and so on), and “fires” that event. Any JavaScript code that happens to be listening for that event is then run and provided with the event object.

    The tree structure of the DOM makes it convenient to “filter” how frequently code responds to an event by allowing events to be listened for at any level in the tree (i.e., at the root of the tree, in the leaves of the tree, or anywhere in between). The browser first determines where to fire the event in the tree (meaning which DOM object, such as a specific <input> control), and then calculates a route for the event starting from the root of the tree, then down each branch until it reaches the target (the <input> for example), and then back along the same path to the root. Each object along the route then has its event listeners triggered, so that listeners at the root of the tree will “see” more events than specific listeners at the leaves of the tree.

    [Figure: diagram showing a route being calculated for an event, followed by the event listeners being called.]
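
    The route calculation can be sketched with a small stand-in tree. Everything here (ToyNode, ToyEvent, fire, and the capture flag on listen) is a hypothetical model written for illustration; in a browser the equivalent registration is done with addEventListener, and the real dispatch algorithm handles many more cases.

```typescript
type Phase = "capture" | "target" | "bubble";
interface ToyEvent { type: string; target: ToyNode; }
type Listener = (event: ToyEvent, phase: Phase) => void;

// Hypothetical stand-in for a DOM node; not the browser's Node interface.
class ToyNode {
  name: string;
  parent: ToyNode | null = null;
  private listeners: { type: string; capture: boolean; run: Listener }[] = [];

  constructor(name: string) { this.name = name; }

  append(child: ToyNode): ToyNode {
    child.parent = this;
    return child;
  }

  // capture = true listens on the way down, false on the way back up.
  listen(type: string, run: Listener, capture = false): void {
    this.listeners.push({ type, capture, run });
  }

  notify(event: ToyEvent, phase: Phase): void {
    for (const l of this.listeners) {
      if (l.type !== event.type) continue;
      // At the target every listener runs; elsewhere only those registered for this phase.
      if (phase === "target" || l.capture === (phase === "capture")) l.run(event, phase);
    }
  }
}

function fire(type: string, target: ToyNode): void {
  // Route: root down to the target's parent, then the target, then back up to the root.
  const ancestors: ToyNode[] = [];
  for (let n = target.parent; n !== null; n = n.parent) ancestors.unshift(n);

  const event: ToyEvent = { type, target };
  for (const n of ancestors) n.notify(event, "capture");
  target.notify(event, "target");
  for (const n of [...ancestors].reverse()) n.notify(event, "bubble");
}

const routeLog: string[] = [];
const rootNode = new ToyNode("root");
const formNode = rootNode.append(new ToyNode("form"));
const inputNode = formNode.append(new ToyNode("input"));

rootNode.listen("keydown", (_e, phase) => routeLog.push(`root ${phase}`), true);
rootNode.listen("keydown", (_e, phase) => routeLog.push(`root ${phase}`));
inputNode.listen("keydown", (_e, phase) => routeLog.push(`input ${phase}`));

fire("keydown", inputNode); // root sees it on the way down and on the way up
fire("keydown", formNode);  // root sees this one too; the input listener does not
console.log(routeLog);
```

    The second call illustrates the filtering effect: a listener at the root is on the route of every event fired beneath it, while the listener on the input only runs when the input itself is the target.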

    Some events can also be canceled, which provides, for example, the ability to stop a form submission if the form isn’t filled out properly. (A submit event is fired from a <form> element, and a JavaScript listener can check the form and optionally cancel the event if fields are empty or invalid.)
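
    A minimal sketch of cancellation follows. EventTarget, Event, preventDefault, and dispatchEvent are the standard DOM event APIs (also available as globals in recent Node.js versions); makeForm, trySubmit, and the plain Fields object are assumptions standing in for a real <form> element and its controls.

```typescript
// Hypothetical stand-in for a form's field values.
type Fields = Record<string, string>;

function makeForm(fields: Fields): EventTarget {
  const form = new EventTarget();
  form.addEventListener("submit", (event) => {
    // Cancel the event when any field is empty.
    const hasEmptyField = Object.values(fields).some((value) => value.trim() === "");
    if (hasEmptyField) event.preventDefault();
  });
  return form;
}

// dispatchEvent returns false when a cancelable event was canceled by a listener.
function trySubmit(form: EventTarget): boolean {
  return form.dispatchEvent(new Event("submit", { cancelable: true }));
}

const incomplete = trySubmit(makeForm({ name: "Ada", email: "" }));
const complete = trySubmit(makeForm({ name: "Ada", email: "ada@example.com" }));
console.log({ incomplete, complete });
```

    Note that the event must be created as cancelable for preventDefault to have any effect; in a browser, the submit event fired on a real form is cancelable.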


    The HTML language provides a rich feature set that extends far beyond the markup that the parser processes. The parser builds the structure of which elements contain other elements and what state those elements have initially (their attributes). The combination of the structure and state is enough to provide both a basic rendering and some interactivity (such as through built-in controls like <textarea>, <video>, <button>, etc.). But without the addition of CSS and JavaScript, the web would be very boring (and static). The DOM provides an additional layer of functionality both to the elements of HTML and to other objects that are not related to HTML at all.

    In the construction metaphor, the parser has assembled the final building—all the walls, doors, floors, and ceilings are installed, and the plumbing, electrical, gas, and such, are ready. You can open the doors and windows, and turn the lights on and off, but the structure is otherwise quite plain. CSS provides the interior details—color on the walls and baseboards, for example. (We’ll get to CSS in the next installment.) JavaScript enables access to the DOM—all the furniture and appliances inside, as well as the services outside the building, such as the mailbox, storage shed and tools, solar panels, water well, etc. We describe the “furniture” and outside “services” next.

    Element interfaces

    As the parser is constructing objects to put into the tree, it looks up the element’s name (and namespace) and finds a matching HTML interface to wrap around the object.

    Interfaces add features to basic HTML elements that are specific to their kind or type of element. Some generic features include:

    • access to HTML collections representing all or a subset of the element’s children;
    • the ability to search the element’s attributes, children, and parent elements;
    • and importantly, ways to create new elements (without using the parser), and attach them to (or detach them from) the tree.

    For specific elements like <table>, the interface contains additional table-specific features for locating all the rows, columns, and cells within the table, as well as shortcuts for removing and adding rows and cells from and to the table. Likewise, <canvas> interfaces have features for drawing lines, shapes, text, and images. JavaScript is required to use these APIs—they are not available using HTML markup alone.

    After parsing ends, any change made to the tree via the APIs described above (such as changing the hierarchical position of an element in the tree, changing an element’s state by toggling an attribute name or value, or performing any of the API actions from an element’s interface) triggers a chain reaction of browser systems whose job is to analyze the change and update what you see on the screen as soon as possible. The tree maintains many optimizations for making these repeated updates fast and efficient, such as:

    • representing common element names and attributes via a number (using hash tables for fast identification);
    • collection caches that remember an element’s frequently-visited children (for fast child-element iteration);
    • and sub-tree change-tracking to minimize what parts of the whole tree get “dirty” (and will need to be re-validated).
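
    Two of the optimizations listed above, numeric names and sub-tree change-tracking, can be sketched in a few lines (collection caches are not modeled). This is an illustration under assumptions rather than any engine’s actual data structures: AtomTable, TreeNode, markDirty, and revalidate are hypothetical names.

```typescript
// Interning: each distinct name is stored once and referred to by a small integer,
// so comparing two element names becomes a number comparison.
class AtomTable {
  private ids = new Map<string, number>();

  intern(name: string): number {
    const existing = this.ids.get(name);
    if (existing !== undefined) return existing;
    const id = this.ids.size;
    this.ids.set(name, id);
    return id;
  }
}

class TreeNode {
  tag: number;
  parent: TreeNode | null = null;
  children: TreeNode[] = [];
  dirty = false;              // this node itself changed
  hasDirtyDescendant = false; // something below this node changed

  constructor(tag: number) { this.tag = tag; }

  append(child: TreeNode): TreeNode {
    child.parent = this;
    this.children.push(child);
    return child;
  }

  markDirty(): void {
    this.dirty = true;
    // Flag ancestors, stopping early once a flagged ancestor is found:
    // everything above it was already flagged by an earlier change.
    for (let n = this.parent; n !== null && !n.hasDirtyDescendant; n = n.parent) {
      n.hasDirtyDescendant = true;
    }
  }
}

// Visits only nodes on a path to a change, clears their flags,
// and returns how many nodes were visited.
function revalidate(node: TreeNode): number {
  if (!node.dirty && !node.hasDirtyDescendant) return 0;
  let visited = 1;
  for (const child of node.children) visited += revalidate(child);
  node.dirty = false;
  node.hasDirtyDescendant = false;
  return visited;
}

const atoms = new AtomTable();
const div = atoms.intern("div");
const span = atoms.intern("span");

const treeRoot = new TreeNode(div);
const left = treeRoot.append(new TreeNode(div));
const leftLeaf = left.append(new TreeNode(span));
const right = treeRoot.append(new TreeNode(div));
right.append(new TreeNode(span));
right.append(new TreeNode(span));

leftLeaf.markDirty();
const firstPass = revalidate(treeRoot);  // walks root, left, leftLeaf; skips the right sub-tree
const secondPass = revalidate(treeRoot); // nothing is dirty any more
console.log({ firstPass, secondPass });
```

    The point of the two flags is that a single change in one corner of a large tree costs a walk along one path rather than a walk over the whole tree.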

    Other APIs

    The HTML elements and their interfaces in the DOM are the browser’s only mechanism for showing content on the screen. CSS can affect layout, but only for content that exists in HTML elements. Ultimately, if you want to see content on screen, it must be done through HTML interfaces that are part of the tree. (For those wondering about the Scalable Vector Graphics (SVG) and MathML languages: those elements must also be added to the tree to be seen, but I’ve skipped them for brevity.)

    We learned how the parser is one way of getting HTML from the server into the DOM tree, and how element interfaces in the DOM can be used to add, remove, and modify that tree after the fact. Yet, the browser’s programmable DOM is quite vast and not scoped to just HTML element interfaces.

    The scope of the browser’s DOM is comparable to the set of features that apps can use in any operating system. Things like (but not limited to):

    • access to storage systems (databases, key/value storage, network cache storage);
    • devices (geolocation, proximity and orientation sensors of various types, USB, MIDI, Bluetooth, Gamepads);
    • the network (HTTP exchanges, bidirectional server sockets, real-time media streaming);
    • graphics (2D and 3D graphics primitives, shaders, virtual and augmented reality);
    • and multithreading (shared and dedicated execution environments with rich message passing capabilities).

    The capabilities exposed by the DOM continue to grow as new web standards are developed and implemented by major browser engines. Most of these “extra” APIs of the DOM are out of scope for this article, however.

    Moving on from markup

    In this segment, you’ve learned how parsing and tree construction create the foundation for the DOM: the stateful, in-memory representation of the HTML tags received from the network.

    With the DOM model in place, services such as the event model and element APIs enable web developers to change the DOM structure at any time. Each change begins a sequence of “re-building” work of which updating the DOM is only the first step.

    Going back to the construction analogy, the on-site raw materials have been formed into the structural framing of the building and built to the right dimensions with internal plumbing, electrical, and other services installed, but with no real sense yet of the building’s final look—its exterior and interior design.

    In the next installment, we’ll cover how the browser takes the DOM tree as input to a layout engine that incorporates CSS and transforms the tree into something you can finally see on the screen.

  • From URL to Interactive

    Imagine, if you will, that you’re behind the wheel of a gorgeous 1957 Chevy Bel Air convertible, making your way across the desert on a wide open highway. The sun is setting, so you’ve got the top down, naturally. The breeze caresses your cheek like a warm hand as your nose catches a faint whiff of … What was that?

    The car lurches and chokes before losing all power. You coast, ever more slowly, to a stop. There’s steam rising from the hood. Oh jeez. What the heck just happened?

    You reach down to pop the hood, and open the door. Getting out, you make your way around to the front of the car. As you release the latch and lift the bonnet, you get blasted in the face with even more steam. You hope it’s just water.

    Looking around, it’s clear the engine has overheated, but you have no idea what you’re looking at. Back home you’ve got a guy who’s amazing with these old engines, but you fell in love with the luxurious curves, the fins, the plush interior, the allure of the open road.

    A tumbleweed rolls by. In the distance a buzzard screeches.

    What’s happening under the hood?

    Years ago, my colleague Molly Holzschlag used a variant of this story to explain the importance of understanding our tools. When it comes to complex machines like cars, knowing how they work can really get you out of a jam when things go wrong. Fail to understand how they work and you could end up, well, buzzard food.

    At the time, Molly and I were trying to convince folks that learning HTML, CSS, and JavaScript was more important than learning Dreamweaver. Like many similar tools, Dreamweaver allowed you to focus on the look and feel of a website without needing to burden yourself with knowing how the HTML, CSS, and JavaScript it produced actually worked. This analogy still applies today, though perhaps more so to frameworks than WYSIWYG design tools.

    If you think about it, our whole industry depends on our faith in a handful of “black boxes” few of us fully understand: browsers. We hand over our HTML, CSS, JavaScript, images, etc., and then cross our fingers and hope they render the experience we have in our heads. But how do browsers do what they do? How do they take our users from a URL to a fully-rendered and interactive page?

    To get from URL to interactive, we’ve assembled a handful of incredibly knowledgeable authors to act as our guides. This journey will take place in four distinct legs, delivered over the course of a few weeks. Each will provide you with details that will help you do your job better.

    Leg 1: Server to Client

    Ali Alabbas understands the ins and outs of networking, and he kicks off this journey with a discussion of how our code gets to the browser in the first place. He discusses how server connections are made, caching, and how Service Workers factor into the request and response process. He also discusses the “origin model” and how to improve performance using HTTP2, Client Hints, and more. Understanding this aspect of how browsers work will undoubtedly help you make your pages download more quickly.

    Read the article

    Leg 2: Tags to DOM

    In the second installment, Travis Leithead—a former editor of the W3C’s HTML spec—takes us through the process of parsing HTML. He covers how browsers create trees (like the DOM tree) and how those trees become element collections you can access via JavaScript. And speaking of JavaScript, he’ll even get into how the DOM responds to manipulation and to events, including touch and click. Armed with this information, you’ll be able to make smarter decisions about how and when you touch the DOM, how to reduce Time To Interactive (TTI), and how to eliminate unintended reflows.

    Read the article

    Leg 3: Braces to Pixels

    Greg Whitworth has spent much of his career in the weeds of browsers’ CSS mechanics, and he’s here to tell us how they do what they do. He explains how CSS is parsed, how values are computed, and how the cascade actually works. Then he dives into a discussion of layout, painting, and composition. He wraps things up with details concerning how hit testing and input are managed. Understanding how CSS works under the hood is critical to building resilient, performant, and beautiful websites.

    Read the article

    Leg 4: var to JIT

    One of JavaScript’s language designers, Kevin Smith, joins us for the final installment in this series to discuss how browsers compile and execute our JavaScript. For instance, what do browsers do when tearing down a page when users navigate away? How do they optimize the JavaScript we write to make it run even faster? He also tackles topics like writing code that works in multiple threads using workers. Understanding the inner processes browsers use to optimize and run your JavaScript can help you write code that is more efficient in terms of both performance and memory consumption.

    Read the article

    Let’s get going

    I sincerely hope you’ll join us on this trip across the web and into the often foggy valley where browsers turn code into experience.

Search Engine Watch
Keep updated with major stories about search engine marketing and search engines as published by Search Engine Watch.
ClickZ News
Breaking news, information, and analysis.
CNN.com - RSS Channel - App Tech Section
CNN.com delivers up-to-the-minute news and information on the latest top stories, weather, entertainment, politics and more.
