Caching html: Mobile vs. Desktop
The HTTPArchive is an awesome resource for website statistics. Every 2 weeks, the top 1M desktop and 5,000 mobile sites are tested with webpagetest.org and the results are placed into a database. Using Google BigQuery, one can extract data and gain insight into how websites work in the real world. Earlier this week, Ilya Grigorik posted about the cache control policy of html documents (how many sites cache the html page).
This got me wondering: how do sites treat their mobile caching differently from their desktop caching? I ran a few tests and found that the overall results for mobile and desktop were similar, but that there were some values missing. Certainly, there would be outlying sites that treat mobile significantly differently than desktop. But how many? And how? So, I mashed up Ilya’s query with GuyPo’s (from his m. study) to grab the cache control headers and max-age from the first html on desktop and mobile, and then compared them:
SELECT dData.url, dData.age, mData.age, dData.resp_cache_control, mData.resp_cache_control
// COUNT(dData.age) as web_count, COUNT(mData.age) as Mobile_count
FROM (
  SELECT pages.pageid as pid, url, urlhash, wptid, fHtml, fReq, fStatus, loc, age, resp_cache_control
  FROM [httparchive:runs.latest_pages] as pages
  JOIN (
    SELECT pageid, MAX(firstHtml) as fHtml, MAX(firstReq) as fReq, MAX(status) fStatus, MAX(resp_location) as loc,
      INTEGER(REGEXP_EXTRACT(resp_cache_control, r'max-age=(\d+)')) age, resp_cache_control
    FROM [httparchive:runs.latest_requests]
    WHERE firstHtml = true AND status = 200
    GROUP BY resp_cache_control, age, pageid
  ) as reqs ON (reqs.pageid = pages.pageid)
) as dData
JOIN (
  SELECT pages.pageid as pid, url, wptid, fHtml, fReq, fStatus, loc, age, resp_cache_control
  FROM [httparchive:runs.latest_pages_mobile] as pages
  JOIN (
    SELECT pageid, MAX(firstHtml) as fHtml, MAX(firstReq) as fReq, MAX(status) fStatus, MAX(resp_location) as loc,
      INTEGER(REGEXP_EXTRACT(resp_cache_control, r'max-age=(\d+)')) age, resp_cache_control
    FROM [httparchive:runs.latest_requests_mobile]
    WHERE firstHtml = true AND status = 200
    GROUP BY resp_cache_control, age, pageid
  ) as reqs ON (reqs.pageid = pages.pageid)
) as mData ON mData.url = dData.url
WHERE mData.url = dData.url AND dData.age != mData.age
GROUP BY dData.url, dData.age, mData.age, dData.resp_cache_control, mData.resp_cache_control
// having web_count > 20
// order by dData.age asc
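The REGEXP_EXTRACT step in the query pulls the max-age number out of the Cache-Control header. For checking an individual header by hand, the same extraction can be sketched in Python. This is only an illustration, not part of the BigQuery run, and the function name max_age is made up for the example:

```python
import re
from typing import Optional

# Same pattern as the query: capture the digits that follow "max-age=".
_MAX_AGE = re.compile(r"max-age=(\d+)")

def max_age(cache_control: Optional[str]) -> Optional[int]:
    """Return the max-age in seconds, or None if the header has none."""
    if not cache_control:
        return None
    match = _MAX_AGE.search(cache_control)
    return int(match.group(1)) if match else None

print(max_age("public, max-age=300"))  # 300
print(max_age("no-cache, no-store"))   # None
```

Like the query, this match is case-sensitive, so a header that spells the directive "Max-Age" would be reported as having no max-age.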
Of the 4672 sites that appear in both the desktop and mobile data, 3645 (78%) have the same cache control response header, and 996 (21%) have the same Max-age (mData.age = dData.age). I then changed the “where” parameter to further break down the sites into various categories.
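One way to picture that breakdown is as a classification of each (desktop, mobile) header pair. The sketch below is hypothetical Python, not the BigQuery “where” clauses actually used for the study, and the category labels are invented for illustration:

```python
import re
from typing import Optional

def max_age(header: Optional[str]) -> Optional[int]:
    """Extract max-age in seconds, or None when the directive is absent."""
    match = re.search(r"max-age=(\d+)", header or "")
    return int(match.group(1)) if match else None

def classify(desktop: Optional[str], mobile: Optional[str]) -> str:
    """Place one site's desktop/mobile Cache-Control pair into a category."""
    if desktop == mobile:
        return "same header"
    d_age, m_age = max_age(desktop), max_age(mobile)
    if d_age is None and m_age is None:
        return "different header, no max-age"
    if d_age is None or m_age is None:
        return "max-age on one side only"
    if d_age == m_age:
        return "different header, same max-age"
    return "different max-age"

print(classify("max-age=60", "max-age=600"))  # different max-age
```

Counting the labels over all matched sites would give the kind of breakdown discussed in the rest of this post.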
What I am interested in are the sites that are outside the norm. There are certainly legitimate reasons to cache longer (or shorter) on a mobile device compared to the desktop. So, let’s look into the 1027 (22%) sites that cache differently on mobile vs. desktop:
Table 1: Breakdown of sites with different cache control headers for Mobile and Desktop.
Let’s look through these one by one (ignoring headers with the same Max-age – because that sounds kind of boring):
322 have different cache headers but no Max-age values. Of these:
Table 2: Breakdown of different cache control headers with no max-age values for mobile or desktop.
The first 2 lines in Table 2 show sites that have cache control headers for only mobile (69) or only desktop (85), but not for the other version (that’s 3.3% of all sites). A large number (86) differ by only a few characters; glancing at the results, they are generally missing commas between parameters. Then there are the other 81 (1.5%) sites whose cache control headers are longer for either mobile or desktop because more parameters were added to one or the other.
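Headers that differ only by a missing comma describe the same policy, so it can help to compare them as sets of directives rather than as raw strings. A minimal sketch, assuming no directive carries a quoted value containing spaces or commas (such a value would be split incorrectly here); the function names are made up:

```python
import re

def directives(cache_control: str) -> frozenset:
    """Split a Cache-Control value into a set of lower-cased directives.

    Splitting on commas and whitespace means "private max-age=0" and
    "private, max-age=0" produce the same set.
    """
    parts = re.split(r"[,\s]+", cache_control.strip().lower())
    return frozenset(part for part in parts if part)

def same_policy(desktop: str, mobile: str) -> bool:
    """True when both headers list the same directives, in any order."""
    return directives(desktop) == directives(mobile)

print(same_policy("no-cache, no-store", "no-cache no-store"))  # True
print(same_policy("public, max-age=60", "public"))             # False
```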
Table 3: When Cache Control Max ages differ
In Table 3, the top and bottom lines show the 2 extremes, where the Max-age values differ by over 15 minutes in one direction or the other. 1.1% of the websites studied fall into this group. Another 1.6% of sites have Max-age values that differ by more than 2 minutes (but less than 15 minutes).
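The grouping used here can be sketched as a small bucketing function over the two max-age values (in seconds). The 2-minute and 15-minute cut-offs come from the text; how the exact boundary values are assigned and the wording of the labels are assumptions made for this example:

```python
def diff_bucket(desktop_age: int, mobile_age: int) -> str:
    """Describe which side caches longer and by roughly how much."""
    diff = mobile_age - desktop_age
    size = abs(diff)
    if size == 0:
        return "same"
    longer = "mobile" if diff > 0 else "desktop"
    if size > 15 * 60:
        band = "over 15 minutes"
    elif size > 2 * 60:
        band = "2 to 15 minutes"
    else:
        band = "2 minutes or less"
    return f"{longer} longer by {band}"

print(diff_bucket(0, 3600))   # mobile longer by over 15 minutes
print(diff_bucket(600, 300))  # desktop longer by 2 to 15 minutes
```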
Tables 4 and 5: Sites with Max-age values for only mobile or desktop, broken down by available Max-age.
In Tables 4 and 5, we see a breakdown of mobile Max-ages when there is a mobile Max-age but no desktop Max-age (and the converse: desktop Max-ages when there is no mobile Max-age). Most are under 5 minutes, but interestingly, 51 sites (1.1% of all sites) set a Max-age between 5 minutes and 1 day on one version while not specifying one on the other. 25 sites have a Max-age of more than 1 day (while not specifying the other)! That’s 0.54% of all sites studied.
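As a quick check on the arithmetic, and to make the three ranges explicit, here is a short Python sketch. The range labels and function names are invented for the example; the counts and the sample size of 4672 are the ones quoted in this post:

```python
SAMPLE_SIZE = 4672  # sites present in both the desktop and mobile data

def one_sided_bucket(age_seconds: int) -> str:
    """Bucket a max-age that exists on only one of the two versions."""
    if age_seconds < 5 * 60:
        return "under 5 minutes"
    if age_seconds <= 24 * 60 * 60:
        return "5 minutes to 1 day"
    return "over 1 day"

def share(count: int, total: int = SAMPLE_SIZE) -> float:
    """Fraction of the whole sample that a count represents."""
    return count / total

print(f"{share(51):.1%}")  # 1.1%
print(f"{share(25):.2%}")  # 0.54%
```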
In conclusion, cache control headers and the Max-age for caching can (and probably should) vary for mobile and desktop sites. In our sample of 4672 sites, 22% have headers that vary. However, there are no real patterns in the data that point to an ideal caching length, and ~10% of sites have cache control headers or Max-age values that are extremely different between their mobile and desktop offerings. This goes to show that once you add cache headers, it is probably good to go back and verify them on a fairly regular basis to ensure that all of the sites you deliver have cache headers that make sense for your site on mobile and desktop.
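A periodic check of this kind could be as simple as requesting the same URL with a desktop and a mobile User-Agent and comparing the Cache-Control headers that come back. The sketch below uses only the Python standard library. The two User-Agent strings are placeholders rather than real browser strings, and check_site is a hypothetical helper, so treat this as a starting point and not a finished monitoring tool:

```python
import urllib.request
from typing import Callable, Optional

# Placeholder User-Agent strings (assumptions): substitute real browser strings.
DESKTOP_UA = "Mozilla/5.0 (X11; Linux x86_64) ExampleDesktop/1.0"
MOBILE_UA = "Mozilla/5.0 (Linux; Android) Mobile ExampleMobile/1.0"

def fetch_cache_control(url: str, user_agent: str) -> Optional[str]:
    """Fetch url with the given User-Agent and return its Cache-Control header."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.headers.get("Cache-Control")

def check_site(
    url: str,
    fetch: Callable[[str, str], Optional[str]] = fetch_cache_control,
) -> dict:
    """Compare the Cache-Control header served to desktop and mobile clients."""
    desktop = fetch(url, DESKTOP_UA)
    mobile = fetch(url, MOBILE_UA)
    return {
        "url": url,
        "desktop": desktop,
        "mobile": mobile,
        "match": desktop == mobile,
    }
```

Calling check_site with a URL you deliver performs two live requests; passing a different fetch function makes it possible to test the comparison without touching the network. Note that urlopen follows redirects, so a site that sends mobile clients to an m. domain is compared on the headers of the final page.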