Rule-Based Detection of SSR vs. CSR Pages from Plain HTML and Regex
For anyone building a web crawler at scale, there is a fundamental truth: you are not crawling one internet, but two.
- The Static Internet: This is the web of documents. A server sends you a complete HTML file, rich with text and structure. It's fast, predictable, and perfectly legible to a simple HTTP client like cURL or Python's requests library.
- The Dynamic Internet: This is the web of applications. A server sends you a nearly empty HTML "shell." This shell contains links to large JavaScript files. Only when a browser downloads, parses, and executes this JavaScript does the application come to life: fetching data, rendering content, and building the page you actually see.
At scale, the cost difference between crawling these two internets is monumental. A simple HTTP request is measured in kilobytes of memory and milliseconds of CPU time. Firing up a full headless browser (like Puppeteer or Playwright) to render a dynamic page consumes hundreds of megabytes of RAM and can take seconds of CPU time.
Making the wrong choice is costly. If you use a headless browser on a static page, you've wasted thousands of times the necessary resources. If you use a simple HTTP client on a dynamic page, you get a useless empty shell and miss all your data. The single most important economic and technical decision in a modern crawling pipeline is deciding, per-URL, which internet you are looking at.
This guide dissects a powerful solution: a sophisticated, weighted heuristic engine. It's a system designed to act as an intelligent gatekeeper, analyzing the raw HTML from a cheap initial request to predict whether a costly JS render is necessary. We walk through the engine tier by tier, explaining each rule's logic, its reliability, and its role in the final verdict.
TL;DR: Executive Summary
- The Problem: Distinguishing between simple "static" pages and complex JavaScript "dynamic" applications is the single most important economic decision in web crawling at scale. Making the wrong choice wastes thousands of times the necessary resources.
- Our Solution: This guide presents a sophisticated, weighted heuristic engine that analyzes raw HTML to predict whether a costly browser render is necessary, using dozens of detection rules with positive/negative scoring and intelligent thresholds.
- Your Takeaway: We provide the complete, production-ready Python code for this engine, allowing you to immediately integrate this intelligence into your own crawling pipeline and save massive computational costs.
Our Solution: An Intelligent Heuristic Engine
A simple checklist ("does it contain id='root'?") is too naive. The modern web is a complex tapestry of frameworks, build tools, and rendering patterns. Our engine embraces this complexity using four core principles:
- Weighted Scoring: Not all clues are equal. An empty <body> tag is a "smoking gun" for Client-Side Rendering (CSR) and deserves a high score (+50 points). An async attribute on a script tag is a common performance practice and means very little (+2 points). Every rule is assigned a weight reflecting its predictive power.
- Positive and Negative Indicators: The engine doesn't just look for signs of CSR. It also actively looks for definitive signs of Server-Side Rendering (SSR) or Static Site Generation (SSG). These "affirmations" subtract points from the score, providing a crucial counterbalance and preventing false positives.
- Advanced Heuristics: We go beyond simple regex patterns. The engine calculates the text-to-HTML ratio, parses embedded JSON blobs for framework-specific clues (like Next.js's __NEXT_DATA__), analyzes script weight ratios, and counts module preloads to assess application complexity.
- Intelligent Threshold System: The scores are summed. If the totalScore surpasses a defined SCORE_THRESHOLD (e.g., 30), the page is flagged as requiring a JS crawl. This approach is resilient to a few weak, misleading signals and relies on the collective weight of evidence rather than on any single indicator.
The Engine, Deconstructed: An Exhaustive Rule-by-Rule Analysis
Let's dissect the brain of the detector. The rules are organized into tiers, from definitive server-side patterns to the faintest whispers of client-side activity.
I. SSR / SSG / Server-First Affirmations (Negative Score)
These are high-confidence signals that content was rendered on the server. Finding these patterns gives us confidence that a simple HTTP crawl is likely sufficient.
- hasVueSSRRoot (-35): Looks for data-server-rendered="true". Vue 2's SSR renderer adds this attribute to the root element. It's a direct statement that the server did the rendering work.
- hasFastBootShoebox (-30): Detects <script type="fastboot/shoebox">. This is the mechanism that Ember.js's SSR solution, FastBoot, uses to pass server-fetched data to the client for hydration. Its presence reliably indicates an SSR context.
- hasStaticSiteGeneratorArtifacts (-30): Scans for <meta name="generator" content="..."> with values like Jekyll, Hugo, Eleventy, Astro, Gatsby, or Next.js. This is an explicit declaration of the tool used to build the site, and these tools are overwhelmingly used for SSG/SSR. Generators usually append a version number (e.g., "Hugo 0.120.4"), so the tool name should be matched as a prefix of the content value, not as the whole value.
- hasAngularServerContext (-25): Finds the ng-server-context attribute. Angular's SSR solution (Angular Universal) adds it to indicate that the content was server-generated.
- hasSvelteHydrationMarks (-25): Searches for SvelteKit-specific attributes like data-svelte-h or data-sveltekit-hydrate-start. These markers are left in the HTML to guide client-side hydration, confirming an initial server render.
- hasReactStreamPlaceholders (-22): Looks for patterns like <!--$?--> or <template id="B:0">. These are artifacts of React's streaming SSR, where the server sends the page shell with Suspense "holes" that are filled in as data becomes available. It's a clear sign of a sophisticated server-first architecture.
- Others in this category: Rules like hasAngularTransferState, hasDeclarativeShadowDOM, and hasMarkoSSRAttributes follow the same principle. Each identifies a specific, unambiguous fingerprint left by a server-rendering technology, and its presence strongly argues against needing a JS crawl to get the initial content.
Crawler's Action: When these rules match, the crawler can be highly confident that the HTML contains meaningful content. It should lean towards not escalating to a full browser render unless other powerful CSR indicators are also present.
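Here is a sketch of how a few Tier I affirmations might be expressed in Python, again translating the table's patterns into re. One deliberate change is labeled in the code: the generator pattern ends with \b instead of requiring a closing quote. Generators commonly append a version (Hugo emits values like "Hugo 0.120.4"), and a quote-terminated pattern would miss those values. SSR_AFFIRMATIONS and ssr_affirmations are illustrative names.

```python
import re
from typing import Dict

# A few Tier I affirmations, translated into Python re. Weights are negative
# because each match argues against needing a JS render.
SSR_AFFIRMATIONS = {
    "hasVueSSRRoot": (re.compile(r"""\bdata-server-rendered=["']?true["']?""", re.I), -35),
    # Assumption: prefix match (\b, not a closing quote) so versioned values still match.
    "hasStaticSiteGeneratorArtifacts": (
        re.compile(r"""<meta\s+name=["']generator["']\s+content=["'](?:Jekyll|Hugo|Eleventy|Astro|Gatsby|Next\.js)\b""", re.I),
        -30,
    ),
    "hasFastBootShoebox": (re.compile(r"""<script[^>]+type=["']fastboot/shoebox["'][^>]*>""", re.I), -30),
    "hasAngularServerContext": (re.compile(r"\bng-server-context\b", re.I), -25),
    "hasReactStreamPlaceholders": (
        re.compile(r"""<!--\$\??-->|<!--/\$-->|<template\s+id=["'][A-Z]:\d+""", re.I),
        -22,
    ),
}


def ssr_affirmations(html: str) -> Dict[str, int]:
    """Return {rule name: weight} for every SSR/SSG fingerprint found in the HTML."""
    return {name: weight for name, (pattern, weight) in SSR_AFFIRMATIONS.items()
            if pattern.search(html)}
```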
II. High-Confidence CSR Indicators ("Smoking Guns")
These are the definitive signs of a CSR-first architecture. When these rules hit, a JS crawl is almost certainly mandatory.
- hasBotChallengeText (+55): This is the strongest signal in the engine. It looks for phrases like "checking your browser," "complete the captcha," or "prove you're not a robot." If this text is present, the real content is almost certainly gated behind a JavaScript-powered security challenge. No JS, no content.
- hasEmptyBodyCSR (+50): Detects <body>\s*</body>. An empty body is the most fundamental sign of a pure SPA: JavaScript is responsible for constructing the entire visible page.
- hasMinimalAppDiv (+50): Searches for an empty mount point, a <div> or <main> with id="app", id="root", or id="main". This is the textbook pattern for frameworks like React and Vue, which are configured to find this element and inject the application into it.
- hasUnprocessedTemplateLiterals (+45): Finds raw, unprocessed template placeholders like {{ some_variable }}. If these are visible in the raw HTML, the client-side templating engine (such as AngularJS, Vue, or Handlebars) has not run yet, and the content is incomplete. Pages that display template syntax as literal text, such as documentation, will also match.
- hasCRANoscript (+40): Detects the <noscript>You need to enable JavaScript to run this app.</noscript> message from create-react-app's default template. It is a near-certain sign that the page is a CSR application.
- hasClientSideTemplateScripts (+35): Looks for <script type="text/template"> or similar. This is a classic pattern for shipping HTML templates to the client, where JavaScript renders them and injects the result into the DOM.
Crawler's Action: A match on any of these rules is a command. The crawler should immediately escalate the URL to the full, JavaScript-enabled rendering queue. Proceeding with the static HTML will result in missing data.
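Every Tier II weight (35 to 55) already exceeds the example threshold of 30, so the "command" above can be implemented as an early exit. The sketch below assumes that policy. It deliberately skips the negative Tier I scores, which the full weighted sum would still apply. It omits hasBotChallengeText because the guide gives example phrases for it rather than a regex. first_smoking_gun is a hypothetical helper.

```python
import re
from typing import Optional

# Tier II patterns from the reference table, checked in the order listed in the text.
SMOKING_GUNS = [
    ("hasEmptyBodyCSR", re.compile(r"<body>\s*</body>", re.I)),
    ("hasMinimalAppDiv",
     re.compile(r"""<(div|main)\s+id\s*=\s*["'](?:app|root|main)["']\s*>\s*</\1>""", re.I)),
    ("hasUnprocessedTemplateLiterals", re.compile(r"\{\{[^}]+\}\}")),
    ("hasCRANoscript",
     re.compile(r"<noscript>[^<]*You need to enable JavaScript to run this app", re.I)),
    ("hasClientSideTemplateScripts",
     re.compile(r"""<script[^>]+type=["'](?:text/(?:x-)?template|text/x-handlebars-template|text/ng-template)["']""", re.I)),
]


def first_smoking_gun(html: str) -> Optional[str]:
    """Return the first Tier II rule that matches, or None if none is present.

    Stopping at the first hit is safe only under the policy that any Tier II
    match escalates regardless of negative (SSR) scores.
    """
    for name, pattern in SMOKING_GUNS:
        if pattern.search(html):
            return name
    return None
```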
III. Strong CSR & Heavy Hydration Indicators
These patterns signal that even if some HTML is present (SSR/SSG), it is a "lifeless" shell that requires a heavy dose of JavaScript to become interactive and load its full content.
- hasBotGates (+45): Looks for URLs or patterns related to reCAPTCHA, Cloudflare Turnstile, hCaptcha, PerimeterX, and similar services. Like the bot challenge text, this indicates an external, JS-powered security gate that must be passed.
- hasClientSideRoutingOnClick (+35): Finds <a onclick="..."> tags used for navigation. This is an anti-pattern for SEO but a strong signal for crawlers: a custom JS function controls navigation instead of standard href links, which indicates that a client-side router is managing the page.
- hasFrameworkSpecificStructures (+30): Detects custom elements like <router-outlet> (Angular) or <Outlet> (React Router). These are not standard HTML; they are instructions for a framework's JS router to render a component. React Router's <Outlet> is a JSX component and is not emitted as a tag in rendered HTML, so in practice this rule fires on elements that do appear in served markup, such as Angular's <router-outlet> and <app-root>.
- embedsReduxOrApolloBlobs (+25): Finds window.__REDUX_STATE__ or window.__APOLLO_STATE__. These are large blobs of JSON data embedded in the page that represent the "initial state" of a complex application. The client-side JavaScript uses them to bootstrap the app, a process known as hydration.
- usesCodeSplittingOrDynamicImports (+20): Searches for the import() syntax. This is the mechanism for code-splitting, where parts of the application's JavaScript are intentionally not loaded until they are needed (e.g., when a user clicks a button to open a dialog). This implies the initial page is incomplete by design. Because bundled code usually lives in external files, this rule can only see import() calls in inline scripts.
Crawler's Action: These signals strongly suggest that the initial HTML does not represent the full, interactive state of the page. A JS crawl is required to capture dynamically loaded components, post-hydration content, and the behavior of the application.
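The Tier III rules can be sketched the same way. The weights come from the list above, and the onclick, structure, and import() patterns come from the reference table. The Redux/Apollo pattern is an assumption, since the guide names the globals but gives no regex for them. tier_iii_score is a hypothetical helper.

```python
import re

# Tier III rules with the weights given in the text. The Redux/Apollo pattern is
# hypothetical; the other three are translated from the reference table.
TIER_III = {
    "hasClientSideRoutingOnClick": (
        re.compile(r"""<a\b[^>]*\bonclick\s*=["'][^"']+["'][^>]*>""", re.I), 35),
    "hasFrameworkSpecificStructures": (
        re.compile(r"<(router-outlet|Outlet|app-root|vue-app|custom-element)\b[^>]*>", re.I), 30),
    "embedsReduxOrApolloBlobs": (
        re.compile(r"window\.__(?:REDUX|APOLLO)_STATE__\s*="), 25),
    "usesCodeSplittingOrDynamicImports": (
        re.compile(r"""\bimport\(\s*['"][^'"]+['"]\s*\)""", re.I), 20),
}


def tier_iii_score(html: str) -> int:
    """Sum the Tier III weights whose patterns appear in the raw HTML."""
    return sum(weight for pattern, weight in TIER_III.values() if pattern.search(html))
```

Under a strict greater-than threshold of 30, a bare <app-root> element scores exactly 30 and would not escalate on its own. An Apollo state blob plus an inline import() (25 + 20 = 45) would.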
IV. Lightweight Interactivity / Progressive Enhancement
This category identifies "islands" architectures or progressive enhancement libraries. The server sends functional HTML, but JavaScript is required to make it interactive.
- hasAlpineDirectives (+18): Looks for Alpine.js attributes like x-data, @click, or x-show.
- hasWPInteractivityDirectives (+18): Looks for WordPress Interactivity API attributes like data-wp-interactive.
- hasIsLandIslands (+20): Detects the <is-land> custom element from Eleventy's is-land library.
Crawler's Action: This is a gray area. If the goal is just to extract the visible text, a simple crawl might suffice. However, if the goal is to interact with the page (e.g., open dropdowns, click tabs to reveal content), a JS crawl is necessary. These signals warrant a JS crawl for any task beyond simple text extraction.
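One way to encode this gray area is to make the decision depend on the crawl task. The sketch assumes the page has already cleared Tiers I to III. Its patterns are hypothetical, built from the example attributes above, because the guide lists no regexes for these rules. The two task labels, "extract_text" and "interact", are also invented for illustration.

```python
import re
from typing import List

# Hypothetical patterns for the Tier IV rules; weights as listed in the text.
ISLAND_RULES = {
    "hasAlpineDirectives": (re.compile(r"\s(?:x-data|x-show|@click)\b", re.I), 18),
    "hasWPInteractivityDirectives": (re.compile(r"\bdata-wp-interactive\b", re.I), 18),
    "hasIsLandIslands": (re.compile(r"<is-land\b", re.I), 20),
}


def island_signals(html: str) -> List[str]:
    """Names of the Tier IV rules that match the raw HTML."""
    return [name for name, (pattern, _) in ISLAND_RULES.items() if pattern.search(html)]


def island_page_needs_render(html: str, task: str) -> bool:
    """Gray-area policy for pages whose only CSR evidence is Tier IV.

    task is a hypothetical label: "extract_text" or "interact".
    """
    if task == "extract_text":
        return False  # the server already sent the visible text
    return bool(island_signals(html))  # interaction needs the JS behavior
```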
V. Supporting & Behavioral CSR Indicators
These are weaker signals that, on their own, are not definitive but add corroborating evidence to the overall score.
- hasModernBuildToolArtifactsScriptJS (+15): Finds script tags with hashed filenames, like main.a4b8c1f9.js. This is a very common output of modern bundlers like Webpack and Vite, which are heavily used for SPAs (although SSR frameworks emit hashed bundles too).
- hasLoadingSpinnerMarkup (+15): Detects class names or IDs like spinner, loader, or skeleton. These suggest that the real content is not yet present and is being fetched asynchronously by JavaScript.
- hasHelmetMarkers (+10): Finds data-rh="true". This attribute comes from react-helmet-async, the widely used fork of React Helmet, a library that manages the document <head> (e.g., the page title) from within JS components. It suggests the page's metadata is dynamic.
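Here is a sketch of the Tier V signals under stated assumptions. The hashed-filename pattern expects a hex hash of eight or more characters, matching the main.a4b8c1f9.js example above. The {6} quantifier in the reference table's hasModernBuildToolArtifacts_ScriptJS pattern would not match that eight-character example. Bundlers with other naming schemes may need different patterns. supporting_score is a hypothetical helper.

```python
import re

# Tier V supporting signals with the weights listed in the text. The patterns
# are assumptions: the text describes these signals but gives no regexes.
SUPPORTING = {
    # Hex content hash of 8+ characters before .js, e.g. main.a4b8c1f9.js.
    "hasModernBuildToolArtifactsScriptJS": (
        re.compile(r"""<script[^>]*\bsrc=["'][^"']*\.[a-f0-9]{8,}\.js["']""", re.I), 15),
    # spinner / loader / skeleton as a whole word inside a class or id value.
    "hasLoadingSpinnerMarkup": (
        re.compile(r"""\b(?:class|id)\s*=\s*["'][^"']*\b(?:spinner|loader|skeleton)\b""", re.I), 15),
    "hasHelmetMarkers": (re.compile(r"""\bdata-rh=["']true["']""", re.I), 10),
}


def supporting_score(html: str) -> int:
    """Sum the weights of the Tier V signals present in the raw HTML."""
    return sum(weight for pattern, weight in SUPPORTING.values() if pattern.search(html))
```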
VI. Low-Confidence & General Web Patterns (The Long Tail)
This final, large category includes the dozens of rules from the original comprehensive list. For completeness, they are included in the engine, and most carry low scores (typically 1-5 points). Why? Because they suffer from one of two problems:
- They are nearly universal: Patterns like async/defer scripts (hasAsyncDeferScripts), lazy-loading images (hasLazyLoadingAttributes), or using jQuery (includesjQuery) are common best practices across static sites, blogs, and complex SPAs. They provide almost no differentiating signal.
- They look for JS syntax in HTML: Rules like usesAjaxOrFetch or usesReactHooks are fundamentally weak for static analysis. This code usually lives in external .js files, not in the initial HTML document, so these rules can only fire on inline scripts. They are kept for legacy reasons, but their practical value is near zero.
Crawler's Action: These rules should be treated as tie-breakers at best. A high score built entirely from these weak signals is unreliable. The final decision should always be dominated by the high-confidence rules in Tiers I-III.
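The guide does not say how "tie-breakers at best" is enforced. One possible approach, sketched below, caps the total contribution of long-tail rules. They can then tip a borderline page over the threshold but can never get it there on their own. WEAK_MAX, WEAK_CAP, and decide are assumptions introduced for illustration. The weights in the test examples are taken from the reference table.

```python
from typing import Dict

SCORE_THRESHOLD = 30
WEAK_MAX = 5   # assumption: rules weighted at or below this count as long-tail
WEAK_CAP = 10  # assumption: the guide does not specify a cap


def decide(hits: Dict[str, int], threshold: int = SCORE_THRESHOLD,
           weak_max: int = WEAK_MAX, weak_cap: int = WEAK_CAP) -> bool:
    """Escalate to a JS render, letting long-tail rules act only as tie-breakers.

    hits maps each matched rule name to its weight.
    """
    strong = sum(w for w in hits.values() if abs(w) > weak_max)
    weak = sum(w for w in hits.values() if abs(w) <= weak_max)
    # Long-tail evidence is capped, so on its own it can never cross the threshold.
    return strong + min(weak, weak_cap) > threshold
```

Ten long-tail matches worth 35 points in total would cross the threshold of 30 on their own; with the cap they contribute 10. Two Tier V signals worth 30 plus a +2 tie-breaker do escalate.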
Beyond Regex: Computed Heuristics
The most advanced part of the engine goes beyond simple patterns and computes signals based on the overall structure of the document.
- Low HTML Length / Low Text Content: An HTML file under 2KB, or one that contains fewer than 100 characters of actual text after stripping all tags, is highly likely to be an empty application shell.
- Parsing JSON Blobs: The engine specifically looks for __NEXT_DATA__ (Next.js) and __NUXT__ (Nuxt.js) script tags, parses their content, and checks for properties like gssp: true or serverRendered: true. This is a far more reliable way to confirm SSR than just finding the script tag itself. Nuxt 2 emits window.__NUXT__ as a JavaScript expression rather than pure JSON, so for Nuxt the check is closer to a pattern match than a JSON parse.
- Script Weight Ratio: The engine calculates the ratio of bytes within <script> tags to the total HTML size. A high ratio (>40%) indicates the page is primarily JavaScript code rather than content, strongly suggesting a CSR application.
- Module Preload/Prefetch Count: Counting <link rel="modulepreload"> tags reveals the complexity of the dependency graph. More than 5 module preloads typically indicates a sophisticated build system and a complex CSR application.
- Large <noscript> Fallback: If a developer has provided thousands of bytes of content (>8KB) inside <noscript> tags, that is actually a negative signal. It suggests a high-quality static fallback exists, so a simple crawl might capture the essential content without JavaScript execution.
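Here is a minimal sketch of these computed heuristics, assuming the thresholds stated above (2KB, 100 characters, 40%, more than 5 module preloads, 8KB). Tag stripping with regexes only approximates real text extraction. Sizes are measured in UTF-8 bytes, and only the Next.js __NEXT_DATA__ blob is parsed. computed_signals and its output keys are hypothetical. How each boolean is weighted in the final score is left to the engine.

```python
import json
import re
from typing import Dict

SCRIPT_RE = re.compile(r"<script\b[^>]*>([\s\S]*?)</script>", re.I)
CODE_BLOCK_RE = re.compile(r"<(script|style)\b[^>]*>[\s\S]*?</\1>", re.I)
TAG_RE = re.compile(r"<[^>]+>")
NOSCRIPT_RE = re.compile(r"<noscript\b[^>]*>([\s\S]*?)</noscript>", re.I)
MODULEPRELOAD_RE = re.compile(r"""<link\b[^>]*\brel=["']modulepreload["']""", re.I)
NEXT_DATA_RE = re.compile(
    r"""<script[^>]*\bid=["']__NEXT_DATA__["'][^>]*>([\s\S]*?)</script>""", re.I)


def computed_signals(html: str) -> Dict[str, bool]:
    size = len(html.encode("utf-8"))
    # Bytes inside <script> elements (inline code and JSON blobs).
    script_bytes = sum(len(body.encode("utf-8")) for body in SCRIPT_RE.findall(html))
    # Visible text: drop script/style blocks, then all tags, then collapse whitespace.
    text = " ".join(TAG_RE.sub(" ", CODE_BLOCK_RE.sub(" ", html)).split())
    noscript_bytes = sum(len(body.encode("utf-8")) for body in NOSCRIPT_RE.findall(html))

    next_data = None
    match = NEXT_DATA_RE.search(html)
    if match:
        try:
            next_data = json.loads(match.group(1))
        except json.JSONDecodeError:
            pass  # malformed blob: treat as absent

    return {
        "tiny_html": size < 2048,
        "low_text": len(text) < 100,
        "script_heavy": size > 0 and script_bytes / size > 0.40,
        "many_modulepreloads": len(MODULEPRELOAD_RE.findall(html)) > 5,
        "large_noscript_fallback": noscript_bytes > 8 * 1024,
        "next_gssp": isinstance(next_data, dict) and next_data.get("gssp") is True,
    }
```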
Greatest Hits: Top 20 Detection Rules
The most impactful rules that drive the majority of detection decisions. These are the patterns with the highest absolute scores—both positive (CSR indicators) and negative (SSR affirmations).
Rule Name | Score | Tier | Pattern |
---|---|---|---|
hasBotChallengeText | +55 | Smoking Gun | Detects "prove you're not a robot", "complete the captcha" |
hasEmptyBodyCSR | +50 | Smoking Gun | Empty <body> tag: <body>\s*</body> |
hasMinimalAppDiv | +50 | Smoking Gun | Empty mount point: <div id="app\|root">\s*</div> |
hasClientOnlyComponentTags | +50 | Smoking Gun | <ClientOnly>, <nuxt-client-only>, Blazor InteractiveWebAssembly |
hasBotGates | +45 | Strong CSR | reCAPTCHA, Cloudflare, hCaptcha, PerimeterX challenges |
hasUnprocessedTemplateLiterals | +45 | Smoking Gun | Raw template placeholders: {{ variable }} |
hasQwikAttributes | +40 | Strong CSR | Qwik resumability: q:container, q:base, qrl= |
hasAngularJSUnprocessedDirectives | +40 | Smoking Gun | Raw AngularJS: ng-app, ng-if, ng-repeat |
hasCRANoscript | +40 | Smoking Gun | Create React App: "You need to enable JavaScript" |
hasVueSSRRoot | -35 | SSR Confirmed | Vue SSR: data-server-rendered="true" |
hasClientSideTemplateScripts | +35 | Smoking Gun | <script type="text/template"> |
hasClientSideRoutingOnClick | +35 | Strong CSR | Navigation via <a onclick="..."> |
hasBlazorWebAssembly | +35 | Strong CSR | Blazor WebAssembly runtime |
hasAjaxCrawlableMeta | +35 | Smoking Gun | Hash-bang routing: <meta name="fragment" content="!"> |
hasRemixManifest | +35 | Strong CSR | window.__remixManifest |
hasAstroIslands | +35 | Strong CSR | <astro-island> components |
hasFrameworkSpecificStructures | +30 | Strong CSR | <router-outlet>, <Outlet>, <app-root> |
hasNextFlightStream | +30 | Strong CSR | Next.js streaming: self.__next_f.push( |
hasStaticSiteGeneratorArtifacts | -30 | SSR Confirmed | Generator meta: Jekyll, Hugo, Eleventy, Astro, Gatsby |
hasFastBootShoebox | -30 | SSR Confirmed | Ember FastBoot: <script type="fastboot/shoebox"> |
Complete Rule Reference Table
The exhaustive table of all detection rules, organized by confidence tier. This serves as the complete reference for understanding every signal the engine analyzes.
Summary Statistics
- Total Rules: 80+ detection patterns
- Negative Score Rules (SSR): 16 rules (-35 to -5 points)
- High-Confidence CSR Rules: 15 rules (35-55 points)
- Strong CSR Indicators: 25+ rules (20-35 points)
- Supporting Indicators: 40+ rules (1-20 points)
Rule Name | Category | Pattern/Logic | Score |
---|---|---|---|
I. SSR / SSG / Server-First Affirmations | NEGATIVE SCORES | Definitive server-side rendering signals | - |
hasVueSSRRoot | SSR Confirmed | /\bdata-server-rendered=["']?true["']?/i | -35 |
hasStaticSiteGeneratorArtifacts | SSR Confirmed | /<meta\s+name=["']generator["']\s+content=["'](?:Jekyll\|Hugo\|Eleventy\|Astro\|Gatsby\|Next\.js)\b/i | -30 |
hasFastBootShoebox | SSR Confirmed | /<script[^>]+type=["']fastboot\/shoebox["'][^>]*>/i | -30 |
hasAngularServerContext | SSR Confirmed | /\bng-server-context\b/i | -25 |
hasSvelteHydrationMarks | SSR Confirmed | /\bdata-svelte-h=["'][^"']+["']/i | -25 |
hasMarkoSSRAttributes | SSR Confirmed | /\bdata-marko=/i | -25 |
hasReactStreamPlaceholders | SSR Confirmed | /(<!--\$\??-->\|<!--\/\$-->\|<template\s+id=["'][A-Z]:\d+)/i | -22 |
hasAngularTransferState | SSR Confirmed | /<script[^>]+id=["']ng-state["'][^>]*type=["']application\/json["'][^>]*>/i | -20 |
hasDeclarativeShadowDOM | SSR Confirmed | /<template\s+shadowrootmode=["'](?:open\|closed)["']/i | -20 |
hasSolidHydrationMarks | SSR Confirmed | /\bdata-hk=["'][^"']+["']\|\bdata-hydration\b/i | |
embedsHypernova | SSR Confirmed | /<script[^>]+data-hypernova-key=/i | -20 |
hasVueMetaSSR | SSR Confirmed | /\bdata-vue-meta-server-rendered\b/i | -15 |
hasReactSSRAttributes | SSR Confirmed | /\bdata-reactroot\b\|\bdata-reactid\b\|<!--\s*react-(?:text\|empty):/i | -15 |
hasVueSSRClientHydration | SSR Confirmed | /__VUE_SSR__\s*=\s*true/i | -15 |
hasSvelteHydrationScript | SSR Confirmed | /svelte-hydrate/i | -15 |
hasNextJSHydrationMarker | SSR Confirmed | /<div\s+id=["']__next["']><\/div>/i | -10 |
hasSSRFrameworkScript | SSR Confirmed | /<script[^>]*\bsrc=["'][^"']*\b(?:next\|gatsby\|nuxt\|remix)\b[^"']+\.js["'][^>]*><\/script>/i | -5 |
II. High-Confidence CSR Indicators | ("Smoking Guns") | Definitive client-side rendering signals | High Positive |
hasBotChallengeText | Smoking Gun | Bot challenge phrases, captcha requirements | +55 |
hasEmptyBodyCSR | Smoking Gun | /<body>\s*<\/body>/i | +50 |
hasMinimalAppDiv | Smoking Gun | /<(div\|main)\s+id\s*=\s*["'](?:app\|root\|main)["']\s*>\s*<\/\1>/i | +50 |
hasClientOnlyComponentTags | Smoking Gun | /(?:<nuxt-client-only>\|<ClientOnly>\|<ion-.*>\|@rendermode="InteractiveWebAssembly")/ | +50 |
hasUnprocessedTemplateLiterals | Smoking Gun | /\{\{[^}]+\}\}/ | +45 |
hasAngularJSUnprocessedDirectives | Smoking Gun | /\bng-(?:app\|if\|repeat\|bind\|controller)\b/i | +40 |
hasCRANoscript | Smoking Gun | /<noscript>[^<]*You need to enable JavaScript to run this app/i | +40 |
hasJSXSyntax | Smoking Gun | /<script[^>]*>[^<]*\bReact\.createElement\b[^<]*<\/script>/i | +40 |
hasVueRouterScript | Smoking Gun | /<script[^>]*\bsrc=["'][^"']*vue-router[^"']*\.js["'][^>]*><\/script>/i | +40 |
hasAngularNgModule | Smoking Gun | /@NgModule\(\s*\{[^}]*\}\s*\)/i | +40 |
hasFrontendFrameworks | Smoking Gun | /<script[^>]*\bsrc=["'][^"']*\b(?:react\|vue\|angular\|svelte)\b[^"']*\.js["'][^>]*><\/script>/i | +40 |
hasClientSideTemplateScripts | Smoking Gun | /<script[^>]+type=["'](?:text\/(?:x-)?template\|text\/x-handlebars-template\|text\/ng-template)["']/i | +35 |
hasAjaxCrawlableMeta | Smoking Gun | /<meta[^>]+name=["']fragment["'][^>]+content=["']!["']/i | +35 |
hasDynamicComponentLoading | Smoking Gun | /React\.lazy\(\|Vue\.component\(/i | |
III. Strong CSR & Heavy Hydration | Indicators | Strong client-side rendering signals | High Positive |
hasBotGates | Strong CSR | Bot detection services (reCAPTCHA, Cloudflare, etc.) | +45 |
hasClientSideRoutingOnClick | Strong CSR | /<a\b[^>]*\bonclick\s*=["'][^"']+["'][^>]*>/i | +35 |
hasBlazorWebAssembly | Strong CSR | /\/_framework\/blazor\.(?:server\|webassembly)\.js\b/i | +35 |
hasRemixManifest | Strong CSR | /window\.__remixManifest/i | +35 |
hasNextFlightStream | Strong CSR | /self\.__next_f\.push\(/i | +30 |
hasFrameworkSpecificStructures | Strong CSR | /<(router-outlet\|Outlet\|app-root\|vue-app\|custom-element)\b[^>]*>/i | +35 |
hasBundledScript | 2. Scripts and Modules | /<script[^>]*\bsrc\s*=\s*["'][^"']+\.(?:bundle\|chunk\|main\|app\|min\|vendor)\.[a-z0-9]{0,}\.js["'][^>]*><\/script>/i | +20 |
hasES6Modules | 2. Scripts and Modules | /<script\s+type=["']module["'][^>]*>[\s\S]*<\/script>/i | +3 |
hasInlineEventHandlers | 2. Scripts and Modules | /<(?:a\|button\|div\|span)\b[^>]*\s+onclick\s*=["'][^"']+["'][^>]*>/i | +10 |
hasAsyncDeferScripts | 2. Scripts and Modules | /<script[^>]*\b(?:async\|defer)\b[^>]*><\/script>/i | +2 |
hasInlineJSModules | 2. Scripts and Modules | /<script\s+type=["']module["'][^>]*>[\s\S]*<\/script>/i | +2 |
hasClientSideRouting | 3. Routing | /<(router-outlet\|Outlet\|router-link)\b[^>]*>/i | +45 |
hasClientSideRouting_Hash | 3. Routing | /<a\b[^>]*\bhref\s*=\s*["'][^"']*#\/[^"']*["'][^>]*>/i | +45 |
hasClientSideRouting_Onclick | 3. Routing | /<a\b[^>]*\bonclick\s*=["'][^"']+["'][^>]*>/i | +45 |
hasDynamicRouteParameters | 3. Routing | /\/:\w+/i | +40 |
hasAjaxFetchCalls | 4. AJAX and Fetch | /\b(fetch\|XMLHttpRequest\|\.ajax)\s*\(/i | +5 |
hasExternalApiWebSockets | 4. AJAX and Fetch | /new\s+WebSocket\s*\(/i | +20 |
hasLoadingSpinner | 5. Loading Indicators | /\bloading-spinner\b/i | +10 |
hasLoadingSpinner_Skeleton | 5. Loading Indicators | /\bskeleton\b/i | +10 |
hasLazyLoadingAttributes | 5. Loading Indicators | /\bloading=["']lazy["']/i | +2 |
hasLazyLoadedImages | 5. Loading Indicators | /\blazyload\b/i | +5 |
hasMetaJsDependencies | 6. Meta and SEO | /<meta\s+name=["']fragment["']\s+content=["']!["'][^>]*>/i | +5 |
hasMetaJsDependencies_BuiltWith | 6. Meta and SEO | /<!--\s*Built with\s+(?:Create React App\|Vite\|Vue\.js)\s*-->/i | +2 |
hasDynamicMetaOrTitle | 6. Meta and SEO | /document\.title\s*=\s*["'][^"']+["']/i | +15 |
hasDynamicMetaOrTitle_QuerySelector | 6. Meta and SEO | /document\.querySelector\(['"]meta\[name=["'][^"']+["']\]['"]\)\.setAttribute\(['"]content['"],\s*["'][^"']+["']\)/i | +15 |
hasDynamicStyleLoading | 7. Event Handlers and Listeners | /document\.createElement\(['"]style['"]\)\.innerHTML\s*=/i | +5 |
hasIntersectionObserversOrScroll_IntersectionObserver | 7. Event Handlers and Listeners | /IntersectionObserver\s*\(/i | +5 |
hasIntersectionObserversOrScroll_ScrollListener | 7. Event Handlers and Listeners | /addEventListener\s*\(\s*['"]scroll['"]\s*,/i | +5 |
usesReactHooksUsage | 8. State Management and Hooks | /\buseState\b\|\buseEffect\b\|\buseContext\b/i | +40 |
usesCustomHooks | 8. State Management and Hooks | /\buse[A-Z][a-zA-Z0-9]*\b/ | +5 |
hasStateManagementLibraries | 8. State Management and Hooks | /<script[^>]*\bsrc=["'][^"']*\b(?:redux\|mobx\|vuex\|ngrx)\b[^"']+\.js["'][^>]*><\/script>/i | +35 |
hasTemplatingLibraries | 9. Components and Libraries | /<script\s+type=["'](?:text\/template\|text\/x-handlebars-template)["'][^>]*>[\s\S]*<\/script>/i | +10 |
hasTemplatingLibraries_Placeholders | 9. Components and Libraries | /\{\{\s*[^}]+\s*\}\}/ | +10 |
hasTemplatingLibraries_Comments | 9. Components and Libraries | /\{\/\*\s*[^}]+\s*\*\/\}/ | +5 |
hasTemplatingLibraries_HYDRATE | 9. Components and Libraries | /<!--\s*\[(HYDRATE\|hydration)\]\s*-->/i | +25 |
hasWebComponents_Tag | 10. Web Components and Shadow DOM | /<[a-z]+-[a-z]+[^>]*><\/[a-z]+-[a-z]+>/i | +20 |
hasWebComponents_Script | 10. Web Components and Shadow DOM | /<script[^>]*\bsrc=["'][^"']+custom-element\.js["'][^>]*><\/script>/i | +20 |
hasShadowDOM | 10. Web Components and Shadow DOM | /attachShadow\s*\(\s*(?:\{[^}]*\}\|null)\s*\)/i | |
hasSSRFrameworkScripts | 11. SSR Indicators | /<script[^>]*\bsrc=["'][^"']*\b(?:next\|gatsby\|nuxt\|remix)\b[^"']+\.js["'][^>]*><\/script>/i | +2 |
hasSSRHydration_DataReactroot | 11. SSR Indicators | /data-reactroot/i | +2 |
hasSSRHydration_ReactHydrate | 11. SSR Indicators | /ReactDOM\.hydrate\s*\(/i | +2 |
hasHydrationMarkers | 11. SSR Indicators | /<div\s+id=["']__next["']><\/div>/i | +2 |
hasHydrationScripts | 11. SSR Indicators | /hydrate\s*\(/i | +2 |
hasProgressiveHydration | 11. SSR Indicators | /<script[^>]*\btype=["']application\/javascript["'][^>]*>[\s\S]*\bhydrate\b[\s\S]*<\/script>/i | +2 |
hasDynamicImportExpressions | 12. Modern Build Tools | /\bimport\(\s*['"][^'"]+['"]\s*\)/i | +30 |
hasDynamicImports | 12. Modern Build Tools | /import\(['"][^'"]+['"]\)/i | +30 |
hasDynamicImportMeta | 12. Modern Build Tools | /import\.meta\./i | +5 |
usesGraphQLMutations | 12. Modern Build Tools | /mutation\s*{[^}]+}/i | +10 |
usesGraphQLQueries | 12. Modern Build Tools | /graphql\s*\(/i | +15 |
hasGraphQLSubscriptions | 12. Modern Build Tools | /subscription\s*{[^}]+}/i | +5 |
hasWebAssemblyModules | 12. Modern Build Tools | /<script[^>]*\bsrc=["'][^"']+\.wasm["'][^>]*><\/script>/i | +5 |
hasModernBuildToolArtifacts_ScriptBundle | 12. Modern Build Tools | /<script[^>]*\bsrc=["'][^"']+\.[a-z0-9]{6}\.bundle\.js["'][^>]*><\/script>/i | +10 |
hasModernBuildToolArtifacts_LinkChunkCSS | 12. Modern Build Tools | /<link[^>]*\bhref=["'][^"']+\.chunk\.[a-z0-9]{6}\.css["'][^>]*>/i | +10 |
hasModernBuildToolArtifacts_ScriptJS | 12. Modern Build Tools | /<script[^>]*\bsrc=["'][^"']+\.[a-z0-9]{6}\.js["'][^>]*><\/script>/i | +10 |
hasModernJSFeatures | 12. Modern Build Tools | /(\basync\s+function\b\|\(\s*\)\s*=>\|\bPromise\.all\b)/i | +2 |
hasWebpackDevServer | 12. Modern Build Tools | /webpack-dev-server\/bundle\.js/i | +2 |
usesHotModuleReplacement | 12. Modern Build Tools | /module\.hot/i | +2 |
hasRuntimeBundlers | 12. Modern Build Tools | /<script[^>]*\bsrc=["'][^"']+\b(?:system\|require)\.js["'][^>]*><\/script>/i | +10 |
hasDynamicCSSLoading | 13. Styling | /document\.createElement\(['"]link['"]\)\.rel\s*=\s*['"]stylesheet['"]/i | +5 |
usesDynamicCSSImports | 13. Styling | /import\s+['"]\.\/[^'"]+\.css['"]/i | +5 |
hasCSSinJS | 13. Styling | /\bstyled\.\|css\.inject\b/i | +5 |
hasTailwindClasses | 13. Styling | /\b(bg\|text\|p\|m\|flex\|grid)-[a-z]+\b/i | +1 |
usesReactFragments | 14. Components and Libraries | /<React\.Fragment>\|<>\s*<\/>/i | +15 |
hasNextImageComponents | 14. Components and Libraries | /<Image\s+[^>]*\/>/i | +2 |
hasVueRouterScript | 14. Components and Libraries | /<script[^>]*\bsrc=["'][^"']*vue-router[^"']*\.js["'][^>]*><\/script>/i | +40 |
hasAngularNgModule | 14. Components and Libraries | /@NgModule\(\s*{[^}]*}\s*\)/i | +40 |
hasSvelteHydration | 14. Components and Libraries | /svelte-hydrate/i | +2 |
hasVirtualDOMReferences | 14. Components and Libraries | /React\.createElement\|Vue\.createElement\|\bh\(/i | +2 |
hasServiceWorkersOrPWA_Manifest | 14. Components and Libraries | /<link\s+rel=["']manifest["']\s+href=["'][^"']+["']>/i | +2 |
hasServiceWorkersOrPWA_Register | 14. Components and Libraries | /navigator\.serviceWorker\.register\(["'][^"']+["']\)/i | +2 |
usesServiceWorkers | 14. Components and Libraries | /navigator\.serviceWorker\.register\(/i | +2 |
usesGraphQLClient | 14. Components and Libraries | /<script[^>]*\bsrc=["'][^"']+\b(?:apollo\|relay)\.js["'][^>]*><\/script>/i | +15 |
usesCodeSplitting | 14. Components and Libraries | /import\(\s*['"][^'"]+['"]\s*\)/i | +25 |
hasNoscriptContent | 15. Additional Rules | /<noscript>([\s\S]*?)<\/noscript>/i | +1 |
hasObfuscatedMinifiedJS | 15. Additional Rules | /<script[^>]*\bsrc=["'][^"']+\.(?:min\|bundle\|vendor)\.[a-z0-9]*\.js["'][^>]*><\/script>/i | +5 |
hasDynamicComponentLoading | 15. Additional Rules | /React\.lazy\(\|Vue\.component\(/i | |
hasDynamicThemeLoading | 15. Additional Rules | /loadTheme\(\s*['"][^"']+['"]\s*\)/i | +2 |
hasReactStrictMode | 15. Additional Rules | /<React\.StrictMode>/i | +5 |
hasDynamicImportPolyfill | 15. Additional Rules | /import\.polyfill\(\)/i | +10 |
hasJSFrameworkGlobalVariables | 15. Additional Rules | /\b(window\.(React\|Vue\|Angular))\b/i | +25 |
hasJSXSyntax | 15. Additional Rules | /<script[^>]*>[^<]*\bReact\.createElement\b[^<]*<\/script>/i | +40 |
hasModernLibsCDN | 15. Additional Rules | /<script[^>]*\bsrc=["']https?:\/\/cdn\.jsdelivr\.net\/npm\/(?:react\|vue\|angular\|svelte)[^"']+\.js["'][^>]*><\/script>/i | +2 |
usesEmberDataModels | 16. Additional JS Frameworks | /\bEmber\.Model\b/i | +30 |
createsEmberApplication | 16. Additional JS Frameworks | /Ember\.Application\.create\s*\(/i | +30 |
usesBackboneModelsCollections | 16. Additional JS Frameworks | /\bBackbone\.(?:Model\|Collection)\b/i | +25 |
usesBackboneRouter | 16. Additional JS Frameworks | /\bBackbone\.Router\.extend\s*\(/i | +30 |
usesAlpineJSDirectives | 16. Additional JS Frameworks | /\bx-(?:data\|bind\|on\|show\|if\|for)\b/i | +15 |
initializesAlpineJS | 16. Additional JS Frameworks | /Alpine\.start\s*\(\s*\)/i | +15 |
usesDocumentWrite | 17. Server-Driven JS Injection | /document\.write\s*\(\s*["']<script[^>]+><\/script>["']\s*\)/i | +20 |
injectsScriptsWithInnerHTML | 17. Server-Driven JS Injection | /innerHTML\s*=\s*["']<script[^>]+><\/script>["']/i | +25 |
usesServerSideTemplateJS | 17. Server-Driven JS Injection | /<%[^%]*%>/i | +1 |
initializesWebWorkers | 18. Web Workers & Service Workers | /new\s+Worker\s*\(\s*["'][^"']+["']\s*\)/i | +2 |
usesImportScriptsInWorkers | 18. Web Workers & Service Workers | /importScripts\s*\(\s*["'][^"']+["']\s*\)/i | +2 |
serviceWorkerHandlesFetch | 18. Web Workers & Service Workers | /self\.addEventListener\s*\(\s*['"]fetch['"]\s*,/i | +2 |
usesVirtualDomLibrary | 19. Virtual DOM Manipulation | /\bvirtual-dom\b/i | +25 |
performsVirtualDomOperations | 19. Virtual DOM Manipulation | /\b(?:createElement\|diff\|patch)\s*\(/i | +5 |
usesCustomFrameworkGlobals | 20. Custom JS Frameworks | /\bMyAppInit\b\|\bCustomFramework\b/i | +30 |
callsCustomInitFunctions | 20. Custom JS Frameworks | /\binitializeApp\s*\(/i | +25 |
usesFeatureDetectionLibraries | 21. Progressive Enhancement | /\bModernizr\b/i | +2 |
conditionallyLoadsScripts | 21. Progressive Enhancement | /if\s*\(\s*['"]feature['"]\s*\)\s*{\s*loadScript\s*\(/i | +10 |
usesI18next | 22. Client-Side Localization | /\bi18next\b/i | +10 |
loadsLanguageFiles | 22. Client-Side Localization | /loadLanguage\s*\(\s*["'][a-z]{2}["']\s*\)/i | +5 |
usesSignalR | 23. Real-Time Data | /\bSignalR\b/i | +5 |
usesFirebaseRealTimeDB | 23. Real-Time Data | /\bfirebase\.database\b/i | +5 |
usesES6ModuleImports | 24. Modular JS Patterns | /\bimport\s+[^;]+from\s+['"][^'"]+['"]/i | +5 |
usesES6ModuleExports | 24. Modular JS Patterns | /\bexport\s+(?:default\s+)?(?:class\|function\|const\|let\|var)\b/i | +5 |
usesCommonJSRequire | 24. Modular JS Patterns | /\brequire\s*\(\s*['"][^'"]+['"]\s*\)/i | +5 |
usesAMDDefineRequire | 24. Modular JS Patterns | /\bdefine\s*\(\|\brequire\s*\(/i | |
includesGoogleAnalytics | 25. Analytics & Tracking | /https?:\/\/www\.google-analytics\.com\/analytics\.js/i | +1 |
includesjQuery | 25. Analytics & Tracking | /<script[^>]*\bsrc=["']https?:\/\/code\.jquery\.com\/jquery[^"']+\.js["'][^>]*><\/script>/i | +2 |
loadsGoogleAnalyticsAsync | 25. Analytics & Tracking | /<script[^>]*\basync\b[^>]*\bsrc=["']https:\/\/www\.google-analytics\.com\/analytics\.js["'][^>]*><\/script>/i | +1 |
injectsTrackingPixels | 25. Analytics & Tracking | /createElement\s*\(\s*['"]img['"]\s*\)\.src\s*=/i | +1 |
usesMJML | 26. Email Template Rendering | /\bmjml\b/i | +1 |
usesHandlebarsEmailTemplates | 26. Email Template Rendering | /\bHandlebars\.compile\s*\(/i | +1 |
generatesEmailContentDynamically | 26. Email Template Rendering | /generateEmail\s*\(\s*\)/i | +1 |
usesAuth0 | 27. User Authentication | /\bAuth0\.WebAuth\b/i | +5 |
usesFirebaseAuth | 27. User Authentication | /\bfirebase\.auth\b/i | +5 |
usesPassportJS | 27. User Authentication | /\bPassport\b/i | +1 |
usesDragula | 28. Drag and Drop | /\bdragula\s*\(/i | +2 |
usesSortableJS | 28. Drag and Drop | /\bSortable\.create\s*\(/i | +2 |
usesInteractJS | 28. Drag and Drop | /\binteract\.\w+\s*\(/i | +2 |
usesDropzoneJS | 29. File Upload | /\bDropzone\.autoDiscover\s*=/i | +2 |
usesFineUploader | 29. File Upload | /\bFineUploader\s*\(/i | +2 |
usesUppy | 29. File Upload | /\bUppy\.use\s*\(/i | +2 |
serializesStateWithJSON | 30. State Serialization | /\bJSON\.stringify\s*\(/i | +2 |
deserializesStateWithJSON | 30. State Serialization | /\bJSON\.parse\s*\(/i | +2 |
usesStateSerializationLibraries | 30. State Serialization | /\bserializeState\b\|\bdeserializeState\b/i | +20 |
usesLodash | 31. Data Filtering/Sorting | /\b_\.\w+\s*\(/i | +2 |
usesUnderscoreJS | 31. Data Filtering/Sorting | /\b_\.\w+\s*\(/i | +2 |
usesRamda | 31. Data Filtering/Sorting | /\bR\.\w+\s*\(/i | +2 |
usesToastr | 32. Notification Libraries | /\btoastr\.\w+\s*\(/i | +1 |
usesNoty | 32. Notification Libraries | /new\s+Noty\s*\(/i | +1 |
usesSweetAlert | 32. Notification Libraries | /\bswal\s*\(/i | +1 |
usesVideoJS | 34. Video & Media | /\bvideo\.js\b/i | +2 |
usesPlyr | 34. Video & Media | /\bPlyr\.setup\b/i | +2 |
usesHowlerJS | 34. Video & Media | /\bHowl\s*=\s*new\s+Howler\.Howl\b/i | +2 |
usesLazyLoadLibrary | 37. Performance Optimization | /\bLazyLoad\s*\(/i | +2 |
usesIntersectionObserverAPI | 37. Performance Optimization | /\bIntersectionObserver\b/i | +2 |
usesPerformanceAPI | 37. Performance Optimization | /\bperformance\.\w+\s*\(/i | +1 |
enforcesCSPViaJS |
38. Security Features | /setAttribute\s*\(\s*['"]content-security-policy['"]\s*,/i |
+1 |
usesCryptoJS |
38. Security Features | /\bCryptoJS\.\w+\s*\(/i |
+1 |
usesReduxPersist |
39. Data Persistence | /\bpersistReducer\b/i |
+25 |
usesMobXPersist |
39. Data Persistence | /\bmobxPersist\b/i |
+25 |
usesLocalForage |
39. Data Persistence | /\blocalforage\b/i |
+15 |
usesApolloClient |
40. GraphQL | /\bApolloClient\b/i |
+15 |
usesRelayModern |
40. GraphQL | /\bRelay\.Environment\b/i |
+15 |
usesGraphQLQueriesInJS |
40. GraphQL | /graphql\s*`[\s\S]+?`/i |
+15 |
usesIonicFramework |
43. Mobile Frameworks | /\bIonic\b/i |
+10 |
usesFramework7 |
43. Mobile Frameworks | /\bFramework7\b/i |
+10 |
usesOnsenUI |
43. Mobile Frameworks | /\bOnsenUI\b/i |
+10 |
usesIonicComponents |
43. Mobile Frameworks | /<ion-[a-z]+>/i |
+5 |
usesFabricJS |
44. Image Manipulation | /\bFabric\b/i |
+1 |
usesPixiJS |
44. Image Manipulation | /\bPIXI\.Application\b/i |
+1 |
usesP5JS |
44. Image Manipulation | /\bp5\.createCanvas\b/i |
+1 |
usesCryptoJSEncryption |
45. Data Encryption | /\bCryptoJS\.\w+\s*\(/i |
+1 |
usesSJCL |
45. Data Encryption | /\bsjcl\.encrypt\b/i |
+1 |
usesBcryptJS |
45. Data Encryption | /\bbcrypt\.hash\b/i |
+1 |
usesLaunchDarkly |
46. Feature Toggles | /\bLDClient\.init\b/i |
+2 |
usesUnleash |
46. Feature Toggles | /\bUnleash\.initialize\b/i |
+2 |
usesFeatureToggleJS |
46. Feature Toggles | /\bFeatureToggle\.isEnabled\b/i |
+2 |
usesjQuerySteps |
49. Multi-Step Forms | /\bsteps\s*\(/i |
+5 |
usesFormikStepper |
49. Multi-Step Forms | /\buseFormikStepper\b/i |
+10 |
handlesDynamicFormSteps |
49. Multi-Step Forms | /goToStep\s*\(\s*\d+\s*\)/i |
+10 |
usesStripeJS |
50. Payment Processing | /\bStripe\.\w+\s*\(/i |
+10 |
usesPayPalSDK |
50. Payment Processing | /https:\/\/www\.paypal\.com\/sdk\/js/i |
+10 |
usesBraintreeSDK |
50. Payment Processing | /\bBraintree\.\w+\s*\(/i |
+10 |
usesReactHydrateRoot |
51. State Hydration | /\bReactDOM\.hydrateRoot\b/i |
+3 |
usesVueHydration |
51. State Hydration | /\bVue\.hydrate\b/i |
+3 |
embedsInitialStateForHydration |
51. State Hydration | /window\.__INITIAL_STATE__\s*=\s*{/i |
+3 |
usesReactErrorBoundaries |
52. Error Boundaries | /class\s+\w+\s+extends\s+React\.Component\s*{\s*componentDidCatch/i |
+2 |
usesFunctionalErrorBoundaries |
52. Error Boundaries | /\buseErrorHandler\b/i |
+2 |
wrapsWithErrorBoundary |
52. Error Boundaries | /<ErrorBoundary>\s*<\/ErrorBoundary>/i |
+2 |
usesD3JS |
56. Data Visualization | /\bD3\.v\d+\.\d+\.\d+\b/i |
+2 |
usesChartJS |
56. Data Visualization | /\bnew\s+Chart\s*\(/i |
+2 |
usesHighcharts |
56. Data Visualization | /\bHighcharts\.chart\s*\(/i |
+2 |
usesKnockoutJS |
57. Data Binding | /\bko\.applyBindings\b/i |
+10 |
usesRivetsJS |
57. Data Binding | /\bRivets\.bind\b/i |
+10 |
usesDataBindAttributes |
57. Data Binding | /\bdata-bind\s*=\s*["'][^"']+["']/i |
+5 |
usesJoiValidation |
59. Form Validation | /\bJoi\.validate\b/i |
+2 |
usesYupValidation |
59. Form Validation | /\bYup\.object\b/i |
+5 |
submitsFormsWithFetch |
59. Form Validation | /form\.addEventListener\s*\(\s*['"]submit['"]\s*,\s*function\s*\([^)]*\)\s*{\s*fetch\s*\(/i |
+15 |
submitsFormsWithXHR |
59. Form Validation | /form\.addEventListener\s*\(\s*['"]submit['"]\s*,\s*function\s*\([^)]*\)\s*{\s*new\s+XMLHttpRequest\s*\(/i |
+10 |
usesMomentJS |
60. Date & Time Libraries | /\bMoment\.js\b/i |
+1 |
usesDayJS |
60. Date & Time Libraries | /\bdayjs\s*\(/i |
+1 |
usesLuxon |
60. Date & Time Libraries | /\bLuxon\.DateTime\b/i |
+1 |
hasReCAPTCHA |
Bot Gates / Challenges | /g-recaptcha|www\.google\.com\/recaptcha|grecaptcha\.execute/i |
+35 |
hasHCaptcha |
Bot Gates / Challenges | /hcaptcha\.com|class=["']h-captcha["']|data-sitekey=/i |
+35 |
hasCFTurnstile |
Bot Gates / Challenges | /challenges\.cloudflare\.com|cf-turnstile/i |
+40 |
hasPerimeterX |
Bot Gates / Challenges | `/(?:^ | /)_?px/ |
hasDataDome |
Bot Gates / Challenges | /datadome/i |
+35 |
hasAkamaiBM |
Bot Gates / Challenges | /bm-verify|akamai/i |
+30 |
hasArkoseLabs |
Bot Gates / Challenges | /funcaptcha\.com|arkoselabs/i |
+35 |
hasWebpackRuntime |
Bundlers & Runtimes | /(window|self)\["webpack(?:Jsonp|Chunk[^"]*)"\]\s*=/i |
+30 |
hasParcelRuntime |
Bundlers & Runtimes | /\bparcelRequire\s*=\s*function/i |
+25 |
hasViteDevClient |
Bundlers & Runtimes | /<script[^>]+type=["']module["'][^>]+src=["'][^"']*\/@vite\/client["']/i |
+30 |
hasModulePreloadLinks |
Bundlers & Runtimes | /<link[^>]+rel=["']modulepreload["']/i |
+15 |
hasEsbuildFootprint |
Bundlers & Runtimes | /\/esbuild\.[^"']+\.js\b|\besbuild:\b/i |
+15 |
hasRollupSystemRegister |
Bundlers & Runtimes | /\bSystem\.register\s*\(/i |
+20 |
hasSnowpackFootprint |
Bundlers & Runtimes | /\/_snowpack\//i |
+15 |
scriptBytesDominant |
Computed Signals | Calculated ratio of script tag bytes to total HTML bytes is > 40%. |
+12 |
hugeInlineJS |
Computed Signals | Total bytes of inline (non-src) script tags is > 60KB. |
+10 |
largeJsonState |
Computed Signals | Total bytes within <script type="application/json"> tags is > 10KB. |
+14 |
lowVisibleText |
Computed Signals | Character count of visible text (after stripping scripts, styles, tags) is < 500. |
+10 |
manyModulePreloads |
Computed Signals | Count of <link rel="modulepreload"> tags is >= 3. |
+8 |
manyPrefetchedScripts |
Computed Signals | Count of <link rel="prefetch" as="script"> tags is >= 3. |
+6 |
earlyJSRedirect |
Computed Signals | HTML contains window.location.replace() or window.location.assign(). |
+8 |
manyThirdPartyPreconnects |
Computed Signals | Count of <link rel="preconnect"> and <link rel="dns-prefetch"> tags is >= 4. |
headHeavyScripting |
Computed Signals | The <head> contains >= 3 script tags and has more scripts than the <body>. |
+5 |
computed:nextGSSP |
Computed Signals (High Value) | The JSON inside __NEXT_DATA__ contains "gssp": true. |
-35 |
computed:nuxtServerRendered |
Computed Signals (High Value) | The JSON inside window.__NUXT__ contains "serverRendered": true. |
-25 |
computed:reactHydrateCall |
Computed Signals (High Value) | The HTML contains a hydrateRoot( or ReactDOM.hydrate( call. |
+15 |
computed:skeletonUI |
Computed Signals (High Value) | The HTML contains >= 8 instances of class names with skeleton, shimmer, placeholder, etc. |
+10 |
computed:largeNoscriptFallback |
Computed Signals (High Value) | The total byte size of content within <noscript> tags is > 8KB. |
-10 |
hasCRANoscript |
CSR Required / Heavy Hydration | /<noscript>[^<]*You need to enable JavaScript to run this app/i |
+28 |
hasGatsbyAnnouncer |
CSR Required / Heavy Hydration | /id=["']gatsby-announcer["']/i |
+12 |
hasHelmetMarkers |
CSR Required / Heavy Hydration | /\bdata-rh=["']true["']|\bdata-react-helmet=["']true["']/i |
hasIsLandIslands |
CSR Required / Heavy Hydration | /<is-land\b/i |
+20 |
hasKnockoutBindings |
CSR Required / Heavy Hydration | /\bdata-bind\s*=|ko\.applyBindings\s*\(/i |
hasMithrilRouter |
CSR Required / Heavy Hydration | /\bm\.route\s*\(/i |
+18 |
hasNextFlightStream |
CSR Required / Heavy Hydration | /self\.__next_f\.push\(/i |
+30 |
hasUnpolyDirectives |
CSR Required / Heavy Hydration | /\bup-(?:target|follow|layer|data|main|on|modal|autosubmit)\b/i |
+20 |
hasWPInteractivityDirectives |
CSR Required / Heavy Hydration | /\bdata-wp-(?:interactive|bind|on|class|text|style|watch)\b/i |
+18 |
hasWebflowRuntime |
CSR Required / Heavy Hydration | /assets\.website-files\.com\/[^"']+\/webflow(?:\.min)?\.js|\bdata-wf-(?:site|status|page)\b/i |
hasEmptyBodyCSR |
Empty or Minimal Body | /<body>\s*<\/body>/i |
+50 |
hasNextDataJson |
Frameworks & Platforms | /<script[^>]+id=["']__NEXT_DATA__["'][^>]*type=["']application\/json["'][^>]*>/i |
+45 |
hasRemixManifest |
Frameworks & Platforms | /window\.__remixManifest/i |
+35 |
hasAstroIslands |
Frameworks & Platforms | /<astro-island\b/i |
+35 |
hasQwikAttributes |
Frameworks & Platforms | /\bq:(?:container|base|slot)\b|qrl=/i |
+40 |
hasSolidHydrationMarks |
SSR Confirmed | /\bdata-hk=["'][^"']+["']|\bdata-hydration\b/i |
hasPreactRuntime |
Frameworks & Platforms | /\bpreact\.render\s*\(|\/preact(?:\.min)?\.js\b/i |
hasLitOrPolymer |
Frameworks & Platforms | /(lit-(?:html|element)|webcomponents-loader)\.js\b|\bShadyDOM\b/i |
hasStencilRuntime |
Frameworks & Platforms | /@stencil|data-stencil-build|\/stencil(?:\.esm)?\.js\b/i |
hasHTMX |
Frameworks & Platforms | /\bhx-(?:get|post|boost|target|swap)\b/i |
+20 |
hasTurboOrHotwire |
Frameworks & Platforms | /<turbo-(?:frame|stream)\b|\bdata-turbo(?:-permanent)?\b/i |
hasStimulus |
Frameworks & Platforms | /\bdata-controller=["'][^"']+["'][^>]*\bdata-action=["'][^"']+->/i |
+15 |
hasPhoenixLiveView |
Frameworks & Platforms | /\bphx-(?:click|change|submit|update)\b|\/phoenix\/live_reload\.js\b/i |
hasLaravelLivewire |
Frameworks & Platforms | /\bwire:(?:click|model|submit|change)\b/i |
+25 |
hasInertia |
Frameworks & Platforms | /\bdata-page=["'][\s\S]*?["']\b|window\.__INERTIA__/i |
+35 |
hasBlazor |
Frameworks & Platforms | /\/_framework\/blazor\.(?:server|webassembly)\.js\b/i |
+35 |
hasElmInit |
Frameworks & Platforms | /\bElm\.[A-Za-z0-9_]+\s*\.init\s*\(/i |
+35 |
hasIslandMarkers |
Islands Architecture | /data-island/i |
+30 |
prefetchManyScripts |
Preload/Prefetch Hints | /<link[^>]+rel=["']prefetch["'][^>]+as=["']script["']/i |
+12 |
preconnectManyThirdParties |
Preload/Prefetch Hints | /<link[^>]+rel=["'](?:preconnect|dns-prefetch)["'][^>]+href=/ig |
+8 |
hasQwikResumability |
Resumability Indicators | /qwik-/i |
+40 |
hasQwikLoaderScript |
Resumability Indicators | /qwikloader\.js/i |
+40 |
usesHistoryAPI |
Routing & Navigation | /\bhistory\.(?:pushState|replaceState)\s*\(/i |
+25 |
hasBaseHrefRooted |
Routing & Navigation | /<base\s+href=["'][^"']*["'][^>]*>/i |
+8 |
hasAngularServerContext |
SSR Confirmed | /\bng-server-context\b/i |
-25 |
hasAngularTransferState |
SSR Confirmed | /<script[^>]+id=["']ng-state["'][^>]*type=["']application\/json["'][^>]*>/i |
-20 |
hasDeclarativeShadowDOM |
SSR Confirmed | /<template\s+shadowrootmode=["'](?:open|closed)["']/i |
-20 |
hasFastBootShoebox |
SSR Confirmed | /<script[^>]+type=["']fastboot\/shoebox["'][^>]*>/i |
-30 |
hasReactSSRRootHeuristic |
SSR Confirmed | /\bdata-reactroot\b/i |
-12 |
hasReactStreamPlaceholders |
SSR Confirmed | /(<!--\$\??-->|<!--\/\$-->|<template\s+id=["'][A-Z]:\d+)/i |
-22 |
hasSvelteHydrationMarks |
SSR Confirmed | /\bdata-svelte-h=["'][^"']+["']/i |
-25 |
hasVueMetaSSR |
SSR Confirmed | /\bdata-vue-meta-server-rendered\b/i |
-15 |
hasVueSSRRoot |
SSR Confirmed | /\bdata-server-rendered=["']?true["']?/i |
-35 |
hasReactSSRAttributes |
SSR Confirmed | /\bdata-reactroot\b|\bdata-reactid\b|<!--\s*react-(?:text|empty):/i |
-15 |
hasMarkoSSRAttributes |
SSR Confirmed | /\bdata-marko=/i |
-25 |
hasNextJSHydrationMarker |
SSR Confirmed | /<div\s+id=["']__next["']><\/div>/i |
-10 |
hasSvelteHydrationScript |
SSR Confirmed | /svelte-hydrate/i |
-15 |
hasSSRFrameworkScript |
SSR Confirmed | /<script[^>]*\bsrc=["'][^"']*\b(?:next|gatsby|nuxt|remix)\b[^"']+\.js["'][^>]*><\/script>/i |
-5 |
hasPrerenderAttribute |
SSR Data Attributes | /prerender/i |
+2 |
hasReactHelmetSSR |
SSR Data Attributes | /data-rh/i |
+2 |
hasAngularUniversalSSR |
SSR Markers (Framework-Specific) | /ng-server-context/i |
+2 |
hasSvelteKitSSR |
SSR Markers (Framework-Specific) | /sveltekit:embedded/i |
+2 |
hasRemixSSRManifest |
SSR Markers (Framework-Specific) | /remix-manifest/i |
+2 |
hasGatsbySSRWrapper |
SSR Markers (Framework-Specific) | /gatsby-focus-wrapper/i |
+2 |
hasNuxtSSRState |
SSR Markers (Framework-Specific) | /__NUXT__/i |
+2 |
hasAstroSSRPrefix |
SSR Markers (Framework-Specific) | /astro-/i |
+2 |
embedsReduxOrApolloBlobs |
State/JSON Blobs | /window\.__(?:REDUX|APOLLO)_STATE__/i |
+25 |
embedsHypernova |
State/JSON Blobs | /<script[^>]+data-hypernova-key=/i |
+25 |
hasAlpineDirectives |
Vue/Alpine/Interactivity | /\bx-data\b|x-bind:|x-on:|@click=|x-model\b/i |
+18 |
hasVueAllowMismatch |
Vue/Alpine/Interactivity | /\bdata-allow-mismatch\b/i |
+8 |
usesHydrationStrategyAttributes |
Whitepaper: Advanced Detection | /loading=['"](hydrate|onClientLoad|onClick|onFocus|onVisible)['"]/ |
+35 |
hasWaiAriaLiveRegions |
Strong CSR & Heavy Hydration | /(aria-live=['"](polite|assertive)['"]|aria-busy=['"]true['"])/ |
+22 |
hasClientOnlyComponentTags |
High-Confidence CSR (Smoking Guns) | /(?:<nuxt-client-only>|<ClientOnly>|<ion-.*>|@rendermode="InteractiveWebAssembly")/ |
+50 |
usesHistoryAPI |
Strong CSR & Heavy Hydration | /history\.(pushState|replaceState)\s*\(/i |
+25 |
usesReduxPersist |
Strong CSR & Heavy Hydration | /\bpersistReducer\s*\(/i |
+25 |
hasInertia |
Modern Frameworks | /@inertiajs|Inertia\.(visit|get|post)/i |
+20 |
hasLaravelLivewire |
Lightweight Interactivity | /wire:|livewire:|@livewire/i |
+18 |
hasPhoenixLiveView |
Lightweight Interactivity | /phx-|data-phx-|Phoenix\.LiveView/i |
+18 |
hasServerDrivenDataPayload |
Whitepaper: Advanced Detection | /(?:<script\s+id=["']__NEXT_DATA__["']|window\.__remixContext|window\.__solid_data)/ |
+10 |
hasServerSideStateAttributes |
Whitepaper: Advanced Detection | /(?:data-blazor-ssr|data-server-state)/ |
+10 |
usesDomMutationObservers |
Whitepaper: Advanced Detection | /new\s+MutationObserver\s*\(/ |
+45 |
usesIntersectionObserver |
Whitepaper: Advanced Detection | /new\s+IntersectionObserver\s*\(/ |
+30 |
hasAsyncLoadingCssPatterns |
Whitepaper: Advanced Detection | /(?:skeleton-loading|shimmer-effect|placeholder-container)/ |
+20 |
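The rows above are written as JavaScript regex literals, so a Python crawler needs a small translation step before it can use them. The sketch below is illustrative only: it assumes the patterns stay within the syntax JavaScript and Python share (no named groups, no `u` or `y` flags), and `js_literal_to_re`, `TABLE_ROWS`, `COMPILED_ROWS`, and `score_html` are hypothetical names. Only four rows are copied in, with the weights shown in the table.

```python
import re

# Map JavaScript regex flags onto Python re flags. JavaScript's 'g' (global)
# flag has no compile-time equivalent: Python gets "all matches" by calling
# findall()/finditer(), so 'g' is mapped to 0 and effectively ignored here.
_JS_FLAGS = {"i": re.IGNORECASE, "m": re.MULTILINE, "s": re.DOTALL, "g": 0}


def js_literal_to_re(literal: str) -> re.Pattern:
    """Compile a JavaScript regex literal such as /foo/i into a Python pattern."""
    # The closing delimiter is the last '/'; everything after it is flags.
    # Escaped slashes ('\/') inside the body are also read as '/' by Python's re.
    body, _, flags = literal[1:].rpartition("/")
    compiled_flags = 0
    for flag in flags:
        # Raises KeyError for flags this sketch does not handle (e.g. 'u', 'y').
        compiled_flags |= _JS_FLAGS[flag]
    return re.compile(body, compiled_flags)


# Hypothetical sample: four rows copied from the table (name, JS literal, weight).
TABLE_ROWS = [
    ("hasEmptyBodyCSR", r"/<body>\s*<\/body>/i", 50),
    ("hasCRANoscript", r"/<noscript>[^<]*You need to enable JavaScript to run this app/i", 28),
    ("hasVueSSRRoot", r"""/\bdata-server-rendered=["']?true["']?/i""", -35),
    ("preconnectManyThirdParties", r"""/<link[^>]+rel=["'](?:preconnect|dns-prefetch)["'][^>]+href=/ig""", 8),
]

COMPILED_ROWS = [(name, js_literal_to_re(lit), weight) for name, lit, weight in TABLE_ROWS]


def score_html(html: str) -> tuple[int, list[str]]:
    """Sum the weight of every rule whose pattern matches at least once."""
    total, matched = 0, []
    for name, pattern, weight in COMPILED_ROWS:
        if pattern.search(html):
            total += weight
            matched.append(name)
    return total, matched


if __name__ == "__main__":
    shell = ('<html><body><div id="root"></div>'
             '<noscript>You need to enable JavaScript to run this app.</noscript></body></html>')
    print(score_html(shell))  # (28, ['hasCRANoscript'])
```

This only tests for presence. Rows whose names imply a count (such as preconnectManyThirdParties, with its /g flag) would instead compare len(pattern.findall(html)) against a threshold.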
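While the original engine was prototyped in JavaScript, Python is a common choice for data pipelines and backend systems. Here is the engine translated into a readable Python class, ready to be integrated into your crawler. It implements the high-signal subset of the rules above: several rows are merged into combined patterns (for example, hasBotGates covers the individual bot-challenge vendors), many low-weight library rules are omitted or folded into general patterns such as usesDataVizLibraries, and some weights differ from the table (hasCRANoscript is +28 in the table but 40 in the code).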
import re
import json
class RenderingDetector:
"""
A comprehensive Python class for heuristic-based detection of web page
rendering strategies (SSR vs. CSR).
This engine analyzes a raw HTML string against an exhaustive set of rules,
calculating a weighted score to determine the likelihood that a page
relies on Client-Side Rendering (CSR) for its primary content.
- Positive scores indicate CSR.
- Negative scores indicate SSR/SSG.
- A score above the threshold suggests a JS-required crawl is necessary.
"""
SCORE_THRESHOLD = 30
# The exhaustive list of all detection rules, organized by signal confidence.
RULES = [
# I. SSR / SSG / Server-First Affirmations (NEGATIVE SCORE)
{'name': 'hasVueSSRRoot', 'pattern': re.compile(r'\bdata-server-rendered=["\']?true["\']?', re.IGNORECASE)},
{'name': 'hasFastBootShoebox', 'pattern': re.compile(r'<script[^>]+type=["\']fastboot\/shoebox["\'][^>]*>', re.IGNORECASE)},
{'name': 'hasAngularServerContext', 'pattern': re.compile(r'\bng-server-context\b', re.IGNORECASE)},
{'name': 'hasSvelteHydrationMarks', 'pattern': re.compile(r'\bdata-svelte-h=["\'][^"\']+["\']|\bdata-sveltekit-hydrate-start\b', re.IGNORECASE)},
{'name': 'hasReactStreamPlaceholders', 'pattern': re.compile(r'(<!--\$\??-->|<!--\/\$-->|<template\s+id=["\'][A-Z]:\d+)', re.IGNORECASE)},
{'name': 'hasAngularTransferState', 'pattern': re.compile(r'<script[^>]+id=["\']ng-state["\'][^>]*type=["\']application\/json["\'][^>]*>', re.IGNORECASE)},
{'name': 'hasDeclarativeShadowDOM', 'pattern': re.compile(r'<template\s+shadowrootmode=["\'](?:open|closed)["\']', re.IGNORECASE)},
{'name': 'hasVueMetaSSR', 'pattern': re.compile(r'\bdata-vue-meta-server-rendered\b', re.IGNORECASE)},
{'name': 'hasReactSSRAttributes', 'pattern': re.compile(r'\bdata-reactroot\b|\bdata-reactid\b|<!--\s*react-(?:text|empty):', re.IGNORECASE)},
{'name': 'hasSolidHydrationMarks', 'pattern': re.compile(r'\bdata-hk=["\'][^"\']+["\']|\bdata-hydration\b', re.IGNORECASE)},
{'name': 'hasStaticSiteGeneratorArtifacts', 'pattern': re.compile(r'<meta\s+name=["\']generator["\']\s+content=["\'](?:Jekyll|Hugo|Eleventy|Astro|Gatsby|Next\.js)["\']', re.IGNORECASE)},
{'name': 'hasMarkoSSRAttributes', 'pattern': re.compile(r'\bdata-marko=', re.IGNORECASE)},
{'name': 'embedsHypernova', 'pattern': re.compile(r'<script[^>]+data-hypernova-key=', re.IGNORECASE)},
{'name': 'hasVueSSRClientHydration', 'pattern': re.compile(r'__VUE_SSR__\s*=\s*true', re.IGNORECASE)},
{'name': 'hasNextJSHydrationMarker', 'pattern': re.compile(r'<div\s+id=["\']__next["\']></div>', re.IGNORECASE)},
{'name': 'hasSvelteHydrationScript', 'pattern': re.compile(r'svelte-hydrate', re.IGNORECASE)},
{'name': 'hasSSRFrameworkScript', 'pattern': re.compile(r'<script[^>]*\bsrc=["\'][^"\']*\b(?:next|gatsby|nuxt|remix)\b[^"\']+\.js["\'][^>]*><\/script>', re.IGNORECASE)},
# II. High-Confidence CSR Indicators ("Smoking Guns")
{'name': 'hasBotChallengeText', 'pattern': re.compile(r'\b(unusual\s+activity\s+from\s+your\s+IP|detected\s+unusual\s+(traffic|activity)|(please\s+)?(complete|solve|pass)\s+the\s+captcha|(type|enter|input)\s+the\s+(characters|letters|code|text)|verify\s+(you\'re|you\s+are)\s+(a\s+)?human|prove\s+(you\'re|you\s+are)\s+not\s+a\s+robot|(please\s+)?check\s+the\s+box(\s+below)?|checking\s+your\s+browser|security\s+check|robot\s+verification|are\s+you\s+a\s+human\?|click\s+the\s+(images|pictures)\s+of|select\s+all\s+squares\s+with|anti-bot\s+measure|human\s+verification\s+required)\b', re.IGNORECASE)},
{'name': 'hasEmptyBodyCSR', 'pattern': re.compile(r'<body>\s*</body>', re.IGNORECASE)},
{'name': 'hasMinimalAppDiv', 'pattern': re.compile(r'<(div|main)\s+id\s*=\s*["\'](?:app|root|main|container)["\']\s*>\s*(?:<\/[^>]+>)?', re.IGNORECASE)},
{'name': 'hasUnprocessedTemplateLiterals', 'pattern': re.compile(r'\{\{[^}]+\}\}', re.IGNORECASE)},
{'name': 'hasAngularJSUnprocessedDirectives', 'pattern': re.compile(r'\bng-(?:app|if|repeat|bind|controller)\b', re.IGNORECASE)},
{'name': 'hasCRANoscript', 'pattern': re.compile(r'<noscript>[^<]*You need to enable JavaScript to run this app', re.IGNORECASE)},
{'name': 'hasClientSideTemplateScripts', 'pattern': re.compile(r'<script[^>]+type=["\'](?:text\/(?:x-)?template|text\/x-handlebars-template|text\/ng-template)["\']', re.IGNORECASE)},
{'name': 'hasAjaxCrawlableMeta', 'pattern': re.compile(r'<meta[^>]+name=["\']fragment["\'][^>]+content=["\']!["\']', re.IGNORECASE)},
{'name': 'hasNextFlightStream', 'pattern': re.compile(r'self\.__next_f\.push\(', re.IGNORECASE)},
# III. Strong CSR & Heavy Hydration Indicators
{'name': 'usesDomMutationObservers', 'pattern': re.compile(r'new\s+MutationObserver\s*\(', re.IGNORECASE)},
{'name': 'hasMeteorRuntimeConfig', 'pattern': re.compile(r'__meteor_runtime_config__', re.IGNORECASE)},
{'name': 'hasBotGates', 'pattern': re.compile(r'(g-recaptcha|hcaptcha\.com|challenges\.cloudflare\.com|perimeterx|datadome|akamai|arkoselabs)', re.IGNORECASE)},
{'name': 'hasBlazorWebAssembly', 'pattern': re.compile(r'\/_framework\/blazor\.(?:server|webassembly)\.js\b', re.IGNORECASE)},
{'name': 'hasClientSideRoutingOnClick', 'pattern': re.compile(r'<a\b[^>]*\bonclick\s*=["\'][^"\']+["\'][^>]*>', re.IGNORECASE)},
{'name': 'hasFrameworkSpecificStructures', 'pattern': re.compile(r'<(router-outlet|Outlet|app-root|vue-app|custom-element)\b[^>]*>', re.IGNORECASE)},
{'name': 'hasWebflowRuntime', 'pattern': re.compile(r'assets\.website-files\.com\/[^"\']+\/webflow(?:\.min)?\.js|\bdata-wf-(?:site|status|page)\b', re.IGNORECASE)},
{'name': 'hasKnockoutBindings', 'pattern': re.compile(r'\bdata-bind\s*=|ko\.applyBindings\s*\(', re.IGNORECASE)},
{'name': 'hasWaiAriaLiveRegions', 'pattern': re.compile(r'(aria-live=[\'"](polite|assertive)[\'"]|aria-busy=[\'"]true[\'"])', re.IGNORECASE)},
{'name': 'usesIntersectionObserver', 'pattern': re.compile(r'new\s+IntersectionObserver\s*\(', re.IGNORECASE)},
{'name': 'hasMithrilRouter', 'pattern': re.compile(r'\bm\.route\s*\(', re.IGNORECASE)},
{'name': 'hasRemixManifest', 'pattern': re.compile(r'window\.__remixManifest', re.IGNORECASE)},
{'name': 'hasStencilRuntime', 'pattern': re.compile(r'@stencil|data-stencil-build|\/stencil(?:\.esm)?\.js\b', re.IGNORECASE)},
{'name': 'hasAlternativeBundlerRuntimes', 'pattern': re.compile(r'(\bparcelRequire\s*=\s*function|\/esbuild\.[^"\']+\.js\b|\besbuild:\b|\bSystem\.register\s*\(|\/_snowpack\/)', re.IGNORECASE)},
{'name': 'embedsReduxOrApolloBlobs', 'pattern': re.compile(r'window\.__(?:REDUX|APOLLO)_STATE__', re.IGNORECASE)},
{'name': 'hasDynamicComponentLoading', 'pattern': re.compile(r'React\.lazy\(|Vue\.component\(', re.IGNORECASE)},
{'name': 'usesCodeSplittingOrDynamicImports', 'pattern': re.compile(r'import\(\s*[\'"][^\'"]+[\'"]\s*\)', re.IGNORECASE)},
{'name': 'hasClientSideRoutingComponents', 'pattern': re.compile(r'<(router-outlet|Outlet|router-link)\b[^>]*>', re.IGNORECASE)},
{'name': 'hasStateManagementLibraries', 'pattern': re.compile(r'<script[^>]*\bsrc=["\'][^"\']*\b(?:redux|mobx|vuex|ngrx)\b[^"\']+\.js["\'][^>]*><\/script>', re.IGNORECASE)},
{'name': 'hasWebComponentsTag', 'pattern': re.compile(r'<[a-z]+-[a-z]+[^>]*><\/[a-z]+-[a-z]+>', re.IGNORECASE)},
# New high-value rules
{'name': 'usesHistoryAPI', 'pattern': re.compile(r'history\.(pushState|replaceState)\s*\(', re.IGNORECASE)},
{'name': 'usesReduxPersist', 'pattern': re.compile(r'\bpersistReducer\s*\(', re.IGNORECASE)},
{'name': 'hasInertia', 'pattern': re.compile(r'@inertiajs|Inertia\.(visit|get|post)', re.IGNORECASE)},
{'name': 'hasLaravelLivewire', 'pattern': re.compile(r'wire:|livewire:|@livewire', re.IGNORECASE)},
{'name': 'hasPhoenixLiveView', 'pattern': re.compile(r'phx-|data-phx-|Phoenix\.LiveView', re.IGNORECASE)},
# IV. Lightweight Interactivity / Progressive Enhancement
{'name': 'hasUnpolyDirectives', 'pattern': re.compile(r'\bup-(?:target|follow|layer|data|main|on|modal|autosubmit)\b', re.IGNORECASE)},
{'name': 'hasIsLandIslands', 'pattern': re.compile(r'<is-land\b', re.IGNORECASE)},
{'name': 'hasAlpineDirectives', 'pattern': re.compile(r'\bx-data\b|x-bind:|x-on:|@click=|x-model\b', re.IGNORECASE)},
{'name': 'hasWPInteractivityDirectives', 'pattern': re.compile(r'\bdata-wp-(?:interactive|bind|on|class|text|style|watch)\b', re.IGNORECASE)},
# V. Supporting & Behavioral CSR Indicators
{'name': 'hasModernBuildToolArtifactsScriptJS', 'pattern': re.compile(r'<script[^>]*\bsrc=["\'][^"\']+\.[a-z0-9]{6,}\.js["\'][^>]*><\/script>', re.IGNORECASE)},
{'name': 'hasQwikResumability', 'pattern': re.compile(r'(\bq:(?:container|base|slot)\b|qrl=|\/qwikloader\.js)', re.IGNORECASE)},
{'name': 'hasClientOnlyIonicComponents', 'pattern': re.compile(r'<ion-[^>]+>', re.IGNORECASE)},
{'name': 'hasLoadingSpinnerMarkup', 'pattern': re.compile(r'<[^>]+(?:id|class)=["\'][^"\']*\b(spinner|loader|skeleton|shimmer|placeholder)\b[^"\']*["\']|<app-root>\s*Loading\.+\s*<\/app-root>', re.IGNORECASE)},
{'name': 'hasGatsbyAnnouncer', 'pattern': re.compile(r'id=["\']gatsby-announcer["\']', re.IGNORECASE)},
{'name': 'hasHelmetMarkers', 'pattern': re.compile(r'\bdata-rh=["\']true["\']|\bdata-react-helmet=["\']true["\']', re.IGNORECASE)},
{'name': 'hasVueAllowMismatch', 'pattern': re.compile(r'\bdata-allow-mismatch\b', re.IGNORECASE)},
{'name': 'hasBaseHrefRooted', 'pattern': re.compile(r'<base\s+href=["\'][^"\']*["\'][^>]*>', re.IGNORECASE)},
# VI. Low-Confidence & General Web Patterns
{'name': 'hasBundledScript', 'pattern': re.compile(r'<script[^>]*\bsrc\s*=\s*["\'][^"\']+\.(?:bundle|chunk|main|app|min|vendor)\.[a-z0-9]{0,}\.js["\'][^>]*><\/script>', re.IGNORECASE)},
{'name': 'hasES6Modules', 'pattern': re.compile(r'<script\s+type=["\']module["\'][^>]*>[\s\S]*<\/script>', re.IGNORECASE)},
{'name': 'hasInlineEventHandlers', 'pattern': re.compile(r'<(?:a|button|div|span)\b[^>]*\s+onclick\s*=["\'][^"\']+["\'][^>]*>', re.IGNORECASE)},
{'name': 'hasAsyncDeferScripts', 'pattern': re.compile(r'<script[^>]*\b(?:async|defer)\b[^>]*><\/script>', re.IGNORECASE)},
{'name': 'hasClientSideRoutingHash', 'pattern': re.compile(r'<a\b[^>]*\bhref\s*=\s*["\'][^"\']*#\/[^"\']*["\'][^>]*>', re.IGNORECASE)},
{'name': 'usesAjaxOrFetch', 'pattern': re.compile(r'\b(fetch|XMLHttpRequest|\.ajax)\s*\(', re.IGNORECASE)},
{'name': 'usesWebSockets', 'pattern': re.compile(r'new\s+WebSocket\s*\(', re.IGNORECASE)},
{'name': 'hasLazyLoadingAttributes', 'pattern': re.compile(r'\bloading=["\']lazy["\']', re.IGNORECASE)},
{'name': 'hasDynamicMetaOrTitle', 'pattern': re.compile(r'document\.title\s*=|document\.querySelector\([\'"]meta', re.IGNORECASE)},
{'name': 'usesReactHooks', 'pattern': re.compile(r'\buseState\b|\buseEffect\b|\buseContext\b', re.IGNORECASE)},
{'name': 'hasShadowDOM', 'pattern': re.compile(r'attachShadow\s*\(\s*({[^}]*}|null)\s*\)', re.IGNORECASE)},
{'name': 'hasTailwindClasses', 'pattern': re.compile(r'\b(bg|text|p|m|flex|grid)-[a-z0-9]+(?:-[a-z0-9]+)*', re.IGNORECASE)},
{'name': 'usesJQuery', 'pattern': re.compile(r'code\.jquery\.com\/jquery', re.IGNORECASE)},
{'name': 'usesDataVizLibraries', 'pattern': re.compile(r'\bD3\b|\bChart\.js\b|\bHighcharts\b', re.IGNORECASE)},
]
# The definitive mapping of all rule names to their weights.
WEIGHTS = {
# I. SSR / SSG / Server-First Affirmations (NEGATIVE SCORE)
'hasVueSSRRoot': -35, 'hasFastBootShoebox': -30, 'hasAngularServerContext': -25, 'hasSvelteHydrationMarks': -25, 'hasReactStreamPlaceholders': -22, 'hasAngularTransferState': -20, 'hasDeclarativeShadowDOM': -20, 'hasVueMetaSSR': -15, 'hasReactSSRAttributes': -15, 'hasSolidHydrationMarks': -20, 'hasStaticSiteGeneratorArtifacts': -30, 'hasMarkoSSRAttributes': -25, 'embedsHypernova': -20, 'hasVueSSRClientHydration': -15, 'hasNextJSHydrationMarker': -10, 'hasSvelteHydrationScript': -15, 'hasSSRFrameworkScript': -5,
# II. High-Confidence CSR Indicators ("Smoking Guns")
'hasBotChallengeText': 55, 'hasEmptyBodyCSR': 50, 'hasMinimalAppDiv': 50, 'hasUnprocessedTemplateLiterals': 45, 'hasAngularJSUnprocessedDirectives': 40, 'hasCRANoscript': 40, 'hasClientSideTemplateScripts': 35, 'hasAjaxCrawlableMeta': 35, 'hasNextFlightStream': 30,
# III. Strong CSR & Heavy Hydration Indicators
'usesDomMutationObservers': 28, 'hasMeteorRuntimeConfig': 28, 'hasBotGates': 45, 'hasBlazorWebAssembly': 35, 'hasClientSideRoutingOnClick': 35, 'hasFrameworkSpecificStructures': 30, 'hasWebflowRuntime': 25, 'hasKnockoutBindings': 25, 'hasWaiAriaLiveRegions': 22, 'usesIntersectionObserver': 25, 'hasMithrilRouter': 25, 'hasRemixManifest': 25, 'hasStencilRuntime': 25, 'hasAlternativeBundlerRuntimes': 20, 'embedsReduxOrApolloBlobs': 25, 'hasDynamicComponentLoading': 25, 'usesCodeSplittingOrDynamicImports': 20, 'hasClientSideRoutingComponents': 25, 'hasStateManagementLibraries': 20, 'hasWebComponentsTag': 20, 'usesHistoryAPI': 25, 'usesReduxPersist': 25, 'hasInertia': 20, 'hasLaravelLivewire': 18, 'hasPhoenixLiveView': 18,
# IV. Lightweight Interactivity / Progressive Enhancement
'hasUnpolyDirectives': 20, 'hasIsLandIslands': 20, 'hasAlpineDirectives': 18, 'hasWPInteractivityDirectives': 18,
# V. Supporting & Behavioral CSR Indicators
'hasModernBuildToolArtifactsScriptJS': 15, 'hasQwikResumability': 25, 'hasClientOnlyIonicComponents': 15, 'hasLoadingSpinnerMarkup': 15, 'hasGatsbyAnnouncer': 12, 'hasHelmetMarkers': 10, 'hasVueAllowMismatch': 8, 'hasBaseHrefRooted': 8,
# VI. Low-Confidence & General Web Patterns (Scores 1-5)
'hasBundledScript': 3, 'hasES6Modules': 2, 'hasInlineEventHandlers': 3, 'hasAsyncDeferScripts': 1, 'hasClientSideRoutingHash': 5, 'usesAjaxOrFetch': 2, 'usesWebSockets': 2, 'hasLazyLoadingAttributes': 1, 'hasDynamicMetaOrTitle': 4, 'usesReactHooks': 4, 'hasShadowDOM': 5, 'hasTailwindClasses': 1, 'usesJQuery': 1, 'usesDataVizLibraries': 1,
}
def _compute_additional_signals(self, html: str) -> dict:
"""Calculates scores based on complex heuristics that require more than a single regex."""
score = 0
reasons = []
if not html or len(html) == 0:
return {'score': score, 'reasons': reasons}
# A) Low HTML Length Check
if len(html) < 2000:
score += 10
reasons.append("computed:lowHtmlLength")
# B) Low Text Content Check
no_scripts_styles = re.sub(r'<style[^>]*>[\s\S]*?<\/style>', '', html, flags=re.IGNORECASE)
no_scripts_styles = re.sub(r'<script[^>]*>[\s\S]*?<\/script>', '', no_scripts_styles, flags=re.IGNORECASE)
        text_content = re.sub(r'<[^>]+>', '', no_scripts_styles).strip()
        if 0 < len(text_content) < 100:
            score += 15
            reasons.append("computed:veryLowTextContent")
        # C) Next.js __NEXT_DATA__ -> Check for gssp:true (definitive SSR)
        next_data_match = re.search(r'<script[^>]+id=["\']__NEXT_DATA__["\'][^>]*>([\s\S]*?)<\/script>', html, re.IGNORECASE)
        if next_data_match:
            try:
                j = json.loads(next_data_match.group(1))
                if isinstance(j, dict) and j.get('gssp'):
                    score -= 35
                    reasons.append("computed:nextGSSP")
            except json.JSONDecodeError:
                pass  # Ignore malformed JSON
        # D) Nuxt.js __NUXT__ -> Check for serverRendered:true (definitive SSR)
        # The payload is a JS expression (often a minified IIFE with unquoted keys), not strict
        # JSON, so capture the rest of the <script> and match the key with or without quotes.
        nuxt_blob_match = re.search(r'window\.__NUXT__\s*=\s*([\s\S]*?)<\/script>', html, re.IGNORECASE)
        if nuxt_blob_match and re.search(r'["\']?serverRendered["\']?\s*:\s*true', nuxt_blob_match.group(1)):
            score -= 25
            reasons.append("computed:nuxtServerRendered")
        # E) React hydration calls present in inline scripts
        if re.search(r'\bhydrateRoot\s*\(|ReactDOM\.hydrate\s*\(', html, re.IGNORECASE):
            score += 15
            reasons.append("computed:reactHydrateCall")
        # F) Script Weight Ratio - High ratio of inline script characters to total HTML
        # (external <script src=...> tags contribute nothing to this count)
        script_matches = re.findall(r'<script[^>]*>([\s\S]*?)<\/script>', html, re.IGNORECASE)
        total_script_bytes = sum(len(match) for match in script_matches)
        if len(html) > 0 and total_script_bytes / len(html) > 0.4:  # >40% script content
            score += 20
            reasons.append("computed:highScriptRatio")
        # G) Module Preload Count - Many modulepreload hints suggest a complex ES-module dependency graph
        modulepreload_count = len(re.findall(r'<link[^>]+rel=["\']modulepreload["\']', html, re.IGNORECASE))
        if modulepreload_count > 5:
            score += 15
            reasons.append("computed:manyModulePreloads")
        # H) Large noscript fallback - Negative signal suggesting quality static content
        noscript_matches = re.findall(r'<noscript[^>]*>([\s\S]*?)<\/noscript>', html, re.IGNORECASE)
        total_noscript_bytes = sum(len(match) for match in noscript_matches)
        if total_noscript_bytes > 8192:  # >8KB of noscript content
            score -= 20
            reasons.append("computed:largeNoscriptFallback")
        return {'score': score, 'reasons': reasons}
    # Weights applied by _compute_additional_signals, used here only for reporting so each
    # computed signal shows its real contribution. Keep in sync with that method and add any
    # other computed:* reason it emits; unlisted reasons are reported as 0.
    COMPUTED_SIGNAL_SCORES = {
        'computed:veryLowTextContent': 15,
        'computed:nextGSSP': -35,
        'computed:nuxtServerRendered': -25,
        'computed:reactHydrateCall': 15,
        'computed:highScriptRatio': 20,
        'computed:manyModulePreloads': 15,
        'computed:largeNoscriptFallback': -20,
    }

    def analyze(self, html: str) -> dict:
        """
        Analyzes an HTML string and returns a detailed report.
        """
        if not isinstance(html, str):
            return {'totalScore': 0, 'indicatesJS': False, 'matchedRules': ['error:invalid_input']}
        total_score = 0
        matched_rules_details = []
        for rule in self.RULES:
            if rule['pattern'].search(html):
                weight = self.WEIGHTS.get(rule['name'], 0)
                total_score += weight
                matched_rules_details.append({'rule': rule['name'], 'score': weight})
        computed = self._compute_additional_signals(html)
        total_score += computed['score']
        for reason in computed['reasons']:
            # Report the weight actually applied for this signal
            reason_score = self.COMPUTED_SIGNAL_SCORES.get(reason, 0)
            matched_rules_details.append({'rule': reason, 'score': reason_score})
        # Sort matched rules by the absolute value of their score for importance
        matched_rules_details.sort(key=lambda x: abs(x['score']), reverse=True)
        matched_rules_formatted = [
            f"{item['rule']} ({'+' if item['score'] > 0 else ''}{item['score']})"
            for item in matched_rules_details
        ]
        return {
            'totalScore': round(total_score),
            'indicatesJS': total_score >= self.SCORE_THRESHOLD,
            'matchedRules': matched_rules_formatted,
        }
# ========================================================================
# HOW TO USE THE DETECTOR
# ========================================================================
if __name__ == '__main__':
    print("--- Rendering Detector Demo ---")
    # Instantiate the detector
    detector = RenderingDetector()
    demos = {
        "Classic CSR (Create React App)": """
<!DOCTYPE html><html><head><title>My App</title></head><body>
<div id="root"></div>
<noscript>You need to enable JavaScript to run this app.</noscript>
<script src="/bundle.js"></script></body></html>
""",
        "Modern SSR (Next.js)": """
<!DOCTYPE html><html><head><title>News</title><meta name="generator" content="Next.js"></head>
<body><div id="__next" data-reactroot><h1>Article</h1><p>Content.</p></div>
<script id="__NEXT_DATA__" type="application/json">{"props":{"pageProps":{}},"page":"/","query":{},"gssp":true}</script></body></html>
""",
        "Hybrid with Skeleton Loader": """
<!DOCTYPE html><html><body><header>My Site</header><main>
<div class="skeleton-loader"></div>
</main><script src="/app.chunk.a4b8c1.js"></script></body></html>
""",
        "Static Site with Alpine.js": """
<!DOCTYPE html><html><body><h1>Welcome</h1>
<div x-data="{ open: false }"><button @click="open = !open">Toggle</button><div x-show="open">Content</div></div>
<script src="/alpine.js" defer></script></body></html>
""",
        "Bot-Gated Page (Cloudflare)": """
<!DOCTYPE html><html><body><h1>Security Check</h1>
<p>Please solve the captcha to prove you're not a robot.</p>
<div id="cf-turnstile"></div>
<script src="https://challenges.cloudflare.com/turnstile/v0/api.js" async defer></script></body></html>
""",
    }
    for name, html_content in demos.items():
        print(f"\n[ANALYZING] {name}")
        result = detector.analyze(html_content)
        print(f" > Final Score: {result['totalScore']}")
        print(f" > Needs JS Render: {result['indicatesJS']}")
        print(" > Top Matched Rules:", result['matchedRules'][:5])
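The computed signals can also be written so that each check carries its own weight, which keeps the per-rule report and the total score from drifting apart. The following is a minimal standalone sketch, not the RenderingDetector class itself: the function name score_computed_signals and the (name, weight) return format are illustrative assumptions, while the weights and thresholds are copied from the checks in the class. It covers four of the signals (Next.js gssp, Nuxt serverRendered, inline script ratio, noscript size).

```python
import json
import re
from typing import List, Tuple

# Illustrative sketch: weights/thresholds mirror RenderingDetector's computed signals,
# but score_computed_signals and its (name, weight) output are hypothetical.
NEXT_DATA_RE = re.compile(
    r'<script[^>]+id=["\']__NEXT_DATA__["\'][^>]*>([\s\S]*?)</script>', re.IGNORECASE)
NUXT_RE = re.compile(r'window\.__NUXT__\s*=\s*([\s\S]*?)</script>', re.IGNORECASE)
SERVER_RENDERED_RE = re.compile(r'["\']?serverRendered["\']?\s*:\s*true')
SCRIPT_BODY_RE = re.compile(r'<script[^>]*>([\s\S]*?)</script>', re.IGNORECASE)
NOSCRIPT_BODY_RE = re.compile(r'<noscript[^>]*>([\s\S]*?)</noscript>', re.IGNORECASE)


def score_computed_signals(html: str) -> List[Tuple[str, int]]:
    """Return (signal_name, weight) pairs; the total is just the sum of the weights."""
    signals: List[Tuple[str, int]] = []

    # Next.js: pages using getServerSideProps embed "gssp": true in __NEXT_DATA__.
    m = NEXT_DATA_RE.search(html)
    if m:
        try:
            data = json.loads(m.group(1))
        except json.JSONDecodeError:
            data = None
        if isinstance(data, dict) and data.get('gssp'):
            signals.append(('computed:nextGSSP', -35))

    # Nuxt: the payload is a JS expression, so accept quoted or unquoted keys.
    m = NUXT_RE.search(html)
    if m and SERVER_RENDERED_RE.search(m.group(1)):
        signals.append(('computed:nuxtServerRendered', -25))

    # Inline script characters as a share of the document (src= scripts count as 0).
    script_chars = sum(len(body) for body in SCRIPT_BODY_RE.findall(html))
    if html and script_chars / len(html) > 0.4:
        signals.append(('computed:highScriptRatio', 20))

    # A large <noscript> fallback suggests real static content.
    noscript_chars = sum(len(body) for body in NOSCRIPT_BODY_RE.findall(html))
    if noscript_chars > 8192:
        signals.append(('computed:largeNoscriptFallback', -20))

    return signals


if __name__ == '__main__':
    page = ('<div id="__next"><p>Article text</p></div>'
            '<script id="__NEXT_DATA__" type="application/json">{"gssp":true}</script>')
    found = score_computed_signals(page)
    print(found, sum(weight for _, weight in found))
```

Because the weight travels with the signal name, a report built from this list cannot disagree with the score, at the cost of scattering the weights across the checks instead of keeping them in one table.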
How to Use the Engine
Integrating the RenderingDetector into your crawler is straightforward.
- Save the Code: Save the Python class above as a file, for example, detector.py.
- Make an Initial Request: In your crawler logic, perform a cheap, standard HTTP GET request to fetch the raw HTML of your target URL. Do not use a headless browser for this first step.
- Analyze the HTML: Pass the raw HTML content to the detector.
- Make a Decision: Based on the indicatesJS boolean in the result, decide your next action:
  - If False, proceed to parse the HTML you already have.
  - If True, send the URL to your dedicated queue for rendering with a headless browser like Puppeteer, Playwright, or Selenium.
Here's a conceptual example of how it would fit into a crawler's workflow:
from detector import RenderingDetector
import requests

detector = RenderingDetector()
url_queue = ["http://example.com/static-page", "http://example.com/dynamic-app"]
# URLs that need a headless browser; hand these to your rendering workers
# (Puppeteer, Playwright, Selenium, or a hypothetical render_with_browser service)
render_queue = []

for url in url_queue:
    try:
        # 1. Make the cheap initial request
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # don't analyze 4xx/5xx error pages
        raw_html = response.text
        # 2. Analyze the HTML
        analysis_result = detector.analyze(raw_html)
        print(f"Analysis for {url}: Score = {analysis_result['totalScore']}")
        # 3. Make a decision
        if analysis_result['indicatesJS']:
            print(" -> Escalating to full JS render...")
            render_queue.append(url)
        else:
            print(" -> Processing as static HTML...")
            final_html = raw_html
            # ... process the final_html ...
    except requests.RequestException as e:
        print(f"Failed to fetch {url}: {e}")
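Because the decision depends only on the HTML, it can be factored out of the fetch loop and unit-tested without any network access. This is a minimal sketch, assuming any detector object whose analyze() returns a dict with an indicatesJS key, as RenderingDetector does; the triage function, its two-list return value, and the toy_analyze stand-in are hypothetical illustrations, not part of the engine.

```python
from typing import Callable, Dict, List, Tuple


def triage(pages: Dict[str, str], analyze: Callable[[str], dict]) -> Tuple[List[str], List[str]]:
    """Split already-fetched pages into (static_urls, render_urls) using a detector's verdict."""
    static_urls: List[str] = []
    render_urls: List[str] = []
    for url, raw_html in pages.items():
        result = analyze(raw_html)
        # indicatesJS == True means the cheap fetch is not enough: escalate to a headless browser.
        if result.get('indicatesJS'):
            render_urls.append(url)
        else:
            static_urls.append(url)
    return static_urls, render_urls


if __name__ == '__main__':
    # Hypothetical stand-in for RenderingDetector().analyze, only to keep this sketch
    # self-contained: it flags pages whose body is an empty React mount point.
    def toy_analyze(html: str) -> dict:
        return {'indicatesJS': '<div id="root"></div>' in html}

    pages = {
        'http://example.com/static-page': '<h1>Article</h1><p>Body text.</p>',
        'http://example.com/dynamic-app': '<div id="root"></div><script src="/bundle.js"></script>',
    }
    print(triage(pages, toy_analyze))
```

In a real pipeline you would pass detector.analyze instead of the toy function, and the returned render list becomes the input to your headless-browser workers.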
Conclusion
By moving from a brute-force approach to an intelligent, heuristic-based one, you can dramatically increase the efficiency and throughput of your web crawling infrastructure. This engine acts as a smart, resource-aware gatekeeper, ensuring you use the right tool for the right job. It saves you money on computation, speeds up your data collection, and ultimately allows you to build a smarter, more scalable system for understanding the web.