For anyone building a web crawler at scale, there is a fundamental truth: you are not crawling one internet, but two.

  1. The Static Internet: This is the web of documents. A server sends you a complete HTML file, rich with text and structure. It's fast, predictable, and perfectly legible to a simple HTTP client like cURL or Python's requests library.
  2. The Dynamic Internet: This is the web of applications. A server sends you a nearly empty HTML "shell." This shell contains links to large JavaScript files. Only when a browser downloads, parses, and executes this JavaScript does the application come to life—fetching data, rendering content, and building the page you actually see.

At scale, the cost difference between crawling these two internets is monumental. A simple HTTP request is measured in kilobytes of memory and milliseconds of CPU time. Firing up a full headless browser (like Puppeteer or Playwright) to render a dynamic page consumes hundreds of megabytes of RAM and can take seconds of CPU time.

Making the wrong choice is costly. If you use a headless browser on a static page, you've wasted thousands of times the necessary resources. If you use a simple HTTP client on a dynamic page, you get a useless empty shell and miss all your data. The single most important economic and technical decision in a modern crawling pipeline is deciding, per-URL, which internet you are looking at.

This guide dissects a powerful solution: a sophisticated, weighted heuristic engine. It's a system designed to act as an intelligent gatekeeper, analyzing the raw HTML from a cheap initial request to make a highly accurate prediction about whether a costly JS-render is necessary. We will explore every single rule in this engine, explaining its logic, its reliability, and its role in the final verdict.
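The gatekeeper idea can be sketched as a two-stage pipeline. This is a minimal illustration under stated assumptions, not the engine itself: `fetch_static`, `render_js`, and `needs_js` are hypothetical callables that you would wire to your HTTP client, your headless-browser pool, and the heuristic engine.

```python
from typing import Callable, Tuple


def crawl(
    url: str,
    fetch_static: Callable[[str], str],  # hypothetical: cheap HTTP GET returning raw HTML
    render_js: Callable[[str], str],     # hypothetical: headless-browser render returning the final DOM
    needs_js: Callable[[str], bool],     # hypothetical: heuristic verdict on the raw HTML
) -> Tuple[str, str]:
    """Fetch cheaply first; pay for a browser render only when the heuristic asks for it."""
    html = fetch_static(url)
    if needs_js(html):
        return "rendered", render_js(url)
    return "static", html
```

Because the expensive renderer is only a parameter, the same function can wrap any HTTP client and any headless browser, and it is easy to test with stand-ins.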

TL;DR: Executive Summary

  • The Problem: Distinguishing between simple "static" pages and complex JavaScript "dynamic" applications is the single most important economic decision in web crawling at scale. Making the wrong choice wastes thousands of times the necessary resources.
  • Our Solution: This guide presents a sophisticated, weighted heuristic engine that analyzes raw HTML to predict if a costly browser render is necessary, using 80+ detection rules with positive/negative scoring and intelligent thresholds.
  • Your Takeaway: We provide the complete, production-ready Python code for this engine, allowing you to immediately integrate this intelligence into your own crawling pipeline and save massive computational costs.

Our Solution: An Intelligent Heuristic Engine

A simple checklist ("does it contain id='root'?") is too naive. The modern web is a complex tapestry of frameworks, build tools, and rendering patterns. Our engine embraces this complexity using four core principles:

  1. Weighted Scoring: Not all clues are equal. An empty <body> tag is a "smoking gun" for Client-Side Rendering (CSR) and deserves a high score (+50 points). An async attribute on a script tag is a common performance practice and means very little (+2 points). Every rule is assigned a weight reflecting its predictive power.
  2. Positive and Negative Indicators: The engine doesn't just look for signs of CSR. It also actively looks for definitive signs of Server-Side Rendering (SSR) or Static Site Generation (SSG). These "affirmations" subtract points from the score, providing a crucial counterbalance and preventing false positives.
  3. Advanced Heuristics: We go beyond simple regex patterns. The engine calculates the text-to-HTML ratio, parses embedded JSON blobs for framework-specific clues (like Next.js __NEXT_DATA__), analyzes script weight ratios, and counts module preloads to assess application complexity.
  4. Intelligent Threshold System: The scores are summed up. If the totalScore surpasses a defined SCORE_THRESHOLD (e.g., 30), the page is flagged as requiring a JS crawl. This approach is resilient to a few weak, misleading signals and relies on the collective weight of evidence rather than any single indicator.
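The four principles reduce to very little code. In the sketch below, the `Rule` dataclass, `score_html`, and `needs_js_render` are hypothetical names; the four sample patterns and their weights are taken from the reference table later in this guide, and the threshold is the example value of 30.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

SCORE_THRESHOLD = 30  # example threshold from this guide


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule record: a name, a compiled pattern, and a signed weight."""
    name: str
    pattern: re.Pattern
    score: int


RULES = [
    # Negative weight: an SSR affirmation.
    Rule("hasVueSSRRoot", re.compile(r"\bdata-server-rendered=[\"']?true[\"']?", re.I), -35),
    # Positive weights: CSR indicators of very different strength.
    Rule("hasEmptyBodyCSR", re.compile(r"<body>\s*</body>", re.I), 50),
    Rule(
        "hasMinimalAppDiv",
        re.compile(r"<(div|main)\s+id\s*=\s*[\"'](?:app|root|main)[\"']\s*>\s*</\1>", re.I),
        50,
    ),
    Rule("hasAsyncDeferScripts", re.compile(r"<script[^>]*\b(?:async|defer)\b[^>]*></script>", re.I), 2),
]


def score_html(html: str, rules: List[Rule] = RULES) -> Tuple[int, List[str]]:
    """Add up the weight of every rule that matches at least once."""
    matched = [rule for rule in rules if rule.pattern.search(html)]
    return sum(rule.score for rule in matched), [rule.name for rule in matched]


def needs_js_render(html: str) -> bool:
    """Flag the page for a JS crawl when the total surpasses the threshold."""
    total, _ = score_html(html)
    return total > SCORE_THRESHOLD
```

Summing every hit, instead of stopping at the first one, is what lets an SSR affirmation cancel out weaker CSR hints.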

The Engine, Deconstructed: An Exhaustive Rule-by-Rule Analysis

Let's dissect the brain of the detector. The rules are organized into tiers, from definitive server-side patterns to the faintest whispers of client-side activity.

I. SSR / SSG / Server-First Affirmations (Negative Score)

These are high-confidence signals that content was rendered on the server. Finding these patterns gives us confidence that a simple HTTP crawl is likely sufficient.

  • hasVueSSRRoot (-35): Looks for data-server-rendered="true". This is a definitive attribute added by Vue.js's SSR engine. It's a direct statement that the server did the rendering work.
  • hasFastBootShoebox (-30): Detects <script type="fastboot/shoebox">. This is a specific mechanism used by Ember.js's SSR solution, FastBoot, to pass server-fetched data to the client for hydration. Its presence is a reliable sign of an SSR context.
  • hasStaticSiteGeneratorArtifacts (-30): Scans for <meta name="generator" content="..."> with values like Jekyll, Hugo, Eleventy, Astro, Gatsby, or Next.js. This is an explicit declaration of the tool used to build the site, and these are overwhelmingly used for SSG/SSR.
  • hasAngularServerContext (-25): Finds the ng-server-context attribute. This is added by Angular Universal (its SSR solution) to indicate the content was server-generated.
  • hasSvelteHydrationMarks (-25): Searches for SvelteKit-specific attributes like data-svelte-h or data-sveltekit-hydrate-start. These are markers left in the HTML to guide the client-side hydration process, confirming an initial server render.
  • hasReactStreamPlaceholders (-22): Looks for patterns like <!--$?--> or <template id="B:0">. These are artifacts of React's modern streaming SSR, where the server sends a complete shell with "holes" that are filled in as data becomes available. It's a clear sign of a sophisticated server-first architecture.
  • And others in this category: Rules like hasAngularTransferState, hasDeclarativeShadowDOM, and hasMarkoSSRAttributes all follow the same principle—they identify specific, unambiguous fingerprints left by various server-rendering technologies. Their presence strongly argues against the need for a JS crawl to get the initial content.

Crawler's Action: When these rules match, the crawler can be highly confident that the HTML contains meaningful content. It should lean towards not escalating to a full browser render unless other powerful CSR indicators are also present.
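A handful of the Tier I rules might look like this in Python. This is a sketch: the weights are the ones listed above, the patterns are adapted from the reference table, and the generator pattern is loosened (an assumption) to accept the version strings that real generator tags often carry, such as "Hugo 0.120.4". The `ssr_affirmations` helper is a hypothetical name.

```python
import re
from typing import Dict

# A few Tier I rules as (pattern, weight). Weights match the list above.
SSR_AFFIRMATIONS = {
    "hasVueSSRRoot": (re.compile(r"\bdata-server-rendered=[\"']?true[\"']?", re.I), -35),
    "hasFastBootShoebox": (re.compile(r"<script[^>]+type=[\"']fastboot/shoebox[\"']", re.I), -30),
    # Loosened (assumption): allow a version string after the generator name.
    "hasStaticSiteGeneratorArtifacts": (
        re.compile(
            r"<meta\s+name=[\"']generator[\"']\s+content=[\"']"
            r"(?:Jekyll|Hugo|Eleventy|Astro|Gatsby|Next\.js)\b[^\"']*[\"']",
            re.I,
        ),
        -30,
    ),
    "hasAngularServerContext": (re.compile(r"\bng-server-context\b", re.I), -25),
    # React streaming SSR: Suspense boundary comments and B:0-style templates.
    "hasReactStreamPlaceholders": (
        re.compile(r"<!--\$\??-->|<!--/\$-->|<template\s+id=[\"'][A-Z]:\d+", re.I),
        -22,
    ),
}


def ssr_affirmations(html: str) -> Dict[str, int]:
    """Return the matching negative-score rules, keyed by rule name."""
    return {
        name: weight
        for name, (pattern, weight) in SSR_AFFIRMATIONS.items()
        if pattern.search(html)
    }
```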


II. High-Confidence CSR Indicators ("Smoking Guns")

These are the definitive signs of a CSR-first architecture. When these rules hit, a JS crawl is almost certainly mandatory.

  • hasBotChallengeText (+55): This is the ultimate signal. It looks for phrases like "checking your browser," "complete the captcha," or "prove you're not a robot." If this text is present, the real content is almost certainly gated behind a JavaScript-powered security challenge. No JS, no content.
  • hasEmptyBodyCSR (+50): Detects <body>\s*</body>. An empty body is the most fundamental sign of a pure SPA. It's an architectural statement that JavaScript is responsible for constructing the entire visible page.
  • hasMinimalAppDiv (+50): Searches for an empty mount point such as <div id="app"></div>, <div id="root"></div>, or <div id="main"></div> (a <main> element with one of these ids also counts). This is the textbook pattern for frameworks like React, Vue, and Angular, which are configured to find this "mount point" and inject the application into it.
  • hasUnprocessedTemplateLiterals (+45): Finds raw, unprocessed template placeholders like {{ some_variable }}. If these are visible in the raw HTML, it means the client-side templating engine (like AngularJS, Vue, or Handlebars) has not run yet. The content is incomplete.
  • hasCRANoscript (+40): Detects the specific <noscript>You need to enable JavaScript to run this app.</noscript> message provided by create-react-app. This is a direct, developer-written confession that the page is a CSR application.
  • hasClientSideTemplateScripts (+35): Looks for <script type="text/template"> or similar. This is a classic pattern for shipping HTML templates to the client, which are then rendered and injected into the DOM by JavaScript.

Crawler's Action: A match on any of these rules is a command. The crawler should immediately escalate the URL to the full, JavaScript-enabled rendering queue. Proceeding with the static HTML will result in missing data.
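Because a single smoking gun is treated as a command, a pipeline can check these rules before running the full scorer. The sketch below is hypothetical: `SMOKING_GUNS`, `first_smoking_gun`, `route`, and the "render"/"score" labels are illustrative names, and the empty-body pattern is relaxed (an assumption) to allow attributes such as <body class="...">.

```python
import re
from typing import Optional

# Tier II patterns; weights are omitted because any single hit is decisive.
SMOKING_GUNS = {
    "hasBotChallengeText": re.compile(
        r"checking your browser|complete the captcha|prove you'?re not a robot", re.I
    ),
    # Relaxed to allow attributes on <body> (assumption).
    "hasEmptyBodyCSR": re.compile(r"<body[^>]*>\s*</body>", re.I),
    "hasMinimalAppDiv": re.compile(
        r"<(div|main)\s+id\s*=\s*[\"'](?:app|root|main)[\"']\s*>\s*</\1>", re.I
    ),
    "hasCRANoscript": re.compile(
        r"<noscript>[^<]*You need to enable JavaScript to run this app", re.I
    ),
    "hasClientSideTemplateScripts": re.compile(
        r"<script[^>]+type=[\"'](?:text/(?:x-)?template|text/x-handlebars-template"
        r"|text/ng-template)[\"']",
        re.I,
    ),
}


def first_smoking_gun(html: str) -> Optional[str]:
    """Return the name of the first Tier II rule that matches, or None."""
    for name, pattern in SMOKING_GUNS.items():
        if pattern.search(html):
            return name
    return None


def route(html: str) -> str:
    """Hypothetical routing: "render" skips scoring, "score" runs the full engine."""
    return "render" if first_smoking_gun(html) else "score"
```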


III. Strong CSR & Heavy Hydration Indicators

These patterns signal that even if some HTML is present (SSR/SSG), it is a "lifeless" shell that requires a heavy dose of JavaScript to become interactive and load its full content.

  • hasBotGates (+45): Looks for URLs or patterns related to reCAPTCHA, Cloudflare Turnstile, hCaptcha, PerimeterX, etc. Similar to the bot challenge text, this indicates an external, JS-powered security gate that must be passed.
  • hasClientSideRoutingOnClick (+35): Finds <a onclick="..."> tags used for navigation. This is an anti-pattern for SEO but a strong signal for crawlers. It means navigation is controlled by a custom JS function, not standard href links, indicating a client-side router is managing the page.
  • hasFrameworkSpecificStructures (+30): Detects custom elements like <router-outlet> (Angular) or <Outlet> (React Router). These are not standard HTML; they are instructions for a framework's JS router to render a component.
  • embedsReduxOrApolloBlobs (+25): Finds window.__REDUX_STATE__ or window.__APOLLO_STATE__. These are large blobs of JSON data embedded in the page. They represent the "initial state" of a complex application, which is fed to the client-side JavaScript to bootstrap the app—a process known as hydration.
  • usesCodeSplittingOrDynamicImports (+20): Searches for the import() syntax. This is the mechanism for code-splitting, where parts of the application's JavaScript are intentionally not loaded until they are needed (e.g., when a user clicks a button to open a dialog). This implies the initial page is incomplete by design.

Crawler's Action: These signals strongly suggest that the initial HTML does not represent the full, interactive state of the page. A JS crawl is required to capture dynamically loaded components, post-hydration content, and the behavior of the application.
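Three of the Tier III checks, sketched in Python. The global names match the ones mentioned above, but real applications may use others (the Redux documentation, for instance, uses window.__PRELOADED_STATE__), so treat the patterns as illustrative; `tier_iii_score` is a hypothetical helper.

```python
import re
from typing import List, Tuple

# Patterns for three Tier III rules; weights are the ones given above.
STATE_BLOB = re.compile(r"window\.__(?:REDUX|APOLLO)_STATE__\s*=")
DYNAMIC_IMPORT = re.compile(r"\bimport\(\s*[\"'][^\"']+[\"']\s*\)")
ONCLICK_LINK = re.compile(r"<a\b[^>]*\bonclick\s*=[\"'][^\"']+[\"'][^>]*>", re.I)

TIER_III = [
    ("embedsReduxOrApolloBlobs", STATE_BLOB, 25),
    ("usesCodeSplittingOrDynamicImports", DYNAMIC_IMPORT, 20),
    ("hasClientSideRoutingOnClick", ONCLICK_LINK, 35),
]


def tier_iii_score(html: str) -> Tuple[int, List[str]]:
    """Sum the Tier III weights that match and report which rules fired."""
    hits = [(name, score) for name, pattern, score in TIER_III if pattern.search(html)]
    return sum(score for _, score in hits), [name for name, _ in hits]
```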


IV. Lightweight Interactivity / Progressive Enhancement

This category identifies "islands" architectures or progressive enhancement libraries. The server sends functional HTML, but JavaScript is required to make it interactive.

  • hasAlpineDirectives (+18): Looks for Alpine.js attributes like x-data, @click, or x-show.
  • hasWPInteractivityDirectives (+18): Looks for the new WordPress Interactivity API attributes like data-wp-interactive.
  • hasIsLandIslands (+20): Detects the <is-land> custom element from the Eleventy Islands library.

Crawler's Action: This is a gray area. If the goal is just to extract the visible text, a simple crawl might suffice. However, if the goal is to interact with the page (e.g., open dropdowns, click tabs to reveal content), a JS crawl is necessary. These signals warrant a JS crawl for any task beyond simple text extraction.
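One way to encode this gray area is to make the Tier IV weights depend on what the crawl is for. In this sketch the task labels "extract_text" and "interact" and the `tier_iv_score` helper are hypothetical; the weights are the ones listed above, and the Alpine pattern covers only the three attributes named in this section.

```python
import re

# Tier IV patterns and weights as listed above. The Alpine pattern only covers
# x-data, x-show, and @click (an assumption, not a full directive list).
ISLAND_SIGNALS = {
    "hasAlpineDirectives": (re.compile(r"\s(?:x-data|x-show|@click)\b"), 18),
    "hasWPInteractivityDirectives": (re.compile(r"\bdata-wp-interactive\b"), 18),
    "hasIsLandIslands": (re.compile(r"<is-land\b", re.I), 20),
}


def tier_iv_score(html: str, task: str) -> int:
    """Score Tier IV signals only when the crawl must interact with the page.

    `task` is a hypothetical label: "extract_text" skips these signals because
    the server-sent HTML already carries the visible text; any other value
    (for example "interact") counts them.
    """
    if task == "extract_text":
        return 0
    return sum(score for pattern, score in ISLAND_SIGNALS.values() if pattern.search(html))
```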


V. Supporting & Behavioral CSR Indicators

These are weaker signals that, on their own, are not definitive but add corroborating evidence to the overall score.

  • hasModernBuildToolArtifacts_ScriptJS (+10): Finds script tags with hashed filenames, like main.a4b8c1f9.js. This is a very common output of modern bundlers like Webpack and Vite, which are overwhelmingly used for SPAs.
  • hasLoadingSpinnerMarkup (+15): Detects class names or IDs like spinner, loader, or skeleton. This implies that the real content is not yet present and is being fetched asynchronously by JavaScript.
  • hasHelmetMarkers (+10): Finds data-rh="true". This attribute comes from React Helmet, a library used to manage the document <head> (e.g., the page title) from within JS components. It suggests the page's metadata is dynamic.
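The three supporting signals above can be checked with plain regular expressions. In this sketch the result labels are hypothetical, and the hashed-filename pattern requires six or more alphanumerics containing at least one digit, an assumption meant to catch names like main.a4b8c1f9.js while skipping plain words such as bootstrap.bundle.js.

```python
import re
from typing import List

# Hashed bundle names: a dot, 6+ alphanumerics with at least one digit
# (assumption that filters out words like "bundle"), then ".js".
HASHED_SCRIPT = re.compile(
    r"<script[^>]*\bsrc=[\"'][^\"']+\.(?=[a-z0-9]*\d)[a-z0-9]{6,}\.js[\"']", re.I
)
# Loading placeholders named in a class or id attribute.
SPINNER_MARKUP = re.compile(
    r"\b(?:class|id)=[\"'][^\"']*\b(?:spinner|loader|skeleton)\b", re.I
)
# The Helmet marker attribute described above.
HELMET_MARKERS = re.compile(r"\bdata-rh=[\"']true[\"']", re.I)


def tier_v_hits(html: str) -> List[str]:
    """Return hypothetical labels for the supporting signals found in the HTML."""
    checks = [
        ("hashed_script", HASHED_SCRIPT),
        ("spinner_markup", SPINNER_MARKUP),
        ("helmet_markers", HELMET_MARKERS),
    ]
    return [label for label, pattern in checks if pattern.search(html)]
```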

VI. Low-Confidence & General Web Patterns (The Long Tail)

This final, massive category includes the dozens of rules from the original comprehensive rule list. For completeness, they are included in the engine, and most of them are assigned very low scores (1-5 points). Why? Because they suffer from one of two problems:

  1. They are nearly universal: Patterns like async/defer scripts (hasAsyncDeferScripts), lazy-loading images (hasLazyLoadingAttributes), or using jQuery (includesjQuery) are common best practices across static sites, blogs, and complex SPAs. They provide almost no differentiating signal.
  2. They look for JS syntax in HTML: Rules like hasAjaxFetchCalls or usesReactHooksUsage are fundamentally flawed for static analysis because this code usually lives in external .js files, not the initial HTML document. They are kept as legacy rules, but their practical value is near zero.

Crawler's Action: These rules should be treated as tie-breakers at best. A high score built entirely from these weak signals is unreliable. The final decision should always be dominated by the high-confidence rules in Tiers I-III.
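One way to enforce "tie-breakers at best" is to cap how much the long tail can contribute. The cutoff of 15 and the cap of 10 below are hypothetical values, not the engine's; the point is that with the cap below the threshold of 30, weak rules can tip a borderline score but can never trigger a render on their own.

```python
from typing import Iterable

SCORE_THRESHOLD = 30  # example threshold used in this guide
WEAK_CUTOFF = 15      # hypothetical: rules with |score| below this count as long tail
WEAK_CAP = 10         # hypothetical: the long tail can add at most this many points


def capped_total(matched_scores: Iterable[int]) -> int:
    """Count strong rules in full but cap the combined contribution of weak ones."""
    scores = list(matched_scores)
    strong = sum(s for s in scores if abs(s) >= WEAK_CUTOFF)
    weak = sum(s for s in scores if abs(s) < WEAK_CUTOFF)
    return strong + min(weak, WEAK_CAP)


def needs_js_render(matched_scores: Iterable[int]) -> bool:
    return capped_total(matched_scores) > SCORE_THRESHOLD
```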


Beyond Regex: Computed Heuristics

The most advanced part of the engine goes beyond simple patterns and computes signals based on the overall structure of the document.

  • Low HTML Length / Low Text Content: An HTML file under 2KB, or one that contains fewer than 100 characters of actual text after stripping all tags, is highly likely to be an empty application shell.
  • Parsing JSON Blobs: The engine specifically looks for __NEXT_DATA__ (Next.js) and __NUXT__ (Nuxt.js) script tags, parses their JSON content, and checks for properties like gssp: true or serverRendered: true. This is a far more reliable way to confirm SSR than just finding the script tag itself.
  • Script Weight Ratio: The engine calculates the ratio of bytes within <script> tags to the total HTML size. A high ratio (>40%) indicates the page is primarily JavaScript code rather than content, strongly suggesting a CSR application.
  • Module Preload/Prefetch Count: Counting <link rel="modulepreload"> tags reveals the complexity of the dependency graph. More than 5 module preloads typically indicates a sophisticated build system and complex CSR application.
  • Large <noscript> Fallback: If a developer has provided thousands of bytes of content (>8KB) inside <noscript> tags, it's actually a negative signal. This suggests they've created a high-quality static fallback, meaning a simple crawl might capture the essential content without needing JavaScript execution.
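The computed heuristics need only the standard library. The sketch below uses regular expressions rather than a real HTML parser (a simplification that can miscount on malformed markup), interprets 2KB and 8KB as 2,048 and 8,192 bytes, and checks only the Next.js gssp flag; the Nuxt payload check is omitted to keep it short. All function and signal names are hypothetical.

```python
import json
import re
from typing import Dict

SCRIPT_RE = re.compile(r"<script\b[^>]*>(.*?)</script>", re.I | re.S)
STYLE_RE = re.compile(r"<style\b[^>]*>.*?</style>", re.I | re.S)
TAG_RE = re.compile(r"<[^>]+>")
NOSCRIPT_RE = re.compile(r"<noscript\b[^>]*>(.*?)</noscript>", re.I | re.S)
MODULEPRELOAD_RE = re.compile(r"<link\b[^>]*\brel=[\"']?modulepreload\b", re.I)
NEXT_DATA_RE = re.compile(
    r"<script[^>]*\bid=[\"']__NEXT_DATA__[\"'][^>]*>(.*?)</script>", re.I | re.S
)


def visible_text_length(html: str) -> int:
    """Characters of text left after dropping scripts, styles, and tags."""
    without_code = STYLE_RE.sub(" ", SCRIPT_RE.sub(" ", html))
    return len(" ".join(TAG_RE.sub(" ", without_code).split()))


def script_weight_ratio(html: str) -> float:
    """Fraction of the document's UTF-8 bytes that sit inside <script> bodies."""
    total = len(html.encode("utf-8"))
    if total == 0:
        return 0.0
    inside = sum(len(body.encode("utf-8")) for body in SCRIPT_RE.findall(html))
    return inside / total


def noscript_bytes(html: str) -> int:
    return sum(len(body.encode("utf-8")) for body in NOSCRIPT_RE.findall(html))


def next_data_is_ssr(html: str) -> bool:
    """True if __NEXT_DATA__ parses as JSON and reports getServerSideProps (gssp)."""
    match = NEXT_DATA_RE.search(html)
    if not match:
        return False
    try:
        data = json.loads(match.group(1))
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and data.get("gssp") is True


def computed_signals(html: str) -> Dict[str, bool]:
    # Cut-offs follow the text: 2KB, 100 chars, 40%, more than 5 preloads, 8KB.
    return {
        "lowHtmlLength": len(html.encode("utf-8")) < 2 * 1024,
        "lowTextContent": visible_text_length(html) < 100,
        "highScriptRatio": script_weight_ratio(html) > 0.40,
        "manyModulePreloads": len(MODULEPRELOAD_RE.findall(html)) > 5,
        "largeNoscriptFallback": noscript_bytes(html) > 8 * 1024,
        "nextDataSSR": next_data_is_ssr(html),
    }
```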

Greatest Hits: Top 20 Detection Rules

These are among the most impactful rules in the engine, drawn from the high-confidence tiers and carrying some of the largest absolute scores, both positive (CSR indicators) and negative (SSR affirmations).

Rule Name Score Tier Pattern
hasBotChallengeText +55 Smoking Gun Detects "prove you're not a robot", "complete the captcha"
hasEmptyBodyCSR +50 Smoking Gun Empty <body> tag: <body>\s*</body>
hasMinimalAppDiv +50 Smoking Gun Empty mount point: <div id="app|root">\s*</div>
hasClientOnlyComponentTags +50 Smoking Gun <ClientOnly>, <nuxt-client-only>, Blazor InteractiveWebAssembly
hasBotGates +45 Strong CSR reCAPTCHA, Cloudflare, hCaptcha, PerimeterX challenges
hasUnprocessedTemplateLiterals +45 Smoking Gun Raw template placeholders: {{ variable }}
hasQwikAttributes +40 Strong CSR Qwik resumability: q:container, q:base, qrl=
hasAngularJSUnprocessedDirectives +40 Smoking Gun Raw AngularJS: ng-app, ng-if, ng-repeat
hasCRANoscript +40 Smoking Gun Create React App: "You need to enable JavaScript"
hasVueSSRRoot -35 SSR Confirmed Vue SSR: data-server-rendered="true"
hasClientSideTemplateScripts +35 Smoking Gun <script type="text/template">
hasClientSideRoutingOnClick +35 Strong CSR Navigation via <a onclick="...">
hasBlazorWebAssembly +35 Strong CSR Blazor WebAssembly runtime
hasAjaxCrawlableMeta +35 Smoking Gun Hash-bang routing: <meta name="fragment" content="!">
hasRemixManifest +35 Strong CSR window.__remixManifest
hasAstroIslands +35 Strong CSR <astro-island> components
hasFrameworkSpecificStructures +30 Strong CSR <router-outlet>, <Outlet>, <app-root>
hasNextFlightStream +30 Strong CSR Next.js streaming: self.__next_f.push(
hasStaticSiteGeneratorArtifacts -30 SSR Confirmed Generator meta: Jekyll, Hugo, Eleventy, Astro, Gatsby
hasFastBootShoebox -30 SSR Confirmed Ember FastBoot: <script type="fastboot/shoebox">

Complete Rule Reference Table

The exhaustive table of all detection rules, organized by confidence tier. This serves as the complete reference for understanding every signal the engine analyzes.

Summary Statistics

  • Total Rules: 80+ detection patterns
  • Negative Score Rules (SSR): 17 rules (-35 to -5 points)
  • High-Confidence CSR Rules: 14 rules (35-55 points)
  • Strong CSR Indicators: 25+ rules (20-35 points)
  • Supporting Indicators: 40+ rules (1-20 points)
Rule Name Category Pattern/Logic Score
I. SSR / SSG / Server-First Affirmations NEGATIVE SCORES Definitive server-side rendering signals -
hasVueSSRRoot SSR Confirmed /\bdata-server-rendered=["']?true["']?/i -35
hasStaticSiteGeneratorArtifacts SSR Confirmed /<meta\s+name=["']generator["']\s+content=["'](?:Jekyll|Hugo|Eleventy|Astro|Gatsby|Next\.js)\b[^"']*["']/i -30
hasFastBootShoebox SSR Confirmed /<script[^>]+type=["']fastboot\/shoebox["'][^>]*>/i -30
hasAngularServerContext SSR Confirmed /\bng-server-context\b/i -25
hasSvelteHydrationMarks SSR Confirmed /\bdata-svelte-h=["'][^"']+["']/i -25
hasMarkoSSRAttributes SSR Confirmed /\bdata-marko=/i -25
hasReactStreamPlaceholders SSR Confirmed /(<!--\$\??-->|<!--\/\$-->|<template\s+id=["'][A-Z]:\d+)/i -22
hasAngularTransferState SSR Confirmed /<script[^>]+id=["']ng-state["'][^>]*type=["']application\/json["'][^>]*>/i -20
hasDeclarativeShadowDOM SSR Confirmed /<template\s+shadowrootmode=["'](?:open|closed)["']/i -20
hasSolidHydrationMarks SSR Confirmed /\bdata-hk=["'][^"']+["']|\bdata-hydration\b/i -20
embedsHypernova SSR Confirmed /<script[^>]+data-hypernova-key=/i -20
hasVueMetaSSR SSR Confirmed /\bdata-vue-meta-server-rendered\b/i -15
hasReactSSRAttributes SSR Confirmed /\bdata-reactroot\b|\bdata-reactid\b|<!--\s*react-(?:text|empty):/i -15
hasVueSSRClientHydration SSR Confirmed /__VUE_SSR__\s*=\s*true/i -15
hasSvelteHydrationScript SSR Confirmed /svelte-hydrate/i -15
hasNextJSHydrationMarker SSR Confirmed /<div\s+id=["']__next["'][^>]*>\s*<(?!\/div>)/i -10
hasSSRFrameworkScript SSR Confirmed /<script[^>]*\bsrc=["'][^"']*\b(?:next|gatsby|nuxt|remix)\b[^"']+\.js["'][^>]*><\/script>/i -5
II. High-Confidence CSR Indicators ("Smoking Guns") Definitive client-side rendering signals High Positive
hasBotChallengeText Smoking Gun Bot challenge phrases, captcha requirements +55
hasEmptyBodyCSR Smoking Gun /<body>\s*<\/body>/i +50
hasMinimalAppDiv Smoking Gun /<(div|main)\s+id\s*=\s*["'](?:app|root|main)["']\s*>\s*<\/\1>/i +50
hasClientOnlyComponentTags Smoking Gun /(?:<nuxt-client-only>|<ClientOnly>|<ion-.*>|@rendermode="InteractiveWebAssembly")/ +50
hasUnprocessedTemplateLiterals Smoking Gun /\{\{[^}]+\}\}/ +45
hasAngularJSUnprocessedDirectives Smoking Gun /\bng-(?:app|if|repeat|bind|controller)\b/i +40
hasCRANoscript Smoking Gun /<noscript>[^<]*You need to enable JavaScript to run this app/i +40
hasJSXSyntax Smoking Gun /<script[^>]*>[^<]*\bReact\.createElement\b[^<]*<\/script>/i +40
hasVueRouterScript Smoking Gun /<script[^>]*\bsrc=["'][^"']*vue-router[^"']*\.js["'][^>]*><\/script>/i +40
hasAngularNgModule Smoking Gun /@NgModule\(\s*\{[^}]*\}\s*\)/i +40
hasFrontendFrameworks Smoking Gun /<script[^>]*\bsrc=["'][^"']*\b(?:react|vue|angular|svelte)\b[^"']*\.js["'][^>]*><\/script>/i +40
hasClientSideTemplateScripts Smoking Gun /<script[^>]+type=["'](?:text\/(?:x-)?template|text\/x-handlebars-template|text\/ng-template)["']/i +35
hasAjaxCrawlableMeta Smoking Gun /<meta[^>]+name=["']fragment["'][^>]+content=["']!["']/i +35
hasDynamicComponentLoading Smoking Gun /React\.lazy\(|Vue\.component\(/i +35
III. Strong CSR & Heavy Hydration Indicators Strong client-side rendering signals High Positive
hasBotGates Strong CSR Bot detection services (reCAPTCHA, Cloudflare, etc.) +45
hasClientSideRoutingOnClick Strong CSR /<a\b[^>]*\bonclick\s*=["'][^"']+["'][^>]*>/i +35
hasBlazorWebAssembly Strong CSR /\/_framework\/blazor\.(?:server|webassembly)\.js\b/i +35
hasRemixManifest Strong CSR /window\.__remixManifest/i +35
hasNextFlightStream Strong CSR /self\.__next_f\.push\(/i +30
hasFrameworkSpecificStructures Strong CSR /<(router-outlet|Outlet|app-root|vue-app|custom-element)\b[^>]*>/i +30
hasBundledScript 2. Scripts and Modules /<script[^>]*\bsrc\s*=\s*["'][^"']+\.(?:bundle|chunk|main|app|min|vendor)\.[a-z0-9]{0,}\.js["'][^>]*><\/script>/i +20
hasES6Modules 2. Scripts and Modules /<script\s+type=["']module["'][^>]*>[\s\S]*<\/script>/i +3
hasInlineEventHandlers 2. Scripts and Modules /<(?:a|button|div|span)\b[^>]*\s+onclick\s*=["'][^"']+["'][^>]*>/i +10
hasAsyncDeferScripts 2. Scripts and Modules /<script[^>]*\b(?:async|defer)\b[^>]*><\/script>/i +2
hasInlineJSModules 2. Scripts and Modules /<script\s+type=["']module["'][^>]*>[\s\S]*<\/script>/i +2
hasClientSideRouting 3. Routing /<(router-outlet|Outlet|router-link)\b[^>]*>/i +45
hasClientSideRouting_Hash 3. Routing /<a\b[^>]*\bhref\s*=\s*["'][^"']*#\/[^"']*["'][^>]*>/i +45
hasClientSideRouting_Onclick 3. Routing /<a\b[^>]*\bonclick\s*=["'][^"']+["'][^>]*>/i +45
hasDynamicRouteParameters 3. Routing /\/:\w+/i +40
hasAjaxFetchCalls 4. AJAX and Fetch /\b(fetch|XMLHttpRequest|\.ajax)\s*\(/i +5
hasExternalApiWebSockets 4. AJAX and Fetch /new\s+WebSocket\s*\(/i +20
hasLoadingSpinner 5. Loading Indicators /\bloading-spinner\b/i +10
hasLoadingSpinner_Skeleton 5. Loading Indicators /\bskeleton\b/i +10
hasLazyLoadingAttributes 5. Loading Indicators /\bloading=["']lazy["']/i +2
hasLazyLoadedImages 5. Loading Indicators /\blazyload\b/i +5
hasMetaJsDependencies 6. Meta and SEO /<meta\s+name=["']fragment["']\s+content=["']!["'][^>]*>/i +5
hasMetaJsDependencies_BuiltWith 6. Meta and SEO /<!--\s*Built with\s+(?:Create React App|Vite|Vue\.js)\s*-->/i +2
hasDynamicMetaOrTitle 6. Meta and SEO /document\.title\s*=\s*["'][^"']+["']/i +15
hasDynamicMetaOrTitle_QuerySelector 6. Meta and SEO /document\.querySelector\(['"]meta\[name=["'][^"']+["']\]['"]\)\.setAttribute\(['"]content['"],\s*["'][^"']+["']\)/i +15
hasDynamicStyleLoading 7. Event Handlers and Listeners /document\.createElement\(['"]style['"]\)\.innerHTML\s*=/i +5
hasIntersectionObserversOrScroll_IntersectionObserver 7. Event Handlers and Listeners /IntersectionObserver\s*\(/i +5
hasIntersectionObserversOrScroll_ScrollListener 7. Event Handlers and Listeners /addEventListener\s*\(\s*['"]scroll['"]\s*,/i +5
usesReactHooksUsage 8. State Management and Hooks /\buseState\b|\buseEffect\b|\buseContext\b/i +40
usesCustomHooks 8. State Management and Hooks /\buse[A-Z][a-zA-Z0-9]*\b/ +5
hasStateManagementLibraries 8. State Management and Hooks /<script[^>]*\bsrc=["'][^"']*\b(?:redux|mobx|vuex|ngrx)\b[^"']+\.js["'][^>]*><\/script>/i +35
hasTemplatingLibraries 9. Components and Libraries /<script\s+type=["'](?:text\/template|text\/x-handlebars-template)["'][^>]*>[\s\S]*<\/script>/i +10
hasTemplatingLibraries_Placeholders 9. Components and Libraries /\{\{\s*[^}]+\s*\}\}/ +10
hasTemplatingLibraries_Comments 9. Components and Libraries /\{\/\*\s*[^}]+\s*\*\/\}/ +5
hasTemplatingLibraries_HYDRATE 9. Components and Libraries /<!--\s*\[(HYDRATE|hydration)\]\s*-->/i +25
hasWebComponents_Tag 10. Web Components and Shadow DOM /<[a-z]+-[a-z]+[^>]*><\/[a-z]+-[a-z]+>/i +20
hasWebComponents_Script 10. Web Components and Shadow DOM /<script[^>]*\bsrc=["'][^"']+custom-element\.js["'][^>]*><\/script>/i +20
hasShadowDOM 10. Web Components and Shadow DOM /attachShadow\s*\(\s*({[^}]*}|null)\s*\)/i
hasSSRFrameworkScripts 11. SSR Indicators /<script[^>]*\bsrc=["'][^"']*\b(?:next|gatsby|nuxt|remix)\b[^"']+\.js["'][^>]*><\/script>/i +2
hasSSRHydration_DataReactroot 11. SSR Indicators /data-reactroot/i +2
hasSSRHydration_ReactHydrate 11. SSR Indicators /ReactDOM\.hydrate\s*\(/i +2
hasHydrationMarkers 11. SSR Indicators /<div\s+id=["']__next["']><\/div>/i +2
hasHydrationScripts 11. SSR Indicators /hydrate\s*\(/i +2
hasProgressiveHydration 11. SSR Indicators /<script[^>]*\btype=["']application\/javascript["'][^>]*>[\s\S]*\bhydrate\b[\s\S]*<\/script>/i +2
hasDynamicImportExpressions 12. Modern Build Tools /\bimport\(\s*['"][^'"]+['"]\s*\)/i +30
hasDynamicImports 12. Modern Build Tools /import\(['"][^'"]+['"]\)/i +30
hasDynamicImportMeta 12. Modern Build Tools /import\.meta\./i +5
usesGraphQLMutations 12. Modern Build Tools /mutation\s*{[^}]+}/i +10
usesGraphQLQueries 12. Modern Build Tools /graphql\s*\(/i +15
hasGraphQLSubscriptions 12. Modern Build Tools /subscription\s*{[^}]+}/i +5
hasWebAssemblyModules 12. Modern Build Tools /<script[^>]*\bsrc=["'][^"']+\.wasm["'][^>]*><\/script>/i +5
hasModernBuildToolArtifacts_ScriptBundle 12. Modern Build Tools /<script[^>]*\bsrc=["'][^"']+\.[a-z0-9]{6}\.bundle\.js["'][^>]*><\/script>/i +10
hasModernBuildToolArtifacts_LinkChunkCSS 12. Modern Build Tools /<link[^>]*\bhref=["'][^"']+\.chunk\.[a-z0-9]{6}\.css["'][^>]*>/i +10
hasModernBuildToolArtifacts_ScriptJS 12. Modern Build Tools /<script[^>]*\bsrc=["'][^"']+\.[a-z0-9]{6,}\.js["'][^>]*><\/script>/i +10
hasModernJSFeatures 12. Modern Build Tools /(\basync\s+function\b|\(\s*\)\s*=>|\bPromise\.all\b)/i +2
hasWebpackDevServer 12. Modern Build Tools /webpack-dev-server\/bundle\.js/i +2
usesHotModuleReplacement 12. Modern Build Tools /module\.hot/i +2
hasRuntimeBundlers 12. Modern Build Tools /<script[^>]*\bsrc=["'][^"']+\b(?:system|require)\.js["'][^>]*><\/script>/i +10
hasDynamicCSSLoading 13. Styling /document\.createElement\(['"]link['"]\)\.rel\s*=\s*['"]stylesheet['"]/i +5
usesDynamicCSSImports 13. Styling /import\s+['"]\.\/[^'"]+\.css['"]/i +5
hasCSSinJS 13. Styling /\bstyled\.|css\.inject\b/i +5
hasTailwindClasses 13. Styling /\b(bg|text|p|m|flex|grid)-[a-z]+\b/i +1
usesReactFragments 14. Components and Libraries /<React\.Fragment>|<>\s*<\/>/i +15
hasNextImageComponents 14. Components and Libraries /<Image\s+[^>]*\/>/i +2
hasVueRouterScript 14. Components and Libraries /<script[^>]*\bsrc=["'][^"']*vue-router[^"']*\.js["'][^>]*><\/script>/i +40
hasAngularNgModule 14. Components and Libraries /@NgModule\(\s*{[^}]*}\s*\)/i +40
hasSvelteHydration 14. Components and Libraries /svelte-hydrate/i +2
hasVirtualDOMReferences 14. Components and Libraries /React\.createElement|Vue\.createElement|h\(/i +2
hasServiceWorkersOrPWA_Manifest 14. Components and Libraries /<link\s+rel=["']manifest["']\s+href=["'][^"']+["']>/i +2
hasServiceWorkersOrPWA_Register 14. Components and Libraries /navigator\.serviceWorker\.register\(["'][^"']+["']\)/i +2
usesServiceWorkers 14. Components and Libraries /navigator\.serviceWorker\.register\(/i +2
usesGraphQLClient 14. Components and Libraries /<script[^>]*\bsrc=["'][^"']+\b(?:apollo|relay)\.js["'][^>]*><\/script>/i +15
usesCodeSplitting 14. Components and Libraries /import\(\s*['"][^'"]+['"]\s*\)/i +25
hasNoscriptContent 15. Additional Rules /<noscript>([\s\S]*?)<\/noscript>/i +1
hasObfuscatedMinifiedJS 15. Additional Rules /<script[^>]*\bsrc=["'][^"']+\.(?:min|bundle|vendor)\.[a-z0-9]*\.js["'][^>]*><\/script>/i +5
hasDynamicComponentLoading 15. Additional Rules /React\.lazy\(|Vue\.component\(/i +35
hasDynamicThemeLoading 15. Additional Rules /loadTheme\(\s*['"][^"']+['"]\s*\)/i +2
hasReactStrictMode 15. Additional Rules /<React.StrictMode>/i +5
hasDynamicImportPolyfill 15. Additional Rules /import\.polyfill\(\)/i +10
hasJSFrameworkGlobalVariables 15. Additional Rules /\b(window\.(React|Vue|Angular))\b/i +25
hasJSXSyntax 15. Additional Rules /<script[^>]*>[^<]*\bReact\.createElement\b[^<]*<\/script>/i +40
hasModernLibsCDN 15. Additional Rules /<script[^>]*\bsrc=["']https?:\/\/cdn\.jsdelivr\.net\/npm\/(?:react|vue|angular|svelte)[^"']+\.js["'][^>]*><\/script>/i +2
usesEmberDataModels 16. Additional JS Frameworks /\bEmber\.Model\b/i +30
createsEmberApplication 16. Additional JS Frameworks /Ember\.Application\.create\s*\(/i +30
usesBackboneModelsCollections 16. Additional JS Frameworks /\bBackbone\.(?:Model|Collection)\b/i +25
usesBackboneRouter 16. Additional JS Frameworks /\bBackbone\.Router\.extend\s*\(/i +30
usesAlpineJSDirectives 16. Additional JS Frameworks /\bx-(?:data|bind|on|show|if|for)\b/i +15
initializesAlpineJS 16. Additional JS Frameworks /Alpine\.start\s*\(\s*\)/i +15
usesDocumentWrite 17. Server-Driven JS Injection /document\.write\s*\(\s*["']<script[^>]+><\/script>["']\s*\)/i +20
injectsScriptsWithInnerHTML 17. Server-Driven JS Injection /innerHTML\s*=\s*["']<script[^>]+><\/script>["']/i +25
usesServerSideTemplateJS 17. Server-Driven JS Injection /<%[^%]*%>/i +1
initializesWebWorkers 18. Web Workers & Service Workers /new\s+Worker\s*\(\s*["'][^"']+["']\s*\)/i +2
usesImportScriptsInWorkers 18. Web Workers & Service Workers /importScripts\s*\(\s*["'][^"']+["']\s*\)/i +2
serviceWorkerHandlesFetch 18. Web Workers & Service Workers /self\.addEventListener\s*\(\s*['"]fetch['"]\s*,/i +2
usesVirtualDomLibrary 19. Virtual DOM Manipulation /\bvirtual-dom\b/i +25
performsVirtualDomOperations 19. Virtual DOM Manipulation /\b(?:createElement|diff|patch)\s*\(/i +5
usesCustomFrameworkGlobals 20. Custom JS Frameworks /\bMyAppInit\b|\bCustomFramework\b/i +30
callsCustomInitFunctions 20. Custom JS Frameworks /\binitializeApp\s*\(/i +25
usesFeatureDetectionLibraries 21. Progressive Enhancement /\bModernizr\b/i +2
conditionallyLoadsScripts 21. Progressive Enhancement /if\s*\(\s*['"]feature['"]\s*\)\s*{\s*loadScript\s*\(/i +10
usesI18next 22. Client-Side Localization /\bi18next\b/i +10
loadsLanguageFiles 22. Client-Side Localization /loadLanguage\s*\(\s*["'][a-z]{2}["']\s*\)/i +5
usesSignalR 23. Real-Time Data /\bSignalR\b/i +5
usesFirebaseRealTimeDB 23. Real-Time Data /\bfirebase\.database\b/i +5
usesES6ModuleImports 24. Modular JS Patterns /\bimport\s+[^;]+from\s+['"][^'"]+['"]/i +5
usesES6ModuleExports 24. Modular JS Patterns /\bexport\s+(?:default\s+)?(?:class|function|const|let|var)\b/i +5
usesCommonJSRequire 24. Modular JS Patterns /\brequire\s*\(\s*['"][^'"]+['"]\s*\)/i +5
usesAMDDefineRequire 24. Modular JS Patterns /\bdefine\s*\(|\brequire\s*\(/i
includesGoogleAnalytics 25. Analytics & Tracking /https?:\/\/www\.google-analytics\.com\/analytics\.js/i +1
includesjQuery 25. Analytics & Tracking /<script[^>]*\bsrc=["']https?:\/\/code\.jquery\.com\/jquery[^"']+\.js["'][^>]*><\/script>/i +2
loadsGoogleAnalyticsAsync 25. Analytics & Tracking /<script[^>]*\basync\b[^>]*\bsrc=["']https:\/\/www\.google-analytics\.com\/analytics\.js["'][^>]*><\/script>/i +1
injectsTrackingPixels 25. Analytics & Tracking /createElement\s*\(\s*['"]img['"]\s*\)\.src\s*=/i +1
usesMJML 26. Email Template Rendering /\bmjml\b/i +1
usesHandlebarsEmailTemplates 26. Email Template Rendering /\bHandlebars\.compile\s*\(/i +1
generatesEmailContentDynamically 26. Email Template Rendering /generateEmail\s*\(\s*\)/i +1
usesAuth0 27. User Authentication /\bAuth0\.WebAuth\b/i +5
usesFirebaseAuth 27. User Authentication /\bfirebase\.auth\b/i +5
usesPassportJS 27. User Authentication /\bPassport\b/i +1
usesDragula 28. Drag and Drop /\bdragula\s*\(/i +2
usesSortableJS 28. Drag and Drop /\bSortable\.create\s*\(/i +2
usesInteractJS 28. Drag and Drop /\binteract\.\w+\s*\(/i +2
usesDropzoneJS 29. File Upload /\bDropzone\.autoDiscover\s*=/i +2
usesFineUploader 29. File Upload /\bFineUploader\s*\(/i +2
usesUppy 29. File Upload /\bUppy\.use\s*\(/i +2
serializesStateWithJSON 30. State Serialization /\bJSON\.stringify\s*\(/i +2
deserializesStateWithJSON 30. State Serialization /\bJSON\.parse\s*\(/i +2
usesStateSerializationLibraries 30. State Serialization /\bserializeState\b|\bdeserializeState\b/i +20
usesLodash 31. Data Filtering/Sorting /\b_\.\w+\s*\(/i +2
usesUnderscoreJS 31. Data Filtering/Sorting /\b_\.\w+\s*\(/i +2
usesRamda 31. Data Filtering/Sorting /\bR\.\w+\s*\(/i +2
usesToastr 32. Notification Libraries /\btoastr\.\w+\s*\(/i +1
usesNoty 32. Notification Libraries /new\s+Noty\s*\(/i +1
usesSweetAlert 32. Notification Libraries /\bswal\s*\(/i +1
usesVideoJS 34. Video & Media /\bvideo\.js\b/i +2
usesPlyr 34. Video & Media /\bPlyr\.setup\b/i +2
usesHowlerJS 34. Video & Media /\bHowl\s*=\s*new\s+Howler\.Howl\b/i +2
usesLazyLoadLibrary 37. Performance Optimization /\bLazyLoad\s*\(/i +2
usesIntersectionObserverAPI 37. Performance Optimization /\bIntersectionObserver\b/i +2
usesPerformanceAPI 37. Performance Optimization /\bperformance\.\w+\s*\(/i +1
enforcesCSPViaJS 38. Security Features /setAttribute\s*\(\s*['"]content-security-policy['"]\s*,/i +1
usesCryptoJS 38. Security Features /\bCryptoJS\.\w+\s*\(/i +1
usesReduxPersist 39. Data Persistence /\bpersistReducer\b/i +25
usesMobXPersist 39. Data Persistence /\bmobxPersist\b/i +25
usesLocalForage 39. Data Persistence /\blocalforage\b/i +15
usesApolloClient 40. GraphQL /\bApolloClient\b/i +15
usesRelayModern 40. GraphQL /\bRelay\.Environment\b/i +15
usesGraphQLQueriesInJS 40. GraphQL /graphql\s*[\s\S]+?/i +15
usesIonicFramework 43. Mobile Frameworks /\bIonic\b/i +10
usesFramework7 43. Mobile Frameworks /\bFramework7\b/i +10
usesOnsenUI 43. Mobile Frameworks /\bOnsenUI\b/i +10
usesIonicComponents 43. Mobile Frameworks /<ion-[a-z]+>/i +5
usesFabricJS 44. Image Manipulation /\bFabric\b/i +1
usesPixiJS 44. Image Manipulation /\bPIXI\.Application\b/i +1
usesP5JS 44. Image Manipulation /\bp5\.createCanvas\b/i +1
usesCryptoJSEncryption 45. Data Encryption /\bCryptoJS\.\w+\s*\(/i +1
usesSJCL 45. Data Encryption /\bsjcl\.encrypt\b/i +1
usesBcryptJS 45. Data Encryption /\bbcrypt\.hash\b/i +1
usesLaunchDarkly 46. Feature Toggles /\bLDClient\.init\b/i +2
usesUnleash 46. Feature Toggles /\bUnleash\.initialize\b/i +2
usesFeatureToggleJS 46. Feature Toggles /\bFeatureToggle\.isEnabled\b/i +2
usesjQuerySteps 49. Multi-Step Forms /\bsteps\s*\(/i +5
usesFormikStepper 49. Multi-Step Forms /\buseFormikStepper\b/i +10
handlesDynamicFormSteps 49. Multi-Step Forms /goToStep\s*\(\s*\d+\s*\)/i +10
usesStripeJS 50. Payment Processing /\bStripe\.\w+\s*\(/i +10
usesPayPalSDK 50. Payment Processing /https:\/\/www\.paypal\.com\/sdk\/js/i +10
usesBraintreeSDK 50. Payment Processing /\bBraintree\.\w+\s*\(/i +10
usesReactHydrateRoot 51. State Hydration /\bReactDOM\.hydrateRoot\b/i +3
usesVueHydration 51. State Hydration /\bVue\.hydrate\b/i +3
embedsInitialStateForHydration 51. State Hydration /window\.__INITIAL_STATE__\s*=\s*{/i +3
usesReactErrorBoundaries 52. Error Boundaries /class\s+\w+\s+extends\s+React\.Component\s*{\s*componentDidCatch/i +2
usesFunctionalErrorBoundaries 52. Error Boundaries /\buseErrorHandler\b/i +2
wrapsWithErrorBoundary 52. Error Boundaries /<ErrorBoundary>\s*<\/ErrorBoundary>/i +2
usesD3JS 56. Data Visualization /\bD3\.v\d+\.\d+\.\d+\b/i +2
usesChartJS 56. Data Visualization /\bChart\.new\s*\(/i +2
usesHighcharts 56. Data Visualization /\bHighcharts\.chart\s*\(/i +2
usesKnockoutJS 57. Data Binding /\bko\.applyBindings\b/i +10
usesRivetsJS 57. Data Binding /\bRivets\.bind\b/i +10
usesDataBindAttributes 57. Data Binding /\bdata-bind\s*=\s*["'][^"']+["']/i +5
usesJoiValidation 59. Form Validation /\bJoi\.validate\b/i +2
usesYupValidation 59. Form Validation /\bYup\.object\b/i +5
submitsFormsWithFetch 59. Form Validation /form\.addEventListener\s*\(\s*['"]submit['"]\s*,\s*function\s*\([^)]*\)\s*{\s*fetch\s*\(/i +15
submitsFormsWithXHR 59. Form Validation /form\.addEventListener\s*\(\s*['"]submit['"]\s*,\s*function\s*\([^)]*\)\s*{\s*new\s+XMLHttpRequest\s*\(/i +10
usesMomentJS 60. Date & Time Libraries /\bMoment\.js\b/i +1
usesDayJS 60. Date & Time Libraries /\bdayjs\s*\(/i +1
usesLuxon 60. Date & Time Libraries /\bLuxon\.DateTime\b/i +1
hasReCAPTCHA Bot Gates / Challenges /g-recaptcha|www\.google\.com\/recaptcha|grecaptcha\.execute/i +35
hasHCaptcha Bot Gates / Challenges /hcaptcha\.com|class=["']h-captcha["']|data-sitekey=/i +35
hasCFTurnstile Bot Gates / Challenges /challenges\.cloudflare\.com|cf-turnstile/i +40
hasPerimeterX Bot Gates / Challenges `/(?:^ /)_?px/
hasDataDome Bot Gates / Challenges /datadome/i +35
hasAkamaiBM Bot Gates / Challenges /bm-verify|akamai/i +30
hasArkoseLabs Bot Gates / Challenges /funcaptcha\.com|arkoselabs/i +35
hasWebpackRuntime Bundlers & Runtimes /(window|self)\["webpack(?:Jsonp|Chunk[^"]*)"\]\s*=/i +30
hasParcelRuntime Bundlers & Runtimes /\bparcelRequire\s*=\s*function/i +25
hasViteDevClient Bundlers & Runtimes /<script[^>]+type=["']module["'][^>]+src=["'][^"']*\/@vite\/client["']/i +30
hasModulePreloadLinks Bundlers & Runtimes /<link[^>]+rel=["']modulepreload["']/i +15
hasEsbuildFootprint Bundlers & Runtimes /\/esbuild\.[^"']+\.js\b|\besbuild:\b/i +15
hasRollupSystemRegister Bundlers & Runtimes /\bSystem\.register\s*\(/i +20
hasSnowpackFootprint Bundlers & Runtimes /\/_snowpack\//i +15
scriptBytesDominant Computed Signals Calculated ratio of script tag bytes to total HTML bytes is > 40%. +12
hugeInlineJS Computed Signals Total bytes of inline (non-src) script tags is > 60KB. +10
largeJsonState Computed Signals Total bytes within <script type="application/json"> tags is > 10KB. +14
lowVisibleText Computed Signals Character count of visible text (after stripping scripts, styles, tags) is < 500. +10
manyModulePreloads Computed Signals Count of <link rel="modulepreload"> tags is >= 3. +8
manyPrefetchedScripts Computed Signals Count of <link rel="prefetch" as="script"> tags is >= 3. +6
earlyJSRedirect Computed Signals HTML contains window.location.replace() or window.location.assign(). +8
manyThirdPartyPreconnects Computed Signals Count of <link rel="preconnect"> and <link rel="dns-prefetch"> tags (combined) is >= 4.
headHeavyScripting Computed Signals The <head> contains >= 3 script tags and has more scripts than the <body>. +5
computed:nextGSSP Computed Signals (High Value) The JSON inside __NEXT_DATA__ contains "gssp": true. -35
computed:nuxtServerRendered Computed Signals (High Value) The JSON inside window.__NUXT__ contains "serverRendered": true. -25
computed:reactHydrateCall Computed Signals (High Value) The HTML contains hydrateRoot( or ReactDOM.hydrate(. +15
computed:skeletonUI Computed Signals (High Value) The HTML contains >= 8 instances of class names with skeleton, shimmer, placeholder, etc. +10
computed:largeNoscriptFallback Computed Signals (High Value) The total byte size of content within <noscript> tags is > 8KB. -10
hasCRANoscript CSR Required / Heavy Hydration /<noscript>[^<]*You need to enable JavaScript to run this app/i +28
hasGatsbyAnnouncer CSR Required / Heavy Hydration /id=["']gatsby-announcer["']/i +12
hasHelmetMarkers CSR Required / Heavy Hydration /\bdata-rh=["']true["']|\bdata-react-helmet=["']true["']/i
hasIsLandIslands CSR Required / Heavy Hydration /<is-land\b/i +20
hasKnockoutBindings CSR Required / Heavy Hydration /\bdata-bind\s*=|ko\.applyBindings\s*\(/i
hasMithrilRouter CSR Required / Heavy Hydration /\bm\.route\s*\(/i +18
hasNextFlightStream CSR Required / Heavy Hydration /self\.__next_f\.push\(/i +30
hasUnpolyDirectives CSR Required / Heavy Hydration /\bup-(?:target|follow|layer|data|main|on|modal|autosubmit)\b/i +20
hasWPInteractivityDirectives CSR Required / Heavy Hydration /\bdata-wp-(?:interactive|bind|on|class|text|style|watch)\b/i +18
hasWebflowRuntime CSR Required / Heavy Hydration /assets\.website-files\.com\/[^"']+\/webflow(?:\.min)?\.js|\bdata-wf-(?:site|status|page)\b/i
hasEmptyBodyCSR Empty or Minimal Body /<body>\s*<\/body>/i +50
hasNextDataJson Frameworks & Platforms /<script[^>]+id=["']__NEXT_DATA__["'][^>]*type=["']application\/json["'][^>]*>/i +45
hasRemixManifest Frameworks & Platforms /window\.__remixManifest/i +35
hasAstroIslands Frameworks & Platforms /<astro-island\b/i +35
hasQwikAttributes Frameworks & Platforms /\bq:(?:container|base|slot)\b|qrl=/i +40
hasSolidHydrationMarks SSR Confirmed /\bdata-hk=["'][^"']+["']|\bdata-hydration\b/i
hasPreactRuntime Frameworks & Platforms /\bpreact\.render\s*\(|\/preact(?:\.min)?\.js\b/i
hasLitOrPolymer Frameworks & Platforms /(lit-(?:html|element)|webcomponents-loader)\.js\b|\bShadyDOM\b/i
hasStencilRuntime Frameworks & Platforms /@stencil|data-stencil-build|\/stencil(?:\.esm)?\.js\b/i
hasHTMX Frameworks & Platforms /\bhx-(?:get|post|boost|target|swap)\b/i +20
hasTurboOrHotwire Frameworks & Platforms /<turbo-(?:frame|stream)\b|\bdata-turbo(?:-permanent)?\b/i
hasStimulus Frameworks & Platforms /\bdata-controller=["'][^"']+["'][^>]*\bdata-action=["'][^"']+->/i +15
hasPhoenixLiveView Frameworks & Platforms /\bphx-(?:click|change|submit|update)\b|\/phoenix\/live_reload\.js\b/i
hasLaravelLivewire Frameworks & Platforms /\bwire:(?:click|model|submit|change)\b/i +25
hasInertia Frameworks & Platforms /\bdata-page=["'][\s\S]*?["']\b|window\.__INERTIA__/i +35
hasBlazor Frameworks & Platforms /\/_framework\/blazor\.(?:server|webassembly)\.js\b/i +35
hasElmInit Frameworks & Platforms /\bElm\.[A-Za-z0-9_]+\s*\.init\s*\(/i +35
hasIslandMarkers Islands Architecture /data-island/i +30
prefetchManyScripts Preload/Prefetch Hints /<link[^>]+rel=["']prefetch["'][^>]+as=["']script["']/i +12
preconnectManyThirdParties Preload/Prefetch Hints /<link[^>]+rel=["'](?:preconnect|dns-prefetch)["'][^>]+href=/ig +8
hasQwikResumability Resumability Indicators /qwik-/i +40
hasQwikLoaderScript Resumability Indicators /qwikloader\.js/i +40
usesHistoryAPI Routing & Navigation /\bhistory\.(?:pushState|replaceState)\s*\(/i +25
hasBaseHrefRooted Routing & Navigation /<base\s+href=["'][^"']*["'][^>]*>/i +8
hasAngularServerContext SSR Confirmed /\bng-server-context\b/i -25
hasAngularTransferState SSR Confirmed /<script[^>]+id=["']ng-state["'][^>]*type=["']application\/json["'][^>]*>/i -20
hasDeclarativeShadowDOM SSR Confirmed /<template\s+shadowrootmode=["'](?:open|closed)["']/i -20
hasFastBootShoebox SSR Confirmed /<script[^>]+type=["']fastboot\/shoebox["'][^>]*>/i -30
hasReactSSRRootHeuristic SSR Confirmed /\bdata-reactroot\b/i -12
hasReactStreamPlaceholders SSR Confirmed /(<!--\$\??-->|<!--\/\$-->|<template\s+id=["'][A-Z]:\d+)/i -22
hasSvelteHydrationMarks SSR Confirmed /\bdata-svelte-h=["'][^"']+["']/i -25
hasVueMetaSSR SSR Confirmed /\bdata-vue-meta-server-rendered\b/i -15
hasVueSSRRoot SSR Confirmed /\bdata-server-rendered=["']?true["']?/i -35
hasReactSSRAttributes SSR Confirmed /\bdata-reactroot\b|\bdata-reactid\b|<!--\s*react-(?:text|empty):/i -15
hasMarkoSSRAttributes SSR Confirmed /\bdata-marko=/i -25
hasNextJSHydrationMarker SSR Confirmed /<div\s+id=["']__next["']><\/div>/i -10
hasSvelteHydrationScript SSR Confirmed /svelte-hydrate/i -15
hasSSRFrameworkScript SSR Confirmed /<script[^>]*\bsrc=["'][^"']*\b(?:next|gatsby|nuxt|remix)\b[^"']+\.js["'][^>]*><\/script>/i -5
hasPrerenderAttribute SSR Data Attributes /prerender/i +2
hasReactHelmetSSR SSR Data Attributes /data-rh/i +2
hasAngularUniversalSSR SSR Markers (Framework-Specific) /ng-server-context/i +2
hasSvelteKitSSR SSR Markers (Framework-Specific) /sveltekit:embedded/i +2
hasRemixSSRManifest SSR Markers (Framework-Specific) /remix-manifest/i +2
hasGatsbySSRWrapper SSR Markers (Framework-Specific) /gatsby-focus-wrapper/i +2
hasNuxtSSRState SSR Markers (Framework-Specific) /__NUXT__/i +2
hasAstroSSRPrefix SSR Markers (Framework-Specific) /astro-/i +2
embedsReduxOrApolloBlobs State/JSON Blobs /window\.__(?:REDUX|APOLLO)_STATE__/i +25
embedsHypernova State/JSON Blobs /<script[^>]+data-hypernova-key=/i +25
hasAlpineDirectives Vue/Alpine/Interactivity /\bx-data\b|x-bind:|x-on:|@click=|x-model\b/i +18
hasVueAllowMismatch Vue/Alpine/Interactivity /\bdata-allow-mismatch\b/i +8
usesHydrationStrategyAttributes Whitepaper: Advanced Detection /loading=['"](hydrate|onClientLoad|onClick|onFocus|onVisible)['"]/ +35
hasWaiAriaLiveRegions Strong CSR & Heavy Hydration /(aria-live=['"](polite|assertive)['"]|aria-busy=['"]true['"])/ +22
hasClientOnlyComponentTags High-Confidence CSR (Smoking Guns) /(?:<nuxt-client-only>|<ClientOnly>|<ion-.*>|@rendermode="InteractiveWebAssembly")/ +50
usesHistoryAPI Strong CSR & Heavy Hydration /history\.(pushState|replaceState)\s*\(/i +25
usesReduxPersist Strong CSR & Heavy Hydration /\bpersistReducer\s*\(/i +25
hasInertia Modern Frameworks /@inertiajs|Inertia\.(visit|get|post)/i +20
hasLaravelLivewire Lightweight Interactivity /wire:|livewire:|@livewire/i +18
hasPhoenixLiveView Lightweight Interactivity /phx-|data-phx-|Phoenix\.LiveView/i +18
hasServerDrivenDataPayload Whitepaper: Advanced Detection /(?:<script\s+id=["']__NEXT_DATA__["']|window\.__remixContext|window\.__solid_data)/ +10
hasServerSideStateAttributes Whitepaper: Advanced Detection /(?:data-blazor-ssr|data-server-state)/ +10
usesDomMutationObservers Whitepaper: Advanced Detection /new\s+MutationObserver\s*\(/ +45
usesIntersectionObserver Whitepaper: Advanced Detection /new\s+IntersectionObserver\s*\(/ +30
hasAsyncLoadingCssPatterns Whitepaper: Advanced Detection /(?:skeleton-loading|shimmer-effect|placeholder-container)/ +20
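
The computed signals in the table cannot be expressed as a single regex. Below is a minimal Python sketch of most of them, with thresholds and weights copied from the rows above. The helper names, the choice to measure bytes as UTF-8 length, and the three-keyword list standing in for the table's "skeleton, shimmer, placeholder, etc." are assumptions. manyThirdPartyPreconnects is skipped because its weight is missing, and the four rows that inspect __NEXT_DATA__, window.__NUXT__, React hydration calls and <noscript> size are left to the engine class.

```python
import re

# Thresholds and weights copied from the "Computed Signals" rows of the table.
# manyThirdPartyPreconnects is left out because its weight is missing there.
COMPUTED_WEIGHTS = {
    "scriptBytesDominant": 12,
    "hugeInlineJS": 10,
    "largeJsonState": 14,
    "lowVisibleText": 10,
    "manyModulePreloads": 8,
    "manyPrefetchedScripts": 6,
    "earlyJSRedirect": 8,
    "headHeavyScripting": 5,
    "skeletonUI": 10,
}

SCRIPT_RE = re.compile(r"<script\b([^>]*)>([\s\S]*?)</script>", re.IGNORECASE)
LINK_RE = re.compile(r"<link\b[^>]*>", re.IGNORECASE)
# Assumption: the table's "skeleton, shimmer, placeholder, etc." reduced to three keywords.
SKELETON_RE = re.compile(r"skeleton|shimmer|placeholder", re.IGNORECASE)


def _size(s: str) -> int:
    """Byte length measured as UTF-8 (an assumption; the table only says bytes)."""
    return len(s.encode("utf-8"))


def _attr_is(tag: str, name: str, value: str) -> bool:
    """True if the tag carries name="value" or name='value'."""
    pattern = r"\b%s\s*=\s*[\"']%s[\"']" % (re.escape(name), re.escape(value))
    return re.search(pattern, tag, re.IGNORECASE) is not None


def _count_scripts(fragment: str) -> int:
    return len(re.findall(r"<script\b", fragment, re.IGNORECASE))


def computed_signals(html: str) -> dict:
    """Return {signal_name: weight} for every table-defined computed signal that fires."""
    fired = {}
    scripts = list(SCRIPT_RE.finditer(html))
    total = _size(html) or 1  # avoid dividing by zero on empty input

    # scriptBytesDominant: whole <script> elements make up > 40% of the HTML bytes
    if sum(_size(m.group(0)) for m in scripts) / total > 0.40:
        fired["scriptBytesDominant"] = COMPUTED_WEIGHTS["scriptBytesDominant"]

    # hugeInlineJS: bodies of scripts without a src attribute exceed 60KB in total
    inline = sum(_size(m.group(2)) for m in scripts
                 if not re.search(r"\bsrc\s*=", m.group(1), re.IGNORECASE))
    if inline > 60 * 1024:
        fired["hugeInlineJS"] = COMPUTED_WEIGHTS["hugeInlineJS"]

    # largeJsonState: <script type="application/json"> bodies exceed 10KB in total
    json_state = sum(_size(m.group(2)) for m in scripts
                     if _attr_is(m.group(1), "type", "application/json"))
    if json_state > 10 * 1024:
        fired["largeJsonState"] = COMPUTED_WEIGHTS["largeJsonState"]

    # lowVisibleText: fewer than 500 characters remain after stripping scripts, styles, tags
    text = re.sub(r"<(script|style)\b[^>]*>[\s\S]*?</\1>", " ", html, flags=re.IGNORECASE)
    text = re.sub(r"\s+", " ", re.sub(r"<[^>]+>", " ", text)).strip()
    if len(text) < 500:
        fired["lowVisibleText"] = COMPUTED_WEIGHTS["lowVisibleText"]

    # manyModulePreloads / manyPrefetchedScripts: counts of resource-hint <link> tags
    links = LINK_RE.findall(html)
    if sum(_attr_is(t, "rel", "modulepreload") for t in links) >= 3:
        fired["manyModulePreloads"] = COMPUTED_WEIGHTS["manyModulePreloads"]
    if sum(_attr_is(t, "rel", "prefetch") and _attr_is(t, "as", "script") for t in links) >= 3:
        fired["manyPrefetchedScripts"] = COMPUTED_WEIGHTS["manyPrefetchedScripts"]

    # earlyJSRedirect: the page navigates itself away via script
    if re.search(r"window\.location\.(?:replace|assign)\s*\(", html):
        fired["earlyJSRedirect"] = COMPUTED_WEIGHTS["earlyJSRedirect"]

    # headHeavyScripting: >= 3 scripts in <head> and more there than in <body>
    head = re.search(r"<head\b[^>]*>([\s\S]*?)</head>", html, re.IGNORECASE)
    body = re.search(r"<body\b[^>]*>([\s\S]*)</body>", html, re.IGNORECASE)
    head_n = _count_scripts(head.group(1)) if head else 0
    body_n = _count_scripts(body.group(1)) if body else 0
    if head_n >= 3 and head_n > body_n:
        fired["headHeavyScripting"] = COMPUTED_WEIGHTS["headHeavyScripting"]

    # skeletonUI: at least 8 class tokens mention a loading-placeholder keyword
    class_attrs = re.findall(r"\bclass\s*=\s*[\"']([^\"']*)[\"']", html, re.IGNORECASE)
    tokens = " ".join(class_attrs).split()
    if sum(1 for tok in tokens if SKELETON_RE.search(tok)) >= 8:
        fired["skeletonUI"] = COMPUTED_WEIGHTS["skeletonUI"]

    return fired
```

Adding the returned weights to the regex-rule total reproduces the table's computed layer. The class below uses its own numbers for overlapping signals (for example, more than five modulepreload links for +15 and a 40% inline-script ratio for +20), so the two sets should not be mixed without re-checking the combined threshold.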

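While the original engine was prototyped in JavaScript, Python is a common choice for data pipelines and backend systems. Here is the engine translated into a single, readable Python class, ready to be integrated into your crawler. It ports a subset of the rules listed above, grouped by signal confidence, rather than every row, and some weights and thresholds differ from the table: hasCRANoscript, for example, scores +40 here versus +28 in the table.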

import re
import json

class RenderingDetector:
    """
    A comprehensive Python class for heuristic-based detection of web page
    rendering strategies (SSR vs. CSR).

    This engine analyzes a raw HTML string against a broad set of rules,
    calculating a weighted score to determine the likelihood that a page
    relies on Client-Side Rendering (CSR) for its primary content.

    - Positive scores indicate CSR.
    - Negative scores indicate SSR/SSG.
    - A score at or above SCORE_THRESHOLD suggests a JS-rendered crawl is necessary.
    """
    SCORE_THRESHOLD = 30

    # The exhaustive list of all detection rules, organized by signal confidence.
    RULES = [
        # I. SSR / SSG / Server-First Affirmations (NEGATIVE SCORE)
        {'name': 'hasVueSSRRoot', 'pattern': re.compile(r'\bdata-server-rendered=["\']?true["\']?', re.IGNORECASE)},
        {'name': 'hasFastBootShoebox', 'pattern': re.compile(r'<script[^>]+type=["\']fastboot\/shoebox["\'][^>]*>', re.IGNORECASE)},
        {'name': 'hasAngularServerContext', 'pattern': re.compile(r'\bng-server-context\b', re.IGNORECASE)},
        {'name': 'hasSvelteHydrationMarks', 'pattern': re.compile(r'\bdata-svelte-h=["\'][^"\']+["\']|\bdata-sveltekit-hydrate-start\b', re.IGNORECASE)},
        {'name': 'hasReactStreamPlaceholders', 'pattern': re.compile(r'(<!--\$\??-->|<!--\/\$-->|<template\s+id=["\'][A-Z]:\d+)', re.IGNORECASE)},
        {'name': 'hasAngularTransferState', 'pattern': re.compile(r'<script[^>]+id=["\']ng-state["\'][^>]*type=["\']application\/json["\'][^>]*>', re.IGNORECASE)},
        {'name': 'hasDeclarativeShadowDOM', 'pattern': re.compile(r'<template\s+shadowrootmode=["\'](?:open|closed)["\']', re.IGNORECASE)},
        {'name': 'hasVueMetaSSR', 'pattern': re.compile(r'\bdata-vue-meta-server-rendered\b', re.IGNORECASE)},
        {'name': 'hasReactSSRAttributes', 'pattern': re.compile(r'\bdata-reactroot\b|\bdata-reactid\b|<!--\s*react-(?:text|empty):', re.IGNORECASE)},
        {'name': 'hasSolidHydrationMarks', 'pattern': re.compile(r'\bdata-hk=["\'][^"\']+["\']|\bdata-hydration\b', re.IGNORECASE)},
        {'name': 'hasStaticSiteGeneratorArtifacts', 'pattern': re.compile(r'<meta\s+name=["\']generator["\']\s+content=["\'](?:Jekyll|Hugo|Eleventy|Astro|Gatsby|Next\.js)["\']', re.IGNORECASE)},
        {'name': 'hasMarkoSSRAttributes', 'pattern': re.compile(r'\bdata-marko=', re.IGNORECASE)},
        {'name': 'embedsHypernova', 'pattern': re.compile(r'<script[^>]+data-hypernova-key=', re.IGNORECASE)},
        {'name': 'hasVueSSRClientHydration', 'pattern': re.compile(r'__VUE_SSR__\s*=\s*true', re.IGNORECASE)},
        {'name': 'hasNextJSHydrationMarker', 'pattern': re.compile(r'<div\s+id=["\']__next["\']></div>', re.IGNORECASE)},
        {'name': 'hasSvelteHydrationScript', 'pattern': re.compile(r'svelte-hydrate', re.IGNORECASE)},
        {'name': 'hasSSRFrameworkScript', 'pattern': re.compile(r'<script[^>]*\bsrc=["\'][^"\']*\b(?:next|gatsby|nuxt|remix)\b[^"\']+\.js["\'][^>]*><\/script>', re.IGNORECASE)},
        
        # II. High-Confidence CSR Indicators ("Smoking Guns")
        {'name': 'hasBotChallengeText', 'pattern': re.compile(r'\b(unusual\s+activity\s+from\s+your\s+IP|detected\s+unusual\s+(traffic|activity)|(please\s+)?(complete|solve|pass)\s+the\s+captcha|(type|enter|input)\s+the\s+(characters|letters|code|text)|verify\s+(you\'re|you\s+are)\s+(a\s+)?human|prove\s+(you\'re|you\s+are)\s+not\s+a\s+robot|(please\s+)?check\s+the\s+box(\s+below)?|checking\s+your\s+browser|security\s+check|robot\s+verification|are\s+you\s+a\s+human\?|click\s+the\s+(images|pictures)\s+of|select\s+all\s+squares\s+with|anti-bot\s+measure|human\s+verification\s+required)\b', re.IGNORECASE)},
        {'name': 'hasEmptyBodyCSR', 'pattern': re.compile(r'<body>\s*</body>', re.IGNORECASE)},
        # Empty mount point only: the closing tag is required so a populated <div id="app"> does not match.
        {'name': 'hasMinimalAppDiv', 'pattern': re.compile(r'<(div|main)\s+id\s*=\s*["\'](?:app|root|main|container)["\']\s*>\s*</\1\s*>', re.IGNORECASE)},
        {'name': 'hasUnprocessedTemplateLiterals', 'pattern': re.compile(r'\{\{[^}]+\}\}', re.IGNORECASE)},
        {'name': 'hasAngularJSUnprocessedDirectives', 'pattern': re.compile(r'\bng-(?:app|if|repeat|bind|controller)\b', re.IGNORECASE)},
        {'name': 'hasCRANoscript', 'pattern': re.compile(r'<noscript>[^<]*You need to enable JavaScript to run this app', re.IGNORECASE)},
        {'name': 'hasClientSideTemplateScripts', 'pattern': re.compile(r'<script[^>]+type=["\'](?:text\/(?:x-)?template|text\/x-handlebars-template|text\/ng-template)["\']', re.IGNORECASE)},
        {'name': 'hasAjaxCrawlableMeta', 'pattern': re.compile(r'<meta[^>]+name=["\']fragment["\'][^>]+content=["\']!["\']', re.IGNORECASE)},
        {'name': 'hasNextFlightStream', 'pattern': re.compile(r'self\.__next_f\.push\(', re.IGNORECASE)},
        
        # III. Strong CSR & Heavy Hydration Indicators
        {'name': 'usesDomMutationObservers', 'pattern': re.compile(r'new\s+MutationObserver\s*\(', re.IGNORECASE)},
        {'name': 'hasMeteorRuntimeConfig', 'pattern': re.compile(r'__meteor_runtime_config__', re.IGNORECASE)},
        {'name': 'hasBotGates', 'pattern': re.compile(r'(g-recaptcha|hcaptcha\.com|challenges\.cloudflare\.com|perimeterx|datadome|akamai|arkoselabs)', re.IGNORECASE)},
        {'name': 'hasBlazorWebAssembly', 'pattern': re.compile(r'\/_framework\/blazor\.(?:server|webassembly)\.js\b', re.IGNORECASE)},
        {'name': 'hasClientSideRoutingOnClick', 'pattern': re.compile(r'<a\b[^>]*\bonclick\s*=["\'][^"\']+["\'][^>]*>', re.IGNORECASE)},
        {'name': 'hasFrameworkSpecificStructures', 'pattern': re.compile(r'<(router-outlet|Outlet|app-root|vue-app|custom-element)\b[^>]*>', re.IGNORECASE)},
        {'name': 'hasWebflowRuntime', 'pattern': re.compile(r'assets\.website-files\.com\/[^"\']+\/webflow(?:\.min)?\.js|\bdata-wf-(?:site|status|page)\b', re.IGNORECASE)},
        {'name': 'hasKnockoutBindings', 'pattern': re.compile(r'\bdata-bind\s*=|ko\.applyBindings\s*\(', re.IGNORECASE)},
        {'name': 'hasWaiAriaLiveRegions', 'pattern': re.compile(r'(aria-live=[\'"](polite|assertive)[\'"]|aria-busy=[\'"]true[\'"])', re.IGNORECASE)},
        {'name': 'usesIntersectionObserver', 'pattern': re.compile(r'new\s+IntersectionObserver\s*\(', re.IGNORECASE)},
        {'name': 'hasMithrilRouter', 'pattern': re.compile(r'\bm\.route\s*\(', re.IGNORECASE)},
        {'name': 'hasRemixManifest', 'pattern': re.compile(r'window\.__remixManifest', re.IGNORECASE)},
        {'name': 'hasStencilRuntime', 'pattern': re.compile(r'@stencil|data-stencil-build|\/stencil(?:\.esm)?\.js\b', re.IGNORECASE)},
        {'name': 'hasAlternativeBundlerRuntimes', 'pattern': re.compile(r'(\bparcelRequire\s*=\s*function|\/esbuild\.[^"\']+\.js\b|\besbuild:\b|\bSystem\.register\s*\(|\/_snowpack\/)', re.IGNORECASE)},
        {'name': 'embedsReduxOrApolloBlobs', 'pattern': re.compile(r'window\.__(?:REDUX|APOLLO)_STATE__', re.IGNORECASE)},
        {'name': 'hasDynamicComponentLoading', 'pattern': re.compile(r'React\.lazy\(|Vue\.component\(', re.IGNORECASE)},
        {'name': 'usesCodeSplittingOrDynamicImports', 'pattern': re.compile(r'import\(\s*[\'"][^\'"]+[\'"]\s*\)', re.IGNORECASE)},
        {'name': 'hasClientSideRoutingComponents', 'pattern': re.compile(r'<(router-outlet|Outlet|router-link)\b[^>]*>', re.IGNORECASE)},
        {'name': 'hasStateManagementLibraries', 'pattern': re.compile(r'<script[^>]*\bsrc=["\'][^"\']*\b(?:redux|mobx|vuex|ngrx)\b[^"\']+\.js["\'][^>]*><\/script>', re.IGNORECASE)},
        {'name': 'hasWebComponentsTag', 'pattern': re.compile(r'<[a-z]+-[a-z]+[^>]*><\/[a-z]+-[a-z]+>', re.IGNORECASE)},
        
        # III (continued). Additional Strong CSR & Heavy Hydration Indicators
        {'name': 'usesHistoryAPI', 'pattern': re.compile(r'history\.(pushState|replaceState)\s*\(', re.IGNORECASE)},
        {'name': 'usesReduxPersist', 'pattern': re.compile(r'\bpersistReducer\s*\(', re.IGNORECASE)},
        {'name': 'hasInertia', 'pattern': re.compile(r'@inertiajs|Inertia\.(visit|get|post)', re.IGNORECASE)},
        {'name': 'hasLaravelLivewire', 'pattern': re.compile(r'wire:|livewire:|@livewire', re.IGNORECASE)},
        {'name': 'hasPhoenixLiveView', 'pattern': re.compile(r'phx-|data-phx-|Phoenix\.LiveView', re.IGNORECASE)},
        
        # IV. Lightweight Interactivity / Progressive Enhancement
        {'name': 'hasUnpolyDirectives', 'pattern': re.compile(r'\bup-(?:target|follow|layer|data|main|on|modal|autosubmit)\b', re.IGNORECASE)},
        {'name': 'hasIsLandIslands', 'pattern': re.compile(r'<is-land\b', re.IGNORECASE)},
        {'name': 'hasAlpineDirectives', 'pattern': re.compile(r'\bx-data\b|x-bind:|x-on:|@click=|x-model\b', re.IGNORECASE)},
        {'name': 'hasWPInteractivityDirectives', 'pattern': re.compile(r'\bdata-wp-(?:interactive|bind|on|class|text|style|watch)\b', re.IGNORECASE)},
        
        # V. Supporting & Behavioral CSR Indicators
        {'name': 'hasModernBuildToolArtifactsScriptJS', 'pattern': re.compile(r'<script[^>]*\bsrc=["\'][^"\']+\.[a-z0-9]{6,}\.js["\'][^>]*><\/script>', re.IGNORECASE)},
        {'name': 'hasQwikResumability', 'pattern': re.compile(r'(\bq:(?:container|base|slot)\b|qrl=|\/qwikloader\.js)', re.IGNORECASE)},
        {'name': 'hasClientOnlyIonicComponents', 'pattern': re.compile(r'<ion-[^>]+>', re.IGNORECASE)},
        {'name': 'hasLoadingSpinnerMarkup', 'pattern': re.compile(r'<[^>]+(?:id|class)=["\'][^"\']*\b(spinner|loader|skeleton|shimmer|placeholder)\b[^"\']*["\']|<app-root>\s*Loading\.+\s*<\/app-root>', re.IGNORECASE)},
        {'name': 'hasGatsbyAnnouncer', 'pattern': re.compile(r'id=["\']gatsby-announcer["\']', re.IGNORECASE)},
        {'name': 'hasHelmetMarkers', 'pattern': re.compile(r'\bdata-rh=["\']true["\']|\bdata-react-helmet=["\']true["\']', re.IGNORECASE)},
        {'name': 'hasVueAllowMismatch', 'pattern': re.compile(r'\bdata-allow-mismatch\b', re.IGNORECASE)},
        {'name': 'hasBaseHrefRooted', 'pattern': re.compile(r'<base\s+href=["\'][^"\']*["\'][^>]*>', re.IGNORECASE)},

        # VI. Low-Confidence & General Web Patterns
        {'name': 'hasBundledScript', 'pattern': re.compile(r'<script[^>]*\bsrc\s*=\s*["\'][^"\']+\.(?:bundle|chunk|main|app|min|vendor)\.[a-z0-9]{0,}\.js["\'][^>]*><\/script>', re.IGNORECASE)},
        {'name': 'hasES6Modules', 'pattern': re.compile(r'<script\s+type=["\']module["\'][^>]*>[\s\S]*<\/script>', re.IGNORECASE)},
        {'name': 'hasInlineEventHandlers', 'pattern': re.compile(r'<(?:a|button|div|span)\b[^>]*\s+onclick\s*=["\'][^"\']+["\'][^>]*>', re.IGNORECASE)},
        {'name': 'hasAsyncDeferScripts', 'pattern': re.compile(r'<script[^>]*\b(?:async|defer)\b[^>]*><\/script>', re.IGNORECASE)},
        {'name': 'hasClientSideRoutingHash', 'pattern': re.compile(r'<a\b[^>]*\bhref\s*=\s*["\'][^"\']*#\/[^"\']*["\'][^>]*>', re.IGNORECASE)},
        {'name': 'usesAjaxOrFetch', 'pattern': re.compile(r'\b(fetch|XMLHttpRequest|\.ajax)\s*\(', re.IGNORECASE)},
        {'name': 'usesWebSockets', 'pattern': re.compile(r'new\s+WebSocket\s*\(', re.IGNORECASE)},
        {'name': 'hasLazyLoadingAttributes', 'pattern': re.compile(r'\bloading=["\']lazy["\']', re.IGNORECASE)},
        {'name': 'hasDynamicMetaOrTitle', 'pattern': re.compile(r'document\.title\s*=|document\.querySelector\([\'"]meta', re.IGNORECASE)},
        {'name': 'usesReactHooks', 'pattern': re.compile(r'\buseState\b|\buseEffect\b|\buseContext\b', re.IGNORECASE)},
        {'name': 'hasShadowDOM', 'pattern': re.compile(r'attachShadow\s*\(\s*({[^}]*}|null)\s*\)', re.IGNORECASE)},
        {'name': 'hasTailwindClasses', 'pattern': re.compile(r'\b(bg|text|p|m|flex|grid)-[a-z0-9]+(?:-[a-z0-9]+)*', re.IGNORECASE)},
        {'name': 'usesJQuery', 'pattern': re.compile(r'code\.jquery\.com\/jquery', re.IGNORECASE)},
        {'name': 'usesDataVizLibraries', 'pattern': re.compile(r'\bD3\b|\bChart\.js\b|\bHighcharts\b', re.IGNORECASE)},
    ]
    
    # The definitive mapping of all rule names to their weights.
    WEIGHTS = {
        # I. SSR / SSG / Server-First Affirmations (NEGATIVE SCORE)
        'hasVueSSRRoot': -35, 'hasFastBootShoebox': -30, 'hasAngularServerContext': -25, 'hasSvelteHydrationMarks': -25, 'hasReactStreamPlaceholders': -22, 'hasAngularTransferState': -20, 'hasDeclarativeShadowDOM': -20, 'hasVueMetaSSR': -15, 'hasReactSSRAttributes': -15, 'hasSolidHydrationMarks': -20, 'hasStaticSiteGeneratorArtifacts': -30, 'hasMarkoSSRAttributes': -25, 'embedsHypernova': -20, 'hasVueSSRClientHydration': -15, 'hasNextJSHydrationMarker': -10, 'hasSvelteHydrationScript': -15, 'hasSSRFrameworkScript': -5,

        # II. High-Confidence CSR Indicators ("Smoking Guns")
        'hasBotChallengeText': 55, 'hasEmptyBodyCSR': 50, 'hasMinimalAppDiv': 50, 'hasUnprocessedTemplateLiterals': 45, 'hasAngularJSUnprocessedDirectives': 40, 'hasCRANoscript': 40, 'hasClientSideTemplateScripts': 35, 'hasAjaxCrawlableMeta': 35, 'hasNextFlightStream': 30,

        # III. Strong CSR & Heavy Hydration Indicators
        'usesDomMutationObservers': 28, 'hasMeteorRuntimeConfig': 28, 'hasBotGates': 45, 'hasBlazorWebAssembly': 35, 'hasClientSideRoutingOnClick': 35, 'hasFrameworkSpecificStructures': 30, 'hasWebflowRuntime': 25, 'hasKnockoutBindings': 25, 'hasWaiAriaLiveRegions': 22, 'usesIntersectionObserver': 25, 'hasMithrilRouter': 25, 'hasRemixManifest': 25, 'hasStencilRuntime': 25, 'hasAlternativeBundlerRuntimes': 20, 'embedsReduxOrApolloBlobs': 25, 'hasDynamicComponentLoading': 25, 'usesCodeSplittingOrDynamicImports': 20, 'hasClientSideRoutingComponents': 25, 'hasStateManagementLibraries': 20, 'hasWebComponentsTag': 20, 'usesHistoryAPI': 25, 'usesReduxPersist': 25, 'hasInertia': 20, 'hasLaravelLivewire': 18, 'hasPhoenixLiveView': 18,
        
        # IV. Lightweight Interactivity / Progressive Enhancement
        'hasUnpolyDirectives': 20, 'hasIsLandIslands': 20, 'hasAlpineDirectives': 18, 'hasWPInteractivityDirectives': 18,
        
        # V. Supporting & Behavioral CSR Indicators
        'hasModernBuildToolArtifactsScriptJS': 15, 'hasQwikResumability': 25, 'hasClientOnlyIonicComponents': 15, 'hasLoadingSpinnerMarkup': 15, 'hasGatsbyAnnouncer': 12, 'hasHelmetMarkers': 10, 'hasVueAllowMismatch': 8, 'hasBaseHrefRooted': 8,

        # VI. Low-Confidence & General Web Patterns (Scores 1-5)
        'hasBundledScript': 3, 'hasES6Modules': 2, 'hasInlineEventHandlers': 3, 'hasAsyncDeferScripts': 1, 'hasClientSideRoutingHash': 5, 'usesAjaxOrFetch': 2, 'usesWebSockets': 2, 'hasLazyLoadingAttributes': 1, 'hasDynamicMetaOrTitle': 4, 'usesReactHooks': 4, 'hasShadowDOM': 5, 'hasTailwindClasses': 1, 'usesJQuery': 1, 'usesDataVizLibraries': 1,
    }

    def _compute_additional_signals(self, html: str) -> dict:
        """Calculates scores from heuristics that need more than a single regex.

        Returns {'score': int, 'reasons': [(reason_name, score), ...]} so that
        analyze() can report each computed signal with its true contribution.
        """
        score = 0
        reasons = []

        def add(reason: str, delta: int) -> None:
            nonlocal score
            score += delta
            reasons.append((reason, delta))

        if not html:
            return {'score': score, 'reasons': reasons}

        # A) Low HTML Length Check
        if len(html) < 2000:
            add("computed:lowHtmlLength", 10)

        # B) Low Text Content Check (no visible text at all also counts as very low)
        no_scripts_styles = re.sub(r'<style[^>]*>[\s\S]*?<\/style>', '', html, flags=re.IGNORECASE)
        no_scripts_styles = re.sub(r'<script[^>]*>[\s\S]*?<\/script>', '', no_scripts_styles, flags=re.IGNORECASE)
        text_content = re.sub(r'<[^>]+>', '', no_scripts_styles).strip()
        if len(text_content) < 100:
            add("computed:veryLowTextContent", 15)

        # C) Next.js __NEXT_DATA__ -> "gssp": true (getServerSideProps ran, definitive SSR)
        next_data_match = re.search(r'<script[^>]+id=["\']__NEXT_DATA__["\'][^>]*>([\s\S]*?)<\/script>', html, re.IGNORECASE)
        if next_data_match:
            try:
                j = json.loads(next_data_match.group(1))
                if isinstance(j, dict) and j.get('gssp'):
                    add("computed:nextGSSP", -35)
            except json.JSONDecodeError:
                pass  # Malformed JSON: skip this signal rather than guess

        # D) Nuxt.js __NUXT__ -> serverRendered: true (definitive SSR).
        # The payload is a JS expression (often an IIFE), not strict JSON, so search
        # the script body for the key instead of parsing it; it may be quoted or not.
        nuxt_match = re.search(r'window\.__NUXT__\s*=([\s\S]*?)<\/script>', html, re.IGNORECASE)
        if nuxt_match and re.search(r'["\']?serverRendered["\']?\s*:\s*true\b', nuxt_match.group(1)):
            add("computed:nuxtServerRendered", -25)

        # E) React hydration calls anywhere in the document (typically inline scripts)
        if re.search(r'\bhydrateRoot\s*\(|ReactDOM\.hydrate\s*\(', html, re.IGNORECASE):
            add("computed:reactHydrateCall", 15)

        # F) Script Weight Ratio: inline script characters as a share of the whole HTML
        script_matches = re.findall(r'<script[^>]*>([\s\S]*?)<\/script>', html, re.IGNORECASE)
        total_script_chars = sum(len(match) for match in script_matches)
        if total_script_chars / len(html) > 0.4:  # >40% script content
            add("computed:highScriptRatio", 20)

        # G) Module Preload Count: many preloads suggest a complex dependency graph
        modulepreload_count = len(re.findall(r'<link[^>]+rel=["\']modulepreload["\']', html, re.IGNORECASE))
        if modulepreload_count > 5:
            add("computed:manyModulePreloads", 15)

        # H) Large noscript fallback: negative signal suggesting quality static content
        noscript_matches = re.findall(r'<noscript[^>]*>([\s\S]*?)<\/noscript>', html, re.IGNORECASE)
        total_noscript_chars = sum(len(match) for match in noscript_matches)
        if total_noscript_chars > 8192:  # >8KB of noscript content
            add("computed:largeNoscriptFallback", -20)

        return {'score': score, 'reasons': reasons}

    def analyze(self, html: str) -> dict:
        """
        Analyzes an HTML string and returns a detailed report.
        """
        if not isinstance(html, str):
            return {'totalScore': 0, 'indicatesJS': False, 'matchedRules': ['error:invalid_input']}

        total_score = 0
        matched_rules_details = []

        for rule in self.RULES:
            if rule['pattern'].search(html):
                weight = self.WEIGHTS.get(rule['name'], 0)
                total_score += weight
                matched_rules_details.append({'rule': rule['name'], 'score': weight})

        computed = self._compute_additional_signals(html)
        total_score += computed['score']
        # Report each computed signal with the score it actually contributed
        for reason, reason_score in computed['reasons']:
            matched_rules_details.append({'rule': reason, 'score': reason_score})

        # Sort matched rules by the absolute value of their score for importance
        matched_rules_details.sort(key=lambda x: abs(x['score']), reverse=True)

        matched_rules_formatted = [
            f"{item['rule']} ({'+' if item['score'] > 0 else ''}{item['score']})"
            for item in matched_rules_details
        ]

        return {
            'totalScore': round(total_score),
            'indicatesJS': total_score >= self.SCORE_THRESHOLD,
            'matchedRules': matched_rules_formatted,
        }

# ========================================================================
# HOW TO USE THE DETECTOR
# ========================================================================

if __name__ == '__main__':
    print("--- Rendering Detector Demo ---")
    
    # Instantiate the detector
    detector = RenderingDetector()

    demos = {
        "Classic CSR (Create React App)": """
            <!DOCTYPE html><html><head><title>My App</title></head><body>
            <div id="root"></div>
            <noscript>You need to enable JavaScript to run this app.</noscript>
            <script src="/bundle.js"></script></body></html>
        """,
        "Modern SSR (Next.js)": """
            <!DOCTYPE html><html><head><title>News</title><meta name="generator" content="Next.js"></head>
            <body><div id="__next" data-reactroot><h1>Article</h1><p>Content.</p></div>
            <script id="__NEXT_DATA__" type="application/json">{"props":{"pageProps":{}},"page":"/","query":{},"gssp":true}</script></body></html>
        """,
        "Hybrid with Skeleton Loader": """
            <!DOCTYPE html><html><body><header>My Site</header><main>
            <div class="skeleton-loader"></div>
            </main><script src="/app.chunk.a4b8c1.js"></script></body></html>
        """,
        "Static Site with Alpine.js": """
             <!DOCTYPE html><html><body><h1>Welcome</h1>
             <div x-data="{ open: false }"><button @click="open = !open">Toggle</button><div x-show="open">Content</div></div>
             <script src="/alpine.js" defer></script></body></html>
        """,
        "Bot-Gated Page (Cloudflare)": """
             <!DOCTYPE html><html><body><h1>Security Check</h1>
             <p>Please solve the captcha to prove you're not a robot.</p>
             <div id="cf-turnstile"></div>
             <script src="https://challenges.cloudflare.com/turnstile/v0/api.js" async defer></script></body></html>
        """,
    }

    for name, html_content in demos.items():
        print(f"\n[ANALYZING] {name}")
        result = detector.analyze(html_content)
        print(f"  > Final Score: {result['totalScore']}")
        print(f"  > Needs JS Render: {result['indicatesJS']}")
        print("  > Top Matched Rules:", result['matchedRules'][:5])

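One subtlety in the computed signals: the script-ratio regex captures only the text between an opening <script ...> tag and </script>. An external bundle such as <script src="/bundle.js"></script> therefore contributes nothing to the ratio. What the signal actually measures is inline script, meaning inlined bootstrap code and serialized state blobs. The pattern does not filter on type, so a JSON payload like Next.js's __NEXT_DATA__ counts too. The sketch below reuses equivalent versions of the three patterns from _compute_additional_signals and reports only the raw measurements. The helper name computed_signal_inputs is hypothetical. Like the detector, it measures Python string length (characters), not encoded bytes.

```python
import re

# Equivalent to the three patterns used by _compute_additional_signals above.
SCRIPT_RE = re.compile(r'<script[^>]*>([\s\S]*?)</script>', re.IGNORECASE)
MODULEPRELOAD_RE = re.compile(r'<link[^>]+rel=["\']modulepreload["\']', re.IGNORECASE)
NOSCRIPT_RE = re.compile(r'<noscript[^>]*>([\s\S]*?)</noscript>', re.IGNORECASE)


def computed_signal_inputs(html: str) -> dict:
    """Hypothetical helper: return the raw measurements behind the three
    computed signals, before any thresholds or weights are applied.
    Lengths are str lengths (characters), as in the detector."""
    script_chars = sum(len(body) for body in SCRIPT_RE.findall(html))
    return {
        # Only inline script bodies count; <script src=...></script> adds 0.
        'script_ratio': script_chars / len(html) if html else 0.0,
        'modulepreloads': len(MODULEPRELOAD_RE.findall(html)),
        'noscript_chars': sum(len(body) for body in NOSCRIPT_RE.findall(html)),
    }


# Client-side rendered shell: all JavaScript lives in an external file.
external_only = (
    '<html><body><div id="root"></div>'
    '<noscript>You need to enable JavaScript to run this app.</noscript>'
    '<script src="/bundle.js"></script></body></html>'
)
# Page dominated by an inline state blob.
inline_state = (
    '<html><body><script>window.__STATE__ = '
    + '{"k": 1}' * 200
    + ';</script></body></html>'
)
# Page announcing a large module graph via <link rel="modulepreload">.
many_preloads = ''.join(
    f'<link rel="modulepreload" href="/chunk{i}.js">' for i in range(6)
)

for name, html in [('external_only', external_only),
                   ('inline_state', inline_state),
                   ('many_preloads', many_preloads)]:
    print(name, computed_signal_inputs(html))
```

Against the detector's thresholds, the CRA-style shell earns nothing from these three signals. Its script ratio is 0, and its 46-character noscript message is far below the 8192-character cutoff. Yet it is the page that most needs rendering, so other rules have to catch it.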
How to Use the Engine

Integrating the RenderingDetector into your crawler is straightforward.

  1. Save the Code: Save the Python class above as a file, for example, detector.py.
  2. Make an Initial Request: In your crawler logic, perform a cheap, standard HTTP GET request to fetch the raw HTML of your target URL. Do not use a headless browser for this first step.
  3. Analyze the HTML: Pass the raw HTML content to the detector.
  4. Make a Decision: Based on the indicatesJS boolean in the result, decide your next action:
    • If False, proceed to parse the HTML you already have.
    • If True, send the URL to your dedicated queue for rendering with a headless browser like Puppeteer, Playwright, or Selenium.

Here's a conceptual example of how it would fit into a crawler's workflow:

from detector import RenderingDetector  # the class above, saved as detector.py
import requests
from my_headless_browser_service import render_with_browser  # placeholder for your own rendering service

detector = RenderingDetector()
url_queue = ["http://example.com/static-page", "http://example.com/dynamic-app"]

for url in url_queue:
    try:
        # 1. Make the cheap initial request with a plain HTTP client
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # don't analyze 4xx/5xx error pages
        raw_html = response.text
    except requests.RequestException as e:
        print(f"Failed to fetch {url}: {e}")
        continue

    # 2. Analyze the raw HTML
    analysis_result = detector.analyze(raw_html)
    print(f"Analysis for {url}: Score = {analysis_result['totalScore']}")

    # 3. Decide which pipeline handles the page
    if analysis_result['indicatesJS']:
        print("  -> Escalating to full JS render...")
        final_html = render_with_browser(url)
    else:
        print("  -> Processing as static HTML...")
        final_html = raw_html

    # ... process final_html ...
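Step 4 suggests sending JS-dependent URLs to a dedicated rendering queue rather than rendering them inline, as the conceptual example does. Below is a minimal sketch of that split. It assumes an in-process collections.deque stands in for whatever queue your infrastructure actually uses, such as a message broker or a database table. The triage function and StubDetector are hypothetical. The stub only mimics the shape of the analyze() result (totalScore, indicatesJS, matchedRules) so the sketch runs without the full rule set.

```python
from collections import deque
from typing import Callable, Iterable


class StubDetector:
    """Hypothetical stand-in for RenderingDetector: flags pages with an empty
    React-style root. Swap in the real detector in production."""

    def analyze(self, html: str) -> dict:
        needs_js = '<div id="root"></div>' in html
        return {'totalScore': 100 if needs_js else 0,
                'indicatesJS': needs_js,
                'matchedRules': []}


def triage(pages: Iterable[tuple[str, str]], detector,
           parse: Callable[[str, str], None]) -> deque:
    """Parse static pages immediately and return a queue of URLs that need a
    headless-browser render. `pages` yields (url, raw_html) pairs that were
    fetched with a plain HTTP client."""
    render_queue: deque = deque()
    for url, raw_html in pages:
        if detector.analyze(raw_html)['indicatesJS']:
            render_queue.append(url)   # handled later by a browser worker pool
        else:
            parse(url, raw_html)       # the HTML we already have is usable
    return render_queue


parsed = []
pages = [
    ("http://example.com/static-page", "<html><body><h1>Hello</h1></body></html>"),
    ("http://example.com/dynamic-app", '<html><body><div id="root"></div></body></html>'),
]
queue = triage(pages, StubDetector(), lambda url, html: parsed.append(url))
print("parsed:", parsed)
print("to render:", list(queue))
```

Keeping the two paths apart means the cheap static parser never waits on a browser. The browser worker pool can then be sized and rate-limited on its own.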

Conclusion

By moving from a brute-force approach to an intelligent, heuristic-based one, you can dramatically increase the efficiency and throughput of your web crawling infrastructure. This engine acts as a smart, resource-aware gatekeeper, ensuring you use the right tool for the right job. It saves you money on computation, speeds up your data collection, and ultimately allows you to build a smarter, more scalable system for understanding the web.