Proxy Guide

How Bot Detection Actually Works

Bot detection is not a single system. It is a layered stack where each layer can independently block or challenge a request — and changing the proxy IP only affects the one layer that cares about the IP.

In practice

  • IP reputation is one detection layer — not the whole stack ✗
  • TLS fingerprinting identifies non-browser clients regardless of IP ✗
  • Behavioral analysis detects machine-speed patterns regardless of IP type ✗
  • JavaScript challenges verify browser execution environment, not IP ✗
  • Changing proxy type only addresses IP-layer detection ✔

If switching proxy type doesn't reduce the block rate — the layer that caught the traffic isn't the IP layer.

Overview

Most operators assume detection means IP detection. It's the simplest model and the one that proxy-provider marketing reinforces. It is also why most proxy setups fail at predictable, repeatable points: the problem was never the IP.

Detection systems are stacks. Each layer operates on independent signals. Passing the IP layer doesn't bypass the TLS layer. Passing the TLS layer doesn't bypass behavioral analysis. A block from any layer looks identical to the operator: a 403, a CAPTCHA, a redirect. The source layer is not disclosed.
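The stack model above can be sketched as a chain of independent checks. This is a minimal illustration, not any vendor's implementation: the `Request` fields, the three checks, and the `evaluate` function are hypothetical stand-ins for much richer signals.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Request:
    # Hypothetical, pre-computed signals; real systems derive these themselves.
    ip_is_residential: bool
    tls_looks_like_browser: bool
    timing_is_uniform: bool


# Each layer inspects the request and returns True to let it through.
Layer = Callable[[Request], bool]

LAYERS: list[tuple[str, Layer]] = [
    ("ip", lambda r: r.ip_is_residential),
    ("tls", lambda r: r.tls_looks_like_browser),
    ("behavior", lambda r: not r.timing_is_uniform),
]


def evaluate(req: Request) -> tuple[int, Optional[str]]:
    """Return (status the client sees, layer that fired).

    The layer name is server-side knowledge only; the client
    receives just the status code.
    """
    for name, check in LAYERS:
        if not check(req):
            return 403, name
    return 200, None


# A clean residential IP with a non-browser TLS stack is still blocked.
status, layer = evaluate(Request(True, False, False))
print(status, layer)
```

Every failing path returns the same 403, which is why the operator cannot tell from the response alone which check rejected the request.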

How to think about it

The IP layer is the outermost check. It queries IP intelligence databases for ASN classification, geolocation, abuse history, and blocklist membership. A residential IP from a clean pool clears this layer. A datacenter IP in a commercial ASN may not. This is the layer proxies directly address — and the only layer they directly address.

The TLS layer reads the client's TLS handshake parameters: cipher suite selection, extension order, supported groups, and the resulting JA3 or JA4 fingerprint. The fingerprint comes from the TLS stack underneath the HTTP library, and each stack, in its default configuration, produces a recognizable one. Python's `requests`, Go's `net/http`, and Node-based clients such as `axios` each generate a fingerprint that differs from any mainstream browser's and so identifies a non-browser client. A residential IP making a request with a Python `requests` TLS fingerprint is a residential IP from a bot. The IP classification is irrelevant to this layer.

The behavioral layer operates on what the client does across requests: timing distribution, navigation sequences, mouse or scroll patterns on JavaScript-capable targets, and request structure consistency. Machine-generated traffic is statistically distinguishable from human traffic at volume — uniform timing intervals, no viewport interaction, no resource loading beyond the primary target. This layer operates on patterns the proxy cannot modify, because the patterns are generated by the client application, not by the network path.
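One way to see why uniform timing stands out is to measure how much the gaps between requests vary. The sketch below uses the coefficient of variation of the intervals. The function names and the `cv_threshold` of 0.1 are assumptions chosen for illustration, and production systems combine many more signals than this.

```python
import statistics


def interval_cv(timestamps: list[float]) -> float:
    """Coefficient of variation of the gaps between consecutive requests."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three timestamps")
    mean = statistics.mean(gaps)
    if mean <= 0:
        raise ValueError("timestamps must be increasing")
    return statistics.stdev(gaps) / mean


def looks_machine_timed(timestamps: list[float], cv_threshold: float = 0.1) -> bool:
    """Flag traffic whose request spacing is nearly constant.

    cv_threshold is an illustrative value, not a known vendor setting.
    """
    return interval_cv(timestamps) < cv_threshold


fixed_rate = [0.0, 2.0, 4.0, 6.0, 8.0]    # one request every 2 seconds
irregular = [0.0, 1.2, 7.9, 9.1, 21.4]    # uneven, bursty spacing

print(looks_machine_timed(fixed_rate))
print(looks_machine_timed(irregular))
```

The timestamps are produced by the client's own scheduling, so routing the same requests through a different proxy leaves this measurement unchanged.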

How it works

IP scoring aggregates signals from multiple databases: ASN classification, geolocation, historical abuse reports, shared pool contamination from other customers, and proprietary scoring maintained by the target platform or its CDN provider. The score is computed per-request, updated continuously, and compared against a threshold. Requests scoring above the threshold are challenged or blocked. Requests below pass to the next layer. The threshold is not disclosed and varies by target, endpoint, and current traffic conditions.
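A toy version of this aggregation is sketched below. The signal names, the weights, and the 0.5 threshold are all invented for illustration; real scoring is proprietary and its thresholds are not disclosed.

```python
from dataclasses import dataclass


@dataclass
class IpSignals:
    # Hypothetical inputs, as if already looked up in IP intelligence databases.
    asn_type: str          # "residential", "mobile", "isp", or "datacenter"
    abuse_reports: int     # historical abuse reports for this IP
    on_blocklist: bool     # present on at least one blocklist
    geo_mismatch: bool     # geolocation inconsistent with other request data


# Illustrative weights only.
ASN_RISK = {"residential": 0.0, "mobile": 0.0, "isp": 0.1, "datacenter": 0.5}


def ip_risk_score(s: IpSignals) -> float:
    """Combine the signals into a score in [0, 1]; higher means riskier."""
    score = ASN_RISK.get(s.asn_type, 0.5)
    score += min(s.abuse_reports, 5) * 0.06
    score += 0.4 if s.on_blocklist else 0.0
    score += 0.1 if s.geo_mismatch else 0.0
    return min(score, 1.0)


def decide(s: IpSignals, threshold: float = 0.5) -> str:
    """Scores above the threshold are challenged or blocked; the rest move on."""
    if ip_risk_score(s) > threshold:
        return "challenge_or_block"
    return "pass_to_next_layer"


clean_residential = IpSignals("residential", 0, False, False)
abused_datacenter = IpSignals("datacenter", 2, True, False)

print(decide(clean_residential))
print(decide(abused_datacenter))
```

Note that a passing score only hands the request to the next layer. It says nothing about whether the TLS or behavioral checks will also pass.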

TLS fingerprinting happens before any HTTP request data is processed. The server reads the ClientHello message from the TLS handshake — the list of cipher suites and extensions the client supports — and computes a hash. That hash is compared against a database of known browser fingerprints and known bot fingerprints. A mismatch between the IP's apparent type and the TLS fingerprint is itself a signal: a residential IP with a bot TLS fingerprint is suspicious in a way that neither signal alone would indicate.
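For JA3 specifically, the hash is an MD5 over a comma-separated string of five ClientHello fields, with the values inside each field joined by hyphens and GREASE placeholder values ignored. The sketch below assumes the ClientHello has already been parsed into integer lists. The example field values are illustrative and do not correspond to any particular client.

```python
import hashlib

# GREASE values (RFC 8701): 0x0A0A, 0x1A1A, ..., 0xFAFA. JA3 skips them.
GREASE = {0x0A0A + 0x1010 * i for i in range(16)}


def ja3_string(version, ciphers, extensions, curves, point_formats) -> str:
    """Build the JA3 input string from parsed ClientHello fields."""
    def join(values):
        return "-".join(str(v) for v in values if v not in GREASE)

    return ",".join([
        str(version),
        join(ciphers),
        join(extensions),
        join(curves),
        join(point_formats),
    ])


def ja3_hash(version, ciphers, extensions, curves, point_formats) -> str:
    """MD5 hex digest of the JA3 string."""
    text = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(text.encode("ascii")).hexdigest()


# Illustrative values only: 771 is TLS 1.2 (0x0303), 0x0A0A is a GREASE cipher.
hello = dict(
    version=771,
    ciphers=[0x0A0A, 4865, 4866],
    extensions=[0, 23, 65281],
    curves=[29, 23],
    point_formats=[0],
)

print(ja3_string(**hello))
print(ja3_hash(**hello))
```

Because JA3 preserves the order of the lists, two clients offering the same extensions in a different order hash differently. JA4 sorts the extensions before hashing, which makes it less sensitive to clients that randomize extension order.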

JavaScript challenges (Cloudflare's managed challenge, hCaptcha, reCAPTCHA) are served when upstream layers raise the suspicion score above a challenge threshold. The challenge requires the client to execute JavaScript in a real browser environment. Headless browsers solve JS challenges unless the challenge includes fingerprint detection that specifically targets headless execution environments, which the more sophisticated implementations do. Solving CAPTCHAs programmatically is a separate problem domain from proxy configuration.

Where it breaks

A CAPTCHA on the first request, before any behavioral signal can accumulate, points to the IP layer or the TLS layer. If the block clears after switching to residential proxies, the IP layer was the trigger. If the block persists with residential IPs, the TLS fingerprint is the more likely source. Testing with a real browser through the same proxy shows which layer is responsible: if the browser passes but the scraper doesn't, the problem is the TLS stack or missing JavaScript execution, not the IP.

A block rate that stays constant after switching proxy type (the same rate with datacenter, residential, ISP, or mobile) indicates the detection layer is not IP-based. Behavioral patterns or request structure are the candidates. Adding jitter to request timing, randomizing navigation sequences, and loading page resources alongside target requests are the variables to test. None of these involve changing the proxy.
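Timing jitter is the simplest of these variables to try. The sketch below draws each pause uniformly around a base delay. The helper names, the 2-second base, and the 50% spread are assumptions for illustration, and uniform jitter is only one possible choice; whether it changes the block rate for a given target is something to measure.

```python
import random
import time


def jittered_delay(base: float, spread: float = 0.5, rng=random) -> float:
    """Draw a delay uniformly from [base * (1 - spread), base * (1 + spread)]."""
    if not 0 <= spread < 1:
        raise ValueError("spread must be in [0, 1)")
    return rng.uniform(base * (1 - spread), base * (1 + spread))


def paced(items, base: float = 2.0, spread: float = 0.5,
          sleep=time.sleep, rng=random):
    """Yield items one by one with a randomized pause between them."""
    for i, item in enumerate(items):
        if i:  # no pause before the first item
            sleep(jittered_delay(base, spread, rng))
        yield item


# Usage: replace print() with the actual request call.
for url in paced(["/page/1", "/page/2", "/page/3"], base=0.01):
    print("fetching", url)
```

The `sleep` and `rng` parameters exist so the pacing can be tested or seeded without real waiting.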

A JavaScript challenge that the scraper cannot resolve after passing the IP and TLS checks means the challenge specifically detects the execution environment: headless browser fingerprinting, missing browser API implementations, canvas or WebGL anomalies. The proxy is not involved in this failure at all.

In context

A proxy addresses the IP layer only. It changes the origin IP, ASN, and associated reputation signals. It does not modify the TLS fingerprint, the browser fingerprint, or the behavioral patterns of the client. For targets where IP-layer detection is the binding constraint, a proxy is sufficient. For targets where TLS or behavioral detection is the binding constraint, adding or upgrading a proxy changes nothing.

Browser automation tools (Puppeteer, Playwright, Selenium) address the JavaScript execution layer by running requests inside a real browser engine. They solve JS challenges that pure HTTP clients cannot. Because the browser engine performs the TLS handshake, they also present that engine's TLS fingerprint rather than an HTTP library's; a mismatch arises mainly when the User-Agent is overridden to claim a different browser. They do not address IP layer detection unless combined with a proxy. Headless mode introduces its own fingerprint signals that sophisticated challenges detect.

Antidetect browsers address the device fingerprint layer specifically: they spoof browser fingerprints, canvas signatures, WebGL parameters, and font rendering to make each session appear as a distinct, real device. They are the right tool when the binding constraint is device-level fingerprinting — multi-accounting at scale, platforms that track device signatures across sessions. They do not address behavioral detection, and they require proxy integration to address the IP layer.

Choose your path

The diagnostic sequence is layer-by-layer, not provider-by-provider. Identify which layer is blocking before selecting a tool. Switching providers within a layer that isn't the constraint is wasted spend.

  • Block clears with residential IP → IP layer was the constraint; proxy type was the fix
  • Block persists with residential IP → IP layer is not the constraint; check TLS next
  • Real browser through same proxy passes; scraper doesn't → TLS fingerprint or JS execution
  • Block rate uniform across residential/mobile/ISP → behavioral or header signals; fix the client
  • CAPTCHA on first request with clean residential IP → TLS fingerprint or challenge threshold set very low
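The checklist above can be written down as a small decision function. This is a sketch of the reasoning only: the `Observations` fields and the returned labels are hypothetical, and the output is a hint about which layer to investigate, not a verdict.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observations:
    # Results of manual tests against one target; all names are hypothetical.
    blocked_on_datacenter: bool
    blocked_on_residential: bool
    real_browser_passes_same_proxy: Optional[bool] = None  # None = not tested
    blocked_on_first_request: bool = False


def likely_layer(o: Observations) -> str:
    """Map test observations to the layer most worth investigating next."""
    if not o.blocked_on_residential:
        # The block cleared (or never appeared) on a residential IP.
        return "ip" if o.blocked_on_datacenter else "none observed"
    if o.real_browser_passes_same_proxy:
        # Same proxy, real browser passes: the client is the difference.
        return "tls fingerprint or js execution"
    if o.blocked_on_first_request:
        # No behavioral history exists yet on a first request.
        return "tls fingerprint or low challenge threshold"
    if o.real_browser_passes_same_proxy is None:
        return "not ip: test a real browser through the same proxy"
    return "behavioral or header signals"


print(likely_layer(Observations(True, False)))
print(likely_layer(Observations(True, True, real_browser_passes_same_proxy=True)))
```

Each branch corresponds to a test that changes one variable at a time, which is what makes the result attributable to a single layer.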