IP Proxy

The IP Proxy Dataset provides structured intelligence on IP addresses associated with residential proxy infrastructure and anonymization services. It is designed to support fraud detection, traffic classification, and risk assessment workflows by allowing organizations to identify IPs that belong to residential proxy networks often used to mask automated or abusive activity.

Unlike real-time API calls, the dataset delivers enriched proxy classification signals in bulk as CSV files, enabling offline processing, internal scoring models, and large-scale traffic analysis. Like other Trustfull datasets, it is provided under license and updated periodically (typically monthly).

Dataset Structure

The dataset may contain the following fields:

ip: The IP address in IPv4 or IPv6 format
is_valid_format: Boolean indicating whether the IP is structurally valid
is_proxy: Boolean indicating proxy detection
is_vpn: Boolean indicating VPN detection
is_tor: Boolean indicating Tor node detection
is_relay: Boolean indicating relay detection
proxy_type: Classification of proxy infrastructure (e.g., residential, datacenter)
updated_at: Date of the last update (YYYY-MM-DD)

Note: The exact schema may vary depending on licensing scope and customization.

Example Data

ip,is_valid_format,is_proxy,is_vpn,is_tor,is_relay,proxy_type,updated_at
203.0.113.10,true,true,true,false,false,residential,2025-03-01
45.12.87.190,true,true,false,false,false,datacenter,2025-03-01
185.220.101.45,true,true,false,true,false,residential,2025-03-01
8.8.8.8,true,false,false,false,false,,2025-03-01
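
For offline processing outside a database, the CSV can be parsed into typed records. The following is a minimal Python sketch using only the standard library. It assumes the column layout shown in the example above; the `ProxyRecord` class, the `parse_bool` and `parse_rows` helpers, and the choice to skip rows flagged as structurally invalid are illustrative assumptions, not part of the dataset specification.

```python
import csv
import io
import ipaddress
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Rows copied from the example data; a real file would be opened from disk.
SAMPLE_CSV = """\
ip,is_valid_format,is_proxy,is_vpn,is_tor,is_relay,proxy_type,updated_at
45.12.87.190,true,true,false,false,false,datacenter,2025-03-01
185.220.101.45,true,true,false,true,false,residential,2025-03-01
8.8.8.8,true,false,false,false,false,,2025-03-01
"""


@dataclass(frozen=True)
class ProxyRecord:  # hypothetical name used for illustration
    ip: str
    is_proxy: Optional[bool]
    is_vpn: Optional[bool]
    is_tor: Optional[bool]
    is_relay: Optional[bool]
    proxy_type: Optional[str]
    updated_at: date


def parse_bool(value: str) -> Optional[bool]:
    """Map 'true'/'false' to bool; treat an empty cell as unknown (None)."""
    text = value.strip().lower()
    if text == "":
        return None
    if text in ("true", "false"):
        return text == "true"
    raise ValueError(f"unexpected boolean value: {value!r}")


def parse_rows(handle) -> List[ProxyRecord]:
    """Read CSV rows from a text handle and return typed records."""
    records = []
    for row in csv.DictReader(handle):
        # Assumption: rows not flagged as structurally valid are skipped.
        if parse_bool(row["is_valid_format"]) is not True:
            continue
        records.append(ProxyRecord(
            # ip_address() validates and normalizes IPv4 and IPv6 text.
            ip=str(ipaddress.ip_address(row["ip"].strip())),
            is_proxy=parse_bool(row["is_proxy"]),
            is_vpn=parse_bool(row["is_vpn"]),
            is_tor=parse_bool(row["is_tor"]),
            is_relay=parse_bool(row["is_relay"]),
            # An empty proxy_type cell becomes None.
            proxy_type=row["proxy_type"].strip() or None,
            updated_at=date.fromisoformat(row["updated_at"]),
        ))
    return records


records = parse_rows(io.StringIO(SAMPLE_CSV))
```

To read a delivered file instead of the inline sample, pass `open(path, newline="")` to `parse_rows`.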

Suggested SQL Schema

Table: `ip_data`

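CREATE TABLE ip_data (
    ip INET PRIMARY KEY,               -- IPv4 or IPv6 address
    is_valid_format BOOLEAN NOT NULL,  -- structural validity of the IP
    is_proxy BOOLEAN,                  -- proxy detection
    is_vpn BOOLEAN,                    -- VPN detection
    is_tor BOOLEAN,                    -- Tor node detection
    is_relay BOOLEAN,                  -- relay detection
    proxy_type VARCHAR(50),            -- e.g. residential, datacenter; NULL when empty
    updated_at DATE NOT NULL           -- date of the last update
);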

Importing into PostgreSQL

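-- The file path is read by the database server process.
-- From psql, \copy with the same options reads a client-side file instead.
COPY ip_data (ip, is_valid_format, is_proxy, is_vpn, is_tor, is_relay, proxy_type, updated_at)
FROM '/path/to/ip_dataset.csv'
WITH (FORMAT csv, HEADER true);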

Use Cases

The IP Proxy Dataset can support:

IP proxy detection at scale
Traffic segmentation (residential vs datacenter infrastructure)
Fraud rule pre-screening before real-time API calls
Historical backtesting of proxy-related risk
Internal model training and ML feature enrichment
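
As an illustration of pre-screening before real-time API calls, the sketch below looks up an incoming IP in an in-memory index built from the dataset and returns a coarse decision. The `INDEX` contents, the `prescreen` function, and the decision labels are hypothetical; an actual policy depends on the organization's own fraud rules.

```python
import ipaddress
from typing import Dict

# Hypothetical in-memory index: normalized IP string -> flags from the CSV.
# In practice this would be built from the delivered dataset file.
INDEX: Dict[str, dict] = {
    "45.12.87.190": {"is_proxy": True, "is_vpn": False, "is_tor": False},
    "185.220.101.45": {"is_proxy": True, "is_vpn": False, "is_tor": True},
    "8.8.8.8": {"is_proxy": False, "is_vpn": False, "is_tor": False},
}


def prescreen(raw_ip: str) -> str:
    """Return 'invalid', 'flag', 'clear', or 'unknown' for an incoming IP."""
    try:
        # Normalize so equivalent spellings of an address share one key.
        key = str(ipaddress.ip_address(raw_ip.strip()))
    except ValueError:
        return "invalid"
    row = INDEX.get(key)
    if row is None:
        # Not covered by the dataset: a candidate for a real-time API call.
        return "unknown"
    if row["is_proxy"] or row["is_vpn"] or row["is_tor"]:
        return "flag"
    return "clear"


decisions = {ip: prescreen(ip) for ip in ["45.12.87.190", "8.8.8.8", "1.1.1.1"]}
```

Under this assumed policy, only IPs that come back as `unknown` would need a real-time lookup.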

Because residential proxy traffic originates from legitimate ISP ranges, it is harder to detect through simple IP reputation checks. Offline classification enables more precise infrastructure-level filtering and correlation with behavioral signals.
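
One way to picture infrastructure-level segmentation is to join request IPs from an access log against the dataset's `proxy_type` column and count traffic per category. The sketch below uses made-up request data; the `segment` function and the `unclassified` label for IPs with no proxy type are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical lookup derived from the dataset: IP -> proxy_type.
PROXY_TYPES: Dict[str, str] = {
    "45.12.87.190": "datacenter",
    "185.220.101.45": "residential",
}

# Made-up request IPs standing in for an access log.
REQUEST_IPS = [
    "45.12.87.190",
    "185.220.101.45",
    "185.220.101.45",
    "8.8.8.8",
    "1.1.1.1",
]


def segment(request_ips: Iterable[str], proxy_types: Dict[str, str]) -> Counter:
    """Count requests per proxy_type; IPs without a type are 'unclassified'."""
    counts: Counter = Counter()
    for ip in request_ips:
        counts[proxy_types.get(ip, "unclassified")] += 1
    return counts


segments = segment(REQUEST_IPS, PROXY_TYPES)
```

The resulting counts can then be compared with behavioral metrics, such as failed logins per segment, collected elsewhere.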