xk6-crawler

    Interface Options

    Options that modify the Crawler's behavior.

    const c = new Crawler({ max_depth: 2, parse_http_error_response: true });
    
    interface Options {
        allow_url_revisit: boolean;
        allowed_domains: string[];
        cache_dir: string;
        check_head: boolean;
        detect_charset: boolean;
        disallowed_domains: string[];
        disallowed_url_filters: string[];
        id: number;
        ignore_robots_txt: boolean;
        max_body_size: number;
        max_depth: number;
        parse_http_error_response: boolean;
        url_filters: string[];
        user_agent: string;
    }
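
    All fields are optional, and any combination of them can be passed to the Crawler constructor. A minimal sketch, assuming the extension is exposed under the conventional k6/x/crawler import path:

    // The import path is an assumption based on the usual xk6 extension convention.
    import { Crawler } from "k6/x/crawler";

    const crawler = new Crawler({
        allowed_domains: ["example.com"],
        max_depth: 2,
        user_agent: "xk6-crawler-example/1.0",
    });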

    Properties

    allow_url_revisit: boolean

    allow_url_revisit allows multiple downloads of the same URL.

    allowed_domains: string[]

    allowed_domains is a domain whitelist. Leave it empty to allow any domain to be visited.

    cache_dir: string

    cache_dir specifies a location where GET requests are cached as files. When it's not defined, caching is disabled.
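
    For example, pointing cache_dir at a writable directory (the path below is purely illustrative) turns on file-based caching of GET requests:

    const cached = new Crawler({ cache_dir: "./crawler-cache" });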

    check_head: boolean

    check_head performs a HEAD request before every GET to pre-validate the response.

    detect_charset: boolean

    detect_charset enables character-encoding detection for non-UTF-8 response bodies that lack an explicit charset declaration.
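
    Both flags are plain booleans and can be combined, for instance to pre-validate every page with a HEAD request and sniff the encoding of legacy responses:

    const careful = new Crawler({ check_head: true, detect_charset: true });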

    disallowed_domains: string[]

    disallowed_domains is a domain blacklist.
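
    For example, the two domain lists can be used together to whitelist the hosts that should be crawled and blacklist those that must never be touched (hostnames below are placeholders):

    const scoped = new Crawler({
        allowed_domains: ["example.com", "blog.example.com"],
        disallowed_domains: ["admin.example.com"],
    });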

    disallowed_url_filters: string[]

    disallowed_url_filters is a list of regular expressions that restrict visiting URLs. If any of the rules matches a URL, the request will be stopped. disallowed_url_filters is evaluated before url_filters. Leave it empty to allow any URL to be visited.
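
    For instance, the following sketch stops any request whose URL matches a logout or admin pattern (the regular expressions are purely illustrative):

    const filtered = new Crawler({
        disallowed_url_filters: ["/logout", "^https?://[^/]+/admin"],
    });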

    id: number

    id is the unique identifier of a crawler.

    ignore_robots_txt: boolean

    ignore_robots_txt allows the Crawler to ignore any restrictions set by the target host's robots.txt file. See http://www.robotstxt.org/ for more information.

    max_body_size: number

    max_body_size is the limit of the retrieved response body in bytes. 0 means unlimited. The default value for max_body_size is 10MB (10 * 1024 * 1024 bytes).
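
    For example, to cap response bodies at 1 MB instead of the 10 MB default:

    const small = new Crawler({ max_body_size: 1024 * 1024 }); // 1 MB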

    max_depth: number

    max_depth limits the recursion depth of visited URLs. Set it to 0 for infinite recursion (default).

    parse_http_error_response: boolean

    parse_http_error_response allows parsing HTTP responses with non-2xx status codes. By default, only successful HTTP responses are parsed; set parse_http_error_response to true to change this.

    url_filters: string[]

    url_filters is a list of regular expressions that restrict visiting URLs. If any of the rules matches a URL, the request won't be stopped. disallowed_url_filters is evaluated before url_filters. Leave it empty to allow any URL to be visited.
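
    A sketch of the two filter lists working together; because disallowed_url_filters is evaluated first, URLs under /private/ are stopped even though they also match the whitelist pattern (both patterns are illustrative):

    const selective = new Crawler({
        url_filters: ["^https://example\\.com/"],
        disallowed_url_filters: ["^https://example\\.com/private/"],
    });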

    user_agent: string

    user_agent is the User-Agent string used by HTTP requests.