Class Crawler

This is the crawler module's main class.

import { Crawler } from "k6/x/crawler";

export default function () {
  const c = new Crawler({ max_depth: 2 });

  c.onHTML("a[href]", (e) => {
    if (e.attr("href").startsWith("/")) {
      e.request.visit(e.attr("href"));
    }
  });

  c.onResponse((r) => {
    console.log(r.status_code, r.request.url);
  });

  c.visit("https://grafana.com");
}

Constructors

  • Creates a new instance of Crawler. Options that modify the behavior of the Crawler can be passed in the optional opts parameter.

    Parameters

    • opts (optional)

      options that modify the behavior of the Crawler

    Returns Crawler

    const c = new Crawler({ max_depth: 2, parse_http_error_response: true });
    

Methods

  • Register a function to be called on every HTML element matched by the selector parameter.

    Processing different HTML elements can be conveniently done using multiple onHTML callbacks.

    Parameters

    • selector: string

      element selector

    • cb: HTMLCallback

      callback function

    Returns void

    c.onHTML("title", (e) => {
      titles[e.request.url] = e.text;
    });

    c.onHTML("a[href]", (e) => {
      e.request.ctx.put("page_href", e.request.url);
      e.request.ctx.put("link_text", e.text);
      e.request.visit(e.attr("href"));
    });
  • Deregister the function associated with the given selector.

    Parameters

    • selector: string

      element selector

    Returns void
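
    A short sketch of deregistering a selector once a link budget is exhausted. This reference does not show the deregistration method's name; `onHTMLDetach` is an assumption, and the budget helper is purely illustrative.

    ```javascript
    // Pure helper: true once the link budget is used up.
    function shouldDetach(seen, maxLinks) {
      return seen >= maxLinks;
    }

    // Follow links until maxLinks have been queued, then stop.
    // NOTE: onHTMLDetach is an assumed method name, not confirmed here.
    function registerBoundedLinks(c, maxLinks) {
      let seen = 0;
      c.onHTML("a[href]", (e) => {
        e.request.visit(e.attr("href"));
        seen++;
        if (shouldDetach(seen, maxLinks)) {
          c.onHTMLDetach("a[href]"); // assumed method name
        }
      });
    }
    ```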

  • Register a function to be called on every request.

    With the onRequest callback function, you can customize the HTTP request before it is executed.

    Parameters

    • cb

      callback function

    Returns void

    c.onRequest((r) => {
      // ...
    });
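
    As a concrete sketch, an onRequest callback can skip requests that would leave the crawled host. The request object's `url` property and `abort()` method are assumptions here, based only on the note below that `request.abort` can be called from onRequest; the host-check helper is illustrative.

    ```javascript
    // Pure helper: true when the URL's host differs from the allowed host.
    function isExternal(url, allowedHost) {
      return new URL(url).hostname !== allowedHost;
    }

    // Abort any request whose host is not the allowed one.
    // NOTE: r.url and r.abort() are assumed accessors, not confirmed here.
    function registerHostFilter(c, allowedHost) {
      c.onRequest((r) => {
        if (isExternal(r.url, allowedHost)) {
          r.abort(); // skip requests that would leave the allowed host
        }
      });
    }
    ```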
  • Register a function to be called on every response.

    The onResponse callback function is called after the response has been received.

    Parameters

    • cb

      callback function

    Returns void

    c.onResponse((r) => {
      // ...
    });
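
    For example, an onResponse callback can tally responses per status code. `r.status_code` and `r.request.url` are taken from the class example above; the counting helper is illustrative.

    ```javascript
    // Pure helper: increment the counter for a status code.
    function tally(counts, code) {
      counts[code] = (counts[code] || 0) + 1;
      return counts;
    }

    // Count every response by status code and log it.
    function registerStatusCounter(c, counts) {
      c.onResponse((r) => {
        tally(counts, r.status_code);
        console.log(r.status_code, r.request.url);
      });
    }
    ```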
  • Register a function to be called on every response once the headers and status code have been received, but before the body is read.

    Like in onRequest, you can call request.abort to abort the transfer. This might be useful if, for example, you're following all hyperlinks, but want to avoid downloading files.

    Parameters

    • cb

      callback function

    Returns void
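
    A sketch of the file-avoidance use case described above: abort the transfer before the body is read when the URL looks like a file download. The registration method's name is not shown in this reference, so `onResponseHeaders` is an assumption; `r.request.abort` follows the note above, and the extension check is a crude illustrative heuristic.

    ```javascript
    // Pure helper: crude check on the URL path's file extension.
    function looksLikeFile(url) {
      return /\.(pdf|zip|png|jpe?g|gif|mp4|tar|gz)$/i.test(new URL(url).pathname);
    }

    // Abort file-like responses before their bodies are downloaded.
    // NOTE: onResponseHeaders is an assumed method name, not confirmed here.
    function registerDownloadFilter(c) {
      c.onResponseHeaders((r) => {
        if (looksLikeFile(r.request.url)) {
          r.request.abort(); // skip the body of file-like responses
        }
      });
    }
    ```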

  • Register a function to be called at the end of scraping. The function is executed after the onHTML callbacks, as the final step of scraping.

    Parameters

    • cb

      callback function

    Returns void
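
    A sketch of logging a summary when scraping finishes. The registration method's name is not shown in this reference, so `onScraped` is an assumption; `r.request.url` follows the pattern of the other callbacks, and the summary helper is illustrative.

    ```javascript
    // Pure helper: human-readable summary line.
    function summary(visited) {
      return `scraped ${visited.length} page(s)`;
    }

    // Record each finished page and log a running summary.
    // NOTE: onScraped is an assumed method name, not confirmed here.
    function registerSummary(c, visited) {
      c.onScraped((r) => {
        visited.push(r.request.url);
        console.log(summary(visited));
      });
    }
    ```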

  • Starts the Crawler's collecting job by creating a request to the URL specified in the url parameter. It also calls the previously registered callbacks.

    Parameters

    • url: string

      start URL

    Returns void

    c.visit("https://grafana.com");