xk6-crawler

    Class Crawler

    This is the crawler module's main class.

    import { Crawler } from "k6/x/crawler";

    export default function () {
      const c = new Crawler({ max_depth: 2 });

      c.onHTML("a[href]", (e) => {
        if (e.attr("href").startsWith("/")) {
          e.request.visit(e.attr("href"));
        }
      });

      c.onResponse((r) => {
        console.log(r.status_code, r.request.url);
      });

      c.visit("https://grafana.com");
    }

    Constructors

    • new Crawler(opts?)

      Creates a new instance of Crawler. Options that modify the behavior of the Crawler can be passed in the optional opts parameter.

      Parameters

      • opts (optional)

        crawler options, such as max_depth

      Returns Crawler

      const c = new Crawler({ max_depth: 2, parse_http_error_response: true });
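To make the effect of max_depth concrete, here is a small simulation over a hypothetical in-memory link graph. It assumes max_depth behaves like the MaxDepth setting of the Go Colly library: the start URL has depth 1, each followed link is one level deeper than the page it was found on, and 0 means no limit. That correspondence is an assumption, not something this page states. The function reachablePages and the graph are illustrative only.

```javascript
// Which pages would a depth-limited crawl reach? Hypothetical simulation.
// Assumption: depth works like Colly's MaxDepth (start URL = depth 1,
// each followed link = parent depth + 1, maxDepth 0 = unlimited).
function reachablePages(graph, start, maxDepth) {
  const depthOf = new Map([[start, 1]]); // URL -> depth at which it is first reached
  const queue = [start];
  while (queue.length > 0) {
    const url = queue.shift();
    const depth = depthOf.get(url);
    // Links on this page would be requested at depth + 1; stop if that exceeds the limit.
    if (maxDepth > 0 && depth + 1 > maxDepth) continue;
    for (const next of graph[url] || []) {
      if (!depthOf.has(next)) { // already-seen URLs are not queued again
        depthOf.set(next, depth + 1);
        queue.push(next);
      }
    }
  }
  return [...depthOf.keys()].sort();
}

// Hypothetical link graph: page -> links found on that page.
const graph = {
  "/": ["/a", "/b"],
  "/a": ["/a/1"],
  "/b": ["/"],
  "/a/1": ["/a/1/x"],
};

console.log(reachablePages(graph, "/", 2)); // "/", "/a", "/b"
console.log(reachablePages(graph, "/", 0)); // all five URLs
```

The sketch assigns each page its shortest link distance (breadth-first). In a real crawl, the depth at which a page is first reached depends on visiting order, so results can differ when a page is reachable through paths of different lengths.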
      

    Methods

    • onHTML(selector, cb)

      Register a function to be called on every HTML element matched by the selector parameter.

      To process different kinds of HTML elements, register multiple onHTML callbacks, one per selector.

      Parameters

      • selector: string

        element selector

      • cb: HTMLCallback

        callback function

      Returns void

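      const titles = {}; // page URL -> page title

      c.onHTML("title", (e) => {
        titles[e.request.url] = e.text;
      });

      c.onHTML("a[href]", (e) => {
        // record where the link was found and its text in the request context
        e.request.ctx.put("page_href", e.request.url);
        e.request.ctx.put("link_text", e.text);
        e.request.visit(e.attr("href"));
      });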
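The introductory example on this page follows every href that starts with "/", which also matches protocol-relative links such as "//cdn.example.com/x.js" that point to other hosts. A stricter filter can resolve each href against the page URL with the WHATWG URL class and keep only same-host http(s) links. The helper sameHostLink is hypothetical, not part of xk6-crawler. Whether URL is available as a global depends on the k6 version; k6's jslib also offers a URL polyfill.

```javascript
// Resolve an href against the page it was found on and keep it only if it
// stays on the same host. Hypothetical helper, not part of xk6-crawler.
function sameHostLink(pageUrl, href) {
  let target;
  try {
    target = new URL(href, pageUrl); // handles "/x", "x", "//host/x" and absolute URLs
  } catch (err) {
    return null; // unparseable href
  }
  const page = new URL(pageUrl);
  // skip mailto:, javascript:, tel:, ...
  if (target.protocol !== "http:" && target.protocol !== "https:") return null;
  if (target.host !== page.host) return null;
  target.hash = ""; // "#section" links point to the same document
  return target.href;
}
```

Inside a callback it could be used as `const link = sameHostLink(e.request.url, e.attr("href")); if (link) e.request.visit(link);`.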
    • onHTMLDetach(selector)

      Deregister the function associated with the given selector.

      Parameters

      • selector: string

        element selector

      Returns void

    • onRequest(cb)

      Register a function to be called on every request.

      With the onRequest callback function, you can customize the HTTP request before it is executed.

      Parameters

      • cb

        callback function, called with the request

      Returns void

      c.onRequest((r) => {
        console.log("visiting", r.url);
      });
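As the onResponseHeaders description notes, request.abort can also be called in onRequest. One use is keeping a crawl within a set of allowed hosts. The sketch below is a hypothetical helper, isAllowedHost, that accepts a host and its subdomains. It assumes the request's URL is readable as r.url, as request.url is in the other examples on this page.

```javascript
// Return true if the URL's hostname equals one of the allowed hosts
// or is a subdomain of one. Hypothetical helper for an onRequest callback.
function isAllowedHost(url, allowedHosts) {
  let hostname;
  try {
    hostname = new URL(url).hostname;
  } catch (err) {
    return false; // not a valid absolute URL
  }
  return allowedHosts.some((h) => hostname === h || hostname.endsWith("." + h));
}
```

A callback might then read `if (!isAllowedHost(r.url, ["grafana.com"])) r.abort();`. Matching on "." + host prevents "notgrafana.com" from passing as a subdomain of "grafana.com".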
    • onResponse(cb)

      Register a function to be called on every response.

      The onResponse callback function is called after the response has been received.

      Parameters

      • cb

        callback function, called with the response

      Returns void

      c.onResponse((r) => {
        console.log(r.status_code, r.request.url);
      });
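A common use of onResponse is collecting statistics. The sketch below is a hypothetical StatusTally class that groups status codes by class (2xx, 3xx, ...) and could be fed `r.status_code` from the callback. This is an assumption based on Colly's ParseHTTPErrorResponse: error responses may reach onResponse only when the parse_http_error_response option is enabled.

```javascript
// Count responses per status class ("2xx", "3xx", ...). Hypothetical helper.
class StatusTally {
  constructor() {
    this.counts = {};
  }

  add(statusCode) {
    const key = `${Math.floor(statusCode / 100)}xx`;
    this.counts[key] = (this.counts[key] || 0) + 1;
  }
}

const tally = new StatusTally();
[200, 204, 301, 404, 200].forEach((s) => tally.add(s));
console.log(tally.counts); // 2xx: 3, 3xx: 1, 4xx: 1
```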
    • onResponseHeaders(cb)

      Register a function to be called on every response once the status and headers have been received, but before the body is read.

      As in onRequest, you can call request.abort to abort the transfer. This is useful, for example, when you follow all hyperlinks but want to avoid downloading files.

      Parameters

      • cb

        callback function

      Returns void
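Deciding whether a body is worth downloading usually comes down to the Content-Type header. The sketch below is a hypothetical isHtmlContentType helper that compares only the media type, ignoring letter case and parameters such as charset. How the header value is read from the onResponseHeaders argument is not documented on this page, so that step is left out. A false result would be the cue to call request.abort.

```javascript
// Return true when a Content-Type value denotes an HTML document.
// Media types are case-insensitive; parameters such as "; charset=utf-8" are ignored.
function isHtmlContentType(contentType) {
  if (typeof contentType !== "string") return false;
  const mediaType = contentType.split(";")[0].trim().toLowerCase();
  return mediaType === "text/html" || mediaType === "application/xhtml+xml";
}
```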

    • onScraped(cb)

      Register a function to be called at the end of scraping. The function runs after the onHTML callbacks, as the final part of the scraping.

      Parameters

      • cb

        callback function

      Returns void
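Because this callback runs after the onHTML callbacks, it is a natural place to report per-page results gathered there. The sketch below uses a hypothetical LinkCounter: an onHTML callback records each link against `e.request.url`, and the end-of-scraping callback reads and clears the count for the page. It assumes the argument to this callback exposes the page URL as `r.request.url`, as responses do in the other examples on this page.

```javascript
// Per-page link counter shared between onHTML and the end-of-scraping callback.
// Hypothetical helper, not part of xk6-crawler.
class LinkCounter {
  constructor() {
    this.counts = new Map(); // page URL -> number of links seen
  }

  record(pageUrl) {
    this.counts.set(pageUrl, (this.counts.get(pageUrl) || 0) + 1);
  }

  // Return the count for a page and forget it, so memory does not grow with the crawl.
  take(pageUrl) {
    const n = this.counts.get(pageUrl) || 0;
    this.counts.delete(pageUrl);
    return n;
  }
}
```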

    • visit(url)

      Start the Crawler's collecting job by creating a request to the URL given in the url parameter. The request also triggers the previously registered callbacks.

      Parameters

      • url: string

        start URL

      Returns void

      c.visit("https://grafana.com");