
Delegate requests to httr2 #20


Draft: wants to merge 1 commit into main


dfalbel (Collaborator) commented Apr 9, 2025

This is a draft PR for discussion. The idea is to route every HTTP request made by ragnar through httr2 and to add a `modify_request` callback that lets users modify each request before it is performed. Users can then share one cache between finding links and reading documents, and can add retries, timeouts, and other request options.
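Internally, the centralization could look roughly like this sketch. `build_request()` and `fetch()` are hypothetical names used only for illustration, not existing ragnar functions; the assumption is that ragnar builds every request with `httr2::request()` and passes it through the user's callback before performing it.

```r
library(httr2)

# Hypothetical internal helper: every request ragnar makes would be built
# here, so a single user-supplied callback can modify all of them.
build_request <- function(url, modify_request = identity) {
  req <- modify_request(request(url))
  # Guard against callbacks that forget to return the request
  if (!inherits(req, "httr2_request")) {
    stop("`modify_request` must return an httr2 request.")
  }
  req
}

# Hypothetical: performing goes through the same helper, so link discovery
# and document reading share caching, retry, and timeout settings.
fetch <- function(url, modify_request = identity) {
  build_request(url, modify_request) |> req_perform()
}
```

Validating the callback's return value in one place gives users a clear error instead of a confusing failure deep inside httr2.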

Here's an example:

```r
modify_request <- function(req) {
  req |>
    httr2::req_cache(path = "cache")
}

links <- ragnar::ragnar_find_links(
  "https://quarto.org",
  depth = 3,
  children_only = FALSE,
  url_filter = function(x) {
    # Keep only links on the official quarto.org site
    stringi::stri_subset(x, regex = "^https://quarto\\.org")
  },
  modify_request = modify_request
)
```

Each link can then be read with:

```r
link <- links[[1]]  # e.g. the first link found above
doc <- link |>
  ragnar::ragnar_read(
    frame_by_tags = "h1",
    modify_request = modify_request
  )
```
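Because the callback receives an ordinary httr2 request, the retries and timeouts mentioned above can be layered onto the same cache. A minimal sketch, assuming the `modify_request` argument proposed in this PR; the retry count and timeout are arbitrary example values.

```r
library(httr2)

# Callback for the proposed `modify_request` argument: cache responses on
# disk, retry failed requests up to 3 times, and give up after 30 seconds.
modify_request <- function(req) {
  req |>
    req_cache(path = "cache") |>
    req_retry(max_tries = 3) |>
    req_timeout(30)
}
```

The same function can be passed to both `ragnar_find_links()` and `ragnar_read()`, so all requests share these settings.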

Another idea would be for `find_links` to return `httr2_response` objects (e.g. something very similar to what `ragnar_perform_spider()` would do; see r-lib/httr2#456), and for `ragnar_read` to also accept `httr2_response` objects.
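If `ragnar_read` accepted `httr2_response` objects, it would need to handle both a URL and an already-fetched response. A sketch of that dispatch, using a hypothetical helper `get_html()` that is not part of ragnar:

```r
library(httr2)

# Hypothetical helper: return the HTML body of either a URL or a response
# that was already fetched (e.g. by a response-returning find_links).
get_html <- function(x, modify_request = identity) {
  if (!inherits(x, "httr2_response")) {
    # A URL: build the request, let the user modify it, then perform it
    x <- request(x) |> modify_request() |> req_perform()
  }
  # Already a response: no second download is needed
  resp_body_string(x)
}

# Usage with a locally constructed response (no network access):
resp <- response(
  status_code = 200,
  headers = list(`Content-Type` = "text/html; charset=utf-8"),
  body = charToRaw("<h1>Hello</h1>")
)
get_html(resp)
```

Reusing the responses would mean each page is downloaded only once, even without a cache.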
