help request: upstream support retry on http custom code #12127

Open
tanzhe-chinamobile-it opened this issue Apr 8, 2025 · 5 comments
Labels
feature-request question label for questions asked by users

tanzhe-chinamobile-it commented Apr 8, 2025

Description

Currently, the upstream Admin API supports retries and retry_timeout. Could it also support retrying when the upstream returns certain HTTP status codes, such as 500/501/502/50x?

I wrote some plugin code, but it does not include the actual retry logic:

    local core = require("apisix.core")
    local ngx = ngx

    local plugin_name = "retry-on-5xx"

    local schema = {
        type = "object",
        properties = {
            -- upstream status codes that should trigger a retry
            status_codes = {
                type = "array",
                default = {500, 501, 502, 503},
                items = {type = "integer"}
            },
            -- only requests using these HTTP methods are retried
            allowed_methods = {
                type = "array",
                default = {"GET", "HEAD"},
                items = {type = "string"}
            }
        }
    }

    local _M = {
        version = 0.1,
        priority = 501,
        name = plugin_name,
        schema = schema,
    }

    function _M.check_schema(conf)
        return core.schema.check(schema, conf)
    end

    -- Returns true only when both the status code and the method
    -- are configured as retryable.
    local function is_retryable(status, codes, method, allowed_methods)
        local status_num = tonumber(status) or 0

        local code_map = {}
        for _, c in ipairs(codes or {}) do
            code_map[c] = true
        end

        local method_map = {}
        for _, m in ipairs(allowed_methods or {}) do
            method_map[m:upper()] = true
        end

        return code_map[status_num] == true
               and method_map[method:upper()] == true
    end

    function _M.header_filter(conf, ctx)
        local status = ctx.var.upstream_status or ngx.status
        local method = ngx.req.get_method()

        -- debug output; the original logged these at error level
        core.log.info("retry check, status: ", status, ", method: ", method,
                      ", status_codes: ", core.json.encode(conf.status_codes),
                      ", allowed_methods: ",
                      core.json.encode(conf.allowed_methods))

        if not is_retryable(status, conf.status_codes,
                            method, conf.allowed_methods) then
            core.log.info("not retryable")
            return
        end

        core.log.info("retry check triggered")
        -- do the retry... (not implemented; this is the open question)
    end

    return _M

Environment

  • APISIX version (run apisix version): v3.2.2
@github-project-automation github-project-automation bot moved this to 📋 Backlog in Apache APISIX backlog Apr 8, 2025
@dosubot dosubot bot added feature-request question label for questions asked by users labels Apr 8, 2025
@tanzhe-chinamobile-it
Author

I mean that when APISIX gets a 500/501/502/503 response and the method is GET (or another HTTP method set in the configuration), it should hold the request and retry the upstream, to cover a brief network outage or a similar transient failure.

@Revolyssup
Contributor

Revolyssup commented Apr 9, 2025

Yes, this can be done. We could add an optional field, allowed_failure, on the upstream. For example, when it is set to 5xx, proxy_retry_deadline would be set to the current time and the retry won't happen. For complex use cases, though, it is recommended to do this in a custom plugin. See whether something like the one below does the job:

    local core = require("apisix.core")
    -- get_last_failure comes from lua-resty-core's ngx.balancer module
    -- and is only valid in the balancer phase (where retries run)
    local get_last_failure = require("ngx.balancer").get_last_failure
    local ngx_now = ngx.now

    local plugin_name = "retry-control"

    local schema = {
        type = "object",
        properties = {
            non_retry_statuses = {
                type = "array",
                items = { type = "integer" },
                default = { 404, 412 } -- Example: Don't retry on these statuses
            }
        }
    }

    local _M = {
        version = 1.0,
        priority = 999, -- Set appropriate priority
        name = plugin_name,
        schema = schema,
    }

    function _M.check_schema(conf)
        return core.schema.check(schema, conf)
    end

    function _M.before_proxy(conf, ctx)
        -- Only execute during retries (picked_server is unset on a retry)
        if not ctx.picked_server then
            local state, code = get_last_failure()

            -- Check if status code is in non-retry list
            if code and core.table.array_find(conf.non_retry_statuses, code) then
                core.log.warn("Aborting retry due to status code: ", code)
                ctx.proxy_retry_deadline = ngx_now() -- Force retry timeout
            end
        end

        return true
    end

    return _M

@Revolyssup Revolyssup self-assigned this Apr 9, 2025
@tanzhe-chinamobile-it
Author

@Revolyssup Thank you so much for your attention, but I have some questions:

  1. What is the before_proxy phase?
  2. Why must the condition be "not ctx.picked_server"?
  3. Why does ctx.proxy_retry_deadline = ngx_now() abort the retry instead of retrying immediately (which is the intuitive reading of the code)?
  4. Maybe we can add a method option, because most upstream service interfaces do not handle idempotency well.

Massive thanks for volunteering your skills! 🚀

@Revolyssup
Contributor

@tanzhe-chinamobile-it

  1. before_proxy is a plugin phase that is executed just before the request is sent to the upstream.
  2. ctx.picked_server is set only on the first request, in the access phase. On retries, the request goes back to the balancer phase and this field is not set, which tells us that this is a retry.
  3. ctx.proxy_retry_deadline = ngx_now() makes sure that there are no more tries after the current one. The current try is still sent. If it fails and the request comes back to the balancer phase, the deadline check there finds that the deadline has been exceeded.
  4. Can you rephrase this point? I didn't get it.
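
To make points 2 and 3 concrete, here is a minimal, self-contained sketch of the flow as described in this thread. It is not APISIX source code: balancer_phase, before_proxy, last_status and the timestamps are hypothetical stand-ins, and only the field names picked_server and proxy_retry_deadline come from the discussion above.

```lua
-- Simplified model of the retry flow (assumption: mirrors the explanation
-- in this thread, not the real APISIX implementation).

-- Hypothetical plugin hook: on a retry, forbid further retries if the
-- previous try failed with a status we do not want to retry.
local function before_proxy(ctx, now)
    if ctx.last_status == 404 then
        ctx.proxy_retry_deadline = now
    end
end

-- Hypothetical balancer step, called once per try.
local function balancer_phase(ctx, now)
    if ctx.picked_server then
        -- First try: the server was already picked in the access phase.
        ctx.picked_server = nil
        return "first try sent"
    end

    -- Retry: the deadline is checked before anything else happens.
    if ctx.proxy_retry_deadline and ctx.proxy_retry_deadline < now then
        return "retry aborted"
    end

    before_proxy(ctx, now)
    return "retry sent"
end

local ctx = { picked_server = "127.0.0.1:8080" }
local results = {}

results[1] = balancer_phase(ctx, 10.0) -- first try
ctx.last_status = 404                  -- pretend the first try failed
results[2] = balancer_phase(ctx, 10.1) -- retry; deadline is set to 10.1 here
results[3] = balancer_phase(ctx, 10.2) -- 10.1 < 10.2, so this one is cut off

for i, r in ipairs(results) do
    print(i, r)
end
```

In this model the try that sets the deadline still goes out, and only the following one is rejected, which is why setting the deadline to "now" stops retries rather than triggering one.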

@tanzhe-chinamobile-it
Author

@Revolyssup
Thank you for explaining the code to me! I've learned a lot!
About the 4th point: for example, an upstream has two interfaces, one GET and one POST. The GET interface is naturally idempotent, while the POST interface has to do something extra to be idempotent (and most likely does not).
So maybe we can use GET/HEAD as the default methods for retry, and allow adding more methods when we know that all of the upstream service's interfaces are idempotent.
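
A rough sketch of that idea in plain Lua is shown below. The option names retry_methods and retry_statuses are hypothetical and are not existing APISIX fields. The defaults follow the plugin draft at the top of this issue.

```lua
-- Hypothetical defaults: only idempotent methods are retried unless the
-- operator explicitly opts in to more.
local DEFAULT_METHODS = { "GET", "HEAD" }
local DEFAULT_STATUSES = { 500, 501, 502, 503 }

-- Turn a list into a lookup table, optionally normalizing each value.
local function to_set(list, normalize)
    local set = {}
    for _, v in ipairs(list) do
        set[normalize and normalize(v) or v] = true
    end
    return set
end

-- Returns true when both the method and the status code are retryable
-- under the given (hypothetical) configuration.
local function may_retry(method, status, conf)
    local methods = to_set(conf.retry_methods or DEFAULT_METHODS, string.upper)
    local statuses = to_set(conf.retry_statuses or DEFAULT_STATUSES)
    return methods[method:upper()] == true
           and statuses[tonumber(status)] == true
end

-- Default config: GET is retried, POST is not.
print(may_retry("GET", 502, {}))
print(may_retry("POST", 502, {}))
-- Operator knows the POST interface is idempotent and opts in.
print(may_retry("POST", 502, { retry_methods = { "GET", "HEAD", "POST" } }))
```

Keeping POST out of the defaults means a non-idempotent interface is never retried by accident. Adding it is an explicit decision by whoever configures the upstream.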
