#' Plot f-differential privacy trade-off functions
#'
#' Produce a comparative plot of one or more (analytic or empirical) f-differential privacy trade-off functions.
#'
#' This is the main plotting function in the package, which produces plots of f-differential privacy (f-DP) trade-off functions in the style shown in the original f-DP paper (Dong et al., 2022).
#' For a reminder of the formal definition of f-DP, see the "Formal definition" section further down this documentation page.
#'
#' The `...` arguments define the trade-off functions to be plotted and can be:
#'
#' * Built-in analytic trade-off function generators such as [gdp()], [epsdelta()], [lap()].
#' * User-defined functions defining trade-off functions.
#' * Data frames containing an `alpha` and `beta` column.
#' * Numeric vectors interpreted as a sequence of `beta` values over a canonical grid of Type-I error rates `alpha = seq(0, 1, by = 0.01)`.
#'
#' We cover each of these cases in more detail in the subsequent sub-sections.
#' After that is a discussion of the two main approaches to modifying the legend labels.
#'
#' ## Built-in analytic trade-off function generators
#'
#' Most built-in trade-off function generators will take one or more arguments specifying the level of differential privacy, for example, `gdp(0.5)` corresponding to \eqn{\mu=0.5}-Gaussian differential privacy.
#'
#' These function calls can be passed directly, e.g. `fdp(gdp(0.5))`, and will automatically provide suitable legend names in the plot, including the detail of any argument specification.
#' So the example `fdp(gdp(0.5))` results in a legend label "0.5-GDP".
#'
#' ## User-defined trade-off functions
#'
#' Custom trade-off functions should accept a vector of Type-I error values, \eqn{\alpha}, and return the corresponding vector of Type-II error values, \eqn{\beta}.
#' In the simplest case, the user-defined function will accept a single argument, so in the (unrealistic) perfect privacy setting:
#'
#' ```r
#' my_fdp <- function(a) {
#'   1 - a
#' }
#' ```
#'
#' This can then be plotted by calling `fdp(my_fdp)`.
#'
#' However, often there will be a need to pass additional arguments.
#' This is supported using the direct calling mechanism, so assume an axis offset is required for the above unrealistic example:
#'
#' ```r
#' my_fdp <- function(a, off) {
#'   pmax(0, 1 - a - off)
#' }
#' ```
#'
#' This is now called by using the dummy variable `alpha` (which need not be defined in your calling environment), `fdp(my_fdp(alpha, 0.1))`, which will produce the trade-off function curve with offset 0.1.
#'
#' ## Data frames
#'
#' One need not define a trade-off function explicitly: it can be implicitly defined by giving a set of coordinates \eqn{\{(\alpha_i, \beta_i)\}_{i=1}^n} in a two-column data frame with columns named `alpha` and `beta`.
#' These coordinates will be linearly interpolated to produce the trade-off function curve.
#' For example, the data frame
#'
#' ```r
#' my_fdp <- data.frame(alpha = c(0, 0.25, 1), beta = c(1, 0.25, 0))
#' ```
#'
#' can be used to produce the f-DP curve corresponding to \eqn{\varepsilon\approx1.09861}-differential privacy by then calling `fdp(my_fdp)`.
#' Of course, that particular example is more easily produced using the built-in analytic trade-off function generator [epsdelta()] by calling `fdp(epsdelta(1.09861))`.
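#'
#' As a quick check of that correspondence, the sketch below evaluates the standard \eqn{(\varepsilon, 0)}-DP trade-off function \eqn{\max(0, 1 - e^\varepsilon\alpha, e^{-\varepsilon}(1 - \alpha))} from Dong et al. (2022) at the three coordinates, using base R only.
#' The helper name `eps_tradeoff` is hypothetical and is not part of this package.
#'
#' ```r
#' # Hypothetical helper: trade-off function of pure epsilon-DP (delta = 0)
#' eps_tradeoff <- function(a, eps) {
#'   pmax(0, 1 - exp(eps) * a, exp(-eps) * (1 - a))
#' }
#'
#' # eps = log(3) is approximately 1.09861
#' eps_tradeoff(c(0, 0.25, 1), eps = log(3))
#' ```
#'
#' Up to floating-point rounding, the returned values agree with the `beta` column above, because the curve is piecewise linear with a single kink at \eqn{\alpha = 0.25}.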
#'
#' ## Numeric vectors
#'
#' Finally, it is possible to simply provide a vector of \eqn{\beta} values at the grid of \eqn{\alpha} values that `fdp()` uses internally for plotting --- that is, at the values `seq(0.0, 1.0, by = 0.01)`.
#' For example,
#'
#' ```r
#' a <- seq(0.0, 1.0, by = 0.01)
#' my_fdp <- 1 - a
#' ```
#'
#' would then produce the (unrealistic) perfect privacy curve by calling `fdp(my_fdp)`.
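#'
#' A less trivial vector can be built from any closed-form trade-off function.
#' The sketch below, which assumes only base R, tabulates the \eqn{\mu}-GDP trade-off function \eqn{\Phi(\Phi^{-1}(1 - \alpha) - \mu)} of Dong et al. (2022) on the same grid; the names `mu` and `my_gdp` are illustrative.
#'
#' ```r
#' a <- seq(0.0, 1.0, by = 0.01)
#' mu <- 1
#' # Phi(Phi^{-1}(1 - alpha) - mu), evaluated at every grid point
#' my_gdp <- pnorm(qnorm(1 - a) - mu)
#' length(my_gdp) # 101, matching the internal grid
#' ```
#'
#' Calling `fdp(my_gdp)` should then draw a curve that overlays `fdp(gdp(1))`, with the default legend label "my_gdp".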
#'
#' ## Legend labels
#'
#' As discussed above, built-in analytic trade-off function generators will provide automatic legend labels that make sense for their particular trade-off function.
#' In all other cases, the default will be for the legend label to equal the function, data frame, or numeric vector variable name used when calling `fdp()`.
#' Thus, in all the examples above where `my_fdp` was used as the name of the function/data frame/vector the default legend label will be simply "my_fdp".
#'
#' This default can be overridden in two ways:
#'
#' 1.  by using an argument name.
#'     For example, to set the legend label to "hello" in the user-defined function with offset, one would call `fdp(hello = my_fdp(alpha, 0.1))`.
#'     This also works with spaces or special characters by using backtick quoted argument names, for example `` fdp(`So cool!` = my_fdp(alpha, 0.1)) ``.
#' 1.  by modifying the object passed with [fdp_name()] in advance.
#'     See the help file for that function for further details.
#'
#' ## Drawing method and validation
#'
#' By default, built-in and user-defined function arguments will be plotted as a trade-off function curve.
#' This means that they will first be checked to ensure the rendered line is indeed a valid trade-off function: that is, convex, non-increasing and no greater than \eqn{1-\alpha}.
#' Continuity, the remaining defining property, cannot technically be checked with a finite number of calls to a black-box function.
#' If a problem is detected an error will be thrown.
#' **Note** that due to the finite precision nature of computers, it might be that these validity checks throw a false alarm, in which case you may use the `.tol` argument to increase the tolerance within which these validity checks must pass.
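#'
#' To make the checked conditions concrete, here is a minimal base R sketch of how such checks could be carried out on grid values.
#' It is an illustration only, not this package's internal code, and the helper name `is_tradeoff` and its argument `tol` are hypothetical.
#'
#' ```r
#' # Hypothetical checker: alpha must be sorted increasingly, beta = f(alpha)
#' is_tradeoff <- function(alpha, beta, tol = sqrt(.Machine$double.eps)) {
#'   slopes <- diff(beta) / diff(alpha)
#'   c(
#'     below_diagonal = all(beta <= 1 - alpha + tol),
#'     non_increasing = all(slopes <= tol),
#'     convex = all(diff(slopes) >= -tol) # slopes never decrease
#'   )
#' }
#'
#' a <- seq(0, 1, by = 0.01)
#' is_tradeoff(a, pmax(0, 1 - a - 0.1)) # passes all three checks
#' is_tradeoff(a, 1 - a^2)              # exceeds 1 - alpha and is concave
#' ```
#'
#' Each comparison is relaxed by `tol`, which is the role the `.tol` argument plays when rounding error would otherwise trigger a false alarm.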
#'
#' In contrast, data frame/vector arguments are plotted differently depending on their size.
#' If there are at least 100 rows/elements then these will be treated in the same way as built-in and user-defined function arguments, with trade-off function validity checks.
#' However, if there are fewer rows/elements, then these will be treated as merely a collection of points, the only check being that they all lie below the \eqn{\beta = 1-\alpha} line.
#' Those points will then be plotted, together with the lower convex hull which corresponds to the lower bounding trade-off function for that collection of points.
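#'
#' A lower convex hull of this kind can be computed with the monotone chain scan of Andrew (1979).
#' The base R sketch below shows the idea on a plain set of points; the function name `lower_hull_sketch` is hypothetical, and this is not necessarily how the package computes the hull or treats the end points.
#'
#' ```r
#' # Hypothetical helper: df has numeric columns alpha and beta
#' lower_hull_sketch <- function(df) {
#'   df <- df[order(df$alpha, df$beta), ]
#'   keep <- integer(0)
#'   for (i in seq_len(nrow(df))) {
#'     # Drop the last kept point while it lies on or above the chord
#'     # joining its predecessor to the new point
#'     while (length(keep) >= 2L) {
#'       o <- keep[length(keep) - 1L]
#'       p <- keep[length(keep)]
#'       cross <- (df$alpha[p] - df$alpha[o]) * (df$beta[i] - df$beta[o]) -
#'         (df$beta[p] - df$beta[o]) * (df$alpha[i] - df$alpha[o])
#'       if (cross > 0) break
#'       keep <- keep[-length(keep)]
#'     }
#'     keep <- c(keep, i)
#'   }
#'   df[keep, ]
#' }
#'
#' pts <- data.frame(alpha = c(0, 0.2, 0.5, 0.6, 1),
#'                   beta = c(1, 0.5, 0.45, 0.2, 0))
#' lower_hull_sketch(pts) # (0.5, 0.45) lies above the hull and is dropped
#' ```
#'
#' The surviving points have non-decreasing slopes between them, so joining them linearly gives a convex curve lying on or below every supplied point.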
#'
#' This default behaviour of validating and drawing a line versus computing lower convex hull and plotting points can be controlled with the [fdp_point()] and [fdp_line()] functions.
#' See those help files for further details.
#'
#' A final performance note: all function type arguments are evaluated on a uniform grid `alpha = seq(0, 1, 0.01)`.
#' To use a custom resolution, supply an explicit data frame instead of a function.
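#'
#' For instance, a finer grid could be tabulated by hand as in the sketch below, which reuses the offset example `my_fdp(a, off)` from earlier in these details; the object names `a_fine` and `my_fine` are illustrative.
#'
#' ```r
#' my_fdp <- function(a, off) {
#'   pmax(0, 1 - a - off)
#' }
#'
#' # Ten times finer than the default grid: 1001 points instead of 101
#' a_fine <- seq(0, 1, by = 0.001)
#' my_fine <- data.frame(alpha = a_fine, beta = my_fdp(a_fine, 0.1))
#' nrow(my_fine) # 1001
#' ```
#'
#' Passing this as `fdp(my_fine)` would be expected to draw a validated line rather than points, since the data frame has at least 100 rows.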
#'
#' # Formal definition (Dong et al., 2022)
#'
#' For any two probability distributions \eqn{P} and \eqn{Q} on the same space, the trade-off function
#' \deqn{T(P,Q) \colon [0,1] \to [0,1]}
#' characterises the optimal relationship between Type I and Type II errors in a hypothesis test distinguishing between them. It is defined as:
#' \deqn{T(P, Q)(\alpha) = \inf \left\{ \beta_\phi \colon \alpha_\phi \leq \alpha \right\}}
#' where the infimum is taken over all measurable rejection rules \eqn{\phi}.
#' The terms \eqn{\alpha_\phi = \mathbb{E}_P[\phi]} and \eqn{\beta_\phi = 1 - \mathbb{E}_Q[\phi]} represent the Type I and Type II errors, respectively.
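#'
#' As a worked illustration of this definition, take \eqn{P = N(0, 1)} and \eqn{Q = N(\mu, 1)} with \eqn{\mu > 0}.
#' By the Neyman-Pearson lemma the optimal tests reject for large observations, so sweeping a rejection threshold traces out the trade-off function, which has the closed form \eqn{\Phi(\Phi^{-1}(1 - \alpha) - \mu)} (Dong et al., 2022).
#' The base R sketch below checks this numerically; all variable names are illustrative.
#'
#' ```r
#' mu <- 1
#' thr <- seq(-3, 4, by = 0.5)       # rejection thresholds: reject if x > thr
#' alpha_thr <- 1 - pnorm(thr)       # Type I error under P = N(0, 1)
#' beta_thr <- pnorm(thr, mean = mu) # Type II error under Q = N(mu, 1)
#'
#' # Closed form of T(P, Q) evaluated at the same Type I error rates
#' closed_form <- pnorm(qnorm(1 - alpha_thr) - mu)
#' all.equal(beta_thr, closed_form)
#' ```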
#'
#' A function \eqn{f \colon [0,1] \to [0,1]} is a trade-off function if and only if it is convex, continuous, non-increasing, and satisfies \eqn{f(x) \le 1-x} for all \eqn{x \in [0,1]}.
#'
#' In the context of differential privacy, we are interested in the distributions of the output of a randomised algorithm when run on two neighbouring datasets (datasets that differ in a single record), \eqn{S} and \eqn{S'}. Let \eqn{M} be a randomised algorithm which has output probability distribution denoted \eqn{M(S)} when applied to dataset \eqn{S}. Then, each pair of neighbouring datasets generates a specific trade-off function \eqn{T(M(S), M(S'))} which characterises how hard it is to distinguish between whether dataset \eqn{S} or \eqn{S'} has been used to produce the released output. Considering all possible neighbouring datasets leads to a family of trade-off functions, the lower bound of which determines the privacy of the randomised algorithm.
#'
#' More formally, let \eqn{f} be a trade-off function.
#' A randomised algorithm \eqn{M} is said to be \eqn{f}-differentially private (f-DP) if for any pair of neighbouring datasets \eqn{S} and \eqn{S'}, the following condition holds:
#' \deqn{T(M(S), M(S')) \ge f}
#' This definition means that the task of distinguishing whether the mechanism was run on dataset \eqn{S} or its neighbour \eqn{S'} is at least as difficult as distinguishing between two canonical distributions whose trade-off function is \eqn{f}.
#'
#' Therefore, this function is concerned with plotting \eqn{T(P,Q) \colon [0,1] \to [0,1]} or \eqn{f \colon [0,1] \to [0,1]}.
#' That is, plotting a function which returns the smallest Type-II error for a specified Type-I error rate.
#'
#' @references
#' Andrew, A. M. (1979). “Another efficient algorithm for convex hulls in two dimensions”. _Information Processing Letters_, **9**(5), 216–219. \doi{10.1016/0020-0190(79)90072-3}.
#'
#' Dong, J., Roth, A. and Su, W.J. (2022). “Gaussian Differential Privacy”. _Journal of the Royal Statistical Society Series B_, **84**(1), 3–37. \doi{10.1111/rssb.12454}.
#'
#' @param ...
#'        One or more f-DP trade-off specifications. Each argument can be a:
#'        \itemize{
#'          \item function (user-defined or built-in, e.g. [gdp()], [epsdelta()], [lap()], etc) that when called with a numeric vector `alpha` returns a data frame with columns `alpha` and `beta`;
#'          \item data frame with columns `alpha` and `beta`;
#'          \item numeric vector of length equal to the internal alpha grid (interpreted as `beta`).
#'        }
#'        Arguments may be named to control legend labels.
#'        See Details for full explanation of different ways to pass these arguments.
#' @param .legend
#'        Character string giving the legend title.
#'        Use `NULL` (default) for no title.
#' @param .tol
#'        Numeric tolerance used when:
#'        \itemize{
#'          \item Validating \eqn{\beta}, `beta <= 1 - alpha + .tol`.
#'          \item Checking convexity for objects forced to draw as lines.
#'        }
#'
#' @return
#' A `ggplot2` object of class `c("fdp_plot", "gg", "ggplot")` displaying the supplied trade-off functions (and points, if applicable).
#' It can be further modified with additional `ggplot2` layers or combined with other `fdp_plot` objects using `+`.
#'
#' @export
#'
#' @examples
#' # Plotting mu=1 Gaussian differential privacy
#' fdp(gdp(1))
#'
#' # Plotting the f_(epsilon,delta) curve corresponding to (1, 0.1)-differential privacy
#' fdp(epsdelta(1, 0.1))
#'
#' # These can be plotted together for comparison
#' fdp(gdp(1), epsdelta(1, 0.1))
#'
#' # The same curves with custom labels and a custom legend header
#' fdp("Gaussian DP" = gdp(1),
#'     "Classical DP" = epsdelta(1, 0.1),
#'     .legend = "Methods")
#'
#' # Alternatively, combine separate fdp() calls using +
#' fdp(gdp(1)) + fdp(epsdelta(1, 0.1))
fdp <- function(..., .legend = NULL, .tol = sqrt(.Machine$double.eps)) {
  # Grid of alpha we evaluate on for function arguments
  alpha <- seq(0.0, 1.0, by = 0.01)

  # Preprocess args to convert everything into values
  dotargs <- as.list(substitute(list(...)))[-1L]
  x <- preprocess_args(dotargs, alpha, .tol)
  if (length(x) == 0L)
    return(invisible(NULL))

  p <- ggplot2::ggplot() +
    ggplot2::lims(x = c(-0.01, 1.01), y = c(-0.01, 1.01)) +
    ggplot2::coord_fixed(ratio = 1.0) +
    ggplot2::geom_function(fun = \(xx) 1.0 - xx, linetype = 2L, colour = "grey", xlim = c(0.0, 1.0)) +
    ggplot2::theme_minimal() +
    ggplot2::labs(x = "Type-I error", y = "Type-II error")

  lns <- pts <- list()
  nms <- NULL
  for (i in seq_along(x)) {
    nms <- c(nms, fdp_name(x[[i]]))
    if (attr(x[[i]], "fdp_draw") == "point") {
      # Points
      if (!(attr(x[[i]], "fdp_hide_point") %||% FALSE)) {
        pts <- c(pts,
                 list(cbind(item = fdp_name(x[[i]]),
                            x[[i]])))
      }
      # Lower convex hull
      lns <- c(lns,
               list(cbind(item = fdp_name(x[[i]]),
                          lower_hull(x[[i]]))))
    } else if (attr(x[[i]], "fdp_draw") == "line") {
      if (!isTRUE(all.equal(stats::approx(lower_hull(x[[i]]), xout = x[[i]][, 1L])$y, x[[i]][, 2L], tolerance = .tol))) {
        cli::cli_abort(c(x = "Argument {i} (named {attr(x[[i]], 'fdp_name')}) is to be drawn as a line, but is not convex (ie not a trade-off function). Either there is an error or this should be passed with {.fn fdp_point}."))
      }
      lns <- c(lns,
               list(cbind(item = fdp_name(x[[i]]),
                          x[[i]])))
    } else {
      cli::cli_abort(c(x = "Argument {i} (named {attr(x[[i]], 'fdp_name')}) has unknown {.code fdp_draw} attribute set."))
    }
  }
  for (i in seq_along(lns)) {
    p <- p +
      ggplot2::geom_line(ggplot2::aes(x = .data$alpha, y = .data$beta, col = .data$item), lns[[i]])
  }
  for (i in seq_along(pts)) {
    p <- p +
      ggplot2::geom_point(ggplot2::aes(x = .data$alpha, y = .data$beta, col = .data$item), pts[[i]], size = 0.5, shape = 4L, stroke = 1.5)
  }
  p <- p + ggplot2::scale_colour_discrete(name = .legend, breaks = nms)

  # Store processed data for future combination with other fdp plots
  # Put fdp_plot class FIRST to ensure S3 method dispatch works before S7
  class(p) <- c("fdp_plot", setdiff(class(p), "fdp_plot"))
  attr(p, "fdp_data") <- x
  attr(p, "fdp_legend") <- .legend
  attr(p, "fdp_tol") <- .tol

  p
}

#' Combine fdp plots
#'
#' @description
#' Allows combining multiple `fdp()` plot objects using the `+` operator.
#'
#' @param e1 An `fdp_plot` object (the result of calling `fdp()`)
#' @param e2 Either another `fdp_plot` object or a `ggplot2` layer
#'
#' @return
#' If `e2` is an `fdp_plot`, returns a new combined `fdp_plot` object.
#' If `e2` is a `ggplot2` layer, returns a modified `ggplot2` object.
#'
#' @export
#'
#' @examples
#' # Combine two separate fdp() calls
#' fdp(gdp(0.5)) + fdp(lap(1))
#'
#' # Can still add regular ggplot2 layers
#' fdp(gdp(1)) + ggplot2::ggtitle("My Privacy Plot")
#'
#' # First legend naming takes precedence
#' fdp(gdp(0.5), .legend = "First") + fdp(lap(1), .legend = "Second")
#' # Later .legend arguments apply if none specified in prior calls
#' fdp(gdp(0.5)) + fdp(lap(1), .legend = "Second")
`+.fdp_plot` <- function(e1, e2) {
  # If e2 is also an fdp_plot, combine them
  if (inherits(e2, "fdp_plot")) {
    # Extract data from both plots
    data1 <- attr(e1, "fdp_data")
    data2 <- attr(e2, "fdp_data")
    legend1 <- attr(e1, "fdp_legend")
    legend2 <- attr(e2, "fdp_legend")
    tol <- attr(e1, "fdp_tol") # Use tolerance from first plot

    # Use legend from second plot if first is NULL, otherwise keep first
    final_legend <- if (is.null(legend1)) legend2 else legend1

    # Combine the data lists
    combined_data <- c(data1, data2)

    # Deduplicate names in the combined data
    combined_data <- deduplicate_names(combined_data)

    # Rebuild the plot with combined data
    p <- ggplot2::ggplot() +
      ggplot2::lims(x = c(-0.01, 1.01), y = c(-0.01, 1.01)) +
      ggplot2::coord_fixed(ratio = 1.0) +
      ggplot2::geom_function(fun = \(xx) 1.0 - xx, linetype = 2L, colour = "grey", xlim = c(0.0, 1.0)) +
      ggplot2::theme_minimal() +
      ggplot2::labs(x = "Type-I error", y = "Type-II error")

    lns <- pts <- list()
    nms <- NULL
    for (i in seq_along(combined_data)) {
      nms <- c(nms, fdp_name(combined_data[[i]]))
      if (attr(combined_data[[i]], "fdp_draw") == "point") {
        # Points
        if (!(attr(combined_data[[i]], "fdp_hide_point") %||% FALSE)) {
          pts <- c(pts,
                   list(cbind(item = fdp_name(combined_data[[i]]),
                              combined_data[[i]])))
        }
        # Lower convex hull
        lns <- c(lns,
                 list(cbind(item = fdp_name(combined_data[[i]]),
                            lower_hull(combined_data[[i]]))))
      } else if (attr(combined_data[[i]], "fdp_draw") == "line") {
        lns <- c(lns,
                 list(cbind(item = fdp_name(combined_data[[i]]),
                            combined_data[[i]])))
      }
    }
    for (i in seq_along(lns)) {
      p <- p +
        ggplot2::geom_line(ggplot2::aes(x = .data$alpha, y = .data$beta, col = .data$item), lns[[i]])
    }
    for (i in seq_along(pts)) {
      p <- p +
        ggplot2::geom_point(ggplot2::aes(x = .data$alpha, y = .data$beta, col = .data$item), pts[[i]], size = 0.5, shape = 4L, stroke = 1.5)
    }
    p <- p + ggplot2::scale_colour_discrete(name = final_legend, breaks = nms)

    # Store combined data for further combinations
    # Put fdp_plot class FIRST to ensure S3 method dispatch works before S7
    class(p) <- c("fdp_plot", setdiff(class(p), "fdp_plot"))
    attr(p, "fdp_data") <- combined_data
    attr(p, "fdp_legend") <- final_legend
    attr(p, "fdp_tol") <- tol

    return(p)
  }

  # Otherwise, fall back to default ggplot2 behavior
  # (adding a regular ggplot2 layer to an fdp_plot)
  # Remove fdp_plot class temporarily to allow ggplot2's + method to work
  orig_class <- class(e1)
  class(e1) <- setdiff(class(e1), "fdp_plot")
  result <- e1 + e2
  # Restore fdp_plot class and attributes
  class(result) <- orig_class
  attr(result, "fdp_data") <- attr(e1, "fdp_data")
  attr(result, "fdp_legend") <- attr(e1, "fdp_legend")
  attr(result, "fdp_tol") <- attr(e1, "fdp_tol")
  result
}
