---
title: "Extending factoextra to support new analysis backends"
author: "Alboukadel Kassambara"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 3
vignette: >
  %\VignetteIndexEntry{Extending factoextra to support new analysis backends}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  # All chunks are illustrative: they document the extension contract rather
  # than execute backend code, so the vignette builds without any optional
  # backend package (ExPosition, ade4, vegan, ...) installed.
  eval = FALSE
)
```

## Why this guide exists

`factoextra` visualizes the output of many different multivariate-analysis
packages (FactoMineR, stats, ade4, ca, MASS, ExPosition) through **one unified,
ggplot2-based interface**. The same `fviz_pca_*()`, `fviz_ca_*()`,
`fviz_mca_*()` functions work whether your PCA came from `stats::prcomp()`,
`FactoMineR::PCA()`, `ade4::dudi.pca()` or `ExPosition::epPCA()`.

This is possible because of a strict two-layer design. Adding a **new backend**
(say, an analysis object from another package) almost never requires touching the
plotting code: you only teach a handful of *extractor* functions how to read the
new object. This vignette documents that contract precisely so you can add a
backend yourself and submit it as a pull request.

## Quick start: bring your own coordinates with `as_factoextra_pca()`

Before writing any backend code, check whether the **constructor** is all you
need. If you already have coordinates from *any* method, wrap them with
`as_factoextra_pca()` and the whole `fviz_pca_*()` family works immediately — no
source edits, no S3 methods.

The constructor takes individual coordinates (`ind.coord`) and, optionally,
variable coordinates/loadings (`var.coord`) and eigenvalues (`eig`). Squared
cosines and contributions are derived from the coordinates when you don't supply
them.

Here we *illustrate* it with a base-R PCA (`stats::prcomp()`) — passing its
scores, loadings and eigenvalues through the constructor reproduces a standard
factoextra biplot:

```{r quick-start-prcomp}
library(factoextra)

# A base-R PCA (any analysis that yields coordinates would do)
pca <- prcomp(iris[, -5], scale. = TRUE)

# Wrap the coordinates into a factoextra-ready object
res <- as_factoextra_pca(
  ind.coord = pca$x,         # individual scores  (n x k)
  var.coord = pca$rotation,  # variable loadings  (p x k)
  eig       = pca$sdev^2     # eigenvalues        (optional)
)

# Now the full fviz_pca_* family works on `res`
fviz_pca_biplot(res, label = "var", habillage = iris$Species,
                addEllipses = TRUE)
fviz_eig(res)
fviz_contrib(res, choice = "var", axes = 1)
```

The same pattern turns the output of methods factoextra doesn't natively know
about into elegant biplots — e.g. classical MDS or a UMAP/t-SNE embedding
(coordinates only; supply `var.coord` only if your method has variable loadings):

```{r quick-start-mds}
mds <- stats::cmdscale(dist(scale(mtcars)), k = 3)   # classical MDS
fviz_pca_ind(as_factoextra_pca(ind.coord = mds), repel = TRUE)
```

When you don't pass `eig`, it defaults to the variance of each coordinate column
(the natural eigenvalue), so the scree plot and axis percentages still work. When
you don't pass `cos2`/`contrib`, they are computed from the coordinates
(`cos2` is then the quality of representation *within the supplied dimensions*).

Use the constructor when you have results in hand. Read on if instead you want
factoextra to recognise a **class** of analysis objects directly (so users of
your package can call `fviz_pca_biplot(my_object)` without the constructor).

## The two-layer architecture

For every analysis family, factoextra has two layers:

1. **`get_<method>()` — extract.** Reads coordinates, squared cosines (`cos2`)
   and contributions from a backend object and returns them as a *standardized*
   list. This is where backend-specific knowledge lives.
2. **`fviz_<method>()` — visualize.** Builds the ggplot from that standardized
   list. It is backend-agnostic.

The glue is `facto_summarize()`, which calls `.get_facto_class()` to identify the
analysis type, dispatches to the right `get_*()` extractor, and hands a tidy data
frame to `fviz()`.

> **Key insight.** Once `.get_facto_class()`, `get_eig()` and the relevant
> `get_*()` extractor recognise your class, **every** `fviz_*()`,
> `fviz_contrib()`, `fviz_cos2()` and screeplot function works automatically —
> because none of them hard-code backend class names; they only read `$coord`,
> `$cos2` and `$contrib`.

So extending factoextra means editing **three or four functions**, not fifty.

## The standardized return contract

Every extractor returns a list whose elements are matrices with:

- columns named `Dim.1`, `Dim.2`, ... (one per retained axis), and
- row names equal to the individual / variable / category names.

| Extractor | Required list elements |
|-----------|------------------------|
| `get_pca_ind()` | `$coord`, `$cos2`, `$contrib` |
| `get_pca_var()` | `$coord`, `$cor`, `$cos2`, `$contrib` |
| `get_ca_row()` / `get_ca_col()` | `$coord`, `$cos2`, `$contrib`, `$inertia` |
| `get_mca_ind()` / `get_mca_var()` | `$coord`, `$cos2`, `$contrib` |

The list is given S3 class `c("factoextra", ...)`. `get_eig()` returns a data
frame (see below). Contributions are expressed as **percentages** (note the
`* 100` in the examples).

## Step 1 — Teach `.get_facto_class()` your class

`.get_facto_class()` (in `R/utilities.R`) maps a backend object to one canonical
analysis token: `"PCA"`, `"CA"`, `"MCA"`, `"FAMD"`, `"MFA"` or `"HMFA"`. Add one
`else if` branch before the final `stop()`:

```{r}
# inside .get_facto_class()
else if (inherits(X, "myPCA"))            # your new class
  facto_class <- "PCA"
```

For a *wrapper* object (a list that carries the real result in a slot), test the
inner class — this is exactly how ExPosition is handled:

```{r}
else if (inherits(X, "expoOutput")) {
  if      (inherits(X$ExPosition.Data, "epCA"))  facto_class <- "CA"
  else if (inherits(X$ExPosition.Data, "epPCA")) facto_class <- "PCA"
  else if (inherits(X$ExPosition.Data, "epMCA")) facto_class <- "MCA"
}
```

## Step 2 — Teach `get_eig()` your eigenvalues

`get_eig()` (in `R/eigenvalue.R`) returns a data frame with exactly these three
columns and `Dim.N` row names:

```
              eigenvalue variance.percent cumulative.variance.percent
Dim.1            2.918             72.96                       72.96
Dim.2            0.914             22.85                       95.81
...
```

You only need to supply the raw eigenvalues; the helper normalizes them to
percentages and cumulative percentages. Add a branch:

```{r}
# inside get_eig(), before the final stop()
else if (inherits(X, "myPCA"))
  eig <- X$eigenvalues          # a numeric vector of eigenvalues
```

If your object already stores a FactoMineR-style `$eig` data frame, it is used
as-is. The ExPosition branch is a one-liner:

```{r}
else if (inherits(X, "expoOutput"))
  eig <- X$ExPosition.Data$eigs
```

## Step 3 — Teach the `get_*()` extractor(s)

Add one `else if` branch per extractor for your method. Below are the real
templates you can copy.

### PCA

`get_pca_ind()` and `get_pca_var()` live in `R/get_pca.R`. The two existing
templates are `prcomp` (stats) and `dudi` (ade4):

```{r}
# get_pca_ind(): stats::prcomp template
else if (inherits(res.pca, "prcomp")) {
  ind.coord <- res.pca$x
  data      <- .prcomp_reconst(res.pca)
  ind <- .get_pca_ind_results(ind.coord, data, res.pca$sdev^2,
                              res.pca$center, res.pca$scale)
}

# get_pca_ind(): ade4 dudi template
else if (inherits(res.pca, "pca") && inherits(res.pca, "dudi")) {
  ind.coord <- res.pca$li
  data <- sweep(res.pca$tab, 2, res.pca$norm, "*")
  data <- sweep(data,        2, res.pca$cent, "+")
  ind <- .get_pca_ind_results(ind.coord, data, res.pca$eig,
                              res.pca$cent, res.pca$norm)
}
```

```{r}
# get_pca_var(): stats::prcomp template
else if (inherits(res.pca, "prcomp")) {
  var.cor <- sweep(res.pca$rotation, 2, res.pca$sdev, "*")
  var <- .get_pca_var_results(var.cor)
}

# get_pca_var(): ade4 dudi template
else if (inherits(res.pca, "pca") && inherits(res.pca, "dudi")) {
  var <- .get_pca_var_results(res.pca$co)
}
```

The helpers `.get_pca_ind_results()` and `.get_pca_var_results()` compute `cos2`
and `contrib` for you, so if you can produce coordinates you are nearly done.

### CA (and the mass helper you must not forget)

For CA backends edit `get_ca_row()` and `get_ca_col()` in `R/get_ca.R`, returning
`$coord`, `$cos2`, `$contrib` and `$inertia` with `Dim.N` columns:

```{r}
# get_ca_row(): build the standardized list directly
else if (inherits(res.ca, "myCA")) {
  coord   <- res.ca$row.coord
  cos2    <- res.ca$row.cos2
  contrib <- res.ca$row.contrib * 100
  inertia <- res.ca$row.inertia
  colnames(coord) <- colnames(cos2) <- colnames(contrib) <-
    paste0("Dim.", seq_len(ncol(coord)))
  row <- list(coord = coord, cos2 = cos2, contrib = contrib, inertia = inertia)
}
```

**Don't forget `.get_ca_mass()`** (in `R/utilities.R`). CA biplot scaling
(symmetric / row-/col-principal maps) needs row and column masses. Add a branch
returning a named numeric vector of masses:

```{r}
# inside .get_ca_mass()
else if (inherits(res.ca, "myCA")) {
  if      (element == "row") mass <- res.ca$row.mass
  else if (element == "col") mass <- res.ca$col.mass
}
```

### MCA

Edit `get_mca_ind()` and `get_mca_var()` in `R/get_mca.R`, returning `$coord`,
`$cos2`, `$contrib` (same `Dim.N` convention as above).

## A complete worked example: ExPosition

ExPosition is the most recent multi-family backend (PCA + CA + MCA) and is the
best template to study. Every place it is wired in:

| Extension point | Location (file / line) |
|-----------------|-----------|
| Class dispatch (wrapper) | `R/utilities.R:103-107` |
| Eigenvalues | `R/eigenvalue.R:123-124` |
| `get_pca_ind()` | `R/get_pca.R:92-95` |
| `get_pca_var()` | `R/get_pca.R:133-139` |
| `get_ca_col()` | `R/get_ca.R:127-136` |
| `get_ca_row()` | `R/get_ca.R:213-221` |
| `.get_ca_mass()` | `R/utilities.R:473-476` |
| `get_mca_var()` | `R/get_mca.R:120-129` |
| `get_mca_ind()` | `R/get_mca.R:168-175` |

For instance the PCA-individual branch simply maps ExPosition's slots onto the
standardized names:

```{r}
# R/get_pca.R — ExPosition epPCA, individuals
else if (inherits(res.pca, "expoOutput") &&
         inherits(res.pca$ExPosition.Data, "epPCA")) {
  res <- res.pca$ExPosition.Data
  ind <- list(coord = res$fi, cos2 = res$ri, contrib = res$ci * 100)
}
```

Once these branches exist, the full ExPosition workflow plots like any other:

```{r}
library(ExPosition)
library(factoextra)

res.pca <- epPCA(iris[, -5], graph = FALSE)
fviz_pca_ind(res.pca, habillage = iris$Species, addEllipses = TRUE)
fviz_eig(res.pca)
fviz_contrib(res.pca, choice = "var")
```

## A forward-looking template: vegan (RDA / CCA)

Ecology users often ask for `vegan::rda()` / `vegan::cca()` support. These are
not bundled (they are method-specific), but they are a good illustration of how
you would add a backend yourself. A constrained ordination is CA-like, so you
would map vegan's scores onto the CA extractors. **This is a sketch, not a
supported path** — treat it as a starting point for a contribution:

```{r}
# .get_facto_class(): treat an unconstrained CA component as CA
else if (inherits(X, c("cca", "rda")))
  facto_class <- "CA"

# get_eig(): vegan stores eigenvalues per axis
else if (inherits(X, c("cca", "rda")))
  eig <- vegan::eigenvals(X)

# get_ca_row()/get_ca_col(): use vegan::scores() for sites / species
else if (inherits(res.ca, c("cca", "rda"))) {
  sc      <- vegan::scores(res.ca, display = "sites", scaling = "sites")
  coord   <- as.matrix(sc)
  colnames(coord) <- paste0("Dim.", seq_len(ncol(coord)))
  # cos2 / contrib / inertia derived from the ordination as appropriate
  row <- list(coord = coord, cos2 = cos2, contrib = contrib, inertia = inertia)
}
```

The exact mapping (scaling choice, how to derive `cos2`/`contrib`, constrained
vs. unconstrained axes) is the real work and is method-specific — which is why it
belongs in a focused contribution rather than the core package.

## Testing your backend

Mirror the existing smoke tests in
`tests/testthat/test-ca-backends-smoke.R`. They assert the standardized contract
holds, which is exactly what a new backend must satisfy:

```{r}
test_that("CA extractors work with <your package> outputs", {
  skip_if_not_installed("yourpkg")
  data("housetasks", package = "factoextra")
  res <- yourpkg::your_ca(housetasks)

  cols <- get_ca_col(res)
  rows <- get_ca_row(res)

  expect_s3_class(cols, "factoextra")
  expect_true(all(c("coord", "cos2", "contrib") %in% names(cols)))
  expect_equal(nrow(cols$coord), ncol(housetasks))
  expect_equal(nrow(rows$coord), nrow(housetasks))
})
```

Use `skip_if_not_installed()` so the test is skipped when the optional backend is
absent (backends belong in `Suggests`, never `Imports`).

## Submitting a backend: checklist

1. Add the needed `else if` branches: `.get_facto_class()`, `get_eig()`, the
   relevant `get_*()` extractor(s), and `.get_ca_mass()` for CA.
2. Keep changes **gated** — each new branch only fires for the new class, so all
   existing backends are byte-identical (factoextra's non-regression policy).
3. Add a smoke test mirroring `test-ca-backends-smoke.R`, guarded with
   `skip_if_not_installed()`.
4. Put the backend package in `Suggests` (not `Imports`).
5. Update `NEWS.md` and open a pull request at
   <https://github.com/kassambara/factoextra/issues>.

That's it — no `fviz_*()` changes required.
