Workflow Example

fingerpro@eead.csic.es

This vignette presents a complete workflow in fingerPro, including data verification, exploratory analysis, tracer selection, unmixing, visualization, and validation of the results.

The example dataset example_geochemical_3s_raw.csv, included in the package, is used to illustrate the workflow step by step.

1. Load and verify the data

install.packages("fingerPro")
library(fingerPro)

Load the example dataset included in the package:

data <- read_database(
  system.file("extdata", "example_geochemical_3s_raw.csv", package = "fingerPro")
)
Preview: example_geochemical_3s_raw.csv
ID samples Ba Nb Zr Sr Rb Pb Zn Fe Mn Cr Ti Ca Al P Si Mg V
1 Source1 272.77 10.47 186.48 360.84 62.25 12.08 47.43 20105.14 259.01 90.34 2876.70 185988.2 35149.08 1104.07 161458.6 3944.15 56.67
2 Source1 342.37 12.08 226.51 392.19 78.22 14.92 62.26 22804.77 250.86 78.39 3389.78 158492.0 41484.38 1064.15 169675.8 3992.01 59.63
3 Source1 351.12 10.43 178.56 522.67 77.19 14.87 71.18 21169.07 305.97 61.64 3340.13 176925.6 39449.94 1314.66 168952.0 3840.61 42.11
4 Source1 302.87 11.51 157.54 490.00 79.21 13.50 67.41 23004.56 396.77 80.32 3183.65 171179.3 41774.51 1116.09 165760.3 3507.03 61.27
5 Source1 306.89 10.94 224.24 439.45 53.82 16.29 44.33 18263.02 324.41 66.40 2915.43 198378.5 32408.88 1111.35 157717.9 3545.03 41.31
6 Source1 389.35 10.69 170.48 449.07 84.29 17.56 66.89 24718.21 395.48 69.44 3241.24 168063.6 44404.34 1286.99 173154.1 3834.79 69.14

2. Exploratory analysis

Before selecting tracers and running the unmixing model, explore your data.

Boxplots

box_plot(data)

If the number of tracers is large, the output may span multiple pages. For additional options, such as navigating between pages (page =), customizing colors (colors =), or adjusting the layout of the plots (n_row =, n_col =), consult the function documentation:

help("box_plot")
box_plot(data, page = 2)

box_plot(data, page = 3)

Correlation analysis

correlation_plot(data)

Linear Discriminant Analysis (LDA)

LDA_plot(data)

Principal Component Analysis (PCA)

PCA_plot(data)

Individual tracer analysis and ternary diagrams

The individual tracer analysis can be explored visually with ternary diagrams.

This step is especially useful for cases with three sources.

ternary_diagram(data)

If the number of tracers is large, the output may span multiple pages. To see additional pages, pass the page argument to the function.

ternary_diagram(data, page = 2)

ternary_diagram(data, page = 3)

Range test

The range test identifies tracers whose mixture values fall outside the range defined by the sources.

range_test(data)
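The idea behind the range test can be sketched in base R. This is only an illustration on hypothetical toy values, assuming the range is taken as the minimum and maximum over all source samples; the column names and the exact range definition are assumptions, and range_test may define the range differently.

```r
# Toy data (hypothetical values): two sources with three samples each, plus one mixture
toy <- data.frame(
  samples = c("S1", "S1", "S1", "S2", "S2", "S2", "Mixture"),
  Cr = c(90, 78, 62, 40, 45, 50, 70),
  P  = c(1100, 1060, 1310, 900, 950, 980, 1400)
)

is_mixture <- toy$samples == "Mixture"
tracers <- setdiff(names(toy), "samples")

# For each tracer, check whether the mixture value lies within
# the [min, max] interval of the source samples (assumed definition)
in_range <- sapply(tracers, function(tr) {
  src <- toy[!is_mixture, tr]
  mix <- toy[is_mixture, tr]
  mix >= min(src) && mix <= max(src)
})

in_range
```

In this toy case Cr passes and P fails, because the mixture value for P (1400) exceeds the largest source value (1310).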

3. Tracer selection

Tracer selection is a key step in fingerPro. The process combines pre-screening, tracer ranking, and the exploration of consistent tracer combinations using the CTS method.

CTS_explore

The CTS workflow starts by exploring all possible minimal tracer combinations using the function CTS_explore:

tracers_seeds <- CTS_explore(data, iter = 1000)
Preview: Minimal tracer combinations
seed_id tracers w1 w2 w3 percent_physical sd_w1 sd_w2 sd_w3 max_sd_wi
1 Cr P 0.4352013 0.2851417 0.2796570 99.4 0.0820186 0.0877820 0.0395752 0.0877820
2 Cr Mg 0.3957113 0.2746657 0.3296230 96.5 0.1166657 0.0952257 0.1137979 0.1166657
3 Cr V 0.4085380 0.2780684 0.3133936 93.0 0.1670452 0.0885184 0.1433845 0.1670452
4 Zr P 0.6511841 0.0465297 0.3022862 63.0 0.1707133 0.1707169 0.0413458 0.1707169
5 Rb Cr 0.6628971 0.3455455 -0.0084426 48.6 0.1781801 0.0930122 0.1407672 0.1781801
6 Zr Cr 0.2836381 0.2449346 0.4714274 85.2 0.2042065 0.0938298 0.1817926 0.2042065

The user must select one of these combinations (a seed), which the function CTS_select then extends into a final tracer subset. Select a seed based on the following criteria:

Combinations with low dispersion indicate a higher discriminant capacity of the selected tracers.

In practice, the user inspects the output table and selects one row (seed) that provides a good balance between feasibility and low dispersion. This selected seed is then used as input in the CTS_select function.

In this example, the first ranked combination (row 1) is selected as the seed.
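To see why a pair of tracers is a minimal combination for three sources, note that two mass-balance equations plus the sum-to-one constraint give a determined system of three equations in three unknown weights. The sketch below solves such a system in base R for hypothetical source means. It is not the internal algorithm of CTS_explore, and it ignores source variability, so it yields a single solution rather than a distribution with a dispersion.

```r
# Hypothetical mean tracer values: three sources (rows), two tracers (columns)
source_means <- rbind(
  Source1 = c(Cr = 80, P = 1200),
  Source2 = c(Cr = 40, P = 900),
  Source3 = c(Cr = 60, P = 1500)
)

# Hypothetical mixture, built here from weights 0.5, 0.3 and 0.2
mixture <- c(Cr = 64, P = 1170)

# One mass-balance equation per tracer plus the constraint sum(w) = 1
A <- rbind(t(source_means), sum_to_one = 1)
b <- c(mixture, 1)

w <- solve(A, b)
names(w) <- rownames(source_means)

# A solution is physically feasible when every weight lies in [0, 1]
is_physical <- all(w >= 0 & w <= 1)

round(w, 3)
is_physical
```

A seed whose solutions frequently fall outside [0, 1] would be a poor starting point, which is one way to read the percent_physical column in the table above.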

CTS_select

selected_data <- CTS_select(data, tracers_seeds, seed_id = 1, error_threshold = 0.05)
Preview: dataset after CTS_select
ID samples Cr P Mg V
1 Source1 90.34 1104.07 3944.15 56.67
2 Source1 78.39 1064.15 3992.01 59.63
3 Source1 61.64 1314.66 3840.61 42.11
4 Source1 80.32 1116.09 3507.03 61.27
5 Source1 66.40 1111.35 3545.03 41.31
6 Source1 69.44 1286.99 3834.79 69.14

At this stage, selected_data contains the tracer subset that will be used in the unmixing model.

4. Unmix and visualize the results

The selected tracer subset can now be used to estimate source apportionments.

A quick run can be obtained with the default settings:

output_unmix <- unmix(selected_data)
Preview: unmixing results
ID Source1 Source2 Source3 GOF
Mixture (60) 0.3923894 0.2849740 0.3226366 0.9806114
Mixture (60) 0.4212065 0.2826654 0.2961281 0.9785353
Mixture (60) 0.4212065 0.2826654 0.2961281 0.9785353
Mixture (60) 0.4848043 0.2830437 0.2321519 0.9726158
Mixture (60) 0.3570352 0.3287364 0.3142284 0.9809761
Mixture (60) 0.5087059 0.2138281 0.2774660 0.9748148

Advanced analyses can be performed by adjusting arguments such as iter, variability, lvp, constrained, and resolution. These options allow the user to tailor the model settings to the characteristics of the dataset.

For a full description of the available arguments, consult the function documentation:

help("unmix")

The source apportionment results can be displayed using density plots or violin plots.

Density plots

plot_results(output_unmix, violin = FALSE)

Violin plots

plot_results(output_unmix, violin = TRUE)

These plots help visualize the distribution of source contributions and the variability in the model results.
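A numerical summary can complement the plots. The sketch below retypes the six preview rows shown earlier into a data frame and computes the mean and standard deviation per source in base R. It assumes the real output has one column per source with names starting with "Source", as in the preview; adapt the column selection if your source names differ.

```r
# The six preview rows of the unmixing results, retyped as a data frame
res <- data.frame(
  Source1 = c(0.3923894, 0.4212065, 0.4212065, 0.4848043, 0.3570352, 0.5087059),
  Source2 = c(0.2849740, 0.2826654, 0.2826654, 0.2830437, 0.3287364, 0.2138281),
  Source3 = c(0.3226366, 0.2961281, 0.2961281, 0.2321519, 0.3142284, 0.2774660),
  GOF     = c(0.9806114, 0.9785353, 0.9785353, 0.9726158, 0.9809761, 0.9748148)
)

# Select the source columns (assumed naming pattern)
src_cols <- grep("^Source", names(res), value = TRUE)

# Mean and standard deviation of the apportionments per source
summary_table <- data.frame(
  mean = colMeans(res[src_cols]),
  sd   = sapply(res[src_cols], sd)
)

round(summary_table, 3)
```

With only six rows the numbers are illustrative; on the full output the same summary describes the distributions drawn by plot_results.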

5. Validate the results

Finally, the apportionment solution can be checked for mathematical consistency.

The validate_results function allows the user to assess the mathematical consistency of a given set of source apportionments. The apportionments can come from the fingerPro model or from any other model, and are evaluated against the tracer dataset used for unmixing.

The user must provide the tracer dataset and the vector of proposed apportionments, one value per source.

The function computes the normalized error between the observed tracer values in the mixture and the values predicted from the proposed apportionments.

Low normalized error values indicate that the solution is consistent with the selected tracers, whereas high values may suggest inconsistencies or that the proposed apportionment is not supported by the data.

apportionments <- c(0.435, 0.285, 0.280)
normalized_error <- validate_results(selected_data, apportionments)
Preview: normalized error values from validate_results
tracer normalized_error
Cr 0.0000321
P 0.0002210
Mg 0.0198135
V 0.0108719

Low normalized error values indicate that the proposed solution is consistent with the selected tracers.
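The calculation can be illustrated with a small base R sketch on hypothetical source means. The normalization used here (the spread of the source means for each tracer) is an assumption made for illustration only; the exact formula used by validate_results is described in its documentation and may differ.

```r
# Hypothetical mean tracer values: three sources (rows), two tracers (columns)
source_means <- rbind(
  Source1 = c(Cr = 80, P = 1200),
  Source2 = c(Cr = 40, P = 900),
  Source3 = c(Cr = 60, P = 1500)
)

# Hypothetical observed mixture and a proposed set of apportionments
mixture <- c(Cr = 64, P = 1170)
apportionments <- c(0.45, 0.30, 0.25)

# Tracer values predicted by the linear mixing model
predicted <- as.vector(apportionments %*% source_means)
names(predicted) <- colnames(source_means)

# Assumed normalization: range of the source means for each tracer
tracer_range <- apply(source_means, 2, function(x) max(x) - min(x))

normalized_error <- abs(mixture - predicted) / tracer_range
normalized_error
```

Here both tracers give an error of 0.025, below a 0.05 threshold, so the proposed apportionments would be considered consistent under this assumed definition.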

Final remarks

This workflow should be repeated independently for each mixture, since optimum tracer selection depends on the combined information from the sources and the specific mixture under study.

.R Script for beginner users

Beginner users who are not familiar with R Markdown can copy the code below into an .R script and run it step by step in R or RStudio.

###################################    
###### 0. Install and set wd
###################################
install.packages("fingerPro") # one time
setwd("C:/your/file/directory") # your own working directory (wd)


###################################    
###### 1. Load and verify the data
###################################      
library(fingerPro)
data <- read_database(system.file("extdata", "example_geochemical_3s_raw.csv", package = "fingerPro")) # Input example dataset


###################################    
###### 2. Exploratory analysis
###################################    


###### Box plots

box_plot(data)

box_plot(data, page = 1) # Visualise a specific page (e.g. page 1)
box_plot(data, page = 2) # Visualise a specific page (e.g. page 2)
box_plot(data, page = 3) # Visualise a specific page (e.g. page 3)
box_plot(data, n_row = 3, n_col = 6) # Visualise all tracers

# Save results as a PNG image
png("output_boxplot_all.png", width = 30, height = 15, units = "cm", res = 300) # to save .png results
box_plot(data, n_row = 3, n_col = 6) # Visualise all tracers
dev.off()

# Check 'help' for more information 
help("box_plot")


###### Correlation analysis

correlation_plot(data)

correlation_plot(data, columns = c(1:8)) # correlation plot of n tracers (e.g. 1 to 8)

# Save results as a PNG image
png("output_correlationplot_tracers1-8.png", width = 25, height = 15, units = "cm", res = 300) # to save .png results
correlation_plot(data, columns = c(1:8)) # correlation plot of n tracers (e.g. 1 to 8)
dev.off()

# Check 'help' for more information 
help("correlation_plot")


###### Linear Discriminant Analysis (LDA)

LDA_plot(data)

# Save results as a PNG image
png("output_LDA.png", width = 15, height = 12, units = "cm", res = 300) # to save .png results
LDA_plot(data)
dev.off()


###### Principal Component Analysis (PCA)

PCA_plot(data)

# Save results as a PNG image
png("output_PCA.png", width = 15, height = 12, units = "cm", res = 300) # to save .png results
PCA_plot(data)
dev.off()


###### Individual tracer analysis and ternary diagrams

output_ternary <- ternary_diagram(data)

ternary_diagram(data, page = 1) # Visualise a specific page (e.g. page 1)
ternary_diagram(data, page = 2) # Visualise a specific page (e.g. page 2)
ternary_diagram(data, page = 3) # Visualise a specific page (e.g. page 3)
ternary_diagram(data, rows = 4, cols = 5)  # Visualise all tracers

# e.g. Save ternary_diagram results as a PNG image
png("output_ternary_all.png", width = 18, height = 12, units = "cm", res = 300) # to save .png results
output_ternary_all <- ternary_diagram(data, rows = 4, cols = 5)  # Visualise all tracers
dev.off()

# Check 'help' for more information 
help("ternary_diagram")


###### Range test
data_rangetest <- range_test(data)
write.csv(data_rangetest, "output_rangetest.csv")


###################################    
###### 3. Tracer selection
###################################    


###### CTS_explore

tracers_seeds <- CTS_explore(data, iter = 1000)
write.csv(tracers_seeds, "output_CTS_explore_tracers_seeds.csv")

# Check 'help' for more information 
help("CTS_explore")


###### CTS_select

selected_data <- CTS_select(data, tracers_seeds, seed_id = 1, error_threshold = 0.05) # (e.g. Seed 1 selected with an error of 5% (0.05))
write.csv(selected_data, "output_CTS_select_selected_data.csv")

# Check 'help' for more information 
help("CTS_select")

###################################    
###### 4. Unmix
###################################    

output_unmix <- unmix(selected_data)
write.csv(output_unmix, "output_unmix.csv")

# Check 'help' for more information 
help("unmix")

plot_results(output_unmix, violin = FALSE) # Density plot
plot_results(output_unmix, violin = TRUE) # Violin plot

# save density plot
png("output_unmix_densityplot.png", width = 18, height = 12, units = "cm", res = 300) # to save .png results
plot_results(output_unmix, violin = FALSE) # Density plot
dev.off()

# save violin plot
png("output_unmix_violinplot.png", width = 18, height = 12, units = "cm", res = 300) # to save .png results
plot_results(output_unmix, violin = TRUE) # Violin plot
dev.off()


###################################    
###### 5. Validate results
###################################    

apportionments <- c(0.435, 0.285, 0.280)
normalized_error <- validate_results(selected_data, apportionments = apportionments, error_threshold = 0.05)
write.csv(normalized_error, "output_validate_results_normalized_error.csv")

# Check 'help' for more information 
help("validate_results")