Getting Started

fingerpro@eead.csic.es

GitHub repository

CRAN page

fingerPro is a flexible framework for sediment source fingerprinting that integrates data exploration, tracer selection, and unmixing to estimate, visualize, and validate source apportionments.

This vignette is intended for users who want to start working with their own databases. It explains how to organize the analysis, how to prepare a valid input file, and how to validate the structure of the dataset before running the workflow.

A key practical idea

In fingerPro, each mixture must be analysed independently. Optimum tracer selection depends on the combined information from both the sources and the mixture. Therefore, tracer selection must be performed separately for each mixture.

For this reason, it is strongly recommended to organize the analysis using one folder per mixture. Each folder should contain the input database, together with all figures and output files generated during the analysis.
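
The layout described above can be sketched in base R. The project root and the mixture names here are hypothetical placeholders, not names used by the package:

```r
# Hypothetical project root and mixture names; adapt to your own study.
project_root <- file.path(tempdir(), "fingerprinting_project")
mixtures <- c("mixture_A", "mixture_B")

# Create one folder per mixture to hold its input database, figures and outputs.
mixture_dirs <- file.path(project_root, mixtures)
for (d in mixture_dirs) {
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}

# Confirm that every folder now exists.
print(dir.exists(mixture_dirs))
```

Each folder can then be used as the working directory for the analysis of its mixture.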

Using different sets of optimum tracers for different mixtures is not a limitation of the method. Instead, it reflects the adaptation of the model to the specific characteristics of each dataset. Therefore, comparisons between results obtained for different mixtures remain valid even when different tracer sets have been selected.

Installation

Install from CRAN:

install.packages("fingerPro")

Or from a local file:

install.packages("fingerPro_2.1.tar.gz", repos = NULL, type = "source")

Load package

library(fingerPro)

Organizing your project folder

When working with your own .csv file, set the working directory to the folder containing the input database:

setwd("C:/your/project/folder")

Reading and validating your data

To read and validate your own input database, place the .csv file in your project folder and use read_database():

data <- read_database("my_input_database.csv")

Preparing your own database

Before starting, it is important to prepare your input database following the structure of the example datasets provided in the package.

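A valid database should include an ID column, a samples column that labels each row with its source or mixture name, and the tracer columns required by the chosen input format, as in the example datasets previewed below.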

In all cases, the mixture must be placed at the end of the dataset. If multiple mixture samples are available, they must share the same name in the samples column but have different ID values.
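
The placement rule for mixture rows can be checked before the file is passed to the package. The following is a minimal sketch in base R, assuming the mixture rows are labelled "Mixture" as in the example datasets. check_mixture_rows() is a hypothetical helper written for illustration, not a fingerPro function, and the toy values are made up:

```r
# Hypothetical helper (not part of fingerPro): checks that mixture rows come
# last and that multiple mixture rows carry distinct ID values.
check_mixture_rows <- function(df, mixture_name = "Mixture") {
  stopifnot(all(c("ID", "samples") %in% names(df)))
  is_mix <- df$samples == mixture_name
  if (!any(is_mix)) stop("No rows labelled '", mixture_name, "' were found.")
  # All mixture rows must form one block at the end of the table.
  n_mix <- sum(is_mix)
  at_end <- all(utils::tail(is_mix, n_mix))
  # Mixture rows must not share an ID.
  unique_ids <- !anyDuplicated(df$ID[is_mix])
  at_end && unique_ids
}

# Toy data with made-up values, for illustration only.
toy <- data.frame(
  ID = 1:5,
  samples = c("Source1", "Source1", "Source2", "Mixture", "Mixture"),
  Ba = c(272.8, 342.4, 332.5, 273.8, 280.1),
  stringsAsFactors = FALSE
)
print(check_mixture_rows(toy))         # mixture rows at the end
print(check_mixture_rows(toy[5:1, ]))  # same rows in reverse order
```

The first call should pass and the second should fail, because reversing the rows moves the mixture to the top of the table.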

To retain conservative tracers for subsequent analyses, it is recommended to perform a basic data cleaning beforehand.

Supported input formats

Raw dataset | Scalar tracers

This format contains individual measurements for scalar tracers.

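Required structure:

  • ID: a column named ID
  • samples: a column named samples
  • tracer1, tracer2, ...: one column per scalar tracer (Ba, Nb, Zr, ... in example_geochemical_3s_raw.csv)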

Raw dataset | Isotopic tracers

This format contains individual measurements for isotopic tracers.

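Required structure:

  • ID: a column named ID
  • samples: a column named samples
  • ratio1, ratio2, ...: one column per isotopic ratio (C24, C26, ... in example_isotopic_3s_raw.csv)
  • cont_ratio1, cont_ratio2, ...: one companion column per ratio, prefixed with cont_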

Averaged dataset | Scalar tracers

This format contains statistical summaries of scalar tracers.

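Required structure:

  • ID: a column named ID
  • samples: a column named samples
  • mean_tracer1, mean_tracer2, ...: mean of each tracer, prefixed with mean_
  • sd_tracer1, sd_tracer2, ...: standard deviation of each tracer, prefixed with sd_
  • n: number of measurements, in the last column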

Averaged dataset | Isotopic tracers

This format contains statistical summaries of isotopic tracers.

Required structure:

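  • ID: a column named ID
  • samples: a column named samples
  • mean_ratio1, mean_ratio2, ...: mean of each isotopic ratio, prefixed with mean_
  • mean_cont_ratio1, mean_cont_ratio2, ...: mean of each companion cont_ column, prefixed with mean_cont_
  • sd_ratio1, sd_ratio2, ...: standard deviation of each isotopic ratio, prefixed with sd_
  • sd_cont_ratio1, sd_cont_ratio2, ...: standard deviation of each companion cont_ column, prefixed with sd_cont_
  • n: number of measurements, in the last column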
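
The column naming rules listed above can be checked on a header before loading the file. This is a sketch in base R. check_averaged_isotopic_names() is a hypothetical helper, not a fingerPro function, and it assumes that ID and samples are the first two columns, as in the example datasets:

```r
# Hypothetical helper (not part of fingerPro): checks the column names of an
# averaged isotopic dataset against the structure listed above.
check_averaged_isotopic_names <- function(cols) {
  n_cols <- length(cols)
  if (n_cols < 3 || cols[1] != "ID" || cols[2] != "samples" || cols[n_cols] != "n") {
    return(FALSE)
  }
  middle <- cols[-c(1, 2, n_cols)]
  # Ratio names come from the mean_ columns that are not mean_cont_ columns.
  is_mean_cont <- startsWith(middle, "mean_cont_")
  ratios <- sub("^mean_", "", middle[startsWith(middle, "mean_") & !is_mean_cont])
  if (length(ratios) == 0) return(FALSE)
  # Every ratio needs its mean_, mean_cont_, sd_ and sd_cont_ columns.
  expected <- c(paste0("mean_", ratios), paste0("mean_cont_", ratios),
                paste0("sd_", ratios), paste0("sd_cont_", ratios))
  length(middle) == length(expected) && setequal(middle, expected)
}

# Header of example_isotopic_3s_mean.csv, rebuilt from its ratio names.
ratios <- c("C24", "C26", "C28", "C30", "C32")
cols <- c("ID", "samples",
          paste0("mean_", ratios), paste0("mean_cont_", ratios),
          paste0("sd_", ratios), paste0("sd_cont_", ratios), "n")
print(check_averaged_isotopic_names(cols))
```

The check compares sets of names rather than their order, since the order of the tracer columns is not stated here as a requirement.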

Example datasets

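The package includes four example datasets:

  • example_geochemical_3s_raw.csv
  • example_geochemical_3s_mean.csv
  • example_isotopic_3s_raw.csv
  • example_isotopic_3s_mean.csv

Each file is previewed below.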

Preview: example_geochemical_3s_raw.csv
ID samples Ba Nb Zr Sr Rb Pb Zn Fe Mn Cr Ti Ca Al P Si Mg V
1 Source1 272.77 10.47 186.48 360.84 62.25 12.08 47.43 20105.14 259.01 90.34 2876.70 185988.2 35149.08 1104.07 161458.6 3944.15 56.67
2 Source1 342.37 12.08 226.51 392.19 78.22 14.92 62.26 22804.77 250.86 78.39 3389.78 158492.0 41484.38 1064.15 169675.8 3992.01 59.63
3 Source1 351.12 10.43 178.56 522.67 77.19 14.87 71.18 21169.07 305.97 61.64 3340.13 176925.6 39449.94 1314.66 168952.0 3840.61 42.11
4 Source1 302.87 11.51 157.54 490.00 79.21 13.50 67.41 23004.56 396.77 80.32 3183.65 171179.3 41774.51 1116.09 165760.3 3507.03 61.27
5 Source1 306.89 10.94 224.24 439.45 53.82 16.29 44.33 18263.02 324.41 66.40 2915.43 198378.5 32408.88 1111.35 157717.9 3545.03 41.31
6 Source1 389.35 10.69 170.48 449.07 84.29 17.56 66.89 24718.21 395.48 69.44 3241.24 168063.6 44404.34 1286.99 173154.1 3834.79 69.14

Preview: example_geochemical_3s_mean.csv
ID samples mean_Ba mean_Nb mean_Zr mean_Sr mean_Rb mean_Pb mean_Zn mean_Fe mean_Mn mean_Cr mean_Ti mean_Ca mean_Al mean_P mean_Si mean_Mg mean_V sd_Ba sd_Nb sd_Zr sd_Sr sd_Rb sd_Pb sd_Zn sd_Fe sd_Mn sd_Cr sd_Ti sd_Ca sd_Al sd_P sd_Si sd_Mg sd_V n
1 Source1 296.46 10.92 197.81 422.34 74.14 15.20 57.01 21948.90 305.21 72.34 3241.96 164329.0 39170.14 1112.08 172621.3 3558.43 61.36 42.03 1.03 42.42 73.31 9.77 2.39 8.20 2626.95 61.79 11.71 263.32 20916.40 4085.08 100.21 12783.58 732.85 12.84 35
2 Source2 332.46 10.69 237.24 496.58 69.47 14.47 51.93 20835.67 296.34 52.47 3299.07 158932.7 38445.32 1079.00 181702.4 3969.15 56.39 31.07 1.00 52.29 119.39 8.75 2.02 4.12 1757.66 81.21 11.60 219.91 22196.89 3073.72 105.47 14286.61 633.26 12.45 12
3 Source3 366.61 10.42 151.55 591.25 85.51 13.43 58.54 23019.85 257.60 68.18 2976.51 171863.8 43268.12 763.28 165152.9 5146.52 77.20 49.06 0.61 46.48 174.03 18.79 2.87 11.67 3195.24 50.52 10.83 172.94 15376.03 6091.50 71.13 7128.13 1442.22 18.57 12
4 Mixture 273.83 9.40 185.66 1239.79 72.43 12.08 54.04 19534.44 256.61 65.51 2753.99 186789.7 33339.04 1005.10 142091.7 4194.71 64.94 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1

Preview: example_isotopic_3s_raw.csv
ID samples C24 C26 C28 C30 C32 cont_C24 cont_C26 cont_C28 cont_C30 cont_C32
1 Source1 0.9790 1.3842 0.7150 1.5571 1.7612 39.28 16.24 34.04 48.8 17.27
1 Source1 1.1900 1.3853 0.6010 1.5555 1.6894 39.28 16.24 34.04 48.8 17.27
1 Source1 1.0374 1.4054 0.4485 1.5706 1.7412 39.28 16.24 34.04 48.8 17.27
1 Source1 1.0264 1.3651 0.5883 1.5622 1.7710 39.28 16.24 34.04 48.8 17.27
1 Source1 1.1166 1.4106 0.4989 1.5491 1.7353 39.28 16.24 34.04 48.8 17.27
1 Source1 1.0598 1.4475 0.5110 1.5516 1.7198 39.28 16.24 34.04 48.8 17.27

Preview: example_isotopic_3s_mean.csv
ID samples mean_C24 mean_C26 mean_C28 mean_C30 mean_C32 mean_cont_C24 mean_cont_C26 mean_cont_C28 mean_cont_C30 mean_cont_C32 sd_C24 sd_C26 sd_C28 sd_C30 sd_C32 sd_cont_C24 sd_cont_C26 sd_cont_C28 sd_cont_C30 sd_cont_C32 n
1 Source1 1.0618 1.3980 0.5871 1.5621 1.7487 39.28 16.24 34.04 48.80 17.27 0.0956 0.0240 0.0880 0.0119 0.0291 0 0 0 0 0 10
2 Source2 0.7751 1.1479 0.7092 1.1841 1.2714 30.99 34.47 24.65 17.99 11.86 0.0374 0.0362 0.0363 0.0807 0.0633 0 0 0 0 0 10
3 Source3 1.4113 1.9233 0.9569 1.1602 0.5516 12.60 42.34 37.37 29.81 20.35 0.0253 0.0969 0.0188 0.0210 0.0918 0 0 0 0 0 10
4 Mixture 1.1205 1.5043 0.6918 1.4238 1.3531 31.78 24.59 33.93 40.97 0.00 0.0000 0.0000 0.0000 0.0000 0.0000 0 0 0 0 0 1
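
The raw and averaged scalar previews above differ only in how the measurements are summarised. As an illustration of that relationship, the sketch below builds an averaged table (mean_, sd_ and n columns) from a toy raw table in base R. The data are made up, summarise_by() is a hypothetical helper, and the use of R's sample standard deviation is an assumption. A single-row group is given sd = 0, matching the Mixture row of the averaged example.

```r
# Toy raw scalar dataset with made-up values (two tracers, two sources, one mixture).
raw <- data.frame(
  ID = 1:5,
  samples = c("Source1", "Source1", "Source2", "Source2", "Mixture"),
  Ba = c(270, 340, 330, 350, 300),
  Zr = c(186, 226, 237, 241, 200),
  stringsAsFactors = FALSE
)
tracers <- setdiff(names(raw), c("ID", "samples"))
groups <- unique(raw$samples)  # keeps file order, so the mixture stays last

# Hypothetical helper: per-group summary of every tracer, with prefixed names.
summarise_by <- function(fun, prefix) {
  out <- aggregate(raw[tracers], by = list(samples = raw$samples), FUN = fun)
  out <- out[match(groups, out$samples), tracers, drop = FALSE]
  names(out) <- paste0(prefix, tracers)
  rownames(out) <- NULL
  out
}
means <- summarise_by(mean, "mean_")
sds <- summarise_by(sd, "sd_")
sds[is.na(sds)] <- 0  # sd is undefined for a single measurement

averaged <- data.frame(
  ID = seq_along(groups),
  samples = groups,
  means,
  sds,
  n = as.vector(table(raw$samples)[groups]),
  row.names = NULL,
  stringsAsFactors = FALSE
)
print(averaged)
```

The resulting columns follow the same order as example_geochemical_3s_mean.csv: ID, samples, the mean_ columns, the sd_ columns, and n last.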

Next steps

Once your dataset has been validated, you are ready to continue with exploratory analysis, tracer selection, and source apportionment.

For a complete worked example, see the vignette: Workflow step-by-step