% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/provParse.R
\name{get.environment}
\alias{get.environment}
\alias{get.libs}
\alias{get.tool.info}
\alias{get.scripts}
\alias{get.saved.scripts}
\alias{get.proc.nodes}
\alias{get.data.nodes}
\alias{get.error.nodes}
\alias{get.func.nodes}
\alias{get.proc.proc}
\alias{get.data.proc}
\alias{get.proc.data}
\alias{get.func.proc}
\alias{get.func.lib}
\alias{get.input.files}
\alias{get.urls}
\alias{get.output.files}
\alias{get.variables.set}
\alias{get.variables.used}
\alias{get.variable.named}
\title{Provenance access functions}
\usage{
get.environment(prov)

get.libs(prov)

get.tool.info(prov)

get.scripts(prov)

get.saved.scripts(prov)

get.proc.nodes(prov)

get.data.nodes(prov)

get.error.nodes(prov)

get.func.nodes(prov)

get.proc.proc(prov)

get.data.proc(prov)

get.proc.data(prov)

get.func.proc(prov)

get.func.lib(prov)

get.input.files(prov, only.files = FALSE)

get.urls(prov)

get.output.files(prov)

get.variables.set(prov)

get.variables.used(prov)

get.variable.named(prov, var.name)
}
\arguments{
\item{prov}{a ProvInfo object created by calling \code{\link{prov.parse}}.}

\item{only.files}{If TRUE, the output of get.input.files contains only files.  If FALSE,
it contains both files and URLs.}

\item{var.name}{a string containing the name of a variable used in the script the
provenance is for}
}
\value{
All access functions return NULL if there is no parsed provenance.  If parsed provenance
  exists, but there is no provenance for the type of information requested, such as no input 
  files, an empty data frame is returned.

get.environment returns a data frame containing information about how the provenance was collected.
   The data frame has 2 columns:  label and value.  The labels are: 
   \itemize{
   \item {name} {- whose value will always be "environment"}
   \item {architecture}
   \item {operatingSystem}
   \item {language}
   \item {langVersion}
   \item {script} {- the absolute path to the script executed}
   \item {scriptTimeStamp} {- when the script was last modified}
   \item {workingDirectory}
   \item {provDirectory} {- where the provenance is stored}
   \item {provTimeStamp} {- when the provenance was collected}
   \item {hashAlgorithm}
   }

get.libs returns a data frame describing the libraries used by the 
  script.  It contains 3 columns:  id, name, and version.

get.tool.info returns a data frame describing the tool that 
  collected the provenance.  It contains 3 columns:  tool.name, tool.version
  and json.version.

get.scripts returns a data frame identifying all the scripts executed.  The main script
   will be first, followed by all sourced scripts.  The data frame contains 
   2 columns:  name and timestamp (when the script was last modified).

get.saved.scripts returns a data frame identifying the location of saved copies
   of all the scripts executed.  The main script
   will be first, followed by all sourced scripts.  The data frame contains 
   2 columns:  name and timestamp (when the script was last modified).

get.proc.nodes returns a data frame identifying all the procedural nodes executed.  
   These are represented in PROV-JSON as activities and include nodes
   corresponding to lines of code, start or finish nodes that surround
   blocks of code, and nodes to represent the binding of function arguments
   to parameters.  The data frame contains 
   9 columns:  
   \itemize{
     \item{id} {- a unique id}
     \item{name} {- a description of what the node represents.  Often this is a line of code from
       the script, perhaps shortened}
     \item{type} {- one of Operation, Binding, Start, Finish, or Incomplete}
     \item{elapsedTime} {- when this executed relative to the start of the script}
     \item {scriptNum} {- a number identifying the script it comes from, with script 1 being the main
       script}
     \item {startLine} {- the line in the script this corresponds to, which may be NA, followed by the
       other position information}
     \item {startCol}
     \item {endLine}
     \item {endCol}
   }

get.data.nodes returns a data frame with an entry for each data node
  in the provenance.  The data frame contains the following columns:
  \itemize{
     \item {id} {- a unique id}
     \item {name} {- the descriptive name for the node, which is generally a variable name, file name, or URL}
     \item {value} {- either a text value (possibly shortened) or the name of a file where the value is stored}
     \item {valType} {- a description of the value's type, including its container (such as list, vector, etc.), 
        dimensions and member types (such as character, numeric, etc.)}
     \item {type} {- the type of the node, one of Data, Snapshot, File, URL, Exception, or Device}
     \item {scope} {- a hex number identifying the scope.  This is only used for nodes with type Data or Snapshot}
     \item {fromEnv} {- a logical value.  If true, it means the variable had a value before the script began execution}
     \item {hash} {- the hash value for File nodes}
     \item {timestamp} {- the time at which the node was created}
     \item {location} {- for file nodes, the absolute path to the file}
  }

get.error.nodes returns a data frame with an entry for each error node
  in the provenance.  The data frame contains the following columns:
  \itemize{
     \item {id} {- a unique id}
     \item {value} {- either a text value (possibly shortened) or the name of a file where the value is stored}
     \item {timestamp} {- the time at which the node was created}
  }

get.func.nodes returns a data frame containing information about the functions
  used from other libraries within the script.  The data frame has 2 columns:  id 
  (a unique id) and name (the name of the function called).

get.proc.proc returns a data frame containing information about the edges
  that go between two procedural nodes.  These edges indicate a control-flow relationship
  between the two activities.  The data frame has 3 columns:  id 
  (a unique id), informant (the tail of the edge), and informed (the head of the edge).

get.data.proc returns a data frame containing information about the edges
  that go from data nodes to procedural nodes.  These edges indicate an input relationship
  where the data is used by the activity.  The data frame has 3 columns:  id 
  (a unique id), entity (the input data), and activity (the procedural node that uses the
  data).

get.proc.data returns a data frame containing information about the edges
  that go from procedural nodes to data nodes.  These edges indicate an output relationship
  where the data is produced by the activity.  The data frame has 3 columns:  id 
  (a unique id), entity (the output data), and activity (the procedural node that produces the
  data).

get.func.proc returns a data frame containing information about where externally-defined
  functions are used in the script.  The data frame has 3 columns:  func_id (the id of the
  function node), activity (the procedural node 
  that calls the function) and function (the function's name).

get.func.lib returns a data frame containing information about what
  libraries externally-defined
  functions come from.  The data frame has 3 columns:  func_id (the id of the
  function node), library (a library node)
  and function (the name of a function).

get.input.files returns a data frame containing a subset of the data nodes that correspond to files that are 
  read by the script.  If only.files is FALSE, the data frame contains information about both input files and URLs.

get.urls returns a data frame containing a subset of the data nodes that correspond to URLs used 
  in the script.

get.output.files returns a data frame containing a subset of the data nodes that correspond to files that are 
  written by the script.

get.variables.set returns a data frame containing a subset of the data nodes that correspond to variables
  assigned to in the script.

get.variables.used returns a data frame containing a subset of the data nodes that correspond to variables
  whose values are used in the script.

get.variable.named returns a data frame containing a subset of the data nodes that correspond to variables
  with the specified name.
}
\description{
These functions extract information from a ProvInfo object created by the prov.parse function 
and return this information as a data frame.
}
\examples{
prov <- prov.parse(system.file("testdata", "prov.json", package="provParseR", mustWork=TRUE))
get.proc.nodes(prov)
get.input.files(prov)
get.urls(prov)
get.output.files(prov)
get.variables.set(prov)
get.variables.used(prov)
get.variable.named(prov, "z")
get.data.nodes(prov)
get.error.nodes(prov)
get.func.nodes(prov)
get.proc.proc(prov)
get.data.proc(prov)
get.proc.data(prov)
get.func.proc(prov)
get.func.lib(prov)
get.libs(prov)
get.scripts(prov)
get.environment(prov)
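
# A hedged sketch, not part of the package API: one possible way to combine
# the edge and node tables returned above.  For each output edge it looks up
# the name of the data node (entity) and of the procedural node (activity).
# It assumes the column names documented in the Value section; the variable
# names proc.data, data.nodes, proc.nodes and outputs are illustrative only.
proc.data <- get.proc.data(prov)
data.nodes <- get.data.nodes(prov)
proc.nodes <- get.proc.nodes(prov)
if (!is.null(proc.data) && nrow(proc.data) > 0) {
  # Attach the name of the data node that each edge points to
  outputs <- merge(proc.data, data.nodes[, c("id", "name")],
                   by.x = "entity", by.y = "id")
  # Attach the name of the procedural node that produced the data
  outputs <- merge(outputs, proc.nodes[, c("id", "name")],
                   by.x = "activity", by.y = "id",
                   suffixes = c(".data", ".proc"))
  head(outputs[, c("name.proc", "name.data")])
}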

}
\seealso{
\code{\link{prov.parse}}
}
