% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/pack.R
\name{pack}
\alias{pack}
\title{Pack a data.frame of tokens}
\usage{
pack(tbl, pull = "token", n = 1L, sep = "-", .collapse = " ")
}
\arguments{
\item{tbl}{A data.frame of tokens.}

\item{pull}{<\code{\link[rlang:args_data_masking]{data-masked}}>
Column to be packed into the text or n-gram body. Defaults to \code{token}.}

\item{n}{Integer internally passed to the n-gram tokenizer function
created by \code{audubon::ngram_tokenizer()}.}

\item{sep}{Character scalar used internally to join the tokens within each n-gram.}

\item{.collapse}{Character scalar passed to \code{stringi::stri_c()}, used to collapse the tokens (or n-grams) of each document into a single string.}
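
To illustrate how \code{n}, \code{sep}, and \code{.collapse} can interact, here is a minimal base R sketch. It assumes that \code{sep} joins the tokens inside each n-gram and that \code{.collapse} joins the n-grams of one document. \code{make_ngrams()} is a hypothetical helper; it is not the code of \code{audubon::ngram_tokenizer()} or of \code{pack()}.

```r
# Hypothetical helper: builds contiguous n-grams from a character vector,
# joining the tokens inside each n-gram with `sep`.
make_ngrams <- function(tokens, n = 1L, sep = "-") {
  len <- length(tokens)
  # Too few tokens to form even one n-gram.
  if (len < n) {
    return(character(0))
  }
  starts <- seq_len(len - n + 1L)
  vapply(
    starts,
    function(i) paste(tokens[i:(i + n - 1L)], collapse = sep),
    character(1)
  )
}

tokens <- c("a", "b", "c", "d")
bigrams <- make_ngrams(tokens, n = 2L, sep = "-")

# Join the n-grams of one document into a single string,
# playing the role of `.collapse`.
packed <- paste(bigrams, collapse = " ")
```

With \code{n = 1L} the helper returns the tokens unchanged, so the packed result reduces to the plain token sequence.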
}
\value{
A tibble.
}
\description{
Packs a data.frame of tokens into a new data.frame of a corpus
that is compatible with the Text Interchange Formats.
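
The packing step can be sketched in base R as grouping the tokens by \code{doc_id} and pasting each group into one string. This is a hypothetical simplification, not the package's implementation: \code{pack_sketch()} takes \code{pull} as a column name string rather than a data-masked expression, and it ignores \code{n} and \code{sep}.

```r
# A data.frame of tokens: one row per token, keyed by `doc_id`.
tokens_df <- data.frame(
  doc_id = c("1", "1", "1", "2", "2"),
  token = c("the", "milky", "way", "a", "river"),
  stringsAsFactors = FALSE
)

# Hypothetical sketch: collapse the tokens of each document into one string,
# keeping documents in their order of first appearance.
pack_sketch <- function(tbl, pull = "token", collapse = " ") {
  docs <- split(tbl[[pull]], factor(tbl$doc_id, levels = unique(tbl$doc_id)))
  data.frame(
    doc_id = names(docs),
    text = unname(vapply(docs, paste, character(1), collapse = collapse)),
    stringsAsFactors = FALSE
  )
}

corpus <- pack_sketch(tokens_df)
```

The result has one row per document with \code{doc_id} and \code{text} columns.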
}
\section{Text Interchange Formats (TIF)}{


The Text Interchange Formats (TIF) is a set of standards
that allows R text analysis packages to target defined inputs and outputs
for corpora, tokens, and document-term matrices.
}

\section{Valid data.frame of tokens}{


The data.frame of tokens here is a data.frame object
compatible with the TIF.

A TIF-valid data.frame of tokens is expected to have one key column (named \code{doc_id})
that identifies each text, and several feature columns for each token.
The feature columns must include at least \code{token} itself.
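
A minimal check of the two requirements stated here could look like the following. \code{is_valid_tokens_df()} is a hypothetical helper that tests only for the presence of the columns; it is not the validator shipped with the \code{tif} package.

```r
# Hypothetical check based on the requirements stated above:
# a `doc_id` key column plus at least a `token` feature column.
is_valid_tokens_df <- function(tbl) {
  is.data.frame(tbl) &&
    is.element("doc_id", names(tbl)) &&
    is.element("token", names(tbl))
}

good <- data.frame(doc_id = c("1", "1"), token = c("a", "b"), pos = c("x", "y"))
bad <- data.frame(doc_id = "1", text = "a b")
```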
}

\examples{
\dontrun{
df <- tokenize(
  data.frame(
    doc_id = seq_along(ginga[5:8]),
    text = ginga[5:8]
  )
)
pack(df)
}
}
\seealso{
\url{https://github.com/ropenscilabs/tif}
}
