% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/gl.blast.r
\name{gl.blast}
\alias{gl.blast}
\title{Aligns nucleotides sequences against those present in a target database
using blastn}
\usage{
gl.blast(
  x,
  ref_genome,
  task = "megablast",
  Percentage_identity = 70,
  Percentage_overlap = 0.8,
  bitscore = 50,
  number_of_threads = 2,
  verbose = NULL
)
}
\arguments{
\item{x}{Either a genlight object containing a column named
'TrimmedSequence' containing the sequence of the SNPs (the sequence tag)
trimmed of adapters as provided by DArT; or a path to a fasta file with the
query sequences [required].}

\item{ref_genome}{Path to a reference genome in fasta of fna format
[required].}

\item{task}{Four different tasks are supported: 1) “megablast”, for very
similar sequences (e.g, sequencing errors), 2) “dc-megablast”, typically
used for inter-species comparisons, 3) “blastn”, the traditional program
used for inter-species comparisons, 4) “blastn-short”, optimized for
sequences less than 30 nucleotides [default 'megablast'].}

\item{Percentage_identity}{Not a very sensitive or reliable measure of
sequence similarity, however it is a reasonable proxy for evolutionary
distance. The evolutionary distance associated with a 10 percent change in
Percentage_identity is much greater at longer distances. Thus, a change from
80 – 70 percent identity might reflect divergence 200 million years earlier
in time, but the change from 30 percent to 20 percent might correspond to a
billion year divergence time change [default 70].}

\item{Percentage_overlap}{Calculated as alignment length divided by the
query length or subject length (whichever is shortest of the two lengths,
i.e.  length / min(qlen,slen) ) [default 0.8].}

\item{bitscore}{A rule-of-thumb for inferring homology, a bit score of 50
is almost always significant [default 50].}

\item{number_of_threads}{Number of threads (CPUs) to use in blastn search
[default 2].}

\item{verbose}{verbose= 0, silent or fatal errors; 1, begin and end; 2,
progress log ; 3, progress and results summary; 5, full report
[default 2 or as specified using gl.set.verbosity]}
}
\value{
If the input is a genlight object: returns a genlight object with one
 hit per sequence merged to the slot $other$loc.metrics. If the input is a
 fasta file: returns a dataframe with one hit per sequence.
}
\description{
Basic Local Alignment Search Tool (BLAST; Altschul et al., 1990 &
 1997) is a sequence comparison algorithm optimized for speed used to search
 sequence databases for optimal local alignments to a query. This function
 creates fasta files, creates databases to run BLAST, runs blastn and filters
 these results to obtain the best hit per sequence.

 This function can be used to run BLAST alignment of short-read (DArTseq
 data) and long-read sequences (Illumina, PacBio... etc). You can use
 reference genomes from NCBI, genomes from your private collection, contigs,
 scaffolds or any other genetic sequence that you would like to use as
 reference.
}
\details{
\strong{Installing BLAST}

 You can download the BLAST installs from:
 \url{https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/}

 It is important to install BLAST in a path that does not contain spaces for
 this function to work.

 \strong{Running BLAST}

 Four different tasks are supported: \itemize{ \item “megablast”, for very
 similar sequences (e.g, sequencing errors) \item “dc-megablast”, typically
 used for inter-species comparisons \item “blastn”, the traditional program
 used for inter-species comparisons \item “blastn-short”, optimized for
 sequences less than 30 nucleotides }

 If  you  are  running  a  BLAST alignment of  similar  sequences,  for
 example  Turtle  Genome  Vs Turtle Sequences, the recommended parameters
 are: task = “megablast”, Percentage_identity = 70, Percentage_overlap =  0.8
 and bitscore = 50.

 If you are running a BLAST alignment of highly dissimilar sequences because
 you are probably looking for sex linked  hits in  a distantly  related
 species,  and  you  are aligning for example sequences of Chicken Genome Vs
 Bassiana, the recommended parameters are: task = “dc-megablast”,
 Percentage_identity = 50, Percentage_overlap =  0.01 and bitscore = 30.

 Be aware that running BLAST might take a long time (i.e. days) depending of
 the size of your query, the size of your database and the number of threads
 selected for your computer.

 \strong{BLAST output}

 The BLAST output is formatted as a table using output format 6, with columns
 defined in the following order: \itemize{ \item qseqid - Query Seq-id \item
 sacc - Subject accession \item stitle - Subject Title \item qseq - Aligned
 part of query sequence \item sseq - Aligned part of subject sequence \item
 nident - Number of identical matches \item mismatch - Number of mismatches
 \item pident - Percentage of identical matches \item length - Alignment
 length \item evalue - Expect value \item bitscore - Bit score \item qstart -
 Start of alignment in query \item qend - End of alignment in query \item
 sstart - Start of alignment in subject \item send - End of alignment in
 subject \item gapopen - Number of gap openings \item gaps - Total number of
 gaps \item qlen - Query sequence length \item slen - Subject sequence length
 \item PercentageOverlap - length / min(qlen,slen) }

 Databases containing unfiltered aligned sequences, filtered aligned
 sequences and one hit per sequence are saved to the working directory
 (plot.dir tempdir if not set).


 \strong{BLAST filtering}

 BLAST output is filtered by ordering the hits of each sequence first by the
 highest percentage identity, then the highest percentage overlap and then
 the highest bitscore. Only one hit per sequence is kept based on these
 selection criteria.
}
\examples{
\dontrun{
res <- gl.blast(x = testset.gl, ref_genome = "sequence.fasta")
# display of reports saved in the temporal directory
# open the reports saved in the temporal directory
}

}
\references{
\itemize{
\item Altschul, S. F., Gish, W., Miller, W., Myers, E. W., & Lipman, D.
 J. (1990). Basic local alignment search tool. Journal of molecular biology,
 215(3), 403-410.
\item Altschul, S. F., Madden, T. L., Schäffer, A. A., Zhang, J., Zhang,
 Z., Miller, W., & Lipman, D. J. (1997). Gapped BLAST and PSI-BLAST: a new
 generation of protein database search programs. Nucleic acids research,
 25(17), 3389-3402.
\item Pearson, W. R. (2013). An introduction to sequence similarity
 (“homology”) searching. Current protocols in bioinformatics, 42(1), 3-1.
 }
}
\seealso{
\code{\link{gl.print.history}}
}
\author{
Berenice Talamantes Becerra & Luis Mijangos (Post to
 \url{https://groups.google.com/d/forum/dartr})
}
\concept{reference genomes}
