The AI Dictionary
AI terms and jargon simply explained.
I often find explanations online to be more complicated than they need to be. Here, I hope to fix that. New terms will continue to be added over time.
Click terms to view expanded definitions.
Do let me know of any corrections and improvements, and of any terms you would like added!
Activation Function
A function applied after the linear step in a neuron to introduce nonlinearity. Without it, a stack of linear layers would collapse into a single linear function.
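A minimal NumPy sketch of why the nonlinearity matters (the matrices are illustrative): two linear layers with no activation in between are equivalent to one linear layer.

    import numpy as np

    x = np.array([1.0, 2.0])                  # input
    W1 = np.array([[0.5, -0.2], [0.1, 0.4]])  # first linear layer
    W2 = np.array([[1.0, 0.3], [-0.6, 0.2]])  # second linear layer

    # With no activation between them, the two layers collapse into one:
    print(np.allclose(W2 @ (W1 @ x), (W2 @ W1) @ x))  # True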
Bagging
An ensembling technique. When bagging, each model is trained on a random subset of the rows (sampled with replacement) and a random subset of the columns.
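A minimal NumPy sketch of drawing one bagged sample (the data and sizes are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))  # 100 rows, 5 columns of illustrative data
    n_rows, n_cols = X.shape

    row_idx = rng.integers(0, n_rows, size=n_rows)       # rows, with replacement
    col_idx = rng.choice(n_cols, size=3, replace=False)  # a random subset of columns
    X_bag = X[row_idx][:, col_idx]  # train one model on this; repeat for each model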
Cross Entropy Loss
A loss function for classification models with multiple categories. It penalizes a prediction by the negative log of the probability assigned to the correct category.
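A minimal NumPy sketch for a single prediction (the probabilities are illustrative):

    import numpy as np

    probs = np.array([0.7, 0.2, 0.1])  # predicted probabilities for 3 categories
    target = 0                         # index of the correct category

    loss = -np.log(probs[target])  # ~0.357; a confident, correct prediction gives a low loss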
Dataloader
An object that takes items from a dataset and assembles them into batches. Note that this object does not decide which indices to load, and hence is not a sampler.
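A minimal PyTorch sketch (the dataset is illustrative); with shuffle=True, the DataLoader delegates the index order to a random sampler internally:

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    dataset = TensorDataset(torch.arange(10).float())
    loader = DataLoader(dataset, batch_size=4, shuffle=True)

    for (batch,) in loader:
        print(batch)  # tensors of up to 4 items each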
Decoder (Transformers)
A component of a transformer that is used for generating text. An example is the autocomplete feature on a smartphone’s keyboard.
Dot Product
The operation of multiplying each pair of corresponding elements in two vectors and summing the products. Also known as a linear combination.
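A minimal NumPy sketch (the vectors are illustrative):

    import numpy as np

    a = np.array([1.0, 2.0, 3.0])
    b = np.array([4.0, 5.0, 6.0])

    # Multiply corresponding elements, then sum: 1*4 + 2*5 + 3*6 = 32
    print(np.dot(a, b))  # 32.0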
Embedding
A table, or matrix, where each row represents an item and each column describes the items in some way. The real magic of embeddings happens when you combine two embeddings together in some way to obtain further information.
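A minimal NumPy sketch, assuming a recommendation-style example (all numbers illustrative): combining one row of a user embedding with one row of a movie embedding via a dot product gives a rough affinity score.

    import numpy as np

    user_factors  = np.array([0.9, -0.2, 0.4])  # one user's row in a user embedding
    movie_factors = np.array([0.8,  0.1, 0.3])  # one movie's row in a movie embedding

    # Combining the two embeddings yields further information:
    # a rough score for how much this user may like this movie
    score = np.dot(user_factors, movie_factors)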
Encoder (Transformers)
A component of a transformer that is used for “understanding” text. Encoders are typically used for classifying sentences by sentiment and figuring out which parts of a sentence refer to, for example, a person or location.
Gradient Accumulation
A technique for fitting large models on a not-so-powerful GPU: gradients from several small batches are accumulated before a single parameter update, simulating a larger batch size.
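A minimal PyTorch sketch (the model, data, and step counts are illustrative):

    import torch

    model = torch.nn.Linear(10, 1)
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    accum_steps = 4  # 4 small batches stand in for 1 large batch

    for step in range(8):
        x, y = torch.randn(2, 10), torch.randn(2, 1)  # small batch that fits in memory
        loss = torch.nn.functional.mse_loss(model(x), y)
        (loss / accum_steps).backward()  # gradients accumulate across calls
        if (step + 1) % accum_steps == 0:
            opt.step()       # one update per accumulated group
            opt.zero_grad()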
Gradient Boosting Machine (GBM)
An ensembling technique where, instead of averaging the predictions of all models, each successive model predicts the remaining error (residual) of the models before it. The predicted errors are then summed to obtain the final prediction.
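A minimal scikit-learn sketch using small trees (the data is illustrative):

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(0)
    X = rng.uniform(size=(200, 1))
    y = np.sin(6 * X[:, 0])  # illustrative target

    pred = np.zeros_like(y)
    for _ in range(10):
        residual = y - pred  # the error of the ensemble so far
        tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
        pred += tree.predict(X)  # summing the predicted errors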
Inference
Using a trained model for predictions.
K-Fold Cross Validation
An ensembling technique where each model is trained and validated on a different split of the dataset. For example, with five folds, each model is trained on a different 80% of the data and validated on the remaining 20%.
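A minimal scikit-learn sketch (the dataset is illustrative):

    import numpy as np
    from sklearn.model_selection import KFold

    X = np.arange(10).reshape(10, 1)

    # 5 folds: each model trains on a different 80% and validates on the other 20%
    for train_idx, valid_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
        X_train, X_valid = X[train_idx], X[valid_idx]
        # train one model on X_train, validate it on X_valid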
Learning Rate
A numerical value which controls how much the gradients update the parameters of a model.
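A minimal sketch of the basic gradient descent update (all numbers illustrative):

    weight = 2.0
    gradient = 0.5
    lr = 0.1  # the learning rate

    # A larger learning rate means a bigger step
    weight -= lr * gradient  # 2.0 - 0.1 * 0.5 = 1.95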
Linear Combination
The operation of multiplying each pair of corresponding elements in two vectors and summing the products. Also known as the dot product (see the example under Dot Product).
Named Entity Recognition (NER)
An NLP classification task where a sentence is broken into its components, and the model attempts to assign each component to a specific entity (e.g., person, place, organization).
Neuron
A basic processor of information. It consists of a linear combination followed by an activation function.
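A minimal NumPy sketch of one neuron (the inputs and weights are illustrative):

    import numpy as np

    def relu(z):
        return np.maximum(0, z)  # the activation function

    x = np.array([0.5, -1.0, 2.0])   # inputs
    w = np.array([0.3,  0.8, -0.5])  # weights
    b = 0.1                          # bias

    output = relu(np.dot(w, x) + b)  # linear combination, then activation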
OneR Classifier
The simplest type of decision tree. The tree only contains a single split.
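A minimal scikit-learn sketch: a decision tree capped at depth 1 produces exactly one split (the dataset is a standard illustrative one).

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)

    # max_depth=1 allows only a single split
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y)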
Rectified Linear Unit (ReLU)
An activation function that clips any value less than zero to zero.
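A minimal NumPy sketch:

    import numpy as np

    def relu(x):
        # values below zero are clipped to zero; positive values pass through
        return np.maximum(0, x)

    print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))  # [0.  0.  0.  1.5]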
Sampler
An algorithm that decides which indices of a dataset to load. Note that this algorithm does not load the data itself, and hence is not a dataloader.
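A minimal PyTorch sketch (the dataset is illustrative): the sampler yields only indices, never data.

    import torch
    from torch.utils.data import RandomSampler, TensorDataset

    dataset = TensorDataset(torch.arange(10))
    sampler = RandomSampler(dataset)
    print(list(sampler))  # a shuffled list of indices, e.g. [3, 7, 0, ...]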
Tabular Model
A model trained on tabular data. It is used to predict a specified column in the data.
Tokenization
Splitting a document into smaller pieces (tokens), typically its component words or subwords.
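A minimal sketch of naive word-level tokenization (real tokenizers also handle punctuation and subwords more carefully):

    document = "The quick brown fox."

    tokens = document.lower().replace(".", "").split()
    print(tokens)  # ['the', 'quick', 'brown', 'fox']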
Transformer
The name given to a Natural Language Processing (NLP) architecture that, in a nutshell, either fills in the blanks or autocompletes text. Transformers consist of an encoder, a decoder, or both.
Zero-shot
A prefix indicating that a pretrained model can be used without any finetuning.