Mathematical Activation Functions

A reference list of established activation functions and of newer ones that are coming into common use. This page may be updated fairly regularly as the field changes.


Activation functions play a crucial role in neural networks. They are responsible for introducing non-linearity to the model, enabling it to learn and perform complex tasks. Without activation functions, neural networks would essentially be linear models, which limits their capability in solving non-linear problems.
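
To see why, consider two stacked layers with no activation between them. The weight matrices \(W_1, W_2\) and biases \(b_1, b_2\) are illustrative symbols, not taken from a specific model:

$$W_2 (W_1 x + b_1) + b_2 = (W_2 W_1) x + (W_2 b_1 + b_2)$$

The result is again a single affine map of \(x\), so stacking more such layers adds no expressive power.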

 

Types of Activation Functions

 

1. ArcTan

 

Mathematical Expression

$$f(x) = \arctan(x)$$

 

Characteristics

  • Output range: (-π/2, π/2)
  • Advantages: Smooth and zero-centered; saturates more slowly than tanh
  • Disadvantages: Still saturates for large |x|; less commonly used

 

Code Example in C#

public double ArcTan(double x)
{
    return Math.Atan(x);
}

 

 

2. Bent Identity

 

Mathematical Expression

$$f(x) = \frac{\sqrt{x^2 + 1} - 1}{2} + x$$

 

Characteristics

  • Output range: (-∞, ∞)
  • Advantages: Smooth and non-linear
  • Disadvantages: Less commonly used

 

Code Example in C#

public double BentIdentity(double x)
{
    return (Math.Sqrt(x * x + 1) - 1) / 2 + x;
}

 

 

3. Binary Step

 

Mathematical Expression

$$f(x) = \begin{cases} 
      0 & \text{if } x < 0 \\
      1 & \text{if } x \ge 0 
   \end{cases}
$$

 

Characteristics

  • Output range: {0, 1}
  • Advantages: Simple and computationally efficient
  • Disadvantages: Gradient is zero everywhere except at x = 0, where it is undefined, so it is unsuitable for gradient-based training

 

Code Example in C#

public double BinaryStep(double x)
{
    return x < 0 ? 0 : 1;
}

 

 

4. ELU (Exponential Linear Unit)

 

Mathematical Expression

$$f(x) = \begin{cases} 
      x & \text{if } x \ge 0 \\
      \alpha (e^x - 1) & \text{if } x < 0 
   \end{cases}
$$

 

Characteristics

  • Output range: (-α, ∞)
  • Advantages: Negative outputs pull mean activations toward zero (reduces bias shift), which can speed up learning
  • Disadvantages: More expensive than ReLU because of the exponential

 

Code Example in C#

public double ELU(double x, double alpha = 1.0)
{
    return x >= 0 ? x : alpha * (Math.Exp(x) - 1);
}

 

 

5. Gaussian

 

Mathematical Expression

$$f(x) = e^{-x^2}$$

 

Characteristics

  • Output range: (0, 1]
  • Advantages: Smooth and differentiable
  • Disadvantages: Rarely used in practice

 

Code Example in C#

public double Gaussian(double x)
{
    return Math.Exp(-x * x);
}

 

 

6. Identity

 

Mathematical Expression

$$f(x) = x$$

 

Characteristics

  • Output range: (-∞, ∞)
  • Advantages: Linear, passes the input through unchanged
  • Disadvantages: Adds no non-linearity; mainly used in the final layer for regression problems

 

Code Example in C#

public double Identity(double x)
{
    return x;
}

 

 

7. Leaky ReLU

 

Mathematical Expression

$$f(x) = \begin{cases} 
      x & \text{if } x \ge 0 \\
      \alpha x & \text{if } x < 0 
   \end{cases}
$$

 

Characteristics

  • Output range: (-∞, ∞)
  • Advantages: Mitigates the dying ReLU problem by keeping a small gradient for negative inputs
  • Disadvantages: The slope α is a fixed hyperparameter that has to be chosen

 

Code Example in C#

public double LeakyReLU(double x, double alpha = 0.01)
{
    return x >= 0 ? x : alpha * x;
}

 

 

8. Maxout

 

Mathematical Expression

$$f(x) = \max(w_1^T x + b_1, w_2^T x + b_2)$$

 

Characteristics

  • Output range: (-∞, ∞)
  • Advantages: Addresses the dying ReLU problem, flexible
  • Disadvantages: Requires more parameters

 

Code Example in C#

// Requires: using System.Linq; (for Zip and Sum)
public double Maxout(double[] inputs, double[] weights1, double[] weights2, double bias1, double bias2)
{
    double sum1 = inputs.Zip(weights1, (input, weight) => input * weight).Sum() + bias1;
    double sum2 = inputs.Zip(weights2, (input, weight) => input * weight).Sum() + bias2;
    return Math.Max(sum1, sum2);
}

 

 

9. Mish

 

Mathematical Expression

$$f(x) = x \cdot \tanh(\text{softplus}(x)) = x \cdot \tanh(\ln(1 + e^x))$$

 

Characteristics

  • Output range: ≈ [-0.31, ∞)
  • Advantages: Non-monotonic, smooth, reported to outperform ReLU on some benchmarks
  • Disadvantages: Computationally expensive

 

Code Example in C#

public double Mish(double x)
{
    return x * Math.Tanh(Math.Log(1 + Math.Exp(x)));
}

 

 

10. ReLU (Rectified Linear Unit)

 

Mathematical Expression

$$f(x) = \max(0, x)$$

 

Characteristics

  • Output range: [0, ∞)
  • Advantages: Non-linear, computationally efficient, produces sparse activations
  • Disadvantages: Can cause dead neurons (units stuck at zero output with zero gradient)

 

Code Example in C#

public double ReLU(double x)
{
    return Math.Max(0, x);
}
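
The dead-neuron issue follows from ReLU's derivative, which is zero for every negative input. The following is a minimal sketch, assuming a hypothetical helper class `ReluGradients` (not part of the original listing) and the common convention of treating the derivative at x = 0 as 0. It also shows the Leaky ReLU derivative for comparison:

```csharp
using System;

public static class ReluGradients
{
    // Derivative of ReLU: 1 for positive inputs, 0 otherwise.
    // The value at exactly x = 0 is a convention (0 is assumed here).
    public static double ReLUDerivative(double x)
    {
        return x > 0 ? 1.0 : 0.0;
    }

    // Derivative of Leaky ReLU: the negative side keeps a small slope alpha,
    // so a unit with negative pre-activations still receives a gradient.
    public static double LeakyReLUDerivative(double x, double alpha = 0.01)
    {
        return x > 0 ? 1.0 : alpha;
    }
}
```

A unit whose pre-activation is negative for every training example gets a zero gradient from `ReLUDerivative` on each update, so its weights stop changing.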

 

 

11. SELU (Scaled Exponential Linear Unit)

 

Mathematical Expression

$$f(x) = \lambda \begin{cases} 
      x & \text{if } x \ge 0 \\
      \alpha (e^x - 1) & \text{if } x < 0 
   \end{cases}
$$

 

Characteristics

  • Output range: (-λα, ∞) ≈ (-1.758, ∞)
  • Advantages: Self-normalizing properties, stable gradients
  • Disadvantages: Computationally more complex

 

Code Example in C#

public double SELU(double x, double alpha = 1.67326, double lambda = 1.0507)
{
    return lambda * (x >= 0 ? x : alpha * (Math.Exp(x) - 1));
}

 

 

12. Sigmoid

 

Mathematical Expression

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

 

Characteristics

  • Output range: (0, 1)
  • Advantages: Smooth gradient; output can be interpreted as a probability
  • Disadvantages: Saturates and kills gradients, not zero-centered, computationally expensive

 

Code Example in C#

public double Sigmoid(double x)
{
    return 1 / (1 + Math.Exp(-x));
}
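
The saturation mentioned above can be made concrete with the derivative, σ'(x) = σ(x)(1 − σ(x)). This is a minimal sketch; the class name `SigmoidGradient` is hypothetical and only serves to keep the example self-contained:

```csharp
using System;

public static class SigmoidGradient
{
    public static double Sigmoid(double x)
    {
        return 1 / (1 + Math.Exp(-x));
    }

    // sigma'(x) = sigma(x) * (1 - sigma(x)); peaks at 0.25 when x = 0
    // and decays towards 0 as |x| grows.
    public static double SigmoidDerivative(double x)
    {
        double s = Sigmoid(x);
        return s * (1 - s);
    }
}
```

For example, `SigmoidDerivative(0)` is 0.25, while `SigmoidDerivative(10)` is roughly 4.5e-5, which is why gradients passed back through saturated sigmoid units become very small.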

 

 

13. Sinusoid

 

Mathematical Expression

$$f(x) = \sin(x)$$

 

Characteristics

  • Output range: [-1, 1]
  • Advantages: Non-linear, smooth
  • Disadvantages: Periodic nature can be limiting

 

Code Example in C#

public double Sinusoid(double x)
{
    return Math.Sin(x);
}

 

 

14. Softmax

 

Mathematical Expression

$$\text{softmax}(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}}$$

 

Characteristics

  • Output range: (0, 1), with the outputs summing to 1
  • Advantages: Converts logits to probabilities
  • Disadvantages: Computationally expensive, sensitive to outliers

 

Code Example in C#

// Requires: using System.Linq; (for Max, Select, Sum and ToArray)
public double[] Softmax(double[] x)
{
    double max = x.Max();
    double sumExp = x.Select(val => Math.Exp(val - max)).Sum();
    return x.Select(val => Math.Exp(val - max) / sumExp).ToArray();
}
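
The listing subtracts the maximum logit before exponentiating. The sketch below illustrates why, by placing a naive variant next to the shifted one. The class `SoftmaxDemo` and its method names are hypothetical and exist only for this comparison:

```csharp
using System;
using System.Linq;

public static class SoftmaxDemo
{
    // Naive softmax: exponentiates the raw logits, so large values overflow.
    public static double[] NaiveSoftmax(double[] x)
    {
        double sumExp = x.Select(val => Math.Exp(val)).Sum();
        return x.Select(val => Math.Exp(val) / sumExp).ToArray();
    }

    // Shifted softmax: same computation as the listing above.
    // Subtracting the maximum leaves the result unchanged mathematically
    // but keeps every exponent at or below zero.
    public static double[] StableSoftmax(double[] x)
    {
        double max = x.Max();
        double sumExp = x.Select(val => Math.Exp(val - max)).Sum();
        return x.Select(val => Math.Exp(val - max) / sumExp).ToArray();
    }
}
```

With logits such as {1000, 1001, 1002}, the naive version returns NaN entries because `Math.Exp` overflows to Infinity, while the shifted version returns finite values that sum to 1.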

 

 

15. Softplus

 

Mathematical Expression

$$f(x) = \ln(1 + e^x)$$

 

Characteristics

  • Output range: (0, ∞)
  • Advantages: Smooth and non-linear
  • Disadvantages: Computationally more expensive

 

Code Example in C#

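public double Softplus(double x)
{
    // Stable form: max(x, 0) + ln(1 + e^(-|x|)) avoids overflow of e^x for large x.
    return Math.Max(x, 0) + Math.Log(1 + Math.Exp(-Math.Abs(x)));
}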

 

 

16. Softsign

 

Mathematical Expression

$$f(x) = \frac{x}{1 + |x|}$$

 

Characteristics

  • Output range: (-1, 1)
  • Advantages: Smooth, non-linear
  • Disadvantages: Can saturate, slower convergence

 

Code Example in C#

public double Softsign(double x)
{
    return x / (1 + Math.Abs(x));
}

 

 

17. SReLU (S-shaped Rectified Linear Activation Unit)

 

Mathematical Expression

$$f(x) = \begin{cases} 
      t_1 + a_1(x - t_1) & \text{if } x \le t_1 \\
      x & \text{if } t_1 < x < t_2 \\
      t_2 + a_2(x - t_2) & \text{if } x \ge t_2 
   \end{cases}
$$

 

Characteristics

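  • Output range: (-∞, ∞)
  • Advantages: Piecewise linear with separate slopes below t₁ and above t₂; handles both positive and negative slopes
  • Disadvantages: Four parameters (t₁, t₂, a₁, a₂) to learn or tune; not smooth at the thresholds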

 

Code Example in C#

public double SReLU(double x, double t1, double t2, double a1, double a2)
{
    if (x <= t1)
        return t1 + a1 * (x - t1);
    else if (x < t2)
        return x;
    else
        return t2 + a2 * (x - t2);
}

 

 

22. Swish

 

Mathematical Expression

$$f(x) = x \cdot \sigma(x) = x \cdot \frac{1}{1 + e^{-x}}$$

 

Characteristics

  • Output range: ≈ [-0.28, ∞)
  • Advantages: Smooth and non-monotonic; reported to improve accuracy over ReLU in some deep models
  • Disadvantages: Computationally more expensive

 

Code Example in C#

public double Swish(double x)
{
    return x / (1 + Math.Exp(-x));
}

 

 

23. Tanh (Hyperbolic Tangent)

 

Mathematical Expression

$$\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$$

 

Characteristics

  • Output range: (-1, 1)
  • Advantages: Zero-centered, stronger gradients than Sigmoid
  • Disadvantages: Saturates and kills gradients, computationally expensive

 

Code Example in C#

public double Tanh(double x)
{
    return Math.Tanh(x);
}

 

 

24. TanhShrink

 

Mathematical Expression

$$f(x) = x - \tanh(x)$$

 

Characteristics

  • Output range: (-∞, ∞)
  • Advantages: Non-linear, differentiable
  • Disadvantages: Less commonly used

 

Code Example in C#

public double TanhShrink(double x)
{
    return x - Math.Tanh(x);
}

 

 

25. Hard Sigmoid

 

Mathematical Expression

$$f(x) = \max(0, \min(1, 0.2x + 0.5))$$

 

Characteristics

  • Output range: [0, 1]
  • Advantages: Computationally efficient
  • Disadvantages: Only a piecewise-linear approximation of sigmoid; not smooth at the breakpoints

 

Code Example in C#

 

public double HardSigmoid(double x)
{
    return Math.Max(0, Math.Min(1, 0.2 * x + 0.5));
}

 

 

26. Hard Tanh

 

Mathematical Expression

$$f(x) = \begin{cases} 
      -1 & \text{if } x < -1 \\
      x & \text{if } -1 \le x \le 1 \\
      1 & \text{if } x > 1 
   \end{cases}
$$

 

Characteristics

  • Output range: [-1, 1]
  • Advantages: Simple and efficient
  • Disadvantages: Non-smooth, can saturate

 

Code Example in C#

public double HardTanh(double x)
{
    return Math.Max(-1, Math.Min(1, x));
}

 

 

27. LogSigmoid

 

Mathematical Expression

$$f(x) = \ln\left(\frac{1}{1 + e^{-x}}\right)$$

 

Characteristics

  • Output range: (-∞, 0)
  • Advantages: Smooth, non-linear
  • Disadvantages: Computationally expensive, similar to sigmoid

 

Code Example in C#

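public double LogSigmoid(double x)
{
    // Stable form: min(x, 0) - ln(1 + e^(-|x|)) avoids overflow of e^(-x) for very negative x.
    return Math.Min(x, 0) - Math.Log(1 + Math.Exp(-Math.Abs(x)));
}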

 

 

28. SQNL (Square Non-linearity)

 

Mathematical Expression

$$f(x) = \begin{cases} 
      1 & \text{if } x > 2 \\
      x - \frac{x^2}{4} & \text{if } 0 \le x \le 2 \\
      x + \frac{x^2}{4} & \text{if } -2 \le x < 0 \\
      -1 & \text{if } x < -2 
   \end{cases}
$$

 

Characteristics

  • Output range: [-1, 1]
  • Advantages: Smooth, bounded output
  • Disadvantages: Less commonly used

 

Code Example in C#

public double SQNL(double x)
{
    if (x > 2)
        return 1;
    else if (x >= 0)
        return x - x * x / 4;
    else if (x >= -2)
        return x + x * x / 4;
    else
        return -1;
}

 

 

29. ISRLU (Inverse Square Root Linear Unit)

 

Mathematical Expression

$$f(x) = \begin{cases} 
      x & \text{if } x \ge 0 \\
      x / \sqrt{1 + \alpha x^2} & \text{if } x < 0 
   \end{cases}
$$

 

Characteristics

  • Output range: (-1/√α, ∞)
  • Advantages: Smooth, non-linear
  • Disadvantages: Introduces hyperparameter α

 

Code Example in C#

public double ISRLU(double x, double alpha = 1.0)
{
    return x >= 0 ? x : x / Math.Sqrt(1 + alpha * x * x);
}

 

 

30. SiLU (Sigmoid Linear Unit)

 

Mathematical Expression

$$f(x) = x \cdot \sigma(x) = x \cdot \frac{1}{1 + e^{-x}}$$

 

Characteristics


  • Output range: ≈ [-0.28, ∞)
  • Advantages: Smooth, non-monotonic (this is the same function as Swish, entry 22)
  • Disadvantages: Computationally expensive

 

Code Example in C#

public double SiLU(double x)
{
    return x / (1 + Math.Exp(-x));
}

 

 

31. CELU (Continuously Differentiable Exponential Linear Units)

 

Mathematical Expression

$$f(x) = \begin{cases} 
      x & \text{if } x \ge 0 \\
      \alpha (e^{\frac{x}{\alpha}} - 1) & \text{if } x < 0 
   \end{cases}
$$

 

Characteristics

  • Output range: (-α, ∞)
  • Advantages: Continuously differentiable for every α > 0, improved training speed
  • Disadvantages: Computationally more complex

 

Code Example in C#

public double CELU(double x, double alpha = 1.0)
{
    return x >= 0 ? x : alpha * (Math.Exp(x / alpha) - 1);
}

 

 

32. TanhClip

 

Mathematical Expression

$$f(x) = \begin{cases} 
      -1 & \text{if } x < -1 \\
      \tanh(x) & \text{if } -1 \le x \le 1 \\
      1 & \text{if } x > 1 
   \end{cases}
$$

 

Characteristics

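  • Output range: within [-1, 1]; the middle branch only reaches tanh(±1) ≈ ±0.762
  • Advantages: Bounded, non-linear, smooth on (-1, 1)
  • Disadvantages: Discontinuous at x = ±1 as defined above; zero gradient outside [-1, 1]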

 

Code Example in C#

public double TanhClip(double x)
{
    if (x < -1)
        return -1;
    else if (x > 1)
        return 1;
    else
        return Math.Tanh(x);
}

 

 

33. Parametric ReLU (PReLU)

 

Mathematical Expression

$$f(x) = \begin{cases} 
      x & \text{if } x \ge 0 \\
      \alpha x & \text{if } x < 0 
   \end{cases}
$$

 

Characteristics

  • Output range: (-∞, ∞)
  • Advantages: Learnable parameter α, flexibility
  • Disadvantages: Risk of overfitting

 

Code Example in C#

public double PReLU(double x, double alpha)
{
    return x >= 0 ? x : alpha * x;
}
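
Because α is learned, it needs its own gradient. The sketch below shows one plain gradient-descent step for α. The class `PReluGradient`, the parameter `upstreamGrad` (the loss gradient with respect to the output) and `learningRate` are hypothetical names introduced for illustration:

```csharp
public static class PReluGradient
{
    // d f / d alpha: the output depends on alpha only on the negative side,
    // where f(x) = alpha * x.
    public static double GradAlpha(double x)
    {
        return x >= 0 ? 0.0 : x;
    }

    // One gradient-descent step on alpha for a single input x.
    // upstreamGrad is dLoss/df; learningRate is an assumed hyperparameter.
    public static double UpdateAlpha(double alpha, double x, double upstreamGrad, double learningRate)
    {
        return alpha - learningRate * upstreamGrad * GradAlpha(x);
    }
}
```

Only negative inputs move α; for x ≥ 0 the update leaves it unchanged.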

 

 

34. Gaussian Error Linear Unit (GELU)

 

Mathematical Expression

$$f(x) = x \cdot P(X \leq x) = x \cdot \frac{1}{2}\left[1 + \text{erf}\left(\frac{x}{\sqrt{2}}\right)\right], \quad X \sim \mathcal{N}(0, 1)$$

 

Characteristics

  • Output range: ≈ [-0.17, ∞)
  • Advantages: Smooth, non-linear, better learning characteristics
  • Disadvantages: Computationally complex

 

Code Example in C#

// Uses the common tanh approximation of GELU, since System.Math provides no erf function.
public double GELU(double x)
{
    return 0.5 * x * (1 + Math.Tanh(Math.Sqrt(2 / Math.PI) * (x + 0.044715 * Math.Pow(x, 3))));
}
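
The expression above is written with erf, while the listing uses a tanh approximation. For comparison, here is a sketch that follows the erf form directly. It assumes a hand-written `Erf` based on the Abramowitz and Stegun polynomial approximation (formula 7.1.26, absolute error on the order of 1e-7); the class name `GeluExact` is hypothetical:

```csharp
using System;

public static class GeluExact
{
    // Polynomial approximation of erf(x) (Abramowitz and Stegun 7.1.26).
    // erf is odd, so the sign is handled separately.
    public static double Erf(double x)
    {
        double sign = x < 0 ? -1.0 : 1.0;
        double a = Math.Abs(x);
        double t = 1.0 / (1.0 + 0.3275911 * a);
        double poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
                        - 0.284496736) * t + 0.254829592) * t;
        return sign * (1.0 - poly * Math.Exp(-a * a));
    }

    // GELU following the erf-based expression: x * 0.5 * (1 + erf(x / sqrt(2))).
    public static double Gelu(double x)
    {
        return 0.5 * x * (1.0 + Erf(x / Math.Sqrt(2.0)));
    }
}
```

The two variants agree closely for typical inputs. The tanh form is often preferred where erf is unavailable or slower.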

 

 

35. Rational Activation Function (RAF)

 

Mathematical Expression

$$f(x) = \frac{x}{\sqrt{1 + \alpha x^2}}$$

 

Characteristics

  • Output range: (-1/√α, 1/√α)
  • Advantages: Smooth, non-linear, bounded
  • Disadvantages: Saturates for large |x|; less commonly used

 

Code Example in C#

public double RAF(double x, double alpha = 1.0)
{
    return x / Math.Sqrt(1 + alpha * x * x);
}

 

 

36. Rectified Power Unit (RePU)

 

Mathematical Expression

$$f(x) = \begin{cases} 
      x^n & \text{if } x \ge 0 \\
      0 & \text{if } x < 0 
   \end{cases}
$$

 

Characteristics

  • Output range: [0, ∞)
  • Advantages: Provides non-linearity with a tunable parameter \(n\), useful in various neural network architectures
  • Disadvantages: Choice of \(n\) impacts gradient flow and may require tuning

 

Code Example in C#

public double RePU(double x, double n = 2.0)
{
    // n is the power from the expression above; the default of 2 is only illustrative.
    return x >= 0 ? Math.Pow(x, n) : 0;
}

 

 

37. Log-Cosh

 

Mathematical Expression

$$f(x) = \ln(\cosh(x))$$

 

Characteristics

  • Output range: [0, ∞)
  • Advantages: Smooth, less likely to saturate
  • Disadvantages: Computationally more expensive

 

Code Example in C#

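public double LogCosh(double x)
{
    // Stable form: |x| + ln(1 + e^(-2|x|)) - ln 2 avoids overflow of cosh(x) for large |x|.
    double a = Math.Abs(x);
    return a + Math.Log(1 + Math.Exp(-2 * a)) - Math.Log(2);
}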

 

 

39. Parametric Softplus

 

Mathematical Expression

$$f(x) = \alpha \ln(1 + e^{\beta x})$$

 

Characteristics

  • Output range: (0, ∞) for α > 0
  • Advantages: Smooth, non-linear, flexible parameters
  • Disadvantages: Requires parameter tuning

 

Code Example in C#

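public double ParametricSoftplus(double x, double alpha = 1.0, double beta = 1.0)
{
    // Stable form of ln(1 + e^z) with z = beta * x, avoiding overflow for large z.
    double z = beta * x;
    return alpha * (Math.Max(z, 0) + Math.Log(1 + Math.Exp(-Math.Abs(z))));
}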

 

 

40. ReLU6

 

Mathematical Expression

$$f(x) = \min(\max(0, x), 6)$$

 

Characteristics

  • Output range: [0, 6]
  • Advantages: Prevents large activations, computationally efficient
  • Disadvantages: Saturates for x > 6; not smooth at 0 and 6

 

Code Example in C#

public double ReLU6(double x)
{
    return Math.Min(Math.Max(0, x), 6);
}

 

 

42. SQ-RBF (Square Root Radial Basis Function)

 

Mathematical Expression

$$f(x) = \sqrt{1 + x^2} - 1$$

 

Characteristics

  • Output range: [0, ∞)
  • Advantages: Smooth, non-linear
  • Disadvantages: Less commonly used

 

Code Example in C#

public double SQ_RBF(double x)
{
    return Math.Sqrt(1 + x * x) - 1;
}

 

 

43. Symmetric Sigmoid

 

Mathematical Expression

$$f(x) = \frac{2}{1 + e^{-x}} - 1$$

 

Characteristics

  • Output range: (-1, 1)
  • Advantages: Smooth, zero-centered
  • Disadvantages: Saturates and kills gradients

 

Code Example in C#

public double SymmetricSigmoid(double x)
{
    return 2 / (1 + Math.Exp(-x)) - 1;
}
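
This function can be rewritten as a rescaled hyperbolic tangent. In the short derivation below, the middle expression is the form listed under Bipolar Sigmoid, so the two entries describe the same function. The last step multiplies numerator and denominator by \(e^{x/2}\):

$$\frac{2}{1 + e^{-x}} - 1 = \frac{1 - e^{-x}}{1 + e^{-x}} = \frac{e^{x/2} - e^{-x/2}}{e^{x/2} + e^{-x/2}} = \tanh\left(\frac{x}{2}\right)$$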

 

 

44. TanhExp

 

Mathematical Expression

$$f(x) = x \cdot \tanh(e^x)$$

 

Characteristics

  • Output range: ≈ [-0.35, ∞)
  • Advantages: Non-linear, smooth
  • Disadvantages: Computationally expensive

 

Code Example in C#

public double TanhExp(double x)
{
    return x * Math.Tanh(Math.Exp(x));
}

 

 

45. Thresholded ReLU

 

Mathematical Expression

$$f(x) = \begin{cases} 
      x & \text{if } x > \theta \\
      0 & \text{if } x \le \theta 
   \end{cases}
$$

 

Characteristics

  • Output range: {0} ∪ (θ, ∞) for θ ≥ 0
  • Advantages: Non-linear, computationally efficient
  • Disadvantages: Introduces threshold parameter θ; discontinuous at x = θ when θ ≠ 0

 

Code Example in C#

public double ThresholdedReLU(double x, double theta = 1.0)
{
    return x > theta ? x : 0;
}

 

 

46. Triangular

 

Mathematical Expression

$$f(x) = \max(0, 1 - |x|)$$

 

Characteristics

  • Output range: [0, 1]
  • Advantages: Simple and efficient
  • Disadvantages: Non-smooth

 

Code Example in C#

public double Triangular(double x)
{
    return Math.Max(0, 1 - Math.Abs(x));
}

 

 

47. Bipolar Sigmoid

 

Mathematical Expression

$$f(x) = \frac{1 - e^{-x}}{1 + e^{-x}}$$

 

Characteristics

  • Output range: (-1, 1)
  • Advantages: Zero-centered output between -1 and 1, useful when a network works with bipolar (±1) values
  • Disadvantages: Can cause vanishing gradient problems, computationally expensive

 

Code Example in C#

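public double BipolarSigmoid(double x)
{
    // (1 - e^(-x)) / (1 + e^(-x)) equals tanh(x / 2). This form avoids the NaN
    // from Infinity / Infinity when e^(-x) overflows for very negative x.
    return Math.Tanh(x / 2);
}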

 

 

Conclusion

Activation functions are the backbone of neural networks, providing the necessary non-linearity that allows the network to model complex data patterns. Each activation function has its strengths and weaknesses, and the choice of which to use can significantly impact the performance and convergence of the neural network. By understanding the mathematical underpinnings and characteristics of these functions, developers and researchers can make more informed decisions in their model architectures.