This page catalogues the mathematical activation functions currently in common use, along with newer ones that are becoming popular. It may change on a reasonably regular basis as the industry evolves.
Activation functions play a crucial role in neural networks. They are responsible for introducing non-linearity to the model, enabling it to learn and perform complex tasks. Without activation functions, neural networks would essentially be linear models, which limits their capability in solving non-linear problems.
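To see why, stack two dense layers with no activation in between: the composition collapses into a single affine map. The sketch below (using hypothetical scalar weights, for illustration only) makes this explicit.
// Two "layers" with no activation: y = w2 * (w1 * x + b1) + b2
// This always simplifies to (w2 * w1) * x + (w2 * b1 + b2) -- a single linear model,
// so extra depth adds no expressive power without a non-linearity in between.
public double TwoLinearLayers(double x, double w1, double b1, double w2, double b2)
{
double hidden = w1 * x + b1; // first linear layer
return w2 * hidden + b2; // second linear layer, still linear in x
}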
$$f(x) = \arctan(x)$$
public double ArcTan(double x)
{
return Math.Atan(x);
}
$$f(x) = \frac{\sqrt{x^2 + 1} - 1}{2} + x$$
public double BentIdentity(double x)
{
return (Math.Sqrt(x * x + 1) - 1) / 2 + x;
}
$$f(x) = \begin{cases}
0 & \text{if } x < 0 \\
1 & \text{if } x \ge 0
\end{cases}
$$
public double BinaryStep(double x)
{
return x < 0 ? 0 : 1;
}
$$f(x) = \begin{cases}
x & \text{if } x \ge 0 \\
\alpha (e^x - 1) & \text{if } x < 0
\end{cases}
$$
public double ELU(double x, double alpha = 1.0)
{
return x >= 0 ? x : alpha * (Math.Exp(x) - 1);
}
$$f(x) = e^{-x^2}$$
public double Gaussian(double x)
{
return Math.Exp(-x * x);
}
$$f(x) = x$$
public double Identity(double x)
{
return x;
}
$$f(x) = \begin{cases}
x & \text{if } x \ge 0 \\
\alpha x & \text{if } x < 0
\end{cases}
$$
public double LeakyReLU(double x, double alpha = 0.01)
{
return x >= 0 ? x : alpha * x;
}
$$f(x) = \max(w_1^T x + b_1, w_2^T x + b_2)$$
// Maxout over two linear units (requires System.Linq for Zip and Sum)
public double Maxout(double[] inputs, double[] weights1, double[] weights2, double bias1, double bias2)
{
double sum1 = inputs.Zip(weights1, (input, weight) => input * weight).Sum() + bias1;
double sum2 = inputs.Zip(weights2, (input, weight) => input * weight).Sum() + bias2;
return Math.Max(sum1, sum2);
}
$$f(x) = x \cdot \tanh(\text{softplus}(x)) = x \cdot \tanh(\ln(1 + e^x))$$
public double Mish(double x)
{
return x * Math.Tanh(Math.Log(1 + Math.Exp(x)));
}
$$f(x) = \max(0, x)$$
public double ReLU(double x)
{
return Math.Max(0, x);
}
$$f(x) = \lambda \begin{cases}
x & \text{if } x \ge 0 \\
\alpha (e^x - 1) & \text{if } x < 0
\end{cases}
$$
public double SELU(double x, double alpha = 1.67326, double lambda = 1.0507)
{
return lambda * (x >= 0 ? x : alpha * (Math.Exp(x) - 1));
}
$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
public double Sigmoid(double x)
{
return 1 / (1 + Math.Exp(-x));
}
$$f(x) = \sin(x)$$
public double Sinusoid(double x)
{
return Math.Sin(x);
}
$$\text{softmax}(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}}$$
public double[] Softmax(double[] x)
{
double max = x.Max();
double sumExp = x.Select(val => Math.Exp(val - max)).Sum();
return x.Select(val => Math.Exp(val - max) / sumExp).ToArray();
}
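As a quick usage sketch (the scores below are arbitrary example values), the resulting probabilities always sum to 1, which is why softmax is typically applied to a network's output layer for classification:
// Example: raw scores for three classes
double[] scores = { 2.0, 1.0, 0.1 };
double[] probabilities = Softmax(scores); // ≈ { 0.659, 0.242, 0.099 }, summing to 1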
$$f(x) = \ln(1 + e^x)$$
public double Softplus(double x)
{
return Math.Log(1 + Math.Exp(x));
}
$$f(x) = \frac{x}{1 + |x|}$$
public double Softsign(double x)
{
return x / (1 + Math.Abs(x));
}
$$f(x) = \begin{cases}
t_1 + a_1(x - t_1) & \text{if } x \le t_1 \\
x & \text{if } t_1 < x < t_2 \\
t_2 + a_2(x - t_2) & \text{if } x \ge t_2
\end{cases}
$$
public double SReLU(double x, double t1, double t2, double a1, double a2)
{
if (x <= t1)
return t1 + a1 * (x - t1);
else if (x < t2)
return x;
else
return t2 + a2 * (x - t2);
}
$$f(x) = x \cdot \sigma(x) = x \cdot \frac{1}{1 + e^{-x}}$$
public double Swish(double x)
{
return x / (1 + Math.Exp(-x));
}
$$\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$$
public double Tanh(double x)
{
return Math.Tanh(x);
}
$$f(x) = x - \tanh(x)$$
public double TanhShrink(double x)
{
return x - Math.Tanh(x);
}
$$f(x) = \max(0, \min(1, 0.2x + 0.5))$$
public double HardSigmoid(double x)
{
return Math.Max(0, Math.Min(1, 0.2 * x + 0.5));
}
$$f(x) = \begin{cases}
-1 & \text{if } x < -1 \\
x & \text{if } -1 \le x \le 1 \\
1 & \text{if } x > 1
\end{cases}
$$
public double HardTanh(double x)
{
return Math.Max(-1, Math.Min(1, x));
}
$$f(x) = \log(\frac{1}{1 + e^{-x}})$$
public double LogSigmoid(double x)
{
return Math.Log(1 / (1 + Math.Exp(-x)));
}
$$f(x) = \begin{cases}
1 & \text{if } x > 2 \\
x - \frac{x^2}{4} & \text{if } 0 \le x \le 2 \\
x + \frac{x^2}{4} & \text{if } -2 \le x < 0 \\
-1 & \text{if } x < -2
\end{cases}
$$
public double SQNL(double x)
{
if (x > 2)
return 1;
else if (x >= 0)
return x - x * x / 4;
else if (x >= -2)
return x + x * x / 4;
else
return -1;
}
$$f(x) = \begin{cases}
x & \text{if } x \ge 0 \\
x / \sqrt{1 + \alpha x^2} & \text{if } x < 0
\end{cases}
$$
public double ISRLU(double x, double alpha = 1.0)
{
return x >= 0 ? x : x / Math.Sqrt(1 + alpha * x * x);
}
$$f(x) = x \cdot \sigma(x) = x \cdot \frac{1}{1 + e^{-x}}$$
**Output range**: approximately (-0.278, ∞)
**Advantages**: Smooth, non-monotonic; identical to Swish (listed above)
**Disadvantages**: More computationally expensive than ReLU
public double SiLU(double x)
{
return x / (1 + Math.Exp(-x));
}
$$f(x) = \begin{cases}
x & \text{if } x \ge 0 \\
\alpha (e^{\frac{x}{\alpha}} - 1) & \text{if } x < 0
\end{cases}
$$
public double CELU(double x, double alpha = 1.0)
{
return x >= 0 ? x : alpha * (Math.Exp(x / alpha) - 1);
}
$$f(x) = \begin{cases}
-1 & \text{if } x < -1 \\
\tanh(x) & \text{if } -1 \le x \le 1 \\
1 & \text{if } x > 1
\end{cases}
$$
public double TanhClip(double x)
{
if (x < -1)
return -1;
else if (x > 1)
return 1;
else
return Math.Tanh(x);
}
$$f(x) = \begin{cases}
x & \text{if } x \ge 0 \\
\alpha x & \text{if } x < 0
\end{cases}
$$
public double PReLU(double x, double alpha)
{
return x >= 0 ? x : alpha * x;
}
$$f(x) = x \cdot P(X \leq x) = x \cdot \frac{1}{2}[1 + \text{erf}(\frac{x}{\sqrt{2}})]$$
public double GELU(double x)
{
// Tanh-based approximation of the exact erf form given above
return 0.5 * x * (1 + Math.Tanh(Math.Sqrt(2 / Math.PI) * (x + 0.044715 * Math.Pow(x, 3))));
}
$$f(x) = \frac{x}{\sqrt{1 + \alpha x^2}}$$
public double RAF(double x, double alpha = 1.0)
{
return x / Math.Sqrt(1 + alpha * x * x);
}
$$f(x) = \begin{cases}
x^n & \text{if } x \ge 0 \\
0 & \text{if } x < 0
\end{cases}
$$
public double RePU(double x, double n)
{
return x >= 0 ? Math.Pow(x, n) : 0;
}
$$f(x) = \ln(\cosh(x))$$
public double LogCosh(double x)
{
return Math.Log(Math.Cosh(x));
}
$$f(x) = \alpha \ln(1 + e^{\beta x})$$
public double ParametricSoftplus(double x, double alpha = 1.0, double beta = 1.0)
{
return alpha * Math.Log(1 + Math.Exp(beta * x));
}
$$f(x) = \min(\max(0, x), 6)$$
public double ReLU6(double x)
{
return Math.Min(Math.Max(0, x), 6);
}
$$f(x) = \sqrt{1 + x^2} - 1$$
public double SQ_RBF(double x)
{
return Math.Sqrt(1 + x * x) - 1;
}
$$f(x) = \frac{2}{1 + e^{-x}} - 1$$
public double SymmetricSigmoid(double x)
{
return 2 / (1 + Math.Exp(-x)) - 1;
}
$$f(x) = x \cdot \tanh(e^x)$$
public double TanhExp(double x)
{
return x * Math.Tanh(Math.Exp(x));
}
$$f(x) = \begin{cases}
x & \text{if } x > \theta \\
0 & \text{if } x \le \theta
\end{cases}
$$
public double ThresholdedReLU(double x, double theta = 1.0)
{
return x > theta ? x : 0;
}
$$f(x) = \max(0, 1 - |x|)$$
public double Triangular(double x)
{
return Math.Max(0, 1 - Math.Abs(x));
}
$$f(x) = \frac{1 - e^{-x}}{1 + e^{-x}}$$
public double BipolarSigmoid(double x)
{
return (1 - Math.Exp(-x)) / (1 + Math.Exp(-x));
}
Activation functions are the backbone of neural networks, providing the necessary non-linearity that allows the network to model complex data patterns. Each activation function has its strengths and weaknesses, and the choice of which to use can significantly impact the performance and convergence of the neural network. By understanding the mathematical underpinnings and characteristics of these functions, developers and researchers can make more informed decisions in their model architectures.