gracefullight.dev

RL 001

July 28, 2026 · about 21 min read

Eunkwang Shin

Owner

Reinforcement Learning

Learn how to make a good sequence of decisions by interacting with the environment.

Overview of Reinforcement Learning

Characteristics of RL

Trial and Error: Learning proceeds by trying actions and observing their outcomes.
Optimization: The goal is to find a good sequence of actions or decisions.
Delayed Consequences/Rewards: It can take time to realize whether an action or decision was good or bad.
Exploration: Trying new actions to discover their effects, so that the agent learns from experience.
Exploitation: Choosing the action with the highest expected reward, based on current knowledge.
Generalization: Applying previous experience or knowledge effectively to new or unseen situations.
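
The trade-off between exploration and exploitation can be illustrated with a small multi-armed bandit. The sketch below is only an illustration under assumed settings: the function names, the three reward means, the Gaussian reward noise, and `epsilon = 0.1` are all hypothetical choices, not part of the original notes.

```python
import random

def epsilon_greedy(q_values, epsilon, rng):
    """Pick an action index: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # exploration: uniformly random action
    # exploitation: action with the highest current estimate
    return max(range(len(q_values)), key=lambda a: q_values[a])

def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Estimate each arm's mean reward by trial and error."""
    rng = random.Random(seed)
    q_values = [0.0] * len(true_means)  # running estimate of each arm's mean reward
    counts = [0] * len(true_means)      # how often each arm was pulled
    for _ in range(steps):
        action = epsilon_greedy(q_values, epsilon, rng)
        reward = rng.gauss(true_means[action], 1.0)  # noisy reward (assumed noise model)
        counts[action] += 1
        # incremental update of the sample mean
        q_values[action] += (reward - q_values[action]) / counts[action]
    return q_values, counts

q_values, counts = run_bandit([0.2, 0.5, 0.9])  # hypothetical reward means
```

With `epsilon = 0` the agent never explores and can lock onto a poor arm; with a small positive `epsilon` it keeps refining every estimate while mostly exploiting the best one.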

Frameworks for RL

OpenAI Gym
Torch RL
AWS DeepRacer

RL Math

Probability

Sample Space ( $\Omega$ ): The set of all possible outcomes of a random experiment.

Bonferroni's Inequality

$P(A \cap B) \geq P(A) + P(B) - 1$ $P(\cap^{n}_{i=1} A_i) \geq \sum_{i=1}^{n} P(A_i) - (n-1)$

Gives a lower bound on the intersection probability which is useful when this probability is difficult to compute directly.
It is useful when the probabilities of individual events are sufficiently large.

Boole's Inequality

$P(\cup^{n}_{i=1} A_i) \leq \sum_{i=1}^{n} P(A_i)$

for any sets $A_1, A_2, ..., A_n$ .

It is useful when finding an upper bound for the probabilities of the union of events.
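
Both bounds can be checked on a small finite sample space. The following sketch assumes a single roll of a fair die and two hypothetical events `a` and `b`; it uses exact fractions so the comparison involves no rounding.

```python
from fractions import Fraction

omega = set(range(1, 7))  # one roll of a fair six-sided die

def prob(event):
    """Probability of an event under the uniform measure on omega."""
    return Fraction(len(event & omega), len(omega))

a = {1, 2, 3, 4, 5}  # "roll is at most 5" (hypothetical event)
b = {2, 3, 4, 5, 6}  # "roll is at least 2" (hypothetical event)

# Boole: P(A or B) <= P(A) + P(B), capped at 1 because it is a probability
boole_upper = min(Fraction(1), prob(a) + prob(b))
# Bonferroni: P(A and B) >= P(A) + P(B) - 1, floored at 0
bonferroni_lower = max(Fraction(0), prob(a) + prob(b) - 1)

print(prob(a | b), "<=", boole_upper)
print(prob(a & b), ">=", bonferroni_lower)
```

Here both individual probabilities are large (5/6 each), which is the regime where the Bonferroni lower bound is informative.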

Bayes' Rule

$P(A|B) = \frac{P(A \cap B)}{P(B)}$ $P(A\cap B) = P(B|A) P(A) = P(A|B) P(B)$ $P(A|B)P(B) = P(B|A)P(A)$ $P(A|B) = \frac{P(B|A) P(A)}{P(B)}$

It allows us to compute the conditional probability $P(A|B)$ from the inverse conditional probability $P(B|A)$ .
Let $A_1, A_2, ..., A_n$ be a partition of the sample space $S$ , and let $B$ be any event in $S$ with $P(B) > 0$ . Then:

$P(A_i|B) = \frac{P(B|A_i) P(A_i)}{\sum_{j=1}^{n} P(B|A_j) P(A_j)}$
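
As a worked instance of the partition form of Bayes' rule, the sketch below uses a made-up scenario: three machines produce 50%, 30%, and 20% of all parts (the priors $P(A_i)$ ) with defect rates of 1%, 2%, and 3% (the likelihoods $P(B|A_i)$ ). The numbers and the function name `posterior` are assumptions for illustration only.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Return P(A_i | B) for every cell A_i of a partition."""
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(B | A_i) P(A_i)
    evidence = sum(joint)                                 # P(B), by total probability
    return [j / evidence for j in joint]

priors = [Fraction(5, 10), Fraction(3, 10), Fraction(2, 10)]           # P(A_i)
likelihoods = [Fraction(1, 100), Fraction(2, 100), Fraction(3, 100)]   # P(B | A_i)

post = posterior(priors, likelihoods)
print(post)  # probability that a defective part came from each machine
```

The denominator is the same for every $i$ , so the posterior is just the normalized list of products $P(B|A_i)P(A_i)$ .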

Independent Events

$P(A \cap B) = P(A) P(B)$

A family $A_i :i \in I$ of events is independent if for every finite subset $J \subseteq I$ we have: $P\left(\bigcap_{i \in J} A_i\right) = \prod_{i \in J} P(A_i)$
The pair-wise independence of events does not imply their mutual independence.

Conditional Independence

$P(A \cap B|C) = P(A|C) P(B|C)$

where $P(C) > 0$ .
or equivalently, $P(A|B \cap C) = P(A|C)$ and $P(B|A \cap C) = P(B|C)$ .

Induced Probability Function

$P_X(x_i) = P\big(\{w_j \in \Omega : X(w_j) = x_i\}\big)$

$\Omega = \{w_1, w_2, ..., w_m\}$ is the sample space.
$X$ is a random variable with range $\mathcal{X} = \{x_1, x_2, ..., x_n\}$ .
The result set $\{w_j \in \Omega : X(w_j) = x_i\}$ is the set of all outcomes in the sample space that map to the value $x_i$ under the random variable $X$ .

Cumulative Distribution Function (CDF)

$F_X(x) = P_X(X \leq x) \quad \forall x \in \mathbb{R}$

$F_X(x)$ is a cdf $\iff$ the following conditions hold:
- Monotonicity: $F_X(x)$ is non-decreasing.
- Limiting values: $\lim_{x \to -\infty} F_X(x) = 0$ and $\lim_{x \to \infty} F_X(x) = 1$ .
- Right-Continuity: $\lim_{y \downarrow x} F_X(y) = F_X(x) \quad \forall x \in \mathbb{R}$ .

F_X(t)

1.0 |                         ●────────────
    |                         ↑  ↑  ↑
    |                      3.001 3.01 3.1
0.7 |              ●─────────○
    |              ↑  ↑  ↑
    |           2.001 2.01 2.1
0.3 |     ●────────○
    |     ↑  ↑  ↑
    |  1.001 1.01 1.1
0.0 |─────○
    +-------------------------------------- t
          1         2         3
          x         x         x

Continuous & Discrete Random Variables

If $X$ is continuous, then $F_X(x)$ is continuous and differentiable almost everywhere. The probability density function (pdf) is defined as: $f_X(x) = \frac{d}{dx} F_X(x)$
If $X$ is discrete, then $F_X(x)$ is a step function and the probability mass function (pmf) is defined as: $p_X(x_i) = P(X = x_i)$

1 ────────────────────────━━━━━━━━
                        ╱
                      ╱
                    ╱
                  ╱
0 ━━━━━━━━━━━━━━━───────────────── x

1 ─────────────────────────●━━━━━━
                           │
0.7 ─────────────●━━━━━━━━━○
                 │
0.3 ─────●━━━━━━━○
         │
0 ━━━━━━━○──────────────────────── x
         1         2         3

1 ──────────────────────────━━━━━━
                          ╱
                        ╱
0.7 ───────────●━━━━━━━
               │
0.5 ───────────○
             ╱
           ╱
0 ━━━━━━━━──────────────────────── x
               a

Probability Mass Function (PMF)

$f_X(x)= \begin{cases} (1-p)^{x-1}p, & x=1,2,3,\ldots \\ 0, & \text{otherwise} \end{cases}$

The pmf of a discrete random variable $X$ is given by $f_X(x) = P(X=x)$ for $x \in \mathbb{R}$ .
The example above is the geometric pmf: it gives the probability that the first success occurs exactly on the $x$ -th trial in a sequence of independent trials.

Probability Density Function (PDF)

$F_X(x) = \int_{-\infty}^{x} f_X(t) dt \quad \forall x \in \mathbb{R}$

The pdf of a continuous random variable $X$ is the function $f_X(x)$ , $x \in \mathbb{R}$ , that satisfies the relation above.
The probability of an interval is the area under the probability density function over that interval: $P(a \leq X \leq b) = \int_a^b f_X(t) dt$ .

Expectation

$E[X] = \sum x P(X=x) \quad \text{if } X \text{ is discrete}$

$E[X] = \int_{-\infty}^{\infty} x f_X(x) dx \quad \text{if } X \text{ is continuous}$

Linearity: $E[aX + bY + c] = aE[X] + bE[Y] + c$ for any constants $a$ , $b$ , and $c$ .
Non-negativity: If $X \geq 0$ then $E[X] \geq 0$ because it is a weighted average of non-negative values.
Monotonicity: If $X \geq Y$ then $E[X] \geq E[Y]$ because $X - Y \geq 0$ , so by linearity and non-negativity $E[X] - E[Y] = E[X - Y] \geq 0$ .
Boundedness: If $a \leq X \leq b$ then $a \leq E[X] \leq b$ because it is a weighted average of values that are bounded by $a$ and $b$ .

a ●──────────────●──────────────● b
   possible values  mean
                 E[X]

Moments

$\mu'_n = E[X^n] \quad \forall n \in \mathbb{N}$

The $n^{th}$ raw moment of $X$ is $\mu'_n$ as defined above; the 1st raw moment is the mean, $\mu'_1 = E[X] = \mu$ .
The $n^{th}$ central moment of $X$ is: $\mu_n = E[(X - E[X])^n] = E[(X - \mu)^n]$
The 1st central moment is always zero, $\mu_1 = E[X - \mu] = 0$ .
The 2nd central moment is the variance, $\mu_2 = E[(X - E[X])^2] = \text{Var}(X)$ .
- It measures the variability (spread) of the distribution.
- $\text{Var}(aX + b) = a^2 \text{Var}(X)$ for any constants $a$ and $b$ .
The 3rd central moment is $\mu_3 = E[(X - E[X])^3]$ ; the skewness is its standardized form $\mu_3 / \sigma^3$ .
- It measures the asymmetry of the distribution.
The 4th central moment is $\mu_4 = E[(X - E[X])^4]$ ; the kurtosis is its standardized form $\mu_4 / \sigma^4$ .
- It emphasizes the tail behavior and extreme values of the distribution.
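
The variance identity $\text{Var}(aX + b) = a^2 \text{Var}(X)$ can be verified exactly on a small distribution. The sketch below assumes a fair six-sided die and the hypothetical constants $a = 3$ , $b = 2$ ; `central_moment` is an illustrative helper, not a library function.

```python
from fractions import Fraction

def central_moment(pmf, n):
    """n-th central moment E[(X - E[X])^n] of a discrete pmf given as {value: probability}."""
    mean = sum(x * p for x, p in pmf.items())
    return sum((x - mean) ** n * p for x, p in pmf.items())

die = {x: Fraction(1, 6) for x in range(1, 7)}    # fair six-sided die
scaled = {3 * x + 2: p for x, p in die.items()}   # distribution of 3X + 2

var_die = central_moment(die, 2)
var_scaled = central_moment(scaled, 2)
third_moment = central_moment(die, 3)  # a symmetric distribution has zero third central moment
print(var_die, var_scaled, third_moment)
```

The shift $b$ drops out because it moves the mean by the same amount as every value, while the scale $a$ is squared along with the deviations.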

Covariance

$\text{cov(X, Y)} = E[(X - E[X])(Y - E[Y])]$

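It measures how much two random variables change together.
Negative covariance: one variable tends to decrease as the other increases.
Near-zero covariance: there is no linear relationship between the two variables.
Positive covariance: the two variables tend to increase together.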

Y: exam score

high |                ●
     |          ●  ●
     |       ●
     |    ●
low  | ●
     +-------------------- X: study time
       low              high

       Cov(X,Y) > 0

Correlation

$\rho(X, Y) = \frac{\text{cov}(X, Y)}{\sqrt{\text{Var}(X)\text{Var}(Y)}}$

Both individual variances must be non-zero.
$\rho(X, Y)$ lies in the range $[-1, 1]$ .
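
The covariance and correlation formulas translate directly into code. The sketch below uses population (divide by $n$ ) versions and a made-up study-time data set; `hours` and `scores` are hypothetical values chosen only to show a positive relationship.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Population covariance E[(X - E[X])(Y - E[Y])] over paired observations."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def correlation(xs, ys):
    """Covariance divided by the product of standard deviations (both must be non-zero)."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

hours = [1, 2, 3, 4, 5]        # hypothetical study time
scores = [52, 58, 65, 71, 80]  # hypothetical exam scores

print(covariance(hours, scores), correlation(hours, scores))
```

Unlike the covariance, the correlation does not change when either variable is rescaled by a positive constant, which is why it is easier to compare across data sets.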

Joint Distributions

$f_{X,Y}:\mathbb{R}^2\to[0,1], \\ f_{X,Y}(x,y) = P(X=x, Y=y) \quad \text{if } X, Y \text{ are discrete}$

Joint Probability Mass Function (PMF) for discrete random variables $X$ and $Y$ .

Y
1 ┤  ● 1/4       ● 1/4
  │ (0,1)       (1,1)
0 ┤  ● 1/4       ● 1/4
  │ (0,0)       (1,0)
  └──────────────────── X
      0           1

Marginal Distributions

$f_X(x) = \sum_{y} f_{X,Y}(x,y) \\ f_Y(y) = \sum_{x} f_{X,Y}(x,y)$

Fixing one variable and summing over the other variable gives the marginal distribution of the fixed variable.

             Y=0       Y=1       Row sum
X=0          0.10      0.20        0.30
X=1          0.30      0.40        0.70
             ─────────────────────────
Column sum   0.40      0.60        1.00

$f_X(0) = 0.10 + 0.20 = 0.30$ : sum of the first row.
$f_X(1) = 0.30 + 0.40 = 0.70$ : sum of the second row.
$f_Y(0) = 0.10 + 0.30 = 0.40$ : sum of the first column.
$f_Y(1) = 0.20 + 0.40 = 0.60$ : sum of the second column.
Marginalization: It sums or integrates over all possible values of an unwanted variable to obtain the distribution of the variable of interest.
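
The row and column sums in the table above can be reproduced with a few lines of code. This is a minimal sketch; the dictionary layout `{(x, y): probability}` and the helper name `marginal` are assumptions made for the example.

```python
import math

# Joint pmf from the table above, keyed by (x, y)
joint = {(0, 0): 0.10, (0, 1): 0.20, (1, 0): 0.30, (1, 1): 0.40}

def marginal(joint, axis):
    """Sum the joint pmf over the other variable; axis 0 keeps X, axis 1 keeps Y."""
    out = {}
    for outcome, p in joint.items():
        key = outcome[axis]
        out[key] = out.get(key, 0.0) + p
    return out

f_x = marginal(joint, 0)  # row sums
f_y = marginal(joint, 1)  # column sums
print(f_x, f_y)
```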

Joint and Marginal Distributions

Conditional Distributions

$f_{X|Y}(x|y) = P(X = x | Y = y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} \quad \text{if } f_Y(y) > 0$

It represents the probability distribution of $X$ given that $Y$ has a specific value.
If $f_Y(y) = 0$ , then $f_{X|Y}(x|y)$ is undefined.
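
A conditional pmf is a slice of the joint pmf renormalized by the marginal. The sketch below reuses the joint table from the marginal-distribution example with exact fractions; the function name `conditional_x_given_y` is a hypothetical helper.

```python
from fractions import Fraction

# Joint pmf keyed by (x, y), same values as the marginal-distribution table
joint = {
    (0, 0): Fraction(10, 100), (0, 1): Fraction(20, 100),
    (1, 0): Fraction(30, 100), (1, 1): Fraction(40, 100),
}

def conditional_x_given_y(joint, y):
    """Return f_{X|Y}(. | y) as {x: probability}; undefined when f_Y(y) = 0."""
    f_y = sum(p for (_, yj), p in joint.items() if yj == y)
    if f_y == 0:
        raise ValueError("f_Y(y) = 0, so the conditional distribution is undefined")
    return {x: p / f_y for (x, yj), p in joint.items() if yj == y}

given_y1 = conditional_x_given_y(joint, 1)
print(given_y1)
```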

Conditional Distribution

Bernoulli Distribution

$X = \begin{cases} 1, & \text{with probability } p \\ 0, & \text{with probability } 1-p \end{cases}$

$E[X] = p$
$\text{Var}(X) = p(1-p)$
The outcome of a Bernoulli trial is either a success (1) or a failure (0).

Binomial Distribution

probability of success in a single trial: $p$
probability of failure in a single trial: $q = 1 - p$

$P(X = x|n, p) = \binom{n}{x} p^x q^{n-x} \\ \text{where} \binom{n}{x} = \frac{n!}{x!(n-x)!}, \quad 0 \leq x \leq n$

$E[X] = np$
$\text{Var}(X) = npq = np(1-p)$
The binomial distribution describes the number of successes in a fixed number of independent Bernoulli trials.
$X \sim \text{Binomial}(10, 0.7)$
- $P(X = 7) = \binom{10}{7} (0.7)^7 (0.3)^3 \approx 0.2668$
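
The Binomial(10, 0.7) example above can be recomputed with the standard library. This is a small sketch; `binomial_pmf` is an illustrative helper name.

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

p_seven = binomial_pmf(7, 10, 0.7)
total = sum(binomial_pmf(x, 10, 0.7) for x in range(11))      # should be 1
mean = sum(x * binomial_pmf(x, 10, 0.7) for x in range(11))   # should be n * p
print(round(p_seven, 4), total, mean)
```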

Geometric Distribution

probability of success in a single trial: $p$
probability of failure in a single trial: $q = 1 - p$

$P(X = x|p) = (q^{x-1}) p \\ \text{where } x = 1, 2, 3, ...$

$E[X] = \frac{1}{p}$
$\text{Var}(X) = \frac{q}{p^2}$
The geometric distribution describes the number of trials up to and including the first success in a sequence of independent Bernoulli trials.
$X \sim \text{Geometric}(0.7)$
- $P(X = 3) = (0.3)^2 (0.7) = 0.063$
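
The geometric pmf and its mean $E[X] = \frac{1}{p}$ can be checked numerically. The sketch below truncates the infinite sum at 200 terms, which is an assumption that is harmless for $p = 0.7$ because the tail decays geometrically.

```python
import math

def geometric_pmf(x, p):
    """P(X = x): first success on trial x, for x = 1, 2, 3, ..."""
    return (1 - p) ** (x - 1) * p

p_three = geometric_pmf(3, 0.7)
# Truncated series for E[X]; the omitted tail is negligible for p = 0.7
approx_mean = sum(x * geometric_pmf(x, 0.7) for x in range(1, 201))
print(p_three, approx_mean)
```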

Uniform Distribution

$f_X(x|a,b) = \begin{cases} \frac{1}{b-a}, & a \leq x \leq b \\ 0, & \text{otherwise} \end{cases}$

$E[X] = \frac{a+b}{2}$
$\text{Var}(X) = \frac{(b-a)^2}{12}$
The uniform distribution describes a continuous random variable that has an equal probability of taking any value within a specified range.
The probability depends on the length of the interval, not its location.

Normal Distribution

$f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty$

$X \sim N(\mu, \sigma^2)$

Central Limit Theorem: the distribution of the sum (or average) of a large number of independent, identically distributed random variables with finite variance is approximately normal, regardless of the underlying distribution.

$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \approx N\left(\mu, \frac{\sigma^2}{n}\right)$
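
A quick simulation gives a feel for the $N\left(\mu, \frac{\sigma^2}{n}\right)$ approximation. The sketch below draws from Uniform(0, 1), for which $\mu = 0.5$ and $\sigma^2 = \frac{1}{12}$ ; the sample size, number of repetitions, and seed are arbitrary choices for illustration.

```python
import random
import statistics

def sample_means(n, repetitions, seed=0):
    """Draw `repetitions` samples of size n from Uniform(0, 1) and return their means."""
    rng = random.Random(seed)
    return [statistics.fmean([rng.random() for _ in range(n)])
            for _ in range(repetitions)]

means = sample_means(n=50, repetitions=2000)
observed_mean = statistics.fmean(means)    # should be close to mu = 0.5
observed_var = statistics.pvariance(means) # should be close to sigma^2 / n
expected_var = (1 / 12) / 50               # sigma^2 / n for Uniform(0, 1)
print(observed_mean, observed_var, expected_var)
```

Plotting a histogram of `means` would show the bell shape even though each individual draw is uniform; the check here only compares the first two moments.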

Multivariate Normal Distribution

$N(x|\mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^D |\Sigma|}} e^{-\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu)}$

$\mu$ is the $D$ -dimensional mean vector.
$\Sigma$ is the $D \times D$ covariance matrix.
$|\Sigma|$ is the determinant of the covariance matrix.

Beta Distribution

$f(x|\alpha, \beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} x^{\alpha - 1} (1 - x)^{\beta - 1}, \quad 0 < x < 1$

$\Gamma(\cdot)$ is the gamma function; for a positive integer $n$ , $\Gamma(n) = (n-1)!$ .
$E[X] = \frac{\alpha}{\alpha + \beta}$
$\text{Var}(X) = \frac{\alpha \beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}$

α < β                  α = β                  α > β

density                density                density
│╲                     │    ╭──╮             │        ╱
│ ╲___                 │  ╭─╯  ╰─╮           │    ___╱
└──────── x            └────────── x         └──────── x
0        1             0          1           0        1

mass near 0            mass near center       mass near 1

Beta(8,2)                 Beta(80,20)

       ╭────╮                    /\
     ╭─╯    ╰─╮                 /  \
─────╯        ╰───        ─────/────\─────
         0.8                      0.8

high uncertainty          low uncertainty

$p \mid \text{data} \sim \text{Beta}(\alpha + s, \beta + f)$
- where the prior is $\text{Beta}(\alpha, \beta)$ and the new observations are $s$ successes and $f$ failures.
The Beta distribution is a distribution of success probabilities for a Bernoulli or Binomial distribution.
Application
- Measuring uncertainty in the probability of success for a Bernoulli or Binomial distribution.
- Conversion rate or click-through rate (CTR) in online advertising.
- Defect rate in manufacturing.
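
The posterior update rule and the Beta(8,2) versus Beta(80,20) comparison above can be sketched as follows. The helper names and the uniform Beta(1, 1) starting prior are assumptions for this example.

```python
def beta_update(alpha, beta, successes, failures):
    """Posterior Beta parameters after observing new successes and failures."""
    return alpha + successes, beta + failures

def beta_mean(alpha, beta):
    return alpha / (alpha + beta)

def beta_variance(alpha, beta):
    total = alpha + beta
    return alpha * beta / (total ** 2 * (total + 1))

# Start from a uniform prior Beta(1, 1) and observe 7 successes and 1 failure
posterior = beta_update(1, 1, successes=7, failures=1)

# Same mean, very different uncertainty
small_sample = (beta_mean(8, 2), beta_variance(8, 2))
large_sample = (beta_mean(80, 20), beta_variance(80, 20))
print(posterior, small_sample, large_sample)
```

Both distributions are centered at 0.8, but the one built from ten times as many observations has a much smaller variance, which matches the narrower peak in the sketch above.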

Linear Algebra

Axioms of Vector Spaces

Commutative Law: $u + v = v + u, \quad \forall u, v \in V$
Associative Law: $(u + v) + w = u + (v + w), \quad \forall u, v, w \in V$
Additive Identity: $\exists 0 \in V$ such that $v + 0 = v, \quad \forall v \in V$
Additive Inverse: $\forall v \in V, \exists -v \in V$ such that $v + (-v) = 0$
Distributive Law:
- $a(u + v) = au + av, \quad \forall a \in F, u, v \in V$
- $(a + b)v = av + bv, \quad \forall a, b \in F, v \in V$
Associative Law: $(ab)v = a(bv), \quad \forall a, b \in F, v \in V$
Unitary Law: $1v = v, \quad \forall v \in V$

Subspace

$x + \alpha y \in W, \quad \forall x, y \in W, \forall \alpha \in \mathbb{R}$

Norm

$f: \mathbb{R}^n \to \mathbb{R}$

It outputs a real number that represents the length or size of a vector in a vector space.

$x=(x_1,x_2,\ldots,x_n) \mapsto f(x)=\lVert x\rVert$

Non-negativity: $\forall x \in \mathbb{R}^n, \quad \lVert x\rVert \geq 0$
Definiteness: $f(x) = 0 \iff x = 0$
Homogeneity: $\forall x \in \mathbb{R}^n, \quad \forall t \in \mathbb{R}, f(tx) = |t| f(x)$
Triangle inequality: $\forall x,y \in \mathbb{R}^n, \quad f(x+y) \leq f(x) + f(y)$
- $\lVert x + y \rVert \leq \lVert x \rVert + \lVert y \rVert$
A norm is non-negative, only the zero vector has a norm of zero, scaling a vector by two doubles its norm, and the direct distance cannot be greater than the detoured distance.

$\lVert x \rVert_p = (\sum_{i=1}^{n} |x_i|^p)^{\frac{1}{p}}, \quad p \geq 1$

L1 norm: $\lVert x \rVert_1 = \sum_{i=1}^{n} |x_i|$
- Manhattan distance.
L2 norm: $\lVert x \rVert_2 = \sqrt{\sum_{i=1}^{n} |x_i|^2}$
- Euclidean distance.
L-infinity norm: $\lVert x \rVert_\infty = \max_i|x_i|$
- $\lVert (3, 4) \rVert_\infty = \max\{3, 4\} = 4$

$\lVert A \rVert_F = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2} = \sqrt{\text{tr}(A^T A)}$

Frobenius norm: $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$
- $\lVert A \rVert_F = \sqrt{1^2 + 2^2 + 3^2 + 4^2} = \sqrt{30}$
- It is the square root of the sum of the absolute squares of its elements.
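
The vector and matrix norms above can be computed with a short helper. This is a plain-Python sketch (no linear algebra library assumed); `p_norm` and `frobenius_norm` are illustrative names.

```python
import math

def p_norm(x, p):
    """L-p norm of a vector for p >= 1; pass math.inf for the L-infinity norm."""
    if p == math.inf:
        return max(abs(v) for v in x)
    return sum(abs(v) ** p for v in x) ** (1 / p)

def frobenius_norm(matrix):
    """Square root of the sum of squared absolute entries."""
    return math.sqrt(sum(abs(v) ** 2 for row in matrix for v in row))

x = [3, 4]
print(p_norm(x, 1), p_norm(x, 2), p_norm(x, math.inf))
print(frobenius_norm([[1, 2], [3, 4]]))
```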

Span

$span\{x_1, x_2, ..., x_n\} = \{ \alpha_1 x_1 + \alpha_2 x_2 + ... + \alpha_n x_n : \alpha_i \in \mathbb{R}\}$

A set of vectors $X = \{x_1, x_2, ..., x_n\}$ in a vector space $V$ is said to span $V$ if every vector in $V$ can be expressed as a linear combination of the vectors in $X$ .
Span is the entire region that can be reached by scaling and adding the given vectors.

Range

$R(A) = \{\alpha_1 a_1 + \alpha_2 a_2 + ... + \alpha_n a_n : \alpha_i \in \mathbb{R}\}$

The range of a matrix $A$ is the set of all possible linear combinations of its column vectors $a_1, a_2, ..., a_n$ .
It is the same as the column space of the matrix $A$ .

Nullspace

$N(A) = \{x \in \mathbb{R}^n : Ax = 0\}$

The set of all vectors $x$ that the matrix $A$ maps to the zero vector.
The dimensionality of the nullspace is called the nullity of the matrix $A$ .

Linear Independence

A set of vectors $\{v_1, v_2, ..., v_n\}$ is said to be linearly independent if the only solution to the equation $\alpha_1 v_1 + \alpha_2 v_2 + ... + \alpha_n v_n = 0$ is $\alpha_1 = \alpha_2 = ... = \alpha_n = 0$ .
Column rank: The maximum number of linearly independent column vectors in a matrix.
Row rank: The maximum number of linearly independent row vectors in a matrix.
If one row vector is a linear combination of the other row vectors, then the rows are linearly dependent and that row is redundant.
The number of rows that remain after the redundant rows are removed is the rank of the matrix.
The number of independent input directions that $A$ maps to zero is the nullity of the matrix, and $rank(A) + nullity(A) = n$ for $A \in \mathbb{R}^{m \times n}$ .

Rank

$Rank(A) = \text{dim}(R(A)), \quad A \in \mathbb{R}^{m \times n}$

$rank(A) \leq \min(m, n), \quad A \in \mathbb{R}^{m \times n}$
- If $rank(A) = \min(m, n)$ , then $A$ is said to be full rank.
- Rank cannot exceed the number of rows or columns in the matrix.
$rank(A) = rank(A^T), \quad A \in \mathbb{R}^{m \times n}$
- The rank of a matrix is equal to the rank of its transpose.
- The amount of independent information in the rows is equal to the amount of independent information in the columns.
$rank(AB) \leq \min(rank(A), rank(B)), \quad A \in \mathbb{R}^{m \times n}, B \in \mathbb{R}^{n \times p}$
- The information content of the product of two matrices cannot exceed the information content of either matrix.
$rank(A + B) \leq rank(A) + rank(B), \quad A, B \in \mathbb{R}^{m \times n}$
- The information content of the sum of two matrices cannot exceed the sum of the information content of each matrix.

Orthogonal Matrices

$U \in \mathbb{R}^{n \times n} \text{ is orthogonal } \iff v_i^T v_j = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}$

where $v_i$ denotes the $i$ -th column of $U$ .

$UU^T = U^TU = I_n$
$\lVert Ux \rVert_2 = \lVert x \rVert_2, \quad \forall x \in \mathbb{R}^n$
Orthonormal: A set of vectors is orthonormal if they are all unit vectors and orthogonal to each other.
Rotation: $R(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$
Reflection: $R = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$

Quadratic Form

$Q(x) = x^T A x, \quad A \in \mathbb{R}^{n \times n}$

The quadratic form $x^TAx$ gives a scalar value that measures the cost, energy, or weighted magnitude of the vector $x$ , according to the quadratic surface defined by the matrix $A$ .
- Positive definite: $Q(x) > 0, \quad \forall x \neq 0$
- Positive semi-definite: $Q(x) \geq 0, \quad \forall x \neq 0$
- Negative definite: $Q(x) < 0, \quad \forall x \neq 0$
- Negative semi-definite: $Q(x) \leq 0, \quad \forall x \neq 0$
- Indefinite: $Q(x)$ can be positive or negative for different $x \neq 0$

Matrix definiteness

Definiteness of a matrix (positive definite, positive semidefinite, negative definite, negative semidefinite, indefinite)

$(A^TA)^T = A^TA, \quad Q(x) = x^T A^T A x = (Ax)^T (Ax) = \lVert Ax \rVert^2 \geq 0$

$A^TA$ is symmetric and always positive semi-definite because its quadratic form is the squared norm of the vector $Ax$ .

Eigenvalues & Eigenvectors

$\lambda \in \mathbb{R} \text{ is an eigenvalue of } A \in \mathbb{R}^{n \times n} \iff A\vec{x} = \lambda \vec{x} , \quad \vec{x} \in \mathbb{R}^n, \vec{x} \neq 0$

An eigenvector is a nonzero vector that remains on the same line when transformed by a matrix.
Its eigenvalue tells us how much the vector is scaled and whether its direction is reversed.

$\text{det}(A - \lambda I) = 0$

$A = \begin{bmatrix} 4 & 1 \\ 2 & 3 \end{bmatrix}$
$A - \lambda I = \begin{bmatrix} 4 - \lambda & 1 \\ 2 & 3 - \lambda \end{bmatrix}$
$\text{det}(A - \lambda I) = (4 - \lambda)(3 - \lambda) - 2 = \lambda^2 - 7\lambda + 10 = (\lambda - 5)(\lambda - 2) = 0$
$\lambda_1 = 5, \quad \lambda_2 = 2$
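
For a $2 \times 2$ matrix the characteristic equation is the quadratic $\lambda^2 - \text{tr}(A)\lambda + \text{det}(A) = 0$ , so the worked example can be reproduced with the quadratic formula. The function below is a sketch limited to real eigenvalues; its name and argument layout are assumptions.

```python
import math

def eigenvalues_2x2(a, b, c, d):
    """Real eigenvalues of [[a, b], [c, d]] from lambda^2 - trace*lambda + det = 0."""
    trace = a + d
    det = a * d - b * c
    discriminant = trace ** 2 - 4 * det
    if discriminant < 0:
        raise ValueError("eigenvalues are complex; this sketch handles the real case only")
    root = math.sqrt(discriminant)
    return (trace + root) / 2, (trace - root) / 2

lam1, lam2 = eigenvalues_2x2(4, 1, 2, 3)
print(lam1, lam2)
print(lam1 + lam2, lam1 * lam2)  # equals the trace and the determinant of A
```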

$\text{tr}(A) = \sum_{i=1}^{n} \lambda_i$

The trace is the sum of the scaling factors along all eigenvector directions.

$\text{det}(A) = \prod_{i=1}^{n} \lambda_i$

The determinant is the product of the eigenvalues, representing the overall signed scaling factor for area or volume.

$rank(A) = \text{number of non-zero eigenvalues of } A$

For a diagonalizable matrix, the rank equals the number of nonzero eigenvalues, counted with multiplicity.

$\lambda_i (A^{-1}) = \frac{1}{\lambda_i(A)}$

The eigenvalues of the inverse of a matrix are the reciprocals of the eigenvalues of the original matrix.

$A \vec{x} = \lambda \vec{x} \\ \implies A^{-1} A \vec{x} = A^{-1} \lambda \vec{x} \\ \implies \vec{x} = \lambda A^{-1} \vec{x} \\ \implies A^{-1} \vec{x} = \frac{1}{\lambda} \vec{x}$

Diagonalization

$S = \begin{bmatrix} \vdots & \vdots & \cdots & \vdots \\ \vec{v_1} & \vec{v_2} & \cdots & \vec{v_n} \\ \vdots & \vdots & \cdots & \vdots \end{bmatrix}$ $AS = \begin{bmatrix} \vdots & \vdots & \cdots & \vdots \\ \lambda_1\vec{v_1} & \lambda_2\vec{v_2} & \cdots & \lambda_n\vec{v_n} \\ \vdots & \vdots & \cdots & \vdots \end{bmatrix} \\ = \begin{bmatrix} \vdots & \vdots & \cdots & \vdots \\ \vec{v_1} & \vec{v_2} & \cdots & \vec{v_n} \\ \vdots & \vdots & \cdots & \vdots \end{bmatrix} \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix} \\ = S\Lambda$

$S^{-1}AS = \Lambda$
- $AS = S\Lambda$
- $S^{-1}AS = S^{-1}S\Lambda = \mathbb{I}\Lambda = \Lambda$
- A matrix $A$ that appears complicated in the original coordinate system becomes the diagonal matrix $\Lambda$ when expressed in the eigenvector coordinate system.
$A = S\Lambda S^{-1}$
- In the eigenvector basis, the matrix $A$ becomes the diagonal matrix $\Lambda$ .
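
The identities $S^{-1}AS = \Lambda$ and $A = S\Lambda S^{-1}$ can be verified on the $2 \times 2$ matrix from the eigenvalue example. In the sketch below the eigenvectors $(1, 1)$ for $\lambda = 5$ and $(1, -2)$ for $\lambda = 2$ were worked out by hand for this illustration, and the helpers are plain-Python stand-ins for a linear algebra library.

```python
from fractions import Fraction

def matmul(x, y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

def inverse_2x2(m):
    """Inverse of a 2x2 matrix with non-zero determinant, in exact arithmetic."""
    (a, b), (c, d) = m
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[4, 1], [2, 3]]
S = [[1, 1], [1, -2]]        # columns are the eigenvectors for lambda = 5 and lambda = 2
Lambda = [[5, 0], [0, 2]]    # eigenvalues on the diagonal, in the same order
S_inv = inverse_2x2(S)

reconstructed = matmul(matmul(S, Lambda), S_inv)  # S Lambda S^{-1}
diagonalized = matmul(matmul(S_inv, A), S)        # S^{-1} A S
print(reconstructed, diagonalized)
```

The order of the columns of `S` must match the order of the eigenvalues in `Lambda`; swapping one without the other breaks the identity.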

Symmetric Matrices

A real symmetric matrix satisfies $A=A^T$ .
All eigenvalues of a real symmetric matrix are real.
A real symmetric matrix has a complete set of orthonormal eigenvectors.
Therefore, the eigenvector matrix $S$ is orthogonal.

$S^TS=\mathbb{I} \quad\Longrightarrow\quad S^{-1}=S^T$

The diagonalization of a symmetric matrix is

$A=S\Lambda S^T$

$S$ contains the orthonormal eigenvectors of $A$ .
$\Lambda$ contains the corresponding eigenvalues.
$S^T$ converts $x$ into the eigenvector basis.
$\Lambda$ scales each eigenvector direction by its corresponding eigenvalue.
$S$ converts the result back to the original basis.

Definiteness of Symmetric Matrices

Using $A=S\Lambda S^T$ and $y=S^Tx$ ,

$x^TAx = x^TS\Lambda S^Tx = y^T\Lambda y = \sum_{i=1}^{n}\lambda_i y_i^2$

$y_i$ is the component of $x$ along the $i$ -th eigenvector.
Since $y_i^2\geq0$ , the sign of $x^TAx$ depends entirely on the signs of the eigenvalues.
If all eigenvalues are positive, $A$ is positive definite.

$\lambda_i>0\text{ for all }i \quad\Longrightarrow\quad x^TAx>0\text{ for all }x\neq0$

If all eigenvalues are non-negative, $A$ is positive semidefinite (PSD).

$\lambda_i\geq0\text{ for all }i \quad\Longrightarrow\quad x^TAx\geq0$

If all eigenvalues are negative, $A$ is negative definite.

$\lambda_i<0\text{ for all }i \quad\Longrightarrow\quad x^TAx<0\text{ for all }x\neq0$

If all eigenvalues are non-positive, $A$ is negative semidefinite (NSD).

$\lambda_i\leq0\text{ for all }i \quad\Longrightarrow\quad x^TAx\leq0$

If $A$ has both positive and negative eigenvalues, $A$ is indefinite.
The eigenvectors determine the principal directions of the quadratic surface.
The eigenvalues determine the curvature and sign along those directions.
The definiteness of a symmetric matrix is determined entirely by the signs of its eigenvalues.

Eigenvalues of a Positive Semidefinite Matrix

If $A$ is positive semidefinite, then

$x^TAx\geq0$

For an eigenvector $\vec{x}\neq\vec{0}$ with eigenvalue $\lambda$ ,

$A\vec{x}=\lambda\vec{x}$

$\vec{x}^{\,T}A\vec{x} = \vec{x}^{\,T}(\lambda\vec{x}) = \lambda\vec{x}^{\,T}\vec{x} = \lambda\lVert\vec{x}\rVert^2 \geq0$

Since $\vec{x}\neq\vec{0}$ ,

$\lVert\vec{x}\rVert^2>0$

Therefore,

$\lambda\geq0$

Hence, all eigenvalues of a positive semidefinite matrix are non-negative.
A zero eigenvalue means that the corresponding eigenvector direction is collapsed to the zero vector.

Singular Value Decomposition

Diagonalization is generally defined for square matrices, while singular value decomposition can be applied to any rectangular or square matrix.

$A=U\Sigma V^T$

For $A\in\mathbb{R}^{m\times n}$ ,

$U\in\mathbb{R}^{m\times m}, \qquad \Sigma\in\mathbb{R}^{m\times n}, \qquad V\in\mathbb{R}^{n\times n}$

The columns of $V$ are the right singular vectors and represent orthonormal directions in the input space.
The columns of $U$ are the left singular vectors and represent orthonormal directions in the output space.
The diagonal entries of $\Sigma$ are the singular values.

$\sigma_1\geq\sigma_2\geq\cdots\geq0$

For each singular-vector pair,

$A\vec{v_i}=\sigma_i\vec{u_i}$

$\vec{v_i}$ is an input direction.
$\sigma_i$ is the scaling factor.
$\vec{u_i}$ is the corresponding output direction.
The transformation $A$ can be interpreted as three steps:
- $V^T$ expresses the input in the right singular-vector basis.
- $\Sigma$ scales each direction by its singular value.
- $U$ maps the scaled result into the output space.
$U$ and $V$ are orthogonal matrices.

$U^TU=\mathbb I, \qquad V^TV=\mathbb I$

The singular vectors and singular values are related to the eigenvectors and eigenvalues of $A^TA$ and $AA^T$ .

$A^TA\vec{v_i}=\sigma_i^2\vec{v_i}$

$AA^T\vec{u_i}=\sigma_i^2\vec{u_i}$

The rank of $A$ equals the number of nonzero singular values.
SVD is commonly used for dimensionality reduction, data compression, noise reduction, pseudoinverses, and latent-factor analysis.

SVD

SVD Matrix Flow

The Law of Large Numbers and the Central Limit Theorem

May 25, 2026 · about 5 min read

Eunkwang Shin

Owner

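Overview of the Law of Large Numbers and the Central Limit Theorem

Concepts of the Law of Large Numbers and the Central Limit Theorem

Law of Large Numbers (LLN): A foundational theorem stating that as the sample size $n$ grows, the sample mean converges to the population mean (expected value), either in probability or almost surely.
Central Limit Theorem (CLT): A theorem stating that, regardless of the shape of the population distribution (given finite variance), the distribution of the sample mean is approximately normal (convergence in distribution) when the sample size $n$ is sufficiently large.

Implications of the Law of Large Numbers and the Central Limit Theorem

Validity of sampling-based statistics: When a full census of the population is infeasible, the two theorems provide the scientific basis for inferring the true value (parameter) of the whole population from the analysis of a partial sample.
Design rationale for AI and machine learning models: They act as core mechanisms behind the stability of gradient estimates in mini-batch gradient descent (SGD), the Gaussian noise design of generative models, and the design of Monte Carlo simulations.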

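Structure and Mechanism of the Law of Large Numbers and the Central Limit Theorem

Concept diagram of the Law of Large Numbers and the Central Limit Theorem

Key formulas of the Law of Large Numbers and the Central Limit Theorem

1. Weak Law of Large Numbers (WLLN)

The sample mean converges in probability to the population mean: for any positive $\epsilon$ , the probability that the error is at least $\epsilon$ goes to 0 as the sample size $n$ goes to infinity. $\lim_{n \to \infty} P(|\bar{X}_n - \mu| < \epsilon) = 1$

2. Strong Law of Large Numbers (SLLN)

The sample mean converges almost surely to the population mean: the event that the sequence of sample means converges to $\mu$ has probability 1. $P\left( \lim_{n \to \infty} \bar{X}_n = \mu \right) = 1$

3. Lindeberg-Lévy Central Limit Theorem (Lindeberg-Lévy CLT)

For independent and identically distributed (i.i.d.) random variables with finite mean $\mu$ and variance $\sigma^2$ , the standardized sample mean converges in distribution to the standard normal distribution as the sample size $n$ goes to infinity. $\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \xrightarrow{d} N(0, 1)$

4. Normal approximation of the sample mean

When the population has mean $\mu$ and variance $\sigma^2$ and the sample size $n$ is sufficiently large ( $n \ge 30$ is a common rule of thumb), the distribution of the sample mean is approximately normal. $\bar{X}_n \approx N\left(\mu, \frac{\sigma^2}{n}\right)$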
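
The law of large numbers can be observed directly by simulating a fair die, whose population mean is 3.5. The sketch below is an illustration only; the sample sizes and the seed are arbitrary assumptions.

```python
import random

def sample_mean_error(n, seed=0):
    """Absolute gap between the mean of n fair-die rolls and the population mean 3.5."""
    rng = random.Random(seed)
    total = sum(rng.randint(1, 6) for _ in range(n))
    return abs(total / n - 3.5)

# The gap tends to shrink roughly like sigma / sqrt(n) as n grows
errors = {n: sample_mean_error(n) for n in (10, 1_000, 100_000)}
for n, err in errors.items():
    print(n, err)
```

For a single run the error is not guaranteed to decrease at every step, but its typical size falls with $\sqrt{n}$ , which is the scale that appears in the variance $\frac{\sigma^2}{n}$ above.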

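Key elements of the Law of Large Numbers and the Central Limit Theorem

Category	Key element	Description	Note
Law of Large Numbers (LLN)	Convergence in Probability	As the number of samples grows, the probability that the sample mean is close to the population mean approaches 1	Weak Law of Large Numbers (WLLN)
	Almost Sure Convergence	The event that the sequence of sample means converges to the population mean has probability 1	Strong Law of Large Numbers (SLLN)
Central Limit Theorem (CLT)	Convergence in Distribution	The cumulative distribution function of the standardized sample mean converges to that of the standard normal distribution	Lindeberg-Lévy theorem
	Normal approximation of the sample mean	The distribution of the sample mean becomes approximately normal regardless of the population's own distribution	Basis for the Z-test and t-test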

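Comparison and Application of the Law of Large Numbers and the Central Limit Theorem

Detailed comparison of the Law of Large Numbers (LLN) and the Central Limit Theorem (CLT)

Category	Law of Large Numbers (LLN)	Central Limit Theorem (CLT)
Perspective	To what value does the sample mean converge?	What shape (distribution) does the sample mean take?
Type of convergence	Convergence in probability ( $\xrightarrow{P}$ ) and almost sure convergence ( $\xrightarrow{a.s.}$ )	Convergence in distribution ( $\xrightarrow{d}$ )
Assumptions	The population mean $\mu$ exists and is finite	The population mean $\mu$ and population variance $\sigma^2$ exist and are finite
Mathematical result	Converges to a single constant (the population mean $\mu$ )	Approximated by a specific probability distribution (normal $N(\mu, \sigma^2/n)$ )
Main uses	Monte Carlo integration, proving the consistency of parameter estimators	Sample tests (Z-test, t-test), confidence interval estimation, margin-of-error calculation

Technical implication: The law of large numbers guarantees **'representativeness'** in big-data analysis, in that sample statistics approach the population parameters as the sample size grows, while the central limit theorem enables practical **'hypothesis testing and inference'** by justifying a normal approximation even when the population distribution is unknown; the two are complementary.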

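Practical application and business use

Category	Content	Note
Public sector	When designing large national censuses and health-policy sample surveys, the formulas are used to determine the sample size ( $n$ ) that meets the required statistical precision	Reliability of national statistics
Finance	Securing stable asset-distribution estimates in Monte Carlo simulation-based Value at Risk (VaR) calculation and stock-index movement simulations	Risk control and management
Private sector / AI	When forming mini-batches for deep learning models, a sufficiently large batch size makes the per-step gradient estimation error approximately normal, which supports stable convergence	MLOps pipeline optimization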

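Practical considerations when applying the Law of Large Numbers and the Central Limit Theorem

Obstacles and countermeasures

Category	Problem	Solution
Violation of the i.i.d. assumption	Time-series data (stock prices, server traffic, etc.) and spatial data exhibit dependency and heterogeneity between observations	Refine systematic sampling techniques and, for time-series data, apply the theorems after securing stationarity through differencing and transformation
Heavy-tailed distributions	Distributions with frequent extreme outliers (Cauchy, Pareto, etc.), as seen in extreme financial events or cyber-crisis traffic, can have infinite or undefined variance, so the classical CLT does not apply	Combine with robust methods such as the trimmed mean, which removes the influence of outliers, and quantile regression
Small-sample ( $n < 30$ ) settings	Cases such as early-stage startup services or rare-disease analysis, where physical limits on sample collection mean the normal approximation cannot be relied on	Run a normality test (Shapiro-Wilk) first and, if it is not satisfied, introduce bootstrap resampling and t-distribution-based statistical inference (t-test)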

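Convergence with next-generation technology and future outlook

In RLHF (Reinforcement Learning from Human Feedback), the reinforcement-learning alignment technique for large language models (LLMs), the law of large numbers underpins the reliability of estimating the expected reward from the feedback given by many human annotators.
In the forward diffusion process of diffusion image generation models, many small noise increments are injected and accumulated step by step. The claim that the final latent vector ends up approximately Gaussian regardless of the distribution of each independent noise step rests on the central limit theorem, which is why the theorem is actively applied as a mathematical backbone for designing next-generation generative AI architectures.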

Multi-Agent Systems

May 25, 2026 · about 4 min read

Eunkwang Shin

Owner

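Concept of Multi-Agent Systems (MAS)

A multi-agent system (MAS) is an intelligent system architecture in which multiple agents with autonomy, reactivity, proactiveness, and social ability share one environment and, through communication, negotiation, and coordination, solve complex large-scale problems by distributed collaboration that a single agent would struggle to solve.
It emerged from the need to overcome the hallucination, limited context window, and degraded multi-step task performance of a single large LLM by specializing roles (divide and conquer), and to secure both reliability and cost efficiency through A2A (Agent-to-Agent) collaboration and MCP (Model Context Protocol) tool integration.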

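Architecture, Key Elements, and Applications of Multi-Agent Systems

Structural comparison diagram of single-agent and multi-agent systems

Key components of a multi-agent system

Category	Key element	Description
Agent	Autonomy, reactivity, proactiveness, social ability	An autonomous decision-making unit specialized by role (persona)
Orchestration	Supervisor, graph-based routing	Task delegation and flow control with LangGraph or CrewAI
A2A protocol	Agent Card, JSON-RPC-based collaboration	High-level specification for capability discovery and task delegation between agents
MCP	Tool, Resource, and Context integration	Low-level specification for an individual agent's access to tools and data
Shared state	Blackboard, State Persistence	Basis for memory synchronization and cooperative decision-making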

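Applications of multi-agent systems

Category	Content	Note
Business perspective	End-to-end automation of work such as the SDLC and supply chain management (SCM)	Minimizes human-in-the-loop intervention
Technical perspective	Token cost optimization by distributing role-specific sLLMs	LangGraph/CrewAI/AutoGen
Security perspective	IP protection through collaboration without sharing internal memory	Authentication based on the A2A Agent Card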

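Comparison of Single-Agent and Multi-Agent Systems

Category	Single-Agent	Multi-Agent (MAS)
Architecture	Centralized, standalone structure	Distributed structure with autonomous collaboration and orchestration
Interaction	Tool calls inside the model	Communication between agents (A2A protocol)
Fault tolerance	Single point of failure (SPOF); an error halts the whole system	Individual agents can be replaced and can recover autonomously
Scalability	Performance degrades as context and tools grow	Scales by dynamically adding new specialist agents

Technical implication: A single agent is well suited to simple Q&A, but securing the reliability and scalability of complex autonomous workflows (agentic workflows) is more effectively done by moving to a distributed, collaborative MAS.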

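Adoption Strategy for Multi-Agent Systems

Higher reliability through mutual verification: Agents with different perspectives debate the results and self-reflect, which suppresses hallucination and raises decision accuracy.
Complementary integration of A2A and MCP: The architecture converges on a dual-specification hybrid in which high-level collaboration between agents is handled by A2A and each agent's access to tools and data is handled by MCP. The two standards are being standardized openly under separate governance: Google donated A2A to the Linux Foundation in June 2025, where it is run as the Agent2Agent protocol project, and Anthropic donated MCP in December 2025 to the Agentic AI Foundation (AAIF), a directed fund under the Linux Foundation.
Framework selection strategy: Use LangGraph for complex, large-scale stateful orchestration, CrewAI for role-based collaboration, and AutoGen for conversational consensus, and mix them according to organizational needs.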

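Considerations When Adopting Multi-Agent Systems

Category	Problem	Solution
Technology	Disagreement between agents causes infinite loops and deadlocks	Limit the maximum number of conversation rounds (Max Rounds); let the orchestrator coordinate state and terminate early
Governance	Accountability and control over autonomous decisions are unclear	Predefine roles with an ontology; AgentOps governance and audit logging
Business	More message exchange accumulates latency and token cost	Asynchronous messaging, a shared blackboard, and designs that incorporate lightweight sLLMs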
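
The "Max Rounds" safeguard against endless agent disagreement can be sketched as a bounded loop with early termination. Everything here is hypothetical: the agent interface (a callable taking the task and the transcript and returning a proposal string) and the consensus rule are assumptions for illustration and do not correspond to the API of LangGraph, CrewAI, or AutoGen.

```python
def run_debate(agents, task, max_rounds=3):
    """Run agents round by round until they agree or the round cap is reached.

    agents: list of callables (task, transcript) -> proposal string (hypothetical interface).
    Returns (agreed_proposal_or_None, rounds_used).
    """
    transcript = []  # shared state: one list of proposals per round
    for round_index in range(1, max_rounds + 1):
        proposals = [agent(task, transcript) for agent in agents]
        transcript.append(proposals)
        if len(set(proposals)) == 1:   # consensus reached: terminate early
            return proposals[0], round_index
    return None, max_rounds            # cap reached: hand off to an orchestrator or a human

# Two stub agents: one never changes its answer, the other adopts the first agent's last answer
def stubborn(task, transcript):
    return "A"

def follower(task, transcript):
    return "B" if not transcript else transcript[-1][0]

result = run_debate([stubborn, follower], "pick an option")
print(result)
```

Returning `None` when the cap is hit makes the non-consensus case explicit, so the caller has to decide how to escalate instead of silently looping.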

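Going forward, MAS is expected to develop into a next-generation intelligent automation infrastructure in which multiple frameworks share tools, built on A2A and MCP standardization and AgentOps governance.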

McCabe's Cyclomatic Complexity

May 25, 2026 · about 3 min read

Eunkwang Shin

Owner

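Concept of McCabe's Cyclomatic Complexity

A static software complexity metric that, based on the control flow graph, quantifies the number of linearly independent execution paths in a program.
It serves as the upper-bound criterion that determines the minimum number of test cases for basis path testing, and is used in white-box test design and maintainability assessment.

Concept diagram of McCabe's cyclomatic complexity

The graph above has $E=5$ edges, $N=5$ nodes, and $D=1$ decision node, and it is a single module ( $P=1$ ), so $V(G)=E-N+2P=5-5+2=2$ .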
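
The calculation for the $E=5$ , $N=5$ example can be reproduced from an edge list. Since the original figure is not available here, the edge list below is an assumed if/else-shaped graph with the same counts, and the function names are illustrative.

```python
def cyclomatic_complexity(edges, num_components=1):
    """V(G) = E - N + 2P for a control flow graph given as (source, target) edges."""
    nodes = {node for edge in edges for node in edge}
    return len(edges) - len(nodes) + 2 * num_components

def decision_node_count(edges):
    """Number of nodes with more than one outgoing edge (binary decisions assumed)."""
    out_degree = {}
    for source, _ in edges:
        out_degree[source] = out_degree.get(source, 0) + 1
    return sum(1 for degree in out_degree.values() if degree > 1)

# Assumed graph: entry -> decision, two branches, both joining at the exit
edges = [(1, 2), (2, 3), (2, 4), (3, 5), (4, 5)]

v_edges_nodes = cyclomatic_complexity(edges)    # E - N + 2P
v_decisions = decision_node_count(edges) + 1    # D + 1
print(v_edges_nodes, v_decisions)
```

Both formulas agree on this graph; the $D+1$ shortcut relies on every decision node being a two-way branch.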

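Structure, Key Elements, and Applications of McCabe's Cyclomatic Complexity

Structure diagram of McCabe's cyclomatic complexity

The control flow of the source code is abstracted into nodes (statements) and edges (branch paths) to visualize the branching structure.

Key elements of McCabe's cyclomatic complexity

Category	Key element	Description
Edge-node method	$V(G)=E-N+2P$	General formula based on the number of edges $E$ , the number of nodes $N$ , and the number of connected components $P$ ( $P=1$ for a single module)
Decision-node method	$V(G)=D+1$	Simplified formula based on the number of decision (predicate) nodes $D$ (if, while, for, case, etc.)
Region method	$V(G)=R$	Equals the total number of regions $R$ into which the planar graph divides the plane (closed regions plus the outer region)
Complexity grade	$V(G)\le 10$	Typically: 10 or below is stable, 11-20 is moderate, 21-50 is high risk, and above 50 is a refactoring target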
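
The decision-node formula $V(G)=D+1$ and the grading thresholds in the table can be approximated for Python source with the standard `ast` module. This is a deliberately simplified sketch: it counts only `if`/`elif`, `while`, `for`, and conditional expressions, so it will differ from tools such as SonarQube that also count boolean operators, exception handlers, and other constructs.

```python
import ast

# Simplifying assumption: only these node types count as decision points
DECISION_NODES = (ast.If, ast.While, ast.For, ast.IfExp)

def approx_cyclomatic_complexity(source):
    """Approximate V(G) = D + 1 by counting decision nodes in the syntax tree."""
    tree = ast.parse(source)
    decisions = sum(isinstance(node, DECISION_NODES) for node in ast.walk(tree))
    return decisions + 1

def risk_band(v):
    """Map a complexity value to the grade bands listed in the table above."""
    if v <= 10:
        return "stable"
    if v <= 20:
        return "moderate"
    if v <= 50:
        return "high risk"
    return "refactoring target"

sample = '''
def grade(score):
    if score >= 90:
        return "A"
    elif score >= 80:
        return "B"
    for _ in range(3):
        pass
    return "C"
'''

complexity = approx_cyclomatic_complexity(sample)
print(complexity, risk_band(complexity))
```

An `elif` appears in the syntax tree as a nested `if`, so it is counted as its own decision point.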

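Applications of McCabe's cyclomatic complexity

Category	Content	Note
Public sector	Used as a threshold in source-code quality requirements during static audits of public software projects	Audit checklist criterion
Finance	Module-level complexity of core-banking transaction modules is strictly controlled to $V(G)\le 5$	Defect minimization
Private sector	Enforced automatically as a quality-gate rule in CI/CD static analysis tools (SonarQube, etc.)	Pipeline integration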

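Comparison of McCabe's Cyclomatic Complexity and Cognitive Complexity

Category	McCabe's cyclomatic complexity	Cognitive complexity
Purpose	Measures structural complexity and the sufficiency of mechanical test paths	Measures the mental load of reading and understanding code
Measured elements	Simple sum of control flow branches	Breaks in linear flow, nesting depth, and weights for logical operators
Nesting weight	Not reflected (nested ifs and sequential ifs score the same)	Reflected (the increment grows with nesting depth)
Main use	Designing basis path test cases	Improving readability and identifying refactoring targets

Implication: Cyclomatic complexity is specialized for test path design and cognitive complexity for maintenance productivity, so the two should be used as complements.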
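The nesting row of the comparison can be illustrated with Python's standard `ast` module. Both functions below are deliberately simplified assumptions: `cyclomatic` counts only `if`/`for`/`while`, and `nesting_weighted` is a cognitive-style score (1 per branch plus its nesting depth), not SonarQube's actual rule set.

```python
import ast

BRANCHES = (ast.If, ast.For, ast.While)

def cyclomatic(source: str) -> int:
    """1 + number of branching statements (simplified D + 1)."""
    return 1 + sum(isinstance(node, BRANCHES) for node in ast.walk(ast.parse(source)))

def nesting_weighted(source: str) -> int:
    """Each branch costs 1 plus its nesting depth (simplified cognitive-style score)."""
    def visit(node, depth):
        total = 0
        for child in ast.iter_child_nodes(node):
            if isinstance(child, BRANCHES):
                total += 1 + depth + visit(child, depth + 1)
            else:
                total += visit(child, depth)
        return total
    return visit(ast.parse(source), 0)

sequential = """
def f(a, b, c):
    if a:
        pass
    if b:
        pass
    if c:
        pass
"""

nested = """
def f(a, b, c):
    if a:
        if b:
            if c:
                pass
"""

print(cyclomatic(sequential), cyclomatic(nested))              # 4 4
print(nesting_weighted(sequential), nesting_weighted(nested))  # 3 6
```

Both snippets get the same cyclomatic value, while the nesting-aware score doubles for the nested version, which is the gap the two metrics are meant to cover together.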

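Considerations for Adopting McCabe's Cyclomatic Complexity

Category	Problem	Solution
Box-ticking measurement	Code is split unnaturally just to meet the number, distorting the design	Combine with a cognitive complexity policy so readability is controlled as well
Cost burden	Too many paths sharply increase test case design and execution costs	Apply selectively to core logic and automate static analysis to reduce effort
Design rigidity	False positives on inherently branch-heavy logic such as large-scale parsing	Define complexity thresholds flexibly per domain

Complexity metrics are core tools for static analysis and configuration control and should be managed systematically across the entire software life cycle.

References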

T. J. McCabe, "A Complexity Measure," IEEE Transactions on Software Engineering, vol. SE-2, no. 4, pp. 308-320, 1976. https://doi.org/10.1109/TSE.1976.233837
NIST, "Structured Testing: A Testing Methodology Using the Cyclomatic Complexity Metric," NIST Special Publication 500-235. https://www.nist.gov/publications/structured-testing-testing-methodology-using-cyclomatic-complexity-metric

LLM AX Training Pipeline (RAG, Fine-Tuning, RLHF, RAFT)

May 25, 2026 · about 3 min read

Eunkwang Shin

Owner

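Overview of the LLM AX Training Pipeline

Concept of the LLM AX Training Pipeline

A training and inference optimization architecture that combines a general-purpose large language model (LLM) with a company's internal knowledge assets to improve domain-specific performance and reliability for the company's AI transformation (AX).
It aims for a hybrid model that combines parameter updates that internalize knowledge (fine-tuning) with inference linked to external knowledge (RAG).

Why the LLM AX Training Pipeline Is Needed

Domain fit: Accurate reasoning is needed over private, specialized knowledge such as confidential internal policies and product specifications.
Freshness and hallucination control: Prevents the time lag caused by missing real-time data and minimizes information distortion (hallucination).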

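Concepts and Roles of Fine-Tuning and RAG

Concept Diagram of Hybrid LLM AX Optimization

Comparison of the Concepts and Roles of Fine-Tuning and RAG

Category	Fine-Tuning	RAG (Retrieval-Augmented Generation)
Concept	Updates the weights of a pretrained LLM with specialized data	Retrieves external knowledge that matches the query and adds it to the prompt
Core role	Internalizes the model's tone, output format (JSON, etc.), and domain knowledge	Provides access to real-time external data and cites the evidence behind an answer
Data incorporation	Written into the weights (static learning)	Injected dynamically into the prompt context (reflected in real time)
Pros and cons	Dense knowledge / high training cost and risk of distorting parameters	Up-to-date external data / limited by context window size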

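Comparison of RLHF-Based and RAFT-Based Training Pipelines

RLHF and RAFT Training Mechanisms

RLHF pipeline: Starting from the SFT model, a reward model ( $R_{\text{RM}}$ ) is trained on a human preference dataset, and the weights are then aligned with the PPO algorithm.
- To prevent the weights from collapsing (drifting too far from the reference model), a KL-divergence penalty is applied to the reward: $R_{\text{augmented}}(x, y) = R_{\text{RM}}(x, y) - \beta \cdot D_{KL}(\pi_{\theta}(y|x) \,\|\, \pi_{\text{ref}}(y|x))$
RAFT pipeline: Mimicking an open-book RAG setting, each question is given together with a mix of the oracle document ( $D_*$ , which contains the answer) and distractor documents ( $D_k$ , irrelevant noise), and the model is trained with SFT to reason with Chain-of-Thought (CoT).
- This internalizes the RAG reading skill itself: extracting the supporting evidence from the oracle document among noisy documents.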
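The KL-penalized reward for a single sampled response can be sketched as follows. The function name, the inputs, and the single-sample KL estimate (sum of per-token log-probability differences) are assumptions for illustration; real RLHF libraries differ in how they estimate and apply the penalty.

```python
def kl_penalized_reward(reward_rm, logp_policy, logp_ref, beta=0.1):
    """R_augmented = R_RM - beta * KL estimate for one sampled response y.

    logp_policy / logp_ref hold per-token log-probabilities of the same sampled
    tokens under the trained policy and the frozen reference model.
    """
    # Single-sample estimate of D_KL(pi_theta || pi_ref): sum of token log-ratios.
    kl_estimate = sum(p - r for p, r in zip(logp_policy, logp_ref))
    return reward_rm - beta * kl_estimate

# The policy assigns higher probability to its own sample than the reference does,
# so the estimated KL is positive (0.4) and the reward is reduced by beta * 0.4.
reward = kl_penalized_reward(1.0, [-0.5, -1.0], [-0.7, -1.2], beta=0.1)
print(round(reward, 4))  # 0.96
```

A larger `beta` keeps the policy closer to the reference model at the cost of a weaker push toward the reward model's preferences.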

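Characteristics of the RLHF and RAFT Pipelines Compared

Category	RLHF-based pipeline	RAFT-based pipeline
Optimization goal	Aligns the model with human ethical values and instruction-following preferences (alignment)	Maximizes noise filtering and reading performance when applying domain-specific RAG
Data composition	Human-rated comparison pairs (pairwise preference dataset)	Question + oracle document + distractor documents + CoT answer
Core algorithm	Reinforcement learning (PPO, a policy-gradient actor-critic method)	Supervised learning (CoT-based supervised fine-tuning)
Limitations	High cost of human feedback and instability of RL training (reward hacking)	Dataset construction difficulty depends on the proportion of distractors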
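The RAFT data composition in the table (question + oracle + distractors + CoT answer) can be sketched as a small builder for one SFT record. `build_raft_example`, the prompt wording, and the record keys are hypothetical; they only show how the pieces fit together.

```python
import random

def build_raft_example(question, oracle_doc, distractor_pool, cot_answer,
                       num_distractors=3, seed=0):
    """Assemble one SFT record: shuffled oracle + distractor documents, then the question."""
    rng = random.Random(seed)
    distractors = rng.sample(distractor_pool, num_distractors)
    documents = [oracle_doc] + distractors
    rng.shuffle(documents)  # the oracle position should not be predictable
    context = "\n".join(f"[Doc {i + 1}] {doc}" for i, doc in enumerate(documents))
    prompt = (f"{context}\n\nQuestion: {question}\n"
              "Answer step by step, citing the document you used.")
    return {"prompt": prompt, "completion": cot_answer, "documents": documents}

example = build_raft_example(
    question="What is the warranty period?",
    oracle_doc="The warranty period is 24 months.",
    distractor_pool=["Shipping takes 3 days.", "The office opens at 9.",
                     "Returns need a receipt.", "Support is available by email."],
    cot_answer="The warranty document states 24 months, so the answer is 24 months.",
)
print(len(example["documents"]))  # 4
```

The `num_distractors` parameter is the knob behind the limitation in the table: more distractors make the reading task harder and the dataset more expensive to curate.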

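Practical Considerations When Adopting an LLM AX System

Obstacles by Stage and How to Overcome Them

Category	Problem	Solution
Technical	Indiscriminate fine-tuning causes catastrophic forgetting (loss of existing knowledge)	Adopt PEFT (LoRA/QLoRA) and tune how much of the original model's weights is preserved
Security	Risk of leaking external documents and sensitive personal information when running RAG	Apply data anonymization filtering and integrate enterprise access control (IAM)
Business	Growing costs for dataset annotation and GPU server maintenance	Run multiple small open-source models (sLLMs) and adopt a RAG-first hybrid architecture

Next-Generation Technology Convergence and Future Directions

Integration with agentic AI: Go beyond a single LLM to multi-agent systems with tool-use capability, setting the automation of business processes as the end goal of AX.
Combination with on-device AI: Combine ultra-light sLLM fine-tuning on edge devices with a hybrid RAG network on central servers to lower infrastructure cost and optimize response time.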

Data Valuation and Data Assetization

May 25, 2026 · about 3 min read

Eunkwang Shin

Owner

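Overview of Data Valuation and Data Assetization

Concepts of Data Valuation and Data Assetization

Data valuation: Under Korea's Data Industry Act, a framework that applies valuation methodologies to express, as a quantitative monetary value, the economic value that can be created by using the target data.
Data assetization: The process of defining, managing, and operating data that sits in silos within a company as a strategic business asset that repeatedly creates business value, beyond mere information.

Background to Creating Asset Value from Data (Necessity)

Diversified funding: With the rise of digital assets, data-backed financial guarantees, loans, and investment are becoming more active.
Building data business models: As data exchanges become more active, a standard basis for pricing licenses is required.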

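Concept of Data Valuation and Valuation Methodologies

Framework Diagram for Economic Valuation of Data

Comparison of the Three Data Valuation Methodologies

Category	Income approach	Cost and market approaches
Concept	Discounts the expected future income from using the data to present value	Based on the cost incurred to create the data or on comparable market transactions
Target	Data with high commercial maturity and predictable cash flows	Data with existing transaction cases or an easily estimated replacement cost
Core principle	DCF method with a data contribution rate ( $DR$ ): $V = \sum_{t=1}^{n} \frac{CF_t \times DR_t}{(1 + r)^t}$	- Replacement cost calculation (using historical cost) - Comparison with similar transactions (multiples method)
Main limitation	Risk of arbitrary estimates of future cash flows and the data contribution rate	No transaction cases because data is unique; future value not reflected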
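The income-approach formula in the table can be worked through with a few lines of code. The cash flows, contribution rates, and discount rate below are made-up illustrative numbers, and `data_value_dcf` is a hypothetical helper name.

```python
def data_value_dcf(cash_flows, contribution_rates, discount_rate):
    """V = sum over t of CF_t * DR_t / (1 + r)^t, with t starting at 1."""
    return sum(
        cf * dr / (1 + discount_rate) ** t
        for t, (cf, dr) in enumerate(zip(cash_flows, contribution_rates), start=1)
    )

# Two years of cash flow 100, half of it attributed to the data, discounted at 25%:
# 50 / 1.25 + 50 / 1.5625 = 40 + 32
value = data_value_dcf([100.0, 100.0], [0.5, 0.5], 0.25)
print(value)  # 72.0
```

The result is as sensitive to the assumed contribution rate $DR_t$ as to the cash flow forecast, which is the estimation risk listed as the main limitation.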

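Concept, Key Elements, and Life Cycle of Data Assetization

Concept and Governance Framework of Data Assetization

Assetizing data presupposes a data governance framework (principles, organization, processes) that manages data quality, standards, and metadata in an integrated way.

Key Components and Life Cycle of Data Assetization

Category	Key element and life cycle	Details and deliverables
Value identification	Data valuation framework	Select the core data to assetize through business relevance analysis
Structuring	Data product	Package as APIs or dashboards so practitioners can use the data immediately
Management and control	Data catalog and lineage	Ensure transparency and quality through metadata-based lineage management
Life cycle management	Data life cycle	Stage-by-stage control of creation ➡️ storage ➡️ analysis ➡️ use ➡️ archiving/disposal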

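Use Cases and Considerations for Data Valuation and Data Assetization

Practical Use Cases of Data Valuation and Assetization

Category	Description (application area)	Note (actual case)
Finance and guarantees	Issuing data-backed guarantee certificates and guaranteed loans	Valuation-linked financial support led by the Korea Credit Guarantee Fund and the Korea Technology Finance Corporation
Assets and divestiture	Asset valuation during corporate M&A and fundraising	The value of proprietary data held by the company is added to the company valuation
Trading and brokerage	Data sales and licensing through data exchanges	Combining heterogeneous data across finance, transportation, and telecom, and selling it via APIs

Considerations for Successful Data Assetization

Compliance: Under the Personal Information Protection Act, legal certainty must be secured through pseudonymized data processing and filtering that prevents personal identification.
Building in data literacy: Realized only when combined with a CDO-led capability-building process that enables the whole organization to understand, analyze, and use data.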

CNN 012

May 19, 2026 · about 6 min read

Eunkwang Shin

Owner

Three main layers of a CNN
- CONV: Convolution Layer
- POOL: Pooling Layer
- FC: Fully Connected Layer
- CONV extracts features, POOL downsamples feature maps, and FC makes the final prediction.
Why CNNs are used over ANNs for image processing
- Computationally efficient
- Use filters to capture spatial features
- Share weights across the image
Overfitting
- The model essentially memorizes the training data, leading to poor performance on unseen data
- To prevent overfitting, we can use techniques like:
  - Dropout: Randomly dropping out neurons during training to prevent co-adaptation
  - Batch Normalization: Normalizing the inputs of each layer to stabilize learning
  - L1/L2 Regularization: Adding a penalty to the loss function to discourage large weights
    - L1 regularization adds a penalty based on the absolute value of the weights (weights can become exactly zero, which makes the model sparse and is useful for feature selection).
    - L2 regularization adds a penalty based on the squared value of the weights (weights shrink toward zero but rarely become exactly zero; this reduces model complexity and overfitting).
  - Data Augmentation: Creating new training samples by applying transformations to existing data
ReLU
- If the input is below zero, ReLU outputs 0.
- If the input is above zero, it outputs the input value itself.
- max(0, x)
- ReLU can output any number from 0 to infinity, which allows it to capture a wide range of features in the data.
- It mitigates the vanishing gradient problem: for positive inputs its gradient is 1, so gradients flow through the network without being squashed toward zero, which can happen with activation functions like sigmoid or tanh.
Sigmoid: Binary Classification
Softmax: Multi-class Classification
Backpropagation: sends the error backward through the network and calculates gradients, so the model knows how to update its weights and biases.
Gradient Descent
- It uses Backpropagation to calculate the exact slope (the gradient) of the error (loss).
- Then it takes a step in the opposite direction of the gradient to minimize the error.
- It repeats this iterative process until it reaches a local minimum.
Vanishing Gradient Problem
- The gradient becomes too small, so earlier layers learn very slowly or almost stop learning.
- ReLU helps mitigate this problem by allowing gradients to flow through the network without being squashed to zero.
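A small numeric sketch of why sigmoid gradients vanish while ReLU gradients do not. It multiplies the same local derivative across `depth` layers and assumes all weights are 1, which is a simplification made only to isolate the effect of the activation function.

```python
import math

def sigmoid_grad(x):
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)  # at most 0.25, reached at x = 0

def relu_grad(x):
    return 1.0 if x > 0 else 0.0

def chained_gradient(local_grad, x, depth):
    """Product of `depth` identical local derivatives (weights assumed to be 1)."""
    return local_grad(x) ** depth

print(chained_gradient(sigmoid_grad, 0.0, 10))  # 0.25 ** 10, about 9.5e-07
print(chained_gradient(relu_grad, 1.0, 10))     # 1.0
```

Even at its best case (x = 0) the sigmoid derivative shrinks the signal by a factor of four per layer, while an active ReLU unit passes it through unchanged.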
Learning Rate $\alpha$
- It's a hyperparameter that controls how big of a step the model takes down the slope.
- If $\alpha$ is too small, the model will take tiny steps and may take a long time to converge.
- If $\alpha$ is too large, the model may overshoot the minimum and diverge.
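The effect of the learning rate can be seen on a toy problem. The sketch minimizes $f(x) = x^2$ (gradient $2x$); the function, the starting point, and the three values of $\alpha$ are assumptions chosen only to show slow convergence, good convergence, and divergence.

```python
def gradient_descent(alpha, steps=20, x0=1.0):
    """Minimize f(x) = x**2, whose gradient is 2x, starting from x0."""
    x = x0
    for _ in range(steps):
        grad = 2 * x
        x -= alpha * grad  # step in the opposite direction of the gradient
    return x

for alpha in (0.01, 0.1, 1.1):
    print(alpha, abs(gradient_descent(alpha)))
# alpha = 0.01 is still far from the minimum after 20 steps,
# alpha = 0.1 gets close to 0, and alpha = 1.1 overshoots and grows.
```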
Precision: $\frac{TP}{TP + FP}$
- Of all the patients the model predicted as having the disease, how many actually have the disease?
Recall: $\frac{TP}{TP + FN}$
- Of all the patients who actually have the disease, how many did the model successfully catch?
- Recall is more important in medical diagnosis because we want to minimize false negatives (missing a disease).
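Both metrics can be computed directly from label lists. The labels below are invented for the example (1 = has the disease, 0 = healthy), and `precision_recall` is a hypothetical helper.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

y_true = [1, 1, 1, 0, 1, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 1, 0, 0]
# TP = 3, FP = 2, FN = 1
print(precision_recall(y_true, y_pred))  # (0.6, 0.75)
```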
Sliding Window
- Computationally expensive because it requires multiple passes over the image with different window sizes and strides.
- Multiple scales are needed to detect objects of varying sizes, which further increases the computational cost.
Stride: controls the step size of the sliding filter. Larger stride means smaller output.
Edge
- The points or pixels in an image where brightness or intensities change sharply.
- Sobel filter
- Prewitt filter
- Canny edge detector
Padding: adds zeros around the image so the CONV does not shrink the feature map too much.
To keep the output dimension the same as the input dimension (stride 1, odd filter size $F$ ), use padding:
- $P = \frac{F - 1}{2}$
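Stride and padding together determine the output size through the standard relation $O = \lfloor (W - F + 2P) / S \rfloor + 1$. The sketch below applies it to a hypothetical 32-pixel-wide input with a 5x5 filter.

```python
def conv_output_size(width, filter_size, padding=0, stride=1):
    """Output width of a convolution: floor((W - F + 2P) / S) + 1."""
    return (width - filter_size + 2 * padding) // stride + 1

def same_padding(filter_size):
    """Padding that keeps the size unchanged for stride 1 and an odd filter size."""
    return (filter_size - 1) // 2

print(conv_output_size(32, 5))                                     # 28: no padding shrinks the map
print(conv_output_size(32, 5, padding=same_padding(5)))            # 32: size preserved
print(conv_output_size(32, 5, padding=same_padding(5), stride=2))  # 16: larger stride, smaller output
```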
Image Classification: Assigning a label to an entire image (e.g., cat, dog, car).
Object Detection: Identifying and localizing multiple objects within an image (e.g., bounding boxes)
Instance Segmentation: Identifying and segmenting each object instance in an image (e.g., pixel-level masks)
Momentum: uses an exponentially weighted average of past gradients to smooth updates and accelerate convergence.
RMSProp: uses an exponentially weighted average of squared gradients to adapt the learning rate for each parameter.
Adam: combines Momentum and RMSProp by using both the first moment, average gradient, and the second moment, average squared gradient.
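How Adam combines the two moving averages can be sketched for a single scalar parameter. The hyperparameter defaults are the commonly used ones, and the function is an illustrative assumption, not any framework's optimizer API.

```python
import math

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad       # first moment (Momentum part)
    v = beta2 * v + (1 - beta2) * grad ** 2  # second moment (RMSProp part)
    m_hat = m / (1 - beta1 ** t)             # bias correction for the zero start
    v_hat = v / (1 - beta2 ** t)
    param -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

param, m, v = adam_step(param=1.0, grad=2.0, m=0.0, v=0.0, t=1)
print(round(param, 6))  # 0.999: the first step has size about lr, whatever the gradient scale
```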
Hyperparameters: learning rate, batch size, number of epochs, optimizer type, dropout rate, etc.
Supervised Learning: The model learns from labeled data (e.g., classification, regression).
Unsupervised Learning: The model learns from unlabeled data (e.g., clustering).
Loss/Cost function: an estimate of how far the model's predictions are from the actual target/answer.
AI is a broad concept of machines performing human-like tasks.
ML is a subset of AI that learns from data
DL is a subset of ML that uses deep neural networks with many layers.
ML's major problem
- insufficient data
- non-representative training data
- poor-quality data
- irrelevant features
- overfitting
- underfitting
When do we use ML?
- a large amount of data for finding patterns and making predictions
- too many rules or too much complexity for humans to handle
Faster R-CNN: Propose regions first, then classify them
- RPN: Region Proposal Network, which generates candidate object proposals
YOLO: Predict boxes and class probabilities directly from the image in one pass
- Anchor boxes: predefined bounding boxes of different sizes and aspect ratios used to predict the location of objects in YOLO.
NMS: Non-Maximum Suppression, selects the best bounding box among overlapping boxes based on confidence scores.
1×1 convolution mixes channel information and can reduce the number of channels, so later convolutions become cheaper.
Inception module: learns small, medium, and large visual features at the same time.
Transfer Learning Strategies:
- First, if the new dataset is small and similar to the original dataset, we can use the pre-trained model directly.
- Second, if the dataset is similar but has different classes, we freeze the convolutional layers and train only the fully connected classification layer.
- Third, if the dataset is small but not very similar, we freeze the early convolutional layers and fine-tune the later convolutional layers plus the FC layer.
- Finally, if the dataset is large and different, we can fine-tune the whole network.
IoU: Intersection over Union, a metric used to evaluate the accuracy of object detection models by comparing the predicted bounding box with the ground truth bounding box.

CNN 011

May 11, 2026 · about 4 min read

Eunkwang Shin

Owner

Sequence

A sequence carries a lot of context that helps predict the next behavior

Sequence modelling types

One to One: Binary classification
- X -> Y'
- Will it rain today? Yes/No
Many to One: Sentiment analysis
- X1, X2, X3, ... -> Y'
- Is this review positive or negative?
One to Many: Image captioning
- X -> Y1, Y2, Y3, ...
- Image: A woman is throwing a frisbee in the park
Many to Many: Q&A with LLMs, language translation
- X1, X2, X3, ... -> Y1, Y2, Y3, ...
- Q: Hey Siri, how's the weather today? A: It's sunny and warm outside.

RNN

Recurrent Neural Network

$y'_t = f(x_t, h_{t-1})$
- $y'_t$ : output at time t
- $x_t$ : input at time t
- $h_{t-1}$ : past memory (hidden state from the previous time step)
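The recurrence can be sketched with a scalar vanilla RNN. The choice of `tanh` for $f$, the weight values, and the use of the hidden state as the output are assumptions made to keep the example minimal.

```python
import math

def rnn_step(x_t, h_prev, w_x=0.5, w_h=0.8, b=0.0):
    """One step of a scalar vanilla RNN: mix the current input with past memory."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)

def run_rnn(sequence, h0=0.0):
    """Apply the same step (same parameters) to every element, carrying h forward."""
    h = h0
    outputs = []
    for x_t in sequence:
        h = rnn_step(x_t, h)
        outputs.append(h)
    return outputs

print(run_rnn([1.0, 0.0]))       # works for length 2
print(run_rnn([0.0, 1.0, 0.5]))  # and for length 3 with the same parameters
```

Because the hidden state is carried forward, `[1.0, 0.0]` and `[0.0, 1.0]` end in different states, which is how the order of the inputs is preserved.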

Sequence Modelling

Supports variable-length input
Has temporal dependencies (long- and short-term)
Preserves the order of information
Shares parameters across the sequence

Attention

Why
- RNNs process sequences one step at a time
- Long sentences lead to long-term memory loss
- Important words can be hidden in long dependencies
Attention helps the model focus on the relevant parts of the input
For each output word, attention decides which input words are most important
Computes a weighted sum of all input vectors
Words with higher weights are more important
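The weighted sum can be sketched with dot-product scores and a softmax. The query and input vectors are toy values, and the unscaled dot product is a simplifying assumption (Transformers additionally use learned projections and scaling).

```python
import math

def softmax(scores):
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, inputs):
    """Return (weights, context): softmax of dot-product scores and the weighted sum."""
    scores = [sum(q * k for q, k in zip(query, vec)) for vec in inputs]
    weights = softmax(scores)
    context = [sum(w * vec[i] for w, vec in zip(weights, inputs))
               for i in range(len(query))]
    return weights, context

query = [1.0, 0.0]
inputs = [[1.0, 0.0], [0.0, 1.0], [5.0, 0.0]]  # the third vector matches the query best
weights, context = attend(query, inputs)
print([round(w, 3) for w in weights])
```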

Transformer

Self-Attention is the foundation of the Transformer architecture
The entire sequence is processed in parallel
Has Encoder and Decoder blocks
Stack of layers with Self-Attention and a Feed-Forward Neural Network

Vision Transformer (ViT)

Vision Transformers are widely applied across computer vision tasks
ViT looks at images the way a language model looks at words
Images are represented as a sequence of patches

Steps to use ViT

Split an image into patches
Flatten the patches
Produce lower-dimensional linear embeddings from the flattened patches
Add positional embeddings
Feed the sequence as input to a standard transformer encoder
Pretrain the model with image labels (fully supervised on a huge dataset)
Finetune on the downstream dataset for image classification
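The first two steps (split into patches, flatten) can be sketched on a tiny single-channel image stored as a list of rows. The 4x4 image and patch size 2 are toy assumptions; real ViT inputs have three channels and the flattened patches are then passed through a learned linear projection.

```python
def image_to_patches(image, patch_size):
    """Split an H x W image (list of rows) into flattened patch_size x patch_size patches."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [image[r][c]
                     for r in range(top, top + patch_size)
                     for c in range(left, left + patch_size)]
            patches.append(patch)
    return patches

image = [[r * 4 + c for c in range(4)] for r in range(4)]  # pixel values 0..15
patches = image_to_patches(image, patch_size=2)
print(len(patches), patches[0], patches[-1])  # 4 [0, 1, 4, 5] [10, 11, 14, 15]
```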

ViT

CNNs vs Vision Transformer (ViT)

Key Aspects	CNNs	ViT
Input Handling	Processes the entire image using filters (kernels)	Splits image into fixed-size patches (like tokens)
Local vs. Global	Focuses on local patterns first (edges, textures)	Uses global self-attention to relate all patches
Architecture	Hierarchical (`convs -> pools -> deeper features`)	Flat transformer encoder stack
Training Data Need	Works well with limited data	Needs lots of data or pretraining
Computation	Efficient with low-res inputs	Computationally heavier, especially on large images
Parallelism	Limited; uses sequential feature stacking	High; patch processing is highly parallelizable

RF-DETR

Roboflow Detection Transformer

Object detection techniques using Transformers
An improvement over the original DETR (Detection Transformer) model
DETR looks at everything globally but misses small things.
RF-DETR looks globally and understands the relationships between things.
Real-time Transformer-based object detection architecture
Reported by Roboflow as the first real-time model to exceed 60 mAP on the COCO dataset

RF-DETR

Diffusion Models

Generate new data samples (images, audio, text) that are similar to a training dataset by learning to reverse a gradual noising process
Forward Diffusion
- Add noise gradually to the original image over many steps
- Iterate until the image becomes pure noise
- Gaussian noise is used (no learning involved)
Reverse Diffusion
- Denoising: the model is trained to predict and reverse this noise
- Use the prediction to denoise the image
- Given a noisy image, it predicts a slightly less noisy version
- After many steps, it reconstructs a clean, new image from pure noise

Steps to train a diffusion model

Start with real data
Add noise step by step, until the image becomes pure noise
Train a model to reverse this process, denoising to recover the original image
Once trained, the model can start from pure noise and generate new and realistic samples
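The forward (noising) half of the steps above needs no learning and can be sketched directly. The update $x_t = \sqrt{1-\beta}\,x_{t-1} + \sqrt{\beta}\,\epsilon$ with a constant $\beta$ is the usual DDPM-style form; the constant schedule, the step count, and the flat list standing in for an image are assumptions for illustration.

```python
import math
import random

def forward_diffusion(x0, num_steps=1000, beta=0.02, seed=0):
    """Add Gaussian noise step by step: x_t = sqrt(1 - beta) * x_{t-1} + sqrt(beta) * eps."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(num_steps):
        x = [math.sqrt(1 - beta) * value + math.sqrt(beta) * rng.gauss(0.0, 1.0)
             for value in x]
    return x

def remaining_signal(num_steps, beta=0.02):
    """Factor by which the original pixel values are scaled after num_steps."""
    return math.sqrt((1 - beta) ** num_steps)

pixels = [1.0, -1.0, 0.5, 0.0]       # hypothetical normalized pixel values
noisy = forward_diffusion(pixels)
print(remaining_signal(1000) < 1e-3)  # True: almost nothing of the original image is left
```

The reverse process is where training happens: a network is fit to predict the noise added at each step so that the chain can be run backwards from pure noise.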

Applications of Diffusion Models

Given a lot of sprite sample images
Can generate new sprite images
- New image generation from image input

CNN 010

May 2, 2026 · about 5 min read

Eunkwang Shin

Owner

Drawbacks of Anchor-based detectors

It is sensitive to:

Size
Aspect Ratio
Number of Anchor boxes (Fixed)
Too much variation in shape
Small objects
May not generalize due to pre-defined anchor boxes
Computationally expensive

Anchor-free detectors

Localize objects without using boxes as proposals

Key-point based
Center-based

Key-point based

Locates key object parts in an image
Detects spatial locations or points unique to an object
With human body as an example
Key part of face: nose, eyes, eyebrows, mouth ...
Key point of human body: joints, elbows, knees ...
Object is represented using Key-points

Center-based

Finds positives in the center
Predicts four distances from the positive to the potential object boundary
- Top, left, bottom, right
- {x, y, T, R, B, L}
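Decoding a center-based prediction into a box is simple arithmetic. The sketch assumes image coordinates with the origin at the top-left and y growing downward; the function name and the sample numbers are hypothetical.

```python
def decode_center_prediction(x, y, top, right, bottom, left):
    """Turn a positive location (x, y) and four boundary distances into (x1, y1, x2, y2)."""
    return (x - left, y - top, x + right, y + bottom)

# A positive point at (50, 40) that is 10 px below the top edge, 20 px left of the
# right edge, 30 px above the bottom edge, and 5 px right of the left edge.
print(decode_center_prediction(50, 40, top=10, right=20, bottom=30, left=5))
# (45, 30, 70, 70)
```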

YOLO

Yolo V1: 2015
- darknet backbone
Yolo V2: 2016
- Anchor boxes
- Batch normalization
Yolo V3: 2018
- Objectness score
- improvement for small objects
Yolo V4/V5: 2020
- Solid Baseline Model
- Lightweight and Fast
- image classification, object detection, and instance segmentation
- Multiple input processing (Video, Image, Live stream)
- Optimize weights
- Developed by Ultralytics (not original author)
Yolo X/R: 2021
- Decoupled head
- First version of Anchor free
- Improved backbone efficiency
Yolo V6/V7: 2022
- Faster and more accurate
Yolo NAS/V8: 2023
- Anchor free
- Architectural improvement
- Strong baseline for realtime object detection
Yolo V9/V10/V11: 2024
- Oriented bounding box
- Strong baseline for oriented object detection
Yolo V12: 2025
- Attention mechanism, introduced transformer
- Slightly slower
Yolo 26: 2026
- Deployment on a small form factor hardware
- realtime object detection on edge devices
- Strongest baseline for edge device deployment (realtime and accuracy)
- Efficient Loss Function

YoloX

Anchor-free detector in the Yolo family
Uses a decoupled head
Label assignment using SimOTA
Uses YOLOv3-SPP with a Darknet-53 backbone as the baseline
Uses strong augmentation such as MixUp and Mosaic

Backbone: Feature extraction
Neck: Aggregation of multi-scale feature
Head: Localization and Classification scores

Decoupled head

decoupled head

Coupled Head: one head gives regression score and classification score (Dog/Cat + Location, BBox)
Decoupled Head:
- First head gives Classification score (Dog/Cat)
- Another head gives Regression score (Location, BBox)

Data Augmentation

mixup augmentation

Blends two images, which simulates occluded and overlapping objects
Improves model robustness

mosaic augmentation

four images are combined into one
crops and resizes the images to create a new training sample
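The "four images combined into one" idea can be sketched as a plain 2x2 tiling. This is a simplification: the images are assumed to be equally sized lists of pixel rows, and the random cropping, resizing, and bounding-box remapping of a real mosaic pipeline are left out.

```python
def mosaic(top_left, top_right, bottom_left, bottom_right):
    """Tile four equally sized images (lists of pixel rows) into one 2x2 mosaic."""
    top = [left_row + right_row for left_row, right_row in zip(top_left, top_right)]
    bottom = [left_row + right_row for left_row, right_row in zip(bottom_left, bottom_right)]
    return top + bottom

def solid(value, size=2):
    """Hypothetical helper: a size x size image filled with one pixel value."""
    return [[value] * size for _ in range(size)]

combined = mosaic(solid(1), solid(2), solid(3), solid(4))
for row in combined:
    print(row)
# [1, 1, 2, 2]
# [1, 1, 2, 2]
# [3, 3, 4, 4]
# [3, 3, 4, 4]
```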

Yolo 26

Realtime computer vision model
Detection, Segmentation, Classification, Pose, Tracking, OBB (Oriented Bounding Box)
Available in Nano, Small, Medium, Large, XLarge
E2E detection pipeline (NMS-free, Non-Maximum Suppression free)
Designed for edge AI and fast deployment

Why is it faster

NMS-free inference removes post-processing overhead
Direct bounding box regression (No DFL, Distribution Focal Loss)
Lower latency and a simpler deployment graph
CPU-optimized architecture
Up to 43% faster on CPUs than V11

Key Changes

ProgLoss (Progressive Loss Balancing): improves training stability and convergence
STAL (Small-Target-Aware Label Assignment): improves small-object detection
MuSGD optimizer improves convergence speed
Better speed-accuracy trade-off than many previous YOLO models
Ideal for robotics, drones, surveillance, and edge devices

Inference pipeline

Backbone: Efficient Hybrid CNN + Attention
Neck: PAN-FPN (Multi-scale Feature Fusion)
Head (Decoupled & Dual Head)
- One-to-Many Head: Dense supervision (training only, many positives)
- One-to-One Head: Single best match for NMS-free inference (inference and training)

Training pipeline

Instance Segmentation

Identifies the pixels that belong to each individual object instance,
whereas Semantic Segmentation assigns every pixel to a class/category without separating instances
Instance Segmentation
- DeepMask
- SharpMask
- Mask RCNN
Semantic Segmentation
- Conditional Random Field (CRF)
- Fully Convolutional Networks (FCN)
- SegNet
- U-Net
- Pyramid scene parsing network (PSPNet)

Application of Instance Segmentation

Autonomous Driving
Scene Understanding
Aerial Image Processing

Mask R-CNN

Mask-Region Convolutional Neural Network

An addition to the RCNN family that performs instance segmentation
Improved over Faster RCNN
A Fully Convolutional Network branch predicts a mask for each object.
Two stages:
1. RPN proposes candidate object bounding boxes
2. Classify the candidates, refine the bounding boxes, and predict masks.

Mask R-CNN architecture

Limitations of Mask R-CNN

Computational Complexity: Training and inference can be computationally intensive and require substantial resources, especially with high-resolution images or large datasets.
Small-Object Segmentation: May struggle to accurately segment very small objects due to limited pixel information.
Data Requirements: Training requires a large amount of annotated data, which can be time-consuming and expensive to acquire.
Limited Generalization to Unseen Categories: The model's ability to generalize to unseen object categories is limited.

Semantic Segmentation

u-net

input image -> u-net -> output segmentation map

References

Ge, Z., Liu, S., Wang, F., Li, Z., & Sun, J. (2021). YOLOX: Exceeding YOLO series in 2021. arXiv. https://doi.org/10.48550/arXiv.2107.08430
Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. In N. Navab, J. Hornegger, W. M. Wells, & A. F. Frangi (Eds.), Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015 (Vol. 9351, pp. 234–241). Springer. https://doi.org/10.1007/978-3-319-24574-4_28

CNN 009

April 22, 2026 · about 2 min read

Eunkwang Shin

Owner

Predicting Bounding Boxes

Using:
- Sliding Window (Slow)
- Selective Search
- Region Proposals
Task:
- Predict bounding boxes from a CNN

Non Maxima Suppression (NMS)

Check the probability of each detection and keep the ones with a score above a certain threshold (e.g. 0.7)
For the remaining boxes:
a. The box with the highest score is taken as a detection result.
b. Discard any remaining boxes with IoU > 0.5 with that detected box, i.e. boxes that overlap heavily with the highest-scoring box.
Repeat a and b on the boxes that are left until none remain.
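The procedure above can be sketched as a greedy loop. Boxes are assumed to be `(x1, y1, x2, y2)` tuples, the thresholds follow the values in the text, and the sample boxes and scores are invented for the demo.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def nms(boxes, scores, score_threshold=0.7, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first."""
    candidates = sorted((i for i, s in enumerate(scores) if s > score_threshold),
                        key=lambda i: scores[i], reverse=True)
    keep = []
    while candidates:
        best = candidates.pop(0)  # highest-scoring remaining box
        keep.append(best)
        # Drop every box that overlaps the chosen one too much.
        candidates = [i for i in candidates if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 10, 10)]
scores = [0.9, 0.8, 0.75, 0.6]
print(nms(boxes, scores))  # [0, 2]
```

Box 1 is suppressed because its IoU with box 0 is 81/119 (about 0.68), and box 3 never enters the loop because its score is below the 0.7 threshold.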

Anchor Boxes

Associate each object to:
- A cell which contains its mid-point and
- Anchor box for the cell with highest IoU
Calculate the IoU of the anchor boxes and the predicted bounding boxes.
- $IoU(P_{bb}, A_{bb}) = \frac{\text{Area of Overlap}}{\text{Area of Union}}$
$\hat{y} = \{P_0, x, y, h, w, C_1, C_2, \quad P_0, x, y, h, w, C_1, C_2\}$ (one group of values per anchor box, two anchor boxes here)
- $P_0$ is the objectness score
- $x, y$ are the coordinates of the center of the bounding box relative to the grid cell
- $h, w$ are the height and width of the bounding box
- $C_1, C_2$ are the class information for the object in the bounding box

YOLO

Real-time performance with 45 FPS, 0.02 sec per image
Not suitable for small objects
Issues with new or multiple aspect ratios and unable to generalize

SSD, Single Shot Detector

Similar to YOLO; uses VGG16 as the base convolutional network
Takes advantage of anchor boxes with different aspect ratios
A large number of anchor boxes is used
Not suitable for small objects
3 times faster than Faster R-CNN
With a ResNet-101 base, SSD may detect small objects better thanks to richer features from the CONV layers

SSD 300 architecture

Overview of Object Detection

Base Networks
- VGG16
- ResNet-101
- Inception-v2, v3
- ResNet
- MobileNet
- AlexNet
- ZFNet
Object Detection Framework
- R-CNN family
- YOLO family
- SSD family
- F-RCNN family
Faster-RCNN is more accurate but slower
YOLO/SSD are faster/real-time but may not be very accurate
