M4RIE
0.20120613
|
Functions | |
mzd_slice_t * | _mzd_slice_mul_naive (mzd_slice_t *C, const mzd_slice_t *A, const mzd_slice_t *B) |
\( C = A \cdot B \) using quadratic polynomial multiplication with matrix coefficients. | |
mzd_slice_t * | _mzd_slice_mul_karatsuba2 (mzd_slice_t *C, const mzd_slice_t *A, const mzd_slice_t *B) |
\( C = A \cdot B \) over \(\mathbb{F}_{2^2}\) using 3 multiplications over \(\mathbb{F}_2\) and 2 temporary \(\mathbb{F}_2\) matrices.. | |
mzd_slice_t * | _mzd_slice_mul_karatsuba3 (mzd_slice_t *C, const mzd_slice_t *A, const mzd_slice_t *B) |
\( C = A \cdot B \) over \(\mathbb{F}_{2^3}\) using 6 multiplications over \(\mathbb{F}_2\) and 3 temporary \(\mathbb{F}_2\) matrices.. | |
mzd_slice_t * | _mzd_slice_mul_karatsuba4 (mzd_slice_t *C, const mzd_slice_t *A, const mzd_slice_t *B) |
\( C = A \cdot B \) over \(\mathbb{F}_{2^4}\) using 9 multiplications over \(\mathbb{F}_2\) and 3 temporary \(\mathbb{F}_2\) matrices.. | |
mzd_slice_t * | _mzd_slice_mul_karatsuba5 (mzd_slice_t *C, const mzd_slice_t *A, const mzd_slice_t *B) |
\( C = A \cdot B \) over \(\mathbb{F}_{2^5}\) using 13 multiplications over \(\mathbb{F}_2\) and 3 temporary \(\mathbb{F}_2\) matrices.. | |
mzd_slice_t * | _mzd_slice_mul_karatsuba6 (mzd_slice_t *C, const mzd_slice_t *A, const mzd_slice_t *B) |
\( C = A \cdot B \) over \(\mathbb{F}_{2^6}\) using 17 multiplications over \(\mathbb{F}_2\) and 3 temporary \(\mathbb{F}_2\) matrices. | |
mzd_slice_t * | _mzd_slice_mul_karatsuba7 (mzd_slice_t *C, const mzd_slice_t *A, const mzd_slice_t *B) |
\( C = A \cdot B \) over \(\mathbb{F}_{2^7}\) using 22 multiplications over \(\mathbb{F}_2\) and 3 temporary \(\mathbb{F}_2\) matrices. | |
mzd_slice_t * | _mzd_slice_mul_karatsuba8 (mzd_slice_t *C, const mzd_slice_t *A, const mzd_slice_t *B) |
\( C = A \cdot B \) over \(\mathbb{F}_{2^8}\) using 27 multiplications over \(\mathbb{F}_2\) and 15 temporary \(\mathbb{F}_2\) matrices. | |
static mzd_slice_t * | _mzd_slice_mul_karatsuba (mzd_slice_t *C, const mzd_slice_t *A, const mzd_slice_t *B) |
\( C = C + A \cdot B \) using Karatsuba multiplication of polynomials over matrices over \(\mathbb{F}_2\). | |
static mzd_slice_t * | mzd_slice_mul_karatsuba (mzd_slice_t *C, const mzd_slice_t *A, const mzd_slice_t *B) |
\( C = A \cdot B \) using Karatsuba multiplication of polynomials over matrices over \(\mathbb{F}_2\). | |
static mzd_slice_t * | mzd_slice_addmul_karatsuba (mzd_slice_t *C, const mzd_slice_t *A, const mzd_slice_t *B) |
\( C = C + A \cdot B\) using Karatsuba multiplication of polynomials over matrices over \(\mathbb{F}_2\). | |
mzd_slice_t * | mzd_slice_mul_scalar (mzd_slice_t *C, const word a, const mzd_slice_t *B) |
\( C = a \cdot B \). | |
mzd_slice_t * | mzd_slice_addmul_scalar (mzd_slice_t *C, const word a, const mzd_slice_t *B) |
\( C += a \cdot B \). | |
static mzd_slice_t * | mzd_slice_mul (mzd_slice_t *C, const mzd_slice_t *A, const mzd_slice_t *B) |
\( C = A \cdot B \). | |
static mzd_slice_t * | mzd_slice_addmul (mzd_slice_t *C, const mzd_slice_t *A, const mzd_slice_t *B) |
\( C = C + A \cdot B \). | |
mzed_t * | mzed_mul (mzed_t *C, const mzed_t *A, const mzed_t *B) |
\( C = A \cdot B \). | |
mzed_t * | mzed_addmul (mzed_t *C, const mzed_t *A, const mzed_t *B) |
\( C = C + A \cdot B \). | |
mzed_t * | _mzed_mul (mzed_t *C, const mzed_t *A, const mzed_t *B) |
\( C = A \cdot B \). | |
mzed_t * | _mzed_addmul (mzed_t *C, const mzed_t *A, const mzed_t *B) |
\( C = C + A \cdot B \). | |
mzed_t * | mzed_addmul_naive (mzed_t *C, const mzed_t *A, const mzed_t *B) |
\( C = C + A \cdot B \) using naive cubic multiplication. | |
mzed_t * | mzed_mul_naive (mzed_t *C, const mzed_t *A, const mzed_t *B) |
\( C = A \cdot B \) using naive cubic multiplication. | |
mzed_t * | _mzed_mul_naive (mzed_t *C, const mzed_t *A, const mzed_t *B) |
\( C = C + A \cdot B \) using naive cubic multiplication. | |
mzed_t * | mzed_mul_scalar (mzed_t *C, const word a, const mzed_t *B) |
\( C = a \cdot B \). | |
mzed_t * | mzed_mul_newton_john (mzed_t *C, const mzed_t *A, const mzed_t *B) |
\(C = A \cdot B\) using Newton-John tables. | |
mzed_t * | mzed_addmul_newton_john (mzed_t *C, const mzed_t *A, const mzed_t *B) |
\(C = C + A \cdot B\) using Newton-John tables. | |
mzed_t * | _mzed_mul_newton_john0 (mzed_t *C, const mzed_t *A, const mzed_t *B) |
\(C = C + A \cdot B\) using Newton-John tables. | |
mzed_t * | _mzed_mul_newton_john (mzed_t *C, const mzed_t *A, const mzed_t *B) |
\(C = C + A \cdot B\) using Newton-John tables. | |
mzed_t * | mzed_mul_strassen (mzed_t *C, const mzed_t *A, const mzed_t *B, int cutoff) |
\( C = A \cdot B \) using Strassen-Winograd. | |
mzed_t * | mzed_addmul_strassen (mzed_t *C, const mzed_t *A, const mzed_t *B, int cutoff) |
\( C = C + A \cdot B \) using Strassen-Winograd. | |
mzed_t * | _mzed_mul_strassen (mzed_t *C, const mzed_t *A, const mzed_t *B, int cutoff) |
\( C = A \cdot B \) using Strassen-Winograd. | |
mzed_t * | _mzed_addmul_strassen (mzed_t *C, const mzed_t *A, const mzed_t *B, int cutoff) |
\( C = A \cdot B \) using Strassen-Winograd. | |
rci_t | _mzed_strassen_cutoff (const mzed_t *C, const mzed_t *A, const mzed_t *B) |
Return heurstic choice for crossover parameter for Strassen-Winograd multiplication given A, B and C. |
|
inlinestatic |
\( C = C + A \cdot B \) using Karatsuba multiplication of polynomials over matrices over \(\mathbb{F}_2\).
This function reduces matrix multiplication over \(\mathbb{F}_{2^e}\) to matrix multiplication over \(\mathbb{F}_2\).
As an example consider \( \mathbb{F}_4 \). The minimal polynomial is \( x^2 + x + 1 \). The matrix A can be represented as A0*x + A1 and the matrix B can be represented as B0*x + B1. Their product C is
\[ A0 \cdot B0 \cdot x^2 + (A0 \cdot B1 + A1 \cdot B0) \cdot x + A1*B1. \]
Reduction modulo x^2 + x + 1 gives
\[ (A0 \cdot B0 + A0 \cdot B1 + A1 \cdot B0) \cdot x + A1 \cdot B1 + A0 \cdot B0. \]
This can be re-written as
\[ ((A0 + A1) \cdot (B0 + B1) + A1 \cdot B1) \cdot x + A1 \cdot B1 + A0 \cdot B0 \]
and thus this multiplication costs 3 matrix multiplications over \(\mathbb{F}_2\) and 4 matrix additions over \(\mathbb{F}_2\).
This technique was proposed in Tomas J. Boothby and Robert W. Bradshaw; Bitslicing and the Method of Four Russians Over Larger Finite Fields; 2009; http://arxiv.org/abs/0901.1413
C | Preallocated return matrix, may be NULL for automatic creation. |
A | Input matrix A. |
B | Input matrix B. |
mzd_slice_t* _mzd_slice_mul_karatsuba2 | ( | mzd_slice_t * | C, |
const mzd_slice_t * | A, | ||
const mzd_slice_t * | B | ||
) |
\( C = A \cdot B \) over \(\mathbb{F}_{2^2}\) using 3 multiplications over \(\mathbb{F}_2\) and 2 temporary \(\mathbb{F}_2\) matrices..
C | Preallocated return matrix, may be NULL for automatic creation. |
A | Input matrix A. |
B | Input matrix B. |
mzd_slice_t* _mzd_slice_mul_karatsuba3 | ( | mzd_slice_t * | C, |
const mzd_slice_t * | A, | ||
const mzd_slice_t * | B | ||
) |
\( C = A \cdot B \) over \(\mathbb{F}_{2^3}\) using 6 multiplications over \(\mathbb{F}_2\) and 3 temporary \(\mathbb{F}_2\) matrices..
The formula was taken from Peter L. Montgomery. "Five, Six, and Seven-Term Karatsuba-Like Formulae" in IEEE TRANSACTIONS ON COMPUTERS, VOL. 54, NO. 3, MARCH 2005/
C | Preallocated return matrix, may be NULL for automatic creation. |
A | Input matrix A. |
B | Input matrix B. |
mzd_slice_t* _mzd_slice_mul_karatsuba4 | ( | mzd_slice_t * | C, |
const mzd_slice_t * | A, | ||
const mzd_slice_t * | B | ||
) |
\( C = A \cdot B \) over \(\mathbb{F}_{2^4}\) using 9 multiplications over \(\mathbb{F}_2\) and 3 temporary \(\mathbb{F}_2\) matrices..
C | Preallocated return matrix, may be NULL for automatic creation. |
A | Input matrix A. |
B | Input matrix B. |
mzd_slice_t* _mzd_slice_mul_karatsuba5 | ( | mzd_slice_t * | C, |
const mzd_slice_t * | A, | ||
const mzd_slice_t * | B | ||
) |
\( C = A \cdot B \) over \(\mathbb{F}_{2^5}\) using 13 multiplications over \(\mathbb{F}_2\) and 3 temporary \(\mathbb{F}_2\) matrices..
The formula was taken from Peter L. Montgomery. "Five, Six, and Seven-Term Karatsuba-Like Formulae" in IEEE TRANSACTIONS ON COMPUTERS, VOL. 54, NO. 3, MARCH 2005/
C | Preallocated return matrix, may be NULL for automatic creation. |
A | Input matrix A. |
B | Input matrix B. |
mzd_slice_t* _mzd_slice_mul_karatsuba6 | ( | mzd_slice_t * | C, |
const mzd_slice_t * | A, | ||
const mzd_slice_t * | B | ||
) |
\( C = A \cdot B \) over \(\mathbb{F}_{2^6}\) using 17 multiplications over \(\mathbb{F}_2\) and 3 temporary \(\mathbb{F}_2\) matrices.
The formula was taken from Peter L. Montgomery. "Five, Six, and Seven-Term Karatsuba-Like Formulae" in IEEE TRANSACTIONS ON COMPUTERS, VOL. 54, NO. 3, MARCH 2005/
C | Preallocated return matrix, may be NULL for automatic creation. |
A | Input matrix A. |
B | Input matrix B. |
mzd_slice_t* _mzd_slice_mul_karatsuba7 | ( | mzd_slice_t * | C, |
const mzd_slice_t * | A, | ||
const mzd_slice_t * | B | ||
) |
\( C = A \cdot B \) over \(\mathbb{F}_{2^7}\) using 22 multiplications over \(\mathbb{F}_2\) and 3 temporary \(\mathbb{F}_2\) matrices.
The formula was taken from Peter L. Montgomery. "Five, Six, and Seven-Term Karatsuba-Like Formulae" in IEEE TRANSACTIONS ON COMPUTERS, VOL. 54, NO. 3, MARCH 2005/
C | Preallocated return matrix, may be NULL for automatic creation. |
A | Input matrix A. |
B | Input matrix B. |
mzd_slice_t* _mzd_slice_mul_karatsuba8 | ( | mzd_slice_t * | C, |
const mzd_slice_t * | A, | ||
const mzd_slice_t * | B | ||
) |
\( C = A \cdot B \) over \(\mathbb{F}_{2^8}\) using 27 multiplications over \(\mathbb{F}_2\) and 15 temporary \(\mathbb{F}_2\) matrices.
C | Preallocated return matrix, may be NULL for automatic creation. |
A | Input matrix A. |
B | Input matrix B. |
8 + 7 temporaries
mzd_slice_t* _mzd_slice_mul_naive | ( | mzd_slice_t * | C, |
const mzd_slice_t * | A, | ||
const mzd_slice_t * | B | ||
) |
\( C = A \cdot B \) using quadratic polynomial multiplication with matrix coefficients.
C | Preallocated return matrix, may be NULL for automatic creation. |
A | Input matrix A. |
B | Input matrix B. |
\( C = C + A \cdot B \).
C | Preallocated product matrix. |
A | Input matrix A. |
B | Input matrix B. |
\( C = A \cdot B \) using Strassen-Winograd.
This function uses Strassen-Winograd multiplication (Bodrato variant) recursively until it reaches the cutoff, where it switches to Newton-John table based multiplication or naive multiplication.
C | Preallocated product matrix. |
A | Input matrix A. |
B | Input matrix B. |
cutoff | Crossover to basecase dimension > 64 |
\( C = A \cdot B \).
C | Preallocated product matrix. |
A | Input matrix A. |
B | Input matrix B. |
\( C = C + A \cdot B \) using naive cubic multiplication.
C | Preallocated product matrix. |
A | Input matrix A. |
B | Input matrix B. |
\(C = C + A \cdot B\) using Newton-John tables.
This is an optimised implementation.
C | Preallocated product matrix. |
A | Input matrix A. |
B | Input matrix B. |
\(C = C + A \cdot B\) using Newton-John tables.
This is a simple implementation for clarity of presentation. Do not call, it is slow.
C | Preallocated product matrix. |
A | Input matrix A. |
B | Input matrix B. |
\( C = A \cdot B \) using Strassen-Winograd.
This function uses Strassen-Winograd multiplication (Bodrato variant) recursively until it reaches the cutoff, where it switches to Newton-John table based multiplication or naive multiplication.
C | Preallocated product matrix. |
A | Input matrix A. |
B | Input matrix B. |
cutoff | Crossover to basecase dimension > 64 |
Return heurstic choice for crossover parameter for Strassen-Winograd multiplication given A, B and C.
C | Matrix (ignored) |
A | Matrix |
B | Martix (ignored) |
|
inlinestatic |
\( C = C + A \cdot B \).
C | Preallocated return matrix. |
A | Input matrix A. |
B | Input matrix B. |
|
inlinestatic |
\( C = C + A \cdot B\) using Karatsuba multiplication of polynomials over matrices over \(\mathbb{F}_2\).
C | Preallocated return matrix. |
A | Input matrix A. |
B | Input matrix B. |
mzd_slice_t* mzd_slice_addmul_scalar | ( | mzd_slice_t * | C, |
const word | a, | ||
const mzd_slice_t * | B | ||
) |
\( C += a \cdot B \).
C | Preallocated product matrix. |
a | finite field element. |
B | Input matrix B. |
|
inlinestatic |
\( C = A \cdot B \).
C | Preallocated return matrix, may be NULL for automatic creation. |
A | Input matrix A. |
B | Input matrix B. |
|
inlinestatic |
\( C = A \cdot B \) using Karatsuba multiplication of polynomials over matrices over \(\mathbb{F}_2\).
C | Preallocated return matrix, may be NULL for automatic creation. |
A | Input matrix A. |
B | Input matrix B. |
mzd_slice_t* mzd_slice_mul_scalar | ( | mzd_slice_t * | C, |
const word | a, | ||
const mzd_slice_t * | B | ||
) |
\( C = a \cdot B \).
C | Preallocated product matrix or NULL. |
a | finite field element. |
B | Input matrix B. |
\( C = C + A \cdot B \).
C | Preallocated product matrix, may be NULL for automatic creation. |
A | Input matrix A. |
B | Input matrix B. |
\( C = C + A \cdot B \) using naive cubic multiplication.
C | Preallocated product matrix. |
A | Input matrix A. |
B | Input matrix B. |
\(C = C + A \cdot B\) using Newton-John tables.
C | Preallocated product matrix, may be NULL for automatic creation. |
A | Input matrix A. |
B | Input matrix B. |
\( C = C + A \cdot B \) using Strassen-Winograd.
This function uses Strassen-Winograd multiplication (Bodrato variant) recursively until it reaches the cutoff, where it switches to Newton-John table based multiplication or naive multiplication.
C | Preallocated product matrix, may be NULL for allocation. |
A | Input matrix A. |
B | Input matrix B. |
cutoff | Crossover to basecase dimension > 64 or 0 for heuristic choice. |
\( C = A \cdot B \).
C | Preallocated return matrix, may be NULL for automatic creation. |
A | Input matrix A. |
B | Input matrix B. |
\( C = A \cdot B \) using naive cubic multiplication.
C | Preallocated product matrix, may be NULL for automatic creation. |
A | Input matrix A. |
B | Input matrix B. |
\(C = A \cdot B\) using Newton-John tables.
C | Preallocated return matrix, may be NULL for automatic creation. |
A | Input matrix A. |
B | Input matrix B. |
\( C = a \cdot B \).
C | Preallocated product matrix or NULL. |
a | finite field element. |
B | Input matrix B. |
The algorithm proceeds as follows:
0) If a direct approach would need less lookups we use that.
1) We generate a lookup table of 16-bit wide entries
2) We use that lookup table to do 4 lookups per word
\( C = A \cdot B \) using Strassen-Winograd.
This function uses Strassen-Winograd multiplication (Bodrato variant) recursively until it reaches the cutoff, where it switches to Newton-John table based multiplication or naive multiplication.
C | Preallocated product matrix, may be NULL for allocation. |
A | Input matrix A. |
B | Input matrix B. |
cutoff | Crossover to basecase dimension > 64 or 0 for heuristic choice |