Skip to main content

Groebner Basis Attacks

Overview

A Groebner basis is a particular generating set of a polynomial ideal that enables systematic solving of systems of polynomial equations. For AO hash functions, the permutation can be written as a system of polynomial equations over Fp\mathbb{F}_p, and computing a Groebner basis of this system is equivalent to solving the permutation.

The attack model

Consider a permutation π:FptFpt\pi: \mathbb{F}_p^t \to \mathbb{F}_p^t built from rr rounds. Each round ii applies an S-box layer SS, a linear layer MM, and adds round constants c(i)c^{(i)}. We introduce intermediate state variables s(0),s(1),,s(r)s^{(0)}, s^{(1)}, \ldots, s^{(r)} where s(0)=(x1,,xt)s^{(0)} = (x_1, \ldots, x_t) is the input and s(r)=(y1,,yt)s^{(r)} = (y_1, \ldots, y_t) is the output.

The round function produces the system:

s(i)=MS(s(i1))+c(i),i=1,,rs^{(i)} = M \cdot S(s^{(i-1)}) + c^{(i)}, \quad i = 1, \ldots, r

For a power map S-box S(x)=xαS(x) = x^{\alpha}, expanding each round gives tt polynomial equations of degree α\alpha:

sj(i)=k=1tMj,k(sk(i1))α+cj(i),j=1,,ts_j^{(i)} = \sum_{k=1}^{t} M_{j,k} \cdot \left(s_k^{(i-1)}\right)^{\alpha} + c_j^{(i)}, \quad j = 1, \ldots, t

The full system has:

  • Variables: t(r+1)t \cdot (r + 1) field elements (the tt state elements at each of the r+1r + 1 layers, including input and output)
  • Equations: trt \cdot r polynomial equations (one per state element per round), each of degree α\alpha
  • Constraints: the output s(r)s^{(r)} is fixed to the known hash value

A preimage attack fixes (y1,,yt)(y_1, \ldots, y_t) and solves for (x1,,xt)(x_1, \ldots, x_t). The intermediate variables s(1),,s(r1)s^{(1)}, \ldots, s^{(r-1)} are unknowns that get eliminated during Groebner basis computation.

For partial rounds (Poseidon), only one equation per round has degree α\alpha; the remaining t1t - 1 are linear (sj(i)=sj(i1)s_j^{(i)} = s_j^{(i-1)} through the identity). This produces a sparser system with fewer high-degree equations, which is precisely what makes it potentially easier to solve.

Macaulay matrices

(See Macaulay. "Some Properties of Enumeration in the Theory of Modular Systems" (Proc. London Math. Soc., 1927), Wikipedia: Macaulay matrix, and Cox, Little, O'Shea. "Ideals, Varieties, and Algorithms" (Springer, 4th ed. 2015), Chapter 9.)

A Macaulay matrix is the central data structure in modern Groebner basis algorithms. It encodes a system of polynomial equations as a matrix suitable for linear algebra.

Given polynomials f1,,fmf_1, \ldots, f_m in nn variables, the degree-DD Macaulay matrix MD\mathcal{M}_D is constructed as follows:

  1. For each polynomial fif_i of degree did_i, multiply it by every monomial of degree Ddi\leq D - d_i. This produces new polynomials of degree D\leq D.
  2. List all monomials of degree D\leq D as column headers, ordered by the chosen monomial ordering (e.g., degrevlex).
  3. Each row of the matrix is one polynomial (an original fif_i times a monomial), with entries being the coefficients of each monomial.

The matrix has:

  • Rows: i=1m(n+DdiDdi)\sum_{i=1}^{m} \binom{n + D - d_i}{D - d_i} (one row per polynomial-monomial product)
  • Columns: (n+DD)\binom{n + D}{D} (one column per monomial of degree D\leq D)

Row-reducing MD\mathcal{M}_D (Gaussian elimination) is equivalent to computing all polynomial reductions up to degree DD. When DD equals the solving degree, the row echelon form reveals the Groebner basis.

The key observation for complexity: the matrix dimensions grow as (n+DD)\binom{n + D}{D}, which is polynomial in nn for fixed DD but exponential in DD for fixed nn. This is why the solving degree DsolD_{\text{sol}} dominates the cost.

S-polynomials

(See Wikipedia: Groebner basis, Cox, Little, O'Shea. "Ideals, Varieties, and Algorithms" (Springer, 4th ed. 2015), Chapter 2, and Buchberger. "An Algorithm for Finding the Basis Elements of the Residue Class Ring of a Zero Dimensional Polynomial Ideal" (1965, English translation 2006).)

The S-polynomial (syzygy polynomial) is the fundamental operation in Groebner basis algorithms. Given two polynomials ff and gg with leading monomials LM(f)\text{LM}(f) and LM(g)\text{LM}(g), the S-polynomial is designed to cancel their leading terms.

Let L=lcm(LM(f),LM(g))L = \text{lcm}(\text{LM}(f), \text{LM}(g)). The S-polynomial is:

S(f,g)=LLT(f)fLLT(g)gS(f, g) = \frac{L}{\text{LT}(f)} \cdot f - \frac{L}{\text{LT}(g)} \cdot g

where LT(f)\text{LT}(f) is the leading term (coefficient times leading monomial). The result has strictly lower degree than LL because the leading terms cancel by construction.

Why S-polynomials matter: if all S-polynomials of pairs in a basis reduce to zero, then the basis is a Groebner basis (this is Buchberger's criterion). If an S-polynomial does not reduce to zero, its remainder is a new polynomial that must be added to the basis. This is the core loop of Buchberger's algorithm.

Solving algorithms

Buchberger's algorithm

(See Wikipedia: Buchberger's algorithm and Buchberger. "An Algorithm for Finding the Basis Elements of the Residue Class Ring of a Zero Dimensional Polynomial Ideal" (1965, English translation 2006).)

The original algorithm (1965) for computing Groebner bases. It iteratively computes S-polynomials of all pairs and reduces them against the current basis. If a reduction yields a nonzero remainder, it is added to the basis and the process repeats. The algorithm terminates when all S-polynomials reduce to zero. Conceptually simple but impractical for large systems due to intermediate expression swell: the polynomials generated during computation can have very large coefficients and many terms, even if the final basis is compact.

F4 (Faugere, 1999)

(See Faugere. "A New Efficient Algorithm for Computing Groebner Bases (F4)" (Journal of Pure and Applied Algebra, 1999).)

F4 replaces the pairwise S-polynomial reductions of Buchberger with sparse linear algebra on Macaulay matrices. In each step:

  1. Select pairs of polynomials and form their S-polynomials
  2. Construct a matrix where rows correspond to polynomial multiples and columns to monomials
  3. Row-reduce this matrix using sparse linear algebra
  4. Extract new basis elements from the reduced rows

The key insight: reducing many S-polynomials simultaneously via matrix operations is far more efficient than reducing them one by one. F4 is the workhorse algorithm used in practice (Magma, SageMath).

Example: Groebner basis of a 1-round system

The following computes the Groebner basis of a toy 1-round AO system using SageMath's default algorithm (a variant of F4). We solve a system where the S-box is x5x^5 and the linear layer is a simple matrix over F101\mathbb{F}_{101}.

Example: Macaulay matrix construction

F4 works by constructing Macaulay matrices from polynomial multiples and row-reducing them. Here we build the degree-DD Macaulay matrix for a small system to illustrate the linear algebra at the core of F4.

Example: scaling with rounds

As the number of rounds increases, the polynomial system grows and becomes harder to solve. This example builds the system for a 2-round and 3-round Poseidon-like permutation and measures the Groebner basis computation.

F5 (Faugere, 2002)

(See Faugere. "A New Efficient Algorithm for Computing Groebner Bases without Reduction to Zero (F5)" (ISSAC 2002).)

F5 improves on F4 by detecting and avoiding useless reductions before they happen. It maintains "signatures" that track the history of each polynomial, allowing it to predict when an S-polynomial will reduce to zero (and thus skip it).

The signature criterion guarantees:

  • No redundant computation: every reduction produces new information
  • The algorithm terminates with a Groebner basis without ever reducing to zero

For regular systems, F5 reaches the theoretical optimal solving degree. In practice, F5 is harder to implement efficiently than F4, and many implementations use F4 with heuristic redundancy elimination instead.

XL (eXtended Linearization)

(See Courtois, Klimov, Patarin, Shamir. "Efficient Algorithms for Solving Overdefined Systems of Multivariate Polynomial Equations" (EUROCRYPT 2000).)

XL (Courtois et al., 2000) takes a different approach. Instead of iteratively building a basis, it:

  1. Extend: multiply each equation by all monomials up to some degree DD, generating new equations
  2. Linearize: treat each monomial as an independent variable
  3. Solve: apply Gaussian elimination on the resulting linear system

If the degree DD is chosen large enough that the number of equations exceeds the number of monomials, the linearized system is overdetermined and solvable.

The required degree DD is related to the solving degree of the Groebner basis computation. XL is conceptually simpler than F4/F5 but typically requires the same or higher degree, making it no more efficient in the generic case. However, variants like XL with Wiedemann are useful for systems where the Macaulay matrix is very sparse.

Comparison

AlgorithmStrategyStrengths
F4Matrix reduction of S-polynomialsFast in practice, mature implementations
F5Signature-based redundancy avoidanceTheoretically optimal, no zero reductions
XLLinearization at fixed degreeSimple to analyze, good for sparse systems

Solving degree and complexity

The cost of computing a Groebner basis depends on the solving degree DsolD_{\text{sol}}, which is the maximum degree reached during the computation.

The complexity is:

O((n+DsolDsol)ω)O\left(\binom{n + D_{\text{sol}}}{D_{\text{sol}}}^{\omega}\right)

where nn is the number of variables and ω2.37\omega \approx 2.37 is the linear algebra exponent.

Hilbert series

(See Wikipedia: Hilbert series and Hilbert polynomial, Cox, Little, O'Shea. "Ideals, Varieties, and Algorithms" (Springer, 4th ed. 2015), Chapter 9, and Hilbert. "Ueber die Theorie der algebraischen Formen" (Mathematische Annalen, 1890).)

The Hilbert series (or Hilbert-Poincare series) of a polynomial ideal II encodes the dimensions of the graded components of the quotient ring R/IR/I. For a polynomial ring R=Fp[x1,,xn]R = \mathbb{F}_p[x_1, \ldots, x_n] and a homogeneous ideal II, the Hilbert series is the formal power series:

HSR/I(z)=d=0dimFp(R/I)dzd\text{HS}_{R/I}(z) = \sum_{d=0}^{\infty} \dim_{\mathbb{F}_p}(R/I)_d \cdot z^d

where (R/I)d(R/I)_d is the vector space of degree-dd elements in the quotient. In concrete terms, the coefficient of zdz^d counts the number of linearly independent monomials of degree dd that are not in the ideal (i.e., they are not leading monomials of any element of a Groebner basis).

For the free polynomial ring (no ideal), the Hilbert series is 1/(1z)n=d(n+d1d)zd1/(1-z)^n = \sum_d \binom{n+d-1}{d} z^d, which counts all monomials.

For a system of mm equations with degrees d1,,dmd_1, \ldots, d_m in nn variables, if the system is regular (see below), the Hilbert series has the closed form:

HS(z)=i=1m(1zdi)(1z)n\text{HS}(z) = \frac{\prod_{i=1}^{m} (1 - z^{d_i})}{(1 - z)^n}

Each factor (1zdi)(1 - z^{d_i}) in the numerator "subtracts" the monomials killed by equation ii. The denominator (1z)n(1 - z)^n accounts for the nn free variables.

Why it matters for Groebner bases: during the Groebner basis computation (F4, F5), the algorithm processes polynomials degree by degree. At each degree dd, the number of new basis elements found corresponds to the drop in the Hilbert function. When the Hilbert function reaches zero, the computation is complete. The degree at which this happens is the solving degree.

Castelnuovo-Mumford regularity

(See Wikipedia: Castelnuovo-Mumford regularity, Bayer, Stillman. "A criterion for detecting m-regularity" (Inventiones mathematicae, 1987), and Eisenbud. "Commutative Algebra: with a View Toward Algebraic Geometry" (Springer, 1995), Chapter 20.)

The Castelnuovo-Mumford regularity (or simply "regularity") of a homogeneous ideal II is an algebraic invariant that, for our purposes, equals the maximum degree reached during an optimal Groebner basis computation.

Formally, for a graded module MM over the polynomial ring, the regularity is defined via the minimal free resolution:

reg(M)=maxi{ji:βi,j(M)0}\text{reg}(M) = \max_i \{ j - i : \beta_{i,j}(M) \neq 0 \}

where βi,j\beta_{i,j} are the graded Betti numbers (they count generators in position ii and degree jj of the resolution). This definition comes from homological algebra, but its practical meaning is direct: the regularity tells you the highest degree you will encounter when computing the Groebner basis.

For a regular system of m=nm = n homogeneous polynomials of degrees d1,,dnd_1, \ldots, d_n in nn variables, the regularity is:

reg=1+i=1n(di1)\text{reg} = 1 + \sum_{i=1}^{n} (d_i - 1)

For example, 3 quadratic equations in 3 variables have regularity 1+3(21)=41 + 3 \cdot (2 - 1) = 4.

Regularity and semi-regularity

(See Bardet, Faugere, Salvy. "On the complexity of Groebner basis computation of semi-regular overdetermined algebraic equations" (INRIA RR-5049, 2003) for the semi-regularity notion, and Faugere. "A New Efficient Algorithm for Computing Groebner Bases without Reduction to Zero (F5)" (2002) for the connection to solving degree.)

For a regular system (generic dense polynomials), the solving degree equals the Castelnuovo-Mumford regularity, which can be read from the Hilbert series of the ideal.

A system is regular if adding each equation to the ideal formed by the previous ones always increases the ideal in the expected way (no "accidental" algebraic relations). Equivalently, the sequence f1,,fmf_1, \ldots, f_m is a regular sequence in the polynomial ring.

The semi-regular assumption relaxes this: the system need not be a regular sequence, but the Hilbert series should still match the prediction from the degree sequence. Under this assumption, the Hilbert series predicts the solving degree:

H(z)=i=1m(1zdi)(1z)nH(z) = \frac{\prod_{i=1}^{m} (1 - z^{d_i})}{(1 - z)^n}

where did_i are the degrees of the mm equations in nn variables. The solving degree is the index of the first non-positive coefficient.

The semi-regular assumption is widely used in cryptographic security estimates. The key question for AO hash functions is whether their polynomial systems are actually semi-regular, or whether the specific algebraic structure (sparse equations, partial rounds) causes the solving degree to be lower than predicted.

Application to AO hash functions

The key question: does the specific structure of AO permutations (sparse equations, partial rounds, specific S-box) make the system easier to solve than a generic system of the same degree?

Dedicated Groebner basis attacks

Several papers have shown that the polynomial systems arising from AO hash functions are not semi-regular and can be solved faster:

References