Groebner Basis Attacks

Overview

A Groebner basis is a particular generating set of a polynomial ideal that enables systematic solving of systems of polynomial equations. For AO hash functions, the permutation can be written as a system of polynomial equations over $\mathbb{F}_p$ , and computing a Groebner basis of this system is equivalent to solving the permutation.

The attack model

Consider a permutation $\pi: \mathbb{F}_p^t \to \mathbb{F}_p^t$ built from $r$ rounds. Each round $i$ applies an S-box layer $S$ , a linear layer $M$ , and adds round constants $c^{(i)}$ . We introduce intermediate state variables $s^{(0)}, s^{(1)}, \ldots, s^{(r)}$ where $s^{(0)} = (x_1, \ldots, x_t)$ is the input and $s^{(r)} = (y_1, \ldots, y_t)$ is the output.

The round function produces the system:

s^{(i)} = M \cdot S(s^{(i-1)}) + c^{(i)}, \quad i = 1, \ldots, r

For a power map S-box $S(x) = x^{\alpha}$ , expanding each round gives $t$ polynomial equations of degree $\alpha$ :

s_j^{(i)} = \sum_{k=1}^{t} M_{j,k} \cdot \left(s_k^{(i-1)}\right)^{\alpha} + c_j^{(i)}, \quad j = 1, \ldots, t

The full system has:

Variables: $t \cdot (r + 1)$ field elements (the $t$ state elements at each of the $r + 1$ layers, including input and output)
Equations: $t \cdot r$ polynomial equations (one per state element per round), each of degree $\alpha$
Constraints: the output $s^{(r)}$ is fixed to the known hash value

A preimage attack fixes $(y_1, \ldots, y_t)$ and solves for $(x_1, \ldots, x_t)$ . The intermediate variables $s^{(1)}, \ldots, s^{(r-1)}$ are unknowns that get eliminated during Groebner basis computation.

For partial rounds (Poseidon), only one equation per round has degree $\alpha$ ; the remaining $t - 1$ are linear ( $s_j^{(i)} = s_j^{(i-1)}$ through the identity). This produces a sparser system with fewer high-degree equations, which is precisely what makes it potentially easier to solve.

Macaulay matrices

(See Macaulay. "Some Properties of Enumeration in the Theory of Modular Systems" (Proc. London Math. Soc., 1927), Wikipedia: Macaulay matrix, and Cox, Little, O'Shea. "Ideals, Varieties, and Algorithms" (Springer, 4th ed. 2015), Chapter 9.)

A Macaulay matrix is the central data structure in modern Groebner basis algorithms. It encodes a system of polynomial equations as a matrix suitable for linear algebra.

Given polynomials $f_1, \ldots, f_m$ in $n$ variables, the degree- $D$ Macaulay matrix $\mathcal{M}_D$ is constructed as follows:

For each polynomial $f_i$ of degree $d_i$ , multiply it by every monomial of degree $\leq D - d_i$ . This produces new polynomials of degree $\leq D$ .
List all monomials of degree $\leq D$ as column headers, ordered by the chosen monomial ordering (e.g., degrevlex).
Each row of the matrix is one polynomial (an original $f_i$ times a monomial), with entries being the coefficients of each monomial.

The matrix has:

Rows: $\sum_{i=1}^{m} \binom{n + D - d_i}{D - d_i}$ (one row per polynomial-monomial product)
Columns: $\binom{n + D}{D}$ (one column per monomial of degree $\leq D$ )

Row-reducing $\mathcal{M}_D$ (Gaussian elimination) is equivalent to computing all polynomial reductions up to degree $D$ . When $D$ equals the solving degree, the row echelon form reveals the Groebner basis.

The key observation for complexity: the matrix dimensions grow as $\binom{n + D}{D}$ , which is polynomial in $n$ for fixed $D$ but exponential in $D$ for fixed $n$ . This is why the solving degree $D_{\text{sol}}$ dominates the cost.

S-polynomials

(See Wikipedia: Groebner basis, Cox, Little, O'Shea. "Ideals, Varieties, and Algorithms" (Springer, 4th ed. 2015), Chapter 2, and Buchberger. "An Algorithm for Finding the Basis Elements of the Residue Class Ring of a Zero Dimensional Polynomial Ideal" (1965, English translation 2006).)

The S-polynomial (syzygy polynomial) is the fundamental operation in Groebner basis algorithms. Given two polynomials $f$ and $g$ with leading monomials $\text{LM}(f)$ and $\text{LM}(g)$ , the S-polynomial is designed to cancel their leading terms.

Let $L = \text{lcm}(\text{LM}(f), \text{LM}(g))$ . The S-polynomial is:

S(f, g) = \frac{L}{\text{LT}(f)} \cdot f - \frac{L}{\text{LT}(g)} \cdot g

where $\text{LT}(f)$ is the leading term (coefficient times leading monomial). The result has strictly lower degree than $L$ because the leading terms cancel by construction.

Why S-polynomials matter: if all S-polynomials of pairs in a basis reduce to zero, then the basis is a Groebner basis (this is Buchberger's criterion). If an S-polynomial does not reduce to zero, its remainder is a new polynomial that must be added to the basis. This is the core loop of Buchberger's algorithm.

Solving algorithms

Buchberger's algorithm

(See Wikipedia: Buchberger's algorithm and Buchberger. "An Algorithm for Finding the Basis Elements of the Residue Class Ring of a Zero Dimensional Polynomial Ideal" (1965, English translation 2006).)

The original algorithm (1965) for computing Groebner bases. It iteratively computes S-polynomials of all pairs and reduces them against the current basis. If a reduction yields a nonzero remainder, it is added to the basis and the process repeats. The algorithm terminates when all S-polynomials reduce to zero. Conceptually simple but impractical for large systems due to intermediate expression swell: the polynomials generated during computation can have very large coefficients and many terms, even if the final basis is compact.

F4 (Faugere, 1999)

(See Faugere. "A New Efficient Algorithm for Computing Groebner Bases (F4)" (Journal of Pure and Applied Algebra, 1999).)

F4 replaces the pairwise S-polynomial reductions of Buchberger with sparse linear algebra on Macaulay matrices. In each step:

Select pairs of polynomials and form their S-polynomials
Construct a matrix where rows correspond to polynomial multiples and columns to monomials
Row-reduce this matrix using sparse linear algebra
Extract new basis elements from the reduced rows

The key insight: reducing many S-polynomials simultaneously via matrix operations is far more efficient than reducing them one by one. F4 is the workhorse algorithm used in practice (Magma, SageMath).

Example: Groebner basis of a 1-round system

The following computes the Groebner basis of a toy 1-round AO system using SageMath's default algorithm (a variant of F4). We solve a system where the S-box is $x^5$ and the linear layer is a simple matrix over $\mathbb{F}_{101}$ .

Example: Macaulay matrix construction

F4 works by constructing Macaulay matrices from polynomial multiples and row-reducing them. Here we build the degree- $D$ Macaulay matrix for a small system to illustrate the linear algebra at the core of F4.

Example: scaling with rounds

As the number of rounds increases, the polynomial system grows and becomes harder to solve. This example builds the system for a 2-round and 3-round Poseidon-like permutation and measures the Groebner basis computation.

F5 (Faugere, 2002)

(See Faugere. "A New Efficient Algorithm for Computing Groebner Bases without Reduction to Zero (F5)" (ISSAC 2002).)

F5 improves on F4 by detecting and avoiding useless reductions before they happen. It maintains "signatures" that track the history of each polynomial, allowing it to predict when an S-polynomial will reduce to zero (and thus skip it).

The signature criterion guarantees:

No redundant computation: every reduction produces new information
The algorithm terminates with a Groebner basis without ever reducing to zero

For regular systems, F5 reaches the theoretical optimal solving degree. In practice, F5 is harder to implement efficiently than F4, and many implementations use F4 with heuristic redundancy elimination instead.

XL (eXtended Linearization)

(See Courtois, Klimov, Patarin, Shamir. "Efficient Algorithms for Solving Overdefined Systems of Multivariate Polynomial Equations" (EUROCRYPT 2000).)

XL (Courtois et al., 2000) takes a different approach. Instead of iteratively building a basis, it:

Extend: multiply each equation by all monomials up to some degree $D$ , generating new equations
Linearize: treat each monomial as an independent variable
Solve: apply Gaussian elimination on the resulting linear system

If the degree $D$ is chosen large enough that the number of equations exceeds the number of monomials, the linearized system is overdetermined and solvable.

The required degree $D$ is related to the solving degree of the Groebner basis computation. XL is conceptually simpler than F4/F5 but typically requires the same or higher degree, making it no more efficient in the generic case. However, variants like XL with Wiedemann are useful for systems where the Macaulay matrix is very sparse.

Comparison

Algorithm	Strategy	Strengths
F4	Matrix reduction of S-polynomials	Fast in practice, mature implementations
F5	Signature-based redundancy avoidance	Theoretically optimal, no zero reductions
XL	Linearization at fixed degree	Simple to analyze, good for sparse systems

Solving degree and complexity

The cost of computing a Groebner basis depends on the solving degree $D_{\text{sol}}$ , which is the maximum degree reached during the computation.

The complexity is:

O\left(\binom{n + D_{\text{sol}}}{D_{\text{sol}}}^{\omega}\right)

where $n$ is the number of variables and $\omega \approx 2.37$ is the linear algebra exponent.

Hilbert series

(See Wikipedia: Hilbert series and Hilbert polynomial, Cox, Little, O'Shea. "Ideals, Varieties, and Algorithms" (Springer, 4th ed. 2015), Chapter 9, and Hilbert. "Ueber die Theorie der algebraischen Formen" (Mathematische Annalen, 1890).)

The Hilbert series (or Hilbert-Poincare series) of a polynomial ideal $I$ encodes the dimensions of the graded components of the quotient ring $R/I$ . For a polynomial ring $R = \mathbb{F}_p[x_1, \ldots, x_n]$ and a homogeneous ideal $I$ , the Hilbert series is the formal power series:

\text{HS}_{R/I}(z) = \sum_{d=0}^{\infty} \dim_{\mathbb{F}_p}(R/I)_d \cdot z^d

where $(R/I)_d$ is the vector space of degree- $d$ elements in the quotient. In concrete terms, the coefficient of $z^d$ counts the number of linearly independent monomials of degree $d$ that are not in the ideal (i.e., they are not leading monomials of any element of a Groebner basis).

For the free polynomial ring (no ideal), the Hilbert series is $1/(1-z)^n = \sum_d \binom{n+d-1}{d} z^d$ , which counts all monomials.

For a system of $m$ equations with degrees $d_1, \ldots, d_m$ in $n$ variables, if the system is regular (see below), the Hilbert series has the closed form:

\text{HS}(z) = \frac{\prod_{i=1}^{m} (1 - z^{d_i})}{(1 - z)^n}

Each factor $(1 - z^{d_i})$ in the numerator "subtracts" the monomials killed by equation $i$ . The denominator $(1 - z)^n$ accounts for the $n$ free variables.

Why it matters for Groebner bases: during the Groebner basis computation (F4, F5), the algorithm processes polynomials degree by degree. At each degree $d$ , the number of new basis elements found corresponds to the drop in the Hilbert function. When the Hilbert function reaches zero, the computation is complete. The degree at which this happens is the solving degree.

Castelnuovo-Mumford regularity

(See Wikipedia: Castelnuovo-Mumford regularity, Bayer, Stillman. "A criterion for detecting m-regularity" (Inventiones mathematicae, 1987), and Eisenbud. "Commutative Algebra: with a View Toward Algebraic Geometry" (Springer, 1995), Chapter 20.)

The Castelnuovo-Mumford regularity (or simply "regularity") of a homogeneous ideal $I$ is an algebraic invariant that, for our purposes, equals the maximum degree reached during an optimal Groebner basis computation.

Formally, for a graded module $M$ over the polynomial ring, the regularity is defined via the minimal free resolution:

\text{reg}(M) = \max_i \{ j - i : \beta_{i,j}(M) \neq 0 \}

where $\beta_{i,j}$ are the graded Betti numbers (they count generators in position $i$ and degree $j$ of the resolution). This definition comes from homological algebra, but its practical meaning is direct: the regularity tells you the highest degree you will encounter when computing the Groebner basis.

For a regular system of $m = n$ homogeneous polynomials of degrees $d_1, \ldots, d_n$ in $n$ variables, the regularity is:

\text{reg} = 1 + \sum_{i=1}^{n} (d_i - 1)

For example, 3 quadratic equations in 3 variables have regularity $1 + 3 \cdot (2 - 1) = 4$ .

Regularity and semi-regularity

(See Bardet, Faugere, Salvy. "On the complexity of Groebner basis computation of semi-regular overdetermined algebraic equations" (INRIA RR-5049, 2003) for the semi-regularity notion, and Faugere. "A New Efficient Algorithm for Computing Groebner Bases without Reduction to Zero (F5)" (2002) for the connection to solving degree.)

For a regular system (generic dense polynomials), the solving degree equals the Castelnuovo-Mumford regularity, which can be read from the Hilbert series of the ideal.

A system is regular if adding each equation to the ideal formed by the previous ones always increases the ideal in the expected way (no "accidental" algebraic relations). Equivalently, the sequence $f_1, \ldots, f_m$ is a regular sequence in the polynomial ring.

The semi-regular assumption relaxes this: the system need not be a regular sequence, but the Hilbert series should still match the prediction from the degree sequence. Under this assumption, the Hilbert series predicts the solving degree:

H(z) = \frac{\prod_{i=1}^{m} (1 - z^{d_i})}{(1 - z)^n}

where $d_i$ are the degrees of the $m$ equations in $n$ variables. The solving degree is the index of the first non-positive coefficient.

The semi-regular assumption is widely used in cryptographic security estimates. The key question for AO hash functions is whether their polynomial systems are actually semi-regular, or whether the specific algebraic structure (sparse equations, partial rounds) causes the solving degree to be lower than predicted.

Application to AO hash functions

The key question: does the specific structure of AO permutations (sparse equations, partial rounds, specific S-box) make the system easier to solve than a generic system of the same degree?

Dedicated Groebner basis attacks

Several papers have shown that the polynomial systems arising from AO hash functions are not semi-regular and can be solved faster:

Bariant et al. (2022): improved attacks on Poseidon exploiting the partial round structure
Keller and Rosemarin (2020): practical Groebner basis attacks on reduced-round Poseidon

References

Buchberger. "An Algorithm for Finding the Basis Elements of the Residue Class Ring of a Zero Dimensional Polynomial Ideal" (1965, English translation 2006)
Faugere. "A New Efficient Algorithm for Computing Groebner Bases (F4)" (1999)
Faugere. "A New Efficient Algorithm for Computing Groebner Bases without Reduction to Zero (F5)" (2002)
Courtois, Klimov, Patarin, Shamir. "Efficient Algorithms for Solving Overdefined Systems of Multivariate Polynomial Equations" (EUROCRYPT 2000)
Bariant, Peyrin. "Algebraic Attacks against Some Arithmetization-Oriented Primitives" (2022)
Keller, Rosemarin. "STARK Friendly Hash Survey" (2020)

Overview​

The attack model​

Macaulay matrices​

S-polynomials​

Solving algorithms​

Buchberger's algorithm​

F4 (Faugere, 1999)​

Example: Groebner basis of a 1-round system​

Example: Macaulay matrix construction​

Example: scaling with rounds​

F5 (Faugere, 2002)​

XL (eXtended Linearization)​

Comparison​

Solving degree and complexity​

Hilbert series​

Castelnuovo-Mumford regularity​

Regularity and semi-regularity​

Application to AO hash functions​

Dedicated Groebner basis attacks​

References​