Towards a new SymPy: part 2 - Polynomials

This is the second part of a series of posts on big changes for SymPy with particular focus on speed. The other posts in this series can be found at Towards a new SymPy. The first post outlined what I consider to be the three core subsystems of SymPy:

  1. Robust numerical evaluation (sympy.core.evalf)

  2. The symbolic subsystem (sympy.core.basic)

  3. The computational algebra subsystem (sympy.polys)

In relation to the computational algebra subsystem, there are three ways in which SymPy can be made faster:

  1. Improve/extend some of the algorithms and features.

  2. Make more use of the computational algebra subsystem in the rest of SymPy.

  3. Make use of FLINT which is a fast C library for computational algebra.

This post will describe SymPy’s computational algebra system for polynomials and how each of these steps could be applied to speed up SymPy. I will talk a bit about FLINT and python-flint but I will also write a separate post about those because I know that some people will be more interested in using python-flint than SymPy itself and I hope to encourage them to contribute to python-flint.

As before I am writing this with the intention that it should be to some extent understandable to non-SymPy developers. The primary intended audience though is other SymPy developers because I want them to understand the significance of the work done so far and the changes that I think are needed for the future.

TLDR

The short summary of this post is that:

  • SymPy’s polynomial code is good and is well structured although a minor redesign is needed to make use of sparse polynomials.

  • There are some algorithms that could be improved like polynomial gcd but most are pretty good although limited in speed by being pure Python.

  • It is now possible to use python-flint to bring state of the art speeds to SymPy’s polynomial operations.

  • Some work is needed to enable big speedups from python-flint but from here it is not difficult to do that.

The rest of this post is quite long, mainly because there are a lot of details about the computational algebra subsystem that need to be understood in order to follow what the problems are and what can or should be done to improve things.

Integers and rationals

First, I will give a brief overview of the computational algebra subsystem. In SymPy the computational algebra subsystem is located in the sympy.polys module. The lowest level of this system is the “domains”, which are objects that represent rings and fields. Briefly, a ring is a set of “elements” that can be added and multiplied and a field is a ring in which it is also possible to divide any two elements (except by 0). The most well known examples of each are the ring of integers \(\mathbf{Z}\) and the field of rational numbers \(\mathbf{Q}\).

Engineers and scientists (rather than pure mathematicians) might wonder why we need to distinguish carefully between e.g. the “ring” of integers or the “field” of rationals and worry about whether or not division is “possible”. In some sense we can always divide two integers (except by 0) and just get a rational number. Certainly in SymPy’s symbolic subsystem there is no difficulty in dividing two integers although we do need to be careful to divide SymPy’s Integer objects and not Python’s int objects (whose division operator gives a float):

>>> from sympy import Integer
>>> Integer(1)/Integer(2)  # dividing Integer gives Rational
1/2
>>> 1/2  # dividing ints gives floats
0.5

The basic reasons that we need to distinguish carefully between rings and fields and between \(\mathbf{Z}\) and \(\mathbf{Q}\) specifically are not particularly deep or mathematical but rather purely computational. Efficient computing means having well defined types, representations and operations. On a very basic level the representation of an integer is a sequence of bits and the representation of a rational number is a pair of integers. Arithmetic operations manipulate these representations and so an efficient arithmetic heavy algorithm needs to use a well defined representation for its elementary types. If we choose our representation to be \(\mathbf{Z}\) then we cannot represent \(1/2\) without converting to a different representation and so (inexact) division is not allowed. Of course we can always convert any element of \(\mathbf{Z}\) to an element of \(\mathbf{Q}\) but this conversion is not a free operation and the representation as an element of \(\mathbf{Q}\) would be less efficient if actually all of our elements are integers and if we do not need to divide them.

The other reason that we need to distinguish between e.g. \(\mathbf{Z}\) and \(\mathbf{Q}\) in particular or more generally between different “domains” is that we usually want to use different algorithms for different domains. For example, the most efficient algorithm for inverting a matrix or solving a linear system of equations over \(\mathbf{Z}\) is different from the most efficient algorithm for doing the same over \(\mathbf{Q}\). Often the algorithms will convert to different domains for example to invert a matrix of integers we might convert to \(\mathbf{Q}\) and use a division-based algorithm. Alternatively to invert a matrix of rational numbers we might factor out denominators and convert to a matrix over \(\mathbf{Z}\) before using a fraction-free algorithm for inverting a matrix over \(\mathbf{Z}\).

In SymPy’s domain system, there are domain objects like ZZ and QQ that represent \(\mathbf{Z}\) and \(\mathbf{Q}\) respectively. These domain objects are used to construct elements of the domain and to convert between domains e.g.:

>>> from sympy import ZZ, QQ
>>> QQ(2,3)  # construct an element of QQ
MPQ(2,3)
>>> q = QQ(2)  # construct an integer-valued element of QQ
>>> q
MPQ(2,1)
>>> z = ZZ.convert_from(q, QQ)  # convert from QQ to ZZ
>>> z
2
>>> type(z)
<class 'int'>

In this example we construct the rational number \(2/3\). Its representation is the MPQ type which is a pure Python implementation of rational numbers based on Python’s int type (analogous to the Python standard library fractions.Fraction type). The representation for ZZ uses Python’s int type.

GMP and gmpy2

I mentioned the two domains ZZ and QQ above and I will describe the other domains below. Since these two domains are absolutely foundational it is worth discussing their implementation a bit first because it is important to understand the role that gmpy2 and the underlying GMP C library play in this. If gmpy2 is installed then SymPy will use its mpz and mpq types for ZZ and QQ respectively. These are based on the underlying GMP C library:

>>> from sympy import ZZ, QQ
>>> ZZ(2)  # pip install gmpy2 to see this
mpz(2)
>>> QQ(2,3)
mpq(2,3)

This plot compares timings for multiplication of integers with and without gmpy2 for different bit sizes:

[Figure: integer multiplication timings, CPython int vs gmpy2 mpz (../_images/time_ZZ.svg)]

For small integers mpz is about the same speed as Python’s int type. Potentially mpz is actually a little slower for very small integers just because CPython’s int is heavily micro-optimised for small integers. In Python all integers use a single arbitrary precision type int which is unusual among programming languages. Since most integers are small, CPython tries to optimise for this case. Possibly also CPython can do some optimisations with int that it cannot do with “third party types” like gmpy2’s mpz.

For larger integers (bigger than machine precision) both mpz and int will represent an integer using multiple “digits” although the digits are referred to as “limbs” in GMP. CPython uses 30-bit limbs because its implementation of integer arithmetic is designed to be portable “generic C” that can be compiled by any compiler. By contrast GMP’s implementation is designed to be as fast as possible at all costs and so it uses handcrafted assembly code for many different CPU architectures. In CPU specific assembly it is possible to access instructions like mulx or to read the CPU’s carry flag after an addition etc. These things are essential for being able to use 64-bit limbs but are not available in generic C. GMP’s mpz type therefore uses 64-bit limbs (as do all widely used fast big integer implementations).

As the bit size increases say to 1000 bits then gmpy2’s mpz becomes a lot faster than CPython’s int. At this bit size the difference in speed to multiply two integers is something like 10x and is primarily due to the fact that mpz uses 64-bit limbs whereas int uses 30-bit limbs. This smaller limb-size essentially means a 4x increase in the number of CPU-level operations needed for a big integer multiplication.
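These comparisons are easy to reproduce. Here is a minimal sketch (my own script, not the one used for the plots above) using timeit to compare int and mpz multiplication at a few bit sizes:

import timeit

import gmpy2  # pip install gmpy2

for bits in [64, 1000, 100000]:
    a, b = 2**bits - 1, 2**bits - 3
    ma, mb = gmpy2.mpz(a), gmpy2.mpz(b)
    t_int = timeit.timeit(lambda: a*b, number=1000)
    t_mpz = timeit.timeit(lambda: ma*mb, number=1000)
    print(bits, t_int/t_mpz)  # the ratio grows with the bit size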

For very large integers gmpy2’s mpz is much faster than int. This is because GMP has a whole hierarchy of algorithms for different bit sizes. CPython’s int type tops out at the Karatsuba algorithm which is also used by GMP for intermediate bit-sizes but GMP has more complex algorithms that are used for larger bit sizes. At these large bit sizes the difference in speed between mpz and int can be enormous. Here we focus on multiplication which is probably more favourable to int than some other operations would be. In fact the slowness of some algorithms used by CPython’s int type was even recently considered to be a security vulnerability to the extent that certain operations were disabled:

https://discuss.python.org/t/int-str-conversions-broken-in-latest-python-bugfix-releases/18889

That did lead to some work on improving CPython’s int algorithms but the disabled operations remain disabled even though the algorithms have been improved to some extent:

>>> import math
>>> x = math.factorial(1559)
>>> print(x)
...
ValueError: Exceeds the limit (4300 digits) for integer string conversion;
use sys.set_int_max_str_digits() to increase the limit
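For completeness, the limit only applies to int/str conversion and can be lifted at runtime. This is standard CPython behaviour on 3.11+, nothing SymPy-specific, continuing from the example above:

>>> import sys
>>> sys.set_int_max_str_digits(0)  # 0 means no limit
>>> len(str(x)) > 4300
True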

This plot shows timings for multiplication of rational numbers with and without gmpy2 for different bit sizes:

[Figure: rational number multiplication timings, SymPy MPQ vs gmpy2 mpq (../_images/time_QQ.svg)]

For small rational numbers gmpy2’s mpq type is about 10x faster than SymPy’s pure Python MPQ type when both are used from Python code for multiplication as shown here. The difference can be up to about 20x for general arithmetic-heavy algorithms. This difference is primarily due to gmpy2 implementing this in C and SymPy’s PythonMPQ being implemented in pure Python. For intermediate bit sizes gmpy2’s mpq is only about 2-3x faster because the bulk of the time is spent in gcd and CPython’s math.gcd function is implemented in C (the Python overhead is less significant at larger bit sizes). The difference in timings here most likely reflects GMP being more micro-optimised with CPU-specific assembly, 64-bit limbs etc. At larger bit sizes gmpy2’s mpq becomes asymptotically faster and this is again due to asymptotically faster integer multiplication: every integer operation including gcd will ultimately reduce to multiplication at large sizes.

What this all means is that for some operations SymPy is a lot faster when gmpy2 is installed. Some people reading this might think that it seems absurd to worry about the performance of megabyte sized integers but many symbolic algorithms will generate much larger integers than might be expected. Also when gmpy2 is installed it will be used by mpmath and so it speeds up SymPy’s numeric subsystem as well as the computational algebra subsystem.

The symbolic subsystem is the only one of the three core subsystems of SymPy that does not use gmpy2. I did recently look at changing SymPy to have the symbolic Integer and Rational types use mpz and mpq when gmpy2 is installed but hit a stumbling block that the symbolic subsystem allows “unevaluated rationals” whose gcd is not cancelled:

>>> from sympy import Rational
>>> Rational(3, 6, 1)
3/6

That is apparently documented behaviour. It is not really a useful feature though because the symbolic subsystem could also represent the same thing using Mul and Pow. Having the symbolic Rational type use gmpy2’s mpq for its internal representation would break these “unevaluated rationals” (although arguably they should be removed anyway).

One of the nice things about the computational algebra subsystem is that it is possible to swap out the implementation of the domain objects like this. So on the one hand SymPy and its only hard dependency mpmath can be used as entirely pure Python code without gmpy2. This is useful for many people who do not need the performance of gmpy2 and do not want to install it or cannot install it for example if they use a different Python implementation like PyPy. On the other hand if gmpy2 is installed then SymPy will use it and will be a lot faster for some operations but with no other observable change in behaviour.

(Contrast this last point with my previous comments about the difficulty of SymPy using SymEngine to speed up the symbolic subsystem in What about SymEngine?)
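Incidentally, you can check which implementation SymPy has picked up via the internal GROUND_TYPES flag (the same flag appears in a diff later in this post). On my machine with gmpy2 installed this shows:

>>> from sympy.external.gmpy import GROUND_TYPES
>>> GROUND_TYPES  # 'python' if gmpy2 is not installed
'gmpy'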

The domain system

I have talked a lot about the two domains ZZ and QQ but there are many more domains in SymPy. You can read more about them here:

https://docs.sympy.org/latest/modules/polys/domainsintro.html

People who are familiar with computational algebra will recognise these domains:

  • GF(n): integers mod n (the name GF is misleading)

  • ZZ: the integers

  • QQ: the rational numbers

  • ZZ_I: Gaussian integers

  • QQ_I: Gaussian rationals

  • QQ(a): algebraic number field generated by a

  • RR: the real numbers (floats with fixed precision provided by mpmath)

  • CC: the complex numbers (complex floats with fixed precision)

  • K[x]: polynomials in e.g. x with coefficients in another domain K

  • K[x,y]: multivariate polynomials in x and y.

  • K(x,y): rational functions (ratios of polynomials) in x and y.

  • EX: The expression domain (basically the symbolic subsystem)

  • EXRAW: The raw expression domain.

There are more domains but these are the most important ones. Perhaps the easiest way to see what the domains are for is by using the construct_domain function. This function is used internally by SymPy to choose a domain that could represent some expression from the symbolic subsystem in the domain system:

>>> from sympy import *
>>> x, y = symbols('x, y')
>>> construct_domain([1, 2])
(ZZ, [mpz(1), mpz(2)])

Here we asked for a domain that could represent both 1 and 2. What was returned was the domain ZZ (meaning the integers) and a list of two elements representing 1 and 2 in that domain (as gmpy2 mpz objects in this case). We can try more examples:

>>> construct_domain([x**2, 1])
(ZZ[x], [x**2, 1])
>>> construct_domain([x**2, y])
(ZZ[x,y], [x**2, y])
>>> construct_domain([x**2, y/x])
(ZZ(x,y), [x**2, y/x])
>>> construct_domain([sin(x), y])
(ZZ[y,sin(x)], [(sin(x)), y])
>>> construct_domain([x, 2.0])
(RR[x], [x, 2.0])
>>> construct_domain([x/2, 1])
(QQ[x], [1/2*x, 1])

In each case construct_domain tries to find the simplest domain that can represent all the expressions.

Sparse and dense polynomials

Importantly the polynomial domains are always implemented as “sparse” polynomials. This means that only nonzero terms are stored. This example contrasts the sparse and dense representations of polynomials:

>>> from sympy import QQ, symbols
>>> x, y = symbols('x, y')
>>> e = x**10 + y
>>> p_sparse = QQ[x,y].convert(e)
>>> p_dense = e.as_poly()

This is how the sparse PolyElement and the dense Poly usually look:

>>> p_sparse
x**10 + y
>>> p_dense
Poly(x**10 + y, x, y, domain='ZZ')

This is what their internal representations look like:

>>> dict(p_sparse) # internal sparse representation
{(0, 1): mpq(1,1), (10, 0): mpq(1,1)}
>>> e.as_poly().rep.rep  # internal dense representation
[[mpz(1)], [], [], [], [], [], [], [], [], [], [mpz(1), mpz(0)]]

Here the sparse representation is a dictionary mapping exponent tuples to coefficients. The dense representation is a list of lists of coefficients. The empty lists in the dense representation represent zero terms. The dense representation is described as “dense” because it needs to store zero terms explicitly.
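To make the difference concrete, here is a rough sketch (illustrative only, not SymPy’s actual code) of how multiplication works on the sparse dict representation. Only the nonzero terms are ever visited, regardless of how spread out the exponents are:

def sparse_mul(p1, p2):
    # p1, p2 map exponent tuples to coefficients; zeros are never stored
    result = {}
    for exp1, c1 in p1.items():
        for exp2, c2 in p2.items():
            exp = tuple(e1 + e2 for e1, e2 in zip(exp1, exp2))
            coeff = result.get(exp, 0) + c1*c2
            if coeff:
                result[exp] = coeff
            else:
                result.pop(exp, None)  # drop terms that cancel to zero
    return result

# (x**10 + y)*(x**10 - y) == x**20 - y**2
p = {(10, 0): 1, (0, 1): 1}
q = {(10, 0): 1, (0, 1): -1}
print(sparse_mul(p, q))  # {(20, 0): 1, (0, 2): -1}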

Integers mod n

Some domains that people might expect to find in the domain system are missing, like finite fields of non-prime order e.g. GF(2**3). Essentially that is because most SymPy users are not interested in such things and the rest of the codebase does not need them. Some people have expressed interest in these and contributions are certainly welcome but I guess it has not happened because it is not considered high priority and most users who are interested in such things are more likely to use something like Sage rather than SymPy.

Probably most SymPy users are not interested in GF(n) (integers mod n) either but that is there because it is needed for the algorithms in the other domains. Let me give an example to show how all of this is used:

>>> from sympy import symbols, factor
>>> x, y = symbols('x, y')
>>> e = x**4 - y**4/16
>>> e
x**4 - y**4/16
>>> factor(e)
(2*x - y)*(2*x + y)*(4*x**2 + y**2)/16

So how does this work? First factor converts the expression to a polynomial with coefficients in some domain:

>>> p = e.as_poly()
>>> p
Poly(x**4 - 1/16*y**4, x, y, domain='QQ')
>>> p.domain
QQ

Here the Poly identifies that we have two variables x and y and that the coefficients are in the domain QQ (the rational numbers). We are now ready to call the factorisation algorithm. The factorisation algorithm for polynomials with coefficients in QQ will first factor out the denominator 16 giving a polynomial 16*x**4 - y**4 with coefficients in ZZ. We now want to factorise this but then the algorithm for factorising polynomials over ZZ will convert the problem to factorising polynomials over GF(p) for some prime p. Then we compute the factorisation over GF(p) and convert the result back to a factorisation over ZZ and so on. So the steps in the computation are (with EX representing the ordinary symbolic expressions):

EX -> QQ[x,y] -> ZZ[x,y] -> GF(p)[x,y] -> factored -> ... -> EX

The ... here obscures a bunch of complexity that I don’t want to get into. For those familiar with these things the Zassenhaus algorithm is used by default but the main weakness is that LLL-based techniques are not implemented (so worst case is not polynomial time). For everyone else the algorithms used here are usually good but for certain inputs factor can be very slow when a different algorithm would be a lot faster.

The main point of this factorisation example is just to show the significance of all the different domains including GF(p). Most SymPy users do not care about GF(p) but it is crucial for things that they do care about because it is used by e.g. factor which is in turn used by solve and simplify and so on.
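Incidentally, factorisation over GF(p) is also exposed directly to users through the modulus option of factor:

>>> from sympy import symbols, factor
>>> x = symbols('x')
>>> factor(x**2 + 1)             # irreducible over the rationals
x**2 + 1
>>> factor(x**2 + 1, modulus=2)  # but not over GF(2)
(x + 1)**2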

Algebraic number fields

In this example we get EX which is what construct_domain returns when it gives up:

>>> construct_domain([sqrt(2), 1])
(EX, [EX(sqrt(2)), EX(1)])

There is a domain for this but it will not be used by default (we have to pass extension=True):

>>> construct_domain([sqrt(2), 1], extension=True)
(QQ<sqrt(2)>, [ANP([mpq(1,1), mpq(0,1)], [mpq(1,1), mpq(0,1), mpq(-2,1)], QQ), ANP([mpq(1,1)], [mpq(1,1), mpq(0,1), mpq(-2,1)], QQ)])

It might not look nice but that is the domain for the algebraic number field \(\mathbb{Q}(\sqrt{2})\). The ANP stands for “algebraic number polynomial”. The representation of algebraic number fields always uses a primitive element. This representation is efficient for arithmetic but computing the primitive element can be expensive which means that it can be slow to construct the domain. Timings are (on a slow computer):

In [10]: %time ok = construct_domain([sqrt(2)], extension=True)
CPU times: user 26 ms, sys: 0 ns, total: 26 ms
Wall time: 25.2 ms

In [11]: %time ok = construct_domain([sqrt(2), sqrt(3)], extension=True)
CPU times: user 47.4 ms, sys: 0 ns, total: 47.4 ms
Wall time: 46.3 ms

In [12]: %time ok = construct_domain([sqrt(2), sqrt(3), sqrt(5)], extension=True)
CPU times: user 55.2 ms, sys: 0 ns, total: 55.2 ms
Wall time: 53.9 ms

In [13]: %time ok = construct_domain([sqrt(2), sqrt(3), sqrt(5), sqrt(7)], extension=True)
CPU times: user 120 ms, sys: 0 ns, total: 120 ms
Wall time: 118 ms

In [14]: %time ok = construct_domain([sqrt(2), sqrt(3), sqrt(5), sqrt(7), sqrt(11)], extension=True)
CPU times: user 688 ms, sys: 0 ns, total: 688 ms
Wall time: 686 ms

In [15]: %time ok = construct_domain([sqrt(2), sqrt(3), sqrt(5), sqrt(7), sqrt(11), sqrt(13)], extens
    ...: ion=True)
^C^C
KeyboardInterrupt

I don’t know how long that last command would take but I interrupted it after about 5 minutes. I have not investigated why it is so slow but I expect that it can be made faster. I think what it really shows though is that it is a bad idea to even try to compute the primitive element and that it is better to represent algebraic number fields differently in the case of having many algebraic generators.
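Part of the reason this gets expensive is that the degree of the primitive element’s minimal polynomial doubles with every additional square root, so the polynomials that need to be computed and factorised grow quickly:

>>> from sympy import sqrt, symbols, minimal_polynomial, degree
>>> x = symbols('x')
>>> minimal_polynomial(sqrt(2) + sqrt(3), x)
x**4 - 10*x**2 + 1
>>> degree(minimal_polynomial(sqrt(2) + sqrt(3) + sqrt(5), x), x)
8
>>> degree(minimal_polynomial(sqrt(2) + sqrt(3) + sqrt(5) + sqrt(7), x), x)
16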

EX and EXRAW domains

There are more domains than listed above but those are the ones that would usually be created automatically within SymPy when the computational algebra subsystem is used implicitly. There will always be some situations where the symbolic subsystem has some expressions that the computational algebra subsystem cannot represent using a standard ring/field from the list above. In those situations it will use the EX or EXRAW domains. In these domains the elements are actually just symbolic expressions from the symbolic subsystem. This provides an escape hatch that allows code that expects to work with the domains to fall back on using the symbolic subsystem when a more structured domain cannot be found.

The difference between the EX and EXRAW domains is that the elements of the EX domain are always simplified using the high-level cancel function, so in this domain c = b + a is equivalent to writing c = cancel(b + a) with ordinary SymPy expressions from the symbolic subsystem. The effect of cancel on a symbolic expression is that it always rearranges the expression into a ratio of expanded polynomials and then cancels the polynomial gcd of the numerator and denominator:

>>> from sympy import symbols, cancel
>>> x, y = symbols('x, y')
>>> e = 1/x + x
>>> e
x + 1/x
>>> cancel(e)
(x**2 + 1)/x

Calling cancel on symbolic expressions like this is slow because every call to cancel has to go through the whole process of identifying a polynomial representation, choosing a domain and converting the expressions into the domain, and then after actually computing the cancelled fraction the result needs to be converted back to the symbolic subsystem. If this sort of simplification is wanted then it is always better to use any of the more structured domains above than to use EX because that avoids all the cost of these conversions.

For some algorithms the automatic expansion and cancellation used in EX is exactly what is needed as a method of intermediate simplification to speed up a large calculation and return a result in a mostly canonical form. In some situations though it is preferable not to have this cancellation (which in itself can be slow) and for this the EXRAW domain is provided. Operations with the EXRAW domain are precisely equivalent to operations in the symbolic subsystem (without calling cancel). All the reasons that it is difficult to build heavy algorithms over the symbolic subsystem apply to the EXRAW domain as well. The EXRAW domain is only really useful for preserving existing behaviour in a situation where we want to change code that currently uses the symbolic system to use the computational algebra subsystem instead. It would almost always be better to use something other than EXRAW (even if just EX) but if we want to be conservative when making changes then EXRAW provides a possible compatibility mechanism.
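To make the difference concrete, here is roughly what it looks like (I am importing the EX and EXRAW singletons from sympy.polys.domains; the exact reprs may differ slightly):

>>> from sympy.polys.domains import EX, EXRAW
>>> from sympy import symbols
>>> x = symbols('x')
>>> EX(1/x) + EX(x)        # EX cancels automatically
EX((x**2 + 1)/x)
>>> EXRAW(1/x) + EXRAW(x)  # EXRAW is just a symbolic expression
x + 1/x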

Using the right domains

Having talked a lot about the domain system above I can now explain how that relates to things sometimes being slow in SymPy and what can be done to improve that.

Firstly, when implementing any arithmetic-heavy algorithm like solving a system of linear equations, using any of the domains described above apart from EX or EXRAW is almost always faster than implementing the algorithm directly with symbolic expressions. The number one reason for slowness in things like computing the inverse of a matrix is just the fact that many such algorithms do not use the domain system at all and instead use the symbolic subsystem.

Secondly, in many cases the EX domain is used when it would not be difficult to choose a better domain instead. This is because the mechanism for constructing domains is quite conservative about what it will accept. An example would be:

>>> from sympy import *
>>> x, y = symbols('x, y')
>>> construct_domain([x + y])
(ZZ[x,y], [x + y])
>>> t = symbols('t')
>>> x = Function('x')
>>> y = Function('y')
>>> construct_domain([x(t) + y(t)])
(EX, [EX(x(t) + y(t))])

Here the functions x(t) and y(t) should be treated the same as the symbols x and y were. A suitable domain can easily be created explicitly:

>>> domain = ZZ[x(t),y(t)]
>>> domain.from_sympy(x(t) + y(t))
(x(t)) + (y(t))

The problem here is just that the code inside construct_domain rejects this domain because it does not want to create a polynomial ring where the generators have free symbols in common (the t in this case). The reason for rejecting this is to try to avoid something like this:

>>> ZZ[sin(t),cos(t)]
ZZ[sin(t),cos(t)]

This ZZ[sin(t),cos(t)] domain is invalid for many situations. The problem with it is that it is possible to create an expression that should really be zero but appears not to be zero:

>>> R = ZZ[sin(t),cos(t)]
>>> s = R.from_sympy(sin(t))
>>> c = R.from_sympy(cos(t))
>>> e = s**2 + c**2 - 1
>>> e
(sin(t))**2 + (cos(t))**2 - 1
>>> R.is_zero(e)
False
>>> R.to_sympy(e).trigsimp()
0

One of the reasons that arithmetic heavy algorithms with domains are so much faster than with symbolic expressions is because in the domain system any expression that is equal to zero should be simplified automatically to zero. Many algorithms need to know whether expressions are zero or not so this is an extremely useful property. Just treating sin(t) and cos(t) as independent variables in a polynomial ring violates this property. Sometimes that would be fine but in other situations it could lead to bugs. Therefore construct_domain refuses to create the ring ZZ[sin(t),cos(t)] to avoid bugs. This refusal leads to the EX domain being used which is much slower and also potentially subject to precisely the same bugs. The advantage of using the EX domain here is mainly that other code can at least be aware that the domain is not well defined.

It is perfectly possible to implement a domain that can represent a ring involving both sin(t) and cos(t). There are already some kinds of domains that can do this although they are not used by default and also are not quite right for what is needed. What we really want is to be able to make a more complicated ring like this:

QQ[sqrt(2),m1,m2,k1,k2,sin(theta),cos(theta),sin(phi),cos(phi)]

In science and engineering the need to work with sin and cos is very common so specialised domains are needed that can handle this for many different variables and can recognise trig identities etc. SymPy does not yet have this but adding it would mostly complete the domain system in terms of being able to represent the sorts of expressions that users typically want to work with. This would be particularly beneficial for example in the case of symbolic calculations in mechanics (as in the sympy.physics.mechanics module) because those calculations involve many systems of linear equations with sin and cos in the coefficients. I have an implementation of a domain that could represent the ring above using sparse polynomials and Groebner bases but it is still incomplete.
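To sketch the idea, the relation sin(t)**2 + cos(t)**2 - 1 can be used as a Groebner basis to reduce expressions to a canonical form. Here is a rough illustration using plain symbols s and c as stand-ins for sin(t) and cos(t) (not the actual implementation):

>>> from sympy import symbols, groebner
>>> s, c = symbols('s c')                  # stand-ins for sin(t) and cos(t)
>>> G = groebner([s**2 + c**2 - 1], s, c)  # encode the Pythagorean identity
>>> G.contains(s**2 + c**2 - 1)            # the "hidden zero" is detected
True
>>> _, remainder = G.reduce(s**2)          # reduce s**2 modulo the identity
>>> remainder == 1 - c**2
True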

There are then two common ways that things might become slower than they should be when using the domain system:

  • Sometimes the EX domain is used conservatively when suitable alternative domains are already there and could easily be used.

  • Sometimes a suitable domain is not yet implemented (e.g. sin/cos).

In either case the result is that a calculation ends up using the EX domain which is a lot slower than any of the other domains. The fixes for these problems are simple:

  • Improve the logic for deciding which domains are used by default.

  • Add new domains that can represent things like sin and cos for example.

The other common problem is that constructing algebraic fields can be extremely slow (as shown above). This is unfortunate because after construction algebraic fields are much faster than alternate representations for the same expressions. The fix here is to use a different representation for algebraic fields that is not based on primitive elements but this is not a trivial change.

Most of these changes are not particularly difficult to make but in each case the impact of making such a change can be far reaching and hard to predict in full. Each time some calculation is moved from either the symbolic subsystem or the EX domain to a more structured domain the most noticeable effect is to make things (much) faster, and a secondary effect is that it potentially reduces bugs. The third effect is that it leads to the output of the calculation being in a “more canonical” form which is a good thing but it is a change in output in some sense and it is the impact of this change that is hard to predict. Once a calculation is moved to use a more structured domain though it then becomes much easier to optimise things in future because the form of the output is much more predictable and can be preserved exactly under any change of algorithm.

The Poly system

The domain system is from a user-interface perspective the lowest level of the computational algebra subsystem. The next level up are the low-level poly functions for the dup (dense univariate polynomial) and dmp (dense multivariate polynomial) representations. In the dup representation a polynomial is represented as a list of coefficients. There are many functions with the dup_ prefix for operating on this representation. For example, this is how you would multiply two polynomials using the symbolic system:

>>> from sympy import symbols, expand
>>> x = symbols('x')
>>> p1 = 3*x + 4
>>> p2 = 5*x + 6
>>> p3 = expand(p1*p2)
>>> p3
15*x**2 + 38*x + 24

This is how you would do the same with dup_mul:

>>> from sympy import ZZ
>>> from sympy.polys.densearith import dup_mul
>>> p1 = [ZZ(3), ZZ(4)]
>>> p2 = [ZZ(5), ZZ(6)]
>>> p3 = dup_mul(p1, p2, ZZ)
>>> p3
[mpz(15), mpz(38), mpz(24)]

A multivariate polynomial involving e.g. two symbols x and y can be thought of as a polynomial whose coefficients are polynomials e.g.:

>>> x, y = symbols('x, y')
>>> p = x**2*y + x**2 + x*y**2 - x - y
>>> p.collect(x)
x**2*(y + 1) + x*(y**2 - 1) - y

Here we could think of p as a polynomial in x whose coefficients are polynomials in y. If a univariate polynomial is represented as a list of coefficients then a multivariate polynomial can be represented as a list of lists of coefficients. This is the dmp representation:

>>> p.as_poly(x, y).rep.rep
[[mpz(1), mpz(1)], [mpz(1), mpz(0), mpz(-1)], [mpz(-1), mpz(0)]]

There are many functions like dmp_mul etc for operating on this representation. In a simple example like this it looks okay but actually it is a very bad idea to use this list of lists (of lists of …) representation when we have many more than two variables:

>>> x0, x1, x2, x3, x4, x5, x6, x7, x8, x9 = syms = symbols('x:10')
>>> p = x0*x9 - x5*x3 + x6
>>> p.as_poly(syms).rep.rep
[[[[[[[[[[mpz(1), mpz(0)]]]]]]]]], [[[[[[[[[mpz(-1)]]]], [[[[]]]]]], [[[[[[mpz(1)]]], [[[]]]]]]]]]]

These deeply nested recursive lists are very inefficient to work with compared to the flatter sparse representation which in this case looks like:

>>> dict(ZZ[syms].from_sympy(p))
{(0, 0, 0, 0, 0, 0, 1, 0, 0, 0): mpz(1), (1, 0, 0, 0, 0, 0, 0, 0, 0, 1): mpz(1), (0, 0, 0, 1, 0, 1, 0, 0, 0, 0): mpz(-1)}

Altogether there are about 500 dup_* and dmp_* functions. These collectively implement just about all of the operations and algorithms that you might want for doing anything with polynomials and there are in many cases multiple algorithms for the same operation. It is clear from the design of these that the intention was at some point to rewrite all of this code in C to speed things up.

The next level up are the polynomial classes like the DMP class. The DMP class wraps up a list of lists in the representation shown above along with a reference to the associated domain. The DMP class then provides around 150 methods that are implemented by calling down to the lower-level routines.

The next level above that is Poly which is the main user-facing class for polynomials. The Poly class is a wrapper around the DMP class that provides a lot of convenience methods for doing things like converting to and from the symbolic subsystem, and other representations.

These levels Poly, DMP and the dup representation can be seen by creating a Poly and then looking at its rep attribute:

>>> p = Poly(x**2 + 1)
>>> p          # Poly
Poly(x**2 + 1, x, domain='ZZ')
>>> p.rep      # DMP
DMP([1, 0, 1], ZZ, None)
>>> p.rep.rep  # dup
[1, 0, 1]

The final level are the “polys” functions like factor. These are the highest level functions that are used by most SymPy users and can operate on expressions from the symbolic subsystem:

>>> from sympy import factor
>>> factor(x**2 - 1)
(x - 1)*(x + 1)

The levels altogether then are:

  1. Domains (e.g. ZZ, QQ, ZZ[x], QQ[x] etc)

  2. Low-level poly functions (e.g. dup_mul, dmp_mul etc)

  3. Polynomial classes (e.g. DMP)

  4. High-level Poly class.

  5. Polynomial functions for symbolic expressions (e.g. factor)

The intention was always that some of these levels would be replaced by more efficient implementations in C but it is not obvious at which level that was intended to happen. The Poly class is too high-level and interfaces a lot with the symbolic subsystem so it is not easy or desirable to rewrite that in C. The dup_ and dmp_ functions are closely tied to their own (suboptimal) data representations so without rewriting every single one of them it would not be possible to swap out this whole layer.

There are two levels that are good candidates for swapping in alternate implementations wholesale then:

  • The domain level

  • The DMP class

At the domain level SymPy already swaps out the implementations of ZZ and QQ when gmpy2 is installed but the same could be done for all of the commonly used domains like ZZ[x], QQ[x] etc if the implementations of these from e.g. python-flint were used. The DMP class is also a good candidate for swapping out just because it is nicely self-contained. A wrapper class could be made like FlintDMP that wraps a polynomial type from python-flint and provides the same interface as DMP currently does. That would make it easy to swap in a faster implementation of the DMP class when python-flint is installed.

Besides swapping out a whole layer the easiest way to make some things faster would just be to swap out a few key functions that involve expensive algorithms, e.g. dup_zz_factor or dmp_zz_factor which are the base functions for factorising univariate or multivariate polynomials in ZZ[x] or ZZ[x,y,...] respectively. Swapping out a single function like this is suboptimal because it implies potentially costly conversions from a more efficient C representation to e.g. the inefficient DMP representation, but it would be a very simple change to make and could immediately bring big speedups for important operations.

Sparse polynomials

The system of levels in the polys module that I described above is all nicely designed and mostly well implemented. There are some algorithms that would be nice to have like asymptotically fast multiplication and division of polynomials but mostly it has the pieces and algorithms that are needed. The big glaring problem with it though is that it is all based on dense polynomials and the DMP representation. This is very inefficient for polynomials with many variables. The system works nicely for univariate polynomials but many of the algorithms necessarily become inefficient for multivariate polynomials. Most SymPy users are not really interested in using only univariate polynomials (e.g. solving equations with only one symbol or something) so this is a significant limitation.

At some point the mistake of basing everything on dense polynomials was realised and a new class PolyElement implementing sparse polynomials was added. The implementation of PolyElement is mostly complete and it is used in many places but not always. For example the domain system always uses (sparse) PolyElement but the Poly class always uses (dense) DMP.

It would be possible to add an equivalent of the DMP class based on sparse polynomials that Poly could use internally. It is not clear if it was intended that PolyElement would be that class or if another level was expected. Looking at the code right now the easiest thing would be to add a new class in between PolyElement and Poly that implements the same interface as DMP but using sparse polynomials internally, let’s call it SMP. Then Poly could use DMP or SMP internally but DMP would only ever be used for univariate polynomials. Then the structure would look like:

  1. Domains (ZZ …)

  2. dup_* (dense) or PolyElement (sparse)

  3. DMP (dense) or SMP (sparse)

  4. Poly (dense or sparse internally)

  5. factor etc

It is not completely clear if that is the right thing to do though because actually many Poly methods are designed tightly around what makes sense in the dense representation or around things that only make sense for univariate polynomials. Probably the interface of Poly and the signatures and behaviour of the methods and functions that operate on it would have been designed differently if sparse and multivariate polynomials had been considered from the start.

In general it is not really clear to me if it even makes sense to have a generic interface that covers both univariate and multivariate polynomials because often the operations that make sense for one case do not make sense for the other. The fact that all of the low-level code needs to have completely separate functions for univariate (dup_*) and multivariate (dmp_*) polynomials shows that it is not really possible to write generic code that ignores this distinction.

An alternate scheme for sparse multivariate polynomials could look like:

  1. Domains

  2. PolyElement

  3. SPoly (sparse version of Poly)

Then depending on context you would choose whether you wanted to use Poly for univariate polynomials or SPoly for multivariate polynomials (perhaps SPoly should be called MPoly for “multivariate”). This approach would require more careful design because it would impact more on code that end users would potentially write rather than just being a rearrangement of internal code. We would need to decide what e.g. expr.as_poly() should return and changing that to SPoly/MPoly in some cases might not be backwards compatible so perhaps we need new methods like expr.as_mpoly().

Polynomial GCD

Apart from actually making use of sparse polynomials in Poly (or SPoly) the implementation of sparse polynomials is mostly complete but the biggest weakness is the use of poor algorithms for polynomial gcd. The current implementation of polynomial gcd for PolyElement uses “heugcd” for ZZ or QQ or switches to the DMP representation and uses subresultant PRS for all other domains. This is bad for several reasons:

  • The “heugcd” algorithm is often okay but can be much worse than other algorithms in some cases.

  • Sometimes “heugcd” is “unlucky” and fails (the current code does not fall back to anything else in this case and just blows up instead).

  • Switching from the sparse to the dense representation can be very inefficient for all the reasons that it is usually better to use a sparse representation in the first place (for multivariate polynomials).

  • The subresultant PRS algorithm is not generally the best algorithm in many cases either.

Ideally there would be a range of different algorithms for sparse polynomial gcd and the best one would be chosen based on sparsity, degree etc. It is not too hard to port the dense PRS algorithm to the sparse implementation but ultimately other algorithms should often be used. There is a whole module sympy.polys.modulargcd which has about 3000 lines of code implementing modular algorithms for gcd over many different domains including algebraic number fields. It looks like good code but is not used anywhere in SymPy and its status is unclear.
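For reference, here is the operation under discussion, reached through the public ring function and the sparse PolyElement class:

>>> from sympy import ring, ZZ
>>> R, x, y = ring("x, y", ZZ)
>>> p1 = (x + y)**2*(2*x - y)
>>> p2 = (x + y)*(x**2 + 1)
>>> p1.gcd(p2)  # uses heugcd internally since the domain is ZZ
x + y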

Mostly I think that the main route to speeding up polynomial operations in SymPy is just by making use of python-flint but the pure Python implementation should at least use reasonable algorithms. The poor polynomial gcd algorithms are a major weakness that directly impacts high-level operations like solve, integrate etc. I will give two examples of how polynomial gcd can slow down other things. The first is computing the inverse of the matrix described in this issue:

https://github.com/sympy/sympy/issues/25403

The matrix includes I (\(\sqrt{-1}\)) and so the domain of the elements of the matrix is something like ZZ_I[a,b,c,...,g] where ZZ_I means the Gaussian integers. The newly added fraction-free RREF algorithm for computing the inverse can compute the inverse of each component of the matrix in the form of a numerator matrix and scalar denominator relatively quickly e.g.:

In [2]: %time Mnum, den = M2parts[0].to_DM().inv_den()
CPU times: user 1.2 s, sys: 10.8 ms, total: 1.21 s
Wall time: 1.21 s

That takes 1 second which I think is still too slow but most users will tolerate waiting that long for something like this. We want to combine the numerator matrix and the denominator though which means dividing them and ideally the gcd should be cancelled to bring the inverse matrix into its simplest form:

In [3]: %time Minv = Mnum.to_field() / den
^C^C
KeyboardInterrupt

I interrupted that after about 5 minutes so I’m not sure how long it actually would have taken. The slow part here is 25 polynomial gcd calculations. The polynomial expressions are moderately large but better algorithms could do this a lot faster. In this case, because the domain of the polynomials is ZZ_I, the dense PRS algorithm is used rather than heugcd and that is just very slow in this case. We can avoid the gcd calculations here but in other cases it is definitely better to cancel the gcd and it is awkward to have to work around some extremely slow cases by returning less simplified results in most other cases. This sort of consideration applies not just to matrix inverses but also e.g. to solving a system of linear equations, so solve, linsolve etc and many other algorithms that use these operations internally (e.g. integration algorithms).

Another example of slowness caused by polynomial gcd is in the “heurisch” integration algorithm. This algorithm has essentially three computational steps that take place in a loop:

  • Differentiate some expression.

  • Cancel a gcd between two polynomials.

  • Solve a sparse system of linear equations.

The system of linear equations is solved using the computational algebra subsystem (DomainMatrix). Theoretically the linear algebra should be the slow part but in SymPy it is not because the other operations are computed using the symbolic subsystem and end up being much slower. It should be possible to stay entirely in the computational algebra subsystem but the reason for using the symbolic subsystem is that the gcd calculations would be too slow in the computational algebra subsystem. Polynomial gcd is precisely the sort of thing that should be more efficient in the computational algebra subsystem but it is not because the polynomial rings used here have huge numbers of symbols like QQ_I[x1,x2,...,x1000]. The dense gcd algorithms are extremely slow in this case because each step needs to recurse through 1000 levels of nested lists even if the polynomials involved only have a couple of terms.

The heurisch algorithm works around slow gcd by converting to a symbolic expression and then calling cancel (which internally converts to the computational algebra subsystem and back) and then converting back again to the original polynomial ring. The slow part often ends up being just the conversions back and forth between different representations, but those are only happening to avoid the slow gcd algorithms. The end result of this is that “heurisch”, which is the main workhorse for integrate in SymPy, is often very slow as an indirect result of sparse polynomial gcd being slow.

These things would all be made a lot faster by using python-flint but I still think that it is worth improving the pure Python algorithms here as well because it is not really that hard to make big improvements.

Python-flint

The python-flint library is a Python wrapper around the C library FLINT. FLINT is a C library for fast arithmetic and other operations with polynomials, integers, matrices etc and is used by many other computer algebra systems such as SageMath, Mathematica, Maple, OSCAR etc. FLINT itself is built on top of GMP and MPFR but then provides hugely expanded functionality over the top. The FLINT library provides fast implementations for all of the domains that I have mentioned above (ZZ, QQ, GF(p), ZZ[x], QQ[x] etc) and much more:

https://flintlib.org/

The FLINT library is now maintained by Fredrik Johansson who is also the original author of SymPy’s evalf and the mpmath library that is SymPy’s only hard dependency. Fredrik went on to create Arb, Calcium and the generic-rings project. Fredrik recently took over maintainership of FLINT and merged all of these projects together into FLINT 3.0 (AKA MegaFLINT). Altogether FLINT is around 1 million lines of C code with many contributors and provides state of the art implementations for the sorts of things that I have been discussing above and much more. The FLINT codebase has a lot of widely used mature code but is also still under active development.

The python-flint library is a wrapper around FLINT that provides a Python interface to a small subset of the functionality of FLINT (and Arb):

https://fredrikj.net/python-flint/

https://github.com/flintlib/python-flint

Fredrik originally created python-flint but it had not been actively maintained having only a few releases in 2017 and 2018. Over time I have been working on making python-flint more usable and this has now culminated in the releases of python-flint 0.4.0-0.4.2 during the last month. The major changes in these releases are that there are now CI-built wheels for Linux, macOS and Windows that can be installed with pip install python-flint as well as conda packages conda install -c conda-forge python-flint (thanks to Isuru Fernando). There was a major Windows bug that basically made python-flint unusable on Windows and that is now fixed. Lastly I made a bunch of smaller changes to make python-flint more usable and better tested and to fix a few minor bugs.

The end result of that work is that now python-flint is perfectly usable and well tested although still very incomplete in wrapping all of FLINT’s functionality. My own interest in working on this was to make it possible to use python-flint as a drop-in replacement for the pure Python implementations of parts of SymPy’s computational algebra subsystem. Now that python-flint is usable though others have become interested and there are two new active contributors working on expanding its functionality to add finite fields GF(q) and (critical for SymPy) multivariate sparse polynomials.

To give a quick example of what it would mean in terms of speed to make use of python-flint let’s compare the time taken to factorise a polynomial between SymPy’s existing implementation and using python-flint:

In [16]: nums = [1, 2, 3, 4, 5] * 50

In [17]: nums2 = [1, 2, 3, 4, 5] * 50

In [18]: p1 = Poly(nums, x)

In [19]: p2 = Poly(nums2, x)

In [20]: p3 = p1*p2

In [21]: %time result = p3.factor_list()
CPU times: user 7.57 s, sys: 10.1 ms, total: 7.58 s
Wall time: 7.58 s

In [22]: import flint

In [23]: p1 = flint.fmpz_poly(nums[::-1]) # reverse coefficient order

In [24]: p2 = flint.fmpz_poly(nums2[::-1])

In [25]: p3 = p1*p2

In [26]: p3.degree()
Out[26]: 498

In [27]: %time result = p3.factor()
CPU times: user 83.1 ms, sys: 0 ns, total: 83.1 ms
Wall time: 82.3 ms

The polynomial here has degree about 500 and the factorisation takes 8 seconds with SymPy’s existing implementation and 80 milliseconds with python-flint, which is a speedup of roughly 100x. This ratio is approximately what can be expected when comparing an optimised C implementation with a pure Python implementation of the same low-level algorithm: SymPy’s existing algorithm is okay in this case but is limited by being implemented in Python. FLINT is not just highly optimised C code though, as it also uses state of the art algorithms and so in the cases discussed above where SymPy’s algorithms are not so good (e.g. polynomial gcd) the speedup can be much larger than 100x.

Here is a comparison of factorisation time vs degree:

[Figure: polynomial factorisation time vs degree, SymPy vs python-flint (../_images/time_factor.svg)]

For the timings in this plot I generate two polynomials of degree \(d\) with random integer coefficients in the range from 1 to 10, multiply them together and then time how long it takes to factorise the result. The plot compares the timings against \(3d^2\) for FLINT and \(200d^2\) for SymPy. Both curves show an approximately quadratic scaling but with SymPy being about 60x slower than FLINT up to around degree 100. Then the SymPy curve looks like it starts to bend upwards and gets a lot slower above degree 100. The 60x speed difference is the difference between a pure Python implementation of the low-level operations vs FLINT’s optimised C implementation. The upwards bend in the SymPy curve is most likely the result of some algorithmic difference and I think it means that for larger polynomials SymPy will become extremely slow.

While both curves in the plot show approximately quadratic scaling it is worth noting that the worst case for both implementations is worse than \(O(d^2)\). For SymPy’s algorithms the worst case is in fact exponential like \(O(2^d)\) and I am not sure exactly what it would be for FLINT but I think maybe the best bounds are something like \(O(d^6)\). For randomly generated polynomials like shown here we will tend not to see these worst cases but that does not necessarily make them unlikely in practice. For example this issue shows a case where SymPy’s minpoly is extremely slow because factor is slow for a polynomial of degree 162:

https://github.com/sympy/sympy/issues/22400

Using FLINT (see the change to dup_zz_factor below) SymPy can compute the operation from that issue in 300 milliseconds:

In [1]: %time ok = minpoly(root(2,3)+root(3,3)+(-1+I*sqrt(3))/2*root(5,3))
CPU times: user 345 ms, sys: 3.67 ms, total: 349 ms
Wall time: 347 ms

The 60x speed difference above would suggest that SymPy’s existing algorithm would take around 20 seconds to compute the same result but in fact it takes much longer than that. I’m not sure how long but it is more than 5 minutes so at least 1000x slower. In the issue I suggested a change that would bring this down to about 40 seconds so more like 100x slower which is worth doing. Also SymPy now has the LLL algorithm which was discussed there and so that could be used to speed this up.

We could spend a lot of time trying to optimise SymPy’s existing algorithms and in some cases that is worthwhile but ultimately it will always be better to use python-flint when speed is needed. For many of the lower-level algorithms a pure Python implementation will never be able to come close to the speed of FLINT and so if we want SymPy users to be able to use SymPy and have state of the art speed then we just need to use something like python-flint. Of course if anyone would like to improve SymPy’s factorisation algorithm then that is fine and it is impressive what previous SymPy contributors have achieved while working only in Python. For working on making SymPy faster for end users though the immediate priority is to make more use of python-flint.

I think the right way to think about FLINT is that it is like a BLAS library for computer algebra systems. Most scientific computing is done with machine precision types like 64-bit floats etc and in that context when you want to compute say the inverse of a matrix then you would absolutely use a BLAS library to do that. You might be working in Python, R or Julia or something but the BLAS library is written in C or Fortran and is just much faster than anything you could write directly. Even if you could write a fast implementation of BLAS in say Python there would be no benefit in doing so because it is better to be able to share things like that, e.g. the same BLAS library can be used from Python, R, Julia, Matlab etc.

If you were working in C you would still not write your own code to do what BLAS does because it is not just faster but also more reliable, more accurate, less buggy etc than you would likely achieve without a lot of work. In fact even an optimised C implementation would likely be slower than BLAS because your BLAS library would have not just better algorithms but also things like hand-crafted assembly (much like GMP does), SIMD etc, with the aim of being faster than C.

Anyone who knows about BLAS libraries would say that it would be absurd to try to reimplement BLAS in Python when you can just use an off the shelf BLAS library (pip install numpy already does this for you). This is essentially the same situation for computer algebra systems and libraries like FLINT. It is nice that SymPy has pure Python implementations of many algorithms and there are situations where those are useful in practice but if you want anything like top-end speed for CPU-heavy operations then you need to use something like FLINT. Then if you do use FLINT there is no actual reason why working in Python should be slower than anything else because if the bulk of the time is spent in FLINT then it will be just as fast whether you call it from Python, C or anything else.

I will write more specifically about what is currently implemented in python-flint and what work is needed in a separate post. I will also write more about matrices in a separate post but for now I will just say that FLINT provides both polynomials and matrices over all of the domains mentioned above and SymPy could use those to achieve state of the art speeds for many operations. In the same way that SymPy currently uses gmpy2 to speed up ZZ and QQ it would be possible to speed up every part of the computational algebra subsystem by using python-flint. There are also many other features like Arb and Calcium that SymPy would benefit from but that is not the topic of this post.

Using python-flint

Since python-flint is now easily installable and usable I have added support for using it in SymPy. In SymPy 1.12 and earlier the SYMPY_GROUND_TYPES environment variable can be used to specify whether or not to use gmpy2 for ZZ and QQ. If the environment variable is not set to anything then the default is to use gmpy2 if it is installed and otherwise to use CPython’s int and SymPy’s PythonMPQ types as described above. With the latest changes on the SymPy master branch the SYMPY_GROUND_TYPES environment variable can also be set to flint to use python-flint instead of gmpy2 (if it is installed). The PR implementing this change is:

https://github.com/sympy/sympy/pull/25474

Using this looks like:

$ SYMPY_GROUND_TYPES=flint isympy
...
In [1]: type(ZZ(1))
Out[1]: flint._flint.fmpz

The default is currently to use gmpy2 if the environment variable is not set explicitly but I am considering whether the default should be changed to flint in SymPy 1.13. So far SymPy does not use enough of python-flint’s features to derive major benefits from it and so being conservative it is only used if the user opts in explicitly but that is definitely not what we would want longer term. (Changing the default would only potentially affect anything for those who already have python-flint installed which I suspect is not many people.)

With this change SymPy can now use python-flint for ZZ and QQ. Since FLINT depends on GMP and uses GMP for its integer and rational number types using python-flint for ZZ and QQ is not really much different from using gmpy2. The main difference is that python-flint provides many more things and using python-flint’s elementary types for e.g. ZZ makes it easier to later add support for polynomials in e.g. ZZ[x].

Besides ZZ and QQ the only other things that SymPy currently (on master) uses python-flint for are the internal dense implementation of DomainMatrix over ZZ and QQ and a few number theory functions that would otherwise be provided by gmpy2:

https://github.com/sympy/sympy/pull/25495

https://github.com/sympy/sympy/pull/25577

I will talk more about the DomainMatrix changes in a separate post but for now the point is that SymPy can now use python-flint for a few things and it is now very easy to use it for more things. So far SymPy does not get a major benefit from this but the hard work is now done so that we can just flip a few switches and make some important things much faster.

As a simple demonstration of making something faster, this change would make factorisation of polynomials in ZZ[x] faster when python-flint is installed (and SYMPY_GROUND_TYPES=flint is set):

diff --git a/sympy/polys/factortools.py b/sympy/polys/factortools.py
index de1821a89f..d9a833e6c1 100644
--- a/sympy/polys/factortools.py
+++ b/sympy/polys/factortools.py
@@ -1,5 +1,7 @@
 """Polynomial factorization routines in characteristic zero. """

+from sympy.external.gmpy import GROUND_TYPES
+
 from sympy.core.random import _randint

 from sympy.polys.galoistools import (
@@ -76,6 +78,12 @@
 from math import ceil as _ceil, log as _log


+if GROUND_TYPES == 'flint':
+    from flint import fmpz_poly
+else:
+    fmpz_poly = None
+
+
 def dup_trial_division(f, factors, K):
     """
     Determine multiplicities of factors for a univariate polynomial
@@ -662,6 +670,12 @@ def dup_zz_factor(f, K):
     .. [1] [Gathen99]_

     """
+    if GROUND_TYPES == 'flint':
+        f_flint = fmpz_poly(f[::-1])
+        cont, factors = f_flint.factor()
+        factors = [(fac.coeffs()[::-1], exp) for fac, exp in factors]
+        return cont, factors
+
     cont, g = dup_primitive(f, K)

     n = dup_degree(g)

This simple change does not cause any outwardly observable change in behaviour and does not result in any test failures. It just makes one operation a lot faster, e.g. we can now factorise a degree 2000 polynomial in about 2 seconds:

In [11]: p1 = random_poly(x, 1000, 1, 5)

In [12]: p2 = random_poly(x, 1000, 1, 5)

In [13]: p3 = p1.as_poly() * p2.as_poly()

In [14]: p3.degree()
Out[14]: 2000

In [15]: %time ok = p3.factor_list()
CPU times: user 2.09 s, sys: 2.6 ms, total: 2.09 s
Wall time: 2.09 s

I have not waited long enough to see how long this operation would take with SymPy’s existing implementation but it is definitely a lot longer than 2 seconds.
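
For comparison, here is a rough sketch of doing the same computation with python-flint directly (the coefficients here are arbitrary; this is just an illustration, not the code SymPy uses):

from random import randint
from flint import fmpz_poly

# fmpz_poly takes coefficients lowest degree first
p1 = fmpz_poly([randint(1, 5) for _ in range(1001)])  # degree 1000
p2 = fmpz_poly([randint(1, 5) for _ in range(1001)])  # degree 1000
p3 = p1 * p2                                          # degree 2000

content, factors = p3.factor()  # (content, [(factor, multiplicity), ...])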

It would be easy to speed up specific functions like dup_zz_factor piecemeal like this and it is worth doing that to some extent. However, all polynomial operations would be a lot faster if python-flint were used throughout, and converting to and from fmpz_poly at each call is relatively slow compared to many fmpz_poly operations. The better approach then is to swap out a whole layer somewhere in the polys system. In this case I think what makes the most sense is a wrapper class that holds an fmpz_poly internally but provides the same interface as DMP (a rough sketch of such a wrapper is shown after the list below). Then Poly could use this class internally and all polynomial operations would be much faster, including basic arithmetic like addition and multiplication. The structure would then be like:

  1. Domains (ZZ …)

  2. dup_* (SymPy) or fmpz_poly (python-flint)

  3. DMP (SymPy) or FlintDMP (python-flint)

  4. Poly (holds DMP or FlintDMP internally)

  5. factor etc
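
As a rough sketch of what such a wrapper might look like (the name FlintDMP and the exact methods are hypothetical and simplified; the real DMP interface has many more methods):

from flint import fmpz_poly

class FlintDMP:
    """Holds an fmpz_poly internally but presents a DMP-like interface."""

    def __init__(self, rep, dom, lev=0):
        # DMP stores coefficients highest degree first whereas fmpz_poly
        # wants lowest degree first, hence the reversal on conversion.
        if not isinstance(rep, fmpz_poly):
            rep = fmpz_poly([int(c) for c in reversed(rep)])
        self.rep = rep
        self.dom = dom
        self.lev = lev

    def to_list(self):
        # Back to a DMP-style coefficient list (highest degree first).
        return list(self.rep.coeffs())[::-1]

    def add(self, other):
        return FlintDMP(self.rep + other.rep, self.dom)

    def mul(self, other):
        return FlintDMP(self.rep * other.rep, self.dom)

    def factor_list(self):
        # Delegate factorisation to FLINT and rewrap the irreducible factors.
        content, factors = self.rep.factor()
        return content, [(FlintDMP(f, self.dom), k) for f, k in factors]

Poly would then construct something like this instead of DMP when the ground types are flint and the domain is ZZ (or QQ with fmpq_poly), so that every Poly operation goes through FLINT.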

If SymPy did not already have existing implementations for many of the things that python-flint provides then using python-flint would bring a massive expansion in SymPy's capabilities. There are other things that FLINT/python-flint have that SymPy does not currently have, but those are not the topic here. For the sorts of things discussed above SymPy already has implementations of the algorithms; they are just a lot slower. SymPy will need to keep those pure Python implementations (so that it can still be used without python-flint), so making use of python-flint's capabilities for polynomials is really just about speed. In that sense it is not really necessary to try to use all of FLINT's features in SymPy, because the speed benefit mostly comes from using it for the lowest levels: there is no big penalty in writing higher-level algorithms in Python rather than C.

The biggest bang-for-buck that SymPy can get from FLINT would be from using FLINT's sparse multivariate polynomials. Multivariate polynomials are crucial for many of the things that SymPy users want to do, and SymPy's existing implementations are slow because they are pure Python or dense, or because they use algorithms that are far from optimal in some cases, like polynomial gcd. The FLINT library has state of the art algorithms for sparse polynomial arithmetic, gcd, factorisation etc which are (or should be) the bottlenecks for many things in SymPy like solving systems of equations, matrix operations, integration, simplification etc. Unfortunately python-flint does not yet expose FLINT's sparse polynomials but that is being worked on and I hope that the next release of python-flint will provide this. This one feature would bring enormous speedups to SymPy for many of the things that users want to do.
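
To give a concrete sense of the kind of operation this affects, here is the sort of multivariate gcd and factorisation (shown with plain SymPy, which currently does this in pure Python) that FLINT's sparse polynomials would accelerate:

from sympy import symbols, gcd, factor

x, y = symbols('x, y')
p = (x**3 + y)*(x + y**2)
q = (x**3 + y)*(x - y**2)

print(gcd(p, q))    # the common factor x**3 + y
print(factor(p*q))  # (x**3 + y)**2 times the two coprime factors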

If SymPy were already using python-flint's existing features and also its sparse polynomials, then the potential for further game-changing speedups of relevance to SymPy users from FLINT's other algebra features would be a lot smaller. It would definitely still be worth adding those other features, but after the sparse polynomials the next biggest benefits for SymPy users would come from features like Arb rather than from more of FLINT's algebra features.

What needs to be done

Some significant work has already been done to make it possible to use python-flint to speed up SymPy, but there is still a lot of work to do. There are also things that should be improved in SymPy's existing algorithms and code structure, both to make it faster without python-flint and to make it easier to leverage python-flint for improving SymPy's polynomial capabilities.

This is what can be done now without python-flint or with python-flint as it stands today:

  1. Some existing operations and algorithms should be improved like polynomial gcd (port PRS to sparse polynomials, investigate modulargcd module etc).

  2. The sparse polynomial representation should be used more widely in SymPy whether that means adding SMP at the level below Poly or introducing a new SPoly or MPoly class at the Poly level instead.

  3. A wrapper class like FlintDMP or something should be added that can wrap python-flint’s dense univariate polynomial types for use by Poly so that it can use python-flint internally for arithmetic, factor etc over ZZ and QQ.

  4. The ANP class used for algebraic number fields should be able to wrap a FLINT fmpq_poly internally because that is a situation where dense univariate polynomials are very important in the domain system (perhaps it should wrap DMP or FlintDMP).

  5. Ultimately, though, algebraic number fields should be reimplemented based on sparse polynomials rather than primitive elements (this is a much bigger task than anything else mentioned here).

  6. Domains like ZZ_I[x, y] should be reimplemented over the top of sparse polynomials like ZZ[x, y] rather than directly over ZZ_I - this would already speed them up but would also prepare them to get faster with python-flint.

There is an important design decision to be made about whether to use the SMP approach or the SPoly approach described above. Everything else listed above is unambiguous and is just work that should be done.

Another area where python-flint can immediately bring big improvements is GF(p), since python-flint provides an implementation of GF(p) as well as matrices and polynomials over GF(p). I have not listed this above just because, for what most users want to do, the need for GF(p) is mainly in the internals of things like factorisation, and if SymPy gets those algorithms from FLINT then it depends much less on its own GF(p) domain. (If anyone is interested in this though, it would not be hard to make use of python-flint here.)
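
As a small illustration (using python-flint's nmod_poly type for polynomials over Z/pZ, as I understand its interface):

from flint import nmod_poly

p = 7
# coefficients are given lowest degree first, so these are
# x**2 + 3*x + 5 and x + 1 over GF(7)
f = nmod_poly([5, 3, 1], p)
g = nmod_poly([1, 1], p)

print(f * g)       # product with coefficients reduced mod 7
print(f.gcd(g))    # gcd over GF(7)
print(f.factor())  # factorisation into irreducibles over GF(7)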

These changes require sparse polynomials to be added to python-flint first, and they are the absolute top priority items that would make the biggest difference to the things that users want to do:

  1. FLINT’s sparse polynomials should be exposed in python-flint.

  2. SymPy should use python-flint’s sparse polynomials internally for domains like ZZ[x, y].

  3. SymPy should use python-flint’s sparse polynomials for expensive operations involving multivariate polynomials like factor, gcd etc.

  4. SymPy should use python-flint’s sparse polynomials internally for Poly. This would be easy if SymPy already used its own sparse representation (i.e. SMP or SPoly).

All sorts of things would be made much, much faster by making these changes. In many cases the speed differences would be beyond the sort of thing that it makes sense to express as e.g. 10x faster because really a calculation that is currently so slow as to be effectively impossible would become achievable in a reasonable time.

There are reasons beyond just speed for most of these changes. It is also worth noting that any work on python-flint benefits the wider Python ecosystem, and that leveraging FLINT means sharing effort across language ecosystems like Python and Julia as well as with other computer algebra systems.

Some of the changes I have described above are not really that difficult and could be done quite soon. Others are bigger, more long-term projects. For me the top ticket item is adding FLINT's sparse polynomials to python-flint and then having SymPy use them for the domains, which I do not think would be particularly difficult.