Thanks. Finding the right combo to balance the largest root if it were non-real and end up with real coefficients seems harder than balancing a smaller root, ie the polynomial with the larger real root also has an n-1 degree polynomial with real coefficients and smaller maximal root as a factor, whereas for the one with a non-real largest root this n-1 polynomial factor has to have non-real coefficients in a way that exactly balance the largest term to make all coefficients real again and now has less space in the complex plane in order to do so. I am not explaining it well as I am away from a keyboard and I am sure it is very complicated to prove but I am not a priori very surprised either way.
I think you're thinking about it in a slightly weird way, if I am understanding you correctly.
Picking random roots would be a very different thing. Here were only dealing with polynomials whose complex roots come in conjugate pairs (I.e. if a+ib is a root then so is a-ib) this gives you real coefficients "for free" since
(x - (a+ib))(x-(a-ib)) = x^2 -2ax + a^2 + b^2
Theres no coincidental balancing act in choosing the roots to obtain real coefficients. It happens because of the type of polynomials we've chosen.
Say that it is true that for degree N-1 polynomials the largest root is real with more than 0.5 probability. Then for degree N, either you multiply a degree N-1 with real coefficients by a linear (first order) polynomial with a real root that may also be larger than the one in the N-1,, or you multiply it with a first order polynomial of a complex root that has to be conjugate to an existing one in the (non-real coefficient) N-1 degree which can only be as large as the one already in the degree N-1; that particular degree N-1 polynomial has to have a distribution that is similar to those of the ones with real coefficients except for one additional unbalanced root. For a degree 1 polynomial with real coefficients the probability that the root is real is trivially 1, so my natural guess is that with a careful framing of a (double?) induction one could work out a proof that the probability remains larger than 0.5 for all N. (The second induction would be over polynomials that are products of those with real coefficients times a complex first order polynomial). I am guessing it is not that simple because somebody would have blurted it out, but my own (perhaps slightly weird) way to think about it doesn’t lead to any prior surprise.
If you're multiplying a polynomial P by a linear factor L = (x-a) with a being a nonreal complex number then at least one of these statements is true
1. P does not have real coefficients
2. PL does not have real coefficients
I think you try to avoid this in what you say by multiplying P by a linear term with a complex root whose conjugate is already in P, but if the conjugate is in P then P wasn't real to start with.
The term with the complex root has to "balance" with its conjugate for things to work out. E.g. (x-a)^n (x-a*)^k has real coefficients only when n=k or a is real.
That’s why you have to do a double induction (I’m not a mathematician so my terminology may be wrong) and also keep proving that those special polynomials that are a product of a complex linear term with a polynomial of real coefficients also have a higher (or equal) to 50% chance of having the largest root be real. For degree 2, I think one can prove this directly, and the rest is also an induction.
I don't think this extra claim about polynomials which are a product of a complex linear term and a polynomial with real coefficients is true.
Let a be distributed uniformly in (-1,1) and let b be distributed uniformly in the unit disk (i.e its a complex number of absolute value less than or equal to 1).
The roots of (x-a)(x-b) are (obviously) a and b. The average absolute value of b is 2/3 while the average absolute value of a is 1/2, you can also work out that the probability that |a|<|b| is 2/3, so with probability 2/3 the largest root will be complex.
This just gets worse if you let b be uniformly distributed in the unit square instead of disk, so you need some quite funky distribution on b to make this work which seems to be unjustified.
Thanks. The funky distribution of b has to be exactly the one that results in uniform (or the desired) random distribution of the real coefficients of the 3rd order real polynomial that is the resulting polynomial when you multiply your example polynomial by the linear term with conjugate b. And if this hold for the resulting 3rd order polynomial with real (random) coefficients then I think that the double induction works.
(Edit: the distribution of b has be to such that the distribution of the resulting 3rd order polynomials together with the distribution of 3rd order polynomials with 3 real roots have the desired random distribution of the coefficients.)
We know the desired random distribution of the 3rd order real polynomials (a product of uniform random distributions say), and I believe we can work out the distribution of the subset that have linear roots only and the fraction of random real polynomials that have real roots, so the distribution for the polynomials with roots of type a, b, and conj b, comes from the (weighted) difference of these two, and leads to the joint distribution of a, b (because conj b is fixed given b).
I didn’t want to claim that a full proof is not going to be messy in the details, I just think that this type of construct makes me feel not surprised that the largest root is more likely to be real.
It feels like this problem must have been solved previously, probably in some physics context (maybe in areas related to applications of random matrix theory or dynamical systems).
It is possible that this second recursion doesn’t hold at N=3 (because there is enough of the density covered by the fraction of real polynomials), but then only starts at higher degrees (it will eventually be easier because the N-2 degree will have a higher chance to have a root higher than the new complex b). So this makes is certainly more complicated to prove but still seems intuitively correct to me.
For large n, log(n) is tiny compared to n, so it's very surprising that over half the time the largest root is one of the tiny number of real roots