The first fast-in-practice algorithm for factoring polynomials in Q[x] was given by Zassenhaus in 1969. Then in 1982 LLL (Lenstra, Lenstra, Lovász) gave the first polynomial-time algorithm. However, on most inputs Zassenhaus was faster, so it remained the default algorithm in all computer algebra systems. This meant that the algorithm with the "best practical performance" and the algorithm with the "best theoretical complexity" were different.
In my paper Factoring polynomials and the knapsack problem (J. of Number Theory 2002, implementation) I gave an algorithm to factor polynomials in Q[x]. It had the best performance in practice, so it quickly became the default algorithm in Maple, Magma, NTL, and PARI/GP. However, there was no complexity bound, which further widened the gap between "best practical performance" and "best theoretical complexity".
Why was it "faster in practice" even though it had no complexity bound? Say f has degree O(1000) and coefficients with O(1000) digits. Then in the algorithms from LLL and Schonhage, you apply lattice reduction to find a vector with O(1000) entries, each with O(1000) digits. In my algorithm, however, you search for a vector with r entries in {0,1} where r is the number of p-adic factors (the number r is almost always much smaller than the degree, though you can construct worst-case inputs where r = degree/2). Since the vectors we look for are so much smaller (fewer entries, and the entries are very small) it is reasonable to expect that the algorithm will run much faster. However, complexity bounds for lattice reduction were given in terms of the size of the input, and not in terms the size of the vectors we're looking for. So a good complexity bound (i.e. better than BHKS'2004 which gave the same complexity as before) can only be obtained if we can bound the complexity of lattice reduction in terms of the size of the output.
The key idea is that the vectors we are not looking for could be large, so to get a good complexity bound we gradually increase vector sizes and discard vectors before they become large (gradual sub-lattice reduction and LLL-with-removals; see the sketch below). Since the vectors we are looking for have entries of size O(1), and we have O(r) vectors, we can organize the LLL-with-removals in such a way that we never work with entries of size larger than O(r). Our strategy has other applications as well.
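The following minimal sketch (my own toy code built around textbook LLL, not the implementation analyzed in HN'2010) shows the removal step: after reduction, any trailing basis vector whose Gram-Schmidt length exceeds the bound on the vectors we want can be discarded, because any lattice vector that uses it is at least that long, so the short vectors we are looking for all survive.

```python
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(B):
    """Exact Gram-Schmidt over Q: returns the orthogonal vectors B* and
    the mu coefficients, without changing the basis B itself."""
    Bs, mu = [], [[Fraction(0)] * len(B) for _ in B]
    for i, b in enumerate(B):
        v = [Fraction(x) for x in b]
        for j in range(i):
            mu[i][j] = dot(b, Bs[j]) / dot(Bs[j], Bs[j])
            v = [vi - mu[i][j] * bj for vi, bj in zip(v, Bs[j])]
        Bs.append(v)
    return Bs, mu

def lll(B, delta=Fraction(3, 4)):
    """Textbook LLL; Gram-Schmidt is recomputed at every step, which is
    very slow but keeps the sketch short."""
    B = [list(row) for row in B]
    k = 1
    while k < len(B):
        for j in range(k - 1, -1, -1):            # size-reduce b_k by b_j
            _, mu = gram_schmidt(B)
            q = round(mu[k][j])
            if q:
                B[k] = [a - q * c for a, c in zip(B[k], B[j])]
        Bs, mu = gram_schmidt(B)
        if dot(Bs[k], Bs[k]) >= (delta - mu[k][k - 1] ** 2) * dot(Bs[k - 1], Bs[k - 1]):
            k += 1                                # Lovász condition holds
        else:
            B[k - 1], B[k] = B[k], B[k - 1]       # swap and step back
            k = max(k - 1, 1)
    return B

def lll_with_removals(B, bound_sq):
    """LLL-reduce B, then drop trailing basis vectors whose Gram-Schmidt
    length squared exceeds bound_sq; vectors of squared length <= bound_sq
    in the lattice are not lost by these removals."""
    B = lll(B)
    Bs, _ = gram_schmidt(B)
    while B and dot(Bs[-1], Bs[-1]) > bound_sq:
        B.pop()
        Bs.pop()
    return B

# Tiny demo (numbers are mine, not from the paper): reduce a 3-dimensional
# basis and keep only the part useful for vectors of squared length <= 10.
print(lll_with_removals([[1, 1, 3], [1, 0, 5], [0, 1, 7]], 10))
```

Roughly speaking, the gradual strategy feeds the p-adic data into the lattice a little at a time and applies a routine like this after each increment, so that vectors are discarded before their entries can grow far beyond the size of the vectors being sought.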
A good complexity analysis was given in the PhD thesis of Andy Novocin and in HN'2010. The algorithm needed some modifications in order to make its complexity boundable. We showed in ISSAC'2011 that these modifications can be made in a way that does not harm the practical performance; in fact, the practical performance was further improved by the "early termination" strategy. This meant that, for the first time since LLL 1982, the algorithm with the "best practical performance" was the same as the one with the "best theoretical complexity".
The "early termination" strategy is about Hensel lifting. Say we Hensel lift the p-adic factors mod p^a. In Zassenhaus, p^a is high enough to ensure that if f is reducible, then the factors of f can be recovered from their images mod p^a. That is fine if f has two large factors, however, if f has only 1 large factor (f is irreducible, or the product of one large factor and some small factors) then p^a is usually higher than necessary. We only need p^a to be high enough so that (i) the combinatorial problem can be solved (if f is irreducible, then "solving the combinatorial problem" = "proving irreducibility") and (ii) all factors except the biggest one can be recovered from their images mod p^a. In practice it is common for a polynomial f to have 1 large factor, and zero or more small factors. For those inputs, i.e. for most inputs, we can factor f with less Hensel lifting than the default in Zassenhaus algorithm. This "less Hensel lifting" means that we can factor faster than Zassenhaus on most inputs. If you write a new factorization implementation, the "early termination" strategy (i.e. start out with less Hensel lifting than in Zassenhaus) should probably be part of the implementation, because it speeds up the computation for most inputs, and, as shown in ISSAC'2011, it does not hurt the CPU time on "hard" inputs.
A more detailed complexity analysis is given in my ISSAC'2013 tutorial slides (to check all steps you will also need HN'2010 and ISSAC'2011). On slide 3 you can see that the complexity still has degree 6. However, the old degree-6 term was N^4(N^2+h^2), whereas the new degree-6 term is r^6, which is usually much smaller (r can be as high as N/2, but that only happens for very rare inputs such as Swinnerton-Dyer polynomials; otherwise r is typically much smaller than N).