Question :
Running the code of linear binary pattern for Adrian. This program runs but gives the following warning:
C:\Python27\lib\site-packages\sklearn\svm\base.py:922: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
"the number of iterations.", ConvergenceWarning
I am running Python 2.7 with OpenCV 3.7. What should I do?
Answer #1:
When an optimization algorithm does not converge, it is usually because the problem is not well-conditioned, perhaps due to poor scaling of the decision variables. There are a few things you can try:
1. Normalize your training data so that the problem hopefully becomes more well-conditioned, which in turn can speed up convergence. One possibility is to scale your data to zero mean and unit standard deviation using scikit-learn's StandardScaler. Note that you have to apply the StandardScaler fitted on the training data to the test data.
2. Related to 1), make sure the other arguments, such as the regularization weight C, are set appropriately.
3. Set max_iter to a larger value. The default is 1000.
4. Set dual=True if the number of features > the number of examples, and vice versa. This solves the SVM optimization problem using the dual formulation. Thanks @Nino van Hooff for pointing this out, and @JamesKo for spotting my mistake.
5. Use a different solver, e.g., the L-BFGS solver if you are using Logistic Regression. See @5ervant's answer.
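The suggestions above can be sketched as follows. This is a minimal illustration assuming scikit-learn is installed; the dataset is synthetic, so substitute your own X and y.

```python
# Sketch of suggestions 1-3: standardize features, set C explicitly,
# and raise max_iter if needed. Synthetic data is used for illustration only.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit on training data only
X_test_scaled = scaler.transform(X_test)        # reuse the same statistics on test data

clf = LinearSVC(C=1.0, max_iter=1000)           # raise max_iter if the warning persists
clf.fit(X_train_scaled, y_train)
print(clf.score(X_test_scaled, y_test))
```

Note that the scaler is fitted only on the training split and then applied unchanged to the test split, as point 1 requires.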
Note: one should not ignore this warning.
This warning came about for the following reason:

Solving the linear SVM is just solving a quadratic optimization problem. The solver is typically an iterative algorithm that keeps a running estimate of the solution (i.e., the weight and bias for the SVM).
It stops running when the solution corresponds to an objective value that is optimal for this convex optimization problem, or when it hits the maximum number of iterations set. 
If the algorithm does not converge, then the current estimate of the SVM's parameters is not guaranteed to be any good, hence the predictions can also be complete garbage.
Edit
In addition, consider the comment by @Nino van Hooff and @5ervant to use the dual formulation of the SVM. This is especially important if the number of features you have, D, is greater than the number of training examples N. This is what the dual formulation of the SVM is particularly designed for and helps with the conditioning of the optimization problem. Credit to @5ervant for noticing and pointing this out.
Furthermore, @5ervant also pointed out the possibility of changing the solver, in particular the use of the LBFGS solver. Credit to him (i.e., upvote his answer, not mine).
I would like to provide a quick, rough explanation for those who are interested (I am :)) of why this matters in this case. Second-order methods, and in particular approximate second-order methods like the L-BFGS solver, help with ill-conditioned problems because they approximate the Hessian at each iteration and use it to scale the gradient direction. This allows them to achieve a better convergence rate, but possibly at a higher compute cost per iteration. That is, it takes fewer iterations to finish, but each iteration is slower than in a typical first-order method like gradient descent or its variants.
For e.g., a typical firstorder method might update the solution at each iteration like
x(k + 1) = x(k) – alpha(k) * gradient(f(x(k)))
where alpha(k), the step size at iteration k, depends on the particular choice of algorithm or learning rate schedule.
A second order method, for e.g., Newton, will have an update equation
x(k + 1) = x(k) – alpha(k) * Hessian(x(k))^(-1) * gradient(f(x(k)))
That is, it uses the information of the local curvature encoded in the Hessian to scale the gradient accordingly. If the problem is illconditioned, the gradient will be pointing in less than ideal directions and the inverse Hessian scaling will help correct this.
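The effect of conditioning on these two update rules can be demonstrated numerically. The sketch below (my own illustration, not from the answer) compares plain gradient descent against a single Newton step on an ill-conditioned quadratic f(x) = 0.5 * x^T A x, whose Hessian is just A:

```python
# Gradient descent vs. Newton's method on an ill-conditioned quadratic.
import numpy as np

A = np.diag([1.0, 100.0])          # Hessian; condition number 100
x0 = np.array([1.0, 1.0])          # starting point; the minimizer is [0, 0]

def gradient(x):
    return A @ x                   # gradient of 0.5 * x^T A x

# First-order update: x <- x - alpha * gradient. The step size is limited by
# the largest curvature (100), so the flat direction converges very slowly.
x = x0.copy()
alpha = 1.0 / 100.0
gd_steps = 0
while np.linalg.norm(gradient(x)) > 1e-6:
    x = x - alpha * gradient(x)
    gd_steps += 1                  # takes over a thousand iterations here

# Second-order (Newton) update: x <- x - Hessian^(-1) * gradient.
# Rescaling by the inverse Hessian solves this quadratic in a single step.
x_newton = x0 - np.linalg.solve(A, gradient(x0))

print(gd_steps)
print(x_newton)                    # exactly [0. 0.]
```

The inverse-Hessian scaling corrects the gradient direction exactly here because the problem is quadratic; for general problems the correction is only local, but the intuition is the same.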
In particular, LBFGS mentioned in @5ervant‘s answer is a way to approximate the inverse of the Hessian as computing it can be an expensive operation.
However, second-order methods may converge in far fewer iterations than first-order methods like the usual gradient-descent-based solvers, which, as you have seen by now, sometimes fail to converge at all. This can compensate for the extra time spent at each iteration.
In summary, if you have a well-conditioned problem, or if you can make it well-conditioned through other means such as regularization and/or feature scaling and/or making sure you have more examples than features, you probably don't have to use a second-order method. But these days, with many models optimizing non-convex problems (e.g., those in DL models), second-order methods such as L-BFGS play a different role there, and there is evidence to suggest they can sometimes find better solutions compared to first-order methods. But that is another story.
Answer #2:
I reached the point of setting max_iter=1200000 on my LinearSVC classifier, but the ConvergenceWarning was still present. I fixed the issue by simply setting dual=False and leaving max_iter at its default.
With a LogisticRegression(solver='lbfgs') classifier, you should increase max_iter instead. Mine reached max_iter=7600 before the ConvergenceWarning disappeared when training on a large dataset's features.
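A minimal sketch of this answer's fix, assuming scikit-learn; the dataset below is synthetic and stands in for your own features:

```python
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# With more examples than features, the primal formulation (dual=False) is the
# appropriate choice, and it typically converges without raising the warning.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

clf = LinearSVC(dual=False)        # max_iter left at its default (1000)
clf.fit(X, y)
print(clf.score(X, y))
```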
Answer #3:
Explicitly specifying max_iter resolves the warning, as the default max_iter is 100 [for Logistic Regression]:
logreg = LogisticRegression(max_iter=1000)
Answer #4:
Please increase max_iter to 10000, as the default value is 100 for LogisticRegression. Increasing the number of iterations will possibly help the algorithm converge. For me it converged, and the solver was 'lbfgs':
log_reg = LogisticRegression(solver='lbfgs',class_weight='balanced', max_iter=10000)