A Direct Logistic Fit of the 1950-1990 Data
We have been using calculus ideas to study the USA population census data. The results have led us to the conclusion that a logistic curve provides the best fit of the data for the period 1950-1990. This is an important conclusion because it suggests that of all the possible functions to choose from, we should try to fit a logistic function directly to the data. What does this mean? It means that starting with the data points and the function
, where
.
our problem is to determine the values of
that minimize the distances between the given census data points and the corresponding values on the logistic curve. How do we do this? If instead of the logistic function the function were linear, we could use Maple's built-in least squares line-fitting routine. We already have used this tool at several junctures in our study. But Maple has no built-in non-linear curve fitting routines. So, it looks like we are stuck. However, the Maple Users Group (MUG) has come to the rescue just in time. Members of the MUG have developed a set of non-linear curve fitting procedures that we will try out here. You should comment at the end on how well you think these numerical methods work.
MUG nonlinear curve fit procedure: activate the cells inside
After activating the above cells, we first read in our data points:
>
dataCensus:=readdata(USAPopCensus17901990,[integer,float]):
ndc:=nops(dataCensus):
data3:=[seq(dataCensus[i],i=17..21)];
Next, we define the function for which we want to determine the constants L, K, c in the expression for u:
>
with(math3):
clear (L,K,c);
u:=L/(1+exp(K*c)*exp(-K*t));
Following the outline provided by the MUG, we make a list of t-values:
> Lt:=[seq(data3[i][1],i=1..5)];
Then we make a list of the corresponding y-values:
> Ly:=[seq(data3[i][2],i=1..5)];
Now, we have to make a guess as to the approximate values of L, K, c. From our previous work this is not a problem and we take
. We then follow the instructions from the MUG quote: "The default number of iterations is 5. We [might] try using 4 iterations (
iter = 4
). The last argument (optional) must be a name. When present the iteration history will be put in that variable. The first 5 arguments of the procedure must be present and in the proper order. The last two arguments are optional. If both are present the number of iterations must be given first. In that case the same name may be used over and over again without apostrophes."
>
with(math3):
clear(L,K,c,sq);
> NLF:=nonlinfit(Lt,Ly,y=u,t,{L=350000,K=.029,c=1961},iter=5,sq);
> sq;
Now that the fit is complete, here is an example of how to get hold of the expressions we want to use:
>
sq[5];
sq[5][1];
rhs(sq[5][1]);
g:=rhs(NLF);
g:=unapply(g,t);
We grab L, K, c from the output:
>
with(math3):
clear(L,K,c);
L:=rhs(sq[4][1]);
c:=rhs(sq[4][2]);
K:=rhs(sq[4][3]);
Then we plot the fitted logistic function and the data points:
>
f:=x->L/(1+exp(-K*(x-c)));
p1:=plot(f(x),x=1950..2010):
p2:=plot(data3,style=point):
with(plots):
display([p1,p2]);
And for fun, we project the curve further into the future in a separate graph:
>
p1:=plot(f(x),x=1950..2100):
p2:=plot(data3,style=point):
with(plots):
display([p1,p2]);
Finally, as we have done in the past we compute the total error for comparison purposes:
>
ferror:=[seq([data3[i][1],abs(f(1950+10*(i-1))-data3[i][2]),abs(f(1950+10*(i-1))-data3[i][2])/data3[i][2]*100],i=1..5)]:
toterrorf:=add((f(1950+10*(i-1))-data3[i][2])^2,i=1..5):
with(math3):
headers:=["year","absolute error","% error"]:
printtable(ferror,"NLF-Direct Logistic Fit", headers);
TotalSqrError:=toterrorf;
NLF-Direct Logistic Fit
year absolute error % error
--------------------------------------------------
1950 727.241600 .479445
1960 1419.593400 .785734
1970 241.342700 .117798
1980 1041.434000 .457327
1990 577.996500 .231269