Frequently Asked Questions on Stata
Installation and Introduction
Data Input and Output
- How do I read data from an Excel spreadsheet into Stata?
- How do I save data that I am using to a Stata file?
- How do I download a Stata .dta file from Blackboard? (When I click on the file I see nonsense characters on the screen.)
- How do I read an ASCII file (usually a .txt or .csv file) into Stata?
- Can Stata read XML files? If I have an Excel spreadsheet, can I convert it to XML and then have Stata read this?
Basic Stata Use
- How do I set my working directory?
- How do I calculate means, variances, and standard deviations?
- How do I delete observations from a data set?
- How do I have Stata report normal tail areas and inverse normal tail areas?
- How do I use Stata to calculate tail areas and critical values for the t distribution?
- How do I calculate the correlation between two variables?
- How do I generate a list of random numbers from a uniform distribution?
- How do I calculate confidence intervals?
- How do I calculate fitted values and residuals from a regression?
Graphing
- How do I make a bar plot?
- How do I copy, print or save a graph?
- How do I make a boxplot?
- How do I make a scatter plot?
- How do I add a title to a scatter plot?
- How do I overlay a regression line on a scatter plot?
- How do I make a scatter plot such that there are different symbols for different points?
- How do I overlay a normal density on a histogram?
- How do I change the size of the labels in a bar plot?
- How do I change the scale of a y-axis?
- How do I add a title to a plot?
- How do I tell Stata to plot a function?
- How do I make a histogram where bin width is N?
- How do I change the legend labels for a scatter plot that has, for example, one symbol for females and one for males?
- How do I have Stata combine two regression lines on the same plot?
Potential Problems
- Why don't my graphs show up properly in Mac:Word?
- Why can't I open my .dta file?
Question:
How do I read data from an Excel spreadsheet into Stata?
Answer: Start by opening your spreadsheet in Excel.
Use your mouse to highlight the columns in your spreadsheet that
you want to copy to Stata. Make sure that you highlight complete
columns, i.e., do not ignore a header line if one exists (a header
line contains variable names). Now, select Copy from the Edit menu
in Excel. At this point you can quit Excel but doing so is not required.
Start Stata as you normally would. From the command line type "edit"
and you should now see a blank spreadsheet. Select "Paste"
from the Edit menu in Stata, and you should see your data. Close
the edit window, and you are done.
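As an alternative to copying and pasting, newer releases can read the spreadsheet file directly. The following is a minimal sketch, assuming Stata 12 or later and a hypothetical file called mydata.xlsx in your working directory whose first row contains variable names:

```stata
* mydata.xlsx is a hypothetical filename used only for illustration.
* firstrow treats the first spreadsheet row as variable names;
* clear replaces whatever data is currently in memory.
import excel using "mydata.xlsx", firstrow clear

* Check that the variables arrived with the expected names and types.
describe
```

If your copy of Stata does not recognize "import excel", the copy-and-paste steps above still apply.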
Return
to top of page.
Question:
How do I save data that I am using to a Stata file?
Answer: Select "Save"
or "Save As" from the Stata File menu. You will be prompted
for a directory where you want to save your dataset. Pick a good
place, select a good filename, and save it. Now, to load your dataset,
you can double-click on it. This will start Stata and automatically
load your data.
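The same thing can be done from the command line. This is a small sketch that uses the auto example dataset shipped with Stata; the filename mydata is hypothetical:

```stata
* Load an example dataset so that there is something to save.
sysuse auto, clear

* Save it in the working directory; Stata adds the .dta suffix.
* replace overwrites an existing file of the same name.
save mydata, replace

* Later, read the file back into memory.
use mydata, clear
```

Without the replace option, Stata refuses to overwrite an existing file, which can be a useful safeguard.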
Return
to top of page.
Question:
How do I download a Stata .dta file from Blackboard? (When I click
on the file I see nonsense characters on the screen.)
Answer: The problem is that your
browser thinks that the .dta file is a text file. Right click on
the file (or Control-Click if using a one-button mouse on a Mac)
and save the file to your hard disk. Then make sure that the file
has the correct .dta suffix; sometimes browsers will add a .txt
suffix after a .dta suffix.
Return
to top of page.
Question:
How do I read an ASCII file (usually a .txt or .csv file) into Stata?
Answer: Use the insheet command.
Let's suppose that the file you want to read is called newdata.txt
and is on your Desktop. Go to the File menu and choose Select Working
Folder. Choose your Desktop as the folder. To see if your ASCII
file called newdata.txt is indeed in the folder, type "ls"
to list the files in your working directory (which you set to the
Desktop). If you do not see the file, then stop and either move
the file to your Desktop or select a working folder that contains
the file. Now, type "insheet using newdata.txt" to read
the file into Stata. This assumes that the file has a header line.
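In Stata 13 and later, "import delimited" is the documented way to read delimited text files. The sketch below first writes a small comma-separated file so that it can be run as is; the filename demo_newdata.csv is hypothetical:

```stata
* Create a small comma-separated file to read (illustration only).
sysuse auto, clear
export delimited using "demo_newdata.csv", replace

* Read it back. varnames(1) says the first line holds variable names.
import delimited using "demo_newdata.csv", varnames(1) clear
describe
```

For your own file, only the "import delimited" line is needed, with your filename in place of the hypothetical one.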
Return
to top of page.
Question:
How do I copy, print or save a graph?
Answer:
After you have a plot in a Stata graph window, right click on it
(or Control-Click if you are using a one-button mouse on a Mac).
This will give you a menu that will allow you to print the graph.
Or, you can save it in a variety of formats. Or, you can copy your
plot to the clipboard. If you choose the copy option, then open
a word processor like Microsoft Word and select Paste. This will
put a copy of the graph in a text document.
Note: Graphs copied and pasted into MS-Word for Windows do not show up properly when
the Word document is opened on a Mac. To work around this problem, save the graph from Stata as a .tif
file first and then insert that file into your Word document using Insert->Picture->From File. The
graph will then appear correctly whether the Word document is opened on a Mac or a Windows machine.
Return
to top of page.
Question:
How do I make a bar plot?
Answer:
Suppose that you have a variable called height and a variable called
gender. If you type "graph bar height" you will get a
single bar showing the mean of height over all observations in your dataset
(the mean is the default statistic). You can
also type "graph bar height, over(gender)" to get side-by-side
bars of mean height for men and women.
Return
to top of page.
Question:
How do I set my working directory?
Answer:
Your working directory is the directory or folder in which Stata
looks when you give it a disk access command.
To see what your current working directory is, type "pwd"
To see the files and folders in your working directory, type "ls"
To move up one level in your directory tree, type "cd .."
For instance, if you are in /Home/Users/johndoe/Stata and you type
"cd .." then you will be in /Home/Users/johndoe.
If your current working directory has a folder called "myfolder"
in it and you want to change your working directory to "myfolder",
type "cd myfolder". This, for example, would move you
from "/Home/Users/johndoe/Stata" to "/Home/Users/johndoe/Stata/myfolder".
Important note! If your folder name has a space in it, you have to enclose
the name in quotes. For example: cd "My Folder"
You can use multiple "cd .." and "cd NAMEHERE"
commands to move anywhere you want on your hard disk (NAMEHERE
refers to a folder into which you want to move; remember to enclose
NAMEHERE in quotes if necessary).
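The commands above can be tried together in a short session. This sketch creates and then removes a scratch folder; the name "My Folder" is hypothetical, and the sequence assumes no folder of that name already exists in your working directory:

```stata
* Show where Stata is currently looking.
pwd

* Create a subfolder whose name contains a space, then move into it.
* The quotes are required because of the space.
mkdir "My Folder"
cd "My Folder"
pwd

* Move back up one level and remove the empty scratch folder.
cd ..
rmdir "My Folder"
```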
Return
to top of page.
Question:
How do I make a boxplot?
Answer:
See the FAQ item for bar plots. The syntax is
very similar, i.e., graph box VAR1, over(VAR2) with VAR1 and VAR2
suitably defined. The "over(VAR2)" part can be dropped
in which case a boxplot of VAR1 for all observations will be produced.
Return
to top of page.
Question:
How do I calculate means, variances, and standard deviations?
Answer:
Use the command "summarize". You can simply type "summarize",
in which case you will get means, standard deviations, and so forth
for all variables in memory. Or, you can type
summarize VARNAME
which will give you a summary of the variable VARNAME. Also, add the option
detail, as in "summarize, detail" or
summarize VARNAME, detail
to get the variance along with skewness, kurtosis, and various percentiles.
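After "summarize" runs, its results can also be retrieved individually. A short sketch using the auto example dataset that ships with Stata (the variable price is part of that dataset):

```stata
sysuse auto, clear

* detail adds the variance, skewness, kurtosis, and percentiles.
summarize price, detail

* summarize leaves its results behind in r(); pull out the ones needed.
display "mean     = " r(mean)
display "variance = " r(Var)
display "std dev  = " r(sd)
```

Typing "return list" right after summarize shows everything that was stored.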
Return
to top of page.
Question:
How do I delete observations from a data set?
Answer: Use the "drop" command. Suppose that a data set
has 10 observations. If you type "drop in 5" then the
5th observation will be deleted. Similarly, you can type "drop
in 1/3" to drop the first three observations. Another way to
delete observations is to use an "if" clause. For example,
"drop if VARNAME<4" will drop all observations that
have VARNAME<4. One could have more complicated expressions
like "drop if VARNAME1<5 & VARNAME2>5" and so
forth.
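Here is a small sketch of both forms using the auto example dataset that ships with Stata; the cutoffs are arbitrary and chosen only for illustration:

```stata
sysuse auto, clear
count

* Drop the first three observations by position.
drop in 1/3

* Drop observations that satisfy a condition on two variables.
drop if price < 4000 & mpg > 25

* See how many observations remain.
count
```

One thing to keep in mind: Stata treats missing values as larger than any number, so a condition such as "drop if VARNAME>5" also drops observations where VARNAME is missing.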
Return
to top of page.
Question:
How do I have Stata report normal tail areas and inverse normal
tail areas?
Answer:
To compute the left tail area for a given z value, use the following
two commands:
scalar y = norm(z)
scalar list y
where z is the value of interest.
To compute the inverse tail area for an area equal to p, use the following
two commands:
scalar y = invnorm(p)
scalar list y
The use of y is generic, and any acceptable label will work.
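Recent releases of Stata document these functions under the names normal() and invnormal(). If norm() or invnorm() is not recognized in your version, the sketch below shows the equivalent calls, using "display" to print a result without storing it in a scalar:

```stata
* Left tail area below z = 1.96 (roughly 0.975).
display normal(1.96)

* Right tail area above z = 1.96 (roughly 0.025).
display 1 - normal(1.96)

* z value whose left tail area is 0.975 (roughly 1.96).
display invnormal(0.975)
```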
Return
to top of page.
Question:
How do I make a scatter plot?
Answer:
Use the command "scatter", as in:
scatter YVAR XVAR
which will make a scatter plot with YVAR on the y-axis and XVAR on the
x-axis.
Question: How do I overlay a regression line on a
scatter plot?
Answer:
Use the following command:
twoway (scatter YVAR XVAR) (lfit YVAR XVAR)
Note that YVAR and XVAR must be in this specified order.
Return
to top of page.
Question: How do I make a scatter plot such that there are different
symbols for different points?
Answer:
The solution is to use the graphing option "msymbol" ("m"
stands for marker) in conjunction with two or more || clauses. For
example, the following command will make a scatter plot of the two
variables height and years with squares for men and circles for
women:
scatter height years if gender=="m", msymbol(square) || scatter height years if gender=="f", msymbol(circle)
One can string together any number of clauses depending on the number
of categories desired. To see the types of symbols use the command
"palette symbolpalette." Note that Stata abbreviates things
like "S" for square and so forth, i.e., "msymbol(S)"
is the same as "msymbol(square)."
Return
to top of page.
Question: How do I calculate the correlation between two variables?
Answer:
Use the command "correlate" as in
correlate VARNAME1 VARNAME2
which will produce a 2 x 2 correlation matrix. One can also type
correlate VARNAME1 VARNAME2 ... VARNAMEk
which will produce a k x k correlation matrix. Typing "correlate"
without any arguments produces a correlation matrix for all variables.
Return
to top of page.
Question: How do I overlay a normal density on a histogram?
Answer:
Use the option "normal" when making the histogram. For example:
histogram VARNAME, normal
will add a normal density to a histogram of VARNAME. The estimated mean
and variance of the density are based on sample moments.
Return
to top of page.
Question: How do I generate a list of random numbers from a uniform
distribution?
Answer:
The command "generate x = uniform()" will draw random values from
the unit interval. These values can be scaled to arbitrary intervals.
For example, "generate x1 = x*10" after assigning x as before will generate
x1 from a uniform distribution on the interval from zero to
ten.
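Recent releases of Stata document the uniform generator under the name runiform(). The following sketch assumes such a release and starts from an empty dataset; the seed and the number of observations are arbitrary:

```stata
* Start with an empty dataset and decide how many draws are wanted.
clear
set obs 100

* Setting the seed makes the draws reproducible.
set seed 12345

* Draw from the unit interval, then rescale to the interval 0 to 10.
generate x = runiform()
generate x1 = 10*x

summarize x x1
```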
Return
to top of page.
Question: How do I calculate confidence intervals?
Answer:
Use the command "ci". For example, to make a 98% confidence interval
for a continuous variable called VARNAME, enter "ci VARNAME, level(98)".
Note the use of the level option to specify the level of the desired
confidence interval. If your variable is binary as opposed to continuous,
i.e., consists of zeroes and ones, then add the option "binomial"
as in "ci VARNAME, binomial level(98)". There are various options
for how to calculate binomial confidence intervals; these are described
in online help.
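In Stata 14 and later the syntax of "ci" changed: the type of interval is given as a subcommand rather than as an option. A sketch under that assumption, using the auto example dataset (price is continuous and foreign is coded 0/1):

```stata
sysuse auto, clear

* 98% confidence interval for the mean of a continuous variable.
ci means price, level(98)

* 98% confidence interval for a proportion of a 0/1 variable.
ci proportions foreign, level(98)
```

If these commands produce a syntax error, your version most likely expects the older form described above.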
Return
to top of page.
Question: How do I use Stata to calculate tail areas and critical
values for the t distribution?
Answer:
Use the function ttail(n,t), where n is the degrees of freedom and t
is the value of interest; it returns the area to the right of t. Also, use the function invttail(n,p),
where p is a right tail area from a t distribution with n degrees
of freedom; it returns the corresponding critical value.
For example,
scalar y = ttail(5,2)
scalar list y
will return the upper tail area (to the right of 2) of a t distribution
with 5 degrees of freedom.
Similarly,
scalar y = invttail(5,.05)
scalar list y
will return the critical value from a t distribution with 5 degrees
of freedom such that the area to the right of the value is 0.05.
NOTE:
Stata is very picky about spaces. Make sure that function calls
do not have a space between the function name and the opening parenthesis. In other words, "ttail
(5,2)" will generate an error but "ttail(5,2)" will
not.
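Because ttail() and invttail() work with the right tail only, two-sided quantities need a small adjustment. A brief sketch, with the test statistic 2 and the 5 degrees of freedom chosen only for illustration:

```stata
* Right tail area beyond t = 2 with 5 degrees of freedom.
display ttail(5, 2)

* Two-sided p-value for the same statistic: double the tail area
* beyond the absolute value.
display 2*ttail(5, abs(2))

* Critical value for a two-sided 95% interval: put 0.025 in each tail.
display invttail(5, 0.025)
```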
Return
to top of page.
Question: How do I add a title to a scatter plot?
Answer:
Use the option "title" with scatter as in:
scatter VAR1 VAR2, title("TITLE GOES HERE")
Return
to top of page.
Question: How do I calculate fitted values and residuals from a regression?
Answer:
After a successful "regress" command, the command
predict r, residual
will create a new variable r that contains residual values. Any variable name may be used.
Similarly,
predict yhat
will create a new variable yhat that contains fitted values. As before, any variable name
may be used.
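Put together, a complete run might look like the following sketch, which uses the auto example dataset that ships with Stata; the choice of price and mpg is arbitrary:

```stata
sysuse auto, clear

* Fit the regression first; predict uses the most recent estimates.
regress price mpg

* Fitted values (the default for predict) and residuals.
predict yhat
predict r, residuals

* The residuals should average out to roughly zero.
summarize yhat r
list price yhat r in 1/5
```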
Return
to top of page.
Question: How do I change the size of the labels in a bar plot?
Answer:
You need to set the option labsize in your bar plot command. For example,
graph bar VARNAME1, over(VARNAME2,label(labsize(small)))
Here "small" refers to a given size. Other sizes are possible, e.g.,
medsmall, large, and so forth. To see the available sizes type:
graph query textsizestyle
Return
to top of page.
Question: Why don't my graphs show up properly in Mac:Word?
Answer:
Graphs copied and pasted into MS-Word for Windows do not show up properly when
the Word document is opened on a Mac. To work around this problem, save the graph from Stata as a .tif
file first and then insert that file into your Word document using Insert->Picture->From File. The
graph will then appear correctly whether the Word document is opened on a Mac or a Windows machine.
Return
to top of page.
Question: Why can't I open my .dta file?
Answer:
If you see the error "no room to add more observations...", Stata is telling you that the file is too large for the amount of memory
allocated to the program. (The default allocation is 1 MB.) You can allocate more memory to Stata by using the 'set memory'
command as follows:
set mem 10m
This will allocate 10 MB to Stata, which should be sufficient for most data sets.
You can make the change in allocation permanent by using the 'perm' option. For example:
set mem 10m, perm
This will allocate 10 MB to Stata every time the program is started.
Note: Stata 12 and later manage memory automatically, so 'set memory' is not needed in those versions.
Return
to top of page.
Question: How do I change the scale of a y-axis?
Answer:
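To set the y-axis scale, use the "yscale()" option at the end of your plot command. For example:
hist partners, by(sororityfrat) yscale(range(0 .4))
This produces a histogram whose y-axis covers at least 0 to 0.4. Note that range() can widen the axis but will not cut it off inside the range of the plotted data.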
Return
to top of page.
Question: How do I add a title to a plot?
Answer:
Use the "title()" option at the end of your plot command. For example:
hist partners if partners < 30, title("Partners Under 30")
Return
to top of page.
Question: How do I tell Stata to plot a function?
Answer:
Use the "twoway function" command. Here is an example:
twoway function y=2 * x + 3, range(0 4)
This will plot the function f(x) = 2x+3 for x=0 to x=4.
If you have a function that needs to be broken into pieces, use various pieces joined by ||. For example,
twoway function y=2 * x + 3, range(0 4) || function y=3*x-9, range(4 6)
There is nothing special about a two-part function.
Return
to top of page.
Question: Can Stata read XML files? If I have an Excel spreadsheet, can I convert it to XML and then have Stata read this?
Answer:
Yes. There are two steps: first convert the Excel file to XML, then read the XML file into Stata.
In Excel:
- Open the xls file.
- Click File, then Save As.
- Under Save as Type, scroll down to XML Spreadsheet.
- Click Save.
In Stata:
- Click File, then Import, then XML Data.
- Browse for the file.
- For document type, select Excel Spreadsheet.
- Check the "First Row is Variable Names" box.
- Click OK.
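The Stata half of these steps can also be typed as a command. This is only a sketch: it assumes that your version of Stata provides the "xmluse" command and that the file saved from Excel is called mydata.xml, a hypothetical name:

```stata
* mydata.xml is a hypothetical file saved from Excel as "XML Spreadsheet".
* doctype(excel) matches the "Excel Spreadsheet" document type in the dialog;
* firstrow treats the first row as variable names.
xmluse "mydata.xml", doctype(excel) firstrow clear
describe
```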
Return
to top of page.
Question: How do I make a histogram where bin width is N?
Answer:
Use the width option. For example, histogram VARNAME, width(N).
Return
to top of page.
Question: How do I change the legend labels for a scatter plot that has, for example, one symbol for females and one for males?
Answer:
You need to use a legend option to do this. For example, suppose that you want to make a scatter plot of y on x based on a third variable gender. You could use the following command:
scatter y x if gender=="f", msymbol(circle) || scatter y x if gender=="m", msymbol(square) legend(label(1 "Female") label(2 "Male"))
This is easily generalized to multiple groups, different colors, and so forth.
Return
to top of page.
Question: How do I have Stata combine two regression lines on the same plot?
Answer:
If you want to plot a regression line (y on x) for one set of points (gender == "F") and another for a different set of points (gender == "M"), you can use the following command:
graph twoway (scatter y x if gender=="F") (lfit y x if gender=="F") (scatter y x if gender=="M") (lfit y x if gender=="M")
This command is easily generalized for multiple groups and so forth.
Return
to top of page.