Home >
Software & Computers >
Software >
Statistics >

How do I read data from an Excel spreadsheet into Stata?

How do I save data that I am using to a Stata file?

How do I download a Stata .dta file from blackboard? (When I click on the file I see nonsense characters on the screen.)

How do I read an ascii file (usually a .txt or .csv file) into Stata?

Can Stata read XML files? If I have an Excel spreadsheet, can I convert it to XML and then have Stata read this?

How do I set my working directory?

How do I calculate means, variances, and standard deviations?

How do I delete observations from a data set?

How do I have Stata report normal tail areas and inverse normal tail areas?

How do I use Stata to calculate tail areas and critical values for the t distribution.

How do I calculate the correlation between two variables?

How do I generate a list of random numbers from a uniform distribution?

How do I calculate confidence intervals?

How do I calculate fitted values and residuals from a regression?

How do I divide a variable by a constant (or multiply it or add to it)?

I want to calculate a chi-squared statistic for some tabulated data. I have the table, but do not have the data that generated it. How can I have Stata calculate the statistics that I need?

How do I make a bar plot?

How do I copy, print or save a graph?

How do I make a boxplot?

How do I make a scatter plot?

How do I add a title to a scatter plot?

How do I overlay a regression line on a scatter plot?

How do I make a scatter plot such that there are different symbols for different points?

How do I overlay a normal density on a histogram?

How do I change the size of the labels in a bar plot?

How do I change the scale of a y-axis?

How do I add a title to a plot?

How do I tell Stata to plot a function?

How do I make a histogram where bin width is N?

How do I change the legend labels for a scatter plot has, for example, one symbol for females and one for males?

How do I have Stata combine two regression lines on the same plot?

How do I make a histogram for a distribution if I already have aggregated counts?

Why don't my graphs show up properly in Mac:Word?

Why can't I open my .dta file?

**Question: How do I read data from an Excel spreadsheet into Stata?**

Answer: Start by opening your spreadsheet in Excel. Use your mouse to highlight the columns in your spreadsheet that you want to copy to Stata. Make sure that you highlight complete columns, i.e., do not ignore a header line if one exists (a header line contains variable names). Now, select Copy from the Edit menu in Excel. At this point you can quit Excel but doing so is not required. Start Stata as you normally would. From the command line type "edit" and you should now see a blank spreadsheet. Select "Paste" from the Edit menu in Stata, and you should see your data. Close the edit window, and you are done.

**Question: How do I save data that I am using to a Stata file?**

Answer: Select "Save" or "Save As" from the Stata File menu. You will be prompted for a directory where you want to save your dataset. Pick a good place, select a good filename, and save it. Now, to load your dataset, you can double-click on it. This will start Stata and automatically load your data.

Answer: The problem is that your browser thinks that the .dta file is a text file. Right click on the file (or Control-Click if using a one-button mouse on a Mac) and save the file to your hard disk. Then make sure that the file has the correct .dta suffix; sometimes browsers will add a .txt suffix after a .dta suffix.

**Question: How do I read an ascii file (usually a .txt or .csv file) into Stata?**

Answer: Use the insheet command. Let's suppose that the file you want to read is called newdata.txt and is on your Desktop. Go to the File menu and choose Select Working Folder. Choose your Desktop as the folder. To see if your ascii file called newdata.txt is indeed in the folder, type "ls" to list the files in your working directory (which you set to the Desktop). If you do not see the file, then stop and either move the file to your Desktop or select a working folder that contains the file. Now, type "insheet using newdata.txt" to read the file into Stata. This assumes that the file has a header line.

**Question: How do I copy, print or save a graph?**

Answer: After you have a plot in a Stata graph window, right click on it (or Control-Click if you are using a one-button mouse on a Mac). This will give you a menu that will allow you to print the graph. Or, you can save it in a variety of formats. Or, you can copy your plot to the clipboard. If you choose the copy option, then open a word processor like Microsoft Word and select Paste. This will put a copy of the graph in a text document.

**Note:** Graphs copied and pasted into MS-Word for Windows do not show up properly when the Word document is opened on a Mac. To work around this problem, save the graph from Stata as a .tif file first and then insert that file into your Word document using Insert->Picture->From File. The graph will then appear correctly whether the Word document is opened on a Mac or a Windows machine.

Answer: Use the "tabi" command. For example, the command

tabi 9 7 \ 13 7\ 29 26\ 50 47 \ 41 49, chi2

will make a 5 x 2 table using the provided counts and will calculate the chi-squared statistic.

**Question: How do I make a bar plot?**

Answer: Suppose that you have a variable called height and a variable called gender. If you type "graph bar height" you will get a barplot of height for all observations in your dataset. You can also type "graph bar height, over(gender)" to get side-by-side bar plots of height for men and women.

**Question: How do I set my working directory?**

Answer: Your working directory is the directory or folder in which Stata looks when you give it a disk access command.

To see what your current working directory is, type "pwd"

To see the files and folders in your working directory, type "ls"

To move up one level in your directory tree, type "cd .."

For instance, if you are in /Home/Users/johndoe/Stata and you type "cd .." then you will be in /Home/Users/johndoe.

If your current working directory has a folder called "myfolder" in it and if you want to change your working directory to "myfolder," type "cd myfolder." This, for example, could move you from "/Home/Users/johndoe/Stata" to /Home/Users/johndoe/Stata/myfolder."

Important note! If your folder name has a space in it, you have to enclose the name in quotes. For example: cd "My Folder"

You can use multiple "cd .." and "cd NAMEHERE" commands to move anywhere you want in your hard disk (NAMEHERE refers to a folder into which you want to move; remember to enclose NAMEHERE in quotes if necessary).

**Question: How do I make a boxplot?**

Answer: See the FAQ item for bar plots. The syntax is very similar, i.e., graph box VAR1, over(VAR2) with VAR1 and VAR2 suitably defined. The "over(VAR2)" part can be dropped in which case a boxplot of VAR1 for all observations will be produced.

**Question: How do I calculate means, variances, and standard deviations?**

Answer: Use the command "summarize." You can simply type "summarize," in which case you will get means, standard deviations, and so forth for all variables in memory. Or, you can type

summarize VARNAME

which will give you a summary of the variable VARNAME. Also, add the option detail, as in "summarize, detail" or

summarize VARNAME, detail

to get various percentiles.

**Question: How do I delete observations from a data set?**

Answer: Use the "drop" command. Suppose that a data set has 10 observations. If you type "drop in 5" then the 5th observation will be deleted. Similarly, you can type "drop in 1/3" to drop the first three observations. Another way to drop delete observations is to use an if" clause. For example, "drop if VARNAME<4" will drop all observations that have VARNAME<4. One could have more complication expressions like "drop if VARNAME1<5 & VARNAME2>5" and so forth.

**Question: How do I have Stata report normal tail areas and inverse normal tail areas?**

Answer: To compute the left tail area for a given z value, use the following command:

display normal(z)

where z is the value of interest.

To compute the inverse tail area for an area equal to p, use the following command:

display invnormal(p)

The use of y is generic, and any acceptable label will work.

**Question: How do I make a scatter plot?**

Answer: Use the command "scatter," as in:

scatter YVAR XVAR

which will make a scatter plot with YVAR on the y-axis and XVAR on the x-axis.

**Question: How do I overlay a regression line on a scatter plot?**

Answer: Use the following command:

twoway (scatter YVAR XVAR) (lfit YVAR XVAR)

Note that YVAR and XVAR must be in this specified order.

**Question: How do I make a scatter plot such that there are different symbols for different points?**

Answer: The solution is to use the graphing option "msymbol" ("m" stands for marker) in conjunction with two or more || clauses. For example, the following command will make a scatter plot of the two variables height and years with squares for men and circles for women:

scatter height years if gender=="m", msymbol(square) || scatter

height years if gender=="f", msymbol(circle)

One can string together any number of clauses depending on the number of categories desired. To see the types of symbols use the command "palette symbolpalette." Note that Stata abbreviates things like "S" for square and so forth, i.e., "msymbol(S)" is the same as "msymbol(square)."

**Question: How do I calculate the correlation between two variables?**

Answer: Use the command "correlate" as in

correlate VARNAME1 VARNAME2

which will produce a 2 x 2 correlation matrix. One can also type

correlate VARNAME1 VARNAME2 ... VARNAMEk

which will produce a k X k correlation matrix. Typing "correlate" without any arguments produces a correlation matrix for all variables.

**Question: How do I overlay a normal density on a histogram?**

Answer: Use the option "normal" when making the histogram. For example:

histogram VARNAME, normal

will add a normal density to a histogram of VARNAME. The estimated mean and variance of the density are based on sample moments.

**Question: How do I generate a list of random numbers from a uniform distribution?**

Answer: The command "generate x = uniform()" will draw random values from the unit interval. These values can be scaled to arbitrary intervals, i.e., "generate x1 = x*10" after assigning x as before will generate x1 from a uniform distribution on the interval from zero through ten.

**Question: How do I calculate confidence intervals? **

Answer: Use the command "ci." For example, to make a 98% confidence interval for a continuous variable called VARNAME, enter "ci VARNAME, level(98)." Note the use of the level option to specify the level of the desired confidence interval. If your variable is binary as opposed to continuous, i.e., consists of zeroes and ones, then add the option "binomial" as in "ci VARNAME, binomial level(98)." There are various options for how to calculate binomial confidence intervals; these are described in online help.

**Question: How do I use Stata to calculate tail areas and critical values for the t distribution.**

Answer: Use the function ttail(n,t) where n is degrees of freedom and t is the critical value of interest. Also, use the function invttail(n,p) where p is a right tail area from a t distribution with n degrees of freedom.

For example,

display ttail(5,2)

will return the upper tail area (to the right of 2) of a t distribution with 5 degrees of freedom.

Similarly,

display invttail(5,.05)

will return the critical value from a t distribution with 5 degrees of freedom such that the area to the right of the value is 0.05.

**NOTE:** Stata is very picky about spaces. Make sure that function calls do not have spaces after arguments. In other words, "ttail (5,2)" will generate an error but "ttail(5,2)" will not.

**Question: How do I add a title to a scatter plot? **

Answer: Use the option "title" with scatter as in:

scatter VAR1 VAR2, title("TITLE GOES HERE")

**Question: How do I calculate fitted values and residuals from a regression? **

Answer: After a successful "regress" command, the command

predict r, residual

will create a new variable r that contains residual values. Any variable name may be used. Similarly,

predict yhat

will create a new variable yhat that contains fitted values. As before, any variable name may be used.

**Question: How do I divide a variable by a constant (or multiply it or add to it)?**

Answer: If the variable is called var1, then the command

replace var1 = var1 / k

changes var1 to var1 / k where k is a specified constant. You could also use

replace var1 = k * var1

or

replace var1 = k + var1

if you wanted to multiply var1, add to it, and so forth.

**Question: How do I change the size of the labels in a bar plot? **

Answer: You need to set the option labsize in your bar plot command. For example,

graph bar VARNAME1, over(VARNAME2,label(labsize(small)))

Here "small" refers to a given size. Other sizes are possible, i.e., medsmall, large, and so forth. To see the available sizes type:

graph query textsizestyle

**Question: Why don't my graphs show up properly in Mac:Word?**

Answer: Graphs copied and pasted into MS-Word for Windows do not show up properly when the Word document is opened on a Mac. To work around this problem, save the graph from Stata as a .tif file first and then insert that file into your Word document using Insert->Picture->From File. The graph will then appear correctly whether the Word document is opened on a Mac or a Windows machine.

**Question: Why can't I open my .dta file?**

Answer: If you see the error "no room to add more observations..." Stata is telling you the file is too large for the amount of memory allocated to the program. (The default allocation is 1meg.) You can allocate more memory to Stata by using the 'set memory' command as follows:

set mem 10m

This will allocate 10meg to Stata which should be sufficient for most data sets.

You can make the change in allocation permanent by using the 'perm' option. For example:

set mem 10m, perm

This will allocate 10meg to Stata every time the program is started.

**Question: How do I change the scale of a y-axis? **

Answer: To set the y axis scale, use the "yscale()" option at the end of your plot command. For example:

hist partners, by(sororityfrat) yscale(range(0 .4))

This produces a histogram with a y-axis scale of 0 - 0.4.

**Question: How do I add a title to a plot? **

Answer: Use the "title()" option at the end of your plot command. For example:

hist partners if partners < 30, title("Partners Under 30")

**Question: How do I tell Stata to plot a function? **

Answer: Use the "twoway function" command. Here is an example:

twoway function y=2 * x + 3, range(0 4)

This will plot the function f(x) = 2x+3 for x=0 to x=4. If you have a function that needs to be broken into pieces, use various pieces joined by ||. For example,

twoway function y=2 * x + 3, range(0 4) || function y=3*x-9, range(4 6)

There is nothing special about a two-part function.

Answer: Here are the steps for one, converting Excel to XML and two, reading XML into Stata:

In Excel:

- Open the xls file.
- Click file, save as.
- Under Save as Type, scroll down to the XML spreadsheet.
- Click Save.
In Stata:

- Click file, import, xml data.
- Browse for the file.
- For document type, select Excel Spreadsheet.
- Check the "First Row is Variable Names" box.
- Click ok.

**Question: How do I make a histogram where bin width is N?**

Answer: Use the width option. For example, histogram VARNAME, width(N).

Answer: You need to use a legend option to do this. For example, suppose that you want to make a scatter plot of y on x based on a third variable gender. You could use the following command:

scatter y x if gender=="f", msymbol(circle) || scatter y x if gender=="m", msymbol(square) legend(label(1 "Female") label(2 "Male"))

This is easily generalized to multiple groups, different colors, and so forth.

**Question: How do I have Stata combine two regression lines on the same plot?**

Answer: If you want to plot a regression line (y on x) for one set of points (gender == "F") and another for a different set of points (gender == "M"), you can use the following command:

graph twoway (scatter y x if gender=="F") (lfit y x if gender=="F") (scatter y x if gender=="M") (lfit y x if gender=="M")

This command is easily generalized for multiple groups and so forth.

**Question: How do I make a histogram for a distribution if I already have aggregated counts?**

Answer: For example, I want to make a histogram for the variable yvar. Instead of having individual observations on yvar I instead have the number of times yvar falls in various ranges. Suppose, then, that yvar falls into k ranges. In your data matrix you have a variable called yvarclass (which ranges from 1 to k) and a variable called ycount (which corresponds to counts in classes).

twoway bar ycount yvarclass

will make the appropriate barplot, which by construction here will be a histogram.