IntroEDEntry
From EpiDataWiki
A short overview of data entry with EpiData
- EpiData is a program for entering and documenting data.
Use EpiData when you have collected data on paper and want to do statistical analyses or tabulation of the data. Basic frequency tables and lists of data can be made, but beyond that EpiData is focused on data entry and documentation of data. Calculations of summary scales and restrictions on values can be defined for use during data entry. You can choose an item from a list and save the corresponding numerical code (1 = No, 2 = Yes); the text lists are exported as "value labels" for statistical programs. Dates are easy to enter: e.g. 2301 entered in a "dd/mm/yyyy" field during the year 2003 is formatted as 23/01/2003.
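The date shortcut can be illustrated with a small Python sketch. It is a hypothetical illustration of the behaviour described above, not EpiData's own code; the function name `expand_ddmm` and the handling of full 8-digit entries are assumptions.

```python
from datetime import date

def expand_ddmm(raw: str, today: date) -> str:
    """Hypothetical sketch: expand a short entry typed into a dd/mm/yyyy field.

    Assumption: "ddmm" gets the current year, "ddmmyyyy" is taken as given.
    """
    digits = raw.replace("/", "").strip()
    if len(digits) == 4:      # ddmm: add the current year
        day, month, year = int(digits[:2]), int(digits[2:]), today.year
    elif len(digits) == 8:    # ddmmyyyy: complete date
        day, month, year = int(digits[:2]), int(digits[2:4]), int(digits[4:])
    else:
        raise ValueError(f"unrecognised date entry: {raw!r}")
    # date() raises ValueError for impossible dates such as 31/02
    return date(year, month, day).strftime("%d/%m/%Y")

print(expand_ddmm("2301", today=date(2003, 5, 1)))  # 23/01/2003
```

Letting `date()` reject impossible days means an entry such as 3102 is refused rather than silently stored.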
EpiData is suitable for simple datasets with one source of data (e.g. one questionnaire or one laboratory registration form) as well as for datasets with many or branching data forms; only the simple situation is described here. The principle is rooted in the simplicity of the DOS program Epi Info v6: you write simple text lines, and the program converts them into a data entry form, to which you can then add further control of entry, conditional jumps to other fields or calculations.
- Downloading and Installing EpiData
Downloading and installing EpiData is free of charge.
Download it from the EpiData Association website and follow the instructions when you run the installation file. EpiData will not interfere with the setup of your computer: it consists of one program file and help files. (In technical terms, EpiData comes as a few files and does not depend on, install or replace any DLL files in your system directory. Options are saved in an ini file.)
- Limits
There is no limit on the number of observations (tested with more than 100,000). An indexed search in 80,000 records takes less than 1 second on a 200 MHz Pentium I. The specification of the data file structure must fit within 999 lines of text.
- How to work with EpiData
The EpiData screen has a standard Windows layout with a menu bar and two toolbars.
The "Work Process" toolbar guides you from "1. Define data" to "6. Export data" for analysis.
Define data by writing three types of information for each variable:
A. Name of the input field (variable), e.g. v1 or exposure.
B. Text describing the variable, e.g. sex or "day of birth".
C. A field definition, e.g. ## for a two-digit numerical field.
Other field types include boolean (yes/no), encrypted and soundex fields.
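To show how one definition line splits into a descriptive label and a field definition, here is a minimal Python sketch. It assumes the notations `##` (integer), `##.#` (decimal), `___` (text) and `<Y>` (yes/no); the class `FieldDef` and the function `parse_line` are hypothetical, and date, encrypted and soundex fields are left out.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class FieldDef:
    label: str         # text in front of the field, e.g. "v1 sex"
    kind: str          # "integer", "float", "text" or "boolean"
    width: int         # characters reserved on the form
    decimals: int = 0

# Assumed notation: ## integer, ##.# decimal, ___ text, <Y> yes/no.
FIELD_RE = re.compile(r"(#+(?:\.#+)?|_+|<Y>)")

def parse_line(line: str) -> Optional[FieldDef]:
    """Hypothetical sketch: split one definition line into label and field."""
    m = FIELD_RE.search(line)
    if m is None:
        return None                    # plain text line, no input field
    token = m.group(1)
    label = line[:m.start()].strip()
    if token == "<Y>":
        return FieldDef(label, "boolean", 1)
    if token.startswith("_"):
        return FieldDef(label, "text", len(token))
    if "." in token:
        return FieldDef(label, "float", len(token), len(token.split(".")[1]))
    return FieldDef(label, "integer", len(token))

print(parse_line("v1 sex ##"))
# FieldDef(label='v1 sex', kind='integer', width=2, decimals=0)
```

Lines without any field definition return `None`, matching the idea that ordinary text lines simply appear as text on the form.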
Setting options (file menu)
It is important to decide on one of two principles for naming variables. With "first word", the field name is taken from the first word of the definition line. With "automatic", the name is built from the first 10 characters of the line. For a line starting "v1 sex" this gives:
a. "automatic": the field name is v1sex (first 10 characters of the line)
b. "first word": the field name is v1 (first word of the line)
Other options include the colour of background and fields, line height etc. Users of e.g. Stata or SPSS should use the "first word" principle, since the field names become variable names. Users of Stata should choose lowercase field names.
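The two naming principles can be sketched in Python as below. The "automatic" rule here (keep only letters and digits, then cut to 10 characters) is an assumption that reproduces the v1sex example; EpiData's exact rule may differ. All function names are hypothetical.

```python
import re

def first_word_name(label: str) -> str:
    """'first word' principle: the first word of the definition line."""
    return label.split()[0]

def automatic_name(label: str, max_len: int = 10) -> str:
    """'automatic' principle (assumed rule): letters and digits only, cut to max_len."""
    return re.sub(r"[^A-Za-z0-9]", "", label)[:max_len]

def stata_name(name: str) -> str:
    """Lowercase field names, as recommended for Stata users."""
    return name.lower()

print(automatic_name("v1 sex"))                        # v1sex
print(first_word_name("v1 sex"))                       # v1
print(stata_name(first_word_name("Exposure to smoke")))  # exposure
```

With "first word", the analyst controls the name directly, which is why it suits users who want predictable variable names in Stata or SPSS.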
After writing the definition you can preview your data form or create the actual data file.
3. Add/Revise Checks - at Entry of Data
A key strength of EpiData is the ability to specify rules and calculations that apply during data entry.
• Restrict data entry to certain values and give text descriptions to the numerical codes entered.
• Specify the sequence of data entry (jumps), e.g. fill out certain questions for males only.
• Apply calculations during data entry, e.g. age at visit based on date of visit and date of birth. Typically, however, most calculations are done at the analysis stage.
• Add help messages and more extended definitions of computations, e.g. if .. then .. endif structures.
(See examples installed with EpiData, or get further examples from www.epidata.dk/examples.php).
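In EpiData such a calculation is defined in the check file; the Python below only sketches the arithmetic of "age at visit" as completed years, which is one common definition and an assumption here. The function name `age_at_visit` is hypothetical.

```python
from datetime import date

def age_at_visit(date_of_birth: date, date_of_visit: date) -> int:
    """Completed years between date of birth and date of visit."""
    years = date_of_visit.year - date_of_birth.year
    # subtract one year if the birthday has not yet occurred in the visit year
    if (date_of_visit.month, date_of_visit.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years

print(age_at_visit(date(1970, 6, 15), date(2003, 6, 14)))  # 32
```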
When you start the "Add/Revise Checks" part, a new screen appears. At the top the variable name (v8) is shown, and below that the label (Rigidfix) and the variable type (number). These are followed by the definition blocks:
Range, Legal: defines which values can be entered.
Jumps: specifies which field to go to after entry (here, on value 1 jump to field v10).
Must enter: if set to Yes, a value must be given (otherwise leaving the field blank is accepted).
Repeat: repeats the value from the previous record, e.g. if data come in groups, a value is repeated until the next group starts. The value can still be changed.
Value label: for categorical data this defines the meaning of the values, e.g. 1 = man, 2 = woman. The values are edited via the "+" button, and the drop-down list lets you pick an existing label definition.
Edit: many other aspects of this field can be defined here by "free hand" editing. See the collection of commands in the help file.
Save: Save current definitions.
In the example, both Range, Legal and Value label are defined. In a typical setup this would not be the case; only one of them would be used.
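How the definition blocks act on one field can be modelled with a small Python class. This is a hypothetical sketch, not EpiData's check file syntax or internals: `FieldCheck` and its methods are invented names, and the legal values {1, 2} for v8 are assumed for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class FieldCheck:
    """Hypothetical model of the check blocks for one field."""
    name: str
    value_range: Optional[tuple[int, int]] = None        # Range, Legal: low..high
    legal: set[int] = field(default_factory=set)          # Range, Legal: listed values
    must_enter: bool = False                              # Must enter
    jumps: dict[int, str] = field(default_factory=dict)   # Jumps: value -> field
    repeat: bool = False                                  # Repeat

    def validate(self, value: Optional[int]) -> None:
        """Raise ValueError if the entered value breaks the checks."""
        if value is None:  # field left blank
            if self.must_enter:
                raise ValueError(f"{self.name}: a value must be entered")
            return
        restricted = self.value_range is not None or bool(self.legal)
        in_range = (self.value_range is not None
                    and self.value_range[0] <= value <= self.value_range[1])
        if restricted and not (in_range or value in self.legal):
            raise ValueError(f"{self.name}: {value} is not a legal value")

    def next_field(self, value: Optional[int], default: str) -> str:
        """Field to move to after entry: a jump target or the next field."""
        return self.jumps.get(value, default)

    def start_value(self, previous: Optional[int]) -> Optional[int]:
        """With Repeat, the previous record's value is offered (and can be changed)."""
        return previous if self.repeat else None

# Field v8: assumed legal values 1 and 2, value 1 jumps to v10.
v8 = FieldCheck("v8", legal={1, 2}, must_enter=True, jumps={1: "v10"})
v8.validate(1)
print(v8.next_field(1, default="v9"))  # v10
print(v8.next_field(2, default="v9"))  # v9
```

Keeping each block as a separate attribute mirrors the screen layout, so a field can use a range, a list of legal values, or both.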
Open the file and enter, add or search data.
After a value has been entered, EpiData adds blue explanatory text to the right of the input field, based on the labels in the check file. Body mass index and age are examples of calculated fields.
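A Python sketch of the calculated body mass index and of showing a value with its label. The units (weight in kg, height in cm), the rounding to one decimal and the names `bmi`, `with_label` and `SEX_LABELS` are assumptions for illustration.

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index = weight (kg) / height (m) squared, rounded to one decimal."""
    height_m = height_cm / 100
    return round(weight_kg / height_m ** 2, 1)

# Hypothetical value labels for a sex field: 1 = man, 2 = woman.
SEX_LABELS = {1: "man", 2: "woman"}

def with_label(value: int, labels: dict[int, str]) -> str:
    """Entered value followed by its label, like the text shown beside a field."""
    return f"{value} {labels.get(value, '')}".rstrip()

print(bmi(70, 175))               # 22.9
print(with_label(2, SEX_LABELS))  # 2 woman
```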
Files saved - file extensions
- A. Data form definition file, e.g. first.qes
- B. The actual data file containing the data, e.g. first.rec
- C. A file with the defined checks, e.g. first.chk
- D. Supplementary files, e.g. first.not with notes taken during data entry or first.log with documentation
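Because all files of a dataset share one base name, the full set can be derived from it. The Python sketch below uses the extensions listed above; the function `dataset_files` is hypothetical, and it assumes the base name is given without an extension.

```python
from pathlib import Path

# Extensions as listed above; .not and .log files exist only if created.
EXTENSIONS = {
    ".qes": "data form definition",
    ".rec": "data file",
    ".chk": "checks",
    ".not": "notes taken during data entry",
    ".log": "documentation",
}

def dataset_files(base: str) -> dict[str, Path]:
    """Map each file role to its path, e.g. "first" -> first.qes, first.rec, ..."""
    base_path = Path(base)  # assumed: base name without extension
    return {role: base_path.with_name(base_path.name + ext)
            for ext, role in EXTENSIONS.items()}

for role, path in dataset_files("first").items():
    print(f"{path}: {role}")
```

Keeping the files together in this way is what makes backing up "all files associated with a dataset" straightforward.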
After creating the data file you can document its structure. An example (part of first.rec) is:
After entering data you can list values for some or all records, or view the data with or without labels in a spreadsheet-like window (try Ctrl+Alt+V) or in a packed format.
A "codebook" can include raw frequency tables (the example is not based on the first.rec file).
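A raw frequency table of the kind found in a codebook can be sketched in Python with `collections.Counter`. The layout and the function name `frequency_table` are assumptions; missing values are represented as `None` and listed last.

```python
from collections import Counter
from typing import Optional

def frequency_table(values: list[Optional[int]],
                    labels: Optional[dict[int, str]] = None) -> list[str]:
    """Value (with label), count and percent for each distinct value."""
    labels = labels or {}
    counts = Counter(values)
    total = len(values)
    # sort codes numerically and list missing (None) last
    order = sorted(counts, key=lambda v: (v is None, v if v is not None else 0))
    lines = []
    for v in order:
        name = "missing" if v is None else f"{v} {labels.get(v, '')}".strip()
        lines.append(f"{name:<12}{counts[v]:>5}{100 * counts[v] / total:>7.1f}%")
    return lines

for line in frequency_table([1, 2, 2, 1, 2, None], {1: "man", 2: "woman"}):
    print(line)
```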
6. Export for analysis and securing data.
Export the data for analysis, or back up all files associated with a dataset to a user-defined backup folder. When exporting data to Stata, select the lowercase option of the export.
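The effect of the lowercase option can be sketched in Python. This writes a plain CSV file rather than a Stata file, and `export_csv` is a hypothetical name; it also refuses to export if two field names would collide after lowercasing (e.g. Age and AGE), a check added here as an assumption.

```python
import csv
import io

def export_csv(records: list[dict], out, lowercase: bool = True) -> None:
    """Write records as CSV, optionally with lowercase field names (as advised for Stata)."""
    if not records:
        return
    names = list(records[0])
    header = [n.lower() for n in names] if lowercase else names
    if len(set(header)) != len(header):
        # e.g. "Age" and "AGE" would become the same variable name
        raise ValueError("field names are not unique after lowercasing")
    writer = csv.writer(out)
    writer.writerow(header)
    for record in records:
        writer.writerow([record[n] for n in names])

buffer = io.StringIO()
export_csv([{"V1": 1, "Sex": 2}, {"V1": 2, "Sex": 1}], buffer)
print(buffer.getvalue())
```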
From the Tools menu you can also create zip-compatible archives with optional encryption. The encryption uses strong AES (Rijndael) encryption, approved by many data protection authorities.
Please note that there is no way to retrieve a password if you forget it.
- Tools and other
EpiData also includes other features, such as comparing two files and listing differences at field level, revising the data file structure without losing data already entered, hierarchical coding, relational data entry and checks for logical consistency.
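Comparing two files at field level can be sketched in Python as below, for example for two independent entries of the same questionnaires. The matching on a key field named `id` and the function name `compare_files` are assumptions; EpiData's own comparison may match records and report differences in another way.

```python
def compare_files(first: list[dict], second: list[dict], key: str = "id"):
    """Field-level comparison of two versions of a dataset, matched on a key field.

    Returns (differences, keys only in first, keys only in second), where each
    difference is (key value, field name, value in first, value in second).
    """
    a = {record[key]: record for record in first}
    b = {record[key]: record for record in second}
    differences = []
    for k in sorted(a.keys() & b.keys()):
        for name in sorted(a[k].keys() | b[k].keys()):
            if a[k].get(name) != b[k].get(name):
                differences.append((k, name, a[k].get(name), b[k].get(name)))
    return differences, sorted(a.keys() - b.keys()), sorted(b.keys() - a.keys())

entry1 = [{"id": 1, "sex": 1, "age": 30}, {"id": 2, "sex": 2, "age": 41}]
entry2 = [{"id": 1, "sex": 1, "age": 31}, {"id": 3, "sex": 1, "age": 25}]
print(compare_files(entry1, entry2))
# ([(1, 'age', 30, 31)], [2], [3])
```

Reporting records found in only one file separately from field differences makes it easy to see whether a mismatch is a typing error or a missing record.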
