Requires Decipher Cloud
This tool is the command line version of the Edit Data feature found in the downloads menu.
The Tab Import functionality is a mechanism that allows reading in a tab-delimited file and appending, modifying or deleting records in a survey. The Tab Import tool has two main purposes:
- Cleaning existing data
- Uploading external data into v2
tabimport [-k key] [-c] [-a] [-N] [--nobackup] [--layout=###] [--yes] surveyPath file.txt
2: How it Works
The Tab Import tool will upload a tab-delimited file into an existing survey and append or modify the existing data. Tab Import will only recognize variables that are seen as output from the targeted survey. In other words, if you are able to see a variable in the downloaded tab-delimited data, you can modify it using Tab Import.
The "variables.xls" file can be used to match data layouts and you can create custom data layouts using the New Layout option found in the Downloads menu.
Tab Import updates data based on the variables provided in the header of the tab-delimited file. You can upload partial data sets that only include the variables that you need to update and you are not required to provide the entire data set.
Tab Import is often used to perform the following tasks:
- Import external data so clients can utilize reporting and production tools
- Transform data into alternative tables
- tab-delimited --> Triple-S, SPSS, fixed-width, CB, etc...
- Recode and clean data
- update markers
- delete/disqualify respondents
- update data
- remove duplicates based on IP address
A more advanced tool, transform, lets you apply Python code to your variables and qualify/delete respondents.
The examples below are simple and should help to get you familiar with the Tab Import process and end result.
3.1: Example 1
To learn the Tab Import tool, we'll begin with a simple scenario. Take a look at the following survey, specifically the question variables and
setMarker function call:
<?xml version="1.0" encoding="UTF-8"?> <survey name="Survey" alt="Tab Import Ex #1" extraVariables="source,list,url,record,ipAddress,userAgent" compat="118" state="testing" setup="time,quota,term" unique=""> <samplesources default="1"> <samplesource list="1" title="default"> <exit cond="qualified"><b>Thanks again for completing the survey!</b></exit> <exit cond="terminated"><b>Thank you for your selection!</b></exit> <exit cond="overquota"><b>Thank you for your selection!</b></exit> </samplesource> </samplesources> <radio label="Q1" title="Are you..."> <row label="r1">Male</row> <row label="r2">Female</row> </radio> <number label="Q2" optional="0" size="3" verify="range(1, 120)" title="How old are you?"></number> <suspend/> <exec cond="Q2.ival gt 17">setMarker('Age_17+')</exec> <html label="End" where="survey"> <p>Thank you!</p> </html> <marker name="qualified"/> </survey>
The survey above will generate the following tab-delimited data:
uuid markers status Q1 Q2 qtime vos t4s87amvy4b1wuvu Age_17+,qualified 3 2 76 30.245002138445 5 b0e6njaww638bgu7 Age_17+,qualified 3 1 91 32.446309568718 12 25a0yt89t5zeznwz Age_17+,qualified 3 1 29 28.911985701804 12 cgeua8a308bvxe28 Age_17+,qualified 3 2 66 35.590599491062 13 xzxe66dvqj54nytg qualified 3 1 5 28.386099702762 5 wt4xjkq8aj401gy6 Age_17+,qualified 3 1 72 29.058076265442 4 p9zp7u66197hfb3t qualified 3 1 5 30.897337177573 13 80fkbz5kzn1zdnxt Age_17+,qualified 3 1 27 29.251838559104 4 ... ...
We can use the Tab Import tool to alter any of the variables listed above (e.g. markers, Q1, Q2, etc...). However, it's unlikely that we would want to touch variables unique to the respondent like
You may have noticed an error in the survey code. We have a set a marker named "Age_17+" for respondents with an age value greater than 17.
<exec cond="Q2.ival gt 17">setMarker('Age_17+')</exec>
We should have named this marker "Age_18+" instead. This is where the Tab Import tool comes in. In addition to modifying the survey code to set the correct maker, we would need to alter the existing data for those respondents who have already received the incorrect marker.
One way to accomplish this task is to download the tab-delimited data, use a "Find & Replace" tool to change "Age_17" to "Age_18" and use the Tab Import script to update the existing data. Below is the modified tab-delimited file we will upload to our project:
uuid markers t4s87amvy4b1wuvu Age_18+,qualified b0e6njaww638bgu7 Age_18+,qualified 25a0yt89t5zeznwz Age_18+,qualified cgeua8a308bvxe28 Age_18+,qualified xzxe66dvqj54nytg qualified wt4xjkq8aj401gy6 Age_18+,qualified p9zp7u66197hfb3t qualified 80fkbz5kzn1zdnxt Age_18+,qualified ... ...
We will use the
uuid variable to update each record accordingly for each respondent. We can use any variable that will uniquely identify each respondent.
Since we're only modifying the markers string, it is the only variable we need to included in the file. Note that if the variables are exactly the same, then no modifications will be made, so you may include the entire data set without affecting the data. In our project's directory, we can run the following command to correct the data:
tabimport -kuuid . file.txt
The Tab Import tool provides helpful information that you should review before confirming. If the number of records update and unchanged looks correct, you may proceed. A backup of the data will be made in the data/ folder.
[user@ project1234]$ tabimport -kuuid . file.txt Assuming . refers to temp/project1234 Importable fields: uuid markers Skipped fields: vlist vos vosr14oe vbrowser vbrowserr14oe vmobiledevice vmobileos Rows read from input file: 100 not processed 0 unmatched deletions 0 Rows in results file: 100 unchanged: 17 rewritten: 83 deleted: 0 created: 0 Fields processed: 1500 updated: 83 identical: 1417 ignored: 0 (e.g. record, weight) erronous: 0 Continue import? (Y/N) Y Import completed. Backup in temp/project1234/data/old-results/001.results.* temp/project1234: markers in data file and in markers database were mismatched and were updated
That's it! We used the
-kuuid option to update respondents based on their
uuid variable, that way the data matched accordingly. If we download the data again, we'll see that the appropriate marker was modified and "Age_18+" can be seen instead of "Age_17+".
3.2: Example 2
Building off of the example above, we'll delete all respondents who do not have the "Age_18+" marker. To delete respondents, we need to supply an extra field unique to the Tab Import script called "action".
Using the custom reporting tool, you can generate a tab-delimited file containing only those respondents that do not have the "Age_18+" marker. It should look similar to this:
uuid markers status Q1 Q2 qtime vos xzxe66dvqj54nytg qualified 3 1 5 28.386099702762 5 wt4xjkq8aj401gy6 qualified 3 1 72 29.058076265442 4 p9zp7u66197hfb3t qualified 3 1 5 30.897337177573 13 ... ...
To delete these respondents, we'll add an extra column labelled "actions" and "DEL" as the value:
record uuid markers ... action 105 xzxe66dvqj54nytg qualified ... DEL 108 wt4xjkq8aj401gy6 qualified ... DEL 112 p9zp7u66197hfb3t qualified ... DEL ... ...
Similar to the previous example, if we run the Tab Import tool using this file, we'll see some helpful information as to what will happen and we can confirm or deny the process.
[user@ project1234]$ tabimport -kuuid . file.txt Assuming . refers to temp/project1234 Importable fields: record uuid date markers status Q1 Q2 qtime start_date source list url ipAddress userAgent dcua session action Skipped fields: vlist vos vosr14oe vbrowser vbrowserr14oe vmobiledevice vmobileos Rows read from input file: 17 not processed 0 unmatched deletions 17 Rows in results file: 100 unchanged: 83 rewritten: 0 deleted: 17 created: 0 Fields processed: 0 updated: 0 identical: 0 ignored: 0 (e.g. record, weight) erronous: 0 Continue import? (Y/N) Y Import completed. Backup in temp/project1234/data/old-results/002.results.* temp/project1234: markers in data file and in markers database were mismatched and were updated
The result shows us that 17 records were deleted and 83 records were unchanged.
4: Options & Modifiers
The Tab Import tool is equipped with a handful of options, as shown in the syntax below:
tabimport [-k key] [-c] [-a] [-N] [--nobackup] [--layout=###] [--yes] surveyPath file.txt
|-k key||The unique variable to reference each respondent by. (Default: uuid)
(e.g. tabimport -kID . file.txt )
|-c||Create a new record if no matching key is found. This is useful when uploading new tab-delimited data to the project. (e.g. tabimport -c . extra-data.txt ) If -c is not used, the record will be skipped.|
|-a||Allow importing into notdp variables. (i.e. variables hidden from data downloads). With -a you should see question labels that use question/row/col label. Otherwise the label is generated through altlabel/variables.xls labels.|
|-N||Suppresses email sent to project team regarding edited data. This is useful for tabimport scripts that run daily.|
|nobackup||Does not create a backup of data. (Be careful!)|
|--layout||Import data using variables modified in a Custom Data Layout
(See: Custom Data Layouts)
|--yes||Skips the confirmation prompt.|
|surveyPath||The path to the survey. (e.g. selfserve/9d3/proj1234 or . if in the working directory)|
|file.txt||The tab-delimited file containing the data to upload. Naming convention does not matter.|
|Note: Variables contained in this file are case sensitive and must match 100%. Extra fields not found will create warning, but are ignored.|
4.1: Delete Respondents
Demonstrated in Example 2, the Tab Import tool allows you to delete respondents by providing an additional field called "action" with "DEL" supplied as the value. Any record containing the "DEL" value will be removed from the data.
4.2: Disqualify Respondents
Respondents can be disqualified by modifying the "markers" field. Set this field's value to equal "DISQUALIFY:reason" and when tabimported, the respondent's markers will be adjusted appropriately so that they show up as a disqualified complete.
- The "qualified" marker will appear as "bad:qualified"
- All quota markers will be removed
- A new marker, "reason", will be set
- Note: This can be set to any text. (e.g. DISQUALIFY:speeder )
4.3: Custom Data Layouts
Using the survey builder? Learn more about the Data Layout Manager.
In the downloads menu of your project, you have the ability to create custom data layouts. This enables you to move variable positions and modify the labels.
When you download data with the labels modified, the system will not be able to detect the proper variables unless you use the
--layout=### modifier where
### should be replaced with the ID of the custom data layout (shown at the top of the layout).
For example, create a custom data layout using the New Layout link. Using this tool, you have the ability to modify the labeling for each variable.
(e.g. Q1r1 -> foo, vos -> bar )
If you download this data in the tab-delimited format to modify variables and re-upload, you will need to let the Tab Import tool know which layout ID to reference when adding the data back in. For example, a programmer can generate the tab-delimited data for a project with the layout using the following command:
generate --layout=617 . all tab > tabbed_data.txt
After adjustments have been made (e.g. updating markers, resetting values, etc...), you can import the data back in using
tabimport and the
tabimport --layout=617 -kuuid . tabbed_data.txt
This will recognize any altered variable names (e.g. Q1r1 -> foo ) and update the data values at Q1 accordingly.
4.4: Recover Original Data
To recover backup data, you will need to copy the following files:
- data/old-results/XXX.results.bin to data/results.bin
- data/old-results/XXX.results.text to data/results.strings
5: What's Next?
Learn about a similar data modification tool, transform.