Skip to main content

Decipher Support

All the topics, resources needed for Decipher.

FocusVision Knowledge Base

Creating and Managing an Auxiliary Database

Overview

The Auxiliary Database system is an alternative way to read and write data in the system. This allows us to do so without doing multiple read/write operations to an external file, which is quite resource intensive. The database system can be used to lock a survey down by a list of predefined valid ids, to store data for a respondent across multiple visits of the same survey (e.g. a diary study), or when assigning giftcodes for respondents. We can have any number of databases created for our project, as long as they have different names. Any database within a survey is not accessed by the system until we explicitly tell it to do so, via Python code in our project.

Creating the Database

When creating our database, we need to place the python code that does so within a exec block with the when="init" attribute added to it. This will make sure that the code that creates the database is only executed once, when the survey is first loaded.

Note: To find out more about the different values of the when attribute, click here.

There are different methods which we can use to create our database, which we will cover below:

Creating the Database From a File

To create the database from a file, we need to make sure that we have a tab-delimited text file uploaded into our system. The file should be uploaded as a System (Root) File.

Note: If you cannot upload your file as a text file, you can upload it as a .dat file. To do that, open your file in a plain text editor, and use the Save As option. You can then manually type out the file extension and the filename when saving.

Note: You can replace and modify your uploaded data file at anytime after creating our database. The system will automatically notice the changes in the file and update accordingly.

Once we have our file uploaded, we can create our database using the following exec block. This exec blocks assumes we have a file containing some respondent ids uploaded to our survey.:

<exec when="init">
db=Database("ids.dat")
</exec>

This will create a database within our project called db. This database will hold all of the information from our tab-delimited file. The database will automatically assume that the first column in our file is the one that holds the ids.

If we upload a file with multiple data columns to use as our database, we can specify which column in our file contains our ids like so:

<exec when="init">
db=Database("ids.dat", column=2)
</exec>

In the above example, we specify that the third column in our file will contain our ids (column indexing starts with 0 as the first column index). Additionally, if we have headers specified for our columns, we can tell the system to ignore headers when creating the database with the below example:

<exec when="init">
db=Database("ids.dat", column=2, skip=1)
</exec>

The skip parameter will tell the system to skip the first row of our file (the one containing the headers when creating the database). A similar functionality to the above example can be achieved by using the columnName parameter. If we have column headers in our file, we can use columnName="myColumn" to specify which column in our file contains our ids:

<exec when="init">
db=Database("ids.dat", columnName="myColumn")
</exec>

Note: Using the columnName parameter automatically sets skip=1 as well.

Creating an Empty Database

In some cases we would want to use a database to check for duplicate entries on questions across all respondents without having a pre-defined file. In those cases, we can create an empty database, and store each respondent’s answer so we can check for duplicate entries. To create an empty database, we can use the below syntax:

<exec when="init">
db=Database(name="passwords")
</exec>

In this case, instead of specifying a file to reference within our project, we are creating our database by giving it a name. This is needed to discern between multiple databases if we have more than one in our survey.

Checking for the Source ID

There are three types of checks we can execute on a database. We can check whether a respondent’s id is part of our list of allowed ids, we can check if an id is currently being used by another respondent, or if a specific id has already completed our survey. These three checks can be used separately to display different error messages to respondents, or in conjunction to create conditions within our survey.

Checking if a Value is in our File

If creating our database from a file, we can check whether specific ids that our respondents use are part of our file. We can use the db.valid()command to check if a respondent’s id is part of the database:

<exec>
isValid = db.valid(source)
</exec>

Note: The above example assumes that respondents come in with source as their id variable. To use a different variable as the id, you need to set-up a URL Variable in your Sample Sources.

The above code will return either True, if the value for source is present within our file, or False, if it isn’t.

Note: The id values are checked case insensitively. This means that "ABC"  is the same as "abc" when we check for existing values.

Checking if an ID Value is Currently in Use

To check if a respondent has already started our survey with a specific id, but not yet completed it, we can use the db.inuse() command. We can use the command as shown below:

<exec>
isInUse= db.inuse(source)
</exec>

Using this command can prevent multiple respondents from starting the survey at the same time with the same id. If the id in question is not in use, the db.inuse command will also assign that id to the current respondent’s survey session. If a respondent drops out of the survey, the id that they are using will be stay assigned to them for 15 minutes of inactivity. If they do not resume the survey in that time period, the id is free to be assigned to a different respondent. To resume the survey with the same id, a respondent needs to use the same browser as when they started the survey.

Checking if an ID has Completed the Survey

Using the db.has(source) command, we can check if an id has completed our survey:

<exec>
completed= db.has(source)
</exec>

Note: This requires that each completed record is added to the completion database by usingdb.add().

Adding to the Database

Adding the Completion ID

In order to mark an id as having completed the survey, we must add it back to our database using the db.add() command:

<exec when="finished">
db.add(source)
</exec>

The when="finished" attribute added to the exec block ensures that the source id will be added to the completed ids only when a respondent finishes the survey, e.g. qualifies, is terminated, or is overquota. To allow partial ids to be added to the database, remove the when="finished" attribute.

Appending Custom Values to the Database

In addition to storing and checking ids of respondents, the Auxiliary Database System can store custom values for each id. This can be done by adding a value to the db.add() command:

<exec when="finished">
db.add(source,value)
</exec>

Where value can be a question value, a timestamp, or any number/text. Below is an example of adding a single select question’s value to the database:

<radio
  label="Q1">
  <title>Which brand is your favorite?</title>
  <comment>Please select one</comment>
  <row label="r1">Brand A</row>
  <row label="r2">Brand B</row>
  <row label="r3">Brand C</row>
</radio>

<suspend/>
 
<exec when="finished">
db.add(ID,Q1.val)
</exec>

The above example will produce the following record within our Database, if we test with an ID of 123:

uuid                    date                    key     value
48w879cp5a4s577v        11/04/2016 16:49        123     0

Overriding Existing Records in our Database

In some cases, we want to allow respondents to take our survey multiple times, but only store their latest responses, or update already stored information in our database, either from a later point in our current study, or from a different study altogether. In these cases, we can use the replace=True syntax when adding the updated record back into our database, like so:

db.add(ID,Q1.val,replace=True)

The above syntax will replace any existing records in our database with the same ID value, with the latest one we are looking to add. This will avoid creating multiple database entries for the same respondent, which can create conflicts when looking those entries up.

Appending Multiple Values to the Database

The database system allows only one value to be saved for each record. If we need to append multiple values to our database, we need to store all of the values we want in a python dictionary or another comparable data structure. Let’s take the following two questions as an example:

<radio
  label="Q1">
  <title>Which brand is your favorite?</title>
  <comment>Please select one</comment>
  <row label="r1">Brand A</row>
  <row label="r2">Brand B</row>
  <row label="r3">Brand C</row>
</radio>

<suspend/>

<text
  label="Q2">
  <title>Why is this brand your favorite?</title>
  <comment>Please select one</comment>
</text>

In this case if we want to store the values of both questions within our database, we can create a dictionary within our exec blocks, and add both of our items in it, like so:

<exec when="finished">
values={} #initialize our empty dictionary
values["Q1"]=Q1.val #add Q1.val to our dictionary
values["Q2"]=Q2.val #add Q2.val to our dictionary
db.add(source,values)
</exec>

The above code will produce the following values in our database for a source value of 123:

uuid                    date                    key     value
hkyknpvkjskpskna        01/03/2017 13:51        123     {'Q1': 2, 'Q2': 'It is awesome !'}

What we can see here is that the value column now has a dictionary stored in it, which was the values dictionary that we created in our exec.

Retrieving from the Database

To retrieve data saved in a database, we can use the db.get() command to pull any information that was saved.

<exec>
dataValue = db.get(source)
</exec>

The above will return either the value the key was saved with, or a Key Error (which will result in a fatal) if it can not find that key in the database.

Cross-Survey Communication

The Auxiliary Database System also allows for cross-survey communication between databases. This allows us to use and/or update data stored in a database in survey A, by accessing it from survey B. In order to enable cross survey communication between databases, the survey which contains the actual database (survey A) must have an acl.txt file created, which allows other surveys to access the information in survey A.

 As an example, let’s say that Survey A has the path selfserve/abc/surveyA and Survey B has the path selfserve/abc/surveyB. To allow Survey B to access a database created in Survey A, we need to upload an acl.txt file to Survey A with the following contents:

selfserve/abc/surveyB

Note: To allow all directories to access a database, we can add * to our acl.txt file, which is a wildcard symbol. Additionally, we can allow all projects in a certain directory to have access, by specifying sefserve/abc/* in our acl.txt file.

Once we have our acl.txt file created, we can use the following syntax to access the database db1 in Survey A, from Survey B.

Note: The below example assumes Survey A already has a database created by using db=Database(name="db1").

<exec>
externalDB = Database(survey="selfserve/abc/surveyA", name="db1")
<exec>

The above syntax allows us to directly access Survey A’s database in our Survey B. This also allows us to now add or pull data to/from that database, as well as update already existing records in it. 

Note: When accessing a database from a different study, you can only use the has()/get()/add() commands. Commands such as inuse() will not work.

Viewing a Database List

  Requires Decipher Cloud

The dbshow command can be used to see a list of all the databases we have created for a project, or a specific database’s contents. The command has the following syntax:

dbshow survey name

Where survey is the path to the project we wish to check, and name is the name of our database, if we want to view a specific one. If we navigate to a project in the shell, we can run dbshow . Which will produce a result similar to the below:

[pagov] /home/hermes/v2/selfserve/214e/161102$ dbshow .
Assuming . refers to selfserve/214e/161102
db

The above means that this project has one database created, called db. To view its contents, we can run:

dbshow . db

Which will show us something similar to the below:

uuid                    date                    key     value
hkyknpvkjskpskna        01/03/2017 14:13        123     {'Q1': 2, 'Q2': 'It is awesome !'}

Editing a Database

  Requires Decipher Cloud

Accessing a database via the shell also allows us to make edits for it, by using the dbimport script. The dbimport script can import or delete data from a tab delimited text file, and uses the following syntax:

dbimport survey name filename.txt

survey - the path to the survey in which our database is created
name - the name of our database
filename - the name of the file we will use for the import. (the file must be uploaded to the server)

The tab delimited text file we create can have up to 3 columns in it - key, uuid, and data. The created file should have at least key as an existing column in order for the command to work properly. When importing data into already existing records, we also need to specify the uuid and data columns. An example of the contents of this file can be found below:

uuid                   key       data
hkyknpvkjskpskna       123       {'Q1': 2, 'Q2': 'It is awesome !'}
zxcdcpvkjsasduta       234       2
yricmynvieciytnc       345       4
vmgnciekcunvedkf       456       11

If we name the above file import.txt we can run the following command to import data into an already existing database named db. In our study (assuming we are in our project directory):

dbimport . db import.txt

Which will update all entries with the corresponding keys in our existing database.

Advanced Modifiers

In some cases, we may want to import data paired with a key, but not associate it to a specific respondent  uuid. To do that, we can use the -u switch when running our command, to randomly generate unique user ids to our records:

dbimport -u . db import.txt

If our datafile contains python syntax, the system will import it as a string by default. To make sure our python code is read properly, we can use the -e switch:

dbimport -e . db import.txt

To delete matching entries in our file from our database, we can use the -d switch:

dbimport -d . db import.txt

Here you can match on any one or all of the key, data and uuid fields. For example if you have a file with just a uuid column then all database entries for this database for the specified uuids are deleted. If you have both uuid and key then only those entries where both uuid and key match each line are deleted.

Re-indexing a Database

Usage:

dbreindex <survey name> <database name>

Occasionally it can be useful to reindex the text file. If you have an invited.txt file that's 10 million entries long that is frequently updated, v2 will build an invited.txt.index file with which it can tell whether a particular ID is likely within invited.txt or not without having to read all 10 million lines. However building that index file can take a while, which can be of annoyance to respondents entering the survey and dealing with 2-3 minute timeouts. In that case use e.g., dbreindex abc/surveyA invited, to rebuild that index from the command line.