Data capture, coding and cleansing, documentation

Completed questionnaires cannot be analysed until researchers have processed the information they contain into a computerized data file. The type of processing depends on both how the information was collected (on paper or via the internet; self- or interviewer-administered) and how the researchers intend to analyse it. During the various phases of data file processing, INED’s Surveys Department can assist researchers in different ways (advice, expert assessment, supervision of recruited staff, management of certain operations or study components).

Data capture

The Surveys Department provides research teams with various methods for recording questionnaire responses. Depending on the survey field, target population, and budget constraints, one or more complementary methods will be proposed.

Several data capture methods exist:

  •  manual entry of responses from paper questionnaires

This method has been employed for many years at INED; today, a special data capture application is used;

  •  the CAPI (Computer-Assisted Personal Interviewing) method

The Surveys Department can also set up CAPI surveys, where interviewers enter answers directly on a laptop computer during the interview in the respondent’s home;

  •  optical scanning

The Surveys Department has optical scanning equipment with two data capture stations. This system requires a particular type of questionnaire layout. It is extremely quick for closed-ended questions but data entry clerks are usually needed to enter answers to open-ended questions. Once the data has been captured and is in digital form, the dataset can be indexed in a database and consulted more quickly than questionnaire data on paper;

  •  data capture via internet

The Surveys Department has been working for several years to develop internet questionnaires that can be administered via the CAWI method (Computer-Assisted Web Interviewing);

  •  the CATI method (Computer-Assisted Telephone Interviewing)

For some surveys, the telephone method is used to administer questionnaires and capture answers. This type of data collection is usually outsourced since the Surveys Department does not have the required equipment and infrastructure.

Coding

Response coding involves assigning predetermined codes to responses in order to facilitate processing and analysis. This is often a necessary stage in data file production. Coding may be done before, during, or after data capture, as determined by the type of analysis to be performed. Some types of analysis require specific information or coding that is not directly related to the survey topic.
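As an illustration of assigning predetermined codes to responses, here is a minimal Python sketch. The codebook, the question (marital status), the code values, and the `code_response` helper are all hypothetical and chosen only for the example; they do not reflect any scheme actually used by the Surveys Department.

```python
# Hypothetical codebook: labels and code values are illustrative only.
MARITAL_STATUS_CODES = {
    "single": 1,
    "married": 2,
    "divorced": 3,
    "widowed": 4,
}
UNCODED = 99  # hypothetical code for answers that match no category


def code_response(raw_answer, codebook, default=UNCODED):
    """Return the predetermined code for a raw answer.

    Matching ignores letter case and surrounding whitespace.
    """
    key = raw_answer.strip().lower()
    return codebook.get(key, default)


answers = ["Married", " single ", "separated"]
coded = [code_response(a, MARITAL_STATUS_CODES) for a in answers]
print(coded)  # prints [2, 1, 99]
```

Answers that fall outside the codebook receive a residual code rather than being dropped, so they can be reviewed and recoded by hand later.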

Responses to some questions (e.g., respondent’s occupation) require particularly complex processing at the coding stage. Specialized software such as the INSEE-developed SICORE programme may be used. This type of coding can affect questionnaire design, since it may be necessary to include specific questions on the respondent’s occupation.

Data cleansing

In data cleansing, the data file is checked in a multitude of ways and tested for consistency in order to improve data quality. This stage usually takes place after questionnaire response capture, but if the capturing process is long (several months), data may be cleansed during capture.

Several types of checks are performed:

  •  Consistency tests

Consistency tests are run to detect data capture errors, inconsistencies in respondent statements, and inconsistent situations that are difficult to identify during questionnaire review or data capture;

  •  Filter question validity checks

Survey questionnaires contain a number of filter questions to ensure that respondents only answer questions relevant to their own situation.
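The two types of checks listed above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the variable names (`age`, `age_first_job`, `has_children`, `n_children`), the rules themselves, and the `check_record` helper are hypothetical examples, not the checks actually run on any INED survey.

```python
def check_record(record):
    """Return a list of problems found in one captured questionnaire.

    The field names and rules are hypothetical illustrations.
    """
    problems = []

    # Consistency test: age at first job cannot exceed current age.
    age = record.get("age")
    first_job_age = record.get("age_first_job")
    if age is not None and first_job_age is not None and first_job_age > age:
        problems.append("age_first_job exceeds age")

    # Filter check: the question on number of children should be
    # answered only when the filter question has_children is "yes".
    has_children = record.get("has_children")
    n_children = record.get("n_children")
    if has_children == "no" and n_children is not None:
        problems.append("n_children answered despite filter")
    if has_children == "yes" and n_children is None:
        problems.append("n_children missing after filter")

    return problems


records = [
    {"id": 1, "age": 40, "age_first_job": 18, "has_children": "yes", "n_children": 2},
    {"id": 2, "age": 25, "age_first_job": 30, "has_children": "no", "n_children": None},
    {"id": 3, "age": 33, "age_first_job": 20, "has_children": "no", "n_children": 1},
]

for r in records:
    print(r["id"], check_record(r))
```

Returning a list of problems per record, rather than stopping at the first one, makes it possible to review every flagged questionnaire in a single pass.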