---
title: "Getting started with resquin"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{resquin tutorial}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## An introduction to `resquin`

This short tutorial describes the functions in `resquin` and how to use them
on a technical level. For a more substantive introduction, see the (forthcoming)
article [Using resquin in practice](https://matroth.github.io/resquin/articles/resquin_in_practice.html).

Functions in `resquin` calculate response quality indicators for survey
data stored in a data frame or tibble. The functions assume that the
input data frame is structured in the following way:

-   The data frame is in wide format, meaning each row represents one
    respondent, each column represents one variable.
-   All variables have the same number of response options.
-   The variables are in the same order as the questions respondents saw
    while taking the survey.
-   All responses have integer values.
-   Missing values are set to `NA`.
-   (For `resp_styles()`) Reverse-keyed variables are in their original form. No items were
    recoded.
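
Some of these assumptions can be checked before any indicator is calculated. The chunk below is a minimal sketch in base R. The helper `check_survey_input()` and the data frame `example_input` are hypothetical names used for illustration and are not part of `resquin`. The sketch only tests whether all columns are numeric, whether all observed values are whole numbers, and whether they fall inside the scale. It cannot verify the question order or the reverse keying.

```{r}
# Hypothetical helper (not part of resquin): checks a data frame against
# some of the structural assumptions listed above.
check_survey_input <- function(x, scale_min, scale_max) {
  stopifnot(is.data.frame(x))
  values <- unlist(x, use.names = FALSE)
  observed <- values[!is.na(values)]  # missing values are allowed as NA
  c(
    all_numeric  = all(vapply(x, is.numeric, logical(1))),
    all_integer  = is.numeric(observed) && all(observed == round(observed)),
    within_scale = is.numeric(observed) &&
      all(observed >= scale_min & observed <= scale_max)
  )
}

# Illustrative input: two items, four respondents, scale from 1 to 5
example_input <- data.frame(
  item_1 = c(1, 5, 3, NA),
  item_2 = c(2, 4, NA, 1)
)

check_survey_input(example_input, scale_min = 1, scale_max = 5)
```

A `FALSE` entry in the returned vector points to the assumption that needs attention before the data are passed on.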

### Example dataset of survey responses

Consider the following (fake) data set of survey responses.

```{r}
# A test data set with three items and ten respondents
testdata <- data.frame(
  var_a = c(1,4,3,5,3,2,3,1,3,NA),
  var_b = c(2,5,2,3,4,1,NA,2,NA,NA),
  var_c = c(1,2,3,NA,3,4,4,5,NA,NA))

testdata
```

The data set contains responses to three survey questions (`var_a`, `var_b`
and `var_c`) from ten respondents. All three survey questions allow
responses on a scale from 1 to 5. Some respondents have missing values,
which are set to `NA`.

Let's use this data set to calculate response quality indicators.

### `resp_styles()`: Response style indicators

Response styles capture systematic shifts in respondents' response
behavior. For example, respondents with an extreme response style may
only choose the lowest and highest categories (in our example 1 and 5),
while mid-point responders only choose the midpoint of a scale (in our
example 3).

To calculate response styles we can use the `resp_styles()` function.
First, we need to specify our data argument `x`. Then, we need to
specify the minimum and maximum of the scales used in our questionnaire
(`scale_min` and `scale_max` respectively). Remember that all questions
included must have the same number of response options. We will discuss
the arguments `min_valid_responses` and `normalize` later.

```{r}
library(resquin)
# Calculating response style indicators for all respondents with no missing values
results_response_styles <- resp_styles(
  x = testdata,
  scale_min = 1,
  scale_max = 5,
  min_valid_responses = 1, # Excludes respondents with less than 100% valid responses
  normalize = TRUE)  # Returns shares (proportions) of responses instead of counts

round(results_response_styles, 2)
```

The resulting data frame contains five columns corresponding to the
middle response style (MRS), acquiescence response style (ARS),
disacquiescence response style (DRS), extreme response style (ERS), and
non-extreme response style (NERS). You can learn more about the
response styles in the help file of the function using `?resp_styles`.

Each respondent receives one value for each indicator, provided that it
can be calculated. Because `normalize` is set to `TRUE`, the values are
expressed as the share of a respondent's responses that can be
attributed to a response style. For example, respondent one has an ERS
value of 0.67, meaning that two out of three responses can be identified
as extreme responses. On the other hand, respondent one does not have
any mid-point responses, leading to a value of 0 in the MRS column.
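
The arithmetic behind these two values can be reproduced by hand in base R. The chunk below is a minimal sketch that does not call `resp_styles()`. It assumes that extreme responses are those at the scale minimum or maximum and that mid-point responses are those at the scale midpoint, as described above. The object names are illustrative, and the internal implementation in `resquin` may differ.

```{r}
# Responses of respondent one in the example data: var_a = 1, var_b = 2, var_c = 1
respondent_one <- c(1, 2, 1)
lowest_category <- 1
highest_category <- 5
midpoint <- (lowest_category + highest_category) / 2  # 3 on a 1-to-5 scale

# Assumed definitions, taken from the description in the text
ers_share <- mean(respondent_one %in% c(lowest_category, highest_category))
mrs_share <- mean(respondent_one == midpoint)

round(c(ERS = ers_share, MRS = mrs_share), 2)
```

Two of the three responses sit at an end of the scale, which gives the ERS share of 0.67, and none sits at the midpoint, which gives the MRS share of 0.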

Instead of calculating proportions, we can extract the counts of
responses that can be attributed to a response style by setting
`normalize` to `FALSE`.
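
A call that requests counts might look as follows. This is a sketch that uses only the arguments introduced above. The data frame `count_example` is an illustrative copy of the example data, so the chunk can be run on its own.

```{r}
library(resquin)

# Illustrative copy of the example data set used throughout this tutorial
count_example <- data.frame(
  var_a = c(1, 4, 3, 5, 3, 2, 3, 1, 3, NA),
  var_b = c(2, 5, 2, 3, 4, 1, NA, 2, NA, NA),
  var_c = c(1, 2, 3, NA, 3, 4, 4, 5, NA, NA)
)

# Same call as before, but with counts instead of shares
style_counts <- resp_styles(
  x = count_example,
  scale_min = 1,
  scale_max = 5,
  min_valid_responses = 1,
  normalize = FALSE)

style_counts
```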

Finally, we can decide to include or exclude respondents from receiving
response style values by setting `min_valid_responses`, which can take
values from 0 to 1. `min_valid_responses` sets the share of valid
responses (i.e. non-missing responses) a respondent must have to receive
response style values. A value of 0 indicates that response style values
should be calculated for all respondents, regardless of whether or not
they have missing values. A value of 1 indicates that response styles
should only be calculated for respondents who have valid responses on
all variables. Values between 0 and 1 indicate the share of responses
that need to be valid to be included in the response style calculations.
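
To see which respondents a given threshold would keep, the share of valid responses can be computed by hand. The chunk below is a base R sketch on an illustrative copy of the example data. It assumes that a respondent is kept when their share of valid responses is greater than or equal to `min_valid_responses`. The exact comparison used inside `resquin` is not confirmed here.

```{r}
# Illustrative copy of the example data set
validity_example <- data.frame(
  var_a = c(1, 4, 3, 5, 3, 2, 3, 1, 3, NA),
  var_b = c(2, 5, 2, 3, 4, 1, NA, 2, NA, NA),
  var_c = c(1, 2, 3, NA, 3, 4, 4, 5, NA, NA)
)

# Share of non-missing responses per respondent
share_valid <- rowMeans(!is.na(validity_example))
round(share_valid, 2)

# Respondents kept under two thresholds (assuming a ">=" comparison)
kept_strict <- which(share_valid >= 1)     # only complete respondents
kept_lenient <- which(share_valid >= 0.5)  # at least half of the items answered

kept_strict
kept_lenient
```

Under these assumptions, a threshold of 1 keeps the six respondents without missing values, while a threshold of 0.5 also keeps the two respondents who answered two of the three questions.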

### `resp_distributions()`: Intra-individual response distribution indicators

`resp_distributions()` calculates indicators which reflect the location
and variability of responses within a respondent. `resp_distributions()`
works similarly to `resp_styles()`: We need to specify the data argument
and we can include or exclude respondents from the calculations based on
the amount of missing data they exhibit (see the explanation of
`min_valid_responses` above).

```{r}
# Calculating response distribution indicators for all respondents with no missing values
results_resp_distributions <- resp_distributions(
  x = testdata,
  min_valid_responses = 1) # Excludes respondents with less than 100% valid responses

round(results_resp_distributions,2)
```

The resulting data frame contains the following columns:

-   `n_na`: number of intra-individual missing responses
-   `prop_na`: proportion of intra-individual missing responses
-   `ii_mean`: intra-individual mean
-   `ii_median`: intra-individual median
-   `ii_sd`: intra-individual standard deviation
-   `mahal`: Mahalanobis distance per respondent

You can learn more about the response distribution indicators using `?resp_distributions`.
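
The intra-individual location and spread indicators can be approximated by hand with row-wise summaries. The chunk below is a base R sketch for the first three respondents of the example data. It assumes that `ii_sd` corresponds to the sample standard deviation returned by `sd()`, which is not confirmed here. It leaves out the Mahalanobis distance, and the object names are illustrative.

```{r}
# First three respondents of the example data set (no missing values)
first_three <- data.frame(
  var_a = c(1, 4, 3),
  var_b = c(2, 5, 2),
  var_c = c(1, 2, 3)
)
response_matrix <- as.matrix(first_three)

# One row per respondent, one column per indicator
by_hand <- data.frame(
  ii_mean   = rowMeans(response_matrix),
  ii_median = apply(response_matrix, 1, median),
  ii_sd     = apply(response_matrix, 1, sd)  # assumption: sample SD (n - 1)
)

round(by_hand, 2)
```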