R package to analyse Q methodology data

View the Project on GitHub aiorazabala/qmethod

Data management

A Q study can quickly involve many kinds of interrelated data, including a concourse, a Q set (or Q sample), a condition of instruction and the actual Q sorts. This page suggests some best practices for reproducible, cumulative and systematic Q research, as developed during the first keyneson study.

These best practices are easiest to implement with the import functions import.q.concourse, build.q.set, import.q.sorts and import.q.feedback and the print function make.cards, which together allow for one-stop development, iteration and administration of a Q study. However, nothing in the R package qmethod requires that you follow these best practices; the functions are generic and applicable to a wide range of use cases.

Not all of these practices and facilities will be immediately appropriate for all studies, especially small and ad-hoc studies. However, as Q methodology grows and consolidates, more researchers may confront similar challenges to which these best practices provide preliminary solutions.

The suggestions below proceed from basic to more advanced data management practices.

TL;DR: If you would rather skip this lengthy piece and get started right away from an example, check out Max Held’s Q study keyneson, for which these practices were developed.

What Makes For Best Practices (in Q Research)?

In spite of the great diversity of approaches to Q methodology, some criteria of good research practice may be universally acceptable, including:

  1. Reproducibility. Other researchers should be able to precisely track and reproduce all steps taken during a research project, especially when it involves empirical analysis. Aside from reflecting a deeper commitment to open science, this can also help everyone avoid small but consequential mistakes.
    In Q methodology, reproducibility may imply:
    • that the gathering, sourcing and editing of the concourse of items be well documented,
    • that the sampling of a Q set from a concourse be documented and justified,
    • that the condition of instruction be documented, or
    • that the data entry, verification and cleaning of Q sorts be (programmatically) documented.
    While concepts of external validity or test/retest reliability do not easily apply to a Q study of subjectivity, reproducible Q research may also involve replicating a given Q study with other people or at another time.
  2. Cumulativeness. Q studies, like other research, should build on, or be informed by, previous work in a systematic way. In particular, the concourse theory of communication (Stephenson 1978), on which Q methodology is premised, may suggest that any scientific attempt to tap into this multitude of subjective statements should build on past attempts at doing so, and remain open to future revisions.
    In Q methodology, cumulative research may imply:
    • that other researchers or the wider public get to suggest edits or additions to a concourse of statements,
    • that other researchers sample new Q sets from an existing concourse, shared and co-developed between several researchers and the public,
    • that the same Q set is used in different Q studies, with a different participant P set or condition of instruction, or
    • that Q researchers conduct meta-analyses, comparing factors extracted from different or identical Q sets, but with different people, at different times, and so on.
  3. Systematicity. For Q methodology, systematicity may imply:
    • that a method for sampling concourse items into a Q set (structured or unstructured) is documented in a way that allows it to be applied to another concourse or another study,
    • that suggestions for edits or additions to concourse or Q set are (publicly) documented, including justifications for rejected edits or,
    • that the full set of items in the Q set, the concourse and its sources (if applicable), are easily navigable by other researchers, even if this material cannot all be published in established outlets.

Naming Items

Items in a Q set or concourse may need to be referred to in different ways, depending on the study.

Full Item Wording or Stimulus

Items themselves may take many forms, including longer or shorter written language, but also other stimulus material such as pictures. By convention, we refer to this as the full item stimulus or the full item wording, depending on the stimulus. An example (from keyneson) would be:

Labor is not a commodity.

Full item wordings are best saved as individual text files in one directory. Plain text files are recommended over binary or proprietary word processor files (such as *.doc), because they are smaller, more robust, future-proof and easily transferable. A full item wording file may simply look like this:

Labor is not a commodity.

The import function import.q.concourse included with the R package qmethod expects *.tex as the file extension (as used by the LaTeX typesetting language), but LaTeX markup is strictly optional. If you wish to use LaTeX formatting, you can simply add markup as in a normal LaTeX file, with no preamble or other declarations needed. For example,

Labor is \emph{not} a commodity.

would set “not” in italics:

Labor is not a commodity.

Item Handles

Depending on the length of these items and the desired output format, researchers may find it cumbersome to always refer to items by their full item wording. Instead, items can be conveniently assigned an item handle, which should be short and meaningful to the researcher (say, labor-no-commodity, for the above example). Researchers can then use this item handle to:

Item IDs

Another need to refer to items in some shorthand way arises during the administration of a Q study. When recording participants’ Q sorts, it would often be too cumbersome to refer to items by their full wording. Instead, researchers will usually enter a short identifier for each item.

In some settings, it may also be inadvisable to let participants see the above item handles, because these meaningful snippets may act as additional stimuli and affect their sorts in unintended ways. (This may be similar to using Q cards made from different materials, or in different colors, for different items.)

For that reason, an unintelligible identifier, or ID, may be advisable to refer to items during Q sort administration.

The import functions import.q.sorts, import.q.feedback and the printing function make.cards included in the R package qmethod allow for two ways of doing this:

  1. Researchers can manually enter arbitrary strings to identify items, such as the customary sta001. In this case, researchers should specify their manual IDs using the manual.lookup argument of the above functions (see the R documentation for details). Such manual IDs can either be “hard-coded” in R, or conveniently read in from a *.csv file using the read.csv function of base R. An example ids.csv file may look like this:

  2. Alternatively, researchers can use the above import and print functions to create an automatic hash from the full item wording. A hash function transforms much longer pieces of information into short summaries. The same full item wording will always produce the same hash (using the same algorithm), but you cannot reconstruct the full item wording from the hash alone unless you know the set of possible statements from which it was created. The hash value will be some arbitrary string such as 3ed68fde.
    Hashing is the default behavior of the above functions and is recommended for several reasons:
    • Manual ID tables are a frequent source of errors.
    • Computers can do this kind of identifying job better than humans.
    • A hash value will automatically change if anything in the full item wording changes, which provides a highly reliable way to relate recorded Q sorts back to the items used during administration. For example, if, at the last minute before Q sort administration,

      Labor is not a commodity.

      is changed to

      Labor is something that can be bought and sold like everything else on the market.

      the hash value created by make.cards and expected by the import functions will automatically change, ruling out any confusion of one item version with another.

Using hash values (and proper version control), researchers will always know exactly which variant of an item people saw and sorted.
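The property relied on here is simple. An identical wording always yields the same short ID, and any edit yields a different one. The base-R sketch below illustrates this and is not qmethod's implementation. It uses MD5 via tools::md5sum() and keeps the first eight hexadecimal characters. That choice is an assumption made only for the demonstration, and make.cards may use a different algorithm. The helper short_hash() is hypothetical.

```r
# Hypothetical helper: derive a short ID from a full item wording.
# Uses MD5 from base R's tools package and keeps the first 8 hex characters;
# qmethod's own hashing may use a different algorithm.
short_hash <- function(wording) {
  f <- tempfile(fileext = ".tex")
  on.exit(unlink(f))                  # clean up the temporary file afterwards
  writeLines(wording, f)              # write the wording exactly as given
  substr(unname(tools::md5sum(f)), 1, 8)
}

original <- "Labor is not a commodity."
edited   <- "Labor is something that can be bought and sold like everything else on the market."

id_original <- short_hash(original)
id_again    <- short_hash(original)
id_edited   <- short_hash(edited)

id_original == id_again    # same wording, same ID
id_original == id_edited   # changed wording, different ID
```

Because the ID is computed from the wording rather than looked up in a table, no one has to keep a separate ID list in sync with edits to the items.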

This is what Q cards printed by make.cards with an ID look like (in this case, a manual ID):

(Figure: Q card printout)

You can easily break out individual cards, with their ID on the back and the full item wording on the front:

(Figure: Q card breakout)

Note (from the qmethod manual):

Hashed identification has not been widely tested in Q studies and should be used with great care and only for extra convenience. When using hash identification, researchers should be careful to record the precise item wordings at the time of hashing for the printed Q cards, preferably with a version control system. Researchers should also record the complete Q sorts of participants in an unhashed form, such as a picture of the completed sort in full wordings, in case problems with the hashing arise.
This function (make.cards) currently only works with Avery Zweckform C32010 templates, as designed in /cardtemplates/AveryZweckformC32010.Rnw. If you would like support for other templates, check out or chip in on the qmethod GitHub repository.

Directory Structure

One Language, One Condition

The simplest directory structure, starting from the root of some Q study, should look like this:

├── feedback
│   └── JohnDoe.csv  # these include possible feedback with one line per item
├── qsorts
│   ├── JaneDoe.csv  # these include the full sorts, recorded in raw form
│   └── JohnDoe.csv
└── sample
    ├── concourse
    │   ├── life-with-q.tex  # these include the full item wordings
    │   ├── q-uprising.tex
    │   ├── r-dominance.tex
    │   ├── small-village.tex
    │   ├── video.tex
    │   └── ids.csv  # this includes the IDs, if entered manually
    └── sampling-structure.csv  # this includes a list of items to be sampled into the Q set
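Scripting this skeleton makes it easy to set up the same structure for every new study. The base-R sketch below creates it under a temporary directory, which stands in for your study root. The participant and item names are the placeholders from the listing above, and all files are created empty. ids.csv is left out because manual IDs are optional.

```r
# Create the single-language, single-condition skeleton shown above.
root <- file.path(tempdir(), "my-q-study")   # hypothetical study root

dirs <- file.path(root, c("feedback", "qsorts", file.path("sample", "concourse")))
for (d in dirs) dir.create(d, recursive = TRUE, showWarnings = FALSE)

# Item handles double as the file names of the full item wordings.
handles <- c("life-with-q", "q-uprising", "r-dominance", "small-village", "video")
item_files <- file.path(root, "sample", "concourse", paste0(handles, ".tex"))
invisible(file.create(item_files))

# Empty placeholder files for sorts, feedback and the sampling structure.
invisible(file.create(file.path(root, "qsorts", c("JaneDoe.csv", "JohnDoe.csv"))))
invisible(file.create(file.path(root, "feedback", "JohnDoe.csv")))
invisible(file.create(file.path(root, "sample", "sampling-structure.csv")))

list.files(root, recursive = TRUE)   # lists the files (not the empty folders)
```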

Multilingual, Multi-Condition

The import and print functions in qmethod also support multilingual and multi-condition Q studies. In this case, the arguments conditions and languages should be specified when calling the functions, which will then expect these conditions and languages as folders in the directory structure.

With all bells and whistles, a directory may look like this (taken from the importexample data shipped with qmethod):

├── feedback
│   ├── after  # same conditions as specified in function call
│   │   └── JohnDoe.csv
│   └── before
├── qsorts
│   ├── after  # same conditions as specified in function call
│   │   ├── JaneDoe.csv
│   │   └── JohnDoe.csv
│   └── before
│       ├── JaneDoe.csv
│       └── JohnDoe.csv
└── sample
    ├── concourse
    │   ├── english  # same languages as specified in function call
    │   │   ├── life-with-q.tex
    │   │   ├── q-uprising.tex
    │   │   ├── r-dominance.tex
    │   │   ├── small-village.tex
    │   │   └── video.tex
    │   ├── german
    │   │   ├── life-with-q.tex
    │   │   ├── q-uprising.tex
    │   │   ├── r-dominance.tex
    │   │   ├── small-village.tex
    │   │   └── video.tex
    │   └── ids.csv
    └── sampling-structure.csv
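The functions expect the condition and language names from the function call to reappear as folder names. A quick check before importing can therefore catch typos in those names. The base-R sketch below does such a check. check_study_dirs() is a hypothetical helper and not part of qmethod. It only tests whether the folders shown above exist, not whether the files inside them are valid.

```r
# Hypothetical helper: return the expected condition/language folders that are missing.
check_study_dirs <- function(root, conditions, languages) {
  expected <- c(
    file.path(root, "qsorts", conditions),
    file.path(root, "feedback", conditions),
    file.path(root, "sample", "concourse", languages)
  )
  expected[!dir.exists(expected)]   # character(0) means everything is in place
}

# Build a small example tree in a temporary directory.
root <- file.path(tempdir(), "multi-q-study")
for (d in c(file.path(root, "qsorts", c("before", "after")),
            file.path(root, "feedback", c("before", "after")),
            file.path(root, "sample", "concourse", c("english", "german")))) {
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}

check_study_dirs(root, conditions = c("before", "after"), languages = c("english", "german"))
# character(0): all expected folders exist
check_study_dirs(root, conditions = c("before", "after"), languages = c("english", "french"))
# reports the missing .../sample/concourse/french folder
```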

File Types

The directories above include the following kinds of files:

Item Feedback Files

This is where you store item feedback received from participants.

Such item feedback may be instructive during later factor interpretation, when it can be called up programmatically.

A file may look like this:

i01,"I don't like Asterix and Obelix",FALSE
i02,"There is a typo here!",TRUE
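A header-less file like this can be read with base R's read.csv(). The column names used below are assumptions made for illustration. The first column appears to hold an item ID and the second the participant's comment. The meaning of the logical third column is not spelled out here, so the sketch simply calls it flag. The typo example suggests it may mark feedback that reports an error in the item.

```r
# Read an item feedback file that has no header row.
# Column names are illustrative assumptions, not names required by qmethod.
feedback_lines <- c('i01,"I don\'t like Asterix and Obelix",FALSE',
                    'i02,"There is a typo here!",TRUE')
f <- tempfile(fileext = ".csv")
writeLines(feedback_lines, f)

fb <- read.csv(f, header = FALSE,
               col.names = c("item_id", "feedback", "flag"),
               stringsAsFactors = FALSE)

str(fb)               # the third column is parsed as logical
fb[fb$flag, ]         # rows whose flag is TRUE
```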

Q Sorts Files

This is where you record raw Q sorts, as prepared by participants.

A very simple file may look like this:

"-1","0","1"  # this first line will be interpreted as variable names

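The sketch below rests on an assumption suggested by the header. Each column is taken to stand for a position on the sorting grid (here -1, 0 and 1), and the cells below it hold the IDs of the items placed at that position. Under that assumption, a raw sort can be turned into one score per item. The base-R sketch uses made-up item IDs. In qmethod, import.q.sorts does this conversion, and its documentation describes the exact input rules.

```r
# Convert a raw sort (columns = grid positions, cells = item IDs) into item scores.
# Item IDs i01 to i04 are made up for this example.
raw_lines <- c('"-1","0","1"',
               '"i01","i02","i04"',
               '"","i03",""')
f <- tempfile(fileext = ".csv")
writeLines(raw_lines, f)

# check.names = FALSE keeps "-1", "0" and "1" as column names
raw <- read.csv(f, check.names = FALSE, colClasses = "character")

scores <- unlist(lapply(names(raw), function(position) {
  ids <- raw[[position]]
  ids <- ids[!is.na(ids) & ids != ""]            # drop empty cells
  setNames(rep(as.numeric(position), length(ids)), ids)
}))

scores[order(names(scores))]   # one score per item ID
```

A duplicated name in scores would reveal that the same item was recorded twice, a common data entry error.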
Item Files

This is where you save actual full item wordings.

A very simple file may look like this:

And life is not easy for the R-legionaries who bother to read the works of Stephenson and Brown, for these posit actual Q logics of inquiry.
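Each item lives in its own file, so a whole concourse folder can be read into R with a few lines of base R. The sketch below assumes one *.tex file per item. It also assumes, as in the listings above, that file names match the item handles. read_items() and the handle r-legionaries are hypothetical. In qmethod, import.q.concourse is the function intended for this step.

```r
# Hypothetical helper: read all *.tex item files in a folder into a named
# character vector, using file names (minus extension) as item handles.
read_items <- function(concourse_dir) {
  files <- list.files(concourse_dir, pattern = "\\.tex$", full.names = TRUE)
  wordings <- vapply(files,
                     function(f) paste(readLines(f, warn = FALSE), collapse = " "),
                     character(1))
  names(wordings) <- sub("\\.tex$", "", basename(files))
  wordings
}

# Example folder with two items; the wordings are taken from this page.
concourse_dir <- file.path(tempdir(), "concourse-example")
dir.create(concourse_dir, showWarnings = FALSE)
writeLines("Labor is \\emph{not} a commodity.",
           file.path(concourse_dir, "labor-no-commodity.tex"))
writeLines(paste("And life is not easy for the R-legionaries who bother to read",
                 "the works of Stephenson and Brown, for these posit actual Q logics of inquiry."),
           file.path(concourse_dir, "r-legionaries.tex"))

items <- read_items(concourse_dir)
names(items)
```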

Additional Files

If manual IDs are used (not recommended), the ID table may also be saved as a *.csv file to enable others to reproduce it. The exact conventions matter little, as long as the file is correctly read in and transformed into the form expected by import.q.sorts or make.cards.

An ID file may look like this:


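Whatever its layout, an ID table is easy to read and check in base R. The sketch below assumes a hypothetical two-column layout with the column names handle and id. The layout that the manual.lookup argument actually expects is described in the qmethod documentation, so a table like this may need reshaping before it is passed on.

```r
# Hypothetical ID table: one row per item handle, with a manually assigned ID.
ids_lines <- c("handle,id",
               "life-with-q,sta001",
               "q-uprising,sta002",
               "r-dominance,sta003")
f <- tempfile(fileext = ".csv")
writeLines(ids_lines, f)

ids <- read.csv(f, stringsAsFactors = FALSE)

# Manual tables are error-prone, so check for duplicates before using them.
stopifnot(anyDuplicated(ids$handle) == 0, anyDuplicated(ids$id) == 0)

# Named lookup vectors in both directions.
id_of     <- setNames(ids$id, ids$handle)
handle_of <- setNames(ids$handle, ids$id)

id_of[["q-uprising"]]
handle_of[["sta003"]]
```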
If a Q set is selected from a concourse using structured sampling, that sampling structure may also be saved as a *.csv file to enable others to reproduce it. Again, the exact conventions matter little, as long as the file is correctly read in and transformed into the form expected by build.q.set. A sampling structure file may include arbitrary additional columns, but should include the item handles.

A sampling file may look like this:

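Whatever extra columns it carries, a sampling structure must name its items by their handles. The base-R sketch below shows how such a file can select a Q set from a concourse. The column names handle and category and the handle labor-market are assumptions made for illustration. In qmethod, build.q.set is the function intended for this step.

```r
# Concourse: named vector of full item wordings, keyed by item handle.
concourse <- c(
  "labor-no-commodity" = "Labor is not a commodity.",
  "labor-market"       = "Labor is something that can be bought and sold like everything else on the market.",
  "r-legionaries"      = "And life is not easy for the R-legionaries who bother to read the works of Stephenson and Brown, for these posit actual Q logics of inquiry."
)

# Hypothetical sampling structure: a handle column plus an arbitrary extra column.
sampling_lines <- c("handle,category",
                    "labor-no-commodity,economy",
                    "r-legionaries,methods")
f <- tempfile(fileext = ".csv")
writeLines(sampling_lines, f)
sampling <- read.csv(f, stringsAsFactors = FALSE)

# Fail early if the sampling structure names an item missing from the concourse.
unknown <- setdiff(sampling$handle, names(concourse))
if (length(unknown) > 0) stop("Unknown item handles: ", paste(unknown, collapse = ", "))

q_set <- concourse[sampling$handle]
names(q_set)
```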

Why All This Fuss?

Maintaining such a directory structure and the above file types has a number of advantages and enables good research practice.

With Everything, Please

If you are curious what a Q study with all of these suggestions looks like, check out Max Held’s keyneson repository.

A selection of that directory structure looks like this:

├── README.md
├── feedback
│   ├── after
│   │   ├── Frank.csv
│   │   ├── Ingrid.csv
│   │   ├── ...
│   │   └── Wolfgang.csv
│   └── before
│       ├── Claus.csv
│       ├── Frank.csv
│       ├── ...
│       └── Susanne.csv
├── keyneson-sample  # this is a git submodule
│   ├── README.md
│   ├── keyneson-concourse   # this is a git submodule
│   │   ├── README.md
│   │   ├── english
│   │   │   ├── ability-2-pay.tex
│   │   │   ├── all-people-own-earth.tex
│   │   │   ├── ...
│   │   │   └── yield-2-capital-norm.tex
│   │   ├── german
│   │   │   ├── ability-2-pay.tex
│   │   │   ├── all-people-own-earth.tex
│   │   │   ├── ...
│   │   │   └── yield-2-capital-norm.tex
│   │   ├── ids.csv
│   │   └── keyneson-concourse.wiki  # this is a git submodule
│   │       └── Home.md
│   ├── keyneson-sample.wiki   # this is a git submodule
│   │   ├── Home.md
│   │   └── sampling-structure.md
│   └── sampling-structure.csv
├── keyneson.wiki  # this is a git submodule
│   ├── Home.md
│   ├── Q-Sort-Form.pdf
│   └── condition-of-instruction-de.md
└── qsorts
    ├── after
    │   ├── Christian.csv
    │   ├── Frank.csv
    │   ├── ...
    │   └── Wolfgang.csv
    └── before
        ├── Christian.csv
        ├── Claus.csv
        ├── ...
        └── Wolfgang.csv