The ALPHA Network assembles and curates individual-level data from different sources, including demographic surveillance, verbal autopsy interviews, serological and sexual behaviour surveys, and individually-linked data from medical facilities. In addition, several of the ALPHA Network partner institutes have collected qualitative data on HIV and AIDS care services utilisation, and have conducted health facility surveys.

Access to individual-level data

Selected anonymised microdata are available under licence from DataFirst: ALPHA collection. These include datasets prepared for analysis of HIV incidence and of mortality by HIV status. These datasets differ from those produced for harmonisation (described below) and more closely resemble those used in published analyses of the data. The structure of the datasets is described in these documents:

Demographic data from several of the ALPHA Network studies are also available through INDEPTH’s iShare Repository.

Kisesa, Rakai and uMkhanyakude share clinic data with the IeDEA Network. For the subset of study participants who have attended a clinic that shares data with IeDEA, the ALPHA data from these studies can be linked with the IeDEA clinical data. Masaka also has linked clinic data prepared in IeDEA format. To access these data, potential users should contact ALPHA at

ALPHA code

The code used to produce the datasets for analysis of HIV incidence and mortality is available on GitHub, in versions for Stata and R users: ALPHA code for Stata and code for R.

Specifications for harmonised ALPHA data

The data are organised into 10 different tables, referred to as the data specifications. Each of these is described in more detail below. Most analyses require several data specifications to be combined. Code used in preparing and analysing the data is available from our GitHub repository: ALPHA code for Stata and code for R.
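As an illustration of combining specifications, the sketch below attaches to each residency episode the HIV tests that were done while the person was resident, joining the Residency and HIV tests tables (described below) on a person identifier. It is a minimal, hypothetical example using plain Python records: the field names (`idno`, `entry`, `exit`, `test_date`, `result`) and the sample values are illustrative assumptions, not the official ALPHA variable names or data, and the published Stata and R code should be consulted for the actual procedures.

```python
from collections import defaultdict
from datetime import date

# Hypothetical, simplified records. Field names are illustrative only and
# are NOT the official ALPHA data specification variable names.
residency = [
    {"idno": 1, "entry": date(2005, 1, 1), "exit": date(2008, 6, 30)},
    {"idno": 1, "entry": date(2010, 1, 1), "exit": date(2012, 12, 31)},
    {"idno": 2, "entry": date(2006, 3, 1), "exit": date(2011, 1, 1)},
]
hiv_tests = [
    {"idno": 1, "test_date": date(2007, 5, 2), "result": "negative"},
    {"idno": 1, "test_date": date(2009, 4, 1), "result": "positive"},
    {"idno": 2, "test_date": date(2008, 8, 8), "result": "negative"},
]


def attach_tests(residency, hiv_tests):
    """Return a copy of each residency episode with the tests done during it."""
    # Index tests by person so each episode only scans that person's tests.
    tests_by_id = defaultdict(list)
    for test in hiv_tests:
        tests_by_id[test["idno"]].append(test)

    combined = []
    for episode in residency:
        inside = [
            t for t in tests_by_id[episode["idno"]]
            if episode["entry"] <= t["test_date"] <= episode["exit"]
        ]
        combined.append({**episode, "tests": inside})
    return combined


for ep in attach_tests(residency, hiv_tests):
    print(ep["idno"], ep["entry"], len(ep["tests"]))
```

In this made-up example the positive test for person 1 falls between two residency episodes, so it is attached to neither; how such tests are handled is an analytic decision that this sketch does not try to settle.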

Residency (formerly called 6.1) contains the date of birth, the periods of time spent resident in the study area and how each of these periods ended. This information is used for survival analysis. Residency data spec definition.
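For survival analysis, residency periods are typically converted into person-time at risk and a count of events. The sketch below is a hypothetical illustration of that idea: it sums the time one person was resident inside a calendar window and counts deaths that ended a residency period within it. The field names (`entry`, `exit`, `exit_type`) and the exit category `"death"` are assumptions for illustration, not the ALPHA coding scheme, and years are approximated as 365.25 days.

```python
from datetime import date

# Hypothetical residency episodes for one person. Field names and the
# "death" exit category are illustrative, not the ALPHA specification.
episodes = [
    {"entry": date(2005, 1, 1), "exit": date(2008, 7, 1), "exit_type": "out-migration"},
    {"entry": date(2010, 1, 1), "exit": date(2011, 1, 1), "exit_type": "death"},
]


def exposure_and_deaths(episodes, window_start, window_end):
    """Return (person-years resident in the window, deaths in the window)."""
    days, deaths = 0, 0
    for ep in episodes:
        # Clip each episode to the observation window.
        start = max(ep["entry"], window_start)
        end = min(ep["exit"], window_end)
        if end <= start:
            continue  # episode lies entirely outside the window
        days += (end - start).days
        # Count a death only if the episode ended by death inside the window.
        if ep["exit_type"] == "death" and ep["exit"] <= window_end:
            deaths += 1
    return days / 365.25, deaths


pyears, deaths = exposure_and_deaths(episodes, date(2006, 1, 1), date(2012, 1, 1))
print(f"{pyears:.2f} person-years, {deaths} death(s)")
```

Summing these quantities over individuals (for example within strata of HIV status) gives the numerator and denominator of a crude mortality rate; a full analysis would normally use dedicated survival tools rather than this hand-rolled calculation.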

HIV tests (formerly called 6.2b) contains dates and results of HIV tests done for research purposes and, for some studies, information on tests done in other settings and self-reported HIV status. HIV tests data spec definition.
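Incidence analyses usually work from the interval between a person's last negative and first positive test. The sketch below derives that interval from a person's research test records and imputes a seroconversion date at the midpoint. Midpoint imputation is only one simple convention, used here as an assumption for illustration; it is not necessarily the method used in ALPHA analyses. The field names and sample values are hypothetical, and self-reported status is ignored.

```python
from datetime import date

# Hypothetical test records for one person; field names are illustrative.
tests = [
    {"test_date": date(2004, 3, 1), "result": "negative"},
    {"test_date": date(2007, 3, 1), "result": "negative"},
    {"test_date": date(2010, 3, 1), "result": "positive"},
    {"test_date": date(2012, 3, 1), "result": "positive"},
]


def seroconversion_interval(tests):
    """Return (last negative, first positive) dates, or None if not observed."""
    ordered = sorted(tests, key=lambda t: t["test_date"])
    first_pos = next(
        (t["test_date"] for t in ordered if t["result"] == "positive"), None
    )
    if first_pos is None:
        return None  # never tested positive
    negatives_before = [
        t["test_date"] for t in ordered
        if t["result"] == "negative" and t["test_date"] < first_pos
    ]
    if not negatives_before:
        return None  # already positive at first test: seroconversion unobserved
    return negatives_before[-1], first_pos


def midpoint(interval):
    """Impute a seroconversion date halfway through the interval (assumption)."""
    last_neg, first_pos = interval
    return last_neg + (first_pos - last_neg) / 2


interval = seroconversion_interval(tests)
print(interval, midpoint(interval))
```

Alternatives such as random-point or multiple imputation within the interval follow the same structure, differing only in how a date is chosen between the two tests.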

Parent links (formerly called 7.1) contains the parent and child id links to allow matching of parent-child information. Parent links data spec definition.
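A typical use of parent links is to bring a parent's characteristics onto a child's record. The sketch below, a hypothetical example, uses a mother link together with dates of birth to compute the mother's age at the child's birth. The field names (`child_id`, `mother_id`), the use of `None` for an unlinked mother and the sample dates are assumptions for illustration only.

```python
from datetime import date

# Hypothetical extracts; field names are illustrative, not the ALPHA spec.
parent_links = [
    {"child_id": 101, "mother_id": 1},
    {"child_id": 102, "mother_id": 2},
    {"child_id": 103, "mother_id": None},  # mother not linked in the study
]
dob = {
    1: date(1980, 6, 15), 2: date(1975, 1, 1),
    101: date(2005, 6, 14), 102: date(2003, 1, 1), 103: date(2006, 2, 2),
}


def age_in_years(birth, on):
    """Completed years of age on a given date."""
    had_birthday = (on.month, on.day) >= (birth.month, birth.day)
    return on.year - birth.year - (0 if had_birthday else 1)


def mother_age_at_birth(parent_links, dob):
    """Map each child id to the mother's age at the child's birth (or None)."""
    result = {}
    for link in parent_links:
        mother = link["mother_id"]
        if mother is None or mother not in dob:
            result[link["child_id"]] = None  # no linked mother: age unknown
        else:
            result[link["child_id"]] = age_in_years(dob[mother], dob[link["child_id"]])
    return result


print(mother_age_at_birth(parent_links, dob))
```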

Births (formerly called 7.2) contains the dates of birth of children who were born in the study area during the study follow up time. Births data spec definition.

Surveys (formerly called 7.4) contains socio-demographic information (such as marital status and education), which is collected from a variety of sources in each study. Surveys data spec definition.

VA (verbal autopsy) (formerly called 8.1c) contains the responses to the verbal autopsy questionnaires, coded as binary variables for each sign or symptom. VA data spec definition.

HIV clinic (formerly called 9.2) contains data on HIV clinic visits including information on ART. HIV clinic data spec definition.

HIV diagnosis and treatment (HIV Dx Tx) (formerly called 9.1) contains self-reported information on HIV testing, care and treatment histories. HIV dx tx data spec definition.

Sexual behaviour (formerly called 10.1) contains self-reported information on partnerships, contraception and sexual behaviour. Sexual behaviour data spec definition.

Non-communicable diseases (NCDs) (formerly called 11.1) contains self-reported data on NCD risks and history, together with anthropometric measurements. NCDs data spec definition.