Coding String Variables

Posted: Sun Feb 09, 2025 5:09 am
by asimd23
One of the valuable lessons we’ve learned from working with a longitudinal dataset is that you always need to be thinking about the next round, even if it is 3-4 years away, and about how the data needs to be accessible and usable. String variables are never archived, as it’s nearly impossible to ensure that no confidential or sensitive information is included in the text, especially if it’s in the local language. However, this doesn’t mean that string variables can’t be coded to become useful. Over the last two years we’ve put a lot of effort into coding some key string variables, such as the locations of our children and the names of the schools they attend or have attended. To say this was tedious would be an understatement. For the location variables, many of our country data managers had to go so far as reviewing the individual addresses of our children. With 12,000 children, some of whom have moved every round, this task was time-intensive, but we hope that being able to deduce from the codes who has moved in each round will strengthen analysis. The same applies to the coding of schools. Whereas before these data were simply omitted, they are now available in coded form, so that an unchanged code shows that a child has stayed in the same school and a changed code shows at what point in their trajectory they moved school.
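
A minimal sketch of the coding idea in Python, under stated assumptions: the child IDs, school names and function names below are hypothetical, and the automatic normalisation shown here only stands in for the manual review described above. It assigns one integer code per distinct school name, then compares codes between consecutive rounds to flag when a child moved. Only the codes would be released, not the original text.

```python
def normalise(name):
    """Collapse case and whitespace so trivial spelling variants share a code."""
    return " ".join(name.lower().split())


def build_codebook(names):
    """Assign an integer code to each distinct normalised name, in order of first appearance."""
    codebook = {}
    for name in names:
        key = normalise(name)
        if key and key not in codebook:
            codebook[key] = len(codebook) + 1
    return codebook


def moved_between_rounds(codes_by_round):
    """Return the rounds in which the code differs from the previous observed round."""
    rounds = sorted(codes_by_round)
    return [
        later
        for earlier, later in zip(rounds, rounds[1:])
        if codes_by_round[earlier] != codes_by_round[later]
    ]


# Hypothetical records: child ID -> {round number: school name as typed}
raw = {
    "CH001": {2: "St. Mary Primary", 3: "st. mary  primary", 4: "Hill Road Secondary"},
    "CH002": {2: "Hill Road Secondary", 3: "Hill Road Secondary", 4: "Hill Road Secondary"},
}

codebook = build_codebook(
    name for rounds in raw.values() for _, name in sorted(rounds.items())
)

# Replace each string with its code; the text itself is dropped.
coded = {
    child: {rnd: codebook[normalise(name)] for rnd, name in rounds.items()}
    for child, rounds in raw.items()
}

# Rounds in which each child changed school.
moves = {child: moved_between_rounds(codes) for child, codes in coded.items()}
```

In this toy example the two spellings of the first school share a code, so the first child is flagged as moving only in round 4 and the second child never moves. Real names and addresses vary far more than this, which is why the coding had to be checked by hand.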

Calculated Variables

Young Lives also generates calculated variables, such as health indicators, wealth and consumption indices, and test scores, for each round of data. These variables, and the methods for calculating them, are updated after each round and submitted to the public archive alongside our collected data. The indicators are widely used and can be quite difficult to create. By including them in our public datasets we hope to save users time in their own analysis.

Panel Dataset

After the submission of the Round 3 data to the UK Data Service, we began to build a panel dataset for each country. This dataset contained the core data collected across Rounds 1-3 and was submitted to the public archive. Again, the aim was to aid user analysis by providing a dataset that summarises the variables that have remained constant across the rounds for each country and cohort. After our recent submission of the Round 4 data, we have updated the panel dataset and plan to submit it in the coming weeks.
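
One way to picture the panel structure is the Python sketch below. It rests on assumptions: the variable names and values are hypothetical, and "remained constant across the rounds" is read here as "present in every round", which may be narrower or broader than the rule actually used. The sketch keeps only the variables found in all rounds and stacks the records into one long table with a round column.

```python
def build_panel(rounds):
    """Stack per-round records into long format, keeping only variables present in every record."""
    common = None
    for records in rounds.values():
        for record in records:
            keys = set(record)
            common = keys if common is None else common & keys
    common = sorted(common or [])

    panel = []
    for rnd in sorted(rounds):
        for record in rounds[rnd]:
            row = {"round": rnd}
            row.update({var: record[var] for var in common})
            panel.append(row)
    return panel


# Hypothetical data: round number -> list of child records
rounds = {
    1: [{"childid": "CH001", "age_months": 12, "caregiver_lit": 1}],
    2: [{"childid": "CH001", "age_months": 62, "enrolled": 1}],
    3: [{"childid": "CH001", "age_months": 96, "enrolled": 1}],
}

panel = build_panel(rounds)
```

Here only `childid` and `age_months` survive, because the other variables are missing from at least one round. Adding a new round is then a matter of adding one more entry to `rounds` and rebuilding.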