Assessing data quality
Once researchers become accustomed to working with complex datasets, including those derived from social media, they will find that the general principles and challenges of curating these data are fundamentally the same as those for smaller, traditional data. The UK Data Service's team of experts has over 20 years of experience in curating data and can support researchers who are about to embark upon processing and analysing big data. The approach is to look at how one manages small data effectively and then to apply those very same principles to big data.
A key part of curating data is assessing the quality of the data to be used. For example, how do you treat missing data? The process for dealing with missing data in traditional datasets can also be applied to big data. There are tools that can automatically report which parts of a dataset are missing, and algorithms that pull out outliers once a researcher sets the parameters. Google Refine (now OpenRefine) can help with this, as can the many free R tools that are available. The UK Data Service also assists researchers in identifying potential issues: we always flag up data whose provenance we do not know, so that the researcher is better informed about the limits of the data.
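As an illustration only, the two checks described above (reporting missing values and pulling out outliers against a researcher-set parameter) can be sketched in plain Python. This is not how Google Refine, OpenRefine or any R package implements them. The sketch assumes that a value counts as missing when it is `None` or an empty string. It flags outliers with the median/MAD "modified z-score", a common robust technique chosen here for illustration, and treats `threshold` as the parameter the researcher sets. The field names `user` and `followers` and the sample rows are hypothetical.

```python
from statistics import median

def missing_report(rows, columns):
    """Count missing values (None or empty string) in each column."""
    counts = {col: 0 for col in columns}
    for row in rows:
        for col in columns:
            value = row.get(col)
            if value is None or value == "":
                counts[col] += 1
    return counts

def flag_outliers(values, threshold=3.5):
    """Return indices whose modified z-score (based on median and MAD) exceeds threshold.

    Missing values (None) are skipped; threshold is the researcher-set parameter.
    """
    present = [v for v in values if v is not None]
    med = median(present)
    mad = median(abs(v - med) for v in present)
    if mad == 0:
        return []  # no spread in the data, so nothing can be called an outlier
    return [
        i for i, v in enumerate(values)
        if v is not None and 0.6745 * abs(v - med) / mad > threshold
    ]

# Hypothetical social-media style records, with gaps and one extreme value.
rows = [
    {"user": "a", "followers": 120},
    {"user": "b", "followers": None},
    {"user": "", "followers": 95},
    {"user": "d", "followers": 110},
    {"user": "e", "followers": 50000},
]

print(missing_report(rows, ["user", "followers"]))
print(flag_outliers([row["followers"] for row in rows]))
```

Run on the sample rows, the report shows one missing value in each column and flags the record at index 4. The median and MAD are used instead of the mean and standard deviation because a single extreme value distorts the latter, and that is common in large, messy datasets. Lowering `threshold` flags more records for review.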