How GitHub and Stack Overflow Data Were Verified for Research Accuracy | HackerNoon
Briefly

This article discusses how validity was addressed in research on Copilot users. It focuses on construct and external validity, describing how the authors used consensus strategies and pilot experiments to reduce personal bias in data labelling, extraction, and analysis. Internal validity was not considered because relationships between variables were not investigated, and the main threat to external validity lies in the choice of data sources, which the authors selected carefully.
To enhance construct validity, we implemented strategies such as pilot experiments to check agreement on data labelling and consensus involvement to mitigate personal bias (a sketch of such an agreement check appears after these points).
The primary threat to external validity in our study arises from the selection of data sources, which makes the careful choice of repositories essential for this kind of research.
During data analysis, uncertainties were resolved through joint agreement among the authors, using a negotiated-agreement approach to settle conflicting judgements.
Internal validity was not considered in our study, as we did not investigate relationships between variables; the focus was instead on the manual data-handling processes.
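The summary does not say how labelling agreement was quantified in the pilot experiments. A minimal sketch, assuming two authors label the same items independently and agreement is checked with a chance-corrected score such as Cohen's kappa (the function name and the example labels below are hypothetical, not taken from the study):

from collections import Counter

def cohens_kappa(labels_a, labels_b):
    # Chance-corrected agreement between two annotators over the same items.
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labelled identically.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under independence, from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical pilot round: two authors independently label the same ten items.
rater_1 = ["bug", "feature", "bug", "question", "bug", "feature", "bug", "bug", "question", "feature"]
rater_2 = ["bug", "feature", "bug", "bug", "bug", "feature", "bug", "question", "question", "feature"]
print(f"Cohen's kappa: {cohens_kappa(rater_1, rater_2):.2f}")

A score near 1 would suggest the labelling scheme is interpreted consistently, while a low score would signal that the labelling guide needs revision before the full dataset is labelled, which is exactly the kind of issue a pilot round is meant to surface.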