Manual annotation of large volumes of social media data is costly because the work must be done by experts with domain knowledge, and training an algorithm requires tens of thousands of records. Extracting adverse drug reactions (ADRs) from generic platforms such as Twitter is more difficult than from highly specialized platforms. Twitter users generate over 400 million posts daily, most of which have no relevance to drug safety. According to Sarker et al. (2015), the most studied online resources for ADR extraction include specialized platforms such as DailyStrength and MedHelp, as well as Yahoo! groups, AskAPatient, Drugs.com, DrugRatingZ, PatientsLikeMe, MediGuard, ForumClinic, Medications.com, SteadyHealth, and Twitter.
Lexicons used for ADR detection
Sarker et al. (2015) prepared a comprehensive review of methods for using social media in pharmacovigilance. The FDA Adverse Event Reporting System (FAERS), the standard post-market surveillance database, has been widely used in pharmacovigilance research, including social media scanning. MedEffect Canada, a Canadian reporting system, was also used in some studies as a standard for identifying ADRs. Coding Symbols for a Thesaurus of Adverse Reaction Terms (COSTART), originally developed by the FDA, is no longer in use because it was superseded by MedDRA. The Medical Dictionary for Regulatory Activities (MedDRA) was developed in the late 1990s by the International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use (ICH). It is a highly specific, globally used, standardized dictionary of adverse drug reaction terms. The Consumer Health Vocabulary (CHV) and the Unified Medical Language System (UMLS) are the most important resources used for social media monitoring. The CHV dictionary translates adverse event terms used by laypeople into medical terminology, while UMLS is a broad meta-thesaurus of medical terms; the two are often used in combination. Another valuable resource is the Side Effect Resource (SIDER), which provides, in a machine-readable format, all listed side effects of marketed drugs as derived from package inserts and public documents.
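The role of a lay-to-medical vocabulary such as CHV can be illustrated with a minimal sketch of lexicon-based matching. The lexicon below is a hypothetical toy example written for illustration; its entries are not taken from CHV, UMLS, or MedDRA, and the function name `find_adr_mentions` is an assumption, not part of any of the cited systems.

```python
import re

# Hypothetical toy lexicon: lay phrase -> standardized term.
# Illustrative entries only; not drawn from CHV, UMLS, or MedDRA.
LAY_TO_MEDICAL = {
    "throwing up": "vomiting",
    "can't sleep": "insomnia",
    "dizzy": "dizziness",
    "hair falling out": "alopecia",
}


def find_adr_mentions(post, lexicon=LAY_TO_MEDICAL):
    """Return the sorted standardized terms whose lay phrase occurs in the post."""
    text = post.lower()
    found = set()
    for phrase, term in lexicon.items():
        # Word boundaries prevent matches inside longer words (e.g. "dizzying").
        if re.search(r"\b" + re.escape(phrase) + r"\b", text):
            found.add(term)
    return sorted(found)


mentions = find_adr_mentions(
    "Been throwing up all day and feeling dizzy since starting the new med"
)
print(mentions)
```

Exact phrase matching of this kind misses misspellings and creative phrasing, which is one reason manually annotated data remains necessary for training and evaluation.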
Freifeld et al. (2014) screened 6.9 million Twitter posts for 23 pharmaceutical products, using the FAERS database as a standard to identify potential events. About 60,000 identified drug-event pairs were manually annotated to yield 4,401 Proto-AEs (posts resembling adverse events). Ginn et al. (2014) manually screened 10,822 tweets identified by an algorithm to yield 1,200 tweets that included an adverse drug reaction.
The number of potential adverse drug events (ADEs) that require manual annotation by a specialized expert does not appear to be dropping significantly. Moreover, according to Ginn et al. (2014), the authors of the largest available annotated dataset from generic media (Twitter), only about 10% of drug-related posts contain an ADR.
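The annotation burden can be made concrete with a short calculation of the yield, that is, the fraction of manually annotated items that turned out to be relevant. The sketch below uses only the counts reported above; the figure of 60,000 is approximate in the source, so the first result is approximate as well. The function name `annotation_yield` is chosen here for illustration.

```python
def annotation_yield(relevant, annotated):
    """Fraction of manually annotated items that were judged relevant."""
    return relevant / annotated


# Freifeld et al. (2014): 4,401 Proto-AEs from about 60,000 annotated pairs.
freifeld_yield = annotation_yield(4401, 60000)
# Ginn et al. (2014): 1,200 ADR tweets from 10,822 screened tweets.
ginn_yield = annotation_yield(1200, 10822)

print(f"Freifeld et al.: {freifeld_yield:.1%}")  # 7.3%
print(f"Ginn et al.: {ginn_yield:.1%}")          # 11.1%
```

In both studies, roughly nine out of ten items reviewed by an annotator contained no adverse reaction.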
In their review, Sarker et al. (2015) compared recall, precision, and F-scores of ADR extraction systems evaluated on manually annotated data. They identified 11 studies that involved manual annotation of datasets, with annotated dataset sizes ranging from 125 to 3,150. Recall ranged from 0.56 to 0.89, precision from 0.54 to 0.87, and F-score from 0.58 to 0.84. These metrics will likely improve as the annotated datasets grow.
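For readers less familiar with these metrics, the following minimal sketch shows how precision, recall, and the balanced F-score (F1) are computed when system output is compared against a manually annotated gold standard. The post identifiers are invented for illustration, and the reviewed studies may have used different matching criteria (for example, partial span matches).

```python
def precision_recall_f1(predicted, gold):
    """Compute precision, recall, and F1 for two collections of item IDs."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: share of system-flagged items that are truly ADR posts.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of true ADR posts that the system managed to flag.
    recall = true_positives / len(gold) if gold else 0.0
    # F1: harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical example: annotators marked posts 1-5 as containing an ADR,
# while the system flagged posts 1, 2, 3, and 6.
p, r, f = precision_recall_f1(predicted=[1, 2, 3, 6], gold=[1, 2, 3, 4, 5])
print(f"precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

Here three of the four flagged posts are correct (precision 0.75) and three of the five annotated ADR posts are found (recall 0.60), giving an F1 of about 0.67.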
Overall, detection, extraction, and annotation of data derived from social media remain a significant technical challenge.
Sarker, A., Ginn, R., Nikfarjam, A., O’Connor, K., Smith, K., Jayaraman, S., et al. (2015). Utilizing social media data for pharmacovigilance: A review. Journal of Biomedical Informatics, 54, 202-212. doi: 10.1016/j.jbi.2015.02.004
Freifeld, C., Brownstein, J., Menone, C., Bao, W., Filice, R., Kass-Hout, T., & Dasgupta, N. (2014). Digital drug safety surveillance: Monitoring pharmaceutical products in Twitter. Drug Safety, 37(5), 343-350. doi: 10.1007/s40264-014-0155-x
Ginn, R., Pimpalkhute, P., Nikfarjam, A., Patki, A., O’Connor, K., Sarker, A., et al. (2014). Mining Twitter for adverse drug reaction mentions: A corpus and classification benchmark. In Proceedings of the Fourth Workshop on Building and Evaluating Resources for Health and Biomedical Text Processing (BioTxtM 2014), LREC, 1-8. Retrieved from http://www.nactem.ac.uk/biotxtm2014/papers/Ginnetal.pdf