PPV measures the probability that a record is included in the BCC directory conditional on the record being present on Weedmaps, calculated as the number of records present on both Weedmaps and the BCC directory divided by the number of records present on Weedmaps. NPV measures the probability that a record is excluded from the BCC directory conditional on the record being absent from Weedmaps, calculated as the number of records present on neither Weedmaps nor the BCC directory divided by the number of records absent from Weedmaps. Note that specificity and NPV cannot be calculated in this example, because we were not able to identify a “true negative”, a record that was absent from both Weedmaps and the BCC directory. In fact, not all validity statistics were applicable to every combination of gold standard and test under the current study design. Following tobacco outlet research, we considered validity statistics of 0-0.2 to be poor, 0.21-0.4 to be fair, 0.41-0.6 to be moderate, 0.61-0.8 to be good, and 0.81-1.0 to be very good. R Version 3.5.3 was used to calculate 95% confidence intervals for all the validity statistics. We computed overall statistics as well as statistics by dispensary category and county population size. Locations of call-verified active brick-and-mortar dispensaries in California were mapped with ArcGIS Version 10.5.

A total of 2,121 business records were combined from BCC and the three online crowd sourcing platforms after online data cleaning. BCC, Weedmaps, Leafly, and Yelp had 630, 811, 535, and 1,468 records included in the combined database, respectively. The overlaps across the data sources are presented in Figure S1. Only 240 records were present in all four data sources. Following call verification, the 2,121 records were reduced to 826, which were confirmed to be active brick-and-mortar dispensaries.
Among the 1,295 records removed during call verification, 56.0% were closed, 4.2% were not open yet, 38.0% were not selling marijuana, and 1.8% had no storefronts.
BCC, Weedmaps, Leafly, and Yelp had 486, 659, 459, and 471 records included in these 826 verified dispensaries, respectively. The overlaps across the data sources are presented in Figure S2. The 826 records included 77 recreational-only, 65 medical-only, and 684 recreational & medical dispensaries. The dispensary category was based on self-reporting by dispensary staff during call verification.

Table 1 reports validity statistics using the BCC licensing directory as the gold standard. When the test was presence on each online crowd sourcing platform after online data cleaning, Leafly had good sensitivity and Weedmaps and Yelp had moderate sensitivity. This indicated that 70% of the BCC licensing directory could be found on Leafly. Leafly also had very good PPV, yet Yelp’s PPV was only fair. This indicated that 83% of Leafly records were included in the BCC licensing directory. When the test was passing call verification, Leafly still had the highest sensitivity and PPV, and Yelp had the highest specificity and NPV. This indicated that call-verified Leafly records performed the best for identifying truly licensed dispensaries and call-verified Yelp records performed the best for identifying truly unlicensed dispensaries in this scenario.

Table 2 reports validity statistics using the call-verified, combined database as the gold standard. When the test was presence in each data source after online data cleaning, Weedmaps had the highest sensitivity, and BCC, Leafly, and Yelp all had moderate sensitivity, ranging from 0.56 to 0.59. This indicated that 80% of the call-verified, combined database of active dispensaries could be found on Weedmaps. Leafly and Weedmaps had very good PPV, and Yelp’s PPV was only fair. This indicated that 86% of Leafly records were included in the call-verified, combined database of active dispensaries.
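The Table 2 percentages quoted above can be reproduced from the record counts reported in the text. The following is a minimal sketch, not the authors' code (the study used R): it takes presence in a data source after online data cleaning as the test and the 826 call-verified dispensaries as the gold standard. The function names (`band`, `wilson_interval`, `validity`) are hypothetical, and the Wilson score interval is an assumption, since the text does not state which method was used for the 95% confidence intervals.

```python
from math import sqrt

# Record counts reported in the text.
TOTAL_VERIFIED = 826  # call-verified active dispensaries (gold standard)
LISTED = {"BCC": 630, "Weedmaps": 811, "Leafly": 535, "Yelp": 1468}   # after online data cleaning
VERIFIED = {"BCC": 486, "Weedmaps": 659, "Leafly": 459, "Yelp": 471}  # of those, call-verified


def band(value):
    """Qualitative label using the cut points stated in the text."""
    if value <= 0.2:
        return "poor"
    if value <= 0.4:
        return "fair"
    if value <= 0.6:
        return "moderate"
    if value <= 0.8:
        return "good"
    return "very good"


def wilson_interval(successes, n, z=1.96):
    """95% Wilson score interval for a proportion (assumed method, not confirmed by the text)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


def validity(source):
    """Sensitivity and PPV of one data source against the call-verified database."""
    true_pos = VERIFIED[source]
    sens = true_pos / TOTAL_VERIFIED   # share of verified dispensaries found in the source
    ppv = true_pos / LISTED[source]    # share of the source's records that were verified
    return {
        "sensitivity": sens,
        "ppv": ppv,
        # Round to two decimals first, because the cut points are given to two decimals.
        "sensitivity_band": band(round(sens, 2)),
        "ppv_band": band(round(ppv, 2)),
        "sensitivity_ci": wilson_interval(true_pos, TOTAL_VERIFIED),
        "ppv_ci": wilson_interval(true_pos, LISTED[source]),
    }


for name in LISTED:
    stats = validity(name)
    print(f"{name}: sensitivity={stats['sensitivity']:.2f} ({stats['sensitivity_band']}), "
          f"PPV={stats['ppv']:.2f} ({stats['ppv_band']})")
```

Under these assumptions the point estimates match the figures in the text: about 0.80 sensitivity for Weedmaps, 0.86 PPV for Leafly, and 0.32 PPV for Yelp. The confidence intervals may differ slightly from the published ones if a different interval method was used.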
When the test was passing call verification, the sensitivity statistics were the same as when the test was presence in each data source. This was because the call-verified businesses in each data source were a subset of the businesses included in that data source before call verification, so the numerators and denominators of the sensitivity calculation did not change. Yelp had the highest NPV and Leafly had the lowest NPV.

Table 3 reports the agreement between BCC, online crowd sourcing platforms, and call verification in terms of the category of the 630 licensed dispensaries.
Approximately 25% of the licensed dispensaries on Weedmaps and 29% of the licensed dispensaries on Leafly posted a category that disagreed with the one approved in the BCC license. Approximately 12% of the call-verified, licensed dispensaries stated a category during call verification that disagreed with the one approved in the BCC license. Most of the businesses that stated an unapproved category on online crowd sourcing platforms and/or in call verification claimed to be recreational & medical when they were licensed only as recreational-only or medical-only.

Table S3 quantifies category-specific validity statistics when the gold standard was presence in the BCC licensing directory. Leafly had the highest sensitivity in the recreational-only and recreational & medical categories and Weedmaps had the highest sensitivity in the medical-only category, regardless of the definition of the test. Table S4 quantifies category-specific validity statistics when the gold standard was presence in the call-verified, combined database. When the test was presence in each data source after online data cleaning, Weedmaps had the highest sensitivity in identifying recreational-only and medical-only dispensaries, yet BCC had the highest sensitivity in identifying recreational & medical dispensaries. When the test was passing call verification, Weedmaps had the highest sensitivity in all three categories.

In 2019, California had 16 counties with a population size above one million and 42 counties with a population size below one million. Table S5 reports validity statistics by county population size when the gold standard was presence in the BCC licensing directory. Leafly had the highest sensitivity regardless of test definition and county population size. Table S6 reports validity statistics by county population size when the gold standard was presence in the call-verified, combined database.
Regardless of test definition, Weedmaps had the highest sensitivity in more populated counties and BCC had the highest sensitivity in less populated counties.

This study is the first to assess the validity of secondary data sources in identifying brick-and-mortar marijuana dispensaries across a large state.
We reported the validity of online crowd sourcing platforms in enumerating licensed dispensaries and the validity of the state licensing directory and online crowd sourcing platforms in enumerating active dispensaries.

Regarding the validity of using online crowd sourcing platforms to identify records in the BCC licensing directory, each of the three online crowd sourcing platforms included over 50% of the records in the BCC directory, with Leafly containing the largest number of licensed dispensaries. These findings suggested that the online crowd sourcing platforms could serve as a reasonable proxy for the licensing directory. This supports the validity of existing and future studies that use online crowd sourcing platforms for dispensary identification, especially if a licensing system is not open to the public or is updated infrequently. It should be noted, however, that the dispensary category registered in the BCC directory may not match the “de facto” category in which dispensaries operated. Over 25% of licensed dispensaries on online crowd sourcing platforms posted a category that disagreed with the BCC license, and over 10% of call-verified, licensed dispensaries stated a category during call verification that disagreed with the BCC license. In particular, most of these dispensaries claimed to be recreational & medical while they were licensed only as recreational-only or medical-only. Such disagreement might be used intentionally as a means of attracting customers or might reflect how dispensaries operate in practice.

Regarding the validity of using the state licensing directory to identify active brick-and-mortar dispensaries, over 20% of licensed dispensaries did not pass call verification. This indicated that business licenses may not accurately represent the actual operating status of businesses.
For instance, a business may have closed before its license expired, and a business may not be open yet even though its license has been approved. Of the final 826 call-verified dispensaries, 58.8% were included in the BCC licensing directory. This indicated that the BCC directory failed to capture unlicensed dispensaries, which accounted for over 40% of the total active dispensaries in California. Relying solely on a state licensing directory would overestimate active, licensed dispensaries while overlooking active, unlicensed dispensaries.

Regarding the validity of using online crowd sourcing platforms to identify active brick-and-mortar dispensaries, Weedmaps had nearly very good sensitivity; it contributed 80% of the records in the final call-verified, combined database. It had the highest sensitivity in identifying recreational-only and medical-only dispensaries. It was also the most sensitive database in identifying dispensaries in more populated counties, which were mostly urban areas. The high concentration of dispensaries and intense competition in urban areas may motivate more businesses to promote themselves on this highly visible and popular platform.
Leafly had the lowest sensitivity in identifying active dispensaries. It also had the lowest sensitivity in identifying all three dispensary categories. This is likely because the costs of advertising on Leafly were substantially higher than on other online crowd sourcing platforms specialized in marijuana. Only 32% of the businesses listed on Yelp were verified to be active brick-and-mortar dispensaries. This is not surprising because Yelp, which provides a general business listing service not specifically designed for the marijuana industry, had more records irrelevant to marijuana dispensaries than Weedmaps and Leafly.

Taken together, no single secondary data source could provide a reasonably complete and accurate list of active brick-and-mortar dispensaries in a large state like California. When a single data source is used to minimize required resources, we recommend that surveillance and research efforts consider the unique strengths and weaknesses of that source. When resources are available, we recommend integrating multiple secondary data sources, preferably including a licensing directory and multiple online crowd sourcing platforms, and verifying the records through phone calls, as was done in this study, or through even better approaches such as a field census. Verification could considerably improve the accuracy of the data compiled from secondary data sources.

Our findings were overall consistent with the two smaller-scale studies conducted in California, both in Los Angeles County. One was conducted in 2016-2017, before recreational marijuana dispensaries were allowed to open. This study obtained medical marijuana dispensary information from five online crowd sourcing platforms. Weedmaps was suggested to be the most accurate and up-to-date platform, contributing 95% of the final records. Call verification was conducted for 10% of the dispensaries and was found to generally align with the information posted on online crowd sourcing platforms.
The other study was conducted in 2018-2019, after recreational marijuana dispensaries were allowed to open. It extracted data from Weedmaps and Yelp and verified dispensary information through site visits. About 80% of the dispensaries that were determined to be active through online data cleaning were confirmed to be active in site visits, and licensed dispensaries accounted for roughly 40% of the active dispensaries. Neither study reported validity statistics for each specific data source. Our study expanded on the prior research by covering a much larger geographic region, by computing detailed validity statistics for each data source by dispensary category and county population size, and by using two gold standards and two tests to demonstrate validity in different scenarios and for different purposes.

This study has limitations. First, because a field census was not feasible in such a large geographic region, phone calls were made to verify information obtained from secondary data sources. While this approach was cost effective, businesses not listed in these secondary data sources were excluded from the analysis. These were potentially the smaller, unlicensed dispensaries that did not intend to promote themselves on online crowd sourcing platforms because of cost and law enforcement concerns. Future research using a field census approach is warranted to assess the extent to which unlicensed dispensaries were underrepresented in our study.