This project is an interactive dashboard visualizing the Smithsonian Institution's Open Access dataset, which contains 11.9 million records. The dashboard offers insight into the composition of the Smithsonian's collections: what the items are, when they date from, and where they come from. Specifically, it breaks items down by unit (such as the National Museum of American History or the Human Studies Film Archives), country, and date, with a focus on the last couple of centuries. Filtering to a specific unit makes it possible to compare trends between individual units as well as against the collection as a whole.
With 11.9 million records, the Open Access dataset is in its own way opaque. At 26 GB uncompressed, it's too large for me to load at once, much less search through interactively. To make it tractable, the data was first sampled to work out its basic structure (at the start of this project the format specification had not been released) and then processed with Python and Jupyter Lab. After logging summaries and anomalies, string processing was used to clean up typos, inconsistencies, and similar issues. Finally, the aggregated data was written to a JSON file.
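The steps above (sample, clean, aggregate, export) can be sketched in Python roughly as follows. This is a minimal illustration, not the project's actual code: it assumes the records are stored one JSON object per line, and the field names `unit`, `country`, and `year`, the helper names, and the demo values are all hypothetical stand-ins for whatever the real records contain.

```python
import json
import re
from collections import Counter
from itertools import islice


def sample_records(lines, n=5):
    """Parse only the first n records, to inspect their structure cheaply."""
    return [json.loads(line) for line in islice(lines, n)]


def clean_country(raw):
    """Collapse whitespace and normalize casing; None for missing values.

    Title-casing is a crude normalization that real data would need to refine.
    """
    if not raw:
        return None
    return re.sub(r"\s+", " ", raw).strip().title()


def aggregate(lines):
    """Stream records one at a time, counting items per (unit, country, year)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        key = (
            record.get("unit"),
            clean_country(record.get("country")),
            record.get("year"),
        )
        counts[key] += 1
    return [
        {"unit": u, "country": c, "year": y, "count": n}
        for (u, c, y), n in counts.items()
    ]


# Hypothetical demo records standing in for lines of the real dataset.
demo = [
    '{"unit": "NMAH", "country": "united  states", "year": 1950}',
    '{"unit": "NMAH", "country": "United States ", "year": 1950}',
    '{"unit": "HSFA", "country": null, "year": 1972}',
]
summary = aggregate(demo)
print(json.dumps(summary, indent=2))
```

Because `aggregate` accepts any iterable of lines, an open file handle can be passed in directly, so only one record is held in memory at a time and the size of the full dataset is not a constraint. The two inconsistently spelled demo records collapse into a single aggregated row, which is the kind of cleanup the string processing step is meant to achieve.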
At a technical level, the data was processed using command-line tools (including awk), Python, Jupyter Lab, and regular expressions; the visualization was built with D3 and a fork of Semantic UI.