The goal of collecting data is ultimately to gain insights that help teams and individuals make better decisions. However, it takes a lot of time and thoughtful data manipulation to transform raw data into something meaningful. In fact, data practitioners commonly report spending as much as 80% of their time "cleaning data." But what does this work look like, and how might we better support the individuals building the insightful, optimized data artifacts that make us all more successful?
A New Way to Manipulate Data
Rill is introducing a new tool to help data practitioners build data intuition and optimize the datasets that power downstream decision-support tools and dashboards. It takes a new approach to exploring data: it increases the observability of common patterns of analytical questions, using cutting-edge technologies that support data inquiry at the speed of conversation. When we can explore data as fast as we can ask the next question, we increase cognitive flow and productivity in data work in a way that few tools do today.
What is the Core Analysis Loop?
Data cleaning isn’t as janitorial as it sounds. Most of the work goes into building intuition about what the data represents through an iterative process called the Core Analysis Loop: a sequence of questions and analyses that builds understanding of, and trust in, the final result set. Its steps include collect, wrangle, profile, model, evaluate, report, and decide.
At Rill we are focused on making this analysis loop as intuitive and frictionless as possible by increasing the observational surface area of your data and the speed of inquiry in each loop. Let’s take a look at a common example to explore what this iterative Core Analysis Loop feels like in practice.
A data analyst is asked, “How did our paid advertising campaign contribute to product revenue last week compared to the week before?” To answer this question, the data practitioner might try to build some intuition about the purchase funnel by examining related tables.
A long process like this, transforming raw data into a consumable insight, can take hundreds of Core Analysis Loops to explore what the data represents and arrive at the right answer in the right context.
Faster Loops—Always Inspecting Your Work
Most databases that scale take seconds to minutes to complete a query and surface the results. As a result, data practitioners have become accustomed to agonizing over the query they have written before hitting the run button and waiting. While we wait, our minds wander. We lose the thread of the analytical question at hand. We DM a coworker about some unrelated task. We check Twitter. When the query finishes, we need to reorient ourselves to what we were doing and find our flow again.
Rill Developer turns these conventions on their head. It uses a representative sample of each dataset, loaded into a smart DuckDB queue, to optimize for fast analytical queries. Together, these methods achieve a level of analytical speed that removes the need for a run button entirely. The query runs with each keystroke in the SQL workspace, and you get immediate feedback: either a successful result set or an error that guides your next keystroke.
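To make the idea of a query that runs on every keystroke concrete, here is a minimal sketch. It is not Rill Developer's implementation: it swaps in Python's built-in sqlite3 module for DuckDB so that it runs anywhere, and the table name events, its columns, and the helper names build_sample_db and run_on_keystroke are all hypothetical.

```python
import random
import sqlite3


def build_sample_db(rows, sample_size, seed=0):
    """Load a random sample of rows into an in-memory table.

    Assumption: the table name `events` and its two columns are
    illustrative only; sqlite3 stands in for DuckDB.
    """
    rng = random.Random(seed)
    sample = rng.sample(rows, min(sample_size, len(rows)))
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (channel TEXT, revenue REAL)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", sample)
    return conn


def run_on_keystroke(conn, sql_text):
    """Run the current editor text and return (status, payload).

    status is "ok" with the result rows, "error" with the database
    message, or "empty" when nothing has been typed yet.
    """
    if not sql_text.strip():
        return ("empty", None)
    try:
        return ("ok", conn.execute(sql_text).fetchall())
    except (sqlite3.Error, sqlite3.Warning) as exc:
        return ("error", str(exc))


# Simulate typing a query one character at a time.
rows = [("paid", 10.0), ("organic", 5.0), ("paid", 2.5), ("email", 1.0)]
conn = build_sample_db(rows, sample_size=4)
query = "SELECT channel, SUM(revenue) FROM events GROUP BY channel ORDER BY channel"
feedback = [run_on_keystroke(conn, query[:i]) for i in range(1, len(query) + 1)]

print(feedback[0])   # the first keystroke is not valid SQL yet
print(feedback[-1])  # the finished query returns a result set
```

Most intermediate keystrokes produce an error message rather than rows, and that is the point: the feedback arrives quickly enough to steer the next keystroke. A real editor would also need to guard against partially typed statements that modify data, which this sketch does not attempt.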
Trustworthy Results—Profiling Table Columns
In addition to processing the analytical query of interest at greater speed, we reactively surface many of the “data cleaning” micro-analyses that practitioners want to see while building data intuition. These include the cardinality of a dimension (How many segments would this dimension create?), the percentage of nulls (Is this data complete?), and the top k examples (How common are the most frequent values? What percentage of the whole do they represent?).
Because this information is immediately available without your having to write any SQL, you gain velocity in your thinking and start querying with robust data intuition at your fingertips.
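For a sense of what these micro-analyses compute, the sketch below profiles a single column in plain Python. It is only an illustration under stated assumptions: the function name profile_column, its return format, and the sample values are hypothetical, and nothing here reflects how the tool computes its profiles internally.

```python
from collections import Counter


def profile_column(values, k=3):
    """Return cardinality, percent of nulls, and top-k values for one column.

    Assumption: nulls are represented as None, and percentages are
    taken over all rows, nulls included.
    """
    total = len(values)
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    return {
        # How many segments would this dimension create?
        "cardinality": len(counts),
        # Is this data complete?
        "pct_null": 100.0 * (total - len(non_null)) / total if total else 0.0,
        # How common are the most frequent values? (value, count, % of whole)
        "top_k": [
            (value, count, 100.0 * count / total)
            for value, count in counts.most_common(k)
        ],
    }


channels = ["paid", "organic", "paid", None, "email", "paid", "organic", None]
profile = profile_column(channels, k=2)
print(profile)
```

In this example the column has three distinct values, a quarter of its rows are null, and the most frequent value covers 37.5% of all rows. These are the kinds of facts that otherwise each require a hand-written query.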
Increased Observability—The Shape of Transformed Data
As we query our data, we change the information contained in the output relative to the inputs, and we do so to serve many goals. Sometimes we are answering an ad-hoc question, but when we are ready to formalize a set of ad-hoc questions into a data framework, we want more.
Often we are aligning our data to a governance model that our organization holds, creating a framework for understanding a set of questions, and reducing the cost of the artifact that contains these insights. Though we understand that the goal of generating this framework is a well-understood, optimized data model, few tools provide rich feedback about whether we have been successful and to what degree.
To this end, our tool profiles each output column dynamically with every keystroke of the query, which helps us understand whether we have built the right artifacts. We also compute rollup factors, such as the number of rows in the inputs relative to the output, so that we know whether we have reduced our data footprint in the way we expect.
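As a worked illustration of a rollup factor, the sketch below aggregates a handful of input rows and compares the row counts. It assumes the factor is defined as input rows divided by output rows, which is one reading of the description above; the helper names rollup and rollup_factor and the sample data are hypothetical.

```python
from collections import defaultdict


def rollup(rows, key_fields, measure):
    """Group rows by key_fields and sum the measure, like a SQL GROUP BY."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        totals[key] += row[measure]
    return [
        {**dict(zip(key_fields, key)), measure: total}
        for key, total in totals.items()
    ]


def rollup_factor(input_count, output_count):
    """Assumed definition: input rows per output row."""
    if output_count == 0:
        raise ValueError("output has no rows, so the factor is undefined")
    return input_count / output_count


purchases = [
    {"day": "mon", "channel": "paid", "revenue": 10.0},
    {"day": "mon", "channel": "paid", "revenue": 2.5},
    {"day": "mon", "channel": "organic", "revenue": 5.0},
    {"day": "tue", "channel": "paid", "revenue": 4.0},
    {"day": "tue", "channel": "paid", "revenue": 6.0},
    {"day": "mon", "channel": "organic", "revenue": 1.0},
]
daily = rollup(purchases, key_fields=["day", "channel"], measure="revenue")
factor = rollup_factor(len(purchases), len(daily))
print(daily)
print(factor)
```

Here six input rows collapse into three output rows, for a factor of 2.0. A factor near 1.0 would suggest that the chosen grouping barely reduces the data footprint.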
Bring the Flow Back to Your Data Workflow
As data practitioners, we have become accustomed to very slow feedback cycles and tedious analysis as we wait for yet another query to finish executing. As our tools' capacity for analytical speed improves, we should be leveraging these technologies to gain more velocity, flow, and ultimately value in our work streams.
At Rill Data we are committed to reducing the cognitive burden that keeps us from being our most productive data-selves. Forget what you know about the limitations of tools today and start answering your data questions at the speed of conversation. Give our tool a try and let us know how it improves your workflow!