Brooklyn Data Co.

By BROOKLYN DATA CO.

MARCH 16, 2021

People, Technology

Becoming an analytics engineer: two insider views


Meet Raphaela “Raphi” Abramson and Eli Kastelein as they discuss getting into the field of analytics engineering and what it means to be successful.

Ever wondered about a role that brings the best of software engineering to data? Alongside the rise of dbt (which we are big fans of here at BDC) has come the growth of analytics engineers, who do just that. We’re lucky to have these folks on our team here at Brooklyn Data, and they bring value to our clients every day. We asked two of them about their path to a career in analytics engineering, the skills needed to get there, and what their days look like.

“I think a common behavior across analytics engineers is curiosity: you want to know where the data is coming from and why it’s formatted that way, and you also (to the annoyance of your peers) frequently want to tinker with it.”

— RAPHI

“It’s easy to overlook the complexity involved in the data transformation, which is why I think there will continue to be a growing demand for analytics engineers. Because when you don’t have your data transformations under control, everything else becomes really, really difficult.”

— ELI

Tell us about yourself.

Raphi: I’m an analytics engineer at Brooklyn Data Co with a background in political science and political data. I love reading (my favorite app on my phone is Goodreads, followed by Libby), and enjoy baking and (very) occasionally knitting. I live in New Jersey with my husband and toddler.

Eli: For the first four years of my career I worked as a data analyst in the tech industry, and then I joined BDC as an analytics engineer. I was born and raised in Vancouver, Canada, and I currently live there with my wife. My hobbies are playing basketball, watching sports, and reading Twitter.

How did your career path push you towards being an analytics engineer/how did you end up here?

“...the minute I saw the “Analytics Engineer” position it was like a lightbulb turned on above my head. The role focused specifically on all the things I loved about my job at the time—modeling data, managing pipelines, and building reports.”

— RAPHI

Raphi: As my career developed, I discovered that I really enjoyed building reports and managing pipelines, but did not particularly enjoy pulling ad-hoc analyses and explaining reports. I loved the first part of the process (ingesting data and developing reporting infrastructure) but was more than happy for my coworkers to take on explaining insights and managing weekly reporting meetings for the reports I built.

I always felt self-conscious about it because I understood that the only way to advance as an analyst was to become more focused on working with clients and less focused on building reports. It made me very unhappy to realize that the part of my job that I loved was the part that I would have to give up if I didn’t want my career to stall. I didn’t know about analytics engineering at that point, and was seriously exploring data engineering since it seemed to offer me the best path towards doing some (but not all) of what I loved most of the time.

Luckily, a friend and former coworker from Brooklyn Data Co reached out to me about applying for a role. I looked at the available roles, and the minute I saw the “Analytics Engineer” position it was like a lightbulb turned on above my head. The role focused specifically on all the things I loved about my job at the time—modeling data, managing pipelines, and building reports. I’ve been doing the job for over a year now, and haven’t regretted a day of it.

“I realized that the limiting factor of our data org wasn’t our ability to do cutting-edge analysis using fancy formulas and algorithms. Rather, we were constrained by our ability to provide clean, transformed, and documented data to the rest of the company on a reliable schedule.”

— ELI

Eli: At my very first job in data, I worked as an analyst on a small data team of eight people. We were writing a lot of queries and calculations that were quite complicated, but all of the work existed either inside our business intelligence (BI) tool, in Jupyter notebooks, or in SQL files that lived on our laptops. So none of that work was being shared between us, and we weren't able to build on top of our existing efforts. The work we put in wasn't able to compound on itself.

I thought to myself "there has to be a better way". So I initially started pushing us to adopt some better practices, like checking SQL code into git and creating views for some of our most common transformations.

That was a good step but we still had a lot of pain points. For example:

There was no concept of production vs development or permissions, so we were often breaking things by working directly inside the main BI environment.
Our data warehouse had views built on top of views on top of views, meaning that you couldn't rename a table or chart without having to drop and rebuild a long list of downstream dependencies. This made developing new transformations take much longer than it should have.
Our raw data was never making it into the warehouse reliably because we had written all of the extraction and load logic ourselves. Having to deal with over a dozen different APIs inside of Airflow was a constant headache for our small team.
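
To make the "views on top of views" pain point concrete, here is a minimal sketch of how one might list everything that sits downstream of a single object before renaming it. The view names and the `depends_on` map are hypothetical, invented purely for illustration; they are not taken from Eli's actual warehouse.

```python
from collections import deque

# Hypothetical dependency map: each view -> the objects it selects from.
depends_on = {
    "stg_orders": ["raw_orders"],
    "stg_payments": ["raw_payments"],
    "orders_enriched": ["stg_orders", "stg_payments"],
    "daily_revenue": ["orders_enriched"],
}


def downstream_of(target, depends_on):
    """Return every object that directly or indirectly selects from `target`."""
    # Invert the map so we can walk from a parent to its children.
    children = {}
    for child, parents in depends_on.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)

    seen, order = set(), []
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


affected = downstream_of("raw_orders", depends_on)
print(affected)
```

Even in this toy example, touching one raw table affects three other objects. The function only lists what is affected (in breadth-first order); it does not attempt to work out a safe rebuild order.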

That experience really opened my eyes, because I realized that the limiting factor of our data org wasn't our ability to do cutting-edge analysis using fancy formulas and algorithms. Rather, we were constrained by our ability to provide clean, transformed, and documented data to the rest of the company on a reliable schedule. I found that more than anything, our stakeholders just wanted their dashboards to work and refresh with new data a few times a day. But because our data team wasn't able to deliver on that, we never gained the trust of the rest of the company, despite all of our hard work.

Back at the start of my career, I assumed my career path would become more and more mathematical and analytics-focused as I progressed, but so far it's been the opposite. I had this idea that working in data meant working on month-long machine learning projects culminating in academic-style reports or presentations, but that hasn't been my experience at all. There just isn't demand for that type of analysis inside the average tech startup, and I think that's true for a lot of other industries as well.

The majority of questions that data teams get from their stakeholders are much more straightforward (How many sessions do we get per day? What are our best-selling products?). But the underlying data transformations to support those simple requests are where things can be much more complicated. This is why it's valuable to invest in analytics engineers: they're the ones who can build you a sessions table out of raw clickstream events, or stitch together data from three different payment systems into a single fact table. It's easy to overlook the complexity involved in the data transformation, which is why I think there will continue to be a growing demand for analytics engineers. Because when you don't have your data transformations under control, everything else becomes really, really difficult.
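
As an illustration of why a "simple" question like sessions per day hides real transformation work, here is a rough sketch of turning raw clickstream events into a sessions table. In practice this kind of logic would more likely live in SQL inside the warehouse; Python is used here only to keep the example self-contained. The 30-minute inactivity timeout, the event shape, and the sample data are all assumptions made for this example.

```python
from datetime import datetime, timedelta
from itertools import groupby
from operator import itemgetter

# Assumption: a session ends after 30 minutes of inactivity.
SESSION_TIMEOUT = timedelta(minutes=30)


def sessionize(events, timeout=SESSION_TIMEOUT):
    """Collapse raw (user_id, timestamp) events into one row per session."""
    sessions = []
    # Sort by user, then by time, so each user's events are contiguous.
    ordered = sorted(events, key=itemgetter(0, 1))
    for user_id, user_events in groupby(ordered, key=itemgetter(0)):
        current = None
        for _, ts in user_events:
            # Start a new session on the first event or after a long gap.
            if current is None or ts - current["ended_at"] > timeout:
                current = {
                    "user_id": user_id,
                    "started_at": ts,
                    "ended_at": ts,
                    "event_count": 0,
                }
                sessions.append(current)
            current["ended_at"] = ts
            current["event_count"] += 1
    return sessions


# Hypothetical raw clickstream events.
raw_events = [
    ("u1", datetime(2021, 3, 16, 9, 0)),
    ("u1", datetime(2021, 3, 16, 9, 10)),
    ("u1", datetime(2021, 3, 16, 11, 0)),
    ("u2", datetime(2021, 3, 16, 9, 5)),
]

sessions = sessionize(raw_events)
for row in sessions:
    print(row)
```

Four raw events become three sessions here, because the first user's third event arrives well after the timeout. Real clickstream data adds further wrinkles (anonymous versus logged-in users, late-arriving events, time zones), which is where much of the complexity Eli describes comes from.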

What do you typically do during the day/what is your role?

Raphi: The core concept behind an analytics engineer is the ability to wear three hats simultaneously: those of an analyst, an engineer, and the end user. Your job is to sit at the intersection of those three roles and figure out the best way to get, model, and receive information. You need to consider all three when you model your data:

Engineer: Is this data easy to ingest? Will the ingestion method cause problems down the line? Are there any security considerations?
Analyst: Are the joins as efficient as possible? Will these models work in a BI tool?
User: How can I find the answer to my organization’s KPIs quickly and accurately? Do I understand the field names?

The majority of your work will take place within the analyst role as you survey and model the data. Tools like Stitch and Fivetran have made extracting and loading data much easier, so most data engineering can take place through them. More complex data engineering is either handled by data engineers or analytics engineers who are more skilled in data engineering.

That being said, the analytics engineer role is new territory within the data world, so the role likely shifts depending on each person and where they work. For example, your day-to-day as an analytics engineer will likely move around more if you work for a consulting agency (like Brooklyn Data Co.) than it would if you work in-house for a company with fairly consistent data needs.

I’m currently working with two clients with very different needs: one of them is a complete data model build-out on Snowflake with a lot of custom processes built into their models, and the other is dealing with a complicated data migration. My days between those two are very different. The first client is very modeling-heavy and requires a lot of analytics engineering. I get to spend eight-hour days crafting models and drafting YAML files (those are my favorite days), with occasional chunks of time where I’m recreating their BI dashboards with the new models. The other client is much more analyst-focused: I’ve been spending most of my time QAing the migration and identifying bugs. The second client will eventually transition to modeling, but until we get there I’m just wearing my “analyst hat.”

How can analytics engineers add value to the companies they work for?

Eli: One way to view the role of an analytics engineer is as a bridge between the engineering department and the business department (source). My goal is to empower people on the business side of the company to take advantage of technology, specifically the data that's being collected. And there are some shifts happening on both the technology side and the business side that explain why analytics engineering is starting to emerge as a critical practice.

On the technology side, we now have access to amazing tools like dbt, Snowflake, Looker, and Fivetran that weren't available even five years ago. The common theme of these tools is that they're making our lives easier and freeing up our time to focus on high-impact work. We can spend less time on generic data tasks and more time working on problems that are specific to the business.

Historically it's been difficult for analytics teams to add value to their companies, largely because the tools weren't very good. I think the main reason it took so long for the analytics engineer role to emerge was that the tools needed to do the job well didn’t exist yet. As I mentioned in my description of my first job, people working in data couldn't take advantage of some of the basic tooling that software engineers have had for years (version control, testing, packages). But just within the last few years we've entered a period where the tooling has rapidly improved, and analytics engineers can really start accomplishing more with fewer resources.

On the business side, we're seeing more and more demand for data, and the concept of being data-driven has gone completely mainstream. Every department in the modern company is asking for its own reporting. So a large part of analytics engineering is figuring out how to empower these organizations that are hungry for data.

As many companies have found out, just connecting a BI tool to a raw data source is a recipe for disaster. When you skip the step of building out a well-designed transformation layer, your final product won’t be reliable or accurate, and any sort of maintenance becomes a headache. So as an analytics engineer, I spend a lot of time really understanding the business use cases with the goal of empowering the data consumers to get as much value from the data as possible.

What advice would you give to someone who is considering a career as an analytics engineer?

“...the six-month period where I first started using SQL on the job was the highest velocity learning phase of my life. And if you can nail down SQL then you’ll have an immediately useful skill, and you’ll feel like you have superpowers.”

— ELI

“I think anyone who has used SQL and developed their own ad-hoc tables and tests to get around clunky reporting systems will latch onto dbt—you’ll recognize it as the thing you were trying to do but couldn’t quite do.”

— RAPHI

Raphi: I always thought I was going to be a lawyer when I was growing up, so I’m still occasionally surprised that I ended up here. That being said, I think that the fundamental skills for being an analytics engineer aren’t tech-specific per se. You do need to learn SQL and occasionally other languages to do your job, but those are pretty straightforward to learn, especially if you have a project that requires it.

I think a common behavior across analytics engineers is curiosity—you want to know where the data is coming from and why it's formatted that way, and you also (to the annoyance of your peers) frequently want to tinker with it. Another common behavior is flexibility. A good analytics engineer can think about data from a user’s perspective, an analyst’s perspective, and an engineer’s perspective.

If you’ve been reading this article and nodding your head, I’d suggest learning SQL and git, researching data modeling and warehouse design, and learning how to use dbt. I had never used dbt before I became an analytics engineer, but I acclimated pretty quickly. I think anyone who has used SQL and developed their own ad-hoc tables and tests to get around clunky reporting systems will latch onto dbt: you’ll recognize it as the thing you were trying to do but couldn’t quite do. I’d also recommend joining dbt’s Slack community, which is a great resource for anyone interested in analytics engineering and data modeling.

I also recommend researching the differences between data warehouses like Redshift, Snowflake, BigQuery, etc., along with their best use cases. For example, Snowflake has great JSON parsing functions, which can greatly simplify your transformation code.

Finally, experiment! You don’t need to be hired as an analytics engineer to work on your analytics engineering skills. Install dbt on your computer and try to replicate the process you already have (and are probably frustrated with). Be open to learning new ways to manage your data, and reach out with questions when you need help.

Eli: As a college dropout, I got into data from a self-taught background, and I was forced to learn the skills of self-education really early on in my career. I’ve found that having those skills has proved to be valuable later in my career, as I’ve been able to successfully pivot into this new area of analytics engineering. So I would recommend getting good at teaching yourself new things, and to me what that boils down to is being able to explore your curiosities. Don’t ever force yourself to study things that you don’t find interesting, but if you think you might be interested in data, then analytics engineering is a great path to go down.

I would recommend finding a way to use SQL in a real-world business setting if at all possible. Mode’s free community version is a good resource that would allow you to simulate writing SQL at a real company. If you find a way to use dbt in combination with SQL, then that’s even better. Looking back, I think that the six-month period where I first started using SQL on the job was the highest velocity learning phase of my life. And if you can nail down SQL then you’ll have an immediately useful skill, and you’ll feel like you have superpowers.

I would echo Raphi’s point about joining the dbt Slack and also the one for the Locally Optimistic blog. I’ve found both of those communities to be very helpful because you can connect with other data professionals who are experiencing the same problems as you within their own respective companies. They can also be useful for somebody earlier in their career, because they paint a picture of what working on a data team is actually like. And practically speaking, I’ve gotten a lot of personal value from them because I’ve found my last two jobs through connections I’ve made in those communities.

Special thanks to our team members Raphi and Eli for sharing their experience, and to Alexis Johnson-Gresham for contributing to this post!

Curious about how an Analytics Engineer could make an impact on your data team? Wondering what the best way to craft a job description is? We can help with that. Reach out: hello@brooklyndata.co
