Archive
A collection of 38 issues
Latest
3 inequalities for good models
Guidelines for good models that often hold true.
3 VS Code Snippets, to make writing in Quarto even easier!
Today, I bring you something hands-on: 3 snippets to implement figures in Quarto with ease. Cross-referencing, captions and all.
These 3 snippets include: r-figure, r-figure-subplot2 and r-figure-subplot3. You can find them at the end of this article.
Ask better questions
If you are working on the wrong problem, even the best neural networks, probabilistic models, and gradient-boosted models are useless. Yet data science often assumes that the problem being worked on is always clear. It is not.
3 inequalities for good code
Guidelines for good code that often hold true.
Code Review #3: The Creator
A curation of articles on the data scientist's role as a creator, by ds-econ.
Write your next paper with Quarto: A GitHub template for you!
Find the GitHub Template below!
Code Review #2: Ethics
A curation of articles on Ethics in data science, by ds-econ.
Your experiences needed
Code Review #1: Writing
In the past I have written a lot about how to set up your programming environment and how to communicate your research.
Make your slides look nice! A GitHub project template for Quarto.
Creating presentations can be a time-consuming and challenging task, especially if you have to incorporate your code and output in them. But what if you could create beautiful and engaging slides, again and again, in no time using Quarto and GitHub?
Quarto is a modern and versatile open-source tool that
PSA: New Newsletters every Wednesday
Does my model do what it is supposed to do?
Driving your web-scraper - Conclusion
This is an excerpt from my long-form article Web-scraping requires you to think: How to construct your web-scraper ethically and make it more human.
You can find the full article (with more context) now on Medium
To make use of the vehicle, we need to execute it in a
Building your web-scraper
This is an excerpt from my long-form article Web-scraping requires you to think: How to construct your web-scraper ethically and make it more human.
You can find the full article (with more context) now on Medium
Now that we thought about some ideas behind how a vehicle could look
Drawing up your web-scraper
This is an excerpt from my long-form article Web-scraping requires you to think: How to construct your web-scraper ethically and make it more human.
You can find the full article (with more context) now on Medium
Before we start writing the code for our web-vehicle, we need to start
Humanise your web-scraper
This is an excerpt from my long-form article Web-scraping requires you to think: How to construct your web-scraper ethically and make it more human.
You can find the full article (with more context) now on Medium
When programming a web-scraper, let your code sleep. A lot.
Think about where
Stay out of jail! A primer on staying on the legal side of things with web-scraping
Today, I have a special for you: A snippet from an upcoming long-form article on Medium on my thought process behind web-scraping. The snippet for today contains some pointers on the legality of using web-scraping.
Disclaimer: Legal Considerations
I am no lawyer. Still, I think that the following recommendation is
Be Courageous!
Networks, Scale, and Elegance: What constitutes a good programming language for data science?
What makes a programming language great for data science? Network effects, scalability, and ease of communication are certainly three important factors.
Who is your audience?
Ask yourself: Who are you coding / writing / researching for? Data scientists have an audience, just like screenwriters or movie directors. Who is your audience?
Do you have a data science uniform?
Professionals wear a uniform. Firefighters wear their gear, doctors wear their scrubs, and bakers wear their aprons.
When I do my work, I love to wear my sports coat. It is my uniform for data science. It makes me feel like a professional.
Do you have a data science uniform?
NEW NEW NEW on ds-econ
Attention: This is a PSA delivered to members of ds-econ.com
Expect crucial information, BTS content and the secret to the universe.
Your first data science internship: 5 things I had to learn twice
So far, I have worked as an intern in data science twice. Once at the German Federal Bank, and the other time at Lidl.
Interning in data science is valuable and helps you learn programming fast.
Either one or everyone: Two paths for accountability in data science
Professionals need to hold each other accountable. Especially data scientists.
If there is nobody who can judge your work, what keeps you from cheating / slacking / lying?
There are two paths you can take. A hard one and a scary one.
There is no shortcut: Data scientists need hard training
Every data scientist needs rigorous training in mathematics, statistics, and programming.
Are you a data scientist? You need a blog.
Be the glue guy on your research team
Data scientist is allegedly the sexiest job of the 21st century, so it is easy to view ourselves as the superstars of our research team. In fact, we are the glue guys: the people who do humble work and make everyone else better by doing so.
No such thing as coder's block
There is no such thing as writer's block.
Seth Godin has been preaching this for years, to writers and other creatives.
It also applies to us, data scientists: There is no such thing as coder's block.
Writers see an empty page and are "[afraid] of bad writing".
Programmers see an
Change to VS Code! An overview with a minimal setup for data science.
A glimpse at VS Code, and a minimal setup for data science. Snippets, keyboard shortcuts, and extensions for data science.
Reproducible, Ethical and Collaborative Data Science: The Turing Way Project
"Make reproducible research too easy not to do" (Turing Way motto)
The Turing Way Project is an open-source project which should excite any academic with an interest in data science. It is an almanac of techniques and best practices for making your research more reproducible and ready for an open
Seeded Topic Models as a Yard Stick: Implement them in R with keyATM
In text analysis, topic models are a prominent approach to extracting overall themes from large collections of documents. Perhaps the most widely used model in this domain is Latent Dirichlet Allocation (LDA).
LDA is a probabilistic topic model, and exists in many variations. Today we will look at one
Big Git Energy: Headaches with large files and GitHub, common pitfalls, and purging files from the course of history
Large projects often have large data files. Recently, I tried to push the repository of my master's thesis to GitHub to share my project with others.
Sure enough, there were problems. Lots of them.
As you might experience some bumps in the road when sharing big data files via GitHub,
The community's 5 years of Data Science - YOUR experiences with talking to stakeholders, asking great questions, and bad communication
In my recent post 5 Years of Data Science - Thoughts on hard starts, overrated comments, and garbage data, we talked about 5 things that I learned in my first 5 years of data science.
However, I was curious about what you learned. Hence, I turned to Reddit's r/datascience to
Heads up! Quarto is here to stay. Immediately combine R & Python in your next document: An extension on a recent post.
Quarto is a generalisation of R Markdown by the same core developers. This framework allows you to weave together R, Python, Julia, and OJS with the code's output and your writing.
5 years of Data Science - Thoughts on hard starts, overrated comments, and garbage data
In this post, I look back at my first 5 years in data science and some of the most important things that I learned. It answers why learning a second programming language is easier, how you can enhance your code documentation, and why hard-to-get data is the real deal.
In 2 minutes: Sharing data with git subtree
Git provides version control for your research project and makes collaboration easy. The often-overlooked git subtree command allows you to publish a specific subfolder of your project, e.g. your data folder, to a separate GitHub repository.
3 frameworks into one - Write your next paper with R Studio!
Here, I will show you my current workflow for writing research papers during my studies. The aim is a coherent and simple setup that weaves together the literature and data-related aspects of a project.