Let’s talk about Python concepts used for data science. This is a valuable and growing field in 2024, and there’s a lot you’ll need to know if you want to use Python to analyze data. Below, I’ll share 10 Python concepts I wish I knew earlier in my data science career. I’ve included a detailed explanation for each, along with code examples, to introduce and reinforce concepts that you’ll use again and again.
-
Boolean Indexing & Multi-Indexing
When it comes to data science and Python, Pandas is the name of the game! And one of the things that sets Pandas apart is its powerful indexing capabilities. Sure, basic slicing is intuitive for Pandas users, but there’s much more you can do with advanced indexing methods, like boolean indexing and multi-indexing.
What is boolean indexing, though? Well, this is an elegant way to filter data based on criteria. Rather than explicitly specifying index or column values, you pass a condition (technically, a Series of True/False values), and Pandas returns the rows where that condition is True. Cool, but what is multi-indexing? Also known as hierarchical indexing, it lets a single axis carry two or more index levels, which is especially useful for higher-dimensional data. You keep working in a tabular format (which is 2D by nature) while preserving the dataset’s multi-dimensional nature.
I bet you’re already itching to add these ideas to your Python projects! The real benefit of these methods is the flexibility they bring to data extraction and manipulation. After all, this is one of the major activities of data science!
'''
Hackr.io: 10 Python Concepts I Wish I Knew Earlier
Advanced Indexing & Slicing: General Syntax
'''
# Boolean Indexing
df[boolean_condition]

# Multi-Indexing (setting)
df.set_index(['level_1', 'level_2'])
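To make the general syntax concrete, here is a minimal sketch on a tiny DataFrame. The column names and values are invented purely for illustration, so treat it as one possible way to apply these ideas rather than a recipe.

```python
import pandas as pd

# Hypothetical sales data, invented for illustration
df = pd.DataFrame({
    "region": ["North", "North", "South", "South"],
    "year": [2023, 2024, 2023, 2024],
    "sales": [100, 150, 80, 120],
})

# Boolean indexing: keep only the rows where the condition is True
high_sales = df[df["sales"] > 90]

# Combine conditions with & (and) or | (or); wrap each one in parentheses
north_2024 = df[(df["region"] == "North") & (df["year"] == 2024)]

# Multi-indexing: use two columns as a hierarchical row index
indexed = df.set_index(["region", "year"])

# Select every row under the outer level "South"
south = indexed.loc["South"]

# Select a single value by giving one key per level as a tuple
south_2024_sales = indexed.loc[("South", 2024), "sales"]

print(high_sales)
print(south)
print(south_2024_sales)
```

Note that combined conditions use & and | rather than Python’s and/or keywords, because the comparison happens element by element across the whole column.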
-
Regular Expressions
Ask any data scientist; they’ll probably all have a tale about challenges with messy or unstructured data. This is where the magical power of those cryptic-looking regular expressions comes into play! Regex is an invaluable tool for text processing, as we can use it to find, extract, and even replace patterns in strings.
And yes, I know that learning regular expressions can seem daunting at first, given how terse the patterns are. But trust me, once you understand the basic building blocks and rules, regex becomes an extremely powerful tool in your toolkit. It’s almost like you’ve learned to read The Matrix!
That said, it always helps to have a regex cheat sheet handy if you can’t quite remember how to formulate an expression. When it comes to Python, the re module provides the interface you need to harness regular expressions. You can match and manipulate string data in diverse and complex ways by defining specific patterns.
'''
Hackr.io: 10 Python Concepts I Wish I Knew Earlier
Regular Expressions: General Syntax
'''
import re

# Basic match (only at the beginning of the string)
re.match(pattern, string)

# Search throughout a string (returns the first match)
re.search(pattern, string)

# Find all matches
re.findall(pattern, string)

# Replace patterns
re.sub(pattern, replacement, string)
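Here is a short sketch of those four functions on a made-up line of text. The sample string and patterns are assumptions chosen for illustration, not taken from a real dataset.

```python
import re

# Hypothetical messy text, invented for illustration
text = "Order 1042 shipped on 2024-03-15; order 1043 shipped on 2024-03-18."

# re.match only looks at the beginning of the string
starts_with_order = re.match(r"Order \d+", text)

# re.search scans the whole string and returns the first match
first_date = re.search(r"\d{4}-\d{2}-\d{2}", text)

# re.findall returns every match; with one capture group,
# it returns just the captured text as a list of strings
order_ids = re.findall(r"[Oo]rder (\d+)", text)

# re.sub replaces every match; here each date is masked
masked = re.sub(r"\d{4}-\d{2}-\d{2}", "<DATE>", text)

print(starts_with_order.group())
print(first_date.group())
print(order_ids)
print(masked)
```

Both re.match and re.search return None when nothing matches, so in real code it is worth checking the result before calling .group() on it.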
-
String Methods
Whether you’re working with text data, filenames, or data cleaning tasks, string processing is ubiquitous in data science. In fact, if you’ve taken a Python course, you probably found yourself working with strings a lot! Thankfully, Python strings come with a host of built-in methods that make these tasks significantly simpler.
So whether you want to change case, check prefixes and suffixes, split, join, or more, there’s a built-in method that does just that. Awesome! Generally speaking, string methods are straightforward, but their real power shines when you learn how and when to combine them effectively.
And, because Python’s string methods live on the string object and most of them return a new string, you can easily chain them together, resulting in concise and readable code. Pythonic indeed!
'''
Hackr.io: 10 Python Concepts I Wish I Knew Earlier
String Methods: Commonly Used Methods
'''
# Change case
string.upper()
string.lower()
string.capitalize()

# Check conditions
string.startswith(prefix)
string.endswith(suffix)

# Splitting and joining
string.split(delimiter)
delimiter.join(list_of_strings)
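As a rough sketch of how chaining works in practice, the snippet below tidies up a filename. The filename is hypothetical, and strip() is an additional built-in method not listed above.

```python
# Hypothetical filename with stray whitespace and inconsistent case
filename = "  Sales_Report_2024.CSV "

# Chain methods: strip() removes surrounding whitespace, lower() normalizes case
clean = filename.strip().lower()

# Check the suffix to confirm the file type
is_csv = clean.endswith(".csv")

# Split off the extension, then split the stem on underscores
stem = clean.split(".")[0]
parts = stem.split("_")

# Join the pieces back together as a readable label
label = " ".join(part.capitalize() for part in parts)

print(clean)
print(is_csv)
print(label)
```

Each call returns a new string and leaves the original untouched, because Python strings are immutable. That is what makes this kind of chaining safe.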
-
Lambda Functions
Python lambda functions are one of those techniques that you need to have in your toolkit when it comes to data science! The TL;DR is that they provide a quick and concise way to declare small functions on the fly. Yep, no need for the def keyword or a function name here!
And, when you pair these with functions like map() and filter(), lambda functions really shine for data science. Pick up any good Python book, and you’ll see this in action!
If you’re not quite sure why, no problem! Let’s take a quick detour. With map(), you can apply a function to every item in an input sequence (like a list or tuple). The filter() function also operates on sequences, but it keeps only the elements for which the given function returns True. In Python 3, both return lazy iterators, so you’ll often wrap the result in list() to see the values.
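As a quick illustration of the two working together, here is a minimal sketch with made-up exam scores. The data, the pass mark of 60, and the 5-point curve are all assumptions for the example.

```python
# Hypothetical exam scores, invented for illustration
scores = [55, 82, 47, 91, 68]

# filter() keeps only the items for which the lambda returns True
passing = filter(lambda s: s >= 60, scores)

# map() applies the lambda to every remaining item;
# list() consumes the lazy iterators and collects the results
curved = list(map(lambda s: s + 5, passing))

print(curved)
```

Because the iterators are lazy, nothing is computed until list() asks for the values, and an iterator can only be consumed once.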
'''
Hackr.io: 10 Python Concepts I Wish I Knew Earlier
Lambda with map() and filter() Example
'''
# Original list of numbers
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Double each number using map() and lambda
doubled_numbers = list(map(lambda x: x*2, numbers))