Data Science Fundamentals

Python for Data Science

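Python has become the de facto language for data science. This chapter covers the essentials for getting started: setting up an environment, the core libraries NumPy and pandas, working in Jupyter Notebooks, and a few habits for organizing analysis code.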

Setting Up Your Environment

Installing Python and Required Tools

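# Check the Python version (on some systems the command is python3)
python --version

# Check whether pip is available; recent Python installers include it
python -m pip --version

# Install pip only if the check above fails
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
python get-pip.py

# Install essential packages for the interpreter checked above
python -m pip install numpy pandas matplotlib seaborn jupyter scikit-learn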
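
After installation, it can help to confirm that each package is visible to the interpreter you plan to use. The following is a minimal sketch rather than an official check. The helper name `check_packages` is hypothetical. Two import names differ from the install names: scikit-learn is imported as `sklearn`, and the check for Jupyter assumes the `jupyter` install pulled in the `notebook` package.

```python
import importlib.util

# Import names to look for. These are not always the pip names:
# scikit-learn is imported as "sklearn", and "notebook" is assumed
# to have been installed as part of the jupyter metapackage.
PACKAGES = ["numpy", "pandas", "matplotlib", "seaborn", "notebook", "sklearn"]


def check_packages(names):
    """Return a dict mapping each import name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages(PACKAGES).items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

Using `find_spec` only locates each package without importing it, so the check stays fast and does not fail outright when one package is absent.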

Essential Python Libraries for Data Science

NumPy - Numerical Computing

import numpy as np

# Creating arrays
arr = np.array([1, 2, 3, 4, 5])
matrix = np.array([[1, 2], [3, 4]])

# Basic operations
print(arr.mean())    # Mean: 3.0
print(arr.std())     # Population standard deviation (ddof=0): about 1.414
print(matrix.shape)  # Shape as (rows, columns): (2, 2)
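
Two details are worth seeing in isolation. First, arithmetic on an array applies element-wise, with no explicit loop. Second, `arr.std()` computes the population standard deviation by default (`ddof=0`), whereas pandas' `describe()` reports the sample standard deviation (`ddof=1`). A short sketch using the same array:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# Arithmetic is applied to every element at once (vectorization).
doubled = arr * 2
centered = arr - arr.mean()

# Population standard deviation: divides by n (NumPy's default).
pop_std = arr.std()

# Sample standard deviation: divides by n - 1.
sample_std = arr.std(ddof=1)

print(doubled)
print(centered)
print(pop_std, sample_std)
```

If a NumPy result and a pandas summary disagree slightly on the standard deviation, the differing `ddof` default is the usual reason.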

Pandas - Data Manipulation

import pandas as pd

# Creating a DataFrame
data = {
    'Name': ['John', 'Anna', 'Peter'],
    'Age': [28, 22, 35],
    'City': ['New York', 'Paris', 'London']
}
df = pd.DataFrame(data)

# Basic operations
print(df.head())      # View the first five rows (here, all three)
print(df.describe())  # Summary statistics for numeric columns (only Age here)
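
Beyond summaries, three operations come up constantly: selecting a column, filtering rows by a condition, and sorting. The sketch below rebuilds the same DataFrame so that it runs on its own. The age threshold of 25 is an arbitrary value chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter'],
    'Age': [28, 22, 35],
    'City': ['New York', 'Paris', 'London'],
})

# Select a single column (returns a Series).
ages = df['Age']

# Filter rows with a boolean mask: keep people older than 25.
older = df[df['Age'] > 25]

# Sort by a column, oldest first.
by_age = df.sort_values('Age', ascending=False)

print(ages.mean())
print(older['Name'].tolist())
print(by_age['Name'].tolist())
```

Each of these returns a new object and leaves `df` unchanged, which makes it safe to experiment in a notebook.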

Working with Jupyter Notebooks

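Jupyter Notebooks are interactive documents that combine runnable code, its output, and Markdown notes, which makes them well suited to exploratory data science. A typical session looks like this: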

1. Start Jupyter:

   jupyter notebook

2. Create a new notebook
3. Write and execute code cells
4. Add Markdown documentation

Best Practices

Code Organization

# Imports at the top
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Constants
DATA_PATH = "data/dataset.csv"
RANDOM_SEED = 42

# Functions
def load_data(path):
    """Load a CSV file into a DataFrame."""
    return pd.read_csv(path)

def analyze_data(df):
    """Return summary statistics for the numeric columns."""
    return df.describe()

Data Science Workflow

1. Import required libraries
2. Load and inspect data
3. Clean and preprocess
4. Analyze and visualize
5. Document findings
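
The five steps above can be sketched end to end as follows. This is an illustration under stated assumptions, not a prescribed pipeline: the inline CSV text, its column names, and the helper names `clean` and `summarize` are all hypothetical, and the data is made up purely so the example runs without an external file.

```python
# 1. Import required libraries
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data, inlined so the sketch is self-contained.
RAW_CSV = """name,age,city
John,28,New York
Anna,,Paris
Peter,35,London
Peter,35,London
"""


def clean(df):
    """Drop exact duplicate rows and rows with a missing age."""
    return df.drop_duplicates().dropna(subset=["age"]).reset_index(drop=True)


def summarize(df):
    """Return summary statistics for the numeric columns."""
    return df.describe()


# 2. Load and inspect data
raw = pd.read_csv(io.StringIO(RAW_CSV))
print(raw.head())

# 3. Clean and preprocess
tidy = clean(raw)

# 4. Analyze and visualize
summary = summarize(tidy)
fig, ax = plt.subplots()
ax.bar(tidy["name"], tidy["age"])
ax.set_xlabel("name")
ax.set_ylabel("age")
buffer = io.BytesIO()
fig.savefig(buffer, format="png")  # render to memory instead of disk
plt.close(fig)

# 5. Document findings
removed = len(raw) - len(tidy)
mean_age = summary.loc["mean", "age"]
print(f"{removed} rows removed; mean age is {mean_age:.1f}")
```

Keeping cleaning and summarizing in small functions means each step can be rerun or tested on its own, which mirrors the code organization shown earlier.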

Exercises

1. Create a NumPy array and perform basic operations
2. Load a CSV file into a Pandas DataFrame
3. Create a simple data visualization
4. Write a function to clean data

In the next chapter, we'll dive deeper into data analysis fundamentals.
