Data Science Fundamentals

Python for Data Science

Python has become the de facto language for data science. In this chapter, we'll set up a working environment and cover the core Python libraries and practices used in data science.

Setting Up Your Environment

Installing Python and Required Tools

# Check Python version
python --version

# Install pip if it is not already installed
# (python -m ensurepip --upgrade is an alternative that needs no download)
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
python get-pip.py

# Install essential packages
# (python -m pip guarantees the packages go to the same interpreter checked above)
python -m pip install numpy pandas matplotlib seaborn jupyter scikit-learn
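To confirm the installation worked, a short script can ask the standard library which versions are installed. This is a minimal sketch using `importlib.metadata` (Python 3.8+); the `REQUIRED` list simply mirrors the pip command above, and `check_packages` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as passed to pip (mirrors the install command above)
REQUIRED = ["numpy", "pandas", "matplotlib", "seaborn", "jupyter", "scikit-learn"]


def check_packages(names):
    """Return a mapping of package name -> installed version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in check_packages(REQUIRED).items():
        print(f"{name:15} {ver or 'NOT INSTALLED'}")
```

Any package reported as `NOT INSTALLED` can be reinstalled with `python -m pip install <name>`.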

Essential Python Libraries for Data Science

NumPy - Numerical Computing

import numpy as np

# Creating arrays
arr = np.array([1, 2, 3, 4, 5])
matrix = np.array([[1, 2], [3, 4]])

# Basic operations
print(arr.mean())    # Mean: 3.0
print(arr.std())     # Population standard deviation (ddof=0): ~1.414
print(matrix.shape)  # Array dimensions: (2, 2)
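NumPy's main advantage over plain Python lists is that operations apply to whole arrays at once. The sketch below reuses `arr` and `matrix` from above to show element-wise arithmetic, broadcasting, per-axis reductions, the `ddof` switch between population and sample standard deviation, and boolean masking.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
matrix = np.array([[1, 2], [3, 4]])

# Element-wise arithmetic: no explicit Python loop needed
doubled = arr * 2                        # [2, 4, 6, 8, 10]
squared = arr ** 2                       # [1, 4, 9, 16, 25]

# Broadcasting: the 1-D row [10, 20] is added to each row of the 2x2 matrix
shifted = matrix + np.array([10, 20])    # [[11, 22], [13, 24]]

# Reductions along an axis: axis=0 gives per-column, axis=1 per-row results
col_sums = matrix.sum(axis=0)            # [4, 6]
row_sums = matrix.sum(axis=1)            # [3, 7]

# std() divides by n by default (ddof=0); ddof=1 divides by n - 1 (sample std)
pop_std = arr.std()                      # sqrt(2)   ~ 1.414
sample_std = arr.std(ddof=1)             # sqrt(2.5) ~ 1.581

# Boolean masking keeps only the elements where the condition is True
evens = arr[arr % 2 == 0]                # [2, 4]

print(doubled, shifted, col_sums, row_sums, pop_std, sample_std, evens, sep="\n")
```

Use `ddof=1` when the array is a sample and you want the usual unbiased variance estimate.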

Pandas - Data Manipulation

import pandas as pd

# Creating a DataFrame
data = {
    'Name': ['John', 'Anna', 'Peter'],
    'Age': [28, 22, 35],
    'City': ['New York', 'Paris', 'London']
}
df = pd.DataFrame(data)

# Basic operations
print(df.head())      # First rows (5 by default)
print(df.describe())  # Summary statistics for numeric columns (here, Age)
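Beyond inspection, most day-to-day pandas work is selecting, filtering, deriving, and sorting. Here is a short sketch on the same three-row DataFrame; the `Age_in_10_years` column is a made-up example of a derived column.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter'],
    'Age': [28, 22, 35],
    'City': ['New York', 'Paris', 'London'],
})

# Select a single column (returns a Series)
ages = df['Age']

# Filter rows with a boolean condition
over_25 = df[df['Age'] > 25]                                # John and Peter

# Label-based selection with .loc: matching rows, one chosen column
names_over_25 = df.loc[df['Age'] > 25, 'Name'].tolist()     # ['John', 'Peter']

# Add a derived column computed from an existing one
df['Age_in_10_years'] = df['Age'] + 10

# Sort rows by a column (ascending by default)
youngest_first = df.sort_values('Age')

print(ages.mean())
print(over_25)
print(youngest_first[['Name', 'Age']])
```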

Working with Jupyter Notebooks

Jupyter Notebooks are interactive, browser-based documents that combine runnable code, its output, and explanatory text, which makes them well suited to exploratory data science:

  1. Start Jupyter:
     jupyter notebook
  2. Create a new notebook
  3. Write and execute code cells
  4. Add Markdown documentation

Best Practices

Code Organization

# Imports at the top
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Constants
DATA_PATH = "data/dataset.csv"
RANDOM_SEED = 42

# Functions
def load_data(path):
    """Load data from a CSV file into a DataFrame."""
    return pd.read_csv(path)

def analyze_data(df):
    """Return summary statistics for the numeric columns."""
    return df.describe()
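A script organized this way usually ends with a `main()` function behind an `if __name__ == "__main__":` guard, so the helpers can be imported elsewhere without running the analysis. The sketch below assumes a small in-memory CSV (hypothetical `height`/`weight` data) in place of `DATA_PATH`, and shows one use of `RANDOM_SEED`: passing it as `random_state` makes random sampling reproducible.

```python
import io

import numpy as np  # kept to mirror the import block above
import pandas as pd

RANDOM_SEED = 42


def load_data(path):
    """Load data from a CSV file (or file-like object) into a DataFrame."""
    return pd.read_csv(path)


def analyze_data(df):
    """Return summary statistics for the numeric columns."""
    return df.describe()


def main():
    # Hypothetical stand-in for DATA_PATH so the sketch runs without a file
    csv = io.StringIO("height,weight\n170,65\n182,80\n165,58\n175,72\n")
    df = load_data(csv)
    summary = analyze_data(df)

    # The same seed always selects the same rows
    sample = df.sample(n=2, random_state=RANDOM_SEED)
    return summary, sample


if __name__ == "__main__":
    summary, sample = main()
    print(summary)
    print(sample)
```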

Data Science Workflow

  1. Import required libraries
  2. Load and inspect data
  3. Clean and preprocess
  4. Analyze and visualize
  5. Document findings
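The five steps above can be sketched end to end in one small script. Everything here is illustrative: the temperature CSV is made-up data, the cleaning rules (drop duplicate rows, drop rows missing `temp_c`) are one reasonable choice among many, and the chart is written to a temporary directory using matplotlib's non-interactive `Agg` backend.

```python
# 1. Import required libraries
import io
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render to files only; no display window needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical raw data: one missing value and one duplicated row
RAW_CSV = """city,temp_c
Paris,21.5
London,
Paris,19.0
London,15.5
London,15.5
"""


# 2. Load and inspect data
def load(text):
    return pd.read_csv(io.StringIO(text))


# 3. Clean and preprocess
def clean(df):
    df = df.drop_duplicates()
    df = df.dropna(subset=["temp_c"])
    return df.reset_index(drop=True)


# 4. Analyze ...
def analyze(df):
    return df.groupby("city")["temp_c"].mean()


# ... and visualize
def plot(means, path):
    fig, ax = plt.subplots()
    means.plot(kind="bar", ax=ax)
    ax.set_ylabel("Mean temperature (°C)")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)


def run(out_dir):
    raw = load(RAW_CSV)
    raw.info()  # prints dtypes and non-null counts to spot missing values
    cleaned = clean(raw)
    means = analyze(cleaned)
    out_path = os.path.join(out_dir, "mean_temp_by_city.png")
    plot(means, out_path)
    return cleaned, means, out_path


if __name__ == "__main__":
    cleaned, means, out_path = run(tempfile.mkdtemp())
    # 5. Document findings: here, just report the result and where the chart is
    print(means)
    print(f"Chart saved to {out_path}")
```

Keeping each step in its own function makes it easy to move the same code into separate notebook cells later.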

Exercises

  1. Create a NumPy array and perform basic operations
  2. Load a CSV file into a Pandas DataFrame
  3. Create a simple data visualization
  4. Write a function to clean data

In the next chapter, we'll dive deeper into data analysis fundamentals.