The `pandas` library is indispensable for data scientists, analysts, and anyone working with data in Python. It provides high-performance, easy-to-use data structures and data analysis tools. Below is a concise reference guide for common use cases with `pandas`, formatted in Markdown syntax:

# `pandas` Reference Guide

## Installation

```bash
pip install pandas
```

## Basic Concepts

### Importing pandas

```python
import pandas as pd
```

### Data Structures

- **Series**: One-dimensional labeled array.
- **DataFrame**: Two-dimensional, size-mutable, potentially heterogeneous tabular data with labeled axes.

## Creating DataFrames

```python
# From a dictionary (keys become column names)
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': ['a', 'b', 'c']
})

# From a list of lists (one inner list per row)
df = pd.DataFrame([
    [1, 'a'],
    [2, 'b'],
    [3, 'c']
], columns=['A', 'B'])
```

## Reading Data

```python
# Read from CSV
df = pd.read_csv('filename.csv')

# Read from Excel (.xlsx files need an engine such as openpyxl installed)
df = pd.read_excel('filename.xlsx')

# Other formats include: read_sql, read_json, read_html, read_clipboard, read_pickle, etc.
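
# Sketch (illustrative data, not from the guide): read_csv also accepts a
# file-like object, so CSV text already in memory can be parsed by wrapping
# it in io.StringIO. The names csv_text and df_from_text are hypothetical.
import io

import pandas as pd

csv_text = "A,B\n1,a\n2,b\n3,c\n"
df_from_text = pd.read_csv(io.StringIO(csv_text))  # 3 rows, columns A and B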
```

## Data Inspection

```python
# View the first n rows (default 5)
df.head()

# View the last n rows (default 5)
df.tail()

# Column names, dtypes, non-null counts, and memory usage
df.info()

# Statistical summary for numerical columns
df.describe()
```

## Data Selection

```python
# Select a column (returns a Series)
df['A']

# Select multiple columns (returns a DataFrame)
df[['A', 'B']]

# Select rows by position (slice end is exclusive)
df.iloc[0]    # First row
df.iloc[0:5]  # First five rows

# Select rows by label (slice end is inclusive)
df.loc[0]    # Row with index label 0
df.loc[0:5]  # Rows with index labels from 0 to 5, inclusive
```

## Data Manipulation

```python
# Add a new column (the list length must match the number of rows)
df['C'] = [10, 20, 30]

# Drop a column
df = df.drop(columns='C')

# Rename columns
df = df.rename(columns={'A': 'Alpha', 'B': 'Beta'})

# Filter rows with a boolean condition
filtered_df = df[df['Alpha'] > 1]

# Apply a function to a column (plain arithmetic such as df['Alpha'] * 2 is faster)
df['Alpha'] = df['Alpha'].apply(lambda x: x * 2)
```

## Handling Missing Data

```python
# Drop rows with any missing values (returns a new DataFrame)
df_clean = df.dropna()

# Fill missing values (returns a new DataFrame)
df_filled = df.fillna(value=0)
```

## Grouping and Aggregating

```python
# Group by a column and calculate the mean of the numeric columns
grouped_df = df.groupby('B').mean(numeric_only=True)

# Multiple aggregation functions
grouped_df = df.groupby('B').agg(['mean', 'sum'])
```

## Merging, Joining, and Concatenating

```python
# Concatenate DataFrames (stacks rows by default; axis=1 places them side by side)
pd.concat([df1, df2])

# Merge DataFrames on a column present in both
pd.merge(df1, df2, on='key')

# Join: matches df1's 'key' column against df2's index,
# so 'key' is moved into df2's index first
df1.join(df2.set_index('key'), on='key')
```

## Saving Data

```python
# Write to CSV (index=False skips writing the row index)
df.to_csv('filename.csv', index=False)

# Write to Excel (needs an engine such as openpyxl installed)
df.to_excel('filename.xlsx')

# Other formats include: to_sql, to_json, to_html, to_clipboard, to_pickle, etc.
```

`pandas` is widely used for data cleaning, transformation, analysis, and basic plotting (through `DataFrame.plot`, which uses Matplotlib by default). This guide covers only the basics; the library also handles time series, reshaping with `pivot_table` and `melt`, hierarchical indexes, and much more.
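
The `merge` and `join` calls in the Merging section match rows differently: `pd.merge(..., on='key')` matches a column present in both frames, while `df1.join(df2, on='key')` matches a column of the left frame against the *index* of the right frame. The minimal sketch below uses two small hypothetical frames, `orders` and `customers` (names and data invented for illustration). It shows that both approaches pair the same rows, then fills a missing value and aggregates per group with `groupby`.

```python
import pandas as pd

# Hypothetical example data: orders reference customers by 'key'
orders = pd.DataFrame({
    'key': ['c1', 'c2', 'c1', 'c3'],
    'amount': [10.0, 25.0, 5.0, None],  # one missing amount (stored as NaN)
})
customers = pd.DataFrame({
    'key': ['c1', 'c2', 'c3'],
    'region': ['north', 'south', 'north'],
})

# merge: match the 'key' column of both frames (inner join by default)
merged = pd.merge(orders, customers, on='key')

# join: match orders['key'] against the index of the right-hand frame,
# so 'key' is first moved into the customers index (left join by default)
joined = orders.join(customers.set_index('key'), on='key')

# Replace the missing amount with 0, then total and average per region
joined['amount'] = joined['amount'].fillna(0)
summary = joined.groupby('region')['amount'].agg(['sum', 'mean'])
print(summary)
```

Because every order here has a matching customer, the inner `merge` and the left `join` pair the same rows; they would differ only if some `key` had no match in `customers`.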