the_information_nexus/work/tbx/data_work.md
2024-04-19 03:49:28 +00:00

Certainly! Here's an example of how you can load and explore the data in a Jupyter Notebook using Python and the Pandas library:

```python
import pandas as pd

# Load the data from CSV file
df = pd.read_csv('raw_data.csv')

# Display the first 5 rows of the data
print("First 5 rows:")
display(df.head())

# Display the last 5 rows of the data
print("\nLast 5 rows:")
display(df.tail())

# Get information about the DataFrame
# (df.info() prints its summary itself and returns None, so it is not wrapped in display())
print("\nDataFrame Info:")
df.info()

# Get descriptive statistics of the numeric columns
print("\nDescriptive Statistics:")
display(df.describe())

# Check for missing values
print("\nMissing Values:")
display(df.isnull().sum())

# Get the data types of each column
print("\nData Types:")
display(df.dtypes)

# Get the shape of the DataFrame (rows, columns)
print(f"\nDataFrame Shape: {df.shape}")
```

Output:

First 5 rows:
P&L Rollup Sales Channel Booked Date Customer PO Number Order Number Line End Customer Name Line Number Ordered Quantity Manufacturer Item Number - Root Item Number - Child1 Item Number - Child2 Item Number - Child3 Item Number - Child4 Item Number - Child5 Item Description Product Family Service Contract Duration Buyer Special Pricing Type Deal ID Special Pricing Account Number Material Designator SO Line Type Ship From Consigned Org Order Type Org Code Ext. Global Revenue Ext. Global Cost CDSD Dist Deviation Price Ext Request Date Promise Date Schedule Ship Date Schedule Arrival Date Ship Confirm Date Actual Shipment Date Billed Revenue Cisco GP Cisco Mulit Yr Adj. Adj. Cisco GP Adj. Cisco Deviation Price Cisco Subsidy Non-Cisco GP Total GP Placeholder VES Fee VES Total GP No Invoice Cisco Multi Year VES UK Order Margin ISO Week #
0 TBX AT&T AT&T-PPI - Universal CPE-Hayes, Kyle 14-Feb-2024 4334036721 11349009 - 1 2 SILICOM CPW000066 nan nan nan nan nan Cable,Power Cord,AC,Brazil,10A/250V,1.8M,BLK Other - Faerber, Jennifer Leigh - nan - - Telco Drop Ship-Invoice N Telco Drop Ship 24 5.40 5.40 - 15-Feb-2024 - - - - - $ 5.40 $ - $ - $ - $ - $ - $ - $ - nan 0 0 FALSE FALSE FALSE FALSE FALSE 0% 7
1 TBX AT&T ATT Whitebox Distribution - UK 18-Jan-2024 4334035695 11309329 - 1 1 SILICOM CBL000077 nan nan nan nan nan Cable,USB2 to RS-232 Cisco pinout,RJ-45 console Other - Faerber, Jennifer Leigh - nan - - WWTUK Drop Ship-Invoice N WWTUK Mixed 100 7.62 7.62 - 02-Feb-2024 15-Feb-2024 15-Feb-2024 15-Feb-2024 14-Feb-2024 14-Feb-2024 $ 7.62 $ - $ - $ - $ - $ - $ - $ - nan 0 0 FALSE FALSE FALSE FALSE TRUE 0% 3
2 TBX AT&T ATT Whitebox Distribution - UK 18-Jan-2024 4334035695 11309329 - 2 2 SILICOM CPW000024 nan nan nan nan nan Cable,Power Cord,AC,Italy,10A/250V,1.8M,BLK Other - Faerber, Jennifer Leigh - nan - - WWTUK Drop Ship-Invoice N WWTUK Mixed 100 5.55 5.55 - 02-Feb-2024 15-Feb-2024 15-Feb-2024 15-Feb-2024 14-Feb-2024 14-Feb-2024 $ 5.55 $ - $ - $ - $ - $ - $ - $ - nan 0 0 FALSE FALSE FALSE FALSE TRUE 0% 3
3 TBX AT&T ATT Whitebox Distribution - UK 18-Jan-2024 4334035695 11309329 - 3 1 SILICOM KXVE CPE MECH nan nan nan SILICOM KIT- K#VE-CPE-MECH VE-CPE MECHANICAL PAR Other - Faerber, Jennifer Leigh - nan - - WWTUK Drop Ship-Invoice N WWTUK Mixed 100 15.32 14.40 - 02-Feb-2024 15-Feb-2024 15-Feb-2024 15-Feb-2024 14-Feb-2024 14-Feb-2024 $ 15.32 $ 0.92 $ - $ 0.92 $ - $ - $ 0.92 $ 1.84 nan 0 0 FALSE FALSE FALSE FALSE TRUE 12% 3
4 TBX AT&T CDSD - ATT Diversity Plus 02-Jan-2024 PO3000254372 11284576 Department of Justice 1 1 CISCO SYSTEMS (CISCOPRO) C8500 12X nan nan nan nan Cisco Catalyst 8500-12X Edge Platform Catalyst Family - Berry, Gregory M. (Greg) DEAL 75430194 75430194 - Telco Drop Ship-Invoice N Telco Drop Ship 24 47,159.49 46,134.29 46,134.29 02-Jan-2024 01-Feb-2024 01-Feb-2024 01-Feb-2024 24-Jan-2024 27-Jan-2024 $ 47,159.49 $ 1,025.20 $ - $ 1,025.20 $ 46,134.29 $ 830.42 $ - $ 1,855.62 nan 0 0 FALSE TRUE FALSE FALSE FALSE 4% 1
Last 5 rows:
P&L Rollup Sales Channel Booked Date Customer PO Number Order Number Line End Customer Name Line Number Ordered Quantity Manufacturer Item Number - Root Item Number - Child1 Item Number - Child2 Item Number - Child3 Item Number - Child4 Item Number - Child5 Item Description Product Family Service Contract Duration Buyer Special Pricing Type Deal ID Special Pricing Account Number Material Designator SO Line Type Ship From Consigned Org Order Type Org Code Ext. Global Revenue Ext. Global Cost CDSD Dist Deviation Price Ext Request Date Promise Date Schedule Ship Date Schedule Arrival Date Ship Confirm Date Actual Shipment Date Billed Revenue Cisco GP Cisco Mulit Yr Adj. Adj. Cisco GP Adj. Cisco Deviation Price Cisco Subsidy Non-Cisco GP Total GP Placeholder VES Fee VES Total GP No Invoice Cisco Multi Year VES UK Order Margin ISO Week #
9 TBX AT&T CDSD - ATT Diversity Plus 02-Jan-2024 PO3000254372 11284576 Department of Justice 10 1 CISCO SYSTEMS (CISCOPRO) GLC SX MMD nan nan nan 1000BASE-SX SFP transceiver module, MMF, 850nm, DOM SFP Family - Berry, Gregory M. (Greg) DEAL 75430194 75430194 - Telco Drop Ship-Invoice N Telco Drop Ship 24 281.81 275.68 275.68 02-Jan-2024 01-Feb-2024 01-Feb-2024 01-Feb-2024 24-Jan-2024 27-Jan-2024 $ 281.81 $ 6.13 $ - $ 6.13 $ 275.68 $ 4.96 $ - $ 11.09 nan 0 0 FALSE TRUE FALSE FALSE FALSE 4% 1
10 TBX AT&T CDSD - ATT Diversity Plus 02-Jan-2024 PO3000254372 11284576 Department of Justice 11 1 CISCO SYSTEMS (CISCOPRO) IOSXE AUTO MODE nan nan nan IOS XE Autonomous boot up mode for Unified image Other - Berry, Gregory M. (Greg) DEAL 75430194 75430194 - Telco Drop Ship-No Invoice N Telco Drop Ship 24 0.00 0.00 0.00 02-Jan-2024 01-Feb-2024 01-Feb-2024 01-Feb-2024 28-Jan-2024 28-Jan-2024 $ - $ - $ - $ - $ - $ - $ - $ - nan 0 0 TRUE TRUE FALSE FALSE FALSE 0% 1
11 TBX AT&T CDSD - ATT Diversity Plus 02-Jan-2024 PO3000254372 11284576 Department of Justice 12 2 CISCO SYSTEMS (CISCOPRO) PWR CH1 750WACR nan nan nan Cisco C8500 750W AC Power Supply Power Supply - Berry, Gregory M. (Greg) DEAL 75430194 75430194 - Telco Drop Ship-No Invoice N Telco Drop Ship 24 0.00 0.00 0.00 02-Jan-2024 01-Feb-2024 01-Feb-2024 01-Feb-2024 28-Jan-2024 28-Jan-2024 $ - $ - $ - $ - $ - $ - $ - $ - nan 0 0 TRUE TRUE FALSE FALSE FALSE 0% 1
12 TBX AT&T CDSD - ATT Diversity Plus 02-Jan-2024 PO3000254372 11284576 Department of Justice 13 2 CISCO SYSTEMS (CISCOPRO) CAB AC nan nan nan nan AC Power Cord (North America), C13, NEMA 5-15P, 2.1m Other - Berry, Gregory M. (Greg) DEAL 75430194 75430194 - Telco Drop Ship-No Invoice N Telco Drop Ship 24 0.00 0.
---

Using Jupyter Notebook for this analysis is a great idea! Jupyter Notebook provides an interactive environment where you can combine code, visualizations, and explanatory text in a single document. This makes it easy to create a narrative flow for your analysis and share the results with others.

Here's how you can structure your Jupyter Notebook so the analysis reads well when scrolled from top to bottom:

1. Introduction
- Provide an overview of the dataset and the goals of the analysis
- Describe the data source and any relevant background information

2. Data Loading and Exploration
- Load the CSV file into a Pandas DataFrame
- Explore the data using functions like `head()`, `tail()`, `info()`, `describe()`
- Provide commentary on the structure and quality of the data

3. Data Cleaning and Preprocessing
- Handle missing values and data type conversions
- Create new calculated columns as needed
- Describe the steps taken to clean and preprocess the data

4. Sales Performance Analysis
- Group data by relevant dimensions and time periods
- Calculate and display key sales metrics 
- Use visualizations to highlight trends and patterns
- Provide insights and commentary on the findings

5. Profitability Analysis
- Calculate and display profitability metrics by segment
- Use visualizations to compare profitability across categories
- Provide insights and commentary on the findings

6. Customer Analysis
- Group data by customer and calculate key metrics
- Identify top customers and analyze their characteristics
- Use visualizations to highlight customer trends and patterns
- Provide insights and commentary on the findings

7. Insights and Recommendations
- Summarize the key takeaways from the analysis
- Provide data-driven recommendations for improvement
- Prioritize actions based on potential impact

8. Next Steps
- Discuss potential future analyses or data collection efforts
- Provide guidance on how to operationalize the insights
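
As a rough sketch of step 3, the snippet below converts the kinds of values visible in the sample output (currency strings such as `$ 1,025.20`, a bare `-` or `$ -` for empty amounts, and dates such as `14-Feb-2024`) into numeric and datetime types. The helper names `to_number` and `clean_orders` are hypothetical, the small in-memory frame only copies a few values from the output above, and treating a dash in a money column as zero is an assumption you should confirm against the source system.

```python
import pandas as pd


def to_number(series: pd.Series, dash_value: float = 0.0) -> pd.Series:
    """Convert strings like '$ 1,025.20' or '12%' to floats.

    A lone '-' is treated as `dash_value` (assumption: accounting-style zero).
    """
    # Remove currency signs, thousands separators, percent signs, and whitespace
    text = series.astype(str).str.replace(r"[$,%\s]", "", regex=True)
    # Swap the dash placeholder for the chosen numeric value
    text = text.where(text != "-", str(dash_value))
    # Anything still unparseable becomes NaN instead of raising
    return pd.to_numeric(text, errors="coerce")


def clean_orders(df: pd.DataFrame, money_cols: list, date_cols: list) -> pd.DataFrame:
    """Return a copy of `df` with money columns as floats and date columns as datetimes."""
    out = df.copy()
    for col in money_cols:
        out[col] = to_number(out[col])
    for col in date_cols:
        # Dates look like '14-Feb-2024'; placeholders such as '-' become NaT
        out[col] = pd.to_datetime(out[col], format="%d-%b-%Y", errors="coerce")
    return out


# A few values copied from the sample output above
sample = pd.DataFrame(
    {
        "Booked Date": ["14-Feb-2024", "18-Jan-2024", "02-Jan-2024"],
        "Ship Confirm Date": ["-", "14-Feb-2024", "24-Jan-2024"],
        "Billed Revenue": ["$ 5.40", "$ 15.32", "$ 47,159.49"],
        "Total GP": ["$ -", "$ 1.84", "$ 1,855.62"],
    }
)

cleaned = clean_orders(
    sample,
    money_cols=["Billed Revenue", "Total GP"],
    date_cols=["Booked Date", "Ship Confirm Date"],
)
print(cleaned.dtypes)
```

Running the conversion once, right after loading, keeps every later grouping and chart cell free of string handling.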

Throughout the notebook, use a combination of code cells for data manipulation and analysis, and markdown cells for commentary, insights, and recommendations. Use visualizations wherever possible to make the findings more engaging and easier to understand.
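
To make steps 4 to 6 of the outline more concrete, here is a minimal sketch of grouping by one dimension and computing revenue, gross profit, and margin per group. The function name `summarize_by` is hypothetical. The column names `Manufacturer`, `Billed Revenue`, and `Total GP` are read off the header row of the sample output and may need adjusting, and the sample values are copied from that output and assumed to be already numeric.

```python
import pandas as pd


def summarize_by(
    df: pd.DataFrame,
    dimension: str,
    revenue_col: str = "Billed Revenue",
    gp_col: str = "Total GP",
) -> pd.DataFrame:
    """Aggregate revenue and gross profit per value of `dimension`."""
    summary = df.groupby(dimension, as_index=False).agg(
        revenue=(revenue_col, "sum"),
        gross_profit=(gp_col, "sum"),
        order_lines=(revenue_col, "count"),
    )
    # Guard against division by zero: groups with no revenue get a NaN margin
    nonzero_revenue = summary["revenue"].where(summary["revenue"] != 0)
    summary["margin_pct"] = (summary["gross_profit"] / nonzero_revenue * 100).round(1)
    # Largest groups first, with a fresh 0..n-1 index
    return summary.sort_values("revenue", ascending=False, ignore_index=True)


# Values copied from rows 0-4 and 9 of the sample output above
sample = pd.DataFrame(
    {
        "Manufacturer": [
            "SILICOM",
            "SILICOM",
            "SILICOM",
            "SILICOM",
            "CISCO SYSTEMS (CISCOPRO)",
            "CISCO SYSTEMS (CISCOPRO)",
        ],
        "Billed Revenue": [5.40, 7.62, 5.55, 15.32, 47159.49, 281.81],
        "Total GP": [0.0, 0.0, 0.0, 1.84, 1855.62, 11.09],
    }
)

summary = summarize_by(sample, "Manufacturer")
print(summary)
```

The same function can be reused for the customer analysis by passing a different `dimension`, for example `"End Customer Name"`, assuming that column exists under that name in your file.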

Here's an example of how the notebook might flow:

![Jupyter Notebook Example](https://i.imgur.com/aTWiXLG.png)

To create charts and graphs in Jupyter Notebook, you can use the `%matplotlib inline` magic command to display plots directly in the notebook. For example:

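```python
import pandas as pd
import matplotlib.pyplot as plt

%matplotlib inline

# Load data
df = pd.read_csv('raw_data.csv')

# Column names are assumed from the sample output shown earlier; adjust them to match your file
date_col = 'Booked Date'
revenue_col = 'Ext. Global Revenue'

# Parse dates such as "14-Feb-2024" and strip thousands separators such as "47,159.49"
df[date_col] = pd.to_datetime(df[date_col], format='%d-%b-%Y', errors='coerce')
df[revenue_col] = pd.to_numeric(
    df[revenue_col].astype(str).str.replace(',', '', regex=False), errors='coerce'
)

# Analyze sales by month ('MS' labels each group with the first day of its month)
sales_by_month = df.groupby(pd.Grouper(key=date_col, freq='MS'))[revenue_col].sum().reset_index()

# Create line chart
plt.figure(figsize=(10, 6))
plt.plot(sales_by_month[date_col], sales_by_month[revenue_col], marker='o')
plt.title('Sales Revenue by Month')
plt.xlabel('Month')
plt.ylabel('Revenue ($)')
plt.show()
```
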
This will create a line chart of sales revenue by month directly in the Jupyter Notebook.
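
For comparing categories rather than tracking a trend over time, a bar chart is often a better fit. The sketch below is one possible way to do that. The function name `plot_total_by_group` is hypothetical, the column names `Product Family` and `Total GP` are assumed from the sample output, and the values are copied from that output as already-cleaned numbers.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_total_by_group(df: pd.DataFrame, group_col: str, value_col: str, title: str):
    """Draw a bar chart of the summed `value_col` per `group_col` and return the Axes."""
    # Sum per group and order bars from largest to smallest
    totals = df.groupby(group_col)[value_col].sum().sort_values(ascending=False)

    fig, ax = plt.subplots(figsize=(8, 5))
    ax.bar(list(totals.index), totals.to_numpy())
    ax.set_title(title)
    ax.set_xlabel(group_col)
    ax.set_ylabel(value_col)
    # Rotate labels so longer category names stay readable
    ax.tick_params(axis="x", rotation=45)
    fig.tight_layout()
    return ax


# Values copied from rows 0-4 and 9 of the sample output above
sample = pd.DataFrame(
    {
        "Product Family": ["Other", "Other", "Other", "Other", "Catalyst Family", "SFP Family"],
        "Total GP": [0.0, 0.0, 0.0, 1.84, 1855.62, 11.09],
    }
)

ax = plot_total_by_group(sample, "Product Family", "Total GP", "Total GP by Product Family")
```

In a notebook with `%matplotlib inline` active, the figure should render below the cell. Returning the `Axes` object lets a later cell adjust labels or add annotations without redrawing the chart.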

By combining code, visualizations, and commentary in a logical flow, you can create a compelling and informative analysis that's easy for others to follow and understand.

Let me know if you have any other questions as you create your Jupyter Notebook!