Getting Started with Pandas Library: DataFrame and Series

Saranya Bhattacharjee
Jul 5, 2021
Data Analysis

Just starting out? Confused by the two data structures, DataFrame and Series?

Trust me, even I was puzzled by the difference between the two when I first delved into data science and machine learning.

So here is a brief introduction to the two core objects in pandas, so that it is smooth sailing as you dig deeper into the library.

Pandas’ documentation states, “pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.”

There are two core objects in pandas: the DataFrame and the Series.

DataFrame

A DataFrame is a data structure we use very often in pandas: a tabular representation of data, with labeled rows and columns. Think of a DataFrame as an Excel spreadsheet or a database table.

import pandas as pd

pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]})

In this example, the entry in row 0, column “No” has the value 131. The entry in row 0, column “Yes” has the value 50, and so on.
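
One way to check this yourself is to print the DataFrame and read single entries by row label and column label with .loc. This is a minimal sketch; the variable name df is just an illustrative choice.

```python
import pandas as pd

# Same constructor call as above: dictionary keys become column labels
df = pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]})
print(df)

# .loc[row_label, column_label] reads a single entry
print(df.loc[0, 'No'])   # 131
print(df.loc[0, 'Yes'])  # 50
```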

DataFrame entries are not limited to integers. For instance, here’s a DataFrame whose values are strings:

pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'],
              'Sue': ['Pretty good.', 'Bland.']},
             index=['Product A', 'Product B'])

In the examples above, we use the pd.DataFrame() constructor to create these DataFrame objects. The syntax for declaring a new one is a dictionary whose keys are the column names (Bob and Sue in this example) and whose values are lists of entries, one list per column. This is the standard way of constructing a new DataFrame, and the one you are most likely to encounter.
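
To see how the dictionary maps onto the table, we can inspect the columns of the reviews example. A small sketch; the name reviews is hypothetical.

```python
import pandas as pd

reviews = pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'],
                        'Sue': ['Pretty good.', 'Bland.']},
                       index=['Product A', 'Product B'])

# Dictionary keys become the column labels, in insertion order
print(list(reviews.columns))    # ['Bob', 'Sue']

# Each list supplies the entries of one column, top to bottom
print(reviews['Sue'].tolist())  # ['Pretty good.', 'Bland.']
```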

The dictionary-of-lists constructor takes the column labels from the dictionary keys, but by default it labels the rows with an ascending count from 0 (0, 1, 2, 3, …). Sometimes this is fine, but often we will want to assign the row labels ourselves.
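
Here is a quick way to see those default row labels, using the Yes/No example from earlier. This is a sketch; df is just an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]})

# No index was given, so pandas labels the rows 0, 1, ... by default
print(list(df.index))           # [0, 1]
print(type(df.index).__name__)  # RangeIndex
```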

The list of row labels used in a DataFrame is known as an Index. As shown in the second example above, we can assign values to it by passing an index parameter to the constructor.
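
Once the rows have meaningful labels, we can look them up by name. A minimal sketch; the variable name reviews is hypothetical.

```python
import pandas as pd

reviews = pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'],
                        'Sue': ['Pretty good.', 'Bland.']},
                       index=['Product A', 'Product B'])

# The index parameter replaced the default 0, 1 row labels
print(list(reviews.index))              # ['Product A', 'Product B']

# Rows can now be looked up by these labels with .loc
print(reviews.loc['Product B', 'Bob'])  # It was awful.
```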

Series

A Series, by contrast, is a sequence of data values. If a DataFrame is a table, a Series is a list. In fact, you can create one from nothing more than a list:

pd.Series([1, 2, 3, 4, 5])
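
Like a DataFrame built without an index, this Series gets default row labels starting at 0, and it has only one dimension. A small sketch; the name s is illustrative.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

# Default labels 0..4, one per value
print(list(s.index))  # [0, 1, 2, 3, 4]
print(s.tolist())     # [1, 2, 3, 4, 5]

# A Series is one-dimensional (a DataFrame is two-dimensional)
print(s.ndim)         # 1
```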

A Series is, in essence, a single column of a DataFrame. So you can assign row labels to a Series the same way as before, using the index parameter. However, a Series does not have column names; it has only one overall name, set with the name parameter:

pd.Series([30, 35, 40], index=['2015 Sales', '2016 Sales', '2017 Sales'], name='Product A')
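
The name and the row labels can then be read back, and values looked up by label. A minimal sketch; the variable name sales is hypothetical.

```python
import pandas as pd

sales = pd.Series([30, 35, 40],
                  index=['2015 Sales', '2016 Sales', '2017 Sales'],
                  name='Product A')

print(sales.name)           # Product A
print(sales['2016 Sales'])  # 35
```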

The Series and the DataFrame are intimately related. It’s helpful to think of a DataFrame as actually being just a bunch of Series “glued together”.
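
To make the “glued together” idea concrete, here is a sketch that pulls the columns out of the earlier Yes/No DataFrame as Series and then builds a DataFrame back from them. The names yes, no and rebuilt are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]})

# Selecting a single column gives back a Series named after that column
yes = df['Yes']
no = df['No']
print(type(yes).__name__)  # Series
print(yes.name)            # Yes

# A dictionary of Series works as a constructor input too:
# the Series are lined up by their row labels, side by side
rebuilt = pd.DataFrame({'Yes': yes, 'No': no})
print(rebuilt.equals(df))  # True
```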

So, I hope this helped you understand the two data structures. I tried to keep it compact and easy to follow for an absolute beginner. Hope you liked it.

Thanks for reading!

Saranya