Setting datatypes in pandas DataFrames

I was working on a solution to a Stack Overflow question and felt I could elaborate on the answer just a tiny bit.

In the original question, the questioner asked why they were unable to assign a decimal value (a float) to a cell in a pandas DataFrame. You can see the original question here.

QUESTION:
The questioner was trying to assign the result of the following division to the cell at column 1, row 1. Instead of a float, an integer was stored in that cell.

in: df['column1']['row1'] = 1 / 331616
in: df['column1']['row1']
out: 0

ANSWER:
My answer, with elaboration, follows…

pandas appears to be presuming that the datatype is an integer (int). This is because by default, pandas attempts to infer and assign a datatype to a column based on the data stored in the column. This link to the pandas documentation has a summary of the types of data that can be inferred.

If a DataFrame is created with only integers in a given column, that column gets an integer dtype. A float assigned to it later may then be stored as an integer, as in the question, unless you set or change the datatype first.
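
As a quick check of that inference, the sketch below builds two one-cell DataFrames (the names int_df and float_df are only illustrative) and inspects the dtype pandas picked for each. The exact integer width, such as int64, can vary by platform, so the check uses the pandas.api.types helpers rather than a specific dtype name.

```python
import pandas as pd

# Same shape, same labels; only the literal differs (1 versus 1.0).
int_df = pd.DataFrame([[1]], columns=['column1'], index=['row1'])
float_df = pd.DataFrame([[1.0]], columns=['column1'], index=['row1'])

# Ask pandas which family of dtype it inferred for each column.
int_is_integer = pd.api.types.is_integer_dtype(int_df['column1'])
float_is_float = pd.api.types.is_float_dtype(float_df['column1'])

print(int_df.dtypes)
print(float_df.dtypes)
```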

There are two ways to address this issue: set the datatype to float when the DataFrame is constructed, or change (cast) the datatype (also referred to as a dtype) to float on the fly. Let’s look at both techniques.

Setting the datatype (dtype) during construction:

>>> import pandas as pd

In making this simple DataFrame, we provide a single example value (1) and declare at creation that the DataFrame's columns contain floats.

>>> df = pd.DataFrame([[1]], columns=['column1'], index=['row1'], dtype=float)
>>> df.loc['row1', 'column1'] = 1 / 331616
>>> df
      column1
row1 0.000003

Converting the datatype on the fly:

>>> df = pd.DataFrame([[1]], columns=['column1'], index=['row1'], dtype=int)
>>> df['column1'] = df['column1'].astype(float)
>>> df.loc['row1', 'column1'] = 1 / 331616
>>> df
      column1
row1 0.000003
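
A variation on the on-the-fly approach, offered as a sketch rather than as the original answer: it checks the dtype before and after the cast and assigns through .loc, which the pandas documentation recommends over chained indexing such as df['column1']['row1']. It assumes Python 3, where / is true division; the variable names was_integer, is_float and stored are only illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1]], columns=['column1'], index=['row1'], dtype=int)
was_integer = pd.api.types.is_integer_dtype(df['column1'])

# Cast the column to float before storing a fractional value in it.
df['column1'] = df['column1'].astype(float)
is_float = pd.api.types.is_float_dtype(df['column1'])

# A single .loc call addresses the cell by (row label, column label).
df.loc['row1', 'column1'] = 1 / 331616
stored = df.loc['row1', 'column1']

print(was_integer, is_float, stored)
```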

Counting objects

The collections module has many wonderful tools, including the Counter class, which greatly simplifies counting objects.

Let’s look at several examples:

  • counting letters in a string
  • counting items in a list
  • counting tuples within a tuple
  • special functions focused on using the results of a count

We’ll start by looking at three simple Python variables that contain elements we want to count:

>>> mystring = 'python pythonista pythonic the pythons'
>>> mylist = [11, 22, 22, 33, 33, 33, 42]
>>> mytuples = (('first', 'alpha'),
                ('first', 'alpha'), 
                ('second', 'beta'), 
                ('third', 'gamma'))

We import the Counter object from the collections module:

>>> from collections import Counter

Then we provide the item we want counted as an argument to the Counter class:

>>> mystring = 'python pythonista pythonic the pythons'
>>> chars = Counter(mystring)
>>> chars
Counter({' ': 4,              # NOTE: the space is a character
         'a': 1,              # This display is sorted by key, not by count
         'c': 1,
         'e': 1,
         'h': 5,
         'i': 2,
         'n': 4,
         'o': 4,
         'p': 4,
         's': 2,
         't': 6,
         'y': 4})

This works just as well with our other samples:

>>> mylist = [11, 22, 22, 33, 33, 33, 42]
>>> integers = Counter(mylist)
>>> integers
Counter({11: 1, 22: 2, 33: 3, 42: 1})
>>> mytuples = (('first', 'alpha'), 
                ('first', 'alpha'), 
                ('second', 'beta'), 
                ('third', 'gamma'))
>>> tuples = Counter(mytuples)
>>> tuples
Counter({('first', 'alpha'): 2,
         ('second', 'beta'): 1, 
         ('third', 'gamma'): 1})
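
Since a Counter is a dictionary subclass, individual counts can be read back by key. The short sketch below reuses the sample string; the variable names on the left are only illustrative. One difference from a plain dict is that a missing key returns 0 instead of raising a KeyError.

```python
from collections import Counter

chars = Counter('python pythonista pythonic the pythons')

t_count = chars['t']           # count for a character that is present
z_count = chars['z']           # a missing key gives 0, not a KeyError
distinct = len(chars)          # number of distinct characters
total = sum(chars.values())    # total characters counted

print(t_count, z_count, distinct, total)
```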

We can display the most common items in a Counter using the .most_common() method. By providing an argument n, we can limit the results to just the n most common elements:

>>> chars.most_common(3)
[('t', 6), ('h', 5), ('p', 4)]
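
Called without an argument, .most_common() returns every element, ordered from the highest count to the lowest. A small sketch using the list sample (on Python 3.7 and later, elements with equal counts keep the order in which they were first seen):

```python
from collections import Counter

integers = Counter([11, 22, 22, 33, 33, 33, 42])

ranked = integers.most_common()      # all elements, highest count first
top_one = integers.most_common(1)    # only the single most common element

print(ranked)
print(top_one)
```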

We can also expand a Counter back into its individual elements, organized into groups based on the actual character, number, etc.

The .elements() method returns an iterator, so to easily see the contents, it is common to wrap the result in another function, such as list() or sorted().

>>> list(chars.elements())
['p', 'p', 'p', 'p', 'y', 'y', 'y', 'y', 't', 't', 't', 't', 't', 't',
 'h', 'h', 'h', 'h', 'h', 'o', 'o', 'o', 'o', 'n', 'n', 'n', 'n', ' ',
 ' ', ' ', ' ', 'i', 'i', 's', 's', 'a', 'c', 'e']

>>> sorted(chars.elements())
[' ', ' ', ' ', ' ', 'a', 'c', 'e', 'h', 'h', 'h', 'h', 'h', 'i', 'i',
 'n', 'n', 'n', 'n', 'o', 'o', 'o', 'o', 'p', 'p', 'p', 'p', 's', 's',
 't', 't', 't', 't', 't', 't', 'y', 'y', 'y', 'y']
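
One way to read .elements() is as the reverse of counting: feeding its output back into Counter gives the same counts. The sketch below illustrates that with the list sample, and also shows that elements with a count below one are skipped (the name sparse is only illustrative).

```python
from collections import Counter

mylist = [11, 22, 22, 33, 33, 33, 42]
integers = Counter(mylist)

# Expand the counts back into individual items, then count them again.
expanded = list(integers.elements())
rebuilt = Counter(expanded)

# Elements whose count is zero or negative do not appear at all.
sparse = Counter(a=2, b=0, c=-1)
sparse_items = list(sparse.elements())

print(expanded)
print(sparse_items)
```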

Counters come with a variety of other methods, most of which mirror dictionary methods:

['clear', 'copy', 'elements', 'fromkeys', 'get', 'items', 'keys',
 'most_common', 'pop', 'popitem', 'setdefault', 'subtract', 'update',
 'values']
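
Two of the methods in that list, update and subtract, behave differently from their dictionary counterparts: they add to or subtract from existing counts instead of replacing them. A minimal sketch with made-up data (the fruit names are purely illustrative):

```python
from collections import Counter

fruit = Counter(['apple', 'apple', 'pear'])

# update() adds counts for each item it is given.
fruit.update(['pear', 'plum'])
after_update = dict(fruit)

# subtract() removes counts and is allowed to go below zero.
fruit.subtract(['apple', 'plum', 'plum'])
after_subtract = dict(fruit)

print(after_update)
print(after_subtract)
```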

Happy coding!