### Question :

NumPy seems to make a distinction between `str` and `object` types. For instance, I can do:

```
>>> import pandas as pd
>>> import numpy as np
>>> np.dtype(str)
dtype('S')
>>> np.dtype(object)
dtype('O')
```

Here `dtype('S')` and `dtype('O')` correspond to `str` and `object`, respectively.

However, pandas seems to lack that distinction and coerces `str` to `object`:

```
>>> df = pd.DataFrame({'a': np.arange(5)})
>>> df.a.dtype
dtype('int64')
>>> df.a.astype(str).dtype
dtype('O')
>>> df.a.astype(object).dtype
dtype('O')
```

Forcing the type to `dtype('S')` does not help either:

```
>>> df.a.astype(np.dtype(str)).dtype
dtype('O')
>>> df.a.astype(np.dtype('S')).dtype
dtype('O')
```

Is there any explanation for this behavior?

## Answer #1:

NumPy’s string dtypes aren’t Python strings.

Therefore, `pandas` deliberately uses native Python strings, which require an object dtype.

First off, let me demonstrate a bit of what I mean by numpy’s strings being different:

```
In [1]: import numpy as np
In [2]: x = np.array(['Testing', 'a', 'string'], dtype='|S7')
In [3]: y = np.array(['Testing', 'a', 'string'], dtype=object)
```

Now, `x` is a `numpy` string dtype (fixed-width, C-like string) and `y` is an array of native Python strings.
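The difference is also visible on the dtype objects themselves. The following is a minimal sketch, assuming Python 3 and a reasonably recent NumPy; the variable names `x_kind`, `x_itemsize`, `y_kind` and `y_elem_type` are only illustrative:

```python
import numpy as np

# Same data as above, stored two different ways.
x = np.array(['Testing', 'a', 'string'], dtype='S7')
y = np.array(['Testing', 'a', 'string'], dtype=object)

# Kind 'S' marks a fixed-width byte string; every element occupies 7 bytes.
x_kind, x_itemsize = x.dtype.kind, x.dtype.itemsize

# Kind 'O' marks an array of references to ordinary Python objects.
y_kind = y.dtype.kind
y_elem_type = type(y[0])

print(x_kind, x_itemsize)
print(y_kind, y_elem_type)
```

The fixed `itemsize` is what makes the string dtype compact and C-like, and it is also the source of the truncation described next.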

If we try to go beyond 7 characters, we’ll see an immediate difference. The string dtype versions will be truncated:

```
In [4]: x[1] = 'a really really really long'
In [5]: x
Out[5]:
array(['Testing', 'a reall', 'string'],
      dtype='|S7')
```

While the object dtype versions can be arbitrary length:

```
In [6]: y[1] = 'a really really really long'
In [7]: y
Out[7]: array(['Testing', 'a really really really long', 'string'], dtype=object)
```

Next, the `|S` dtype strings can’t hold unicode properly, though there is a unicode fixed-length string dtype, as well. I’ll skip an example, for the moment.
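For readers who want one anyway, here is a rough sketch of the unicode point, assuming Python 3, where the `S` dtype holds bytes and the `U` dtype is the fixed-length unicode type. The sample word and the variable names are purely illustrative:

```python
import numpy as np

text = 'naïve'  # 5 characters, one of them non-ASCII

# The 'U' dtype is NumPy's fixed-width unicode string type.
u = np.array([text], dtype='U5')
u_roundtrip = u[0] == text

# The 'S' dtype stores raw bytes, so non-ASCII text cannot go in directly.
try:
    np.array([text], dtype='S5')
    s_failed = False
except UnicodeError:
    s_failed = True

# Encoding first works, but the width is then counted in bytes, not characters,
# so the 6-byte UTF-8 encoding is cut down to 5 bytes.
encoded = text.encode('utf-8')
s = np.array([encoded], dtype='S5')
s_truncated = s[0] != encoded

print(u_roundtrip, s_failed, s_truncated)
```

Under those assumptions, the byte-string dtype either rejects the text or silently cuts it at a byte boundary, while the unicode dtype keeps it intact as long as it fits the declared width.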

Finally, numpy’s strings are actually mutable, while Python strings are not. For example:

```
In [8]: z = x.view(np.uint8)
In [9]: z += 1
In [10]: x
Out[10]:
array(['Uftujoh', 'b!sfbmm', 'tusjoh\x01'],
      dtype='|S7')
```

For all of these reasons, `pandas` chose not to ever allow C-like, fixed-length strings as a datatype. As you noticed, attempting to coerce a Python string into a fixed-width numpy string won’t work in `pandas`. Instead, it always uses native Python strings, which behave in a more intuitive way for most users.
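A quick way to check this from the pandas side is to look at the elements rather than the dtype name. This is only a sketch, assuming Python 3; the pandas versions in the question report `object` here, and newer releases may report a dedicated string dtype instead, so the check below deliberately avoids depending on the dtype's name:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(5)).astype(str)

# Each element is an ordinary Python str, not a fixed-width NumPy string.
first = s.iloc[0]
is_python_str = isinstance(first, str)
is_fixed_width = s.dtype.kind in ('S', 'U')

# Assigning a much longer value does not truncate it.
long_value = 'a really really really long'
s.iloc[1] = long_value
kept_whole = s.iloc[1] == long_value

print(is_python_str, is_fixed_width, kept_whole)
```

Compare this with the `|S7` array earlier, where the same assignment was cut to seven characters.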