Taking snapshots of pandas data frames
pytest-regtest implements snapshot testing for pandas data frames.
snapshot.check offers options for managing numerical accuracy and
provides tailored diagnostics for failing tests.
Write a snapshot test for a data frame
# test_dataframe.py
import pandas as pd
import numpy as np
def test_dataframe(snapshot):
    df = pd.DataFrame(np.eye(3), columns=["a", "b", "c"])
    snapshot.check(df, atol=1e-2, rtol=1e-2)
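To get a feeling for what the atol and rtol arguments mean, the following sketch uses pandas.testing.assert_frame_equal as a stand-in comparison. This is an assumption for illustration only: the text above does not say how snapshot.check compares values internally, and frames_close is a hypothetical helper, not part of pytest-regtest.

```python
import numpy as np
import pandas as pd


def frames_close(current, expected, atol=1e-2, rtol=1e-2):
    """Hypothetical helper: True if both frames agree within the tolerances.

    Uses pandas' own comparison as a stand-in; pytest-regtest may
    implement its check differently.
    """
    try:
        pd.testing.assert_frame_equal(
            current, expected, check_exact=False, atol=atol, rtol=rtol
        )
    except AssertionError:
        return False
    return True


expected = pd.DataFrame(np.eye(3), columns=["a", "b", "c"])

small_drift = expected + 1e-3  # deviation below the absolute tolerance
large_drift = expected + 0.5   # deviation far above both tolerances
renamed = expected.rename(columns={"c": "d"})  # structural change

print(frames_close(small_drift, expected))
print(frames_close(large_drift, expected))
print(frames_close(renamed, expected))
```

In this sketch, small numerical noise is accepted, while larger deviations and structural changes such as a renamed column are rejected regardless of the tolerances.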
Run the test
The first execution of this test fails because no snapshot has been recorded yet:
$ pytest -v test_dataframe.py
============================= test session starts ==============================
platform linux -- Python 3.12.12, pytest-9.0.3, pluggy-1.6.0 -- /home/docs/checkouts/readthedocs.org/user_builds/pytest-regtest/checkouts/latest/.venv/bin/python
cachedir: .pytest_cache
rootdir: /tmp/tmpp4oa895v
plugins: regtest-2.5.0, cov-7.1.0
collecting ... collected 1 item
test_dataframe.py::test_dataframe FAILED [100%]
=================================== FAILURES ===================================
________________________________ test_dataframe ________________________________
snapshot error(s) for test_dataframe.py::test_dataframe:
snapshot not recorded yet:
> test_dataframe.py +6
> snapshot.check(df, atol=1e-2, rtol=1e-2)
a b c
0 1.0 0.0 0.0
1 0.0 1.0 0.0
2 0.0 0.0 1.0
---------------------------- pytest-regtest report -----------------------------
total number of failed regression tests: 0
total number of failed snapshot tests : 1
=========================== short test summary info ============================
FAILED test_dataframe.py::test_dataframe
============================== 1 failed in 0.13s ===============================
Reset the snapshot
Let us record the snapshot:
$ pytest -v --regtest-reset test_dataframe.py
============================= test session starts ==============================
platform linux -- Python 3.12.12, pytest-9.0.3, pluggy-1.6.0 -- /home/docs/checkouts/readthedocs.org/user_builds/pytest-regtest/checkouts/latest/.venv/bin/python
cachedir: .pytest_cache
rootdir: /tmp/tmpp4oa895v
plugins: regtest-2.5.0, cov-7.1.0
collecting ... collected 1 item
test_dataframe.py::test_dataframe RESET [100%]
---------------------------- pytest-regtest report -----------------------------
total number of failed regression tests: 0
total number of failed snapshot tests : 0
the following output files have been reset:
_regtest_outputs/test_dataframe.test_dataframe__0
============================== 1 passed in 0.01s ===============================
Now the test passes:
$ pytest -v test_dataframe.py
============================= test session starts ==============================
platform linux -- Python 3.12.12, pytest-9.0.3, pluggy-1.6.0 -- /home/docs/checkouts/readthedocs.org/user_builds/pytest-regtest/checkouts/latest/.venv/bin/python
cachedir: .pytest_cache
rootdir: /tmp/tmpp4oa895v
plugins: regtest-2.5.0, cov-7.1.0
collecting ... collected 1 item
test_dataframe.py::test_dataframe PASSED [100%]
---------------------------- pytest-regtest report -----------------------------
total number of failed regression tests: 0
total number of failed snapshot tests : 0
============================== 1 passed in 0.01s ===============================
Break the test
We break the test by using the column name d instead of c:
# test_dataframe.py
import pandas as pd
import numpy as np
def test_dataframe(snapshot):
    df = pd.DataFrame(np.eye(3), columns=["a", "b", "d"])
    snapshot.check(df, atol=1e-2, rtol=1e-2)
$ pytest -v test_dataframe.py
============================= test session starts ==============================
platform linux -- Python 3.12.12, pytest-9.0.3, pluggy-1.6.0 -- /home/docs/checkouts/readthedocs.org/user_builds/pytest-regtest/checkouts/latest/.venv/bin/python
cachedir: .pytest_cache
rootdir: /tmp/tmpp4oa895v
plugins: regtest-2.5.0, cov-7.1.0
collecting ... collected 1 item
test_dataframe.py::test_dataframe FAILED [100%]
=================================== FAILURES ===================================
________________________________ test_dataframe ________________________________
snapshot error(s) for test_dataframe.py::test_dataframe:
snapshot mismatch:
> test_dataframe.py +6:
> snapshot.check(df, atol=1e-2, rtol=1e-2)
--- current
+++ expected
@@ -3,4 +3,4 @@
--- ------ -------------- -----
0 a 3 non-null float64
1 b 3 non-null float64
- 2 d 3 non-null float64
+ 2 c 3 non-null float64
--- current
+++ expected
@@ -1,4 +1,4 @@
- a b d
+ a b c
0 1.0 0.0 0.0
1 0.0 1.0 0.0
2 0.0 0.0 1.0
---------------------------- pytest-regtest report -----------------------------
total number of failed regression tests: 0
total number of failed snapshot tests : 1
=========================== short test summary info ============================
FAILED test_dataframe.py::test_dataframe
============================== 1 failed in 0.14s ===============================
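The second part of the report above is a unified diff of the printed data frames. A diff of this shape can be sketched with the standard library's difflib and DataFrame.to_string. This is not the plugin's actual implementation, and frame_diff is a hypothetical name chosen for this example.

```python
import difflib

import numpy as np
import pandas as pd


def frame_diff(current, expected):
    """Hypothetical helper: unified diff of two frames' text renderings."""
    current_lines = current.to_string().splitlines()
    expected_lines = expected.to_string().splitlines()
    return list(
        difflib.unified_diff(
            current_lines,
            expected_lines,
            fromfile="current",
            tofile="expected",
            lineterm="",
        )
    )


expected = pd.DataFrame(np.eye(3), columns=["a", "b", "c"])
current = pd.DataFrame(np.eye(3), columns=["a", "b", "d"])

diff_lines = frame_diff(current, expected)
print("\n".join(diff_lines))
```

Only the header line differs between the two renderings, so the diff contains one removed line (with column d) and one added line (with column c), while the data rows appear as unchanged context.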
Changing the display options
By default, pytest-regtest uses the display options from pandas.
Adapting these options can make diagnostic messages more
readable.
In the following example we create a data frame with 61 rows:
# test_datafrae_with_printoptions.py
import pandas as pd
import numpy as np
def test_dataframe(snapshot):
    rows = 61
    matrix = np.arange(rows * 3).reshape(-1, 3) * 1.001
    df = pd.DataFrame(matrix, columns=["a", "b", "c"])
    snapshot.check(df)
The default settings of pandas truncate the display of data frames with more than 60 rows:
$ pytest -v test_datafrae_with_printoptions.py
============================= test session starts ==============================
platform linux -- Python 3.12.12, pytest-9.0.3, pluggy-1.6.0 -- /home/docs/checkouts/readthedocs.org/user_builds/pytest-regtest/checkouts/latest/.venv/bin/python
cachedir: .pytest_cache
rootdir: /tmp/tmpp4oa895v
plugins: regtest-2.5.0, cov-7.1.0
collecting ... collected 1 item
test_datafrae_with_printoptions.py::test_dataframe FAILED [100%]
=================================== FAILURES ===================================
________________________________ test_dataframe ________________________________
snapshot error(s) for test_datafrae_with_printoptions.py::test_dataframe:
snapshot not recorded yet:
> test_datafrae_with_printoptions.py +9
> snapshot.check(df)
a b c
0 0.000 1.001 2.002
1 3.003 4.004 5.005
2 6.006 7.007 8.008
3 9.009 10.010 11.011
4 12.012 13.013 14.014
.. ... ... ...
56 168.168 169.169 170.170
57 171.171 172.172 173.173
58 174.174 175.175 176.176
59 177.177 178.178 179.179
60 180.180 181.181 182.182
[61 rows x 3 columns]
---------------------------- pytest-regtest report -----------------------------
total number of failed regression tests: 0
total number of failed snapshot tests : 1
=========================== short test summary info ============================
FAILED test_datafrae_with_printoptions.py::test_dataframe
============================== 1 failed in 0.13s ===============================
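The truncation seen in this report can be reproduced with pandas alone, independent of pytest-regtest. The sketch below pins display.max_rows to 60 (the pandas default mentioned above) so that it does not depend on local settings, then inspects the text representation of the same data frame.

```python
import numpy as np
import pandas as pd

rows = 61
matrix = np.arange(rows * 3).reshape(-1, 3) * 1.001
df = pd.DataFrame(matrix, columns=["a", "b", "c"])

# Pin the limit explicitly so the result does not depend on local settings.
with pd.option_context("display.max_rows", 60):
    text = repr(df)

lines = text.splitlines()
# pandas marks omitted rows with a line of dots.
is_truncated = any(line.lstrip().startswith("..") for line in lines)

print(is_truncated)
print(lines[-1])
```

With 61 rows the frame exceeds the limit by one row, so pandas prints only the head and tail of the frame and appends a summary line with the frame's dimensions.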
To fix this, we change the display.max_rows setting:
# test_datafrae_with_printoptions.py
import pandas as pd
import numpy as np
def test_dataframe(snapshot):
    rows = 61
    matrix = np.arange(rows * 3).reshape(-1, 3) * 1.001
    df = pd.DataFrame(matrix, columns=["a", "b", "c"])
    # A value of None means "unlimited":
    with pd.option_context("display.max_rows", None):
        snapshot.check(df)
Now the report shows all rows (the test still fails because the snapshot has not been recorded yet):
$ pytest -v test_datafrae_with_printoptions.py
============================= test session starts ==============================
platform linux -- Python 3.12.12, pytest-9.0.3, pluggy-1.6.0 -- /home/docs/checkouts/readthedocs.org/user_builds/pytest-regtest/checkouts/latest/.venv/bin/python
cachedir: .pytest_cache
rootdir: /tmp/tmpp4oa895v
plugins: regtest-2.5.0, cov-7.1.0
collecting ... collected 1 item
test_datafrae_with_printoptions.py::test_dataframe FAILED [100%]
=================================== FAILURES ===================================
________________________________ test_dataframe ________________________________
snapshot error(s) for test_datafrae_with_printoptions.py::test_dataframe:
snapshot not recorded yet:
> test_datafrae_with_printoptions.py +11
> snapshot.check(df)
a b c
0 0.000 1.001 2.002
1 3.003 4.004 5.005
2 6.006 7.007 8.008
3 9.009 10.010 11.011
4 12.012 13.013 14.014
5 15.015 16.016 17.017
6 18.018 19.019 20.020
7 21.021 22.022 23.023
8 24.024 25.025 26.026
9 27.027 28.028 29.029
10 30.030 31.031 32.032
11 33.033 34.034 35.035
12 36.036 37.037 38.038
13 39.039 40.040 41.041
14 42.042 43.043 44.044
15 45.045 46.046 47.047
16 48.048 49.049 50.050
17 51.051 52.052 53.053
18 54.054 55.055 56.056
19 57.057 58.058 59.059
20 60.060 61.061 62.062
21 63.063 64.064 65.065
22 66.066 67.067 68.068
23 69.069 70.070 71.071
24 72.072 73.073 74.074
25 75.075 76.076 77.077
26 78.078 79.079 80.080
27 81.081 82.082 83.083
28 84.084 85.085 86.086
29 87.087 88.088 89.089
30 90.090 91.091 92.092
31 93.093 94.094 95.095
32 96.096 97.097 98.098
33 99.099 100.100 101.101
34 102.102 103.103 104.104
35 105.105 106.106 107.107
36 108.108 109.109 110.110
37 111.111 112.112 113.113
38 114.114 115.115 116.116
39 117.117 118.118 119.119
40 120.120 121.121 122.122
41 123.123 124.124 125.125
42 126.126 127.127 128.128
43 129.129 130.130 131.131
44 132.132 133.133 134.134
45 135.135 136.136 137.137
46 138.138 139.139 140.140
47 141.141 142.142 143.143
48 144.144 145.145 146.146
49 147.147 148.148 149.149
50 150.150 151.151 152.152
51 153.153 154.154 155.155
52 156.156 157.157 158.158
53 159.159 160.160 161.161
54 162.162 163.163 164.164
55 165.165 166.166 167.167
56 168.168 169.169 170.170
57 171.171 172.172 173.173
58 174.174 175.175 176.176
59 177.177 178.178 179.179
60 180.180 181.181 182.182
---------------------------- pytest-regtest report -----------------------------
total number of failed regression tests: 0
total number of failed snapshot tests : 1
=========================== short test summary info ============================
FAILED test_datafrae_with_printoptions.py::test_dataframe
============================== 1 failed in 0.14s ===============================
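One reason to prefer pd.option_context over a global pd.set_option call is that the changed setting is only active inside the with block, so other tests keep their usual display settings. The following sketch checks this with plain pandas; the variable names are chosen for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(61 * 3).reshape(-1, 3) * 1.001, columns=["a", "b", "c"]
)

before = pd.get_option("display.max_rows")

with pd.option_context("display.max_rows", None):
    # Inside the block the limit is lifted and every row is rendered.
    inside = pd.get_option("display.max_rows")
    full_lines = repr(df).splitlines()

# After the block pandas restores the previous value.
after = pd.get_option("display.max_rows")

print(inside, before == after, len(full_lines))
```

The full rendering consists of the header line plus one line per row, which matches the untruncated report shown above.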